robust statistical analysis: Topics by Science.gov

Sample records for robust statistical analysis

Scaled test statistics and robust standard errors for non-normal data in covariance structure analysis: a Monte Carlo study.

PubMed

Chou, C P; Bentler, P M; Satorra, A

1991-11-01

Research studying robustness of maximum likelihood (ML) statistics in covariance structure analysis has concluded that test statistics and standard errors are biased under severe non-normality. An estimation procedure known as asymptotic distribution free (ADF), making no distributional assumption, has been suggested to avoid these biases. Corrections to the normal theory statistics to yield more adequate performance have also been proposed. This study compares the performance of a scaled test statistic and robust standard errors for two models under several non-normal conditions and also compares these with the results from ML and ADF methods. Both ML and ADF test statistics performed rather well in one model and considerably worse in the other. In general, the scaled test statistic seemed to behave better than the ML test statistic and the ADF statistic performed the worst. The robust and ADF standard errors yielded more appropriate estimates of sampling variability than the ML standard errors, which were usually downward biased, in both models under most of the non-normal conditions. ML test statistics and standard errors were found to be quite robust to the violation of the normality assumption when data had either symmetric and platykurtic distributions, or non-symmetric and zero kurtotic distributions.
GWAR: robust analysis and meta-analysis of genome-wide association studies.

PubMed

Dimou, Niki L; Tsirigos, Konstantinos D; Elofsson, Arne; Bagos, Pantelis G

2017-05-15

In the context of genome-wide association studies (GWAS), a variety of statistical techniques are available for conducting the analysis, but in most cases the underlying genetic model is unknown. Under these circumstances, the classical Cochran-Armitage trend test (CATT) is suboptimal. Robust procedures that maximize the power and preserve the nominal type I error rate are preferable. Moreover, performing a meta-analysis using robust procedures is of great interest and has never been addressed in the past. The primary goal of this work is to implement several robust methods for analysis and meta-analysis in the statistical package Stata and subsequently to make the software available to the scientific community. The CATT under a recessive, additive and dominant model of inheritance as well as robust methods based on the Maximum Efficiency Robust Test statistic, the MAX statistic and the MIN2 were implemented in Stata. Concerning MAX and MIN2, we calculated their asymptotic null distributions relying on numerical integration, resulting in a great gain in computational time without losing accuracy. All the aforementioned approaches were employed in a fixed or a random effects meta-analysis setting using summary data with weights equal to the reciprocal of the combined cases and controls. Overall, this is the first complete effort to implement procedures for analysis and meta-analysis in GWAS using Stata. A Stata program and a web-server are freely available for academic users at http://www.compgen.org/tools/GWAR. Contact: pbagos@compgen.org. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
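As a rough illustration of the statistics named in the abstract above, the following Python sketch computes the CATT z statistic for a 2×3 genotype table under recessive, additive and dominant scores, and takes MAX as the largest absolute value of the three. The function names, the genotype ordering and the example counts are assumptions made for illustration; this is not the GWAR Stata code. No p-value is returned for MAX, because that requires its null distribution, which the abstract says was obtained by numerical integration.

```python
import math

# Genotype order assumed here: (no risk allele, one risk allele, two risk alleles).
SCORES = {
    "recessive": (0.0, 0.0, 1.0),
    "additive": (0.0, 0.5, 1.0),
    "dominant": (0.0, 1.0, 1.0),
}

def catt_z(cases, controls, scores):
    """Cochran-Armitage trend z statistic for per-genotype case/control counts."""
    n = [r + s for r, s in zip(cases, controls)]
    R, S = sum(cases), sum(controls)
    N = R + S
    # Numerator: sum_i x_i * (S * r_i - R * s_i)
    u = sum(x * (S * r - R * s) for x, r, s in zip(scores, cases, controls))
    # Null variance of u is (R * S / N) * q with q as below.
    q = N * sum(x * x * ni for x, ni in zip(scores, n)) \
        - sum(x * ni for x, ni in zip(scores, n)) ** 2
    return math.sqrt(N) * u / math.sqrt(R * S * q)

def max_statistic(cases, controls):
    """MAX = largest |z| over the three genetic models (statistic only)."""
    return max(abs(catt_z(cases, controls, s)) for s in SCORES.values())

# Hypothetical counts, not taken from any study.
cases = [20, 50, 30]
controls = [40, 45, 15]
for model, s in SCORES.items():
    print(model, round(catt_z(cases, controls, s), 3))
print("MAX", round(max_statistic(cases, controls), 3))
```

With two genotype groups and scores (0, 1), the squared statistic reduces to the usual Pearson chi-square for a 2×2 table, which is a convenient sanity check on the variance term.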
New robust statistical procedures for the polytomous logistic regression models.

PubMed

Castilla, Elena; Ghosh, Abhik; Martin, Nirian; Pardo, Leandro

2018-05-17

This article derives a new family of estimators, namely the minimum density power divergence estimators, as a robust generalization of the maximum likelihood estimator for the polytomous logistic regression model. Based on these estimators, a family of Wald-type test statistics for linear hypotheses is introduced. Robustness properties of both the proposed estimators and the test statistics are theoretically studied through the classical influence function analysis. Appropriate real life examples are presented to justify the requirement of suitable robust statistical procedures in place of the likelihood based inference for the polytomous logistic regression model. The validity of the theoretical results established in the article is further confirmed empirically through suitable simulation studies. Finally, an approach for the data-driven selection of the robustness tuning parameter is proposed with empirical justifications. © 2018, The International Biometric Society.
Application of Statistics in Engineering Technology Programs

ERIC Educational Resources Information Center

Zhan, Wei; Fink, Rainer; Fang, Alex

2010-01-01

Statistics is a critical tool for robustness analysis, measurement system error analysis, test data analysis, probabilistic risk assessment, and many other fields in the engineering world. Traditionally, however, statistics is not extensively used in undergraduate engineering technology (ET) programs, resulting in a major disconnect from industry…
A Statistical Analysis of Brain Morphology Using Wild Bootstrapping

PubMed Central

Ibrahim, Joseph G.; Tang, Niansheng; Rowe, Daniel B.; Hao, Xuejun; Bansal, Ravi; Peterson, Bradley S.

2008-01-01

Methods for the analysis of brain morphology, including voxel-based morphology and surface-based morphometries, have been used to detect associations between brain structure and covariates of interest, such as diagnosis, severity of disease, age, IQ, and genotype. The statistical analysis of morphometric measures usually involves two statistical procedures: 1) invoking a statistical model at each voxel (or point) on the surface of the brain or brain subregion, followed by mapping test statistics (e.g., t test) or their associated p values at each of those voxels; 2) correction for the multiple statistical tests conducted across all voxels on the surface of the brain region under investigation. We propose the use of new statistical methods for each of these procedures. We first use a heteroscedastic linear model to test the associations between the morphological measures at each voxel on the surface of the specified subregion (e.g., cortical or subcortical surfaces) and the covariates of interest. Moreover, we develop a robust test procedure that is based on a resampling method, called wild bootstrapping. This procedure assesses the statistical significance of the associations between a measure of given brain structure and the covariates of interest. The value of this robust test procedure lies in its computational simplicity and in its applicability to a wide range of imaging data, including data from both anatomical and functional magnetic resonance imaging (fMRI). Simulation studies demonstrate that this robust test procedure can accurately control the family-wise error rate. We demonstrate the application of this robust test procedure to the detection of statistically significant differences in the morphology of the hippocampus over time across gender groups in a large sample of healthy subjects. PMID:17649909
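To make the idea of wild bootstrapping under a heteroscedastic linear model more concrete, here is a minimal Python sketch for a single covariate at a single voxel. It is an illustration under stated assumptions, not the authors' procedure: it tests a zero slope, uses Rademacher (±1) multipliers on the residuals of the null model, studentizes with an HC0-type robust standard error, and does not include the multiple-testing correction across voxels. All function names are hypothetical.

```python
import math
import random

def slope_t(x, y):
    """Slope t statistic with a heteroscedasticity-robust (HC0-type) standard error."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    dx = [xi - xbar for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - ybar) for d, yi in zip(dx, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    var = sum((d * e) ** 2 for d, e in zip(dx, resid)) / sxx ** 2
    return slope / math.sqrt(var)

def wild_bootstrap_pvalue(x, y, n_boot=499, seed=0):
    """Two-sided wild-bootstrap p-value for H0: slope = 0."""
    rng = random.Random(seed)
    t_obs = abs(slope_t(x, y))
    ybar = sum(y) / len(y)
    resid0 = [yi - ybar for yi in y]  # residuals of the null (intercept-only) model
    exceed = 0
    for _ in range(n_boot):
        # Flip the sign of each null residual independently (Rademacher weights);
        # this keeps each observation's own error scale, hence heteroscedasticity.
        y_star = [ybar + e * rng.choice((-1.0, 1.0)) for e in resid0]
        if abs(slope_t(x, y_star)) >= t_obs:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)

# Simulated data with a real slope and noise that grows with x.
gen = random.Random(1)
x = [i / 10 for i in range(50)]
y = [2.0 * xi + gen.gauss(0.0, 0.1 * (1.0 + xi)) for xi in x]
print(wild_bootstrap_pvalue(x, y))
```

Because each resampled response keeps the magnitude of its own residual, the reference distribution reflects unequal error variances without having to model them explicitly.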
STATISTICAL SAMPLING AND DATA ANALYSIS

EPA Science Inventory

Research is being conducted to develop approaches to improve soil and sediment sampling techniques, measurement design and geostatistics, and data analysis via chemometric, environmetric, and robust statistical methods. Improvements in sampling contaminated soil and other hetero...
Robust Mean and Covariance Structure Analysis through Iteratively Reweighted Least Squares.

ERIC Educational Resources Information Center

Yuan, Ke-Hai; Bentler, Peter M.

2000-01-01

Adapts robust schemes to mean and covariance structures, providing an iteratively reweighted least squares approach to robust structural equation modeling. Each case is weighted according to its distance, based on first and second order moments. Test statistics and standard error estimators are given. (SLD)
Robustness of Type I Error and Power in Set Correlation Analysis of Contingency Tables.

ERIC Educational Resources Information Center

Cohen, Jacob; Nee, John C. M.

1990-01-01

The analysis of contingency tables via set correlation allows the assessment of subhypotheses involving contrast functions of the categories of the nominal scales. The robustness of such methods with regard to Type I error and statistical power was studied via a Monte Carlo experiment. (TJH)
Robust Fixed-Structure Control

DTIC Science & Technology

1994-10-30

Deterministic Foundation for Statistical Energy Analysis ," J. Sound Vibr., to appear. 1.96 D. S. Bernstein and S. P. Bhat, "Lyapunov Stability, Semistability...S. Bernstein, "Power Flow, Energy Balance, and Statistical Energy Analysis for Large Scale, Interconnected Systems," Proc. Amer. Contr. Conf., pp
Development of a Comprehensive Digital Avionics Curriculum for the Aeronautical Engineer

DTIC Science & Technology

2006-03-01

able to analyze and design aircraft and missile guidance and control systems, including feedback stabilization schemes and stochastic processes, using ...Uncertainty modeling for robust control; Robust closed-loop stability and performance; Robust H-infinity control; Robustness check using mu-analysis...Controlled feedback (reduces noise) 3. Statistical group response (reduce pressure toward conformity) When used as a tool to study a complex problem
Model Uncertainty and Robustness: A Computational Framework for Multimodel Analysis

ERIC Educational Resources Information Center

Young, Cristobal; Holsteen, Katherine

2017-01-01

Model uncertainty is pervasive in social science. A key question is how robust empirical results are to sensible changes in model specification. We present a new approach and applied statistical software for computational multimodel analysis. Our approach proceeds in two steps: First, we estimate the modeling distribution of estimates across all…
Robust Mokken Scale Analysis by Means of the Forward Search Algorithm for Outlier Detection

ERIC Educational Resources Information Center

Zijlstra, Wobbe P.; van der Ark, L. Andries; Sijtsma, Klaas

2011-01-01

Exploratory Mokken scale analysis (MSA) is a popular method for identifying scales from larger sets of items. As with any statistical method, in MSA the presence of outliers in the data may result in biased results and wrong conclusions. The forward search algorithm is a robust diagnostic method for outlier detection, which we adapt here to…
Social and economic sustainability of urban systems: comparative analysis of metropolitan statistical areas in Ohio, USA

EPA Science Inventory

This article presents a general and versatile methodology for assessing sustainability with Fisher Information as a function of dynamic changes in urban systems. Using robust statistical methods, six Metropolitan Statistical Areas (MSAs) in Ohio were evaluated to comparatively as...
Robust inference for group sequential trials.

PubMed

Ganju, Jitendra; Lin, Yunzhi; Zhou, Kefei

2017-03-01

For ethical reasons, group sequential trials were introduced to allow trials to stop early in the event of extreme results. Endpoints in such trials are usually mortality or irreversible morbidity. For a given endpoint, the norm is to use a single test statistic and to use that same statistic for each analysis. This approach is risky because the test statistic has to be specified before the study is unblinded, and there is loss in power if the assumptions that ensure optimality for each analysis are not met. To minimize the risk of moderate to substantial loss in power due to a suboptimal choice of a statistic, a robust method was developed for nonsequential trials. The concept is analogous to diversification of financial investments to minimize risk. The method is based on combining P values from multiple test statistics for formal inference while controlling the type I error rate at its designated value. This article evaluates the performance of 2 P value combining methods for group sequential trials. The emphasis is on time to event trials although results from less complex trials are also included. The gain or loss in power with the combination method relative to a single statistic is asymmetric in its favor. Depending on the power of each individual test, the combination method can give more power than any single test or give power that is closer to the test with the most power. The versatility of the method is that it can combine P values from different test statistics for analysis at different times. The robustness of results suggests that inference from group sequential trials can be strengthened with the use of combined tests. Copyright © 2017 John Wiley & Sons, Ltd.
Low-Level Stratus Prediction Using Binary Statistical Regression: A Progress Report Using Moffett Field Data.

DTIC Science & Technology

1983-12-01

analysis; such work is not reported here. It seems possible that a robust principal component analysis may be informative (see Gnanadesikan (1977...Statistics in Atmospheric Sciences, American Meteorological Soc., Boston, Mass. (1979) pp. 46-48. Gnanadesikan, R., Methods for Statistical Data...North Carolina Chapel Hill, NC 20742 Dr. R. Gnanadesikan Bell Telephone Lab Murray Hill, NJ 07733
Robustness of fit indices to outliers and leverage observations in structural equation modeling.

PubMed

Yuan, Ke-Hai; Zhong, Xiaoling

2013-06-01

Normal-distribution-based maximum likelihood (NML) is the most widely used method in structural equation modeling (SEM), although practical data tend to be nonnormally distributed. The effect of nonnormally distributed data or data contamination on the normal-distribution-based likelihood ratio (LR) statistic is well understood due to many analytical and empirical studies. In SEM, fit indices are used as widely as the LR statistic. In addition to NML, robust procedures have been developed for more efficient and less biased parameter estimates with practical data. This article studies the effect of outliers and leverage observations on fit indices following NML and two robust methods. Analysis and empirical results indicate that good leverage observations following NML and one of the robust methods lead most fit indices to give more support to the substantive model. While outliers tend to make a good model superficially bad according to many fit indices following NML, they have little effect on those following the two robust procedures. Implications of the results to data analysis are discussed, and recommendations are provided regarding the use of estimation methods and interpretation of fit indices. (PsycINFO Database Record (c) 2013 APA, all rights reserved).
Contour plot assessment of existing meta-analyses confirms robust association of statin use and acute kidney injury risk.

PubMed

Chevance, Aurélie; Schuster, Tibor; Steele, Russell; Ternès, Nils; Platt, Robert W

2015-10-01

Robustness of an existing meta-analysis can justify decisions on whether to conduct an additional study addressing the same research question. We illustrate the graphical assessment of the potential impact of an additional study on an existing meta-analysis using published data on statin use and the risk of acute kidney injury. A previously proposed graphical augmentation approach is used to assess the sensitivity of the current test and heterogeneity statistics extracted from existing meta-analysis data. In addition, we extended the graphical augmentation approach to assess potential changes in the pooled effect estimate after updating a current meta-analysis and applied the three graphical contour definitions to data from meta-analyses on statin use and acute kidney injury risk. In the example data considered, the pooled effect estimates and heterogeneity indices proved to be considerably robust to the addition of a future study. In support of this, for some previously inconclusive meta-analyses, a study update might yield a statistically significant increase in kidney injury risk associated with higher statin exposure. The illustrated contour approach should become a standard tool for the assessment of the robustness of meta-analyses. It can guide decisions on whether to conduct additional studies addressing a relevant research question. Copyright © 2015 Elsevier Inc. All rights reserved.
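The question behind such contour plots (how would one more study change the pooled result?) can be sketched in a few lines of Python. The example below assumes a fixed-effect, inverse-variance model and a 1.96 critical value, and it prints a coarse text grid rather than a real contour plot; the abstract does not specify these details, so they are assumptions, as are the function names and the made-up log odds ratios.

```python
import math

def pool_fixed(estimates, std_errors):
    """Inverse-variance (fixed-effect) pooled estimate, its SE and z statistic."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    return pooled, se, pooled / se

def significance_grid(estimates, std_errors, new_estimates, new_ses, z_crit=1.96):
    """For each hypothetical new study (estimate, SE), is the updated pooled z significant?"""
    grid = []
    for se_new in new_ses:
        row = []
        for y_new in new_estimates:
            _, _, z = pool_fixed(list(estimates) + [y_new],
                                 list(std_errors) + [se_new])
            row.append(abs(z) > z_crit)
        grid.append(row)
    return grid

# Hypothetical log odds ratios and standard errors (not data from the paper).
log_or = [0.10, 0.25, 0.05]
se = [0.15, 0.20, 0.12]
grid = significance_grid(log_or, se,
                         new_estimates=[-0.2, 0.0, 0.2, 0.4],
                         new_ses=[0.05, 0.10, 0.20])
for row in grid:
    print("".join("#" if cell else "." for cell in row))
```

Each row corresponds to one assumed precision of the future study and each column to one assumed effect; a region of uniform symbols indicates that the current conclusion is insensitive to the new study.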
Robust regression for large-scale neuroimaging studies.

PubMed

Fritsch, Virgile; Da Mota, Benoit; Loth, Eva; Varoquaux, Gaël; Banaschewski, Tobias; Barker, Gareth J; Bokde, Arun L W; Brühl, Rüdiger; Butzek, Brigitte; Conrod, Patricia; Flor, Herta; Garavan, Hugh; Lemaitre, Hervé; Mann, Karl; Nees, Frauke; Paus, Tomas; Schad, Daniel J; Schümann, Gunter; Frouin, Vincent; Poline, Jean-Baptiste; Thirion, Bertrand

2015-05-01

Multi-subject datasets used in neuroimaging group studies have a complex structure, as they exhibit non-stationary statistical properties across regions and display various artifacts. While studies with small sample sizes can rarely be shown to deviate from standard hypotheses (such as the normality of the residuals) due to the poor sensitivity of normality tests with low degrees of freedom, large-scale studies (e.g. >100 subjects) exhibit more obvious deviations from these hypotheses and call for more refined models for statistical inference. Here, we demonstrate the benefits of robust regression as a tool for analyzing large neuroimaging cohorts. First, we use an analytic test based on robust parameter estimates; based on simulations, this procedure is shown to provide an accurate statistical control without resorting to permutations. Second, we show that robust regression yields more detections than standard algorithms using as an example an imaging genetics study with 392 subjects. Third, we show that robust regression can avoid false positives in a large-scale analysis of brain-behavior relationships with over 1500 subjects. Finally we embed robust regression in the Randomized Parcellation Based Inference (RPBI) method and demonstrate that this combination further improves the sensitivity of tests carried out across the whole brain. Altogether, our results show that robust procedures provide important advantages in large-scale neuroimaging group studies. Copyright © 2015 Elsevier Inc. All rights reserved.
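For readers unfamiliar with robust regression, the following Python sketch fits a straight line by iteratively reweighted least squares with Huber weights and a MAD-based scale. It is a generic textbook-style illustration, assuming a single covariate and the conventional tuning constant 1.345; it is not the estimator, the analytic test or the RPBI pipeline used in the study, and the function names are hypothetical.

```python
import statistics

def weighted_line(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def huber_line(x, y, c=1.345, n_iter=50):
    """Huber-type robust line fit via iteratively reweighted least squares."""
    intercept, slope = weighted_line(x, y, [1.0] * len(x))  # start from OLS
    for _ in range(n_iter):
        resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
        med = statistics.median(resid)
        scale = 1.4826 * statistics.median([abs(r - med) for r in resid])
        if scale == 0.0:
            break  # residuals have no spread; nothing left to reweight
        thr = c * scale
        # Full weight for small residuals, down-weighting for large ones.
        w = [1.0 if abs(r) <= thr else thr / abs(r) for r in resid]
        intercept, slope = weighted_line(x, y, w)
    return intercept, slope

# A clean line y = 1 + 2x with small alternating noise and one gross outlier.
x = list(range(10))
y = [1.0 + 2.0 * xi + 0.1 * (-1) ** xi for xi in x]
y[9] = 100.0
print("OLS slope:  ", weighted_line(x, y, [1.0] * len(x))[1])
print("Huber slope:", huber_line(x, y)[1])
```

The single corrupted observation drags the ordinary least-squares slope far from 2, while the reweighted fit stays close to it, which is the behaviour that motivates robust regression on artifact-prone data.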
GPS baseline configuration design based on robustness analysis

NASA Astrophysics Data System (ADS)

Yetkin, M.; Berber, M.

2012-11-01

The robustness analysis results obtained from a Global Positioning System (GPS) network are dramatically influenced by the configuration of the observed baselines. The selection of optimal GPS baselines may allow for a cost effective survey campaign and a sufficiently robust network. Furthermore, using the approach described in this paper, the required number of sessions, the baselines to be observed, and the significance levels for statistical testing and robustness analysis can be determined even before the GPS campaign starts. In this study, we propose a robustness criterion for the optimal design of geodetic networks, and present a very simple and efficient algorithm based on this criterion for the selection of optimal GPS baselines. We also show the relationship between the number of sessions and the non-centrality parameter. Finally, a numerical example is given to verify the efficacy of the proposed approach.
Statistical analysis of the magnetization signatures of impact basins

NASA Astrophysics Data System (ADS)

Gabasova, L. R.; Wieczorek, M. A.

2017-09-01

We quantify the magnetic signatures of the largest lunar impact basins using recent mission data and robust statistical bounds, and obtain an early activity timeline for the lunar core dynamo which appears to peak earlier than indicated by Apollo paleointensity measurements.

A UNIFYING CONCEPT FOR ASSESSING TOXICOLOGICAL INTERACTIONS: CHANGES IN SLOPE

EPA Science Inventory

Robust statistical methods are important to the evaluation of interactions among chemicals in a mixture. However, different concepts of interaction as applied to the statistical analysis of chemical mixture toxicology data or as used in environmental risk assessment often can ap...
Mapping Quantitative Traits in Unselected Families: Algorithms and Examples

PubMed Central

Dupuis, Josée; Shi, Jianxin; Manning, Alisa K.; Benjamin, Emelia J.; Meigs, James B.; Cupples, L. Adrienne; Siegmund, David

2009-01-01

Linkage analysis has been widely used to identify from family data genetic variants influencing quantitative traits. Common approaches have both strengths and limitations. Likelihood ratio tests typically computed in variance component analysis can accommodate large families but are highly sensitive to departure from normality assumptions. Regression-based approaches are more robust but their use has primarily been restricted to nuclear families. In this paper, we develop methods for mapping quantitative traits in moderately large pedigrees. Our methods are based on the score statistic which in contrast to the likelihood ratio statistic, can use nonparametric estimators of variability to achieve robustness of the false positive rate against departures from the hypothesized phenotypic model. Because the score statistic is easier to calculate than the likelihood ratio statistic, our basic mapping methods utilize relatively simple computer code that performs statistical analysis on output from any program that computes estimates of identity-by-descent. This simplicity also permits development and evaluation of methods to deal with multivariate and ordinal phenotypes, and with gene-gene and gene-environment interaction. We demonstrate our methods on simulated data and on fasting insulin, a quantitative trait measured in the Framingham Heart Study. PMID:19278016
SSD for R: A Comprehensive Statistical Package to Analyze Single-System Data

ERIC Educational Resources Information Center

Auerbach, Charles; Schudrich, Wendy Zeitlin

2013-01-01

The need for statistical analysis in single-subject designs presents a challenge, as analytical methods that are applied to group comparison studies are often not appropriate in single-subject research. "SSD for R" is a robust set of statistical functions with wide applicability to single-subject research. It is a comprehensive package…
HYPOTHESIS SETTING AND ORDER STATISTIC FOR ROBUST GENOMIC META-ANALYSIS.

PubMed

Song, Chi; Tseng, George C

2014-01-01

Meta-analysis techniques have been widely developed and applied in genomic applications, especially for combining multiple transcriptomic studies. In this paper, we propose an order statistic of p-values (rth ordered p-value, rOP) across combined studies as the test statistic. We illustrate different hypothesis settings that detect gene markers differentially expressed (DE) "in all studies", "in the majority of studies", or "in one or more studies", and specify rOP as a suitable method for detecting DE genes "in the majority of studies". We develop methods to estimate the parameter r in rOP for real applications. Statistical properties such as its asymptotic behavior and a one-sided testing correction for detecting markers of concordant expression changes are explored. Power calculation and simulation show better performance of rOP compared to classical Fisher's method, Stouffer's method, minimum p-value method and maximum p-value method under the focused hypothesis setting. Theoretically, rOP is found connected to the naïve vote counting method and can be viewed as a generalized form of vote counting with better statistical properties. The method is applied to three microarray meta-analysis examples including major depressive disorder, brain cancer and diabetes. The results demonstrate rOP as a more generalizable, robust and sensitive statistical framework to detect disease-related markers.
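A minimal Python sketch of the rth ordered p-value follows. It assumes K independent studies whose p-values are uniform under the null, in which case the rth smallest p-value follows a Beta(r, K − r + 1) distribution and its tail probability can be written as a binomial sum. The function name is hypothetical, and the sketch omits the estimation of r and the one-sided correction mentioned in the abstract.

```python
from math import comb

def rop_pvalue(pvalues, r):
    """P-value of the r-th smallest p-value across K independent studies.

    Under the null, P(p_(r) <= t) = P(at least r of K uniforms are <= t),
    i.e. an upper binomial tail, which equals the Beta(r, K - r + 1) CDF at t.
    """
    k = len(pvalues)
    if not 1 <= r <= k:
        raise ValueError("r must be between 1 and the number of studies")
    t = sorted(pvalues)[r - 1]
    return sum(comb(k, j) * t ** j * (1.0 - t) ** (k - j) for j in range(r, k + 1))

# Hypothetical per-study p-values for one gene across five studies.
study_p = [0.001, 0.004, 0.020, 0.300, 0.700]
for r in (1, 3, 5):
    print(r, rop_pvalue(study_p, r))
```

Setting r = 1 recovers the minimum p-value method and r = K the maximum p-value method, so intermediate values of r target genes that are differentially expressed in a majority of studies while tolerating a few discordant ones.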
Robustness of S1 statistic with Hodges-Lehmann for skewed distributions

NASA Astrophysics Data System (ADS)

Ahad, Nor Aishah; Yahaya, Sharipah Soaad Syed; Yin, Lee Ping

2016-10-01

Analysis of variance (ANOVA) is a commonly used parametric method for testing differences in means among more than two groups when the populations are normally distributed. ANOVA is highly inefficient under non-normal and heteroscedastic settings. When its assumptions are violated, researchers look for alternatives such as the nonparametric Kruskal-Wallis test or robust methods. This study focused on a flexible method, the S1 statistic, for comparing groups using the median as the location estimator. The S1 statistic was modified by substituting the median with the Hodges-Lehmann estimator and the default scale estimator with the variance of Hodges-Lehmann and MADn to produce two different test statistics for comparing groups. The bootstrap method was used for testing the hypotheses since the sampling distributions of these modified S1 statistics are unknown. The performance of the proposed statistics in terms of Type I error was measured and compared against the original S1 statistic, ANOVA and Kruskal-Wallis. The proposed procedures show improvement compared to the original statistic, especially under extremely skewed distributions.
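The two robust building blocks named in this abstract can be sketched briefly in Python. The code assumes the one-sample Hodges-Lehmann estimator defined as the median of all Walsh averages (pairs with i ≤ j) and MADn defined as 1.4826 times the median absolute deviation; the abstract does not say which variants were used, and the S1 statistic itself is not reproduced here.

```python
import statistics

def hodges_lehmann(x):
    """One-sample Hodges-Lehmann estimator: median of the Walsh averages (i <= j)."""
    n = len(x)
    walsh = [(x[i] + x[j]) / 2 for i in range(n) for j in range(i, n)]
    return statistics.median(walsh)

def madn(x):
    """Normalized median absolute deviation (consistent for sigma under normality)."""
    med = statistics.median(x)
    return 1.4826 * statistics.median([abs(v - med) for v in x])

sample = [1, 2, 3, 100]  # one extreme value
print("mean:", statistics.mean(sample))
print("median:", statistics.median(sample))
print("Hodges-Lehmann:", hodges_lehmann(sample))
print("MADn:", madn(sample))
```

On this small sample the mean is pulled toward the extreme value while the median and the Hodges-Lehmann estimate remain near the bulk of the data, which is why such estimators are attractive under skewed distributions.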
Modeling of a Robust Confidence Band for the Power Curve of a Wind Turbine.

PubMed

Hernandez, Wilmar; Méndez, Alfredo; Maldonado-Correa, Jorge L; Balleteros, Francisco

2016-12-07

Having an accurate model of the power curve of a wind turbine allows us to better monitor its operation and plan storage capacity. Since wind speed and direction are of a highly stochastic nature, the forecasting of the power generated by the wind turbine is of the same nature as well. In this paper, a method for obtaining a robust confidence band containing the power curve of a wind turbine under test conditions is presented. Here, the confidence band is bounded by two curves which are estimated using parametric statistical inference techniques. However, the observations that are used for carrying out the statistical analysis are obtained by using the binning method, and in each bin, the outliers are eliminated by using a censorship process based on robust statistical techniques. Then, the observations that are not outliers are divided into observation sets. Finally, both the power curve of the wind turbine and the two curves that define the robust confidence band are estimated using each of the previously mentioned observation sets.
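The binning and outlier-censoring steps described above can be sketched as follows in Python. The bin width of 0.5 m/s, the median ± 3·MADn rejection rule and the function name are assumptions chosen for illustration; the abstract does not specify the censorship rule, and the estimation of the confidence band itself is not shown.

```python
import statistics
from collections import defaultdict

def bin_and_clean(wind_speed, power, bin_width=0.5, k=3.0):
    """Group power readings into wind-speed bins and drop per-bin outliers.

    Returns a dict mapping each bin centre to the retained power values.
    A reading is kept when it lies within k * MADn of the bin median
    (assumed rule; if MADn is zero, the bin is left untouched).
    """
    bins = defaultdict(list)
    for v, p in zip(wind_speed, power):
        bins[int(v // bin_width)].append(p)

    cleaned = {}
    for index, values in sorted(bins.items()):
        med = statistics.median(values)
        scale = 1.4826 * statistics.median([abs(p - med) for p in values])
        if scale == 0:
            kept = list(values)
        else:
            kept = [p for p in values if abs(p - med) <= k * scale]
        cleaned[(index + 0.5) * bin_width] = kept
    return cleaned

# Hypothetical readings: wind speed in m/s, power in kW.
speeds = [5.1, 5.2, 5.3, 5.4, 5.2, 6.1, 6.3]
powers = [100, 102, 98, 101, 500, 180, 176]
for centre, kept in bin_and_clean(speeds, powers).items():
    print(centre, kept)
```

Using the median and MAD inside each bin means that a single faulty reading cannot inflate the scale estimate and thereby mask itself, which is the usual argument for robust rather than mean-and-standard-deviation censoring.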
Modeling of a Robust Confidence Band for the Power Curve of a Wind Turbine

PubMed Central

Hernandez, Wilmar; Méndez, Alfredo; Maldonado-Correa, Jorge L.; Balleteros, Francisco

2016-01-01

Having an accurate model of the power curve of a wind turbine allows us to better monitor its operation and plan storage capacity. Since wind speed and direction are of a highly stochastic nature, the forecasting of the power generated by the wind turbine is of the same nature as well. In this paper, a method for obtaining a robust confidence band containing the power curve of a wind turbine under test conditions is presented. Here, the confidence band is bounded by two curves which are estimated using parametric statistical inference techniques. However, the observations that are used for carrying out the statistical analysis are obtained by using the binning method, and in each bin, the outliers are eliminated by using a censorship process based on robust statistical techniques. Then, the observations that are not outliers are divided into observation sets. Finally, both the power curve of the wind turbine and the two curves that define the robust confidence band are estimated using each of the previously mentioned observation sets. PMID:27941604
Statistical robustness of machine-learning estimates for characterizing a groundwater-surface water system, Southland, New Zealand

NASA Astrophysics Data System (ADS)

Friedel, M. J.; Daughney, C.

2016-12-01

The development of a successful surface-groundwater management strategy depends on the quality of data provided for analysis. This study evaluates the statistical robustness when using a modified self-organizing map (MSOM) technique to estimate missing values for three hypersurface models: synoptic groundwater-surface water hydrochemistry, time-series of groundwater-surface water hydrochemistry, and mixed-survey (combination of groundwater-surface water hydrochemistry and lithologies) hydrostratigraphic unit data. These models of increasing complexity are developed and validated based on observations from the Southland region of New Zealand. In each case, the estimation method is sufficiently robust to cope with groundwater-surface water hydrochemistry vagaries due to sample size and extreme data insufficiency, even when >80% of the data are missing. The estimation of surface water hydrochemistry time series values enabled the evaluation of seasonal variation, and the imputation of lithologies facilitated the evaluation of hydrostratigraphic controls on groundwater-surface water interaction. The robust statistical results for groundwater-surface water models of increasing data complexity provide justification to apply the MSOM technique in other regions of New Zealand and abroad.
Statistical plant set estimation using Schroeder-phased multisinusoidal input design

NASA Technical Reports Server (NTRS)

Bayard, D. S.

1992-01-01

A frequency domain method is developed for plant set estimation. The estimation of a plant 'set' rather than a point estimate is required to support many methods of modern robust control design. The approach here is based on using a Schroeder-phased multisinusoid input design which has the special property of placing input energy only at the discrete frequency points used in the computation. A detailed analysis of the statistical properties of the frequency domain estimator is given, leading to exact expressions for the probability distribution of the estimation error, and many important properties. It is shown that, for any nominal parametric plant estimate, one can use these results to construct an overbound on the additive uncertainty to any prescribed statistical confidence. The 'soft' bound thus obtained can be used to replace 'hard' bounds presently used in many robust control analysis and synthesis methods.
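As a small illustration of the input design mentioned above, the Python sketch below builds a multisine with unit amplitudes at harmonics 1..N and compares zero phases with Schroeder phases. The phase rule used, φ_k = −πk(k−1)/N for a flat amplitude spectrum, is the commonly quoted form and is an assumption here, since the abstract does not state it; the function names are hypothetical and the plant set estimation itself is not shown.

```python
import math

def schroeder_phases(n_harmonics):
    """Schroeder phases for a flat-amplitude multisine (assumed formula)."""
    return [-math.pi * k * (k - 1) / n_harmonics
            for k in range(1, n_harmonics + 1)]

def multisine(n_harmonics, n_samples, phases):
    """One period of a unit-amplitude multisine at harmonics 1..n_harmonics."""
    return [
        sum(math.cos(2 * math.pi * k * t / n_samples + phases[k - 1])
            for k in range(1, n_harmonics + 1))
        for t in range(n_samples)
    ]

def crest_factor(signal):
    """Peak absolute value divided by the RMS value."""
    rms = math.sqrt(sum(s * s for s in signal) / len(signal))
    return max(abs(s) for s in signal) / rms

n, samples = 16, 256
zero_phase = multisine(n, samples, [0.0] * n)
schroeder = multisine(n, samples, schroeder_phases(n))
print("crest factor, zero phases:     ", crest_factor(zero_phase))
print("crest factor, Schroeder phases:", crest_factor(schroeder))
```

Both signals carry energy only at the chosen harmonics and have the same RMS value; the phase choice spreads that energy over the period so that the peak amplitude is much smaller, which matters when the input must respect actuator limits.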
Baseline estimation in flame's spectra by using neural networks and robust statistics

NASA Astrophysics Data System (ADS)

Garces, Hugo; Arias, Luis; Rojas, Alejandro

2014-09-01

This work presents a baseline estimation method in flame spectra based on artificial intelligence structure as a neural network, combining robust statistics with multivariate analysis to automatically discriminate measured wavelengths belonging to continuous feature for model adaptation, surpassing restriction of measuring target baseline for training. The main contributions of this paper are: to analyze a flame spectra database computing Jolliffe statistics from Principal Components Analysis detecting wavelengths not correlated with most of the measured data corresponding to baseline; to systematically determine the optimal number of neurons in hidden layers based on Akaike's Final Prediction Error; to estimate baseline in full wavelength range sampling measured spectra; and to train an artificial intelligence structure as a Neural Network which allows to generalize the relation between measured and baseline spectra. The main application of our research is to compute total radiation with baseline information, allowing to diagnose combustion process state for optimization in early stages.
A Third Moment Adjusted Test Statistic for Small Sample Factor Analysis.

PubMed

Lin, Johnny; Bentler, Peter M

2012-01-01

Goodness of fit testing in factor analysis is based on the assumption that the test statistic is asymptotically chi-square; but this property may not hold in small samples even when the factors and errors are normally distributed in the population. Robust methods such as Browne's asymptotically distribution-free method and Satorra Bentler's mean scaling statistic were developed under the presumption of non-normality in the factors and errors. This paper finds new application to the case where factors and errors are normally distributed in the population but the skewness of the obtained test statistic is still high due to sampling error in the observed indicators. An extension of Satorra Bentler's statistic is proposed that not only scales the mean but also adjusts the degrees of freedom based on the skewness of the obtained test statistic in order to improve its robustness under small samples. A simple simulation study shows that this third moment adjusted statistic asymptotically performs on par with previously proposed methods, and at a very small sample size offers superior Type I error rates under a properly specified model. Data from Mardia, Kent and Bibby's study of students tested for their ability in five content areas that were either open or closed book were used to illustrate the real-world performance of this statistic.
Sunspot activity and influenza pandemics: a statistical assessment of the purported association.

PubMed

Towers, S

2017-10-01

Since 1978, a series of papers in the literature have claimed to find a significant association between sunspot activity and the timing of influenza pandemics. This paper examines these analyses, and attempts to recreate the three most recent statistical analyses by Ertel (1994), Tapping et al. (2001), and Yeung (2006), which all have purported to find a significant relationship between sunspot numbers and pandemic influenza. As will be discussed, each analysis had errors in the data. In addition, in each analysis arbitrary selections or assumptions were also made, and the authors did not assess the robustness of their analyses to changes in those arbitrary assumptions. Varying the arbitrary assumptions to other, equally valid, assumptions negates the claims of significance. Indeed, an arbitrary selection made in one of the analyses appears to have resulted in almost maximal apparent significance; changing it only slightly yields a null result. This analysis applies statistically rigorous methodology to examine the purported sunspot/pandemic link, using more statistically powerful un-binned analysis methods, rather than relying on arbitrarily binned data. The analyses are repeated using both the Wolf and Group sunspot numbers. In all cases, no statistically significant evidence of any association was found. However, while the focus in this particular analysis was on the purported relationship of influenza pandemics to sunspot activity, the faults found in the past analyses are common pitfalls; inattention to analysis reproducibility and robustness assessment are common problems in the sciences, that are unfortunately not noted often enough in review.
EVALUATION OF A NEW MEAN SCALED AND MOMENT ADJUSTED TEST STATISTIC FOR SEM.

PubMed

Tong, Xiaoxiao; Bentler, Peter M

2013-01-01

Recently a new mean scaled and skewness adjusted test statistic was developed for evaluating structural equation models in small samples and with potentially nonnormal data, but this statistic has received only limited evaluation. The performance of this statistic is compared to normal theory maximum likelihood and two well-known robust test statistics. A modification to the Satorra-Bentler scaled statistic is developed for the condition that sample size is smaller than degrees of freedom. The behavior of the four test statistics is evaluated with a Monte Carlo confirmatory factor analysis study that varies seven sample sizes and three distributional conditions obtained using Headrick's fifth-order transformation to nonnormality. The new statistic performs badly in most conditions except under the normal distribution. The goodness-of-fit χ(2) test based on maximum-likelihood estimation performed well under normal distributions as well as under a condition of asymptotic robustness. The Satorra-Bentler scaled test statistic performed best overall, while the mean scaled and variance adjusted test statistic outperformed the others at small and moderate sample sizes under certain distributional conditions.
Robust Detection of Examinees with Aberrant Answer Changes

ERIC Educational Resources Information Center

Belov, Dmitry I.

2015-01-01

The statistical analysis of answer changes (ACs) has uncovered multiple testing irregularities on large-scale assessments and is now routinely performed at testing organizations. However, AC data has an uncertainty caused by technological or human factors. Therefore, existing statistics (e.g., number of wrong-to-right ACs) used to detect examinees…
Robust Linear Models for Cis-eQTL Analysis.

PubMed

Rantalainen, Mattias; Lindgren, Cecilia M; Holmes, Christopher C

2015-01-01

Expression Quantitative Trait Loci (eQTL) analysis enables characterisation of functional genetic variation influencing expression levels of individual genes. In outbred populations, including humans, eQTLs are commonly analysed using the conventional linear model, adjusting for relevant covariates, assuming an allelic dosage model and a Gaussian error term. However, gene expression data generally have noise that induces heavy-tailed errors relative to the Gaussian distribution and often include atypical observations, or outliers. Such departures from modelling assumptions can lead to an increased rate of type II errors (false negatives), and to some extent also type I errors (false positives). Careful model checking can reduce the risk of type I errors but often not type II errors, since it is generally too time-consuming to carefully check all models with a non-significant effect in large-scale and genome-wide studies. Here we propose the application of a robust linear model for eQTL analysis to reduce adverse effects of deviations from the assumption of Gaussian residuals. We present results from a simulation study as well as results from the analysis of real eQTL data sets. Our findings suggest that in many situations robust models have the potential to provide more reliable eQTL results compared to conventional linear models, particularly with respect to reducing type II errors due to non-Gaussian noise. Post-genomic data, such as that generated in genome-wide eQTL studies, are often noisy and frequently contain atypical observations. Robust statistical models have the potential to provide more reliable results and increased statistical power under non-Gaussian conditions. The results presented here suggest that robust models should be considered routinely alongside other commonly used methodologies for eQTL analysis.
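As a rough illustration of what a robust linear model does in this setting, the sketch below fits expression against allelic dosage with Huber weights via iteratively reweighted least squares. The abstract does not say which robust estimator the authors used, so the Huber choice, the tuning constant 1.345, the function name `huber_line_fit`, and the toy data are all assumptions made for illustration.

```python
import statistics

def huber_line_fit(x, y, k=1.345, n_iter=50):
    """Fit y = a + b*x by iteratively reweighted least squares (Huber weights).

    With n_iter=1 the result is the ordinary least-squares fit, because the
    first pass uses unit weights.
    """
    w = [1.0] * len(x)
    a = b = 0.0
    for _ in range(n_iter):
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        b = sxy / sxx
        a = my - b * mx
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust residual scale: median absolute deviation, scaled for normality.
        med = statistics.median(resid)
        scale = 1.4826 * statistics.median([abs(r - med) for r in resid])
        if scale == 0:
            break
        # Huber weights: 1 inside the threshold, shrinking with |residual| outside.
        w = [1.0 if abs(r) <= k * scale else k * scale / abs(r) for r in resid]
    return a, b

# Hypothetical toy data: true slope 0.5 per allele, one gross outlier at the end.
dosage = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
noise = [0.04, -0.03, 0.02, -0.05, 0.01, -0.02, 0.05, -0.04, 0.03, -0.01, 0.02, 0.0]
expression = [1.0 + 0.5 * d + e for d, e in zip(dosage, noise)]
expression[-1] = 10.0

a_ols, b_ols = huber_line_fit(dosage, expression, n_iter=1)
a_rob, b_rob = huber_line_fit(dosage, expression)
```

In this toy case the single outlier pulls the least-squares slope far from 0.5, while the reweighted fit stays close to it. Real eQTL analyses would also adjust for covariates, which this one-predictor sketch omits.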
Variation in reaction norms: Statistical considerations and biological interpretation.

PubMed

Morrissey, Michael B; Liefting, Maartje

2016-09-01

Analysis of reaction norms, the functions by which the phenotype produced by a given genotype depends on the environment, is critical to studying many aspects of phenotypic evolution. Different techniques are available for quantifying different aspects of reaction norm variation. We examine what biological inferences can be drawn from some of the more readily applicable analyses for studying reaction norms. We adopt a strongly biologically motivated view, but draw on statistical theory to highlight strengths and drawbacks of different techniques. In particular, consideration of some formal statistical theory leads to revision of some recently, and forcefully, advocated opinions on reaction norm analysis. We clarify what simple analysis of the slope between mean phenotype in two environments can tell us about reaction norms, explore the conditions under which polynomial regression can provide robust inferences about reaction norm shape, and explore how different existing approaches may be used to draw inferences about variation in reaction norm shape. We show how mixed model-based approaches can provide more robust inferences than more commonly used multistep statistical approaches, and derive new metrics of the relative importance of variation in reaction norm intercepts, slopes, and curvatures. © 2016 The Author(s). Evolution © 2016 The Society for the Study of Evolution.
[Regression on order statistics and its application in estimating nondetects for food exposure assessment].

PubMed

Yu, Xiaojin; Liu, Pei; Min, Jie; Chen, Qiguang

2009-01-01

To explore the application of regression on order statistics (ROS) in estimating nondetects for food exposure assessment. Regression on order statistics was adopted in the analysis of a cadmium residual data set from global food contaminant monitoring; the mean residual was estimated using SAS programming and compared with the results from substitution methods. The results show that the ROS method clearly performs better than substitution methods, being robust and convenient for posterior analysis. Regression on order statistics is worth adopting, but more effort should be made on the details of applying this method.
Fully Bayesian inference for structural MRI: application to segmentation and statistical analysis of T2-hypointensities.

PubMed

Schmidt, Paul; Schmid, Volker J; Gaser, Christian; Buck, Dorothea; Bührlen, Susanne; Förschler, Annette; Mühlau, Mark

2013-01-01

Aiming at iron-related T2-hypointensity, which is related to normal aging and neurodegenerative processes, we here present two practicable approaches, based on Bayesian inference, for preprocessing and statistical analysis of a complex set of structural MRI data. In particular, Markov Chain Monte Carlo methods were used to simulate posterior distributions. First, we rendered a segmentation algorithm that uses outlier detection based on model checking techniques within a Bayesian mixture model. Second, we rendered an analytical tool comprising a Bayesian regression model with smoothness priors (in the form of Gaussian Markov random fields) mitigating the necessity to smooth data prior to statistical analysis. For validation, we used simulated data and MRI data of 27 healthy controls (age: [Formula: see text]; range, [Formula: see text]). We first observed robust segmentation of both simulated T2-hypointensities and gray-matter regions known to be T2-hypointense. Second, simulated data and images of segmented T2-hypointensity were analyzed. We found not only robust identification of simulated effects but also a biologically plausible age-related increase of T2-hypointensity primarily within the dentate nucleus but also within the globus pallidus, substantia nigra, and red nucleus. Our results indicate that fully Bayesian inference can successfully be applied for preprocessing and statistical analysis of structural MRI data.
Active Nonlinear Feedback Control for Aerospace Systems. Processor

DTIC Science & Technology

1990-12-01

relating to the role of nonlinearities in feedback control. These areas include Lyapunov function theory, chaotic controllers, statistical energy analysis, phase robustness, and optimal nonlinear control theory.
Robust biological parametric mapping: an improved technique for multimodal brain image analysis

NASA Astrophysics Data System (ADS)

Yang, Xue; Beason-Held, Lori; Resnick, Susan M.; Landman, Bennett A.

2011-03-01

Mapping the quantitative relationship between structure and function in the human brain is an important and challenging problem. Numerous volumetric, surface, region of interest and voxelwise image processing techniques have been developed to statistically assess potential correlations between imaging and non-imaging metrics. Recently, biological parametric mapping has extended the widely popular statistical parametric approach to enable application of the general linear model to multiple image modalities (both for regressors and regressands) along with scalar valued observations. This approach offers great promise for direct, voxelwise assessment of structural and functional relationships with multiple imaging modalities. However, as presented, the biological parametric mapping approach is not robust to outliers and may lead to invalid inferences (e.g., artifactual low p-values) due to slight mis-registration or variation in anatomy between subjects. To enable widespread application of this approach, we introduce robust regression and robust inference in the neuroimaging context of application of the general linear model. Through simulation and empirical studies, we demonstrate that our robust approach reduces sensitivity to outliers without substantial degradation in power. The robust approach and associated software package provides a reliable way to quantitatively assess voxelwise correlations between structural and functional neuroimaging modalities.

A Third Moment Adjusted Test Statistic for Small Sample Factor Analysis

PubMed Central

Lin, Johnny; Bentler, Peter M.

2012-01-01

Goodness of fit testing in factor analysis is based on the assumption that the test statistic is asymptotically chi-square; but this property may not hold in small samples even when the factors and errors are normally distributed in the population. Robust methods such as Browne’s asymptotically distribution-free method and Satorra Bentler’s mean scaling statistic were developed under the presumption of non-normality in the factors and errors. This paper finds new application to the case where factors and errors are normally distributed in the population but the skewness of the obtained test statistic is still high due to sampling error in the observed indicators. An extension of Satorra Bentler’s statistic is proposed that not only scales the mean but also adjusts the degrees of freedom based on the skewness of the obtained test statistic in order to improve its robustness under small samples. A simple simulation study shows that this third moment adjusted statistic asymptotically performs on par with previously proposed methods, and at a very small sample size offers superior Type I error rates under a properly specified model. Data from Mardia, Kent and Bibby’s study of students tested for their ability in five content areas that were either open or closed book were used to illustrate the real-world performance of this statistic. PMID:23144511
Evaluating optimal therapy robustness by virtual expansion of a sample population, with a case study in cancer immunotherapy

PubMed Central

Barish, Syndi; Ochs, Michael F.; Sontag, Eduardo D.; Gevertz, Jana L.

2017-01-01

Cancer is a highly heterogeneous disease, exhibiting spatial and temporal variations that pose challenges for designing robust therapies. Here, we propose the VEPART (Virtual Expansion of Populations for Analyzing Robustness of Therapies) technique as a platform that integrates experimental data, mathematical modeling, and statistical analyses for identifying robust optimal treatment protocols. VEPART begins with time course experimental data for a sample population, and a mathematical model fit to aggregate data from that sample population. Using nonparametric statistics, the sample population is amplified and used to create a large number of virtual populations. At the final step of VEPART, robustness is assessed by identifying and analyzing the optimal therapy (perhaps restricted to a set of clinically realizable protocols) across each virtual population. As proof of concept, we have applied the VEPART method to study the robustness of treatment response in a mouse model of melanoma subject to treatment with immunostimulatory oncolytic viruses and dendritic cell vaccines. Our analysis (i) showed that every scheduling variant of the experimentally used treatment protocol is fragile (nonrobust) and (ii) discovered an alternative region of dosing space (lower oncolytic virus dose, higher dendritic cell dose) for which a robust optimal protocol exists. PMID:28716945
The Influence Function of Principal Component Analysis by Self-Organizing Rule.

PubMed

Higuchi; Eguchi

1998-07-28

This article is concerned with a neural network approach to principal component analysis (PCA). An algorithm for PCA by the self-organizing rule has been proposed and its robustness observed through the simulation study by Xu and Yuille (1995). In this article, the robustness of the algorithm against outliers is investigated by using the theory of influence function. The influence function of the principal component vector is given in an explicit form. Through this expression, the method is shown to be robust against any directions orthogonal to the principal component vector. In addition, a statistic generated by the self-organizing rule is proposed to assess the influence of data in PCA.
Improving Robustness of Hydrologic Ensemble Predictions Through Probabilistic Pre- and Post-Processing in Sequential Data Assimilation

NASA Astrophysics Data System (ADS)

Wang, S.; Ancell, B. C.; Huang, G. H.; Baetz, B. W.

2018-03-01

Data assimilation using the ensemble Kalman filter (EnKF) has been increasingly recognized as a promising tool for probabilistic hydrologic predictions. However, little effort has been made to conduct the pre- and post-processing of assimilation experiments, posing a significant challenge in achieving the best performance of hydrologic predictions. This paper presents a unified data assimilation framework for improving the robustness of hydrologic ensemble predictions. Statistical pre-processing of assimilation experiments is conducted through the factorial design and analysis to identify the best EnKF settings with maximized performance. After the data assimilation operation, statistical post-processing analysis is also performed through the factorial polynomial chaos expansion to efficiently address uncertainties in hydrologic predictions, as well as to explicitly reveal potential interactions among model parameters and their contributions to the predictive accuracy. In addition, the Gaussian anamorphosis is used to establish a seamless bridge between data assimilation and uncertainty quantification of hydrologic predictions. Both synthetic and real data assimilation experiments are carried out to demonstrate feasibility and applicability of the proposed methodology in the Guadalupe River basin, Texas. Results suggest that statistical pre- and post-processing of data assimilation experiments provide meaningful insights into the dynamic behavior of hydrologic systems and enhance robustness of hydrologic ensemble predictions.
Validating Coherence Measurements Using Aligned and Unaligned Coherence Functions

NASA Technical Reports Server (NTRS)

Miles, Jeffrey Hilton

2006-01-01

This paper describes a novel approach based on the use of coherence functions and statistical theory for sensor validation in a harsh environment. By the use of aligned and unaligned coherence functions and statistical theory one can test for sensor degradation, total sensor failure or changes in the signal. This advanced diagnostic approach and the novel data processing methodology discussed provides a single number that conveys this information. This number as calculated with standard statistical procedures for comparing the means of two distributions is compared with results obtained using Yuen's robust statistical method to create confidence intervals. Examination of experimental data from Kulite pressure transducers mounted in a Pratt & Whitney PW4098 combustor using spectrum analysis methods on aligned and unaligned time histories has verified the effectiveness of the proposed method. All the procedures produce good results which demonstrates how robust the technique is.
The Utility of Robust Means in Statistics

ERIC Educational Resources Information Center

Goodwyn, Fara

2012-01-01

Location estimates calculated from heuristic data were examined using traditional and robust statistical methods. The current paper demonstrates the impact outliers have on the sample mean and proposes robust methods to control for outliers in sample data. Traditional methods fail because they rely on the statistical assumptions of normality and…
Detection of outliers in the response and explanatory variables of the simple circular regression model

NASA Astrophysics Data System (ADS)

Mahmood, Ehab A.; Rana, Sohel; Hussin, Abdul Ghapor; Midi, Habshah

2016-06-01

The circular regression model may contain one or more data points which appear to be peculiar or inconsistent with the main part of the model. This may occur due to recording errors, sudden short events, sampling under abnormal conditions, etc. The existence of these data points ("outliers") in the data set causes many problems in the research results and conclusions. Therefore, we should identify them before applying statistical analysis. In this article, we propose a statistic to identify outliers in both the response and explanatory variables of the simple circular regression model. Our proposed statistic is the robust circular distance RCDxy, and it is justified by three robust measurements: the proportion of detected outliers, and the masking and swamping rates.
Logistic regression applied to natural hazards: rare event logistic regression with replications

NASA Astrophysics Data System (ADS)

Guns, M.; Vanacker, V.

2012-06-01

Statistical analysis of natural hazards needs particular attention, as most of these phenomena are rare events. This study shows that the ordinary rare event logistic regression, as it is now commonly used in geomorphologic studies, does not always lead to a robust detection of controlling factors, as the results can be strongly sample-dependent. In this paper, we introduce some concepts of Monte Carlo simulations in rare event logistic regression. This technique, so-called rare event logistic regression with replications, combines the strength of probabilistic and statistical methods, and allows overcoming some of the limitations of previous developments through robust variable selection. This technique was here developed for the analyses of landslide controlling factors, but the concept is widely applicable for statistical analyses of natural hazards.
Appropriate Statistical Analysis for Two Independent Groups of Likert-Type Data

ERIC Educational Resources Information Center

Warachan, Boonyasit

2011-01-01

The objective of this research was to determine the robustness and statistical power of three different methods for testing the hypothesis that ordinal samples of five and seven Likert categories come from equal populations. The three methods are the two sample t-test with equal variances, the Mann-Whitney test, and the Kolmogorov-Smirnov test. In…
Meta-analysis and The Cochrane Collaboration: 20 years of the Cochrane Statistical Methods Group

PubMed Central

2013-01-01

The Statistical Methods Group has played a pivotal role in The Cochrane Collaboration over the past 20 years. The Statistical Methods Group has determined the direction of statistical methods used within Cochrane reviews, developed guidance for these methods, provided training, and continued to discuss and consider new and controversial issues in meta-analysis. The contribution of Statistical Methods Group members to the meta-analysis literature has been extensive and has helped to shape the wider meta-analysis landscape. In this paper, marking the 20th anniversary of The Cochrane Collaboration, we reflect on the history of the Statistical Methods Group, beginning in 1993 with the identification of aspects of statistical synthesis for which consensus was lacking about the best approach. We highlight some landmark methodological developments that Statistical Methods Group members have contributed to in the field of meta-analysis. We discuss how the Group implements and disseminates statistical methods within The Cochrane Collaboration. Finally, we consider the importance of robust statistical methodology for Cochrane systematic reviews, note research gaps, and reflect on the challenges that the Statistical Methods Group faces in its future direction. PMID:24280020
Robustness of statistical tests for multiplicative terms in the additive main effects and multiplicative interaction model for cultivar trials.

PubMed

Piepho, H P

1995-03-01

The additive main effects multiplicative interaction model is frequently used in the analysis of multilocation trials. In the analysis of such data it is of interest to decide how many of the multiplicative interaction terms are significant. Several tests for this task are available, all of which assume that errors are normally distributed with a common variance. This paper investigates the robustness of several tests (Gollob, FGH1, FGH2, FR) to departures from these assumptions. It is concluded that, because of its better robustness, the FR test is preferable. If the other tests are to be used, preliminary tests for the validity of assumptions should be performed.
On Statistical Analysis of Neuroimages with Imperfect Registration

PubMed Central

Kim, Won Hwa; Ravi, Sathya N.; Johnson, Sterling C.; Okonkwo, Ozioma C.; Singh, Vikas

2016-01-01

A variety of studies in neuroscience/neuroimaging seek to perform statistical inference on the acquired brain image scans for diagnosis as well as understanding the pathological manifestation of diseases. To do so, an important first step is to register (or co-register) all of the image data into a common coordinate system. This permits meaningful comparison of the intensities at each voxel across groups (e.g., diseased versus healthy) to evaluate the effects of the disease and/or use machine learning algorithms in a subsequent step. But errors in the underlying registration make this problematic: they either decrease the statistical power or make the follow-up inference tasks less effective/accurate. In this paper, we derive a novel algorithm which offers immunity to local errors in the underlying deformation field obtained from registration procedures. By deriving a deformation invariant representation of the image, the downstream analysis can be made more robust as if one had access to a (hypothetical) far superior registration procedure. Our algorithm is based on recent work on the scattering transform. Using this as a starting point, we show how results from harmonic analysis (especially, non-Euclidean wavelets) yield strategies for designing deformation and additive noise invariant representations of large 3-D brain image volumes. We present a set of results on synthetic and real brain images where we achieve robust statistical analysis even in the presence of substantial deformation errors; here, standard analysis procedures significantly under-perform and fail to identify the true signal. PMID:27042168
Statistical Analyses of Scatterplots to Identify Important Factors in Large-Scale Simulations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kleijnen, J.P.C.; Helton, J.C.

1999-04-01

The robustness of procedures for identifying patterns in scatterplots generated in Monte Carlo sensitivity analyses is investigated. These procedures are based on attempts to detect increasingly complex patterns in the scatterplots under consideration and involve the identification of (1) linear relationships with correlation coefficients, (2) monotonic relationships with rank correlation coefficients, (3) trends in central tendency as defined by means, medians and the Kruskal-Wallis statistic, (4) trends in variability as defined by variances and interquartile ranges, and (5) deviations from randomness as defined by the chi-square statistic. The following two topics related to the robustness of these procedures are considered for a sequence of example analyses with a large model for two-phase fluid flow: the presence of Type I and Type II errors, and the stability of results obtained with independent Latin hypercube samples. Observations from analysis include: (1) Type I errors are unavoidable, (2) Type II errors can occur when inappropriate analysis procedures are used, (3) physical explanations should always be sought for why statistical procedures identify variables as being important, and (4) the identification of important variables tends to be stable for independent Latin hypercube samples.
Proficiency Testing for Determination of Water Content in Toluene of Chemical Reagents by iteration robust statistic technique

NASA Astrophysics Data System (ADS)

Wang, Hao; Wang, Qunwei; He, Ming

2018-05-01

In order to investigate and improve the level of detection technology for water content in liquid chemical reagents in domestic laboratories, proficiency testing provider PT0031 (CNAS) organized a proficiency testing program for water content in toluene; 48 laboratories from 18 provinces, cities, and municipalities took part. This paper introduces the implementation of the proficiency test for the determination of water content in toluene, including sample preparation, homogeneity and stability testing, and the statistical results and analysis obtained with an iterative robust statistics technique. It also summarizes and analyzes the different test standards widely used in the laboratories and puts forward technical suggestions for improving the quality of water content testing. Satisfactory results were obtained by 43 laboratories, amounting to 89.6% of the total participating laboratories.
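The abstract does not spell out its iterative robust procedure. One widely used scheme in proficiency testing is Algorithm A of ISO 13528, and the sketch below assumes that is close to what was meant: start from the median and scaled MAD, repeatedly Winsorize the results at 1.5 robust standard deviations, and then score each laboratory with a z-score. The function names, the reported values, and the usual |z| thresholds of 2 and 3 are illustrative assumptions, not data from the study.

```python
import math
import statistics

def algorithm_a(values, tol=1e-9, max_iter=100):
    """Iterative robust mean and standard deviation (ISO 13528 Algorithm A style)."""
    x_star = statistics.median(values)
    s_star = 1.483 * statistics.median([abs(v - x_star) for v in values])
    for _ in range(max_iter):
        if s_star == 0:
            break
        delta = 1.5 * s_star
        # Winsorize: pull results outside x* +/- delta back to the boundary.
        clipped = [min(max(v, x_star - delta), x_star + delta) for v in values]
        new_x = sum(clipped) / len(clipped)
        new_s = 1.134 * math.sqrt(
            sum((c - new_x) ** 2 for c in clipped) / (len(clipped) - 1)
        )
        converged = abs(new_x - x_star) < tol and abs(new_s - s_star) < tol
        x_star, s_star = new_x, new_s
        if converged:
            break
    return x_star, s_star

def classify(z):
    """Conventional proficiency-test reading of a z-score."""
    if abs(z) <= 2:
        return "satisfactory"
    if abs(z) < 3:
        return "questionable"
    return "unsatisfactory"

# Hypothetical water-content results (mass %) reported by nine laboratories.
results = [0.020, 0.021, 0.019, 0.022, 0.020, 0.018, 0.021, 0.020, 0.035]
robust_mean, robust_sd = algorithm_a(results)
z_scores = [(r - robust_mean) / robust_sd for r in results]
labels = [classify(z) for z in z_scores]
```

Because the extreme result is clipped rather than discarded, it still contributes to the robust standard deviation, but far less than it would to an ordinary one.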
Approach for Input Uncertainty Propagation and Robust Design in CFD Using Sensitivity Derivatives

NASA Technical Reports Server (NTRS)

Putko, Michele M.; Taylor, Arthur C., III; Newman, Perry A.; Green, Lawrence L.

2002-01-01

An implementation of the approximate statistical moment method for uncertainty propagation and robust optimization for quasi 3-D Euler CFD code is presented. Given uncertainties in statistically independent, random, normally distributed input variables, first- and second-order statistical moment procedures are performed to approximate the uncertainty in the CFD output. Efficient calculation of both first- and second-order sensitivity derivatives is required. In order to assess the validity of the approximations, these moments are compared with statistical moments generated through Monte Carlo simulations. The uncertainties in the CFD input variables are also incorporated into a robust optimization procedure. For this optimization, statistical moments involving first-order sensitivity derivatives appear in the objective function and system constraints. Second-order sensitivity derivatives are used in a gradient-based search to successfully execute a robust optimization. The approximate methods used throughout the analyses are found to be valid when considering robustness about input parameter mean values.
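The first-order moment approximation described above can be sketched on a toy function: the output mean is taken as the function evaluated at the input means, and the output variance as the sum of squared sensitivity derivatives times input variances, which is then checked against Monte Carlo sampling. Here `toy_output` is a hypothetical stand-in for a CFD output, and the derivatives come from central finite differences rather than the efficient sensitivity-derivative machinery the paper relies on.

```python
import random

def first_order_moments(f, means, sigmas, h=1e-6):
    """First-order mean and variance of f for independent normal inputs."""
    f0 = f(means)
    var = 0.0
    for i, sigma in enumerate(sigmas):
        up = list(means)
        dn = list(means)
        up[i] += h
        dn[i] -= h
        dfdx = (f(up) - f(dn)) / (2 * h)   # central-difference sensitivity
        var += (dfdx * sigma) ** 2
    return f0, var

def monte_carlo_moments(f, means, sigmas, n=20000, seed=0):
    """Sample mean and variance of f under independent normal inputs."""
    rng = random.Random(seed)
    ys = [f([rng.gauss(m, s) for m, s in zip(means, sigmas)]) for _ in range(n)]
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / (n - 1)
    return mean, var

def toy_output(x):
    """Hypothetical smooth output standing in for a CFD quantity of interest."""
    return x[0] * x[1] + x[0]

input_means = [2.0, 3.0]
input_sigmas = [0.05, 0.1]
fo_mean, fo_var = first_order_moments(toy_output, input_means, input_sigmas)
mc_mean, mc_var = monte_carlo_moments(toy_output, input_means, input_sigmas)
```

The two sets of moments should agree closely when the input standard deviations are small relative to the curvature of the function, which mirrors the abstract's observation that the approximations hold for robustness about the input mean values.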
Dealing with the Conflicting Results of Psycholinguistic Experiments: How to Resolve Them with the Help of Statistical Meta-analysis.

PubMed

Rákosi, Csilla

2018-01-22

This paper proposes the use of the tools of statistical meta-analysis as a method of conflict resolution with respect to experiments in cognitive linguistics. With the help of statistical meta-analysis, the effect size of similar experiments can be compared, a well-founded and robust synthesis of the experimental data can be achieved, and possible causes of any divergence(s) in the outcomes can be revealed. This application of statistical meta-analysis offers a novel method of how diverging evidence can be dealt with. The workability of this idea is exemplified by a case study dealing with a series of experiments conducted as non-exact replications of Thibodeau and Boroditsky (PLoS ONE 6(2):e16782, 2011. https://doi.org/10.1371/journal.pone.0016782 ).
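To make the idea of synthesizing effect sizes concrete, here is a minimal inverse-variance (fixed-effect) pooling sketch with Cochran's Q and the I² heterogeneity index. The abstract does not state which meta-analytic model the paper applies, so this is only one standard option; the function name and the effect sizes and variances below are invented for illustration.

```python
import math

def fixed_effect_pool(effects, variances):
    """Inverse-variance pooled effect, its standard error, Cochran's Q, and I^2."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total
    se = math.sqrt(1.0 / total)
    # Q measures how much the study effects scatter around the pooled value.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, se, q, i2

# Hypothetical effect sizes from three replications with equal sampling variance.
study_effects = [0.2, 0.5, 0.8]
study_variances = [0.04, 0.04, 0.04]
pooled, se, q, i2 = fixed_effect_pool(study_effects, study_variances)
```

A large Q relative to its degrees of freedom, or a high I², signals that the experiments diverge by more than sampling error would explain, which is the situation where looking for moderating causes becomes worthwhile.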
Statistical Analysis of Big Data on Pharmacogenomics

PubMed Central

Fan, Jianqing; Liu, Han

2013-01-01

This paper discusses statistical methods for estimating complex correlation structure from large pharmacogenomic datasets. We selectively review several prominent statistical methods for estimating large covariance matrix for understanding correlation structure, inverse covariance matrix for network modeling, large-scale simultaneous tests for selecting significantly differently expressed genes and proteins and genetic markers for complex diseases, and high dimensional variable selection for identifying important molecules for understanding molecule mechanisms in pharmacogenomics. Their applications to gene network estimation and biomarker selection are used to illustrate the methodological power. Several new challenges of Big data analysis, including complex data distribution, missing data, measurement error, spurious correlation, endogeneity, and the need for robust statistical methods, are also discussed. PMID:23602905
Performance of Modified Test Statistics in Covariance and Correlation Structure Analysis under Conditions of Multivariate Nonnormality.

ERIC Educational Resources Information Center

Fouladi, Rachel T.

2000-01-01

Provides an overview of standard and modified normal theory and asymptotically distribution-free covariance and correlation structure analysis techniques and details Monte Carlo simulation results on Type I and Type II error control. Demonstrates through the simulation that robustness and nonrobustness of structure analysis techniques vary as a…
Comprehension-Based versus Production-Based Grammar Instruction: A Meta-Analysis of Comparative Studies

ERIC Educational Resources Information Center

Shintani, Natsuko; Li, Shaofeng; Ellis, Rod

2013-01-01

This article reports a meta-analysis of studies that investigated the relative effectiveness of comprehension-based instruction (CBI) and production-based instruction (PBI). The meta-analysis only included studies that featured a direct comparison of CBI and PBI in order to ensure methodological and statistical robustness. A total of 35 research…
Robust Magnetotelluric Impedance Estimation

NASA Astrophysics Data System (ADS)

Sutarno, D.

2010-12-01

Robust magnetotelluric (MT) response function estimators are now in standard use by the induction community. Properly devised and applied, these have the ability to reduce the influence of unusual data (outliers). The estimators always yield impedance estimates which are better than the conventional least squares (LS) estimation because the 'real' MT data almost never satisfy the statistical assumptions of Gaussian distribution and stationarity upon which normal spectral analysis is based. This paper discusses the development and application of robust estimation procedures which can be classified as M-estimators to MT data. Starting with the description of the estimators, special attention is addressed to the recent development of a bounded-influence robust estimation, including utilization of the Hilbert Transform (HT) operation on causal MT impedance functions. The resulting robust performances are illustrated using synthetic as well as real MT data.

DARHT Multi-intelligence Seismic and Acoustic Data Analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Stevens, Garrison Nicole; Van Buren, Kendra Lu; Hemez, Francois M.

The purpose of this report is to document the analysis of seismic and acoustic data collected at the Dual-Axis Radiographic Hydrodynamic Test (DARHT) facility at Los Alamos National Laboratory for robust, multi-intelligence decision making. The data utilized herein is obtained from two tri-axial seismic sensors and three acoustic sensors, resulting in a total of nine data channels. The goal of this analysis is to develop a generalized, automated framework to determine internal operations at DARHT using informative features extracted from measurements collected external to the facility. Our framework involves four components: (1) feature extraction, (2) data fusion, (3) classification, and finally (4) robustness analysis. Two approaches are taken for extracting features from the data. The first of these, generic feature extraction, involves extraction of statistical features from the nine data channels. The second approach, event detection, identifies specific events relevant to traffic entering and leaving the facility as well as explosive activities at DARHT and nearby explosive testing sites. Event detection is completed using a two-stage method, first utilizing signatures in the frequency domain to identify outliers and second extracting short duration events of interest among these outliers by evaluating residuals of an autoregressive exogenous time series model. Features extracted from each data set are then fused to perform analysis with a multi-intelligence paradigm, where information from multiple data sets is combined to generate more information than is available through analysis of each independently. The fused feature set is used to train a statistical classifier and predict the state of operations to inform a decision maker. We demonstrate this classification using both generic statistical features and event detection and provide a comparison of the two methods. Finally, the concept of decision robustness is presented through a preliminary analysis where uncertainty is added to the system through noise in the measurements.
Robust inference from multiple test statistics via permutations: a better alternative to the single test statistic approach for randomized trials.

PubMed

Ganju, Jitendra; Yu, Xinxin; Ma, Guoguang Julie

2013-01-01

Formal inference in randomized clinical trials is based on controlling the type I error rate associated with a single pre-specified statistic. The deficiency of using just one method of analysis is that it depends on assumptions that may not be met. For robust inference, we propose pre-specifying multiple test statistics and relying on the minimum p-value for testing the null hypothesis of no treatment effect. The null hypothesis associated with the various test statistics is that the treatment groups are indistinguishable. The critical value for hypothesis testing comes from permutation distributions. Rejection of the null hypothesis when the smallest p-value is less than the critical value controls the type I error rate at its designated value. Even if one of the candidate test statistics has low power, the adverse effect on the power of the minimum p-value statistic is not much. Its use is illustrated with examples. We conclude that it is better to rely on the minimum p-value rather than a single statistic particularly when that single statistic is the logrank test, because of the cost and complexity of many survival trials. Copyright © 2013 John Wiley & Sons, Ltd.
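The minimum p-value idea described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two candidate statistics (`diff_means`, `diff_medians`), the function name `min_p_permutation_test`, and the default of 500 random relabelings are hypothetical choices, not the statistics or settings used by the authors (who discuss, for example, the logrank test).

```python
import random
import statistics


def diff_means(x, y):
    """Hypothetical candidate statistic 1: absolute difference in means."""
    return abs(statistics.fmean(x) - statistics.fmean(y))


def diff_medians(x, y):
    """Hypothetical candidate statistic 2: absolute difference in medians."""
    return abs(statistics.median(x) - statistics.median(y))


def min_p_permutation_test(x, y, stats=(diff_means, diff_medians),
                           n_perm=500, seed=0):
    """Return a permutation p-value for the minimum p-value statistic."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n_x = len(x)

    # Row 0 holds the observed statistics; rows 1..n_perm hold the
    # statistics recomputed after randomly relabeling the two groups.
    rows = [[s(x, y) for s in stats]]
    for _ in range(n_perm):
        rng.shuffle(pooled)
        rows.append([s(pooled[:n_x], pooled[n_x:]) for s in stats])
    total = len(rows)

    def p_values(k):
        # p-value of every row for statistic k, judged against all rows.
        col = [r[k] for r in rows]
        return [sum(v >= c for v in col) / total for c in col]

    p_cols = [p_values(k) for k in range(len(stats))]
    # Smallest p-value across the candidate statistics, for every row.
    min_p = [min(p[i] for p in p_cols) for i in range(total)]
    # How often does a relabeling give a min-p at least as small as observed?
    return sum(m <= min_p[0] for m in min_p) / total


if __name__ == "__main__":
    treated = [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
    control = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    print(min_p_permutation_test(treated, control))
```

Because the reference distribution of the minimum p-value is itself built from the same relabelings, the selection of the smallest p-value is accounted for and no separate multiplicity correction is applied in this sketch.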
[Application of Stata software to test heterogeneity in meta-analysis method].

PubMed

Wang, Dan; Mou, Zhen-yun; Zhai, Jun-xia; Zong, Hong-xia; Zhao, Xiao-dong

2008-07-01

To introduce the application of Stata software to heterogeneity test in meta-analysis. A data set was set up according to the example in the study, and the corresponding commands of the methods in Stata 9 software were applied to test the example. The methods used were Q-test and I² statistic attached to the fixed effect model forest plot, H statistic and Galbraith plot. The existence of the heterogeneity among studies could be detected by Q-test and H statistic and the degree of the heterogeneity could be detected by I² statistic. The outliers which were the sources of the heterogeneity could be spotted from the Galbraith plot. Heterogeneity test in meta-analysis can be completed by the four methods in Stata software simply and quickly. H and I² statistics are more robust, and the outliers of the heterogeneity can be clearly seen in the Galbraith plot among the four methods.
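For readers without Stata, the three numerical heterogeneity measures named above can be computed directly from study-level effects and variances. The following Python sketch uses the standard textbook definitions (Cochran's Q with inverse-variance weights, H = sqrt(Q/df), I² = max(0, (Q - df)/Q)); the function name `heterogeneity` and the example numbers are illustrative assumptions, not the data or commands from the study.

```python
import math


def heterogeneity(effects, variances):
    """Cochran's Q, H and I^2 for a fixed-effect meta-analysis."""
    # Inverse-variance weights, as in the fixed-effect model.
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q: weighted sum of squared deviations from the pooled effect.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    h = math.sqrt(q / df)
    # I^2 is truncated at zero when Q falls below its degrees of freedom.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"pooled": pooled, "Q": q, "df": df, "H": h, "I2": i2}


if __name__ == "__main__":
    # Three hypothetical studies with equal variances.
    print(heterogeneity([0.0, 2.0, 4.0], [1.0, 1.0, 1.0]))
```

Q would normally be referred to a chi-square distribution with `df` degrees of freedom; that step is omitted here because the Python standard library has no chi-square CDF.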
Assessment of Reliable Change Using 95% Credible Intervals for the Differences in Proportions: A Statistical Analysis for Case-Study Methodology

ERIC Educational Resources Information Center

Unicomb, Rachael; Colyvas, Kim; Harrison, Elisabeth; Hewat, Sally

2015-01-01

Purpose: Case-study methodology studying change is often used in the field of speech-language pathology, but it can be criticized for not being statistically robust. Yet with the heterogeneous nature of many communication disorders, case studies allow clinicians and researchers to closely observe and report on change. Such information is valuable…
Design and analysis issues in quantitative proteomics studies.

PubMed

Karp, Natasha A; Lilley, Kathryn S

2007-09-01

Quantitative proteomics is the comparison of distinct proteomes which enables the identification of protein species which exhibit changes in expression or post-translational state in response to a given stimulus. Many different quantitative techniques are being utilized and generate large datasets. Independent of the technique used, these large datasets need robust data analysis to ensure valid conclusions are drawn from such studies. Approaches to address the problems that arise with large datasets are discussed to give insight into the types of statistical analyses of data appropriate for the various experimental strategies that can be employed by quantitative proteomic studies. This review also highlights the importance of employing a robust experimental design and highlights various issues surrounding the design of experiments. The concepts and examples discussed within will show how robust design and analysis will lead to confident results that will ensure quantitative proteomics delivers.
Sex differences in discriminative power of volleyball game-related statistics.

PubMed

João, Paulo Vicente; Leite, Nuno; Mesquita, Isabel; Sampaio, Jaime

2010-12-01

To identify sex differences in volleyball game-related statistics, the game-related statistics of several World Championships in 2007 (N=132) were analyzed using the software VIS from the International Volleyball Federation. Discriminant analysis was used to identify the game-related statistics which better discriminated performances by sex. Analysis yielded an emphasis on fault serves (SC = -.40), shot spikes (SC = .40), and reception digs (SC = .31). Specific robust numbers represent that considerable variability was evident in the game-related statistics profile, as men's volleyball games were better associated with terminal actions (errors of service), and women's volleyball games were characterized by continuous actions (in defense and attack). These differences may be related to the anthropometric and physiological differences between women and men and their influence on performance profiles.
Multilevel Factor Analysis by Model Segregation: New Applications for Robust Test Statistics

ERIC Educational Resources Information Center

Schweig, Jonathan

2014-01-01

Measures of classroom environments have become central to policy efforts that assess school and teacher quality. This has sparked a wide interest in using multilevel factor analysis to test measurement hypotheses about classroom-level variables. One approach partitions the total covariance matrix and tests models separately on the…
Approach for Uncertainty Propagation and Robust Design in CFD Using Sensitivity Derivatives

NASA Technical Reports Server (NTRS)

Putko, Michele M.; Newman, Perry A.; Taylor, Arthur C., III; Green, Lawrence L.

2001-01-01

This paper presents an implementation of the approximate statistical moment method for uncertainty propagation and robust optimization for a quasi 1-D Euler CFD (computational fluid dynamics) code. Given uncertainties in statistically independent, random, normally distributed input variables, a first- and second-order statistical moment matching procedure is performed to approximate the uncertainty in the CFD output. Efficient calculation of both first- and second-order sensitivity derivatives is required. In order to assess the validity of the approximations, the moments are compared with statistical moments generated through Monte Carlo simulations. The uncertainties in the CFD input variables are also incorporated into a robust optimization procedure. For this optimization, statistical moments involving first-order sensitivity derivatives appear in the objective function and system constraints. Second-order sensitivity derivatives are used in a gradient-based search to successfully execute a robust optimization. The approximate methods used throughout the analyses are found to be valid when considering robustness about input parameter mean values.
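A first-order version of the moment-matching idea, checked against Monte Carlo sampling as the abstract describes, might look like the sketch below. It is a toy illustration only: the model `f` is an arbitrary stand-in rather than a CFD code, the sensitivity derivatives are approximated by central finite differences (an assumption; the paper computes them efficiently by other means), and the function names are hypothetical.

```python
import random
import statistics


def first_order_moments(f, means, sigmas, h=1e-5):
    """Approximate mean and variance of f(x) for independent normal inputs.

    First-order moment method: mean ~ f(mu), var ~ sum((df/dx_i * sigma_i)^2).
    """
    f0 = f(means)
    var = 0.0
    for i, (m, s) in enumerate(zip(means, sigmas)):
        up = list(means)
        dn = list(means)
        up[i] = m + h
        dn[i] = m - h
        # Central finite difference stands in for a sensitivity derivative.
        deriv = (f(up) - f(dn)) / (2.0 * h)
        var += (deriv * s) ** 2
    return f0, var


def monte_carlo_moments(f, means, sigmas, n=20000, seed=0):
    """Reference moments from plain Monte Carlo sampling of the inputs."""
    rng = random.Random(seed)
    ys = [f([rng.gauss(m, s) for m, s in zip(means, sigmas)])
          for _ in range(n)]
    return statistics.fmean(ys), statistics.variance(ys)


if __name__ == "__main__":
    def model(x):
        # Toy nonlinear response standing in for a CFD output.
        return x[0] ** 2 + 3.0 * x[1]

    print(first_order_moments(model, [1.0, 2.0], [0.05, 0.1]))
    print(monte_carlo_moments(model, [1.0, 2.0], [0.05, 0.1]))
```

For a linear model the first-order approximation is exact; for nonlinear models the two sets of moments agree only when the input standard deviations are small relative to the curvature of `f`, which is the kind of validity check the abstract refers to.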
Statistical analysis of RHIC beam position monitors performance

NASA Astrophysics Data System (ADS)

Calaga, R.; Tomás, R.

2004-04-01

A detailed statistical analysis of beam position monitors (BPM) performance at RHIC is a critical factor in improving regular operations and future runs. Robust identification of malfunctioning BPMs plays an important role in any orbit or turn-by-turn analysis. Singular value decomposition and Fourier transform methods, which have evolved as powerful numerical techniques in signal processing, will aid in such identification from BPM data. This is the first attempt at RHIC to use a large set of data to statistically enhance the capability of these two techniques and determine BPM performance. A comparison from run 2003 data shows striking agreement between the two methods and hence can be used to improve BPM functioning at RHIC and possibly other accelerators.
On the Measurement of Morphology and Its Change.

DTIC Science & Technology

1982-03-01

that transitions within the ceratopsian dinosaurs necessitated overly complex or impossible grid deformations. THETA-RHO ANALYSIS An alternative...1982, A robust comparison of the three dimensional configurations of protein molecules. Tech. Rpt. 224, Ser. 2, Dept. Statistics Princeton Univ., 18 p
Assessing the Robustness of Graph Statistics for Network Analysis Under Incomplete Information

DTIC Science & Technology

strategy for dismantling these networks based on their network structure. However, these strategies typically assume complete information about the...combat them with missing information. This thesis analyzes the performance of a variety of network statistics in the context of incomplete information by...leveraging simulation to remove nodes and edges from networks and evaluating the effect this missing information has on our ability to accurately
Multivariate Meta-Analysis of Heterogeneous Studies Using Only Summary Statistics: Efficiency and Robustness

PubMed Central

Liu, Dungang; Liu, Regina; Xie, Minge

2014-01-01

Meta-analysis has been widely used to synthesize evidence from multiple studies for common hypotheses or parameters of interest. However, it has not yet been fully developed for incorporating heterogeneous studies, which arise often in applications due to different study designs, populations or outcomes. For heterogeneous studies, the parameter of interest may not be estimable for certain studies, and in such a case, these studies are typically excluded from conventional meta-analysis. The exclusion of part of the studies can lead to a non-negligible loss of information. This paper introduces a meta-analysis for heterogeneous studies by combining the confidence density functions derived from the summary statistics of individual studies, hence referred to as the CD approach. It includes all the studies in the analysis and makes use of all information, direct as well as indirect. Under a general likelihood inference framework, this new approach is shown to have several desirable properties, including: i) it is asymptotically as efficient as the maximum likelihood approach using individual participant data (IPD) from all studies; ii) unlike the IPD analysis, it suffices to use summary statistics to carry out the CD approach; individual-level data are not required; and iii) it is robust against misspecification of the working covariance structure of the parameter estimates. Besides its own theoretical significance, the last property also substantially broadens the applicability of the CD approach. All the properties of the CD approach are further confirmed by data simulated from a randomized clinical trials setting as well as by real data on aircraft landing performance. Overall, one obtains a unifying approach for combining summary statistics, subsuming many of the existing meta-analysis methods as special cases. PMID:26190875
How to Use Value-Added Analysis to Improve Student Learning: A Field Guide for School and District Leaders

ERIC Educational Resources Information Center

Kennedy, Kate; Peters, Mary; Thomas, Mike

2012-01-01

Value-added analysis is the most robust, statistically significant method available for helping educators quantify student progress over time. This powerful tool also reveals tangible strategies for improving instruction. Built around the work of Battelle for Kids, this book provides a field-tested continuous improvement model for using…
Passing the Test: Ecological Regression Analysis in the Los Angeles County Case and Beyond.

ERIC Educational Resources Information Center

Lichtman, Allan J.

1991-01-01

Statistical analysis of racially polarized voting prepared for the Garza v County of Los Angeles (California) (1990) voting rights case is reviewed to demonstrate that ecological regression is a flexible, robust technique that illuminates the reality of ethnic voting, and superior to the neighborhood model supported by the defendants. (SLD)
Crux: Rapid Open Source Protein Tandem Mass Spectrometry Analysis

PubMed Central

2015-01-01

Efficiently and accurately analyzing big protein tandem mass spectrometry data sets requires robust software that incorporates state-of-the-art computational, machine learning, and statistical methods. The Crux mass spectrometry analysis software toolkit (http://cruxtoolkit.sourceforge.net) is an open source project that aims to provide users with a cross-platform suite of analysis tools for interpreting protein mass spectrometry data. PMID:25182276
Control design and robustness analysis of a ball and plate system by using polynomial chaos

DOE Office of Scientific and Technical Information (OSTI.GOV)

Colón, Diego; Balthazar, José M.; Reis, Célia A. dos

2014-12-10

In this paper, we present a mathematical model of a ball and plate system and a control law, and analyze its robustness properties by using the polynomial chaos method. The ball rolls without slipping. There is an auxiliary robot vision system that determines the bodies' positions and velocities, and is used for control purposes. The actuators are two orthogonal DC motors that change the plate's angles with the ground. The model is an extension of the ball and beam system and is highly nonlinear. The system is decoupled into two independent equations for coordinates x and y. Finally, the resulting nonlinear closed loop systems are analyzed by the polynomial chaos methodology, which considers that some system parameters are random variables, and generates statistical data that can be used in the robustness analysis.
Control design and robustness analysis of a ball and plate system by using polynomial chaos

NASA Astrophysics Data System (ADS)

Colón, Diego; Balthazar, José M.; dos Reis, Célia A.; Bueno, Átila M.; Diniz, Ivando S.; de S. R. F. Rosa, Suelia

2014-12-01

In this paper, we present a mathematical model of a ball and plate system and a control law, and analyze its robustness properties by using the polynomial chaos method. The ball rolls without slipping. There is an auxiliary robot vision system that determines the bodies' positions and velocities, and is used for control purposes. The actuators are two orthogonal DC motors that change the plate's angles with the ground. The model is an extension of the ball and beam system and is highly nonlinear. The system is decoupled into two independent equations for coordinates x and y. Finally, the resulting nonlinear closed loop systems are analyzed by the polynomial chaos methodology, which considers that some system parameters are random variables, and generates statistical data that can be used in the robustness analysis.
Robust crop and weed segmentation under uncontrolled outdoor illumination

USDA-ARS's Scientific Manuscript database

A new machine vision for weed detection was developed from RGB color model images. Processes included in the algorithm for the detection were excessive green conversion, threshold value computation by statistical analysis, adaptive image segmentation by adjusting the threshold value, median filter, ...
Association between DAOA gene polymorphisms and the risk of schizophrenia, bipolar disorder and depressive disorder.

PubMed

Tan, Jinjing; Lin, Yu; Su, Li; Yan, Yan; Chen, Qing; Jiang, Haiyun; Wei, Qiugui; Gu, Lian

2014-06-03

Schizophrenia (SCZ), bipolar disorder (BD) and depressive disorder (DD) are common psychiatric disorders, which show common genetic vulnerability. Previous gene-disease association studies have reported correlations between d-amino acid oxidase activator (DAOA) gene polymorphisms and the three psychiatric disorders. However, the findings were contradictory. A meta-analysis was therefore conducted to provide more robust investigations into DAOA polymorphisms and the risk of SCZ, BD and DD. This meta-analysis recruited 46 published studies up to July 2013, including 17,515 cases and 25,189 controls. Odds ratios (ORs) with 95% confidence intervals (CIs) were used to evaluate the association between three specific DAOA SNPs and SCZ, BD and DD. Publication bias was tested by Begg's test and funnel plot, and heterogeneity was assessed by Cochran's chi-square-based Q statistic and the inconsistency index (I²). Moreover, the robustness of the findings was estimated by cumulative meta-analysis. DAOA genetic polymorphisms (M15, M18 and M23) were not found to confer a statistically significant increased risk of SCZ, BD or DD in the overall sample, or in Caucasians and Asians following subgroup analysis. The current study indicated that M15, M18 and M23 might not be risk factors for SCZ, BD or DD. However, further studies are required to provide robust evidence to estimate the association between DAOA polymorphisms and psychiatric disorders. Copyright © 2014 Elsevier Inc. All rights reserved.
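As a reminder of how a single-study odds ratio and its 95% confidence interval are obtained from a 2×2 table, here is a minimal Python sketch using the usual log-scale (Woolf) standard error. The function name `odds_ratio_ci` and the counts in the example are hypothetical and are not taken from the meta-analysis above.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald CI from a 2x2 table.

    a: cases with the allele      b: cases without it
    c: controls with the allele   d: controls without it
    All four counts must be positive for the log-scale standard error.
    """
    odds_ratio = (a * d) / (b * c)
    # Woolf standard error of log(OR).
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts for one study.
    print(odds_ratio_ci(30, 70, 20, 80))
```

An interval that contains 1 corresponds to the "not statistically significant" finding reported in the abstract; pooling across studies would additionally require study weights, which this sketch does not cover.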
Robust Kalman filter design for predictive wind shear detection

NASA Technical Reports Server (NTRS)

Stratton, Alexander D.; Stengel, Robert F.

1991-01-01

Severe, low-altitude wind shear is a threat to aviation safety. Airborne sensors under development measure the radial component of wind along a line directly in front of an aircraft. In this paper, optimal estimation theory is used to define a detection algorithm to warn of hazardous wind shear from these sensors. To achieve robustness, a wind shear detection algorithm must distinguish threatening wind shear from less hazardous gustiness, despite variations in wind shear structure. This paper presents statistical analysis methods to refine wind shear detection algorithm robustness. Computational methods predict the ability to warn of severe wind shear and avoid false warning. Comparative capability of the detection algorithm as a function of its design parameters is determined, identifying designs that provide robust detection of severe wind shear.

Gene flow analysis method, the D-statistic, is robust in a wide parameter space.

PubMed

Zheng, Yichen; Janke, Axel

2018-01-08

We evaluated the sensitivity of the D-statistic, a parsimony-like method widely used to detect gene flow between closely related species. This method has been applied to a variety of taxa with a wide range of divergence times. However, its parameter space and thus its applicability to a wide taxonomic range has not been systematically studied. Divergence time, population size, time of gene flow, distance of outgroup and number of loci were examined in a sensitivity analysis. The sensitivity study shows that the primary determinant of the D-statistic is the relative population size, i.e. the population size scaled by the number of generations since divergence. This is consistent with the fact that the main confounding factor in gene flow detection is incomplete lineage sorting by diluting the signal. The sensitivity of the D-statistic is also affected by the direction of gene flow, size and number of loci. In addition, we examined the ability of the f-statistics, [Formula: see text] and [Formula: see text], to estimate the fraction of a genome affected by gene flow; while these statistics are difficult to implement to practical questions in biology due to lack of knowledge of when the gene flow happened, they can be used to compare datasets with identical or similar demographic background. The D-statistic, as a method to detect gene flow, is robust against a wide range of genetic distances (divergence times) but it is sensitive to population size. The D-statistic should only be applied with critical reservation to taxa where population sizes are large relative to branch lengths in generations.
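The D-statistic itself is a simple site-pattern count, D = (ABBA - BABA) / (ABBA + BABA), over a four-taxon alignment (P1, P2, P3, outgroup). The Python sketch below shows that calculation on toy sequences; the function name `d_statistic` and the input format (equal-length strings, one character per site) are assumptions made for illustration, and no significance test (such as a block jackknife) is included.

```python
def d_statistic(p1, p2, p3, outgroup):
    """Return (D, n_abba, n_baba) for four aligned sequences.

    The outgroup allele is treated as ancestral ("A"); a site is informative
    only if P3 carries a different, derived allele ("B").
    """
    abba = 0
    baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        if c == o:
            continue  # P3 is not derived at this site
        if a == o and b == c:
            abba += 1  # pattern A B B A: P2 shares the derived allele with P3
        elif a == c and b == o:
            baba += 1  # pattern B A B A: P1 shares the derived allele with P3
    if abba + baba == 0:
        return 0.0, abba, baba  # no informative sites
    return (abba - baba) / (abba + baba), abba, baba


if __name__ == "__main__":
    # Toy alignment: three ABBA sites, one BABA site, one uninformative site.
    print(d_statistic("AAAGA", "GGGAA", "GGGGG", "AAAAA"))
```

A value near zero is expected under incomplete lineage sorting alone, while a clear excess of one pattern suggests gene flow between P3 and either P1 or P2.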
A statistically harmonized alignment-classification in image space enables accurate and robust alignment of noisy images in single particle analysis.

PubMed

Kawata, Masaaki; Sato, Chikara

2007-06-01

In determining the three-dimensional (3D) structure of macromolecular assemblies in single particle analysis, a large representative dataset of two-dimensional (2D) average images from huge number of raw images is a key for high resolution. Because alignments prior to averaging are computationally intensive, currently available multireference alignment (MRA) software does not survey every possible alignment. This leads to misaligned images, creating blurred averages and reducing the quality of the final 3D reconstruction. We present a new method, in which multireference alignment is harmonized with classification (multireference multiple alignment: MRMA). This method enables a statistical comparison of multiple alignment peaks, reflecting the similarities between each raw image and a set of reference images. Among the selected alignment candidates for each raw image, misaligned images are statistically excluded, based on the principle that aligned raw images of similar projections have a dense distribution around the correctly aligned coordinates in image space. This newly developed method was examined for accuracy and speed using model image sets with various signal-to-noise ratios, and with electron microscope images of the Transient Receptor Potential C3 and the sodium channel. In every data set, the newly developed method outperformed conventional methods in robustness against noise and in speed, creating 2D average images of higher quality. This statistically harmonized alignment-classification combination should greatly improve the quality of single particle analysis.
Robustness of Multiple Objective Decision Analysis Preference Functions

DTIC Science & Technology

2002-06-01

p, p′: The probability of some event. p_i, q_i: The probability of event i. Π: An aggregation of proportional data used in calculating a test ...statistical tests of the significance of the term and also is conducted in a multivariate framework rather than the ROSA univariate approach. A...residual error is $e = y - \hat{y}$ (45) The coefficient provides a ready indicator of the contribution for the associated variable and statistical tests
Validating an Air Traffic Management Concept of Operation Using Statistical Modeling

NASA Technical Reports Server (NTRS)

He, Yuning; Davies, Misty Dawn

2013-01-01

Validating a concept of operation for a complex, safety-critical system (like the National Airspace System) is challenging because of the high dimensionality of the controllable parameters and the infinite number of states of the system. In this paper, we use statistical modeling techniques to explore the behavior of a conflict detection and resolution algorithm designed for the terminal airspace. These techniques predict the robustness of the system simulation to both nominal and off-nominal behaviors within the overall airspace. They also can be used to evaluate the output of the simulation against recorded airspace data. Additionally, the techniques carry with them a mathematical value of the worth of each prediction: a statistical uncertainty for any robustness estimate. Uncertainty Quantification (UQ) is the process of quantitative characterization and ultimately a reduction of uncertainties in complex systems. UQ is important for understanding the influence of uncertainties on the behavior of a system and therefore is valuable for design, analysis, and verification and validation. In this paper, we apply advanced statistical modeling methodologies and techniques on an advanced air traffic management system, namely the Terminal Tactical Separation Assured Flight Environment (T-TSAFE). We show initial results for a parameter analysis and safety boundary (envelope) detection in the high-dimensional parameter space. For our boundary analysis, we developed a new sequential approach based upon the design of computer experiments, allowing us to incorporate knowledge from domain experts into our modeling and to determine the most likely boundary shapes and their parameters. We carried out the analysis on system parameters and describe an initial approach that will allow us to include time-series inputs, such as the radar track data, into the analysis.
Statistical methods for change-point detection in surface temperature records

NASA Astrophysics Data System (ADS)

Pintar, A. L.; Possolo, A.; Zhang, N. F.

2013-09-01

We describe several statistical methods to detect possible change-points in a time series of values of surface temperature measured at a meteorological station, and to assess the statistical significance of such changes, taking into account the natural variability of the measured values, and the autocorrelations between them. These methods serve to determine whether the record may suffer from biases unrelated to the climate signal, hence whether there may be a need for adjustments as considered by M. J. Menne and C. N. Williams (2009) "Homogenization of Temperature Series via Pairwise Comparisons", Journal of Climate 22 (7), 1700-1717. We also review methods to characterize patterns of seasonality (seasonal decomposition using monthly medians or robust local regression), and explain the role they play in the imputation of missing values, and in enabling robust decompositions of the measured values into a seasonal component, a possible climate signal, and a station-specific remainder. The methods for change-point detection that we describe include statistical process control, wavelet multi-resolution analysis, adaptive weights smoothing, and a Bayesian procedure, all of which are applicable to single station records.
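Of the approaches listed, the statistical process control family is the easiest to illustrate. The sketch below locates a single mean shift with a basic CUSUM of deviations from the overall mean. It is a deliberately simplified illustration: the function name `cusum_change_point` is hypothetical, and the sketch ignores the autocorrelation, seasonality and significance assessment that the abstract says the authors take into account.

```python
import statistics


def cusum_change_point(series):
    """Locate the most likely single mean shift in a sequence.

    Returns (index, peak), where index is the position of the first value
    after the shift and peak is the largest absolute cumulative deviation.
    """
    mean = statistics.fmean(series)
    running = 0.0
    best_idx = 0
    best = 0.0
    # The final point is skipped: the full cumulative sum is always ~0.
    for i, x in enumerate(series[:-1]):
        running += x - mean
        if abs(running) > best:
            best = abs(running)
            best_idx = i
    return best_idx + 1, best


if __name__ == "__main__":
    # Hypothetical record: a 1-degree step after the tenth observation.
    record = [0.0] * 10 + [1.0] * 10
    print(cusum_change_point(record))
```

In practice the peak would be compared with a reference distribution (for example, from resampling that preserves the autocorrelation structure) before declaring a change-point.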
Approximations to the distribution of a test statistic in covariance structure analysis: A comprehensive study.

PubMed

Wu, Hao

2018-05-01

In structural equation modelling (SEM), a robust adjustment to the test statistic or to its reference distribution is needed when its null distribution deviates from a χ² distribution, which usually arises when data do not follow a multivariate normal distribution. Unfortunately, existing studies on this issue typically focus on only a few methods and neglect the majority of alternative methods in statistics. Existing simulation studies typically consider only non-normal distributions of data that either satisfy asymptotic robustness or lead to an asymptotic scaled χ² distribution. In this work we conduct a comprehensive study that involves both typical methods in SEM and less well-known methods from the statistics literature. We also propose the use of several novel non-normal data distributions that are qualitatively different from the non-normal distributions widely used in existing studies. We found that several under-studied methods give the best performance under specific conditions, but the Satorra-Bentler method remains the most viable method for most situations. © 2017 The British Psychological Society.
Consequences of Assumption Violations Revisited: A Quantitative Review of Alternatives to the One-Way Analysis of Variance "F" Test.

ERIC Educational Resources Information Center

Lix, Lisa M.; And Others

1996-01-01

Meta-analytic techniques were used to summarize the statistical robustness literature on Type I error properties of alternatives to the one-way analysis of variance "F" test. The James (1951) and Welch (1951) tests performed best under violations of the variance homogeneity assumption, although their use is not always appropriate. (SLD)
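For reference, the Welch (1951) heteroscedastic alternative to the one-way "F" test can be computed as in the Python sketch below, which follows the standard published formula. The function name `welch_anova` is an illustrative choice; the p-value is omitted because the Python standard library provides no F-distribution CDF, and every group is assumed to have at least two observations and a non-zero variance.

```python
import statistics


def welch_anova(*groups):
    """Welch's one-way test statistic with its degrees of freedom.

    Returns (F, df1, df2). Groups may have unequal sizes and variances.
    """
    k = len(groups)
    n = [len(g) for g in groups]
    means = [statistics.fmean(g) for g in groups]
    variances = [statistics.variance(g) for g in groups]

    # Precision weights: groups with small variance of the mean count more.
    w = [ni / vi for ni, vi in zip(n, variances)]
    w_sum = sum(w)
    grand = sum(wi * mi for wi, mi in zip(w, means)) / w_sum

    numerator = sum(wi * (mi - grand) ** 2
                    for wi, mi in zip(w, means)) / (k - 1)
    lam = sum((1 - wi / w_sum) ** 2 / (ni - 1) for wi, ni in zip(w, n))
    denominator = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam

    f_stat = numerator / denominator
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, df1, df2


if __name__ == "__main__":
    # Hypothetical data: three groups with very different spreads.
    print(welch_anova([1, 2, 3, 4], [2, 6, 10, 14], [5, 5.5, 6, 6.5, 7]))
```

With two groups the statistic reduces to the square of Welch's t statistic, and `df2` reduces to the Welch-Satterthwaite degrees of freedom, which is a convenient sanity check.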
Project risk management in the construction of high-rise buildings

NASA Astrophysics Data System (ADS)

Titarenko, Boris; Hasnaoui, Amir; Titarenko, Roman; Buzuk, Liliya

2018-03-01

This paper presents project risk management methods that make it possible to better identify risks in the construction of high-rise buildings and to manage them throughout the life cycle of the project. One of the project risk management processes is a quantitative analysis of risks. The quantitative analysis usually includes the assessment of the potential impact of project risks and their probabilities. This paper reviews the most popular methods of risk probability assessment and indicates the advantages of the robust approach over the traditional methods. Within the framework of the project risk management model, the robust approach of P. Huber is applied and extended to the regression analysis of project data. The suggested algorithms for estimating the parameters of statistical models yield reliable estimates. The theoretical problems of developing robust models based on the methodology of minimax estimates are reviewed, and an algorithm for the case of asymmetric "contamination" is developed.
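To make the Huber approach concrete, the sketch below computes a Huber M-estimate of location by iteratively reweighted averaging. It covers only the simplest case and rests on several assumptions: the tuning constant c = 1.345 and the MAD-based scale are conventional defaults rather than values from the paper, the function name `huber_location` is hypothetical, and neither the regression setting nor the asymmetric-contamination algorithm described in the abstract is reproduced.

```python
import statistics


def huber_location(data, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location via iteratively reweighted means."""
    mu = statistics.median(data)
    # Robust scale: median absolute deviation, rescaled for normal data.
    mad = statistics.median([abs(x - mu) for x in data])
    scale = 1.4826 * mad
    if scale == 0:
        return mu  # more than half the points coincide with the median

    for _ in range(max_iter):
        weights = []
        for x in data:
            r = abs(x - mu) / scale
            # Full weight inside the threshold, down-weighted beyond it.
            weights.append(1.0 if r <= c else c / r)
        new_mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


if __name__ == "__main__":
    # Hypothetical cost overruns (percent) with one extreme project.
    overruns = [1, 2, 3, 4, 5, 100]
    print(statistics.fmean(overruns), huber_location(overruns))
```

The single extreme value pulls the ordinary mean far from the bulk of the data, while the Huber estimate stays close to the typical observations, which is the advantage of the robust approach that the abstract points to.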
Statistical evidence of strain induced breaking of metallic point contacts

NASA Astrophysics Data System (ADS)

Alwan, Monzer; Candoni, Nadine; Dumas, Philippe; Klein, Hubert R.

2013-06-01

A scanning tunneling microscope operated in the break-junction regime and a mechanically controllable break junction are used to acquire thousands of conductance-elongation curves by repeatedly stretching Au junctions until they break and then re-connecting them. From a robust statistical analysis performed on large sets of experiments, parameters such as lifetime, elongation and occurrence probabilities are extracted. The analysis of results obtained for different stretching speeds of the electrodes indicates that the breaking mechanism of di- and mono-atomic junctions is identical, and that the junctions undergo atomic rearrangement during their stretching and at the moment of breaking.
Robustness of meta-analyses in finding gene × environment interactions

PubMed Central

Shi, Gang; Nehorai, Arye

2017-01-01

Meta-analyses that synthesize statistical evidence across studies have become important analytical tools for genetic studies. Inspired by the success of genome-wide association studies of the genetic main effect, researchers are searching for gene × environment interactions. Confounders are routinely included in the genome-wide gene × environment interaction analysis as covariates; however, this does not control for any confounding effects on the results if covariate × environment interactions are present. We carried out simulation studies to evaluate the robustness to the covariate × environment confounder for meta-regression and joint meta-analysis, which are two commonly used meta-analysis methods for testing the gene × environment interaction or the genetic main effect and interaction jointly. Here we show that meta-regression is robust to the covariate × environment confounder while joint meta-analysis is subject to the confounding effect with inflated type I error rates. Given vast sample sizes employed in genome-wide gene × environment interaction studies, non-significant covariate × environment interactions at the study level could substantially elevate the type I error rate at the consortium level. When covariate × environment confounders are present, type I errors can be controlled in joint meta-analysis by including the covariate × environment terms in the analysis at the study level. Alternatively, meta-regression can be applied, which is robust to potential covariate × environment confounders. PMID:28362796
Capturing Positive Transgressive Variation From Wild And Exotic Germplasm Resources

USDA-ARS's Scientific Manuscript database

Only a small fraction of the naturally occurring genetic diversity available in rice germplasm repositories around the world has been explored to date. This is beginning to change with the advent of affordable, high throughput genotyping approaches coupled with robust statistical analysis methods th...
Regional-scale analysis of extreme precipitation from short and fragmented records

NASA Astrophysics Data System (ADS)

Libertino, Andrea; Allamano, Paola; Laio, Francesco; Claps, Pierluigi

2018-02-01

The rain gauge is the oldest and most accurate instrument for rainfall measurement, able to provide long series of reliable data. However, rain gauge records are often plagued by gaps, spatio-temporal discontinuities and inhomogeneities that could affect their suitability for a statistical assessment of the characteristics of extreme rainfall. Furthermore, the need to discard the shorter series in order to obtain robust estimates means ignoring a significant amount of information that can be essential, especially when estimates for large return periods are sought. This work describes a robust statistical framework for dealing with uneven and fragmented rainfall records on a regional spatial domain. The proposed technique, named "patched kriging", allows one to exploit all the information available from the recorded series, independently of their length, to provide extreme rainfall estimates in ungauged areas. The methodology involves the sequential application of the ordinary kriging equations, producing a homogeneous dataset of synthetic series with uniform lengths. In this way, the errors inherent to any regional statistical estimation can be easily represented in the spatial domain and, possibly, corrected. Furthermore, the homogeneity of the obtained series provides robustness toward local artefacts during the parameter-estimation phase. The application to a case study in north-western Italy demonstrates the potential of the methodology and provides a significant base for discussing its advantages over previous techniques.
The Problem of Size in Robust Design

NASA Technical Reports Server (NTRS)

Koch, Patrick N.; Allen, Janet K.; Mistree, Farrokh; Mavris, Dimitri

1997-01-01

To facilitate the effective solution of multidisciplinary, multiobjective complex design problems, a departure from the traditional parametric design analysis and single objective optimization approaches is necessary in the preliminary stages of design. A necessary tradeoff becomes one of efficiency vs. accuracy as approximate models are sought to allow fast analysis and effective exploration of a preliminary design space. In this paper we apply a general robust design approach for efficient and comprehensive preliminary design to a large complex system: a high speed civil transport (HSCT) aircraft. Specifically, we investigate the HSCT wing configuration design, incorporating life cycle economic uncertainties to identify economically robust solutions. The approach is built on the foundation of statistical experimentation and modeling techniques and robust design principles, and is specialized through incorporation of the compromise Decision Support Problem for multiobjective design. For large problems, however, as in the HSCT example, this robust design approach developed for efficient and comprehensive design breaks down with the problem of size - combinatorial explosion in experimentation and model building with number of variables - and both efficiency and accuracy are sacrificed. Our focus in this paper is on identifying and discussing the implications and open issues associated with the problem of size for the preliminary design of large complex systems.
Causal Models and Exploratory Analysis in Heterogeneous Information Fusion for Detecting Potential Terrorists

DTIC Science & Technology

2015-11-01

issues and made some of the same distinctions (Walker, Lempert, and Kwakkel ; Bankes, Lempert, and Popper, 2005), but it did appear that we had more than...Statistics,” British Journal of Mathematical Statistics and Psychology, 66, pp. 8-38. Jaynes, Edwin T., and G. Larry Bretthorst (ed.) (2003), Probability...Giroux. Lempert, Robert J., David G. Groves, Steven W. Popper, and Steven C. Bankes (2006), “A General Analytic Method for Generating Robust
Robust demarcation of basal cell carcinoma by dependent component analysis-based segmentation of multi-spectral fluorescence images.

PubMed

Kopriva, Ivica; Persin, Antun; Puizina-Ivić, Neira; Mirić, Lina

2010-07-02

This study was designed to demonstrate robust performance of the novel dependent component analysis (DCA)-based approach to demarcation of the basal cell carcinoma (BCC) through unsupervised decomposition of the red-green-blue (RGB) fluorescent image of the BCC. Robustness to intensity fluctuation is due to the scale invariance property of DCA algorithms, which exploit spectral and spatial diversities between the BCC and the surrounding tissue. The filtering-based DCA approach used here is an extension of independent component analysis (ICA) and is necessary to account for the statistical dependence induced by spectral similarity between the BCC and the surrounding tissue. This similarity generates weak edges, which are a challenge for other segmentation methods as well. By comparative performance analysis with state-of-the-art image segmentation methods such as active contours (level set), K-means clustering, non-negative matrix factorization, ICA and ratio imaging, we experimentally demonstrate good performance of DCA-based BCC demarcation in two demanding scenarios where the intensity of the fluorescent image was varied by almost two orders of magnitude. Copyright 2010 Elsevier B.V. All rights reserved.
Biostatistical analysis of quantitative immunofluorescence microscopy images.

PubMed

Giles, C; Albrecht, M A; Lam, V; Takechi, R; Mamo, J C

2016-12-01

Semiquantitative immunofluorescence microscopy has become a key methodology in biomedical research. Typical statistical workflows are considered in the context of avoiding pseudo-replication and marginalising experimental error. However, immunofluorescence microscopy naturally generates hierarchically structured data that can be leveraged to improve statistical power and enrich biological interpretation. Herein, we describe a robust distribution fitting procedure and compare several statistical tests, outlining their potential advantages/disadvantages in the context of biological interpretation. Further, we describe tractable procedures for power analysis that incorporates the underlying distribution, sample size and number of images captured per sample. The procedures outlined have significant potential for increasing understanding of biological processes and decreasing both ethical and financial burden through experimental optimization. © 2016 The Authors Journal of Microscopy © 2016 Royal Microscopical Society.
A robust and efficient statistical method for genetic association studies using case and control samples from multiple cohorts

PubMed Central

2013-01-01

Background: The theoretical basis of genome-wide association studies (GWAS) is statistical inference of linkage disequilibrium (LD) between any polymorphic marker and a putative disease locus. Most methods widely implemented for such analyses are vulnerable to several key demographic factors and deliver poor statistical power for detecting genuine associations as well as a high false positive rate. Here, we present a likelihood-based statistical approach that properly accounts for the non-random nature of case–control samples with regard to the genotypic distribution at the loci in the populations under study and confers flexibility to test for genetic association in the presence of different confounding factors such as population structure and non-randomness of samples. Results: We implemented this novel method together with several popular methods from the GWAS literature to re-analyze recently published Parkinson’s disease (PD) case–control samples. The real data analysis and computer simulation show that, compared to its rivals, the new method confers not only significantly improved statistical power for detecting the associations but also robustness to the difficulties stemming from non-random sampling and genetic structures. In particular, the new method detected 44 significant SNPs within 25 chromosomal regions of size < 1 Mb, but only 6 SNPs in two of these regions were previously detected by the trend test based methods. It discovered two SNPs located 1.18 Mb and 0.18 Mb from the PD candidates, FGF20 and PARK8, without invoking false positive risk. Conclusions: We developed a novel likelihood-based method which provides adequate estimation of LD and other population model parameters by using case and control samples, eases the integration of these samples from multiple genetically divergent populations, and thus confers statistically robust and powerful analyses of GWAS. On the basis of simulation studies and analysis of real datasets, we demonstrated significant improvement of the new method over the non-parametric trend test, which is the most widely implemented method in the GWAS literature. PMID:23394771
Enhanced echolocation via robust statistics and super-resolution of sonar images

NASA Astrophysics Data System (ADS)

Kim, Kio

Echolocation is a process in which an animal uses acoustic signals to exchange information with its environment. In a recent study, Neretti et al. have shown that the use of robust statistics can significantly improve the resiliency of echolocation against noise and enhance its accuracy by suppressing the development of sidelobes in the processing of an echo signal. In this research, the use of robust statistics is extended to problems in underwater explorations. The dissertation consists of two parts. Part I describes how robust statistics can enhance the identification of target objects, which in this case are cylindrical containers filled with four different liquids. In particular, this work employs a variation of an existing robust estimator called an L-estimator, which was first suggested by Koenker and Bassett. As pointed out by Au et al., a 'highlight interval' is an important feature, and it is closely related to many other important features that are known to be crucial for dolphin echolocation. The modified L-estimator described in this text is used to enhance the detection of highlight intervals, which eventually leads to a successful classification of echo signals. Part II extends the problem into 2 dimensions. Thanks to the advances in material and computer technology, various sonar imaging modalities are available on the market. By registering acoustic images from such video sequences, one can extract more information on the region of interest. Computer vision and image processing allowed application of robust statistics to the acoustic images produced by forward looking sonar systems, such as Dual-frequency Identification Sonar and ProViewer. The first use of robust statistics for sonar image enhancement in this text is in image registration. Random Sample Consensus (RANSAC) is widely used for image registration. The registration algorithm using RANSAC is optimized for sonar image registration, and the performance is studied. The second use of robust statistics is in fusing the images. It is shown that the maximum a posteriori fusion method can be formulated in a Kalman filter-like manner, and also that the resulting expression is identical to a W-estimator with a specific weight function.
Restoration of MRI data for intensity non-uniformities using local high order intensity statistics

PubMed Central

Hadjidemetriou, Stathis; Studholme, Colin; Mueller, Susanne; Weiner, Michael; Schuff, Norbert

2008-01-01

MRI at high magnetic fields (>3.0 T) is complicated by strong inhomogeneous radio-frequency fields, sometimes termed the “bias field”. These lead to non-biological intensity non-uniformities across the image. They can complicate further image analysis such as registration and tissue segmentation. Existing methods for intensity uniformity restoration have been optimized for 1.5 T, but they are less effective for 3.0 T MRI, and not at all satisfactory for higher fields. Also, many of the existing restoration algorithms require a brain template or use a prior atlas, which can restrict their practicalities. In this study an effective intensity uniformity restoration algorithm has been developed based on non-parametric statistics of high order local intensity co-occurrences. These statistics are restored with a non-stationary Wiener filter. The algorithm also assumes a smooth non-uniformity and is stable. It does not require a prior atlas and is robust to variations in anatomy. In geriatric brain imaging it is robust to variations such as enlarged ventricles and low contrast to noise ratio. The co-occurrence statistics improve robustness to whole head images with pronounced non-uniformities present in high field acquisitions. Its significantly improved performance and lower time requirements have been demonstrated by comparing it to the very commonly used N3 algorithm on BrainWeb MR simulator images as well as on real 4 T human head images. PMID:18621568
Ranking metrics in gene set enrichment analysis: do they matter?

PubMed

Zyla, Joanna; Marczyk, Michal; Weiner, January; Polanska, Joanna

2017-05-12

There exist many methods for describing the complex relation between changes of gene expression in molecular pathways or gene ontologies under different experimental conditions. Among them, Gene Set Enrichment Analysis seems to be one of the most commonly used (over 10,000 citations). An important parameter, which could affect the final result, is the choice of a metric for the ranking of genes. Applying a default ranking metric may lead to poor results. In this work 28 benchmark data sets were used to evaluate the sensitivity and false positive rate of gene set analysis for 16 different ranking metrics including new proposals. Furthermore, the robustness of the chosen methods to sample size was tested. Using the k-means clustering algorithm, a group of four metrics with the highest performance in terms of overall sensitivity, overall false positive rate and computational load was established, i.e., the absolute value of the Moderated Welch Test statistic, the Minimum Significant Difference, the absolute value of the Signal-To-Noise ratio and the Baumgartner-Weiss-Schindler test statistic. In the case of false positive rate estimation, all selected ranking metrics were robust with respect to sample size. In the case of sensitivity, the absolute value of the Moderated Welch Test statistic and the absolute value of the Signal-To-Noise ratio gave stable results, while Baumgartner-Weiss-Schindler and Minimum Significant Difference showed better results for larger sample sizes. Finally, the Gene Set Enrichment Analysis method with all tested ranking metrics was parallelised and implemented in MATLAB, and is available at https://github.com/ZAEDPolSl/MrGSEA. Choosing a ranking metric in Gene Set Enrichment Analysis has a critical impact on the results of pathway enrichment analysis. The absolute value of the Moderated Welch Test has the best overall sensitivity and the Minimum Significant Difference has the best overall specificity of gene set analysis. When the number of non-normally distributed genes is high, using the Baumgartner-Weiss-Schindler test statistic gives better outcomes. Also, it finds more enriched pathways than other tested metrics, which may induce new biological discoveries.
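For readers unfamiliar with ranking metrics, the following sketch shows one of the metrics named above, the absolute signal-to-noise ratio, used to order genes before enrichment analysis. The abstract does not define the metric; the definition assumed here (difference of group means divided by the sum of group standard deviations) is the one commonly used with Gene Set Enrichment Analysis. The function names and the toy expression values are hypothetical, and this is not the MrGSEA implementation.

```python
from statistics import mean, stdev


def signal_to_noise(a, b):
    """Assumed definition: (mean_a - mean_b) / (sd_a + sd_b)."""
    return (mean(a) - mean(b)) / (stdev(a) + stdev(b))


def rank_genes(expr_a, expr_b):
    """Order gene names by decreasing absolute signal-to-noise ratio."""
    scores = {g: abs(signal_to_noise(expr_a[g], expr_b[g])) for g in expr_a}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Toy expression values per gene under two conditions (hypothetical).
    cond_a = {"g1": [5, 6, 7], "g2": [1, 2, 3], "g3": [10, 11, 12]}
    cond_b = {"g1": [5, 6, 7.5], "g2": [7, 8, 9], "g3": [9, 10, 11]}
    print(rank_genes(cond_a, cond_b))
```

Taking the absolute value ranks strongly up- and down-regulated genes together at the top, which is the variant the abstract reports as giving stable sensitivity across sample sizes.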

Robust LOD scores for variance component-based linkage analysis.

PubMed

Blangero, J; Williams, J T; Almasy, L

2000-01-01

The variance component method is now widely used for linkage analysis of quantitative traits. Although this approach offers many advantages, the importance of the underlying assumption of multivariate normality of the trait distribution within pedigrees has not been studied extensively. Simulation studies have shown that traits with leptokurtic distributions yield linkage test statistics that exhibit excessive Type I error when analyzed naively. We derive analytical formulae relating the deviation from the expected asymptotic distribution of the lod score to the kurtosis and total heritability of the quantitative trait. A simple correction constant yields a robust lod score for any deviation from normality and for any pedigree structure, and effectively eliminates the problem of inflated Type I error due to misspecification of the underlying probability model in variance component-based linkage analysis.
On the Use of Statistics in Design and the Implications for Deterministic Computer Experiments

NASA Technical Reports Server (NTRS)

Simpson, Timothy W.; Peplinski, Jesse; Koch, Patrick N.; Allen, Janet K.

1997-01-01

Perhaps the most prevalent use of statistics in engineering design is through Taguchi's parameter and robust design -- using orthogonal arrays to compute signal-to-noise ratios in a process of design improvement. In our view, however, there is an equally exciting use of statistics in design that could become just as prevalent: it is the concept of metamodeling whereby statistical models are built to approximate detailed computer analysis codes. Although computers continue to get faster, analysis codes always seem to keep pace so that their computational time remains non-trivial. Through metamodeling, approximations of these codes are built that are orders of magnitude cheaper to run. These metamodels can then be linked to optimization routines for fast analysis, or they can serve as a bridge for integrating analysis codes across different domains. In this paper we first review metamodeling techniques that encompass design of experiments, response surface methodology, Taguchi methods, neural networks, inductive learning, and kriging. We discuss their existing applications in engineering design and then address the dangers of applying traditional statistical techniques to approximate deterministic computer analysis codes. We conclude with recommendations for the appropriate use of metamodeling techniques in given situations and how common pitfalls can be avoided.
Robust estimation approach for blind denoising.

PubMed

Rabie, Tamer

2005-11-01

This work develops a new robust statistical framework for blind image denoising. Robust statistics addresses the problem of estimation when the idealized assumptions about a system are occasionally violated. The contaminating noise in an image is considered as a violation of the assumption of spatial coherence of the image intensities and is treated as an outlier random variable. A denoised image is estimated by fitting a spatially coherent stationary image model to the available noisy data using a robust estimator-based regression method within an optimal-size adaptive window. The robust formulation aims at eliminating the noise outliers while preserving the edge structures in the restored image. Several examples demonstrating the effectiveness of this robust denoising technique are reported and a comparison with other standard denoising filters is presented.
Robust multivariate nonparametric tests for detection of two-sample location shift in clinical trials

PubMed Central

Jiang, Xuejun; Guo, Xu; Zhang, Ning; Wang, Bo

2018-01-01

This article presents and investigates performance of a series of robust multivariate nonparametric tests for detection of location shift between two multivariate samples in randomized controlled trials. The tests are built upon robust estimators of distribution locations (medians, Hodges-Lehmann estimators, and an extended U statistic) with both unscaled and scaled versions. The nonparametric tests are robust to outliers and do not assume that the two samples are drawn from multivariate normal distributions. Bootstrap and permutation approaches are introduced for determining the p-values of the proposed test statistics. Simulation studies are conducted and numerical results are reported to examine performance of the proposed statistical tests. The numerical results demonstrate that the robust multivariate nonparametric tests constructed from the Hodges-Lehmann estimators are more efficient than those based on medians and the extended U statistic. The permutation approach can provide a more stringent control of Type I error and is generally more powerful than the bootstrap procedure. The proposed robust nonparametric tests are applied to detect multivariate distributional difference between the intervention and control groups in the Thai Healthy Choices study and examine the intervention effect of a four-session motivational interviewing-based intervention developed in the study to reduce risk behaviors among youth living with HIV. PMID:29672555
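The building blocks mentioned above can be sketched in one dimension. The code below computes the two-sample Hodges-Lehmann shift estimate (the median of all pairwise differences) and a permutation p-value for it. This is a univariate simplification, not the multivariate tests proposed in the article; the function names, the number of permutations and the fixed seed are assumptions for illustration.

```python
import random
from statistics import median


def hl_shift(x, y):
    """Two-sample Hodges-Lehmann estimate: median of all differences y_j - x_i."""
    return median(b - a for a in x for b in y)


def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the absolute Hodges-Lehmann shift."""
    rng = random.Random(seed)
    observed = abs(hl_shift(x, y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        # Randomly reassign group labels under the null of no location shift.
        rng.shuffle(pooled)
        if abs(hl_shift(pooled[:len(x)], pooled[len(x):])) >= observed:
            hits += 1
    # The +1 terms count the observed labelling itself, so p is never exactly 0.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    control = [1.0, 2.0, 3.0, 4.0, 5.0]
    treated = [11.0, 12.0, 13.0, 14.0, 15.0]
    print(hl_shift(control, treated), permutation_pvalue(control, treated))
```

Because the estimate is a median of differences, a single extreme observation in either group moves it far less than it would move a difference of means.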
A system for learning statistical motion patterns.

PubMed

Hu, Weiming; Xiao, Xuejuan; Fu, Zhouyu; Xie, Dan; Tan, Tieniu; Maybank, Steve

2006-09-01

Analysis of motion patterns is an effective approach for anomaly detection and behavior prediction. Current approaches for the analysis of motion patterns depend on known scenes, where objects move in predefined ways. It is highly desirable to automatically construct object motion patterns which reflect the knowledge of the scene. In this paper, we present a system for automatically learning motion patterns for anomaly detection and behavior prediction based on a proposed algorithm for robustly tracking multiple objects. In the tracking algorithm, foreground pixels are clustered using a fast accurate fuzzy K-means algorithm. Growing and prediction of the cluster centroids of foreground pixels ensure that each cluster centroid is associated with a moving object in the scene. In the algorithm for learning motion patterns, trajectories are clustered hierarchically using spatial and temporal information and then each motion pattern is represented with a chain of Gaussian distributions. Based on the learned statistical motion patterns, statistical methods are used to detect anomalies and predict behaviors. Our system is tested using image sequences acquired, respectively, from a crowded real traffic scene and a model traffic scene. Experimental results show the robustness of the tracking algorithm, the efficiency of the algorithm for learning motion patterns, and the encouraging performance of algorithms for anomaly detection and behavior prediction.
Statistical variation in progressive scrambling

NASA Astrophysics Data System (ADS)

Clark, Robert D.; Fox, Peter C.

2004-07-01

The two methods most often used to evaluate the robustness and predictivity of partial least squares (PLS) models are cross-validation and response randomization. Both methods may be overly optimistic for data sets that contain redundant observations, however. The kinds of perturbation analysis widely used for evaluating model stability in the context of ordinary least squares regression are only applicable when the descriptors are independent of each other and errors are independent and normally distributed; neither assumption holds for QSAR in general and for PLS in particular. Progressive scrambling is a novel, non-parametric approach to perturbing models in the response space in a way that does not disturb the underlying covariance structure of the data. Here, we introduce adjustments for two of the characteristic values produced by a progressive scrambling analysis - the deprecated predictivity ($Q_s^{*2}$) and standard error of prediction ($\mathrm{SDEP}_s^{*}$) - that correct for the effect of introduced perturbation. We also explore the statistical behavior of the adjusted values ($Q_0^{*2}$ and $\mathrm{SDEP}_0^{*}$) and the sensitivity to perturbation ($dq^2/dr_{yy'}^2$). It is shown that the three statistics are all robust for stable PLS models, in terms of the stochastic component of their determination and of their variation due to sampling effects involved in training set selection.
Finding differentially expressed genes in high dimensional data: Rank based test statistic via a distance measure.

PubMed

Mathur, Sunil; Sadana, Ajit

2015-12-01

We present a rank-based test statistic for the identification of differentially expressed genes using a distance measure. The proposed test statistic is highly robust against extreme values and does not assume the distribution of parent population. Simulation studies show that the proposed test is more powerful than some of the commonly used methods, such as paired t-test, Wilcoxon signed rank test, and significance analysis of microarray (SAM) under certain non-normal distributions. The asymptotic distribution of the test statistic, and the p-value function are discussed. The application of proposed method is shown using a real-life data set. © The Author(s) 2011.
Agriculture, population growth, and statistical analysis of the radiocarbon record.

PubMed

Zahid, H Jabran; Robinson, Erick; Kelly, Robert L

2016-01-26

The human population has grown significantly since the onset of the Holocene about 12,000 y ago. Despite decades of research, the factors determining prehistoric population growth remain uncertain. Here, we examine measurements of the rate of growth of the prehistoric human population based on statistical analysis of the radiocarbon record. We find that, during most of the Holocene, human populations worldwide grew at a long-term annual rate of 0.04%. Statistical analysis of the radiocarbon record shows that transitioning farming societies experienced the same rate of growth as contemporaneous foraging societies. The same rate of growth measured for populations dwelling in a range of environments and practicing a variety of subsistence strategies suggests that the global climate and/or endogenous biological factors, not adaptability to local environment or subsistence practices, regulated the long-term growth of the human population during most of the Holocene. Our results demonstrate that statistical analyses of large ensembles of radiocarbon dates are robust and valuable for quantitatively investigating the demography of prehistoric human populations worldwide.
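To give a sense of scale for the reported long-term rate, the short sketch below converts an annual growth rate of 0.04% into a doubling time and a cumulative growth factor, assuming simple compound (exponential) growth at a constant rate. The function names and the 10,000-year horizon are chosen for illustration and are not taken from the study.

```python
import math


def doubling_time(annual_rate):
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_rate)


def growth_factor(annual_rate, years):
    """Multiplicative increase after the given number of years."""
    return (1 + annual_rate) ** years


if __name__ == "__main__":
    rate = 0.0004  # 0.04% per year, as reported in the abstract
    print(round(doubling_time(rate)), round(growth_factor(rate, 10_000), 1))
```

Under these assumptions the doubling time is on the order of 1,700 years, which shows how a rate that looks negligible per year still compounds to a large increase over the Holocene.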
The Content of Statistical Requirements for Authors in Biomedical Research Journals

PubMed Central

Liu, Tian-Yi; Cai, Si-Yu; Nie, Xiao-Lu; Lyu, Ya-Qi; Peng, Xiao-Xia; Feng, Guo-Shuang

2016-01-01

Background: Robust statistical design, sound statistical analysis, and standardized presentation are important to enhance the quality and transparency of biomedical research. This systematic review was conducted to summarize the statistical reporting requirements introduced by biomedical research journals with an impact factor of 10 or above so that researchers are able to give statistical issues serious consideration not only at the stage of data analysis but also at the stage of methodological design. Methods: Detailed statistical instructions for authors were downloaded from the homepage of each of the included journals or obtained from the editors directly via email. Then, we described the types and numbers of statistical guidelines introduced by different press groups. Items of statistical reporting guidelines as well as particular requirements were summarized by frequency and grouped into design, method of analysis, and presentation. Finally, updated statistical guidelines and particular requirements for improvement were summed up. Results: In total, 21 of 23 press groups introduced at least one statistical guideline. More than half of the press groups update their statistical instructions for authors gradually as new statistical reporting guidelines are issued. In addition, 16 press groups, covering 44 journals, address particular statistical requirements. Most of the particular requirements focused on the performance of statistical analysis and transparency in statistical reporting, including “address issues relevant to research design, including participant flow diagram, eligibility criteria, and sample size estimation,” and “statistical methods and the reasons.” Conclusions: Statistical requirements for authors are becoming increasingly refined. Statistical requirements for authors remind researchers that they should give sufficient consideration not only to statistical methods during the research design but also to standardized statistical reporting, which would be beneficial in providing stronger evidence and making critical appraisal of evidence more accessible. PMID:27748343
The Content of Statistical Requirements for Authors in Biomedical Research Journals.

PubMed

Liu, Tian-Yi; Cai, Si-Yu; Nie, Xiao-Lu; Lyu, Ya-Qi; Peng, Xiao-Xia; Feng, Guo-Shuang

2016-10-20

Robust statistical design, sound statistical analysis, and standardized presentation are important to enhance the quality and transparency of biomedical research. This systematic review was conducted to summarize the statistical reporting requirements introduced by biomedical research journals with an impact factor of 10 or above so that researchers are able to give statistical issues serious consideration not only at the stage of data analysis but also at the stage of methodological design. Detailed statistical instructions for authors were downloaded from the homepage of each of the included journals or obtained from the editors directly via email. Then, we described the types and numbers of statistical guidelines introduced by different press groups. Items of statistical reporting guidelines as well as particular requirements were summarized by frequency and grouped into design, method of analysis, and presentation. Finally, updated statistical guidelines and particular requirements for improvement were summed up. In total, 21 of 23 press groups introduced at least one statistical guideline. More than half of the press groups update their statistical instructions for authors gradually as new statistical reporting guidelines are issued. In addition, 16 press groups, covering 44 journals, address particular statistical requirements. Most of the particular requirements focused on the performance of statistical analysis and transparency in statistical reporting, including "address issues relevant to research design, including participant flow diagram, eligibility criteria, and sample size estimation," and "statistical methods and the reasons." Statistical requirements for authors are becoming increasingly refined. Statistical requirements for authors remind researchers that they should give sufficient consideration not only to statistical methods during the research design but also to standardized statistical reporting, which would be beneficial in providing stronger evidence and making critical appraisal of evidence more accessible.
ASCS online fault detection and isolation based on an improved MPCA

NASA Astrophysics Data System (ADS)

Peng, Jianxin; Liu, Haiou; Hu, Yuhui; Xi, Junqiang; Chen, Huiyan

2014-09-01

Multi-way principal component analysis (MPCA) has received considerable attention and been widely used in process monitoring. A traditional MPCA algorithm unfolds multiple batches of historical data into a two-dimensional matrix and cuts the matrix along the time axis to form subspaces. However, low efficiency of subspaces and difficult fault isolation are common disadvantages of the principal component model. This paper presents a new subspace construction method based on a kernel density estimation function that can effectively reduce the storage required for the subspace information. The MPCA model and the knowledge base are built on the new subspace. Then, fault detection and isolation with the squared prediction error (SPE) statistic and the Hotelling T² statistic are realized in process monitoring. When a fault occurs, fault isolation based on the SPE statistic is achieved by residual contribution analysis of different variables. For subspace fault isolation based on the T² statistic, the relationship between the statistic indicator and the state variables is constructed, and constraint conditions are presented to check the validity of fault isolation. Then, to improve the robustness of fault isolation to unexpected disturbances, a statistical method is adopted to set the relation between a single subspace and multiple subspaces and so increase the correct rate of fault isolation. Finally, fault detection and isolation based on the improved MPCA is used to monitor the automatic shift control system (ASCS) to prove the correctness and effectiveness of the algorithm. The research proposes a new subspace construction method to reduce the required storage capacity and to prove the robustness of the principal component model, and sets the relationship between the state variables and fault detection indicators for fault isolation.
[Statistical validity of the Mexican Food Security Scale and the Latin American and Caribbean Food Security Scale].

PubMed

Villagómez-Ornelas, Paloma; Hernández-López, Pedro; Carrasco-Enríquez, Brenda; Barrios-Sánchez, Karina; Pérez-Escamilla, Rafael; Melgar-Quiñónez, Hugo

2014-01-01

This article validates the statistical consistency of two food security scales: the Mexican Food Security Scale (EMSA) and the Latin American and Caribbean Food Security Scale (ELCSA). Validity tests were conducted in order to verify that both scales were consistent instruments, conformed by independent, properly calibrated and adequately sorted items, arranged in a continuum of severity. The following tests were developed: sorting of items; Cronbach's alpha analysis; parallelism of prevalence curves; Rasch models; sensitivity analysis through mean differences' hypothesis test. The tests showed that both scales meet the required attributes and are robust statistical instruments for food security measurement. This is relevant given that the lack of access to food indicator, included in multidimensional poverty measurement in Mexico, is calculated with EMSA.
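Of the tests listed in this abstract, Cronbach's alpha is the easiest to illustrate. The following is a minimal sketch of the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the 0/1 toy responses are hypothetical; this is not the authors' code or data.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: list of per-item score lists, one score per respondent,
    all lists of equal length (hypothetical input layout).
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(items[0])
    if any(len(scores) != n for scores in items):
        raise ValueError("every item needs one score per respondent")
    # Total scale score of each respondent (sum across items).
    totals = [sum(responses) for responses in zip(*items)]
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    item_var_sum = sum(variance(scores) for scores in items)
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Toy yes/no (1/0) answers of five respondents to three items (made up).
toy_items = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 1],
]
alpha = cronbach_alpha(toy_items)
```

Values closer to 1 indicate that the items vary together, which is the internal-consistency property the abstract refers to.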
Method for factor analysis of GC/MS data

DOEpatents

Van Benthem, Mark H; Kotula, Paul G; Keenan, Michael R

2012-09-11

The method of the present invention provides a fast, robust, and automated multivariate statistical analysis of gas chromatography/mass spectroscopy (GC/MS) data sets. The method can involve systematic elimination of undesired, saturated peak masses to yield data that follow a linear, additive model. The cleaned data can then be subjected to a combination of PCA and orthogonal factor rotation followed by refinement with MCR-ALS to yield highly interpretable results.
Efficient Computation of Info-Gap Robustness for Finite Element Models

DOE Office of Scientific and Technical Information (OSTI.GOV)

Stull, Christopher J.; Hemez, Francois M.; Williams, Brian J.

2012-07-05

A recent research effort at LANL proposed info-gap decision theory as a framework by which to measure the predictive maturity of numerical models. Info-gap theory explores the trade-offs between accuracy, that is, the extent to which predictions reproduce the physical measurements, and robustness, that is, the extent to which predictions are insensitive to modeling assumptions. Both accuracy and robustness are necessary to demonstrate predictive maturity. However, conducting an info-gap analysis can present a formidable challenge, from the standpoint of the required computational resources. This is because a robustness function requires the resolution of multiple optimization problems. This report offers an alternative, adjoint methodology to assess the info-gap robustness of Ax = b-like numerical models solved for a solution x. Two situations that can arise in structural analysis and design are briefly described and contextualized within the info-gap decision theory framework. The treatments of the info-gap problems, using the adjoint methodology are outlined in detail, and the latter problem is solved for four separate finite element models. As compared to statistical sampling, the proposed methodology offers highly accurate approximations of info-gap robustness functions for the finite element models considered in the report, at a small fraction of the computational cost. It is noted that this report considers only linear systems; a natural follow-on study would extend the methodologies described herein to include nonlinear systems.
Managing Student Loan Default Risk: Evidence from a Privately Guaranteed Portfolio.

ERIC Educational Resources Information Center

Monteverde, Kirk

2000-01-01

Application of the statistical techniques of survival analysis and credit scoring to private education loans extended to law students found a pronounced seasoning effect for such loans and the robust predictive power of credit bureau scoring of borrowers. Other predictors of default included school-of-attendance, school's geographic location, and…
Robust kernel representation with statistical local features for face recognition.

PubMed

Yang, Meng; Zhang, Lei; Shiu, Simon Chi-Keung; Zhang, David

2013-06-01

Factors such as misalignment, pose variation, and occlusion make robust face recognition a difficult problem. It is known that statistical features such as local binary pattern are effective for local feature extraction, whereas the recently proposed sparse or collaborative representation-based classification has shown interesting results in robust face recognition. In this paper, we propose a novel robust kernel representation model with statistical local features (SLF) for robust face recognition. Initially, multipartition max pooling is used to enhance the invariance of SLF to image registration error. Then, a kernel-based representation model is proposed to fully exploit the discrimination information embedded in the SLF, and robust regression is adopted to effectively handle the occlusion in face images. Extensive experiments are conducted on benchmark face databases, including extended Yale B, AR (A. Martinez and R. Benavente), multiple pose, illumination, and expression (multi-PIE), facial recognition technology (FERET), face recognition grand challenge (FRGC), and labeled faces in the wild (LFW), which have different variations of lighting, expression, pose, and occlusions, demonstrating the promising performance of the proposed method.
Statistical Analysis of Bus Networks in India

PubMed Central

2016-01-01

In this paper, we model the bus networks of six major Indian cities as graphs in L-space, and evaluate their various statistical properties. While airline and railway networks have been extensively studied, a comprehensive study on the structure and growth of bus networks is lacking. In India, where bus transport plays an important role in day-to-day commutation, it is of significant interest to analyze its topological structure and answer basic questions on its evolution, growth, robustness and resiliency. Although the common feature of small-world property is observed, our analysis reveals a wide spectrum of network topologies arising due to significant variation in the degree-distribution patterns in the networks. We also observe that these networks, although robust and resilient to random attacks, are particularly degree-sensitive. Unlike real-world networks such as the Internet, WWW and airline networks, which are virtual, bus networks are physically constrained. Our findings therefore throw light on the evolution of such geographically constrained networks, which will help us in designing more efficient bus networks in the future. PMID:27992590
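The contrast between random and degree-targeted attacks can be made concrete with a toy graph. The sketch below measures the size of the largest connected component after removing nodes; the hub-and-spoke network and the function names are hypothetical and are not taken from the paper, which analyzes real city networks.

```python
from collections import deque


def largest_component_size(adjacency, removed=()):
    """Size of the largest connected component once `removed` nodes are deleted.

    adjacency: dict mapping each node to a list of its neighbours
    (undirected graph, hypothetical input layout).
    """
    removed = set(removed)
    seen = set()
    best = 0
    for start in adjacency:
        if start in removed or start in seen:
            continue
        # Breadth-first search over the surviving nodes.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in removed and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        best = max(best, size)
    return best


def highest_degree_node(adjacency):
    """Node with the most neighbours: the target of a degree-based attack."""
    return max(adjacency, key=lambda node: len(adjacency[node]))


# Toy hub-and-spoke network of stops (made up for illustration).
toy_network = {
    "hub": ["a", "b", "c", "d"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
    "d": ["hub"],
}
after_leaf_removal = largest_component_size(toy_network, removed=["a"])
after_hub_removal = largest_component_size(
    toy_network, removed=[highest_degree_node(toy_network)]
)
```

In this toy case, losing a peripheral stop barely changes connectivity, while losing the highest-degree stop fragments the network, which is the sense in which a network can be robust to random failures yet degree-sensitive.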
A Simple and Robust Method for Partially Matched Samples Using the P-Values Pooling Approach

PubMed Central

Kuan, Pei Fen; Huang, Bo

2013-01-01

This paper focuses on statistical analyses in scenarios where some samples from the matched pairs design are missing, resulting in partially matched samples. Motivated by the idea of meta-analysis, we recast the partially matched samples as coming from two experimental designs, and propose a simple yet robust approach based on the weighted Z-test to integrate the p-values computed from these two designs. We show that the proposed approach achieves better operating characteristics in simulations and a case study, compared to existing methods for partially matched samples. PMID:23417968
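A weighted Z-test for pooling p-values can be sketched in a few lines. The version below is the generic weighted Stouffer combination of one-sided p-values; the function name and the example weights are assumptions for illustration, and the abstract does not state which weights the authors use.

```python
from math import sqrt
from statistics import NormalDist


def weighted_z_pvalue(p_values, weights):
    """Combine one-sided p-values with the weighted Z (Stouffer) method.

    Each p-value must lie strictly between 0 and 1. The weights are an
    assumption of this sketch (for example, square roots of sample sizes).
    """
    if not p_values or len(p_values) != len(weights):
        raise ValueError("need one weight per p-value")
    std_normal = NormalDist()
    # Convert each p-value to a z-score; a small p gives a large positive z.
    z_scores = [std_normal.inv_cdf(1.0 - p) for p in p_values]
    combined_z = sum(w * z for w, z in zip(weights, z_scores)) / sqrt(
        sum(w * w for w in weights)
    )
    return 1.0 - std_normal.cdf(combined_z)


# Hypothetical example: one p-value from the matched pairs and one from
# the unmatched samples, weighted 3 and 4.
pooled_p = weighted_z_pvalue([0.05, 0.05], [3.0, 4.0])
```

Two moderately small p-values pointing in the same direction yield a smaller pooled p-value, which is the gain the partially matched design can recover.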
RepExplore: addressing technical replicate variance in proteomics and metabolomics data analysis.

PubMed

Glaab, Enrico; Schneider, Reinhard

2015-07-01

High-throughput omics datasets often contain technical replicates included to account for technical sources of noise in the measurement process. Although summarizing these replicate measurements by using robust averages may help to reduce the influence of noise on downstream data analysis, the information on the variance across the replicate measurements is lost in the averaging process and therefore typically disregarded in subsequent statistical analyses. We introduce RepExplore, a web-service dedicated to exploiting the information captured in the technical replicate variance to provide more reliable and informative differential expression and abundance statistics for omics datasets. The software builds on previously published statistical methods, which have been applied successfully to biomedical omics data but are difficult to use without prior experience in programming or scripting. RepExplore facilitates the analysis by providing fully automated data processing and interactive ranking tables, whisker plot, heat map and principal component analysis visualizations to interpret omics data and derived statistics. Freely available at http://www.repexplore.tk enrico.glaab@uni.lu Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
Parenchymal texture analysis in digital mammography: robust texture feature identification and equivalence across devices.

PubMed

Keller, Brad M; Oustimov, Andrew; Wang, Yan; Chen, Jinbo; Acciavatti, Raymond J; Zheng, Yuanjie; Ray, Shonket; Gee, James C; Maidment, Andrew D A; Kontos, Despina

2015-04-01

An analytical framework is presented for evaluating the equivalence of parenchymal texture features across different full-field digital mammography (FFDM) systems using a physical breast phantom. Phantom images (FOR PROCESSING) are acquired from three FFDM systems using their automated exposure control setting. A panel of texture features, including gray-level histogram, co-occurrence, run length, and structural descriptors, are extracted. To identify features that are robust across imaging systems, a series of equivalence tests are performed on the feature distributions, in which the extent of their intersystem variation is compared to their intrasystem variation via the Hodges-Lehmann test statistic. Overall, histogram and structural features tend to be most robust across all systems, and certain features, such as edge enhancement, tend to be more robust to intergenerational differences between detectors of a single vendor than to intervendor differences. Texture features extracted from larger regions of interest (i.e., [Formula: see text]) and with a larger offset length (i.e., [Formula: see text]), when applicable, also appear to be more robust across imaging systems. This framework and observations from our experiments may benefit applications utilizing mammographic texture analysis on images acquired in multivendor settings, such as in multicenter studies of computer-aided detection and breast cancer risk assessment.
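The Hodges-Lehmann statistic mentioned in this abstract is built on the median of pairwise differences between two samples. The sketch below shows only that two-sample shift estimator. It does not reproduce the paper's equivalence-testing procedure, and the function name and feature values are hypothetical.

```python
from statistics import median


def hodges_lehmann_shift(x, y):
    """Two-sample Hodges-Lehmann estimate: median of all differences x_i - y_j."""
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    return median(xi - yj for xi in x for yj in y)


# Hypothetical texture-feature values from two imaging systems.
system_a = [1.0, 2.0, 3.0]
system_b = [0.0, 1.0, 2.0]
shift = hodges_lehmann_shift(system_a, system_b)
```

Because it uses a median rather than a mean, the estimate changes little if one feature value is grossly off, which suits comparisons across devices.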

Automatic identification of bacterial types using statistical imaging methods

NASA Astrophysics Data System (ADS)

Trattner, Sigal; Greenspan, Hayit; Tepper, Gapi; Abboud, Shimon

2003-05-01

The objective of the current study is to develop an automatic tool to identify bacterial types using computer-vision and statistical modeling techniques. Bacteriophage (phage)-typing methods are used to identify and extract representative profiles of bacterial types, such as Staphylococcus aureus. Current systems rely on the subjective reading of plaque profiles by a human expert. This process is time-consuming and prone to errors, especially as technology is enabling the increase in the number of phages used for typing. The statistical methodology presented in this work provides for an automated, objective and robust analysis of visual data, along with the ability to cope with increasing data volumes.
A robust clustering algorithm for identifying problematic samples in genome-wide association studies.

PubMed

Bellenguez, Céline; Strange, Amy; Freeman, Colin; Donnelly, Peter; Spencer, Chris C A

2012-01-01

High-throughput genotyping arrays provide an efficient way to survey single nucleotide polymorphisms (SNPs) across the genome in large numbers of individuals. Downstream analysis of the data, for example in genome-wide association studies (GWAS), often involves statistical models of genotype frequencies across individuals. The complexities of the sample collection process and the potential for errors in the experimental assay can lead to biases and artefacts in an individual's inferred genotypes. Rather than attempting to model these complications, it has become a standard practice to remove individuals whose genome-wide data differ from the sample at large. Here we describe a simple, but robust, statistical algorithm to identify samples with atypical summaries of genome-wide variation. Its use as a semi-automated quality control tool is demonstrated using several summary statistics, selected to identify different potential problems, and it is applied to two different genotyping platforms and sample collections. The algorithm is written in R and is freely available at www.well.ox.ac.uk/chris-spencer chris.spencer@well.ox.ac.uk Supplementary data are available at Bioinformatics online.
Statistical Control Paradigm for Aerospace Structures Under Impulsive Disturbances

DTIC Science & Technology

2006-08-03

attitude control system with an innovative and robust statistical controller design shows significant promise for use in attitude hold mode operation...indicate that the existing attitude control system with an innovative and robust statistical controller design shows significant promise for use in...and three thrusters are for use in controlling the attitude of the satellite. Then the angular momentum of the satellite with three thrusters and a
Robust statistical methods for impulse noise suppressing of spread spectrum induced polarization data, with application to a mine site, Gansu province, China

NASA Astrophysics Data System (ADS)

Liu, Weiqiang; Chen, Rujun; Cai, Hongzhu; Luo, Weibin

2016-12-01

In this paper, we investigated the robust processing of noisy spread spectrum induced polarization (SSIP) data. SSIP is a new frequency domain induced polarization method that transmits pseudo-random m-sequence as source current where m-sequence is a broadband signal. The potential information at multiple frequencies can be obtained through measurement. Removing the noise is a crucial problem for SSIP data processing. Considering that if the ordinary mean stack and digital filter are not capable of reducing the impulse noise effectively in SSIP data processing, the impact of impulse noise will remain in the complex resistivity spectrum that will affect the interpretation of profile anomalies. We implemented a robust statistical method to SSIP data processing. The robust least-squares regression is used to fit and remove the linear trend from the original data before stacking. The robust M estimate is used to stack the data of all periods. The robust smooth filter is used to suppress the residual noise for data after stacking. For robust statistical scheme, the most appropriate influence function and iterative algorithm are chosen by testing the simulated data to suppress the outliers' influence. We tested the benefits of the robust SSIP data processing using examples of SSIP data recorded in a test site beside a mine in Gansu province, China.
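The stacking step can be illustrated with a robust M-estimate of location computed by iteratively reweighted averaging. This sketch assumes Huber weights with the conventional tuning constant 1.345 and a MAD-based scale. The abstract says the influence function was chosen by testing on simulated data and does not name it, so these choices, the function name and the sample values are all assumptions.

```python
from statistics import median


def huber_stack(values, c=1.345, tol=1e-9, max_iter=100):
    """Robust stack (location M-estimate) of repeated measurements.

    Uses Huber weights and a fixed MAD-based scale; both are assumptions
    of this sketch, not details taken from the paper.
    """
    if not values:
        raise ValueError("need at least one value")
    mu = median(values)
    # Median absolute deviation, rescaled to match the standard deviation
    # under a normal distribution.
    scale = 1.4826 * median(abs(v - mu) for v in values)
    if scale == 0:
        return mu
    threshold = c * scale
    for _ in range(max_iter):
        # Samples far from the current estimate get down-weighted.
        weights = [
            1.0 if abs(v - mu) <= threshold else threshold / abs(v - mu)
            for v in values
        ]
        new_mu = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


# Hypothetical samples of one time point across six periods, with one
# impulse-noise spike.
periods = [1.0, 1.1, 0.9, 1.05, 0.95, 50.0]
robust_value = huber_stack(periods)
plain_mean = sum(periods) / len(periods)
```

The ordinary mean is dragged toward the spike, while the M-estimate stays close to the bulk of the samples, which is why the abstract replaces the mean stack with a robust one.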
Likert scales, levels of measurement and the "laws" of statistics.

PubMed

Norman, Geoff

2010-12-01

Reviewers of research reports frequently criticize the choice of statistical methods. While some of these criticisms are well-founded, frequently the use of various parametric methods such as analysis of variance, regression, and correlation is faulted because: (a) the sample size is too small, (b) the data may not be normally distributed, or (c) the data are from Likert scales, which are ordinal, so parametric statistics cannot be used. In this paper, I dissect these arguments, and show that many studies, dating back to the 1930s, consistently show that parametric statistics are robust with respect to violations of these assumptions. Hence, challenges like those above are unfounded, and parametric methods can be utilized without concern for "getting the wrong answer".
Developing appropriate methods for cost-effectiveness analysis of cluster randomized trials.

PubMed

Gomes, Manuel; Ng, Edmond S-W; Grieve, Richard; Nixon, Richard; Carpenter, James; Thompson, Simon G

2012-01-01

Cost-effectiveness analyses (CEAs) may use data from cluster randomized trials (CRTs), where the unit of randomization is the cluster, not the individual. However, most studies use analytical methods that ignore clustering. This article compares alternative statistical methods for accommodating clustering in CEAs of CRTs. Our simulation study compared the performance of statistical methods for CEAs of CRTs with 2 treatment arms. The study considered a method that ignored clustering--seemingly unrelated regression (SUR) without a robust standard error (SE)--and 4 methods that recognized clustering--SUR and generalized estimating equations (GEEs), both with robust SE, a "2-stage" nonparametric bootstrap (TSB) with shrinkage correction, and a multilevel model (MLM). The base case assumed CRTs with moderate numbers of balanced clusters (20 per arm) and normally distributed costs. Other scenarios included CRTs with few clusters, imbalanced cluster sizes, and skewed costs. Performance was reported as bias, root mean squared error (rMSE), and confidence interval (CI) coverage for estimating incremental net benefits (INBs). We also compared the methods in a case study. Each method reported low levels of bias. Without the robust SE, SUR gave poor CI coverage (base case: 0.89 v. nominal level: 0.95). The MLM and TSB performed well in each scenario (CI coverage, 0.92-0.95). With few clusters, the GEE and SUR (with robust SE) had coverage below 0.90. In the case study, the mean INBs were similar across all methods, but ignoring clustering underestimated statistical uncertainty and the value of further research. MLMs and the TSB are appropriate analytical methods for CEAs of CRTs with the characteristics described. SUR and GEE are not recommended for studies with few clusters.
Developing Appropriate Methods for Cost-Effectiveness Analysis of Cluster Randomized Trials

PubMed Central

Gomes, Manuel; Ng, Edmond S.-W.; Nixon, Richard; Carpenter, James; Thompson, Simon G.

2012-01-01

Aim. Cost-effectiveness analyses (CEAs) may use data from cluster randomized trials (CRTs), where the unit of randomization is the cluster, not the individual. However, most studies use analytical methods that ignore clustering. This article compares alternative statistical methods for accommodating clustering in CEAs of CRTs. Methods. Our simulation study compared the performance of statistical methods for CEAs of CRTs with 2 treatment arms. The study considered a method that ignored clustering—seemingly unrelated regression (SUR) without a robust standard error (SE)—and 4 methods that recognized clustering—SUR and generalized estimating equations (GEEs), both with robust SE, a “2-stage” nonparametric bootstrap (TSB) with shrinkage correction, and a multilevel model (MLM). The base case assumed CRTs with moderate numbers of balanced clusters (20 per arm) and normally distributed costs. Other scenarios included CRTs with few clusters, imbalanced cluster sizes, and skewed costs. Performance was reported as bias, root mean squared error (rMSE), and confidence interval (CI) coverage for estimating incremental net benefits (INBs). We also compared the methods in a case study. Results. Each method reported low levels of bias. Without the robust SE, SUR gave poor CI coverage (base case: 0.89 v. nominal level: 0.95). The MLM and TSB performed well in each scenario (CI coverage, 0.92–0.95). With few clusters, the GEE and SUR (with robust SE) had coverage below 0.90. In the case study, the mean INBs were similar across all methods, but ignoring clustering underestimated statistical uncertainty and the value of further research. Conclusions. MLMs and the TSB are appropriate analytical methods for CEAs of CRTs with the characteristics described. SUR and GEE are not recommended for studies with few clusters. PMID:22016450
Improving tablet coating robustness by selecting critical process parameters from retrospective data.

PubMed

Galí, A; García-Montoya, E; Ascaso, M; Pérez-Lozano, P; Ticó, J R; Miñarro, M; Suñé-Negre, J M

2016-09-01

Although tablet coating processes are widely used in the pharmaceutical industry, they often lack adequate robustness. Up-scaling can be challenging as minor changes in parameters can lead to varying quality results. To select critical process parameters (CPP) using retrospective data of a commercial product and to establish a design of experiments (DoE) that would improve the robustness of the coating process. A retrospective analysis of data from 36 commercial batches. Batches were selected based on the quality results generated during batch release, some of which revealed quality deviations concerning the appearance of the coated tablets. The product is already marketed and belongs to the portfolio of a multinational pharmaceutical company. The Statgraphics 5.1 software was used for data processing to determine critical process parameters in order to propose new working ranges. This study confirms that it is possible to determine the critical process parameters and create design spaces based on retrospective data of commercial batches. This type of analysis is thus converted into a tool to optimize the robustness of existing processes. Our results show that a design space can be established with minimum investment in experiments, since current commercial batch data are processed statistically.
A Robust New Method for Analyzing Community Change and an Example using 83 years of Avian Response to Forest Succession

EPA Science Inventory

This manuscript describes a novel statistical analysis technique developed by the authors for use in combining survey data carried out under different field protocols. We apply the technique to 83 years of survey data on avian songbird populations in northern lower Michigan to de...
Smoking and Cancers: Case-Robust Analysis of a Classic Data Set

ERIC Educational Resources Information Center

Bentler, Peter M.; Satorra, Albert; Yuan, Ke-Hai

2009-01-01

A typical structural equation model is intended to reproduce the means, variances, and correlations or covariances among a set of variables based on parameter estimates of a highly restricted model. It is not widely appreciated that the sample statistics being modeled can be quite sensitive to outliers and influential observations, leading to bias…
The Performance of Methods to Test Upper-Level Mediation in the Presence of Nonnormal Data

ERIC Educational Resources Information Center

Pituch, Keenan A.; Stapleton, Laura M.

2008-01-01

A Monte Carlo study compared the statistical performance of standard and robust multilevel mediation analysis methods to test indirect effects for a cluster randomized experimental design under various departures from normality. The performance of these methods was examined for an upper-level mediation process, where the indirect effect is a fixed…
The comparison of robust partial least squares regression with robust principal component regression on a real

NASA Astrophysics Data System (ADS)

Polat, Esra; Gunay, Suleyman

2013-10-01

One of the problems encountered in Multiple Linear Regression (MLR) is multicollinearity, which causes overestimation of the regression parameters and inflates their variance. Hence, when multicollinearity is present, biased estimation procedures such as classical Principal Component Regression (CPCR) and Partial Least Squares Regression (PLSR) are performed. The SIMPLS algorithm is the leading PLSR algorithm because of its speed and efficiency and because its results are easier to interpret. However, both CPCR and SIMPLS yield very unreliable results when the data set contains outlying observations. Therefore, Hubert and Vanden Branden (2003) presented a robust PCR (RPCR) method and a robust PLSR (RPLSR) method called RSIMPLS. In RPCR, first, a robust Principal Component Analysis (PCA) method for high-dimensional data is applied to the independent variables; then, the dependent variables are regressed on the scores using a robust regression method. RSIMPLS is constructed from a robust covariance matrix for high-dimensional data and robust linear regression. The purpose of this study is to show the usage of the RPCR and RSIMPLS methods on an econometric data set and thereby to compare the two methods on an inflation model of Turkey. The methods are compared in terms of predictive ability and goodness of fit by using a robust Root Mean Squared Error of Cross-validation (R-RMSECV), a robust R² value and the Robust Component Selection (RCS) statistic.
A Review of Some Aspects of Robust Inference for Time Series.

DTIC Science & Technology

1984-09-01

REVIEW OF SOME ASPECTS OF ROBUST INFERENCE FOR TIME SERIES by R. Douglas Martin, TECHNICAL REPORT No. 63, September 1984, Department of Statistics, University of ...clear. One cannot hope to have a good method for dealing with outliers in time series by using only an instantaneous nonlinear transformation of the data...AI.49 716 A REVIEW OF SOME ASPECTS OF ROBUST INFERENCE FOR TIME SERIES(U) WASHINGTON UNIV SEATTLE DEPT OF STATISTICS R D MARTIN SEP 84 TR-53
The other half of the story: effect size analysis in quantitative research.

PubMed

Maher, Jessica Middlemis; Markey, Jonathan C; Ebert-May, Diane

2013-01-01

Statistical significance testing is the cornerstone of quantitative research, but studies that fail to report measures of effect size are potentially missing a robust part of the analysis. We provide a rationale for why effect size measures should be included in quantitative discipline-based education research. Examples from both biological and educational research demonstrate the utility of effect size for evaluating practical significance. We also provide details about some effect size indices that are paired with common statistical significance tests used in educational research and offer general suggestions for interpreting effect size measures. Finally, we discuss some inherent limitations of effect size measures and provide further recommendations about reporting confidence intervals.
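As a concrete instance of an effect size index paired with a common test, the sketch below computes Cohen's d, the standardized mean difference that usually accompanies a two-sample t-test. The abstract does not list which indices the authors cover, so the choice of Cohen's d, the function name and the scores are assumptions for illustration.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero")
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical exam scores for two course sections.
section_a = [2, 4, 6]
section_b = [1, 3, 5]
d = cohens_d(section_a, section_b)
```

Unlike a p-value, d does not shrink or grow with sample size alone, so it speaks to practical rather than statistical significance.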
Missing Value Monitoring Enhances the Robustness in Proteomics Quantitation.

PubMed

Matafora, Vittoria; Corno, Andrea; Ciliberto, Andrea; Bachi, Angela

2017-04-07

In global proteomic analysis, it is estimated that proteins span from millions to less than 100 copies per cell. The challenge of protein quantitation by classic shotgun proteomic techniques relies on the presence of missing values in peptides belonging to low-abundance proteins that lowers intraruns reproducibility affecting postdata statistical analysis. Here, we present a new analytical workflow MvM (missing value monitoring) able to recover quantitation of missing values generated by shotgun analysis. In particular, we used confident data-dependent acquisition (DDA) quantitation only for proteins measured in all the runs, while we filled the missing values with data-independent acquisition analysis using the library previously generated in DDA. We analyzed cell cycle regulated proteins, as they are low abundance proteins with highly dynamic expression levels. Indeed, we found that cell cycle related proteins are the major components of the missing values-rich proteome. Using the MvM workflow, we doubled the number of robustly quantified cell cycle related proteins, and we reduced the number of missing values achieving robust quantitation for proteins over ∼50 molecules per cell. MvM allows lower quantification variance among replicates for low abundance proteins with respect to DDA analysis, which demonstrates the potential of this novel workflow to measure low abundance, dynamically regulated proteins.
Super-delta: a new differential gene expression analysis procedure with robust data normalization.

PubMed

Liu, Yuhang; Zhang, Jinfeng; Qiu, Xing

2017-12-21

Normalization is an important data preparation step in gene expression analyses, designed to remove various systematic noise. Sample variance is greatly reduced after normalization, hence the power of subsequent statistical analyses is likely to increase. On the other hand, variance reduction is made possible by borrowing information across all genes, including differentially expressed genes (DEGs) and outliers, which will inevitably introduce some bias. This bias typically inflates type I error and can reduce statistical power in certain situations. In this study we propose a new differential expression analysis pipeline, dubbed super-delta, that consists of a multivariate extension of the global normalization and a modified t-test. A robust procedure is designed to minimize the bias introduced by DEGs in the normalization step. The modified t-test is derived based on asymptotic theory for hypothesis testing that suitably pairs with the proposed robust normalization. We first compared super-delta with four commonly used normalization methods: global, median-IQR, quantile, and cyclic loess normalization in simulation studies. Super-delta was shown to have better statistical power with tighter control of type I error rate than its competitors. In many cases, the performance of super-delta is close to that of an oracle test in which datasets without technical noise were used. We then applied all methods to a collection of gene expression datasets on breast cancer patients who received neoadjuvant chemotherapy. While there is a substantial overlap of the DEGs identified by all of them, super-delta was able to identify comparatively more DEGs than its competitors. Downstream gene set enrichment analysis confirmed that all these methods selected largely consistent pathways. Detailed investigations on the relatively small differences showed that pathways identified by super-delta have better connections to breast cancer than other methods.
As a new pipeline, super-delta provides new insights to the area of differential gene expression analysis. Solid theoretical foundation supports its asymptotic unbiasedness and technical noise-free properties. Implementation on real and simulated datasets demonstrates its decent performance compared with state-of-the-art procedures. It also has the potential of expansion to be incorporated with other data types and/or more general between-group comparison problems.
Graphical augmentations to the funnel plot assess the impact of additional evidence on a meta-analysis.

PubMed

Langan, Dean; Higgins, Julian P T; Gregory, Walter; Sutton, Alexander J

2012-05-01

We aim to illustrate the potential impact of a new study on a meta-analysis, which gives an indication of the robustness of the meta-analysis. A number of augmentations are proposed to one of the most widely used of graphical displays, the funnel plot. Namely, 1) statistical significance contours, which define regions of the funnel plot in which a new study would have to be located to change the statistical significance of the meta-analysis; and 2) heterogeneity contours, which show how a new study would affect the extent of heterogeneity in a given meta-analysis. Several other features are also described, and the use of multiple features simultaneously is considered. The statistical significance contours suggest that one additional study, no matter how large, may have a very limited impact on the statistical significance of a meta-analysis. The heterogeneity contours illustrate that one outlying study can increase the level of heterogeneity dramatically. The additional features of the funnel plot have applications including 1) informing sample size calculations for the design of future studies eligible for inclusion in the meta-analysis; and 2) informing the updating prioritization of a portfolio of meta-analyses such as those prepared by the Cochrane Collaboration. Copyright © 2012 Elsevier Inc. All rights reserved.
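The idea behind a significance contour can be illustrated with a small calculation: recompute the pooled estimate with one hypothetical extra study and check whether the conclusion changes. The sketch below assumes a fixed-effect, inverse-variance model and a two-sided 5% threshold; the abstract does not state which model the authors use, and the effect sizes and standard errors are invented for illustration.

```python
import math

def pooled_fixed_effect(effects, std_errors):
    """Inverse-variance fixed-effect pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    estimate = sum(w * y for w, y in zip(weights, effects)) / total
    return estimate, math.sqrt(1.0 / total)

def is_significant(effects, std_errors, z_crit=1.96):
    """True when the pooled z statistic exceeds the critical value."""
    estimate, se = pooled_fixed_effect(effects, std_errors)
    return abs(estimate / se) > z_crit

# Hypothetical existing studies (effect sizes and standard errors).
effects = [0.30, 0.10, 0.25]
std_errors = [0.20, 0.25, 0.30]

# Hypothetical new study: a precise study with a larger effect.
before = is_significant(effects, std_errors)
after = is_significant(effects + [0.40], std_errors + [0.10])
print(before, after)
```

Repeating the last step over a grid of (effect, standard error) pairs and marking where the result flips would trace out a contour of the kind the abstract describes.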
Goodness-of-fit tests and model diagnostics for negative binomial regression of RNA sequencing data.

PubMed

Mi, Gu; Di, Yanming; Schafer, Daniel W

2015-01-01

This work is about assessing model adequacy for negative binomial (NB) regression, particularly (1) assessing the adequacy of the NB assumption, and (2) assessing the appropriateness of models for NB dispersion parameters. Tools for the first are appropriate for NB regression generally; those for the second are primarily intended for RNA sequencing (RNA-Seq) data analysis. The typically small number of biological samples and large number of genes in RNA-Seq analysis motivate us to address the trade-offs between robustness and statistical power using NB regression models. One widely-used power-saving strategy, for example, is to assume some commonalities of NB dispersion parameters across genes via simple models relating them to mean expression rates, and many such models have been proposed. As RNA-Seq analysis is becoming ever more popular, it is appropriate to make more thorough investigations into power and robustness of the resulting methods, and into practical tools for model assessment. In this article, we propose simulation-based statistical tests and diagnostic graphics to address model adequacy. We provide simulated and real data examples to illustrate that our proposed methods are effective for detecting the misspecification of the NB mean-variance relationship as well as judging the adequacy of fit of several NB dispersion models.
metaCCA: summary statistics-based multivariate meta-analysis of genome-wide association studies using canonical correlation analysis.

PubMed

Cichonska, Anna; Rousu, Juho; Marttinen, Pekka; Kangas, Antti J; Soininen, Pasi; Lehtimäki, Terho; Raitakari, Olli T; Järvelin, Marjo-Riitta; Salomaa, Veikko; Ala-Korpela, Mika; Ripatti, Samuli; Pirinen, Matti

2016-07-01

A dominant approach to genetic association studies is to perform univariate tests between genotype-phenotype pairs. However, analyzing related traits together increases statistical power, and certain complex associations become detectable only when several variants are tested jointly. Currently, modest sample sizes of individual cohorts, and restricted availability of individual-level genotype-phenotype data across the cohorts limit conducting multivariate tests. We introduce metaCCA, a computational framework for summary statistics-based analysis of a single or multiple studies that allows multivariate representation of both genotype and phenotype. It extends the statistical technique of canonical correlation analysis to the setting where original individual-level records are not available, and employs a covariance shrinkage algorithm to achieve robustness. Multivariate meta-analysis of two Finnish studies of nuclear magnetic resonance metabolomics by metaCCA, using standard univariate output from the program SNPTEST, shows an excellent agreement with the pooled individual-level analysis of original data. Motivated by strong multivariate signals in the lipid genes tested, we envision that multivariate association testing using metaCCA has a great potential to provide novel insights from already published summary statistics from high-throughput phenotyping technologies. Code is available at https://github.com/aalto-ics-kepaco. Contact: anna.cichonska@helsinki.fi or matti.pirinen@helsinki.fi. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
metaCCA: summary statistics-based multivariate meta-analysis of genome-wide association studies using canonical correlation analysis

PubMed Central

Cichonska, Anna; Rousu, Juho; Marttinen, Pekka; Kangas, Antti J.; Soininen, Pasi; Lehtimäki, Terho; Raitakari, Olli T.; Järvelin, Marjo-Riitta; Salomaa, Veikko; Ala-Korpela, Mika; Ripatti, Samuli; Pirinen, Matti

2016-01-01

Motivation: A dominant approach to genetic association studies is to perform univariate tests between genotype-phenotype pairs. However, analyzing related traits together increases statistical power, and certain complex associations become detectable only when several variants are tested jointly. Currently, modest sample sizes of individual cohorts, and restricted availability of individual-level genotype-phenotype data across the cohorts limit conducting multivariate tests. Results: We introduce metaCCA, a computational framework for summary statistics-based analysis of a single or multiple studies that allows multivariate representation of both genotype and phenotype. It extends the statistical technique of canonical correlation analysis to the setting where original individual-level records are not available, and employs a covariance shrinkage algorithm to achieve robustness. Multivariate meta-analysis of two Finnish studies of nuclear magnetic resonance metabolomics by metaCCA, using standard univariate output from the program SNPTEST, shows an excellent agreement with the pooled individual-level analysis of original data. Motivated by strong multivariate signals in the lipid genes tested, we envision that multivariate association testing using metaCCA has a great potential to provide novel insights from already published summary statistics from high-throughput phenotyping technologies. Availability and implementation: Code is available at https://github.com/aalto-ics-kepaco Contacts: anna.cichonska@helsinki.fi or matti.pirinen@helsinki.fi Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153689

Restoration of MRI Data for Field Nonuniformities using High Order Neighborhood Statistics

PubMed Central

Hadjidemetriou, Stathis; Studholme, Colin; Mueller, Susanne; Weiner, Michael; Schuff, Norbert

2007-01-01

MRI at high magnetic fields (> 3.0 T) is complicated by strong inhomogeneous radio-frequency fields, sometimes termed the “bias field”. These lead to nonuniformity of image intensity, greatly complicating further analysis such as registration and segmentation. Existing methods for bias field correction are effective for 1.5 T or 3.0 T MRI, but are not completely satisfactory for higher field data. This paper develops an effective bias field correction for high field MRI based on the assumption that the nonuniformity is smoothly varying in space. Also, nonuniformity is quantified and unmixed using high order neighborhood statistics of intensity cooccurrences. They are computed within spherical windows of limited size over the entire image. The restoration is iterative and makes use of a novel stable stopping criterion that depends on the scaled entropy of the cooccurrence statistics (the Shannon entropy of the cooccurrence statistics normalized to the effective dynamic range of the image), which is a non-monotonic function of the iterations. The algorithm restores whole head data, is robust to intense nonuniformities present in high field acquisitions, and is robust to variations in anatomy. This algorithm significantly improves bias field correction in comparison to N3 on phantom 1.5 T head data and high field 4 T human head data. PMID:18193095
ROBUST: an interactive FORTRAN-77 package for exploratory data analysis using parametric, ROBUST and nonparametric location and scale estimates, data transformations, normality tests, and outlier assessment

NASA Astrophysics Data System (ADS)

Rock, N. M. S.

ROBUST calculates 53 statistics, plus significance levels for 6 hypothesis tests, on each of up to 52 variables. These together allow the following properties of the data distribution for each variable to be examined in detail: (1) Location. Three means (arithmetic, geometric, harmonic) are calculated, together with the midrange and 19 high-performance robust L-, M-, and W-estimates of location (combined, adaptive, trimmed estimates, etc.). (2) Scale. The standard deviation is calculated along with the H-spread/2 (≈ semi-interquartile range), the mean and median absolute deviations from both mean and median, and a biweight scale estimator. The 23 location and 6 scale estimators programmed cover all possible degrees of robustness. (3) Normality. Distributions are tested against the null hypothesis that they are normal, using the 3rd (√b1) and 4th (b2) moments, Geary's ratio (mean deviation/standard deviation), Filliben's probability plot correlation coefficient, and a more robust test based on the biweight scale estimator. These statistics collectively are sensitive to most usual departures from normality. (4) Presence of outliers. The maximum and minimum values are assessed individually or jointly using Grubbs' maximum Studentized residuals, Harvey's and Dixon's criteria, and the Studentized range. For a single input variable, outliers can be either winsorized or eliminated and all estimates recalculated iteratively as desired. The following data transformations also can be applied: linear, log10, generalized Box-Cox power (including log, reciprocal, and square root), exponentiation, and standardization. For more than one variable, all results are tabulated in a single run of ROBUST. Further options are incorporated to assess ratios (of two variables) as well as discrete variables, and to handle missing data. Cumulative S-plots (for assessing normality graphically) also can be generated.
The mutual consistency or inconsistency of all these measures helps to detect errors in data as well as to assess data-distributions themselves.
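A few of the estimator families named in this abstract are easy to illustrate with Python's standard library. The sketch below is not the ROBUST package (which is FORTRAN-77) and covers only a trimmed mean, the median absolute deviation from the median, and Geary's ratio. The sample data are invented, and using the population standard deviation in Geary's ratio is an assumption of this sketch.

```python
import statistics

def trimmed_mean(values, proportion=0.1):
    """Mean after dropping the given proportion of points from each tail."""
    data = sorted(values)
    k = int(len(data) * proportion)
    core = data[k:len(data) - k] if k else data
    return statistics.fmean(core)

def median_abs_deviation(values):
    """Median of absolute deviations from the median (unscaled)."""
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)

def geary_ratio(values):
    """Mean absolute deviation from the mean divided by the standard deviation."""
    mean = statistics.fmean(values)
    mean_dev = statistics.fmean(abs(v - mean) for v in values)
    return mean_dev / statistics.pstdev(values)

# Hypothetical measurements with one gross outlier (55.0).
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 55.0]

print("mean         ", statistics.fmean(data))
print("median       ", statistics.median(data))
print("trimmed mean ", trimmed_mean(data))
print("MAD          ", median_abs_deviation(data))
print("Geary's ratio", geary_ratio(data))
```

Here the arithmetic mean is pulled far from the bulk of the data while the median and trimmed mean are not, which is the kind of mutual inconsistency the abstract suggests using as an error check.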
DOE Office of Scientific and Technical Information (OSTI.GOV)

Bennett, Janine Camille; Thompson, David; Pebay, Philippe Pierre

Statistical analysis is typically used to reduce the dimensionality of and infer meaning from data. A key challenge of any statistical analysis package aimed at large-scale, distributed data is to address the orthogonal issues of parallel scalability and numerical stability. Many statistical techniques, e.g., descriptive statistics or principal component analysis, are based on moments and co-moments and, using robust online update formulas, can be computed in an embarrassingly parallel manner, amenable to a map-reduce style implementation. In this paper we focus on contingency tables, through which numerous derived statistics such as joint and marginal probability, point-wise mutual information, information entropy, and $\chi^2$ independence statistics can be directly obtained. However, contingency tables can become large as data size increases, requiring a correspondingly large amount of communication between processors. This potential increase in communication prevents optimal parallel speedup and is the main difference with moment-based statistics (which we discussed in [1]) where the amount of inter-processor communication is independent of data size. Here we present the design trade-offs which we made to implement the computation of contingency tables in parallel. We also study the parallel speedup and scalability properties of our open source implementation. In particular, we observe optimal speed-up and scalability when the contingency statistics are used in their appropriate context, namely, when the data input is not quasi-diffuse.
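The map-reduce structure mentioned here can be sketched in a few lines: each worker builds a local contingency table, and the reduce step sums the counts. This is a single-process illustration using `collections.Counter`, not the authors' implementation; the chunked data and function names are hypothetical.

```python
import math
from collections import Counter
from functools import reduce

def local_table(pairs):
    """Map step: count (x, y) co-occurrences within one chunk of data."""
    return Counter(pairs)

def merge_tables(left, right):
    """Reduce step: Counter addition sums the counts key by key."""
    return left + right

def pointwise_mutual_information(table, x, y):
    """log(p(x, y) / (p(x) p(y))); requires a positive count for (x, y)."""
    n = sum(table.values())
    p_xy = table[(x, y)] / n
    p_x = sum(c for (xi, _), c in table.items() if xi == x) / n
    p_y = sum(c for (_, yi), c in table.items() if yi == y) / n
    return math.log(p_xy / (p_x * p_y))

# Hypothetical data split across two "processors".
chunks = [
    [("a", 0), ("b", 1)],
    [("a", 0), ("a", 1)],
]

merged = reduce(merge_tables, map(local_table, chunks))
print(merged)
print(pointwise_mutual_information(merged, "a", 0))
```

The merged table has one entry per distinct (x, y) pair, so the amount of data exchanged in the reduce step grows with the number of distinct pairs. That is the communication cost the abstract contrasts with moment-based statistics.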
A Baseline for the Multivariate Comparison of Resting-State Networks

PubMed Central

Allen, Elena A.; Erhardt, Erik B.; Damaraju, Eswar; Gruner, William; Segall, Judith M.; Silva, Rogers F.; Havlicek, Martin; Rachakonda, Srinivas; Fries, Jill; Kalyanam, Ravi; Michael, Andrew M.; Caprihan, Arvind; Turner, Jessica A.; Eichele, Tom; Adelsheim, Steven; Bryan, Angela D.; Bustillo, Juan; Clark, Vincent P.; Feldstein Ewing, Sarah W.; Filbey, Francesca; Ford, Corey C.; Hutchison, Kent; Jung, Rex E.; Kiehl, Kent A.; Kodituwakku, Piyadasa; Komesu, Yuko M.; Mayer, Andrew R.; Pearlson, Godfrey D.; Phillips, John P.; Sadek, Joseph R.; Stevens, Michael; Teuscher, Ursina; Thoma, Robert J.; Calhoun, Vince D.

2011-01-01

As the size of functional and structural MRI datasets expands, it becomes increasingly important to establish a baseline from which diagnostic relevance may be determined, a processing strategy that efficiently prepares data for analysis, and a statistical approach that identifies important effects in a manner that is both robust and reproducible. In this paper, we introduce a multivariate analytic approach that optimizes sensitivity and reduces unnecessary testing. We demonstrate the utility of this mega-analytic approach by identifying the effects of age and gender on the resting-state networks (RSNs) of 603 healthy adolescents and adults (mean age: 23.4 years, range: 12–71 years). Data were collected on the same scanner, preprocessed using an automated analysis pipeline based in SPM, and studied using group independent component analysis. RSNs were identified and evaluated in terms of three primary outcome measures: time course spectral power, spatial map intensity, and functional network connectivity. Results revealed robust effects of age on all three outcome measures, largely indicating decreases in network coherence and connectivity with increasing age. Gender effects were of smaller magnitude but suggested stronger intra-network connectivity in females and more inter-network connectivity in males, particularly with regard to sensorimotor networks. These findings, along with the analysis approach and statistical framework described here, provide a useful baseline for future investigations of brain networks in health and disease. PMID:21442040
Enhancing efficiency and quality of statistical estimation of immunogenicity assay cut points through standardization and automation.

PubMed

Su, Cheng; Zhou, Lei; Hu, Zheng; Weng, Winnie; Subramani, Jayanthi; Tadkod, Vineet; Hamilton, Kortney; Bautista, Ami; Wu, Yu; Chirmule, Narendra; Zhong, Zhandong Don

2015-10-01

Biotherapeutics can elicit immune responses, which can alter the exposure, safety, and efficacy of the therapeutics. A well-designed and robust bioanalytical method is critical for the detection and characterization of relevant anti-drug antibody (ADA) and the success of an immunogenicity study. As a fundamental criterion in immunogenicity testing, assay cut points need to be statistically established with a risk-based approach to reduce subjectivity. This manuscript describes the development of a validated, web-based, multi-tier customized assay statistical tool (CAST) for assessing cut points of ADA assays. The tool provides an intuitive web interface that allows users to import experimental data generated from a standardized experimental design, select the assay factors, run the standardized analysis algorithms, and generate tables, figures, and listings (TFL). It allows bioanalytical scientists to perform complex statistical analysis at a click of the button to produce reliable assay parameters in support of immunogenicity studies. Copyright © 2015 Elsevier B.V. All rights reserved.
Methods for Assessment of Memory Reactivation.

PubMed

Liu, Shizhao; Grosmark, Andres D; Chen, Zhe

2018-04-13

It has been suggested that reactivation of previously acquired experiences or stored information in declarative memories in the hippocampus and neocortex contributes to memory consolidation and learning. Understanding memory consolidation depends crucially on the development of robust statistical methods for assessing memory reactivation. To date, several statistical methods have been established for assessing memory reactivation based on bursts of ensemble neural spike activity during offline states. Using population-decoding methods, we propose a new statistical metric, the weighted distance correlation, to assess hippocampal memory reactivation (i.e., spatial memory replay) during quiet wakefulness and slow-wave sleep. The new metric can be combined with an unsupervised population decoding analysis, which is invariant to latent state labeling and allows us to detect statistical dependency beyond linearity in memory traces. We validate the new metric using two rat hippocampal recordings in spatial navigation tasks. Our proposed analysis framework may have a broader impact on assessing memory reactivations in other brain regions under different behavioral tasks.
Dark Energy Survey Year 1 Results: Multi-Probe Methodology and Simulated Likelihood Analyses

DOE Office of Scientific and Technical Information (OSTI.GOV)

Krause, E.; et al.

We present the methodology for and detail the implementation of the Dark Energy Survey (DES) 3x2pt DES Year 1 (Y1) analysis, which combines configuration-space two-point statistics from three different cosmological probes: cosmic shear, galaxy-galaxy lensing, and galaxy clustering, using data from the first year of DES observations. We have developed two independent modeling pipelines and describe the code validation process. We derive expressions for analytical real-space multi-probe covariances, and describe their validation with numerical simulations. We stress-test the inference pipelines in simulated likelihood analyses that vary 6-7 cosmology parameters plus 20 nuisance parameters and precisely resemble the analysis to be presented in the DES 3x2pt analysis paper, using a variety of simulated input data vectors with varying assumptions. We find that any disagreement between pipelines leads to changes in assigned likelihood $\Delta \chi^2 \le 0.045$ with respect to the statistical error of the DES Y1 data vector. We also find that angular binning and survey mask do not impact our analytic covariance at a significant level. We determine lower bounds on scales used for analysis of galaxy clustering (8 Mpc $h^{-1}$) and galaxy-galaxy lensing (12 Mpc $h^{-1}$) such that the impact of modeling uncertainties in the non-linear regime is well below statistical errors, and show that our analysis choices are robust against a variety of systematics. These tests demonstrate that we have a robust analysis pipeline that yields unbiased cosmological parameter inferences for the flagship 3x2pt DES Y1 analysis. We emphasize that the level of independent code development and subsequent code comparison as demonstrated in this paper is necessary to produce credible constraints from increasingly complex multi-probe analyses of current data.
Modeling and replicating statistical topology and evidence for CMB nonhomogeneity

PubMed Central

Agami, Sarit

2017-01-01

Under the banner of “big data,” the detection and classification of structure in extremely large, high-dimensional, data sets are two of the central statistical challenges of our times. Among the most intriguing new approaches to this challenge is “TDA,” or “topological data analysis,” one of the primary aims of which is providing nonmetric, but topologically informative, preanalyses of data which make later, more quantitative, analyses feasible. While TDA rests on strong mathematical foundations from topology, in applications, it has faced challenges due to difficulties in handling issues of statistical reliability and robustness, often leading to an inability to make scientific claims with verifiable levels of statistical confidence. We propose a methodology for the parametric representation, estimation, and replication of persistence diagrams, the main diagnostic tool of TDA. The power of the methodology lies in the fact that even if only one persistence diagram is available for analysis—the typical case for big data applications—the replications permit conventional statistical hypothesis testing. The methodology is conceptually simple and computationally practical, and provides a broadly effective statistical framework for persistence diagram TDA analysis. We demonstrate the basic ideas on a toy example, and the power of the parametric approach to TDA modeling in an analysis of cosmic microwave background (CMB) nonhomogeneity. PMID:29078301
The Stability of Rankings Derived from Composite Indicators: Analysis of the "IL Sole 24 Ore" Quality of Life Report

ERIC Educational Resources Information Center

Lun, G.; Holzer, D.; Tappeiner, G.; Tappeiner, U.

2006-01-01

The calculation of composite indicators and the derivation of respective rankings is a common method used to benchmark countries or regions. However, although the statistical robustness of these rankings is often criticised, they often still spark off heated political debate. Here, we assess the sensitivity of the province ranking published by the…
The Adequacy of Different Robust Statistical Tests in Comparing Two Independent Groups

ERIC Educational Resources Information Center

Pero-Cebollero, Maribel; Guardia-Olmos, Joan

2013-01-01

In the current study, we evaluated various robust statistical methods for comparing two independent groups. Two scenarios for simulation were generated: one of equality and another of population mean differences. In each of the scenarios, 33 experimental conditions were used as a function of sample size, standard deviation and asymmetry. For each…
Comparison of beam position calculation methods for application in digital acquisition systems

NASA Astrophysics Data System (ADS)

Reiter, A.; Singh, R.

2018-05-01

Different approaches to the data analysis of beam position monitors in hadron accelerators are compared adopting the perspective of an analog-to-digital converter in a sampling acquisition system. Special emphasis is given to position uncertainty and robustness against bias and interference that may be encountered in an accelerator environment. In a time-domain analysis of data in the presence of statistical noise, the position calculation based on the difference-over-sum method with algorithms like signal integral or power can be interpreted as a least-squares analysis of a corresponding fit function. This link to the least-squares method is exploited in the evaluation of analysis properties and in the calculation of position uncertainty. In an analytical model and experimental evaluations the positions derived from a straight line fit or equivalently the standard deviation are found to be the most robust and to offer the least variance. The measured position uncertainty is consistent with the model prediction in our experiment, and the results of tune measurements improve significantly.
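The contrast between an integral-based difference-over-sum estimate and a straight-line fit can be illustrated with synthetic samples. The sketch below is an illustration under stated assumptions, not the authors' analysis: it assumes two electrode signals `a` and `b`, omits the monitor's scale factor, and uses an invented pulse shape with constant baseline offsets. The line fit estimates the slope of the difference samples against the sum samples, with a free intercept.

```python
def integral_position(a, b):
    """Difference-over-sum using the integrals (sums) of the sampled signals."""
    delta = sum(x - y for x, y in zip(a, b))
    total = sum(x + y for x, y in zip(a, b))
    return delta / total

def line_fit_position(a, b):
    """Least-squares slope of (a - b) versus (a + b), with a free intercept."""
    deltas = [x - y for x, y in zip(a, b)]
    sums = [x + y for x, y in zip(a, b)]
    n = len(sums)
    mean_s = sum(sums) / n
    mean_d = sum(deltas) / n
    sxx = sum((s - mean_s) ** 2 for s in sums)
    sxy = sum((s - mean_s) * (d - mean_d) for s, d in zip(sums, deltas))
    return sxy / sxx

# Hypothetical bunch signal and a true normalized position of 0.2.
pulse = [0.0, 1.0, 3.0, 5.0, 3.0, 1.0, 0.0]
true_ratio = 0.2

# Each channel carries a different constant baseline offset (0.3 and 0.1).
a = [(1 + true_ratio) * s + 0.3 for s in pulse]
b = [(1 - true_ratio) * s + 0.1 for s in pulse]

print("integral:", integral_position(a, b))
print("line fit:", line_fit_position(a, b))
```

In this constructed case the baseline offsets bias the integral estimate, while the fitted slope absorbs them into the intercept and recovers the true ratio. Real monitor data would also contain noise and interference, which this sketch does not model.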
Detecting the contagion effect in mass killings; a constructive example of the statistical advantages of unbinned likelihood methods.

PubMed

Towers, Sherry; Mubayi, Anuj; Castillo-Chavez, Carlos

2018-01-01

When attempting to statistically distinguish between a null and an alternative hypothesis, many researchers in the life and social sciences turn to binned statistical analysis methods, or methods that are simply based on the moments of a distribution (such as the mean, and variance). These methods have the advantage of simplicity of implementation, and simplicity of explanation. However, when null and alternative hypotheses manifest themselves in subtle differences in patterns in the data, binned analysis methods may be insensitive to these differences, and researchers may erroneously fail to reject the null hypothesis when in fact more sensitive statistical analysis methods might produce a different result when the null hypothesis is actually false. Here, with a focus on two recent conflicting studies of contagion in mass killings as instructive examples, we discuss how the use of unbinned likelihood methods makes optimal use of the information in the data; a fact that has been long known in statistical theory, but perhaps is not as widely appreciated amongst general researchers in the life and social sciences. In 2015, Towers et al published a paper that quantified the long-suspected contagion effect in mass killings. However, in 2017, Lankford & Tomek subsequently published a paper, based upon the same data, that claimed to contradict the results of the earlier study. The former used unbinned likelihood methods, and the latter used binned methods, and comparison of distribution moments. Using these analyses, we also discuss how visualization of the data can aid in determination of the most appropriate statistical analysis methods to distinguish between a null and alternate hypothesis. 
We also discuss the importance of assessment of the robustness of analysis results to methodological assumptions made (for example, arbitrary choices of number of bins and bin widths when using binned methods); an issue that is widely overlooked in the literature, but is critical to analysis reproducibility and robustness. When an analysis cannot distinguish between a null and alternate hypothesis, care must be taken to ensure that the analysis methodology itself maximizes the use of information in the data that can distinguish between the two hypotheses. The use of binned methods by Lankford & Tomek (2017), that examined how many mass killings fell within a 14 day window from a previous mass killing, substantially reduced the sensitivity of their analysis to contagion effects. The unbinned likelihood methods used by Towers et al (2015) did not suffer from this problem. While a binned analysis might be favorable for simplicity and clarity of presentation, unbinned likelihood methods are preferable when effects might be somewhat subtle.
Detecting the contagion effect in mass killings; a constructive example of the statistical advantages of unbinned likelihood methods

PubMed Central

Mubayi, Anuj; Castillo-Chavez, Carlos

2018-01-01

Background When attempting to statistically distinguish between a null and an alternative hypothesis, many researchers in the life and social sciences turn to binned statistical analysis methods, or methods that are simply based on the moments of a distribution (such as the mean, and variance). These methods have the advantage of simplicity of implementation, and simplicity of explanation. However, when null and alternative hypotheses manifest themselves in subtle differences in patterns in the data, binned analysis methods may be insensitive to these differences, and researchers may erroneously fail to reject the null hypothesis when in fact more sensitive statistical analysis methods might produce a different result when the null hypothesis is actually false. Here, with a focus on two recent conflicting studies of contagion in mass killings as instructive examples, we discuss how the use of unbinned likelihood methods makes optimal use of the information in the data; a fact that has been long known in statistical theory, but perhaps is not as widely appreciated amongst general researchers in the life and social sciences. Methods In 2015, Towers et al published a paper that quantified the long-suspected contagion effect in mass killings. However, in 2017, Lankford & Tomek subsequently published a paper, based upon the same data, that claimed to contradict the results of the earlier study. The former used unbinned likelihood methods, and the latter used binned methods, and comparison of distribution moments. Using these analyses, we also discuss how visualization of the data can aid in determination of the most appropriate statistical analysis methods to distinguish between a null and alternate hypothesis. 
We also discuss the importance of assessment of the robustness of analysis results to methodological assumptions made (for example, arbitrary choices of number of bins and bin widths when using binned methods); an issue that is widely overlooked in the literature, but is critical to analysis reproducibility and robustness. Conclusions When an analysis cannot distinguish between a null and alternate hypothesis, care must be taken to ensure that the analysis methodology itself maximizes the use of information in the data that can distinguish between the two hypotheses. The use of binned methods by Lankford & Tomek (2017), that examined how many mass killings fell within a 14 day window from a previous mass killing, substantially reduced the sensitivity of their analysis to contagion effects. The unbinned likelihood methods used by Towers et al (2015) did not suffer from this problem. While a binned analysis might be favorable for simplicity and clarity of presentation, unbinned likelihood methods are preferable when effects might be somewhat subtle. PMID:29742115
Performance of Between-Study Heterogeneity Measures in the Cochrane Library.

PubMed

Ma, Xiaoyue; Lin, Lifeng; Qu, Zhiyong; Zhu, Motao; Chu, Haitao

2018-05-29

The growth in comparative effectiveness research and evidence-based medicine has increased attention to systematic reviews and meta-analyses. Meta-analysis synthesizes and contrasts evidence from multiple independent studies to improve statistical efficiency and reduce bias. Assessing heterogeneity is critical for performing a meta-analysis and interpreting results. As a widely used heterogeneity measure, the $I^2$ statistic quantifies the proportion of total variation across studies that is due to real differences in effect size. The presence of outlying studies can seriously exaggerate the $I^2$ statistic. Two alternative heterogeneity measures, the $I_r^2$ and $I_m^2$, have been recently proposed to reduce the impact of outlying studies. To evaluate these measures' performance empirically, we applied them to 20,599 meta-analyses in the Cochrane Library. We found that the $I_r^2$ and $I_m^2$ have strong agreement with the $I^2$, while they are more robust than the $I^2$ when outlying studies appear.
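The sensitivity of the conventional heterogeneity statistic to a single outlying study can be seen in a short calculation. The sketch below computes Cochran's Q and the standard statistic I² = max(0, (Q − df)/Q) with inverse-variance weights. It does not implement the two alternative measures discussed in the abstract, and the study data are invented.

```python
def i_squared(effects, std_errors):
    """Conventional I^2 from Cochran's Q with inverse-variance weights."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q)

# Hypothetical, fairly homogeneous studies with equal precision.
effects = [0.20, 0.22, 0.18, 0.21]
std_errors = [0.05, 0.05, 0.05, 0.05]

without_outlier = i_squared(effects, std_errors)
with_outlier = i_squared(effects + [0.90], std_errors + [0.05])
print(without_outlier, with_outlier)
```

With these numbers the four consistent studies give a statistic of zero, and adding one study with a very different effect pushes it above 0.9.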
Better P-curves: Making P-curve analysis more robust to errors, fraud, and ambitious P-hacking, a Reply to Ulrich and Miller (2015).

PubMed

Simonsohn, Uri; Simmons, Joseph P; Nelson, Leif D

2015-12-01

When studies examine true effects, they generate right-skewed p-curves, distributions of statistically significant results with more low (.01s) than high (.04s) p values. What else can cause a right-skewed p-curve? First, we consider the possibility that researchers report only the smallest significant p value (as conjectured by Ulrich & Miller, 2015), concluding that it is a very uncommon problem. We then consider more common problems, including (a) p-curvers selecting the wrong p values, (b) fake data, (c) honest errors, and (d) ambitiously p-hacked (beyond p < .05) results. We evaluate the impact of these common problems on the validity of p-curve analysis, and provide practical solutions that substantially increase its robustness. (c) 2015 APA, all rights reserved.
Distributed video data fusion and mining

NASA Astrophysics Data System (ADS)

Chang, Edward Y.; Wang, Yuan-Fang; Rodoplu, Volkan

2004-09-01

This paper presents an event sensing paradigm for intelligent event-analysis in a wireless, ad hoc, multi-camera, video surveillance system. In particular, we present statistical methods that we have developed to support three aspects of event sensing: 1) energy-efficient, resource-conserving, and robust sensor data fusion and analysis, 2) intelligent event modeling and recognition, and 3) rapid deployment, dynamic configuration, and continuous operation of the camera networks. We outline our preliminary results, and discuss future directions that research might take.
Tolerancing aspheres based on manufacturing statistics

NASA Astrophysics Data System (ADS)

Wickenhagen, S.; Möhl, A.; Fuchs, U.

2017-11-01

A standard way of tolerancing optical elements or systems is to perform a Monte Carlo based analysis within a common optical design software package. Although different weightings and distributions are assumed, all of these methods rely on statistics, which usually means several hundreds or thousands of systems are needed for reliable results. Thus, employing these methods for small batch sizes is unreliable, especially when aspheric surfaces are involved. The huge database of asphericon was used to investigate the correlation between the given tolerance values and measured data sets. The resulting probability distributions of these measured data were analyzed, aiming for a robust optical tolerancing process.
Linguistic Analysis of the Human Heartbeat Using Frequency and Rank Order Statistics

NASA Astrophysics Data System (ADS)

Yang, Albert C.-C.; Hseu, Shu-Shya; Yien, Huey-Wen; Goldberger, Ary L.; Peng, C.-K.

2003-03-01

Complex physiologic signals may carry unique dynamical signatures that are related to their underlying mechanisms. We present a method based on rank order statistics of symbolic sequences to investigate the profile of different types of physiologic dynamics. We apply this method to heart rate fluctuations, the output of a central physiologic control system. The method robustly discriminates patterns generated from healthy and pathologic states, as well as aging. Furthermore, we observe increased randomness in the heartbeat time series with physiologic aging and pathologic states and also uncover nonrandom patterns in the ventricular response to atrial fibrillation.
Gender discrimination and prediction on the basis of facial metric information.

PubMed

Fellous, J M

1997-07-01

Horizontal and vertical facial measurements are statistically independent. Discriminant analysis shows that five such normalized distances explain over 95% of the gender differences of "training" samples and predict the gender of 90% of novel test faces exhibiting various facial expressions. The robustness of the method and its results are assessed. It is argued that these distances (termed fiducial) are compatible with those found experimentally by psychophysical and neurophysiological studies. In consequence, partial explanations for the effects observed in these experiments can be found in the intrinsic statistical nature of the facial stimuli used.
Analysis tools for discovering strong parity violation at hadron colliders

NASA Astrophysics Data System (ADS)

Backović, Mihailo; Ralston, John P.

2011-07-01

Several arguments suggest parity violation may be observable in high energy strong interactions. We introduce new analysis tools to describe the azimuthal dependence of multiparticle distributions, or “azimuthal flow.” Analysis uses the representations of the orthogonal group O(2) and dihedral groups D_N necessary to define parity completely in two dimensions. Classification finds that collective angles used in event-by-event statistics represent inequivalent tensor observables that cannot generally be represented by a single “reaction plane.” Many new parity-violating observables exist that have never been measured, while many parity-conserving observables formerly lumped together are now distinguished. We use the concept of “event-shape sorting” to suggest separating right- and left-handed events, and we discuss the effects of transverse and longitudinal spin. The analysis tools are statistically robust, and can be applied equally to low or high multiplicity events at the Tevatron, RHIC or RHIC Spin, and the LHC.

Robustly detecting differential expression in RNA sequencing data using observation weights

PubMed Central

Zhou, Xiaobei; Lindsay, Helen; Robinson, Mark D.

2014-01-01

A popular approach for comparing gene expression levels between (replicated) conditions of RNA sequencing data relies on counting reads that map to features of interest. Within such count-based methods, many flexible and advanced statistical approaches now exist and offer the ability to adjust for covariates (e.g. batch effects). Often, these methods include some sort of ‘sharing of information’ across features to improve inferences in small samples. It is important to achieve an appropriate tradeoff between statistical power and protection against outliers. Here, we study the robustness of existing approaches for count-based differential expression analysis and propose a new strategy based on observation weights that can be used within existing frameworks. The results suggest that outliers can have a global effect on differential analyses. We demonstrate the effectiveness of our new approach with real data and simulated data that reflects properties of real datasets (e.g. dispersion-mean trend) and develop an extensible framework for comprehensive testing of current and future methods. In addition, we explore the origin of such outliers, in some cases highlighting additional biological or technical factors within the experiment. Further details can be downloaded from the project website: http://imlspenticton.uzh.ch/robinson_lab/edgeR_robust/. PMID:24753412
Suitability and setup of next-generation sequencing-based method for taxonomic characterization of aquatic microbial biofilm.

PubMed

Bakal, Tomas; Janata, Jiri; Sabova, Lenka; Grabic, Roman; Zlabek, Vladimir; Najmanova, Lucie

2018-06-16

A robust and widely applicable method for sampling of aquatic microbial biofilm and further sample processing is presented. The method is based on next-generation sequencing of V4-V5 variable regions of the 16S rRNA gene and further statistical analysis of sequencing data, which could be useful not only to investigate the taxonomic composition of biofilm bacterial consortia but also to assess aquatic ecosystem health. Five artificial materials commonly used for biofilm growth (glass, stainless steel, aluminum, polypropylene, polyethylene) were tested to determine the one giving the most robust and reproducible results. The effect of the sampler material used on total microbial composition was not statistically significant; however, the non-plastic materials (glass, metal) gave more stable outputs without irregularities among sample parallels. The bias of the method is assessed with respect to the employment of a non-quantitative step (PCR amplification) to obtain quantitative results (relative abundance of identified taxa). This aspect is often overlooked in ecological and medical studies. We document that sequencing a mixture of three merged primary PCR reactions for each sample and then evaluating median values from three technical replicates for each sample makes it possible to overcome this bias and gives robust and repeatable results that distinguish well among sampling localities and seasons.
Structural damage detection based on stochastic subspace identification and statistical pattern recognition: I. Theory

NASA Astrophysics Data System (ADS)

Ren, W. X.; Lin, Y. Q.; Fang, S. E.

2011-11-01

One of the key issues in vibration-based structural health monitoring is to extract the damage-sensitive but environment-insensitive features from sampled dynamic response measurements and to carry out the statistical analysis of these features for structural damage detection. A new damage feature is proposed in this paper by using the system matrices of the forward innovation model based on the covariance-driven stochastic subspace identification of a vibrating system. To overcome the variations of the system matrices, a non-singularity transposition matrix is introduced so that the system matrices are normalized to their standard forms. For reducing the effects of modeling errors, noise and environmental variations on measured structural responses, a statistical pattern recognition paradigm is incorporated into the proposed method. The Mahalanobis and Euclidean distance decision functions of the damage feature vector are adopted by defining a statistics-based damage index. The proposed structural damage detection method is verified against one numerical signal and two numerical beams. It is demonstrated that the proposed statistics-based damage index is sensitive to damage and shows some robustness to the noise and false estimation of the system ranks. The method is capable of locating damage of the beam structures under different types of excitations. The robustness of the proposed damage detection method to the variations in environmental temperature is further validated in a companion paper by a reinforced concrete beam tested in the laboratory and a full-scale arch bridge tested in the field.
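The abstract above refers to Mahalanobis and Euclidean distance decision functions of a feature vector. The following Python sketch is not the paper's damage index; it only illustrates, under the simplifying assumption of hypothetical two-dimensional feature vectors, how the two distances can disagree about how unusual a new observation is relative to a baseline set. All names and numbers are made up for illustration.

```python
import math

def mean_and_cov_2d(samples):
    """Sample mean and (sxx, sxy, syy) covariance entries of 2-D points."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    sxx = sum((x - mx) ** 2 for x, _ in samples) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in samples) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in samples) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def mahalanobis_2d(point, mean, cov):
    """sqrt(d^T S^-1 d), with the 2x2 inverse written out explicitly."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)

def euclidean_2d(point, mean):
    return math.hypot(point[0] - mean[0], point[1] - mean[1])

# Hypothetical baseline ("healthy") features that vary together.
baseline = [(1.0, 2.0), (1.2, 2.3), (0.8, 1.7), (1.1, 2.2), (0.9, 1.8)]
mean, cov = mean_and_cov_2d(baseline)

along = (1.3, 2.45)    # follows the baseline correlation
across = (1.3, 1.55)   # same Euclidean distance, but against the correlation
d_along = mahalanobis_2d(along, mean, cov)
d_across = mahalanobis_2d(across, mean, cov)
print(euclidean_2d(along, mean), euclidean_2d(across, mean), d_along, d_across)
```

Both test points are equally far from the baseline mean in the Euclidean sense, but the Mahalanobis distance flags the point that breaks the baseline correlation as far more unusual.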
SU-F-T-187: Quantifying Normal Tissue Sparing with 4D Robust Optimization of Intensity Modulated Proton Therapy

DOE Office of Scientific and Technical Information (OSTI.GOV)

Newpower, M; Ge, S; Mohan, R

Purpose: To report an approach to quantify the normal tissue sparing for 4D robustly-optimized versus PTV-optimized IMPT plans. Methods: We generated two sets of 90 DVHs from a patient’s 10-phase 4D CT set; one by conventional PTV-based optimization done in the Eclipse treatment planning system, and the other by an in-house robust optimization algorithm. The 90 DVHs were created for the following scenarios in each of the ten phases of the 4DCT: ± 5mm shift along x, y, z; ± 3.5% range uncertainty and a nominal scenario. A Matlab function written by Gay and Niemierko was modified to calculate EUD for each DVH for the following structures: esophagus, heart, ipsilateral lung and spinal cord. An F-test determined whether or not the variances of each structure’s DVHs were statistically different. Then a t-test determined if the average EUDs for each optimization algorithm were statistically significantly different. Results: T-test results showed each structure had a statistically significant difference in average EUD when comparing robust optimization versus PTV-based optimization. Under robust optimization all structures except the spinal cord received lower EUDs than PTV-based optimization. Using robust optimization the average EUDs decreased 1.45% for the esophagus, 1.54% for the heart and 5.45% for the ipsilateral lung. The average EUD to the spinal cord increased 24.86% but was still well below tolerance. Conclusion: This work has helped quantify a qualitative relationship noted earlier in our work: that robust optimization leads to plans with greater normal tissue sparing compared to PTV-based optimization. Except in the case of the spinal cord all structures received a lower EUD under robust optimization and these results are statistically significant. While the average EUD to the spinal cord increased to 25.06 Gy under robust optimization it is still well under the TD50 value of 66.5 Gy from Emami et al. Supported in part by the NCI U19 CA021239.
An Analytical Investigation of the Robustness and Power of ANCOVA with the Presence of Heterogeneous Regression Slopes.

ERIC Educational Resources Information Center

Hollingsworth, Holly H.

This study shows that the test statistic for Analysis of Covariance (ANCOVA) has a noncentral F-distribution with noncentrality parameter equal to zero if and only if the regression planes are homogeneous and/or the vector of overall covariate means is the null vector. The effect of heterogeneous regression slope parameters is to either increase…
A robust statistical estimation (RoSE) algorithm jointly recovers the 3D location and intensity of single molecules accurately and precisely

NASA Astrophysics Data System (ADS)

Mazidi, Hesam; Nehorai, Arye; Lew, Matthew D.

2018-02-01

In single-molecule (SM) super-resolution microscopy, the complexity of a biological structure, high molecular density, and a low signal-to-background ratio (SBR) may lead to imaging artifacts without a robust localization algorithm. Moreover, engineered point spread functions (PSFs) for 3D imaging pose difficulties due to their intricate features. We develop a Robust Statistical Estimation algorithm, called RoSE, that enables joint estimation of the 3D location and photon counts of SMs accurately and precisely using various PSFs under conditions of high molecular density and low SBR.
OPTIMA: sensitive and accurate whole-genome alignment of error-prone genomic maps by combinatorial indexing and technology-agnostic statistical analysis.

PubMed

Verzotto, Davide; M Teo, Audrey S; Hillmer, Axel M; Nagarajan, Niranjan

2016-01-01

Resolution of complex repeat structures and rearrangements in the assembly and analysis of large eukaryotic genomes is often aided by a combination of high-throughput sequencing and genome-mapping technologies (for example, optical restriction mapping). In particular, mapping technologies can generate sparse maps of large DNA fragments (150 kilo base pairs (kbp) to 2 Mbp) and thus provide a unique source of information for disambiguating complex rearrangements in cancer genomes. Despite their utility, combining high-throughput sequencing and mapping technologies has been challenging because of the lack of efficient and sensitive map-alignment algorithms for robustly aligning error-prone maps to sequences. We introduce a novel seed-and-extend glocal (short for global-local) alignment method, OPTIMA (and a sliding-window extension for overlap alignment, OPTIMA-Overlap), which is the first to create indexes for continuous-valued mapping data while accounting for mapping errors. We also present a novel statistical model, agnostic with respect to technology-dependent error rates, for conservatively evaluating the significance of alignments without relying on expensive permutation-based tests. We show that OPTIMA and OPTIMA-Overlap outperform other state-of-the-art approaches (1.6-2 times more sensitive) and are more efficient (170-200 %) and precise in their alignments (nearly 99 % precision). These advantages are independent of the quality of the data, suggesting that our indexing approach and statistical evaluation are robust, provide improved sensitivity and guarantee high precision.
Horsetail matching: a flexible approach to optimization under uncertainty

NASA Astrophysics Data System (ADS)

Cook, L. W.; Jarrett, J. P.

2018-04-01

It is important to design engineering systems to be robust with respect to uncertainties in the design process. Often, this is done by considering statistical moments, but over-reliance on statistical moments when formulating a robust optimization can produce designs that are stochastically dominated by other feasible designs. This article instead proposes a formulation for optimization under uncertainty that minimizes the difference between a design's cumulative distribution function and a target. A standard target is proposed that produces stochastically non-dominated designs, but the formulation also offers enough flexibility to recover existing approaches for robust optimization. A numerical implementation is developed that employs kernels to give a differentiable objective function. The method is applied to algebraic test problems and a robust transonic airfoil design problem where it is compared to multi-objective, weighted-sum and density matching approaches to robust optimization; several advantages over these existing methods are demonstrated.
Robust geostatistical analysis of spatial data

NASA Astrophysics Data System (ADS)

Papritz, Andreas; Künsch, Hans Rudolf; Schwierz, Cornelia; Stahel, Werner A.

2013-04-01

Most geostatistical software tools rely on non-robust algorithms. This is unfortunate, because outlying observations are the rule rather than the exception, in particular in environmental data sets. Outliers affect the modelling of the large-scale spatial trend, the estimation of the spatial dependence of the residual variation and the predictions by kriging. Identifying outliers manually is cumbersome and requires expertise because one needs parameter estimates to decide which observation is a potential outlier. Moreover, inference after the rejection of some observations is problematic. A better approach is to use robust algorithms that automatically prevent outlying observations from having undue influence. Former studies on robust geostatistics focused on robust estimation of the sample variogram and ordinary kriging without external drift. Furthermore, Richardson and Welsh (1995) proposed a robustified version of (restricted) maximum likelihood ([RE]ML) estimation for the variance components of a linear mixed model, which was later used by Marchant and Lark (2007) for robust REML estimation of the variogram. We propose here a novel method for robust REML estimation of the variogram of a Gaussian random field that is possibly contaminated by independent errors from a long-tailed distribution. It is based on robustification of estimating equations for the Gaussian REML estimation (Welsh and Richardson, 1997). Besides robust estimates of the parameters of the external drift and of the variogram, the method also provides standard errors for the estimated parameters, robustified kriging predictions at both sampled and non-sampled locations and kriging variances. Apart from presenting our modelling framework, we shall present selected simulation results by which we explored the properties of the new method. This will be complemented by an analysis of a data set on heavy metal contamination of the soil in the vicinity of a metal smelter. Marchant, B.P. and Lark, R.M. 2007. Robust estimation of the variogram by residual maximum likelihood. Geoderma 140: 62-72. Richardson, A.M. and Welsh, A.H. 1995. Robust restricted maximum likelihood in mixed linear models. Biometrics 51: 1429-1439. Welsh, A.H. and Richardson, A.M. 1997. Approaches to the robust estimation of mixed models. In: Handbook of Statistics Vol. 15, Elsevier, pp. 343-384.
Anthropometric geography applied to the analysis of socioeconomic disparities: cohort trends and spatial patterns of height and robustness in 20th-century Spain.

PubMed

Camara, Antonio D; Roman, Joan Garcia

2015-11-01

Anthropometrics have been widely used to study the influence of environmental factors on health and nutritional status. In contrast, anthropometric geography has not often been employed to approximate the dynamics of spatial disparities associated with socioeconomic and demographic changes. Spain exhibited intense disparity and change during the middle decades of the 20th century, with the result that the life courses of the corresponding cohorts were associated with diverse environmental conditions. This was also true of the Spanish territories. This paper presents insights concerning the relationship between socioeconomic changes and living conditions by combining the analysis of cohort trends and the anthropometric cartography of height and physical build. This analysis is conducted for Spanish male cohorts born 1934-1973 that were recorded in the Spanish military statistics. This information is interpreted in light of region-level data on GDP and infant mortality. Our results show an anthropometric convergence across regions that, nevertheless, did not substantially modify the spatial patterns of robustness, featuring primarily robust northeastern regions and weak Central-Southern regions. These patterns persisted until the 1990s (cohorts born during the 1970s). For the most part, anthropometric disparities were associated with socioeconomic disparities, although the former lessened over time to a greater extent than the latter. Interestingly, the various anthropometric indicators utilized here do not point to the same conclusions. Some discrepancies between height and robustness patterns have been found that moderate the statements from the analysis of cohort height alone regarding the level and evolution of living conditions across Spanish regions.
Using permutation tests to enhance causal inference in interrupted time series analysis.

PubMed

Linden, Ariel

2018-06-01

Interrupted time series analysis (ITSA) is an evaluation methodology in which a single treatment unit's outcome is studied serially over time and the intervention is expected to "interrupt" the level and/or trend of that outcome. The internal validity is strengthened considerably when the treated unit is contrasted with a comparable control group. In this paper, we introduce a robustness check based on permutation tests to further improve causal inference. We evaluate the effect of California's Proposition 99 for reducing cigarette sales by iteratively casting each nontreated state into the role of "treated," creating a comparable control group using the ITSAMATCH package in Stata, and then evaluating treatment effects using ITSA regression. If statistically significant "treatment effects" are estimated for pseudotreated states, then any significant changes in the outcome of the actual treatment unit (California) cannot be attributed to the intervention. We perform these analyses setting the cutpoint significance level to P > .40 for identifying balanced matches (the highest threshold possible for which controls could still be found for California) and use the difference in differences of trends as the treatment effect estimator. Only California attained a statistically significant treatment effect, strengthening confidence in the conclusion that Proposition 99 reduced cigarette sales. The proposed permutation testing framework provides an additional robustness check to either support or refute a treatment effect identified for the true treated unit in ITSA. Given its value and ease of implementation, this framework should be considered as a standard robustness test in all multiple group interrupted time series analyses. © 2018 John Wiley & Sons, Ltd.
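The placebo idea in this abstract (cast each untreated unit as "treated" and compare the resulting effects) can be sketched in Python as follows. This is a deliberately simplified, rank-based variant and not the ITSAMATCH/ITSA regression procedure used in the paper: there is no matching step, every other unit serves as a control, and the effect is a plain difference in differences of OLS trends. The panel data, unit names, and function names are all hypothetical.

```python
def slope(ys):
    """OLS slope of ys against the time index 0..n-1."""
    n = len(ys)
    t_mean = (n - 1) / 2.0
    y_mean = sum(ys) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den

def trend_change(series, cut):
    """Post-intervention slope minus pre-intervention slope."""
    return slope(series[cut:]) - slope(series[:cut])

def placebo_test(panel, treated, cut):
    """Difference in differences of trends for every unit cast as 'treated',
    plus the share of units whose |effect| is at least that of the real one."""
    changes = {u: trend_change(s, cut) for u, s in panel.items()}
    effects = {}
    for unit in panel:
        others = [c for u, c in changes.items() if u != unit]
        effects[unit] = changes[unit] - sum(others) / len(others)
    actual = abs(effects[treated])
    as_extreme = sum(1 for e in effects.values() if abs(e) >= actual)
    return effects, as_extreme / len(effects)

# Hypothetical panel: 8 periods, intervention after period 4.
panel = {
    "treated":   [10, 10, 10, 10, 10, 9, 8, 7],
    "placebo_1": [10, 10, 10, 10, 10, 10, 10, 10],
    "placebo_2": [5, 6, 7, 8, 9, 10, 11, 12],
    "placebo_3": [8, 8, 8, 8, 8, 8.1, 8.2, 8.3],
    "placebo_4": [3, 3, 3, 3, 3, 2.9, 2.8, 2.7],
}
effects, p_value = placebo_test(panel, "treated", cut=4)
print(effects["treated"], p_value)
```

With only five units the smallest attainable rank-based p-value is 1/5, which is one reason a real application would want many candidate placebo units.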
Robust Lee local statistic filter for removal of mixed multiplicative and impulse noise

NASA Astrophysics Data System (ADS)

Ponomarenko, Nikolay N.; Lukin, Vladimir V.; Egiazarian, Karen O.; Astola, Jaakko T.

2004-05-01

A robust version of the Lee local statistic filter able to effectively suppress mixed multiplicative and impulse noise in images is proposed. The performance of the proposed modification is studied for a set of test images, several values of multiplicative noise variance, Gaussian and Rayleigh probability density functions of speckle, and different characteristics of impulse noise. The advantages of the designed filter in comparison to the conventional Lee local statistic filter and some other filters able to cope with mixed multiplicative+impulse noise are demonstrated.
Statistical analysis of water-quality data containing multiple detection limits: S-language software for regression on order statistics

USGS Publications Warehouse

Lee, L.; Helsel, D.

2005-01-01

Trace contaminants in water, including metals and organics, often are measured at sufficiently low concentrations to be reported only as values below the instrument detection limit. Interpretation of these "less thans" is complicated when multiple detection limits occur. Statistical methods for multiply censored, or multiple-detection limit, datasets have been developed for medical and industrial statistics, and can be employed to estimate summary statistics or model the distributions of trace-level environmental data. We describe S-language-based software tools that perform robust linear regression on order statistics (ROS). The ROS method has been evaluated as one of the most reliable procedures for developing summary statistics of multiply censored data. It is applicable to any dataset that has 0 to 80% of its values censored. These tools are a part of a software library, or add-on package, for the R environment for statistical computing. This library can be used to generate ROS models and associated summary statistics, plot modeled distributions, and predict exceedance probabilities of water-quality standards. © 2005 Elsevier Ltd. All rights reserved.
A Weibull statistics-based lignocellulose saccharification model and a built-in parameter accurately predict lignocellulose hydrolysis performance.

PubMed

Wang, Mingyu; Han, Lijuan; Liu, Shasha; Zhao, Xuebing; Yang, Jinghua; Loh, Soh Kheang; Sun, Xiaomin; Zhang, Chenxi; Fang, Xu

2015-09-01

Renewable energy from lignocellulosic biomass has been deemed an alternative to depleting fossil fuels. In order to improve this technology, we aim to develop robust mathematical models for the enzymatic lignocellulose degradation process. By analyzing 96 groups of previously published and newly obtained lignocellulose saccharification results and fitting them to a Weibull distribution, we discovered that Weibull statistics can accurately predict lignocellulose saccharification data, regardless of the type of substrates, enzymes and saccharification conditions. A mathematical model for enzymatic lignocellulose degradation was subsequently constructed based on Weibull statistics. Further analysis of the mathematical structure of the model and experimental saccharification data showed the significance of the two parameters in this model. In particular, the λ value, defined as the characteristic time, represents the overall performance of the saccharification system. This suggestion was further supported by statistical analysis of experimental saccharification data and analysis of the glucose production levels when λ and n values change. In conclusion, the constructed Weibull statistics-based model can accurately predict lignocellulose hydrolysis behavior and we can use the λ parameter to assess the overall performance of enzymatic lignocellulose degradation. Advantages and potential applications of the model and the λ value in saccharification performance assessment were discussed. Copyright © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
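The abstract does not state the model equation. As a hedged illustration, the Python sketch below assumes the standard two-parameter Weibull cumulative form, yield(t) = 1 - exp(-(t/λ)^n), under which λ is the time at which the yield reaches 1 - 1/e (about 63%). The fitting routine uses the usual log-log linearization; the function names and the synthetic data are hypothetical and are not taken from the paper.

```python
import math

def weibull_yield(t, lam, n):
    """Fraction converted at time t; assumed form 1 - exp(-(t/lam)**n)."""
    return 1.0 - math.exp(-((t / lam) ** n))

def fit_weibull(times, yields):
    """Estimate (lam, n) by least squares on the linearized model
    ln(-ln(1 - y)) = n*ln(t) - n*ln(lam). Requires t > 0 and 0 < y < 1."""
    xs = [math.log(t) for t in times]
    zs = [math.log(-math.log(1.0 - y)) for y in yields]
    x_mean = sum(xs) / len(xs)
    z_mean = sum(zs) / len(zs)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxz = sum((x - x_mean) * (z - z_mean) for x, z in zip(xs, zs))
    n = sxz / sxx                      # slope is the shape parameter
    intercept = z_mean - n * x_mean    # equals -n * ln(lam)
    lam = math.exp(-intercept / n)
    return lam, n

# Synthetic, noise-free data generated with lam = 24 h and n = 0.8.
times = [2.0, 6.0, 12.0, 24.0, 48.0, 72.0]
yields = [weibull_yield(t, 24.0, 0.8) for t in times]
lam_hat, n_hat = fit_weibull(times, yields)
print(lam_hat, n_hat)
```

Under this assumed form, a smaller λ means the system reaches about 63% conversion sooner, which is consistent with the abstract's reading of λ as an overall performance indicator.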
Assessment of Reliable Change Using 95% Credible Intervals for the Differences in Proportions: A Statistical Analysis for Case-Study Methodology.

PubMed

Unicomb, Rachael; Colyvas, Kim; Harrison, Elisabeth; Hewat, Sally

2015-06-01

Case-study methodology studying change is often used in the field of speech-language pathology, but it can be criticized for not being statistically robust. Yet with the heterogeneous nature of many communication disorders, case studies allow clinicians and researchers to closely observe and report on change. Such information is valuable and can further inform large-scale experimental designs. In this research note, a statistical analysis for case-study data is outlined that employs a modification to the Reliable Change Index (Jacobson & Truax, 1991). The relationship between reliable change and clinical significance is discussed. Example data are used to guide the reader through the use and application of this analysis. A method of analysis is detailed that is suitable for assessing change in measures with binary categorical outcomes. The analysis is illustrated using data from one individual, measured before and after treatment for stuttering. The application of this approach to assess change in categorical, binary data has potential application in speech-language pathology. It enables clinicians and researchers to analyze results from case studies for their statistical and clinical significance. This new method addresses a gap in the research design literature, that is, the lack of analysis methods for noncontinuous data (such as counts, rates, proportions of events) that may be used in case-study designs.
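One way to obtain a 95% credible interval for a difference in proportions, as named in the title of this record, is to draw from independent Beta posteriors. The Python sketch below is an illustration under stated assumptions, not the authors' procedure: it assumes uniform Beta(1, 1) priors, Monte Carlo sampling with a fixed seed, and hypothetical counts of stuttered syllables before and after treatment.

```python
import random

def credible_interval_diff(x1, n1, x2, n2, draws=20000, level=0.95, seed=1):
    """Equal-tailed credible interval for p1 - p2, where x1 of n1 and x2 of n2
    events were observed, using independent Beta(1, 1) priors (an assumption)."""
    rng = random.Random(seed)
    diffs = sorted(
        rng.betavariate(x1 + 1, n1 - x1 + 1) - rng.betavariate(x2 + 1, n2 - x2 + 1)
        for _ in range(draws)
    )
    tail = (1.0 - level) / 2.0
    lower = diffs[int(tail * draws)]
    upper = diffs[int((1.0 - tail) * draws) - 1]
    return lower, upper

# Hypothetical counts: stuttered syllables out of 400 spoken.
change_lo, change_hi = credible_interval_diff(40, 400, 8, 400)   # clear drop
same_lo, same_hi = credible_interval_diff(40, 400, 38, 400)      # little change
print((change_lo, change_hi), (same_lo, same_hi))
```

In this sketch, an interval that excludes zero is read as evidence of change in the proportion, while an interval that spans zero is not; how that maps onto clinical significance is a separate question, as the abstract notes.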
Use of Bayesian Methods to Analyze and Visualize Content Uniformity Capability Versus United States Pharmacopeia and ASTM Standards.

PubMed

Hofer, Jeffrey D; Rauk, Adam P

2017-02-01

The purpose of this work was to develop a straightforward and robust approach to analyze and summarize the ability of content uniformity data to meet different criteria. A robust Bayesian statistical analysis methodology is presented which provides a concise and easily interpretable visual summary of the content uniformity analysis results. The visualization displays individual batch analysis results and shows whether there is high confidence that different content uniformity criteria could be met a high percentage of the time in the future. The 3 tests assessed are as follows: (a) United States Pharmacopeia Uniformity of Dosage Units <905>, (b) a specific ASTM E2810 Sampling Plan 1 criterion to potentially be used for routine release testing, and (c) another specific ASTM E2810 Sampling Plan 2 criterion to potentially be used for process validation. The approach shown here could readily be used to create similar result summaries for other potential criteria. Copyright © 2017 American Pharmacists Association®. Published by Elsevier Inc. All rights reserved.
Introduction to the special issue on recentering science: Replication, robustness, and reproducibility in psychophysiology.

PubMed

Kappenman, Emily S; Keil, Andreas

2017-01-01

In recent years, the psychological and behavioral sciences have increased efforts to strengthen methodological practices and publication standards, with the ultimate goal of enhancing the value and reproducibility of published reports. These issues are especially important in the multidisciplinary field of psychophysiology, which yields rich and complex data sets with a large number of observations. In addition, the technological tools and analysis methods available in the field of psychophysiology are continually evolving, widening the array of techniques and approaches available to researchers. This special issue presents articles detailing rigorous and systematic evaluations of tasks, measures, materials, analysis approaches, and statistical practices in a variety of subdisciplines of psychophysiology. These articles highlight challenges in conducting and interpreting psychophysiological research and provide data-driven, evidence-based recommendations for overcoming those challenges to produce robust, reproducible results in the field of psychophysiology. © 2016 Society for Psychophysiological Research.
An Intercompany Perspective on Biopharmaceutical Drug Product Robustness Studies.

PubMed

Morar-Mitrica, Sorina; Adams, Monica L; Crotts, George; Wurth, Christine; Ihnat, Peter M; Tabish, Tanvir; Antochshuk, Valentyn; DiLuzio, Willow; Dix, Daniel B; Fernandez, Jason E; Gupta, Kapil; Fleming, Michael S; He, Bing; Kranz, James K; Liu, Dingjiang; Narasimhan, Chakravarthy; Routhier, Eric; Taylor, Katherine D; Truong, Nobel; Stokes, Elaine S E

2018-02-01

The Biophorum Development Group (BPDG) is an industry-wide consortium enabling networking and sharing of best practices for the development of biopharmaceuticals. To gain a better understanding of current industry approaches for establishing biopharmaceutical drug product (DP) robustness, the BPDG-Formulation Point Share group conducted an intercompany collaboration exercise, which included a bench-marking survey and extensive group discussions around the scope, design, and execution of robustness studies. The results of this industry collaboration revealed several key common themes: (1) overall DP robustness is defined by both the formulation and the manufacturing process robustness; (2) robustness integrates the principles of quality by design (QbD); (3) DP robustness is an important factor in setting critical quality attribute control strategies and commercial specifications; (4) most companies employ robustness studies, along with prior knowledge, risk assessments, and statistics, to develop the DP design space; (5) studies are tailored to commercial development needs and the practices of each company. Three case studies further illustrate how a robustness study design for a biopharmaceutical DP balances experimental complexity, statistical power, scientific understanding, and risk assessment to provide the desired product and process knowledge. The BPDG-Formulation Point Share discusses identified industry challenges with regard to biopharmaceutical DP robustness and presents some recommendations for best practices. Copyright © 2018 American Pharmacists Association®. Published by Elsevier Inc. All rights reserved.
Defining surfaces for skewed, highly variable data

USGS Publications Warehouse

Helsel, D.R.; Ryker, S.J.

2002-01-01

Skewness of environmental data is often caused by more than simply a handful of outliers in an otherwise normal distribution. Statistical procedures for such datasets must be sufficiently robust to deal with distributions that are strongly non-normal, containing both a large proportion of outliers and a skewed main body of data. In the field of water quality, skewness is commonly associated with large variation over short distances. Spatial analysis of such data generally requires either considerable effort at modeling or the use of robust procedures not strongly affected by skewness and local variability. Using a skewed dataset of 675 nitrate measurements in ground water, commonly used methods for defining a surface (least-squares regression and kriging) are compared to a more robust method (loess). Three choices are critical in defining a surface: (i) is the surface to be a central mean or median surface? (ii) is either a well-fitting transformation or a robust and scale-independent measure of center used? (iii) does local spatial autocorrelation assist in or detract from addressing objectives? Published in 2002 by John Wiley & Sons, Ltd.
Robust mislabel logistic regression without modeling mislabel probabilities.

PubMed

Hung, Hung; Jou, Zhi-Yu; Huang, Su-Yun

2018-03-01

Logistic regression is among the most widely used statistical methods for linear discriminant analysis. In many applications, we only observe possibly mislabeled responses. Fitting a conventional logistic regression can then lead to biased estimation. One common resolution is to fit a mislabel logistic regression model, which takes mislabeled responses into consideration. Another common method is to adopt robust M-estimation by down-weighting suspected instances. In this work, we propose a new robust mislabel logistic regression based on γ-divergence. Our proposal possesses two advantageous features: (1) It does not need to model the mislabel probabilities. (2) The minimum γ-divergence estimation leads to a weighted estimating equation without the need to include any bias correction term, that is, it is automatically bias-corrected. These features make the proposed γ-logistic regression more robust in model fitting and more intuitive for model interpretation through a simple weighting scheme. Our method is also easy to implement, and two types of algorithms are included. Simulation studies and the Pima data application are presented to demonstrate the performance of γ-logistic regression. © 2017, The International Biometric Society.

Fisher statistics for analysis of diffusion tensor directional information.

PubMed

Hutchinson, Elizabeth B; Rutecki, Paul A; Alexander, Andrew L; Sutula, Thomas P

2012-04-30

A statistical approach is presented for the quantitative analysis of diffusion tensor imaging (DTI) directional information using Fisher statistics, which were originally developed for the analysis of vectors in the field of paleomagnetism. In this framework, descriptive and inferential statistics have been formulated based on the Fisher probability density function, a spherical analogue of the normal distribution. The Fisher approach was evaluated for investigation of rat brain DTI maps to characterize tissue orientation in the corpus callosum, fornix, and hilus of the dorsal hippocampal dentate gyrus, and to compare directional properties in these regions following status epilepticus (SE) or traumatic brain injury (TBI) with values in healthy brains. Direction vectors were determined for each region of interest (ROI) for each brain sample and Fisher statistics were applied to calculate the mean direction vector and variance parameters in the corpus callosum, fornix, and dentate gyrus of normal rats and rats that experienced TBI or SE. Hypothesis testing was performed by calculation of Watson's F-statistic and associated p-value giving the likelihood that grouped observations were from the same directional distribution. In the fornix and midline corpus callosum, no directional differences were detected between groups, however in the hilus, significant (p<0.0005) differences were found that robustly confirmed observations that were suggested by visual inspection of directionally encoded color DTI maps. The Fisher approach is a potentially useful analysis tool that may extend the current capabilities of DTI investigation by providing a means of statistical comparison of tissue structural orientation. Copyright © 2012 Elsevier B.V. All rights reserved.
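The descriptive part of the Fisher approach can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the function name `fisher_mean_direction` is hypothetical, the input vectors are assumed to be already consistently oriented (DTI eigenvectors have an arbitrary sign that would need to be resolved first), and the concentration estimate uses the classical approximation k = (N - 1) / (N - R) from paleomagnetism.

```python
import math

def fisher_mean_direction(vectors):
    """Return (mean unit vector, resultant length R, concentration estimate k).

    Hypothetical helper: assumes 3D vectors that already share a consistent
    orientation (no sign ambiguity).
    """
    # Normalize every input vector to unit length.
    units = []
    for x, y, z in vectors:
        norm = math.sqrt(x * x + y * y + z * z)
        units.append((x / norm, y / norm, z / norm))

    n = len(units)
    # Sum the unit vectors component-wise to get the resultant vector.
    sx = sum(u[0] for u in units)
    sy = sum(u[1] for u in units)
    sz = sum(u[2] for u in units)
    r = math.sqrt(sx * sx + sy * sy + sz * sz)

    # The mean direction is the resultant vector scaled to unit length.
    mean = (sx / r, sy / r, sz / r)

    # Classical estimate of the concentration (precision) parameter;
    # it diverges when all directions coincide (R equal to N).
    kappa = (n - 1) / (n - r) if n - r > 1e-12 else math.inf
    return mean, r, kappa

# Four directions scattered symmetrically around the z axis.
mean, r, kappa = fisher_mean_direction(
    [(1, 0, 1), (-1, 0, 1), (0, 1, 1), (0, -1, 1)]
)
print(mean, r, kappa)
```

A resultant length R close to N (large k) indicates tightly clustered directions, while a small R indicates wide dispersion.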
A Simple Simulation Technique for Nonnormal Data with Prespecified Skewness, Kurtosis, and Covariance Matrix.

PubMed

Foldnes, Njål; Olsson, Ulf Henning

2016-01-01

We present and investigate a simple way to generate nonnormal data using linear combinations of independent generator (IG) variables. The simulated data have prespecified univariate skewness and kurtosis and a given covariance matrix. In contrast to the widely used Vale-Maurelli (VM) transform, the obtained data are shown to have a non-Gaussian copula. We analytically obtain asymptotic robustness conditions for the IG distribution. We show empirically that popular test statistics in covariance analysis tend to reject true models more often under the IG transform than under the VM transform. This implies that overly optimistic evaluations of estimators and fit statistics in covariance structure analysis may be tempered by including the IG transform for nonnormal data generation. We provide an implementation of the IG transform in the R environment.
Statistics based sampling for controller and estimator design

NASA Astrophysics Data System (ADS)

Tenne, Dirk

The purpose of this research is the development of statistical design tools for robust feed-forward/feedback controllers and nonlinear estimators. This dissertation is threefold and addresses nonlinear estimation, target tracking, and robust control. To develop statistically robust controllers and nonlinear estimation algorithms, research has been performed to extend existing techniques, which propagate the statistics of the state, to achieve higher order accuracy. The so-called unscented transformation has been extended to capture higher order moments. Furthermore, higher order moment update algorithms based on a truncated power series have been developed. The proposed techniques are tested on various benchmark examples. Furthermore, the unscented transformation has been utilized to develop a three dimensional geometrically constrained target tracker. The proposed planar circular prediction algorithm has been developed in a local coordinate framework, which is amenable to extension of the tracking algorithm to three dimensional space. This tracker combines the predictions of a circular prediction algorithm and a constant velocity filter by utilizing the Covariance Intersection. This combined prediction can be updated with the subsequent measurement using a linear estimator. The proposed technique is illustrated on a 3D benchmark trajectory, which includes coordinated turns and straight line maneuvers. The third part of this dissertation addresses the design of controllers that include knowledge of parametric uncertainties and their distributions. The parameter distributions are approximated by a finite set of points which are calculated by the unscented transformation. This set of points is used to design robust controllers which minimize a statistical performance measure of the plant over the domain of uncertainty, consisting of a combination of the mean and variance. The proposed technique is illustrated on three benchmark problems. The first relates to the design of prefilters for a linear and nonlinear spring-mass-dashpot system and the second applies a feedback controller to a hovering helicopter. Lastly, the statistical robust controller design is devoted to a concurrent feed-forward/feedback controller structure for a high-speed low tension tape drive.
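The idea of approximating a distribution by a finite set of points can be illustrated with the basic scalar unscented transformation. This is a sketch of the standard three-point form only, not the higher order extension developed in the dissertation; the function name `unscented_transform_1d` and the choice kappa = 2 (which matches the Gaussian fourth moment in one dimension) are assumptions made for illustration.

```python
import math

def unscented_transform_1d(f, mean, var, kappa=2.0):
    """Propagate a scalar mean and variance through f using three sigma points.

    Hypothetical helper showing the basic (not higher order) transformation.
    """
    n = 1  # state dimension
    spread = math.sqrt((n + kappa) * var)

    # Sigma points: the mean plus one symmetric pair around it.
    points = [mean, mean + spread, mean - spread]
    weights = [kappa / (n + kappa), 0.5 / (n + kappa), 0.5 / (n + kappa)]

    # Push each sigma point through the nonlinearity.
    ys = [f(p) for p in points]

    # Recover the transformed mean and variance as weighted sample moments.
    y_mean = sum(w * y for w, y in zip(weights, ys))
    y_var = sum(w * (y - y_mean) ** 2 for w, y in zip(weights, ys))
    return y_mean, y_var

# Propagate x with mean 1 and variance 0.25 through f(x) = x^2.
print(unscented_transform_1d(lambda x: x * x, 1.0, 0.25))
```

For a Gaussian input and f(x) = x^2, the exact output mean is mean^2 + var and the exact output variance is 4 * mean^2 * var + 2 * var^2; the three sigma points reproduce both values in this special case.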
Descent graphs in pedigree analysis: applications to haplotyping, location scores, and marker-sharing statistics.

PubMed Central

Sobel, E.; Lange, K.

1996-01-01

The introduction of stochastic methods in pedigree analysis has enabled geneticists to tackle computations intractable by standard deterministic methods. Until now these stochastic techniques have worked by running a Markov chain on the set of genetic descent states of a pedigree. Each descent state specifies the paths of gene flow in the pedigree and the founder alleles dropped down each path. The current paper follows up on a suggestion by Elizabeth Thompson that genetic descent graphs offer a more appropriate space for executing a Markov chain. A descent graph specifies the paths of gene flow but not the particular founder alleles traveling down the paths. This paper explores algorithms for implementing Thompson's suggestion for codominant markers in the context of automatic haplotyping, estimating location scores, and computing gene-clustering statistics for robust linkage analysis. Realistic numerical examples demonstrate the feasibility of the algorithms. PMID:8651310
Is There an Effect of Incremental Welfare Benefits on Fertility Behavior?: A Look at the Family Cap

ERIC Educational Resources Information Center

Kearney, Melissa Schettini

2004-01-01

This analysis exploits the variation across states in the timing of policy implementation to determine if family cap policies lead to a reduction in births to women aged 15 to 34. Vital statistics birth data for the years 1989 to 1998 offer no such evidence. The data reject a decline in births of more than one percent. The finding is robust to…
Robust detection of multiple sclerosis lesions from intensity-normalized multi-channel MRI

NASA Astrophysics Data System (ADS)

Karpate, Yogesh; Commowick, Olivier; Barillot, Christian

2015-03-01

Multiple sclerosis (MS) is a disease with heterogeneous evolution among the patients. Quantitative analysis of longitudinal Magnetic Resonance Images (MRI) provides a spatial analysis of the brain tissues which may lead to the discovery of biomarkers of disease evolution. Better understanding of the disease will lead to a better discovery of pathogenic mechanisms, allowing for patient-adapted therapeutic strategies. To characterize MS lesions, we propose a novel paradigm to detect white matter lesions based on a statistical framework. It aims at studying the benefits of using multi-channel MRI to detect statistically significant differences between each individual MS patient and a database of control subjects. This framework consists of two components. First, intensity standardization is conducted to minimize the inter-subject intensity difference arising from variability of the acquisition process and different scanners. The intensity normalization maps parameters obtained using a robust Gaussian Mixture Model (GMM) estimation not affected by the presence of MS lesions. The second part studies the comparison of multi-channel MRI of MS patients with respect to an atlas built from the control subjects, thereby allowing us to look for differences in normal appearing white matter, in and around the lesions of each patient. Experimental results demonstrate that our technique accurately detects significant differences in lesions, consequently improving the results of MS lesion detection.
Multiple Phenotype Association Tests Using Summary Statistics in Genome-Wide Association Studies

PubMed Central

Liu, Zhonghua; Lin, Xihong

2017-01-01

Summary: We study in this paper jointly testing the associations of a genetic variant with correlated multiple phenotypes using the summary statistics of individual phenotype analysis from Genome-Wide Association Studies (GWASs). We estimated the between-phenotype correlation matrix using the summary statistics of individual phenotype GWAS analyses, and developed genetic association tests for multiple phenotypes by accounting for between-phenotype correlation without the need to access individual-level data. Since genetic variants often affect multiple phenotypes differently across the genome and the between-phenotype correlation can be arbitrary, we proposed robust and powerful multiple phenotype testing procedures by jointly testing a common mean and a variance component in linear mixed models for summary statistics. We computed the p-values of the proposed tests analytically. This computational advantage makes our methods practically appealing in large-scale GWASs. We performed simulation studies to show that the proposed tests maintained correct type I error rates, and to compare their powers in various settings with the existing methods. We applied the proposed tests to a GWAS Global Lipids Genetics Consortium summary statistics data set and identified additional genetic variants that were missed by the original single-trait analysis. PMID:28653391
Multiple phenotype association tests using summary statistics in genome-wide association studies.

PubMed

Liu, Zhonghua; Lin, Xihong

2018-03-01

We study in this article jointly testing the associations of a genetic variant with correlated multiple phenotypes using the summary statistics of individual phenotype analysis from Genome-Wide Association Studies (GWASs). We estimated the between-phenotype correlation matrix using the summary statistics of individual phenotype GWAS analyses, and developed genetic association tests for multiple phenotypes by accounting for between-phenotype correlation without the need to access individual-level data. Since genetic variants often affect multiple phenotypes differently across the genome and the between-phenotype correlation can be arbitrary, we proposed robust and powerful multiple phenotype testing procedures by jointly testing a common mean and a variance component in linear mixed models for summary statistics. We computed the p-values of the proposed tests analytically. This computational advantage makes our methods practically appealing in large-scale GWASs. We performed simulation studies to show that the proposed tests maintained correct type I error rates, and to compare their powers in various settings with the existing methods. We applied the proposed tests to a GWAS Global Lipids Genetics Consortium summary statistics data set and identified additional genetic variants that were missed by the original single-trait analysis. © 2017, The International Biometric Society.
Design of robust flow processing networks with time-programmed responses

NASA Astrophysics Data System (ADS)

Kaluza, P.; Mikhailov, A. S.

2012-04-01

Can artificially designed networks reach levels of robustness against local damage that are comparable with those of the biochemical networks of a living cell? We consider a simple model where the flow applied to an input node propagates through the network and arrives at the output nodes at different times, thus generating a pattern of coordinated responses. By using evolutionary optimization algorithms, functional networks - with required time-programmed responses - were constructed. Then, continuing the evolution, such networks were additionally optimized for robustness against deletion of individual nodes or links. In this manner, large ensembles of functional networks with different kinds of robustness were obtained, making statistical investigations and comparison of their structural properties possible. We have found that, generally, different architectures are needed for various kinds of robustness. The differences are statistically revealed, for example, in the Laplacian spectra of the respective graphs. On the other hand, motif distributions of robust networks do not differ from those of the merely functional networks; they are found to belong to the first Alon superfamily, the same as that of the gene transcription networks of single-cell organisms.
Robust functional statistics applied to Probability Density Function shape screening of sEMG data.

PubMed

Boudaoud, S; Rix, H; Al Harrach, M; Marin, F

2014-01-01

Recent studies pointed out possible shape modifications of the Probability Density Function (PDF) of surface electromyographical (sEMG) data according to several contexts like fatigue and muscle force increase. Following this idea, criteria have been proposed to monitor these shape modifications mainly using High Order Statistics (HOS) parameters like skewness and kurtosis. In experimental conditions, these parameters are confronted with small sample size in the estimation process. This small sample size induces errors in the estimated HOS parameters restraining real-time and precise sEMG PDF shape monitoring. Recently, a functional formalism, the Core Shape Model (CSM), has been used to analyse shape modifications of PDF curves. In this work, taking inspiration from CSM method, robust functional statistics are proposed to emulate both skewness and kurtosis behaviors. These functional statistics combine both kernel density estimation and PDF shape distances to evaluate shape modifications even in presence of small sample size. Then, the proposed statistics are tested, using Monte Carlo simulations, on both normal and Log-normal PDFs that mimic observed sEMG PDF shape behavior during muscle contraction. According to the obtained results, the functional statistics seem to be more robust than HOS parameters to small sample size effect and more accurate in sEMG PDF shape screening applications.
Periodontal disease and carotid atherosclerosis: A meta-analysis of 17,330 participants.

PubMed

Zeng, Xian-Tao; Leng, Wei-Dong; Lam, Yat-Yin; Yan, Bryan P; Wei, Xue-Mei; Weng, Hong; Kwong, Joey S W

2016-01-15

The association between periodontal disease and carotid atherosclerosis has been evaluated primarily in single-center studies, and whether periodontal disease is an independent risk factor of carotid atherosclerosis remains uncertain. This meta-analysis aimed to evaluate the association between periodontal disease and carotid atherosclerosis. We searched PubMed and Embase for relevant observational studies up to February 20, 2015. Two authors independently extracted data from included studies, and odds ratios (ORs) with 95% confidence intervals (CIs) were calculated for overall and subgroup meta-analyses. Statistical heterogeneity was assessed by the chi-squared test (P<0.1 for statistical significance) and quantified by the I² statistic. Data analysis was conducted using the Comprehensive Meta-Analysis (CMA) software. Fifteen observational studies involving 17,330 participants were included in the meta-analysis. The overall pooled result showed that periodontal disease was associated with carotid atherosclerosis (OR: 1.27, 95% CI: 1.14-1.41; P<0.001) but statistical heterogeneity was substantial (I² = 78.90%). Subgroup analysis of adjusted smoking and diabetes mellitus showed borderline significance (OR: 1.08; 95% CI: 1.00-1.18; P=0.05). Sensitivity and cumulative analyses both indicated that our results were robust. Findings of our meta-analysis indicated that the presence of periodontal disease was associated with carotid atherosclerosis; however, further large-scale, well-conducted clinical studies are needed to explore the precise risk of developing carotid atherosclerosis in patients with periodontal disease. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
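How the heterogeneity measures relate to the pooled estimate can be shown with a small inverse-variance sketch. It is an illustration under stated assumptions rather than a reproduction of the CMA analysis: the function name `inverse_variance_meta` is hypothetical, a fixed-effect pooling is used for simplicity, and the input numbers are made up, not taken from the included studies.

```python
import math

def inverse_variance_meta(log_ors, std_errs):
    """Return (pooled log OR, Cochran's Q, I-squared in percent).

    Hypothetical helper using fixed-effect inverse-variance weights.
    """
    # Each study is weighted by the reciprocal of its variance.
    weights = [1.0 / (se * se) for se in std_errs]
    pooled = sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_ors))
    df = len(log_ors) - 1

    # I-squared: share of total variation attributed to heterogeneity,
    # truncated at zero when Q falls below its degrees of freedom.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, i2

# Made-up log odds ratios and standard errors for three studies.
pooled, q, i2 = inverse_variance_meta([0.10, 0.35, 0.60], [0.10, 0.12, 0.15])
print(math.exp(pooled), q, i2)
```

Under the null hypothesis of homogeneity, Q is compared with a chi-squared distribution on k - 1 degrees of freedom, which is the chi-squared heterogeneity test mentioned in the abstract.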
Noise level and MPEG-2 encoder statistics

NASA Astrophysics Data System (ADS)

Lee, Jungwoo

1997-01-01

Most software in the movie and broadcasting industries is still in analog film or tape format, which typically contains random noise that originated from film, CCD camera, and tape recording. The performance of the MPEG-2 encoder may be significantly degraded by the noise. It is also affected by the scene type that includes spatial and temporal activity. The statistical property of noise originating from camera and tape player is analyzed and the models for the two types of noise are developed. The relationship between the noise, the scene type, and encoder statistics of a number of MPEG-2 parameters such as motion vector magnitude, prediction error, and quant scale is discussed. This analysis is intended to be a tool for designing robust MPEG encoding algorithms such as preprocessing and rate control.
Statistical Analysis of CFD Solutions from the Drag Prediction Workshop

NASA Technical Reports Server (NTRS)

Hemsch, Michael J.

2002-01-01

A simple, graphical framework is presented for robust statistical evaluation of results obtained from N-Version testing of a series of RANS CFD codes. The solutions were obtained by a variety of code developers and users for the June 2001 Drag Prediction Workshop sponsored by the AIAA Applied Aerodynamics Technical Committee. The aerodynamic configuration used for the computational tests is the DLR-F4 wing-body combination previously tested in several European wind tunnels and for which a previous N-Version test had been conducted. The statistical framework is used to evaluate code results for (1) a single cruise design point, (2) drag polars and (3) drag rise. The paper concludes with a discussion of the meaning of the results, especially with respect to predictability, validation, and reporting of solutions.
Toward statistical modeling of saccadic eye-movement and visual saliency.

PubMed

Sun, Xiaoshuai; Yao, Hongxun; Ji, Rongrong; Liu, Xian-Ming

2014-11-01

In this paper, we present a unified statistical framework for modeling both saccadic eye movements and visual saliency. By analyzing the statistical properties of human eye fixations on natural images, we found that human attention is sparsely distributed and usually deployed to locations with abundant structural information. These observations inspired us to model saccadic behavior and visual saliency based on super-Gaussian component (SGC) analysis. Our model sequentially obtains SGC using projection pursuit, and generates eye movements by selecting the location with maximum SGC response. Besides simulating human saccadic behavior, we also demonstrated the superior effectiveness and robustness of our model over state-of-the-art methods by carrying out dense experiments on synthetic patterns and human eye fixation benchmarks. Multiple key issues in saliency modeling research, such as individual differences and the effects of scale and blur, are explored in this paper. Based on extensive qualitative and quantitative experimental results, we show the promising potential of statistical approaches for human behavior research.
Dynamic heterogeneity and non-Gaussian statistics for acetylcholine receptors on live cell membrane

NASA Astrophysics Data System (ADS)

He, W.; Song, H.; Su, Y.; Geng, L.; Ackerson, B. J.; Peng, H. B.; Tong, P.

2016-05-01

The Brownian motion of molecules at thermal equilibrium usually has a finite correlation time and will eventually be randomized after a long delay time, so that their displacement follows the Gaussian statistics. This is true even when the molecules have experienced a complex environment with a finite correlation time. Here, we report that the lateral motion of the acetylcholine receptors on live muscle cell membranes does not follow the Gaussian statistics for normal Brownian diffusion. From a careful analysis of a large volume of the protein trajectories obtained over a wide range of sampling rates and long durations, we find that the normalized histogram of the protein displacements shows an exponential tail, which is robust and universal for cells under different conditions. The experiment indicates that the observed non-Gaussian statistics and dynamic heterogeneity are inherently linked to the slow-active remodelling of the underlying cortical actin network.
Effect of non-normality on test statistics for one-way independent groups designs.

PubMed

Cribbie, Robert A; Fiksenbaum, Lisa; Keselman, H J; Wilcox, Rand R

2012-02-01

The data obtained from one-way independent groups designs is typically non-normal in form and rarely equally variable across treatment populations (i.e., population variances are heterogeneous). Consequently, the classical test statistic that is used to assess statistical significance (i.e., the analysis of variance F test) typically provides invalid results (e.g., too many Type I errors, reduced power). For this reason, there has been considerable interest in finding a test statistic that is appropriate under conditions of non-normality and variance heterogeneity. Previously recommended procedures for analysing such data include the James test, the Welch test applied either to the usual least squares estimators of central tendency and variability, or the Welch test with robust estimators (i.e., trimmed means and Winsorized variances). A new statistic proposed by Krishnamoorthy, Lu, and Mathew, intended to deal with heterogeneous variances, though not non-normality, uses a parametric bootstrap procedure. In their investigation of the parametric bootstrap test, the authors examined its operating characteristics under limited conditions and did not compare it to the Welch test based on robust estimators. Thus, we investigated how the parametric bootstrap procedure and a modified parametric bootstrap procedure based on trimmed means perform relative to previously recommended procedures when data are non-normal and heterogeneous. The results indicated that the tests based on trimmed means offer the best Type I error control and power when variances are unequal and at least some of the distribution shapes are non-normal. © 2011 The British Psychological Society.
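The robust estimators named above, trimmed means and Winsorized variances, can be sketched as follows. The function names and the 20% symmetric trimming proportion are illustrative assumptions (20% is a common choice, but the abstract does not state the proportion used), and the sample data are made up.

```python
import math

def trimmed_mean(values, prop=0.2):
    """Mean after dropping the lowest and highest `prop` fraction of values."""
    xs = sorted(values)
    n = len(xs)
    g = math.floor(prop * n)  # number of observations trimmed per tail
    kept = xs[g:n - g]
    return sum(kept) / len(kept)

def winsorized_variance(values, prop=0.2):
    """Sample variance after replacing each tail with its nearest kept value."""
    xs = sorted(values)
    n = len(xs)
    g = math.floor(prop * n)
    # Pull the g smallest values up and the g largest values down.
    wins = [xs[g]] * g + xs[g:n - g] + [xs[n - g - 1]] * g
    mean = sum(wins) / n
    return sum((v - mean) ** 2 for v in wins) / (n - 1)

# Made-up sample with one gross outlier.
sample = [1, 2, 3, 4, 100]
print(trimmed_mean(sample), winsorized_variance(sample))
```

The single outlier (100) dominates the ordinary mean and variance of this sample but has no influence on either robust estimate, which is why test statistics built on them tend to behave better under heavy tails.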
A Robust Parameterization of Human Gait Patterns Across Phase-Shifting Perturbations

PubMed Central

Villarreal, Dario J.; Poonawala, Hasan A.; Gregg, Robert D.

2016-01-01

The phase of human gait is difficult to quantify accurately in the presence of disturbances. In contrast, recent bipedal robots use time-independent controllers relying on a mechanical phase variable to synchronize joint patterns through the gait cycle. This concept has inspired studies to determine if human joint patterns can also be parameterized by a mechanical variable. Although many phase variable candidates have been proposed, it remains unclear which, if any, provide a robust representation of phase for human gait analysis or control. In this paper we analytically derive an ideal phase variable (the hip phase angle) that is provably monotonic and bounded throughout the gait cycle. To examine the robustness of this phase variable, ten able-bodied human subjects walked over a platform that randomly applied phase-shifting perturbations to the stance leg. A statistical analysis found the correlations between nominal and perturbed joint trajectories to be significantly greater when parameterized by the hip phase angle (0.95+) than by time or a different phase variable. The hip phase angle also best parameterized the transient errors about the nominal periodic orbit. Finally, interlimb phasing was best explained by local (ipsilateral) hip phase angles that are synchronized during the double-support period. PMID:27187967
Analysis of covariance as a remedy for demographic mismatch of research subject groups: some sobering simulations.

PubMed

Adams, K M; Brown, G G; Grant, I

1985-08-01

Analysis of Covariance (ANCOVA) is often used in neuropsychological studies to effect ex-post-facto adjustment of performance variables amongst groups of subjects mismatched on some relevant demographic variable. This paper reviews some of the statistical assumptions underlying this usage. In an attempt to illustrate the complexities of this statistical technique, three sham studies using actual patient data are presented. These staged simulations have varying relationships between group test performance differences and levels of covariate discrepancy. The results were robust and consistent in their nature, and were held to support the wisdom of previous cautions by statisticians concerning the employment of ANCOVA to justify comparisons between incomparable groups. ANCOVA should not be used in neuropsychological research to equate groups unequal on variables such as age and education or to exert statistical control whose objective is to eliminate consideration of the covariate as an explanation for results. Finally, the report advocates by example the use of simulation to further our understanding of neuropsychological variables.
Ariadne's Thread: A Robust Software Solution Leading to Automated Absolute and Relative Quantification of SRM Data.

PubMed

Nasso, Sara; Goetze, Sandra; Martens, Lennart

2015-09-04

Selected reaction monitoring (SRM) MS is a highly selective and sensitive technique to quantify protein abundances in complex biological samples. To enhance the pace of large SRM studies, a validated, robust method to fully automate absolute quantification and to substitute for interactive evaluation would be valuable. To address this demand, we present Ariadne, a Matlab software tool. To quantify monitored targets, Ariadne exploits metadata imported from the transition lists, and targets can be filtered according to mProphet output. Signal processing and statistical learning approaches are combined to compute peptide quantifications. To robustly estimate absolute abundances, the external calibration curve method is applied, ensuring linearity over the measured dynamic range. Ariadne was benchmarked against mProphet and Skyline by comparing its quantification performance on three different dilution series, featuring either noisy/smooth traces without background or smooth traces with complex background. Results, evaluated as efficiency, linearity, accuracy, and precision of quantification, showed that Ariadne's performance is independent of data smoothness and complex background presence, that Ariadne outperforms mProphet on the noisier data set, and that it improved Skyline's accuracy and precision 2-fold for the lowest abundant dilution with complex background. Remarkably, Ariadne could statistically distinguish all the different abundances from each other, discriminating dilutions as low as 0.1 and 0.2 fmol. These results suggest that Ariadne offers reliable and automated analysis of large-scale SRM differential expression studies.
A statistically robust EEG re-referencing procedure to mitigate reference effect

PubMed Central

Lepage, Kyle Q.; Kramer, Mark A.; Chu, Catherine J.

2014-01-01

Background: The electroencephalogram (EEG) remains the primary tool for diagnosis of abnormal brain activity in clinical neurology and for in vivo recordings of human neurophysiology in neuroscience research. In EEG data acquisition, voltage is measured at positions on the scalp with respect to a reference electrode. When this reference electrode responds to electrical activity or artifact, all electrodes are affected. Successful analysis of EEG data often involves re-referencing procedures that modify the recorded traces and seek to minimize the impact of reference electrode activity upon functions of the original EEG recordings. New method: We provide a novel, statistically robust procedure that adapts a robust maximum-likelihood type estimator to the problem of reference estimation, reduces the influence of neural activity from the re-referencing operation, and maintains good performance in a wide variety of empirical scenarios. Results: The performance of the proposed and existing re-referencing procedures is validated in simulation and with examples of EEG recordings. To facilitate this comparison, channel-to-channel correlations are investigated theoretically and in simulation. Comparison with existing methods: The proposed procedure avoids using data contaminated by neural signal and remains unbiased in recording scenarios where physical references, the common average reference (CAR) and the reference estimation standardization technique (REST) are not optimal. Conclusion: The proposed procedure is simple, fast, and avoids the potential for substantial bias when analyzing low-density EEG data. PMID:24975291
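The contrast between a common average reference and a robust maximum-likelihood type (M-) estimate can be illustrated with a generic Huber location estimator applied across channels at one time sample. This is a sketch of the general technique, not the authors' procedure: the function name `huber_location`, the tuning constant 1.345, the MAD-based scale, and the voltages are all assumptions made for illustration.

```python
import statistics

def huber_location(values, c=1.345, tol=1e-10, max_iter=200):
    """Huber M-estimate of location via iteratively reweighted averaging.

    Hypothetical helper: the scale is fixed at the normalized median
    absolute deviation (MAD) computed around the median.
    """
    values = list(values)
    mu = statistics.median(values)
    mad = statistics.median([abs(v - mu) for v in values])
    scale = 1.4826 * mad
    if scale == 0:
        return mu  # more than half the values coincide with the median

    for _ in range(max_iter):
        # Full weight inside the threshold, down-weighted outside it.
        weights = []
        for v in values:
            r = abs(v - mu) / scale
            weights.append(1.0 if r <= c else c / r)
        new_mu = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu

# Made-up voltages at one time sample: five quiet channels and one channel
# carrying a large neural signal or artifact.
sample = [1.0, 1.1, 0.9, 1.05, 0.95, 50.0]
print(statistics.mean(sample), huber_location(sample))
```

The plain average is pulled far toward the contaminated channel, whereas the M-estimate stays near the value shared by the quiet channels, which is the behavior a robust reference estimate is meant to provide.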

Effectiveness of propolis on oral health: a meta-analysis.

PubMed

Hwu, Yueh-Juen; Lin, Feng-Yu

2014-12-01

The use of propolis mouth rinse or gel as a supplementary intervention has increased during the last decade in Taiwan. However, the effect of propolis on oral health is not well understood. The purpose of this meta-analysis was to present the best available evidence regarding the effects of propolis use on oral health, including oral infection, dental plaque, and stomatitis. Researchers searched seven electronic databases for relevant articles published between 1969 and 2012. Data were collected using inclusion and exclusion criteria. The Joanna Briggs Institute Meta Analysis of Statistics Assessment and Review Instrument was used to evaluate the quality of the identified articles. Eight trials published from 1997 to 2011 with 194 participants had extractable data. The result of the meta-analysis indicated that, although propolis had an effect on reducing dental plaque, this effect was not statistically significant. The results were not statistically significant for oral infection or stomatitis. Although there are a number of promising indications, in view of the limited number and quality of studies and the variation in results among studies, this review highlights the need for additional well-designed trials to draw conclusions that are more robust.
Association analysis of multiple traits by an approach of combining P values.

PubMed

Chen, Lili; Wang, Yong; Zhou, Yajing

2018-03-01

Increasing evidence shows that one variant can affect multiple traits, which is a widespread phenomenon in complex diseases. Joint analysis of multiple traits can increase the statistical power of association analysis and uncover the underlying genetic mechanism. Although there are many statistical methods to analyse multiple traits, most of these methods are suited to detecting common variants associated with multiple traits. Because of the low minor allele frequency of rare variants, these methods are not optimal for rare variant association analysis. In this paper, we extend an adaptive combination of P values method (termed ADA) for a single trait to test association between multiple traits and rare variants in a given region. For a given region, we use a reverse regression model to test each rare variant for association with multiple traits and obtain the P value of the single-variant test. Further, we take the weighted combination of these P values as the test statistic. Extensive simulation studies show that our approach is more powerful than several other comparison methods in most cases and is robust to the inclusion of a high proportion of neutral variants and to different directions of effects of causal variants.
A probabilistic approach to aircraft design emphasizing stability and control uncertainties

NASA Astrophysics Data System (ADS)

Delaurentis, Daniel Andrew

In order to address identified deficiencies in current approaches to aerospace systems design, a new method has been developed. This new method for design is based on the premise that design is a decision making activity, and that deterministic analysis and synthesis can lead to poor, or misguided decision making. This is due to a lack of disciplinary knowledge of sufficient fidelity about the product, to the presence of uncertainty at multiple levels of the aircraft design hierarchy, and to a failure to focus on overall affordability metrics as measures of goodness. Design solutions are desired which are robust to uncertainty and are based on the maximum knowledge possible. The new method represents advances in the two following general areas. 1. Design models and uncertainty. The research performed completes a transition from a deterministic design representation to a probabilistic one through a modeling of design uncertainty at multiple levels of the aircraft design hierarchy, including: (1) Consistent, traceable uncertainty classification and representation; (2) Concise mathematical statement of the Probabilistic Robust Design problem; (3) Variants of the Cumulative Distribution Functions (CDFs) as decision functions for Robust Design; (4) Probabilistic Sensitivities which identify the most influential sources of variability. 2. Multidisciplinary analysis and design. Imbedded in the probabilistic methodology is a new approach for multidisciplinary design analysis and optimization (MDA/O), employing disciplinary analysis approximations formed through statistical experimentation and regression. These approximation models are a function of design variables common to the system level as well as other disciplines. For aircraft, it is proposed that synthesis/sizing is the proper avenue for integrating multiple disciplines. Research hypotheses are translated into a structured method, which is subsequently tested for validity. 
Specifically, the implementation involves the study of the relaxed static stability technology for a supersonic commercial transport aircraft. The probabilistic robust design method is exercised, resulting in a series of robust design solutions based on different interpretations of "robustness". Insightful results are obtained and the ability of the method to expose trends in the design space is noted as a key advantage.
A Robust Approach to Risk Assessment Based on Species Sensitivity Distributions.

PubMed

Monti, Gianna S; Filzmoser, Peter; Deutsch, Roland C

2018-05-03

The guidelines for setting environmental quality standards are increasingly based on probabilistic risk assessment due to a growing general awareness of the need for probabilistic procedures. One of the commonly used tools in probabilistic risk assessment is the species sensitivity distribution (SSD), which represents the proportion of species affected belonging to a biological assemblage as a function of exposure to a specific toxicant. Our focus is on the inverse use of the SSD curve with the aim of estimating the concentration, HCp, of a toxic compound that is hazardous to p% of the biological community under study. Toward this end, we propose the use of robust statistical methods in order to take into account the presence of outliers or apparent skew in the data, which may occur without any ecological basis. A robust approach exploits the full neighborhood of a parametric model, enabling the analyst to account for the typical real-world deviations from ideal models. We examine two classic HCp estimation approaches and consider robust versions of these estimators. In addition, we also use data transformations in conjunction with robust estimation methods in case of heteroscedasticity. Different scenarios using real data sets as well as simulated data are presented in order to illustrate and compare the proposed approaches. These scenarios illustrate that the use of robust estimation methods enhances HCp estimation. © 2018 Society for Risk Analysis.
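A minimal sketch of the inverse use of the SSD curve, under the assumption (not stated in the abstract) that the SSD is log-normal, so that HCp is the p-th percentile of a normal distribution fitted to log-concentrations. The robust variant here simply swaps the mean and standard deviation for the median and normalized MAD; it stands in for the idea of a robust estimator and is not one of the specific estimators studied by the authors. The function name `hcp_lognormal` and its parameters are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def hcp_lognormal(conc, p=5.0, robust=False):
    """Estimate HCp (concentration hazardous to p% of species).

    Assumes a log-normal SSD. With robust=True, location and scale of the
    log-concentrations are the median and the normalized MAD instead of the
    mean and sample standard deviation.
    """
    logc = np.log(np.asarray(conc, dtype=float))
    if robust:
        loc = np.median(logc)
        scale = 1.4826 * np.median(np.abs(logc - loc))
    else:
        loc = logc.mean()
        scale = logc.std(ddof=1)
    z = NormalDist().inv_cdf(p / 100.0)  # standard normal quantile for p%
    return float(np.exp(loc + z * scale))
```

With a single aberrant toxicity value in a small data set, the classical estimate can shift by orders of magnitude while the median/MAD version moves far less, which is the kind of sensitivity to outliers the abstract describes.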
FRAIL Questionnaire Screening Tool and Short-Term Outcomes in Geriatric Fracture Patients.

PubMed

Gleason, Lauren Jan; Benton, Emily A; Alvarez-Nebreda, M Loreto; Weaver, Michael J; Harris, Mitchel B; Javedan, Houman

2017-12-01

There are limited screening tools to predict adverse postoperative outcomes for the geriatric surgical fracture population. Frailty is increasingly recognized as a risk assessment to capture complexity. The goal of this study was to use a short screening tool, the FRAIL scale, to categorize the level of frailty of older adults admitted with a fracture to determine the association of each frailty category with postoperative and 30-day outcomes. Retrospective cohort study. Level 1 trauma center. A total of 175 consecutive patients over age 70 years admitted to co-managed orthopedic trauma and geriatrics services. The FRAIL scale (short 5-question assessment of fatigue, resistance, aerobic capacity, illnesses, and loss of weight) classified the patients into 3 categories: robust (score = 0), prefrail (score = 1-2), and frail (score = 3-5). Postoperative outcome variables collected were postoperative complications, unplanned intensive care unit admission, length of stay (LOS), discharge disposition, and orthopedic follow-up after surgery. Thirty-day outcomes measured were 30-day readmission and 30-day mortality. Analysis of variance (1-way) and Kruskal-Wallis tests were used to compare continuous variables across the 3 FRAIL categories. Fisher exact tests were used to compare categorical variables. Multiple regression analysis, adjusted by age, sex, and Charlson index, was conducted to study the association between frailty category and outcomes. FRAIL scale categorized the patients into 3 groups: robust (n = 29), prefrail (n = 73), and frail (n = 73). There were statistically significant differences between groups in terms of age, comorbidity, dementia, functional dependency, polypharmacy, and rate of institutionalization, being higher in the frailest patients. Hip fracture was the most frequent fracture, and it was more frequent as the frailty of the patient increased (48%, 61%, and 75% in robust, prefrail, and frail groups, respectively). 
The American Society of Anesthesiologists preoperative risk significantly correlated with the frailty of the patient (American Society of Anesthesiologists score 3-4: 41%, 82% and 86%, in robust, prefrail, and frail groups, P < .001). After adjustment by age, sex, and comorbidity, there was a statistically significant association between frailty and both LOS and the development of any complication after surgery (LOS: 4.2, 5.0, and 7.1 days, P = .002; any complication: 3.4%, 26%, and 39.7%, P = .03; in robust, prefrail, and frail groups). There were also significant differences in discharge disposition (31% of robust vs 4.1% frail, P = .008) and follow-up completion (97% of robust vs 69% of the frail ones). Differences in time to surgery, unplanned intensive care unit admission, and 30-day readmission and mortality, although showing a trend, did not reach statistical significance. Frailty, measured by the FRAIL scale, was associated with increased LOS, complications after surgery, and discharge to a rehabilitation facility in geriatric fracture patients. The FRAIL scale is a promising short screen to stratify and help operationalize the perioperative care of older surgical patients. Copyright © 2017 AMDA – The Society for Post-Acute and Long-Term Care Medicine. Published by Elsevier Inc. All rights reserved.
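The score cutoffs quoted in this abstract (robust = 0, prefrail = 1-2, frail = 3-5) map directly onto a small lookup. The sketch below encodes only those cutoffs; the function name `frail_category` is hypothetical, and the scoring of the five individual questions is not covered.

```python
def frail_category(score: int) -> str:
    """Map a FRAIL scale total (0-5) to the category used in the abstract."""
    if not isinstance(score, int) or not 0 <= score <= 5:
        raise ValueError("FRAIL score must be an integer from 0 to 5")
    if score == 0:
        return "robust"
    if score <= 2:
        return "prefrail"
    return "frail"
```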
Incorporation of support vector machines in the LIBS toolbox for sensitive and robust classification amidst unexpected sample and system variability

PubMed Central

Dingari, Narahara Chari; Barman, Ishan; Myakalwar, Ashwin Kumar; Tewari, Surya P.; Kumar, G. Manoj

2012-01-01

Despite the intrinsic elemental analysis capability and lack of sample preparation requirements, laser-induced breakdown spectroscopy (LIBS) has not been extensively used for real-world applications, e.g. quality assurance and process monitoring. Specifically, variability in sample, system and experimental parameters in LIBS studies presents a substantive hurdle for robust classification, even when standard multivariate chemometric techniques are used for analysis. Considering pharmaceutical sample investigation as an example, we propose the use of support vector machines (SVM) as a non-linear classification method over conventional linear techniques such as soft independent modeling of class analogy (SIMCA) and partial least-squares discriminant analysis (PLS-DA) for discrimination based on LIBS measurements. Using over-the-counter pharmaceutical samples, we demonstrate that application of SVM enables statistically significant improvements in prospective classification accuracy (sensitivity), due to its ability to address variability in LIBS sample ablation and plasma self-absorption behavior. Furthermore, our results reveal that SVM provides nearly 10% improvement in correct allocation rate and a concomitant reduction in misclassification rates of 75% (cf. PLS-DA) and 80% (cf. SIMCA), when measurements from samples not included in the training set are incorporated in the test data, highlighting its robustness. While further studies on a wider matrix of sample types performed using different LIBS systems are needed to fully characterize the capability of SVM to provide superior predictions, we anticipate that the improved sensitivity and robustness observed here will facilitate application of the proposed LIBS-SVM toolbox for screening drugs and detecting counterfeit samples as well as in related areas of forensic and biological sample analysis. PMID:22292496
Anthropometric geography applied to the analysis of socioeconomic disparities: cohort trends and spatial patterns of height and robustness in 20th-century Spain

PubMed Central

Camara, Antonio D.; Roman, Joan Garcia

2014-01-01

Anthropometrics have been widely used to study the influence of environmental factors on health and nutritional status. In contrast, anthropometric geography has not often been employed to approximate the dynamics of spatial disparities associated with socioeconomic and demographic changes. Spain exhibited intense disparity and change during the middle decades of the 20th century, with the result that the life courses of the corresponding cohorts were associated with diverse environmental conditions. This was also true of the Spanish territories. This paper presents insights concerning the relationship between socioeconomic changes and living conditions by combining the analysis of cohort trends and the anthropometric cartography of height and physical build. This analysis is conducted for Spanish male cohorts born 1934–1973 that were recorded in the Spanish military statistics. This information is interpreted in light of region-level data on GDP and infant mortality. Our results show an anthropometric convergence across regions that, nevertheless, did not substantially modify the spatial patterns of robustness, featuring primarily robust northeastern regions and weak Central-Southern regions. These patterns persisted until the 1990s (cohorts born during the 1970s). For the most part, anthropometric disparities were associated with socioeconomic disparities, although the former lessened over time to a greater extent than the latter. Interestingly, the various anthropometric indicators utilized here do not point to the same conclusions. Some discrepancies between height and robustness patterns have been found that moderate the statements from the analysis of cohort height alone regarding the level and evolution of living conditions across Spanish regions. PMID:26640422
Incorporation of support vector machines in the LIBS toolbox for sensitive and robust classification amidst unexpected sample and system variability.

PubMed

Dingari, Narahara Chari; Barman, Ishan; Myakalwar, Ashwin Kumar; Tewari, Surya P; Kumar Gundawar, Manoj

2012-03-20

Despite the intrinsic elemental analysis capability and lack of sample preparation requirements, laser-induced breakdown spectroscopy (LIBS) has not been extensively used for real-world applications, e.g., quality assurance and process monitoring. Specifically, variability in sample, system, and experimental parameters in LIBS studies presents a substantive hurdle for robust classification, even when standard multivariate chemometric techniques are used for analysis. Considering pharmaceutical sample investigation as an example, we propose the use of support vector machines (SVM) as a nonlinear classification method over conventional linear techniques such as soft independent modeling of class analogy (SIMCA) and partial least-squares discriminant analysis (PLS-DA) for discrimination based on LIBS measurements. Using over-the-counter pharmaceutical samples, we demonstrate that the application of SVM enables statistically significant improvements in prospective classification accuracy (sensitivity), because of its ability to address variability in LIBS sample ablation and plasma self-absorption behavior. Furthermore, our results reveal that SVM provides nearly 10% improvement in correct allocation rate and a concomitant reduction in misclassification rates of 75% (cf. PLS-DA) and 80% (cf. SIMCA), when measurements from samples not included in the training set are incorporated in the test data, highlighting its robustness. While further studies on a wider matrix of sample types performed using different LIBS systems are needed to fully characterize the capability of SVM to provide superior predictions, we anticipate that the improved sensitivity and robustness observed here will facilitate application of the proposed LIBS-SVM toolbox for screening drugs and detecting counterfeit samples, as well as in related areas of forensic and biological sample analysis.
New methods in hydrologic modeling and decision support for culvert flood risk under climate change

NASA Astrophysics Data System (ADS)

Rosner, A.; Letcher, B. H.; Vogel, R. M.; Rees, P. S.

2015-12-01

Assessing culvert flood vulnerability under climate change poses an unusual combination of challenges. We seek a robust method of planning for an uncertain future, and therefore must consider a wide range of plausible future conditions. Culverts in our case study area, northwestern Massachusetts, USA, are predominantly found in small, ungaged basins. The need to predict flows both at numerous sites and under numerous plausible climate conditions requires a statistical model with low data and computational requirements. We present a statistical streamflow model that is driven by precipitation and temperature, allowing us to predict flows without reliance on reference gages of observed flows. The hydrological analysis is used to determine each culvert's risk of failure under current conditions. We also explore the hydrological response to a range of plausible future climate conditions. These results are used to determine the tolerance of each culvert to future increases in precipitation. In a decision support context, current flood risk as well as tolerance to potential climate changes are used to provide a robust assessment and prioritization for culvert replacements.
Using high-resolution variant frequencies to empower clinical genome interpretation.

PubMed

Whiffin, Nicola; Minikel, Eric; Walsh, Roddy; O'Donnell-Luria, Anne H; Karczewski, Konrad; Ing, Alexander Y; Barton, Paul J R; Funke, Birgit; Cook, Stuart A; MacArthur, Daniel; Ware, James S

2017-10-01

Purpose: Whole-exome and whole-genome sequencing have transformed the discovery of genetic variants that cause human Mendelian disease, but discriminating pathogenic from benign variants remains a daunting challenge. Rarity is recognized as a necessary, although not sufficient, criterion for pathogenicity, but frequency cutoffs used in Mendelian analysis are often arbitrary and overly lenient. Recent very large reference datasets, such as the Exome Aggregation Consortium (ExAC), provide an unprecedented opportunity to obtain robust frequency estimates even for very rare variants. Methods: We present a statistical framework for the frequency-based filtering of candidate disease-causing variants, accounting for disease prevalence, genetic and allelic heterogeneity, inheritance mode, penetrance, and sampling variance in reference datasets. Results: Using the example of cardiomyopathy, we show that our approach reduces by two-thirds the number of candidate variants under consideration in the average exome, without removing true pathogenic variants (false-positive rate < 0.001). Conclusion: We outline a statistically robust framework for assessing whether a variant is "too common" to be causative for a Mendelian disorder of interest. We present precomputed allele frequency cutoffs for all variants in the ExAC dataset.
A reliability study on brain activation during active and passive arm movements supported by an MRI-compatible robot.

PubMed

Estévez, Natalia; Yu, Ningbo; Brügger, Mike; Villiger, Michael; Hepp-Reymond, Marie-Claude; Riener, Robert; Kollias, Spyros

2014-11-01

In neurorehabilitation, longitudinal assessment of arm movement related brain function in patients with motor disability is challenging due to variability in task performance. MRI-compatible robots monitor and control task performance, yielding more reliable evaluation of brain function over time. The main goals of the present study were first to define the brain network activated while performing active and passive elbow movements with an MRI-compatible arm robot (MaRIA) in healthy subjects, and second to test the reproducibility of this activation over time. For the fMRI analysis two models were compared. In model 1 movement onset and duration were included, whereas in model 2 force and range of motion were added to the analysis. Reliability of brain activation was tested with several statistical approaches applied on individual and group activation maps and on summary statistics. The activated network included mainly the primary motor cortex, primary and secondary somatosensory cortex, superior and inferior parietal cortex, medial and lateral premotor regions, and subcortical structures. Reliability analyses revealed robust activation for active movements with both fMRI models and all the statistical methods used. Imposed passive movements also elicited mainly robust brain activation for individual and group activation maps, and reliability was improved by including additional force and range of motion using model 2. These findings demonstrate that the use of robotic devices, such as MaRIA, can be useful to reliably assess arm movement related brain activation in longitudinal studies and may contribute in studies evaluating therapies and brain plasticity following injury in the nervous system.
Sensitivity Analyses of the Change in FVC in a Phase 3 Trial of Pirfenidone for Idiopathic Pulmonary Fibrosis.

PubMed

Lederer, David J; Bradford, Williamson Z; Fagan, Elizabeth A; Glaspole, Ian; Glassberg, Marilyn K; Glasscock, Kenneth F; Kardatzke, David; King, Talmadge E; Lancaster, Lisa H; Nathan, Steven D; Pereira, Carlos A; Sahn, Steven A; Swigris, Jeffrey J; Noble, Paul W

2015-07-01

FVC outcomes in clinical trials on idiopathic pulmonary fibrosis (IPF) can be substantially influenced by the analytic methodology and the handling of missing data. We conducted a series of sensitivity analyses to assess the robustness of the statistical finding and the stability of the estimate of the magnitude of treatment effect on the primary end point of FVC change in a phase 3 trial evaluating pirfenidone in adults with IPF. Source data included all 555 study participants randomized to treatment with pirfenidone or placebo in the Assessment of Pirfenidone to Confirm Efficacy and Safety in Idiopathic Pulmonary Fibrosis (ASCEND) study. Sensitivity analyses were conducted to assess whether alternative statistical tests and methods for handling missing data influenced the observed magnitude of treatment effect on the primary end point of change from baseline to week 52 in FVC. The distribution of FVC change at week 52 was systematically different between the two treatment groups and favored pirfenidone in each analysis. The method used to impute missing data due to death had a marked effect on the magnitude of change in FVC in both treatment groups; however, the magnitude of treatment benefit was generally consistent on a relative basis, with an approximate 50% reduction in FVC decline observed in the pirfenidone group in each analysis. Our results confirm the robustness of the statistical finding on the primary end point of change in FVC in the ASCEND trial and corroborate the estimated magnitude of the pirfenidone treatment effect in patients with IPF. ClinicalTrials.gov; No.: NCT01366209; URL: www.clinicaltrials.gov.
Robust hierarchical state-space models reveal diel variation in travel rates of migrating leatherback turtles.

PubMed

Jonsen, Ian D; Myers, Ransom A; James, Michael C

2006-09-01

1. Biological and statistical complexity are features common to most ecological data that hinder our ability to extract meaningful patterns using conventional tools. Recent work on implementing modern statistical methods for analysis of such ecological data has focused primarily on population dynamics but other types of data, such as animal movement pathways obtained from satellite telemetry, can also benefit from the application of modern statistical tools. 2. We develop a robust hierarchical state-space approach for analysis of multiple satellite telemetry pathways obtained via the Argos system. State-space models are time-series methods that allow unobserved states and biological parameters to be estimated from data observed with error. We show that the approach can reveal important patterns in complex, noisy data where conventional methods cannot. 3. Using the largest Atlantic satellite telemetry data set for critically endangered leatherback turtles, we show that the diel pattern in travel rates of these turtles changes over different phases of their migratory cycle. While foraging in northern waters the turtles show similar travel rates during day and night, but on their southward migration to tropical waters travel rates are markedly faster during the day. These patterns are generally consistent with diving data, and may be related to changes in foraging behaviour. Interestingly, individuals that migrate southward to breed generally show higher daytime travel rates than individuals that migrate southward in a non-breeding year. 4. Our approach is extremely flexible and can be applied to many ecological analyses that use complex, sequential data.
Least median of squares and iteratively re-weighted least squares as robust linear regression methods for fluorimetric determination of α-lipoic acid in capsules in ideal and non-ideal cases of linearity.

PubMed

Korany, Mohamed A; Gazy, Azza A; Khamis, Essam F; Ragab, Marwa A A; Kamal, Miranda F

2018-06-01

This study outlines two robust regression approaches, namely least median of squares (LMS) and iteratively re-weighted least squares (IRLS) to investigate their application in instrument analysis of nutraceuticals (that is, fluorescence quenching of merbromin reagent upon lipoic acid addition). These robust regression methods were used to calculate calibration data from the fluorescence quenching reaction (∆F and F-ratio) under ideal or non-ideal linearity conditions. For each condition, data were treated using three regression fittings: Ordinary Least Squares (OLS), LMS and IRLS. Assessment of linearity, limits of detection (LOD) and quantitation (LOQ), accuracy and precision were carefully studied for each condition. LMS and IRLS regression line fittings showed significant improvement in correlation coefficients and all regression parameters for both methods and both conditions. In the ideal linearity condition, the intercept and slope changed insignificantly, but a dramatic change was observed for the non-ideal condition and linearity intercept. Under both linearity conditions, LOD and LOQ values after the robust regression line fitting of data were lower than those obtained before data treatment. The results obtained after statistical treatment indicated that the linearity ranges for drug determination could be expanded to lower limits of quantitation by enhancing the regression equation parameters after data treatment. Analysis results for lipoic acid in capsules, using both fluorimetric methods, treated by parametric OLS and after treatment by robust LMS and IRLS were compared for both linearity conditions. Copyright © 2018 John Wiley & Sons, Ltd.
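To make the IRLS idea concrete, here is a minimal sketch of iteratively re-weighted least squares for a straight calibration line. The abstract does not say which weight function was used, so the Huber weights, the tuning constant 1.345, the MAD-based residual scale, and the function name `irls_line` are all assumptions made for illustration; LMS is not shown.

```python
import numpy as np

def irls_line(x, y, c=1.345, n_iter=100, tol=1e-10):
    """Fit y = a + b*x by IRLS with Huber weights. Returns (a, b)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # start from the OLS fit
    for _ in range(n_iter):
        r = y - X @ beta
        # robust residual scale: normalized median absolute deviation
        s = 1.4826 * np.median(np.abs(r - np.median(r)))
        if s == 0:
            break  # most points already lie exactly on the line
        # Huber weights: 1 for small residuals, c*s/|r| for large ones
        w = np.minimum(1.0, c * s / np.maximum(np.abs(r), 1e-12))
        sw = np.sqrt(w)
        new_beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return float(beta[0]), float(beta[1])
```

Each pass downweights points with large residuals and refits, so a single aberrant calibration point has much less leverage on the slope and intercept than it does under OLS.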
On robust parameter estimation in brain-computer interfacing

NASA Astrophysics Data System (ADS)

Samek, Wojciech; Nakajima, Shinichi; Kawanabe, Motoaki; Müller, Klaus-Robert

2017-12-01

Objective. The reliable estimation of parameters such as mean or covariance matrix from noisy and high-dimensional observations is a prerequisite for successful application of signal processing and machine learning algorithms in brain-computer interfacing (BCI). This challenging task becomes significantly more difficult if the data set contains outliers, e.g. due to subject movements, eye blinks or loose electrodes, as they may heavily bias the estimation and the subsequent statistical analysis. Although various robust estimators have been developed to tackle the outlier problem, they ignore important structural information in the data and thus may not be optimal. Typical structural elements in BCI data are the trials consisting of a few hundred EEG samples and indicating the start and end of a task. Approach. This work discusses the parameter estimation problem in BCI and introduces a novel hierarchical view on robustness which naturally comprises different types of outlierness occurring in structured data. Furthermore, the class of minimum divergence estimators is reviewed and a robust mean and covariance estimator for structured data is derived and evaluated with simulations and on a benchmark data set. Main results. The results show that state-of-the-art BCI algorithms benefit from robustly estimated parameters. Significance. Since parameter estimation is an integral part of various machine learning algorithms, the presented techniques are applicable to many problems beyond BCI.
Comment on "Ducklings imprint on the relational concept of 'same or different'".

PubMed

Hupé, Jean-Michel

2017-02-24

Martinho and Kacelnik's (Reports, 15 July 2016, p. 286) finding that mallard ducklings can deal with abstract concepts is important for understanding the evolution of cognition. However, a statistically more robust analysis of the data calls their conclusions into question. This example brings to light the risk of drawing too strong an inference by relying solely on P values. Copyright © 2017, American Association for the Advancement of Science.
Alignments of parity even/odd-only multipoles in CMB

NASA Astrophysics Data System (ADS)

Aluri, Pavan K.; Ralston, John P.; Weltman, Amanda

2017-12-01

We compare the statistics of parity even and odd multipoles of the cosmic microwave background (CMB) sky from Planck full mission temperature measurements. An excess power in odd multipoles compared to even multipoles has previously been found on large angular scales. Motivated by this apparent parity asymmetry, we evaluate directional statistics associated with even compared to odd multipoles, along with their significances. Primary tools are the Power tensor and Alignment tensor statistics. We limit our analysis to the first 60 multipoles i.e. l = [2, 61]. We find no evidence for statistically unusual alignments of even parity multipoles. More than one independent statistic finds evidence for alignments of anisotropy axes of odd multipoles, with a significance equivalent to ∼2σ or more. The robustness of alignment axes is tested by making Galactic cuts and varying the multipole range. Very interestingly, the region spanned by the (a)symmetry axes is found to broadly contain other parity (a)symmetry axes previously observed in the literature.
Simulated performance of an order statistic threshold strategy for detection of narrowband signals

NASA Technical Reports Server (NTRS)

Satorius, E.; Brady, R.; Deich, W.; Gulkis, S.; Olsen, E.

1988-01-01

The application of order statistics to signal detection is becoming an increasingly active area of research. This is due to the inherent robustness of rank estimators in the presence of large outliers that would significantly degrade more conventional mean-level-based detection systems. A detection strategy is presented in which the threshold estimate is obtained using order statistics. The performance of this algorithm in the presence of simulated interference and broadband noise is evaluated. In this way, the robustness of the proposed strategy in the presence of the interference can be fully assessed as a function of the interference, noise, and detector parameters.
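The contrast between an order-statistic threshold and a mean-level threshold can be sketched in a few lines. The abstract does not specify which order statistic or scaling the authors used, so the choice of the median (quantile 0.5), the multiplier `factor`, and the function name `order_statistic_detect` are assumptions for illustration only.

```python
import numpy as np

def order_statistic_detect(power, factor=5.0, quantile=0.5):
    """Flag spectral bins whose power exceeds factor * (an order statistic).

    The noise level is estimated from a quantile of the bin powers (median by
    default), so a few very strong bins barely move the threshold.
    Returns (indices of detected bins, threshold).
    """
    power = np.asarray(power, dtype=float)
    noise_level = np.quantile(power, quantile)
    threshold = factor * noise_level
    return np.flatnonzero(power > threshold), threshold
```

When strong interference occupies a minority of bins, the mean of the spectrum is inflated and a mean-based threshold can hide a weak narrowband signal, whereas the median-based threshold stays near the broadband noise level.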
Employing Sensitivity Derivatives for Robust Optimization under Uncertainty in CFD

NASA Technical Reports Server (NTRS)

Newman, Perry A.; Putko, Michele M.; Taylor, Arthur C., III

2004-01-01

A robust optimization is demonstrated on a two-dimensional inviscid airfoil problem in subsonic flow. Given uncertainties in statistically independent, random, normally distributed flow parameters (input variables), an approximate first-order statistical moment method is employed to represent the Computational Fluid Dynamics (CFD) code outputs as expected values with variances. These output quantities are used to form the objective function and constraints. The constraints are cast in probabilistic terms; that is, the probability that a constraint is satisfied is greater than or equal to some desired target probability. Gradient-based robust optimization of this stochastic problem is accomplished through use of both first and second-order sensitivity derivatives. For each robust optimization, the effects of increasing both the input standard deviations and the target probability of constraint satisfaction are demonstrated. This method provides a means for incorporating uncertainty when considering small deviations from input mean values.
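A first-order moment approximation of the kind named above can be sketched in a few lines. The sketch assumes independent normal inputs, as the abstract states, and uses central finite differences for the derivatives; that is an assumption of this example, since the record obtains sensitivity derivatives from the CFD code itself. All function names are hypothetical.

```python
import math


def first_order_moments(f, means, sigmas, rel_step=1e-6):
    """Approximate the mean and variance of f(x) for independent inputs.

    First-order Taylor expansion about the input means:
        E[f]   ~ f(mu)
        Var[f] ~ sum_i (df/dx_i)^2 * sigma_i^2
    """
    mean_out = f(means)
    var_out = 0.0
    for i, (mu, sigma) in enumerate(zip(means, sigmas)):
        h = rel_step * max(abs(mu), 1.0)
        up = list(means)
        dn = list(means)
        up[i] = mu + h
        dn[i] = mu - h
        dfdx = (f(up) - f(dn)) / (2.0 * h)  # central difference
        var_out += (dfdx * sigma) ** 2
    return mean_out, var_out


def probabilistic_constraint_ok(g_mean, g_var, k):
    """Check a constraint g <= 0 with a margin of k standard deviations.

    Treating g as normal, k sets the target probability of satisfaction
    (for example, k near 1.645 corresponds to about 95%).
    """
    return g_mean + k * math.sqrt(g_var) <= 0.0


# Toy output function standing in for a CFD output (purely illustrative).
out_mean, out_var = first_order_moments(
    lambda x: 2.0 * x[0] + 3.0 * x[1], [1.0, 2.0], [0.1, 0.2]
)
```

Because the expansion is first order, the approximation is only trustworthy for small input deviations, which matches the closing remark of the abstract.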
Ultra-trace analysis of 41Ca in urine by accelerator mass spectrometry: an inter-laboratory comparison

PubMed Central

Jackson, George S.; Hillegonds, Darren J.; Muzikar, Paul; Goehring, Brent

2013-01-01

A 41Ca interlaboratory comparison between Lawrence Livermore National Laboratory (LLNL) and the Purdue Rare Isotope Laboratory (PRIME Lab) has been completed. Analysis of the ratios assayed by accelerator mass spectrometry (AMS) shows that there is no statistically significant difference in the ratios. Further, Bayesian analysis shows that the uncertainties reported by both facilities are correct with the possibility of a slight under-estimation by one laboratory. Finally, the chemistry procedures used by the two facilities to produce CaF2 for the cesium sputter ion source are robust and don't yield any significant differences in the final result. PMID:24179312

Evaluation of peak-picking algorithms for protein mass spectrometry.

PubMed

Bauer, Chris; Cramer, Rainer; Schuchhardt, Johannes

2011-01-01

Peak picking is an early key step in MS data analysis. We compare three commonly used approaches to peak picking and discuss their merits by means of statistical analysis. Methods investigated encompass signal-to-noise ratio, continuous wavelet transform, and a correlation-based approach using a Gaussian template. Functionality of the three methods is illustrated and discussed in a practical context using a mass spectral data set created with MALDI-TOF technology. Sensitivity and specificity are investigated using a manually defined reference set of peaks. As an additional criterion, the robustness of the three methods is assessed by a perturbation analysis and illustrated using ROC curves.
Quick Overview Scout 2008 Version 1.0

EPA Science Inventory

The Scout 2008 version 1.0 statistical software package has been updated from past DOS and Windows versions to provide classical and robust univariate and multivariate graphical and statistical methods that are not typically available in commercial or freeware statistical softwar...
Analysis of the dependence of extreme rainfalls

NASA Astrophysics Data System (ADS)

Padoan, Simone; Ancey, Christophe; Parlange, Marc

2010-05-01

The aim of spatial analysis is to quantitatively describe the behavior of environmental phenomena such as precipitation levels, wind speed or daily temperatures. A number of generic approaches to spatial modeling have been developed [1], but these are not necessarily ideal for handling extremal aspects given their focus on mean process levels. The areal modelling of the extremes of a natural process observed at points in space is important in environmental statistics; for example, understanding extremal spatial rainfall is crucial in flood protection. In light of recent concerns over climate change, the use of robust mathematical and statistical methods for such analyses has grown in importance. Multivariate extreme value models and the class of max-stable processes [2] have a similar asymptotic motivation to the univariate Generalized Extreme Value (GEV) distribution, but provide a general approach to modeling extreme processes incorporating temporal or spatial dependence. Statistical methods for max-stable processes and data analyses of practical problems are discussed by [3] and [4]. This work illustrates methods for the statistical modelling of spatial extremes and gives examples of their use by means of a real extremal data analysis of Switzerland precipitation levels. [1] Cressie, N. A. C. (1993). Statistics for Spatial Data. Wiley, New York. [2] de Haan, L. and Ferreira, A. (2006). Extreme Value Theory: An Introduction. Springer, USA. [3] Padoan, S. A., Ribatet, M. and Sisson, S. A. (2009). Likelihood-Based Inference for Max-Stable Processes. Journal of the American Statistical Association, Theory & Methods. In press. [4] Davison, A. C. and Gholamrezaee, M. (2009). Geostatistics of extremes. Journal of the Royal Statistical Society, Series B. To appear.
A novel methodology for building robust design rules by using design based metrology (DBM)

NASA Astrophysics Data System (ADS)

Lee, Myeongdong; Choi, Seiryung; Choi, Jinwoo; Kim, Jeahyun; Sung, Hyunju; Yeo, Hyunyoung; Shim, Myoungseob; Jin, Gyoyoung; Chung, Eunseung; Roh, Yonghan

2013-03-01

This paper addresses a methodology for building robust design rules by using design based metrology (DBM). The conventional method for building design rules has relied on a simulation tool and a simple pattern spider mask. At the early stage of a device, the estimates of the simulation tool are poor, and the evaluation of the simple pattern spider mask is rather subjective because it depends on the experiential judgment of an engineer. In this work, we designed a huge number of pattern situations including various 1D and 2D design structures. In order to overcome the difficulties of inspecting many types of patterns, we introduced Design Based Metrology (DBM) of Nano Geometry Research, Inc., with which those mass patterns could be inspected at a fast speed. We also carried out quantitative analysis on PWQ silicon data to estimate process variability. Our methodology demonstrates high speed and accuracy for building design rules. All of the test patterns were inspected within a few hours. Mass silicon data were handled by statistical processing rather than personal judgment. From the results, robust design rules are successfully verified and extracted. Finally, we found that our methodology is appropriate for building robust design rules.
Statistical primer: how to deal with missing data in scientific research?

PubMed

Papageorgiou, Grigorios; Grant, Stuart W; Takkenberg, Johanna J M; Mokhles, Mostafa M

2018-05-10

Missing data are a common challenge encountered in research which can compromise the results of statistical inference when not handled appropriately. This paper aims to introduce basic concepts of missing data to a non-statistical audience, list and compare some of the most popular approaches for handling missing data in practice and provide guidelines and recommendations for dealing with and reporting missing data in scientific research. Complete case analysis and single imputation are simple approaches for handling missing data and are popular in practice; however, in most cases they are not guaranteed to provide valid inferences. Multiple imputation is a robust and general alternative which is appropriate for data missing at random, surpassing the disadvantages of the simpler approaches, but should always be conducted with care. The aforementioned approaches are illustrated and compared in an example application using Cox regression.
Robust tissue-air volume segmentation of MR images based on the statistics of phase and magnitude: Its applications in the display of susceptibility-weighted imaging of the brain.

PubMed

Du, Yiping P; Jin, Zhaoyang

2009-10-01

To develop a robust algorithm for tissue-air segmentation in magnetic resonance imaging (MRI) using the statistics of phase and magnitude of the images. A multivariate measure based on the statistics of phase and magnitude was constructed for tissue-air volume segmentation. The standard deviation of first-order phase difference and the standard deviation of magnitude were calculated in a 3 x 3 x 3 kernel in the image domain. To improve differentiation accuracy, the uniformity of phase distribution in the kernel was also calculated and linear background phase introduced by field inhomogeneity was corrected. The effectiveness of the proposed volume segmentation technique was compared to a conventional approach that uses the magnitude data alone. The proposed algorithm was shown to be more effective and robust in volume segmentation in both synthetic phantom and susceptibility-weighted images of human brain. Using our proposed volume segmentation method, veins in the peripheral regions of the brain were well depicted in the minimum-intensity projection of the susceptibility-weighted images. Using the additional statistics of phase, tissue-air volume segmentation can be substantially improved compared to that using the statistics of magnitude data alone. (c) 2009 Wiley-Liss, Inc.
Algorithm for computing descriptive statistics for very large data sets and the exa-scale era

NASA Astrophysics Data System (ADS)

Beekman, Izaak

2017-11-01

An algorithm for Single-point, Parallel, Online, Converging Statistics (SPOCS) is presented. It is suited for in situ analysis that traditionally would be relegated to post-processing, and can be used to monitor the statistical convergence and estimate the error/residual in the quantity, which is also useful for uncertainty quantification. Today, data may be generated at an overwhelming rate by numerical simulations and proliferating sensing apparatuses in experiments and engineering applications. Monitoring descriptive statistics in real time lets costly computations and experiments be gracefully aborted if an error has occurred, and monitoring the level of statistical convergence allows them to be run for the shortest amount of time required to obtain good results. This algorithm extends work by Pébay (Sandia Report SAND2008-6212). Pébay's algorithms are recast into a converging delta formulation, with provably favorable properties. The mean, variance, covariances and arbitrary higher order statistical moments are computed in one pass. The algorithm is tested using Sillero, Jiménez, & Moser's (2013, 2014) publicly available UPM high Reynolds number turbulent boundary layer data set, demonstrating numerical robustness, efficiency and other favorable properties.
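The general idea of one-pass, mergeable statistics can be illustrated with the well-known Welford update and the pairwise combination formula for partial results. This is a sketch of that standard technique only; it is not the SPOCS converging delta formulation, and the class and method names are assumptions made for the example.

```python
class RunningStats:
    """One-pass mean and variance with support for merging partial results."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def push(self, x):
        """Fold one new sample into the running statistics (Welford update)."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other):
        """Combine two partial results, e.g. from two parallel workers."""
        merged = RunningStats()
        merged.n = self.n + other.n
        if merged.n == 0:
            return merged
        delta = other.mean - self.mean
        merged.mean = self.mean + delta * other.n / merged.n
        merged.m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / merged.n
        return merged

    def variance(self):
        """Unbiased sample variance; 0.0 when fewer than two samples."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


# Two workers each see part of the stream; their results are then merged.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
left, right = RunningStats(), RunningStats()
for x in data[:3]:
    left.push(x)
for x in data[3:]:
    right.push(x)
combined = left.merge(right)
```

The merged result matches what a single pass over all samples would give, which is what makes this style of update usable in parallel and in situ settings.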
MetaGenyo: a web tool for meta-analysis of genetic association studies.

PubMed

Martorell-Marugan, Jordi; Toro-Dominguez, Daniel; Alarcon-Riquelme, Marta E; Carmona-Saez, Pedro

2017-12-16

Genetic association studies (GAS) aim to evaluate the association between genetic variants and phenotypes. In the last few years, the number of studies of this type has increased exponentially, but the results are not always reproducible due to experimental designs, low sample sizes and other methodological errors. In this field, meta-analysis techniques are becoming very popular tools to combine results across studies to increase statistical power and to resolve discrepancies in genetic association studies. A meta-analysis summarizes research findings, increases statistical power and enables the identification of genuine associations between genotypes and phenotypes. Meta-analysis techniques are increasingly used in GAS, but the number of published meta-analyses containing errors is also increasing. Although there are several software packages that implement meta-analysis, none of them are specifically designed for genetic association studies and in most cases their use requires advanced programming or scripting expertise. We have developed MetaGenyo, a web tool for meta-analysis in GAS. MetaGenyo implements a complete and comprehensive workflow that can be executed in an easy-to-use environment without programming knowledge. MetaGenyo has been developed to guide users through the main steps of a GAS meta-analysis, covering Hardy-Weinberg test, statistical association for different genetic models, analysis of heterogeneity, testing for publication bias, subgroup analysis and robustness testing of the results. MetaGenyo is a useful tool to conduct comprehensive genetic association meta-analysis. The application is freely available at http://bioinfo.genyo.es/metagenyo/ .
Development of a universal measure of quadrupedal forelimb-hindlimb coordination using digital motion capture and computerised analysis.

PubMed

Hamilton, Lindsay; Franklin, Robin J M; Jeffery, Nick D

2007-09-18

Clinical spinal cord injury in domestic dogs provides a model population in which to test the efficacy of putative therapeutic interventions for human spinal cord injury. To achieve this potential a robust method of functional analysis is required so that statistical comparison of numerical data derived from treated and control animals can be achieved. In this study we describe the use of digital motion capture equipment combined with mathematical analysis to derive a simple quantitative parameter - 'the mean diagonal coupling interval' - to describe coordination between forelimb and hindlimb movement. In normal dogs this parameter is independent of size, conformation, speed of walking or gait pattern. We show here that mean diagonal coupling interval is highly sensitive to alterations in forelimb-hindlimb coordination in dogs that have suffered spinal cord injury, and can be accurately quantified, but is unaffected by orthopaedic perturbations of gait. Mean diagonal coupling interval is an easily derived, highly robust measurement that provides an ideal method to compare the functional effect of therapeutic interventions after spinal cord injury in quadrupeds.
Standard cell electrical and physical variability analysis based on automatic physical measurement for design-for-manufacturing purposes

NASA Astrophysics Data System (ADS)

Shauly, Eitan; Parag, Allon; Khmaisy, Hafez; Krispil, Uri; Adan, Ofer; Levi, Shimon; Latinski, Sergey; Schwarzband, Ishai; Rotstein, Israel

2011-04-01

A fully automated system for process variability analysis of high density standard cells was developed. The system consists of layout analysis with device mapping: device type, location, configuration and more. The mapping step was created by a simple DRC run-set. This database was then used as an input for choosing locations for SEM images and for specific layout parameter extraction, used by SPICE simulation. This method was used to analyze large arrays of standard cell blocks, manufactured using Tower TS013LV (Low Voltage for high-speed applications) Platforms. Variability of different physical parameters such as Lgate, line-width roughness and more, as well as of electrical parameters such as drive current (Ion) and off current (Ioff), was calculated and statistically analyzed in order to understand the variability root cause. Comparison between transistors having the same W/L but with different layout configurations and different layout environments (around the transistor) was made in terms of performance as well as process variability. We successfully defined "robust" and "less-robust" transistor configurations, and updated guidelines for Design-for-Manufacturing (DfM).
Regional flux analysis for discovering and quantifying anatomical changes: An application to the brain morphometry in Alzheimer's disease.

PubMed

Lorenzi, M; Ayache, N; Pennec, X

2015-07-15

In this study we introduce the regional flux analysis, a novel approach to deformation based morphometry based on the Helmholtz decomposition of deformations parameterized by stationary velocity fields. We use the scalar pressure map associated to the irrotational component of the deformation to discover the critical regions of volume change. These regions are used to consistently quantify the associated measure of volume change by the probabilistic integration of the flux of the longitudinal deformations across the boundaries. The presented framework unifies voxel-based and regional approaches, and robustly describes the volume changes at both group-wise and subject-specific level as a spatial process governed by consistently defined regions. Our experiments on the large cohorts of the ADNI dataset show that the regional flux analysis is a powerful and flexible instrument for the study of Alzheimer's disease in a wide range of scenarios: cross-sectional deformation based morphometry, longitudinal discovery and quantification of group-wise volume changes, and statistically powered and robust quantification of hippocampal and ventricular atrophy. Copyright © 2015 Elsevier Inc. All rights reserved.
How Reliable is Bayesian Model Averaging Under Noisy Data? Statistical Assessment and Implications for Robust Model Selection

NASA Astrophysics Data System (ADS)

Schöniger, Anneli; Wöhling, Thomas; Nowak, Wolfgang

2014-05-01

Bayesian model averaging ranks the predictive capabilities of alternative conceptual models based on Bayes' theorem. The individual models are weighted with their posterior probability to be the best one in the considered set of models. Finally, their predictions are combined into a robust weighted average and the predictive uncertainty can be quantified. This rigorous procedure does not, however, yet account for possible instabilities due to measurement noise in the calibration data set. This is a major drawback, since posterior model weights may suffer a lack of robustness related to the uncertainty in noisy data, which may compromise the reliability of model ranking. We present a new statistical concept to account for measurement noise as source of uncertainty for the weights in Bayesian model averaging. Our suggested upgrade reflects the limited information content of data for the purpose of model selection. It allows us to assess the significance of the determined posterior model weights, the confidence in model selection, and the accuracy of the quantified predictive uncertainty. Our approach rests on a brute-force Monte Carlo framework. We determine the robustness of model weights against measurement noise by repeatedly perturbing the observed data with random realizations of measurement error. Then, we analyze the induced variability in posterior model weights and introduce this "weighting variance" as an additional term into the overall prediction uncertainty analysis scheme. We further determine the theoretical upper limit in performance of the model set which is imposed by measurement noise. As an extension to the merely relative model ranking, this analysis provides a measure of absolute model performance. To finally decide whether better data or longer time series are needed to ensure a robust basis for model selection, we resample the measurement time series and assess the convergence of model weights for increasing time series length.
We illustrate our suggested approach with an application to model selection between different soil-plant models following up on a study by Wöhling et al. (2013). Results show that measurement noise compromises the reliability of model ranking and causes a significant amount of weighting uncertainty, if the calibration data time series is not long enough to compensate for its noisiness. This additional contribution to the overall predictive uncertainty is neglected without our approach. Thus, we strongly recommend including our suggested upgrade in the Bayesian model averaging routine.
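The perturbation step described above can be sketched for a deliberately simplified case. The sketch assumes equal prior model probabilities, independent Gaussian measurement errors with a known standard deviation, and models with fixed predictions (so the likelihood stands in for the Bayesian model evidence). The function names are hypothetical and this is not the authors' implementation.

```python
import math
import random


def bma_weights(obs, predictions, sigma):
    """Posterior model weights under equal priors and Gaussian errors."""
    log_liks = []
    for pred in predictions:
        sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
        log_liks.append(-0.5 * sse / sigma ** 2)
    top = max(log_liks)  # subtract the maximum for numerical stability
    unnorm = [math.exp(ll - top) for ll in log_liks]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def weighting_variance(obs, predictions, sigma, n_draws=500, seed=0):
    """Perturb the data with random noise and track how the weights vary."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        noisy = [o + rng.gauss(0.0, sigma) for o in obs]
        draws.append(bma_weights(noisy, predictions, sigma))
    n_models = len(predictions)
    means = [sum(d[k] for d in draws) / n_draws for k in range(n_models)]
    variances = [
        sum((d[k] - means[k]) ** 2 for d in draws) / (n_draws - 1)
        for k in range(n_models)
    ]
    return means, variances


# Two toy models evaluated against three observations (illustrative values).
observations = [1.0, 2.0, 3.0]
model_predictions = [[1.0, 2.0, 3.0], [1.5, 2.5, 3.5]]
weights = bma_weights(observations, model_predictions, sigma=0.5)
mean_w, var_w = weighting_variance(observations, model_predictions, sigma=0.5)
```

A large variance for a model's weight signals that the ranking could change under a different realization of measurement noise, which is the instability the abstract is concerned with.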
Robust Combining of Disparate Classifiers Through Order Statistics

NASA Technical Reports Server (NTRS)

Tumer, Kagan; Ghosh, Joydeep

2001-01-01

Integrating the outputs of multiple classifiers via combiners or meta-learners has led to substantial improvements in several difficult pattern recognition problems. In this article we investigate a family of combiners based on order statistics, for robust handling of situations where there are large discrepancies in performance of individual classifiers. Based on a mathematical modeling of how the decision boundaries are affected by order statistic combiners, we derive expressions for the reductions in error expected when simple output combination methods based on the median, the maximum and, in general, the ith order statistic are used. Furthermore, we analyze the trim and spread combiners, both based on linear combinations of the ordered classifier outputs, and show that in the presence of uneven classifier performance, they often provide substantial gains over both linear and simple order statistics combiners. Experimental results on both real world data and standard public domain data sets corroborate these findings.
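A minimal sketch of an order statistic combiner follows. It assumes each classifier returns one score per class and that the combiner takes, for every class, the ith smallest score across classifiers. The function names and the toy scores are illustrative only.

```python
def order_statistic_combine(outputs, i):
    """Combine per-class scores with the ith order statistic.

    outputs: one list of class scores per classifier.
    i: 1-based rank in ascending order (i = number of classifiers is the maximum).
    """
    n_classes = len(outputs[0])
    combined = []
    for c in range(n_classes):
        ranked = sorted(clf[c] for clf in outputs)
        combined.append(ranked[i - 1])
    return combined


def average_combine(outputs):
    """Simple linear (mean) combiner, shown for comparison."""
    n = len(outputs)
    return [sum(clf[c] for clf in outputs) / n for c in range(len(outputs[0]))]


def predict(scores):
    """Index of the class with the highest combined score."""
    return max(range(len(scores)), key=lambda c: scores[c])


# Three classifiers, two classes; the third classifier disagrees strongly.
outputs = [[0.70, 0.30], [0.60, 0.40], [0.05, 0.95]]
median_scores = order_statistic_combine(outputs, 2)  # middle of three = median
max_scores = order_statistic_combine(outputs, 3)
median_class = predict(median_scores)
average_class = predict(average_combine(outputs))
```

With these toy scores the median combiner follows the two classifiers that agree, whereas the average is pulled toward the single dissenting classifier.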
Open Source Tools for Seismicity Analysis

NASA Astrophysics Data System (ADS)

Powers, P.

2010-12-01

The spatio-temporal analysis of seismicity plays an important role in earthquake forecasting and is integral to research on earthquake interactions and triggering. For instance, the third version of the Uniform California Earthquake Rupture Forecast (UCERF), currently under development, will use Epidemic Type Aftershock Sequences (ETAS) as a model for earthquake triggering. UCERF will be a "living" model and therefore requires robust, tested, and well-documented ETAS algorithms to ensure transparency and reproducibility. Likewise, as earthquake aftershock sequences unfold, real-time access to high quality hypocenter data makes it possible to monitor the temporal variability of statistical properties such as the parameters of the Omori Law and the Gutenberg-Richter b-value. Such statistical properties are valuable as they provide a measure of how much a particular sequence deviates from expected behavior and can be used when assigning probabilities of aftershock occurrence. To address these demands and provide public access to standard methods employed in statistical seismology, we present well-documented, open-source JavaScript and Java software libraries for the on- and off-line analysis of seismicity. The JavaScript classes facilitate web-based asynchronous access to earthquake catalog data and provide a framework for in-browser display, analysis, and manipulation of catalog statistics; implementations of this framework will be made available on the USGS Earthquake Hazards website. The Java classes, in addition to providing tools for seismicity analysis, provide tools for modeling seismicity and generating synthetic catalogs. These tools are extensible and will be released as part of the open-source OpenSHA Commons library.
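As an illustration of one such catalog statistic, the b-value is often estimated with the Aki maximum-likelihood formula, b = log10(e) / (mean magnitude - completeness magnitude), with a half-bin correction for binned magnitudes. The record does not say which estimator the libraries use, so the choice of estimator, the function name and the parameters below are assumptions. The sketch is written in Python rather than the JavaScript or Java of the libraries described.

```python
import math


def b_value_aki(magnitudes, completeness_mag, bin_width=0.0):
    """Maximum-likelihood b-value estimate (Aki formula).

    Only events at or above the completeness magnitude are used.
    bin_width applies the usual half-bin correction for binned catalogs.
    """
    above = [m for m in magnitudes if m >= completeness_mag]
    if not above:
        raise ValueError("no events at or above the completeness magnitude")
    mean_mag = sum(above) / len(above)
    denom = mean_mag - (completeness_mag - bin_width / 2.0)
    if denom <= 0.0:
        raise ValueError("mean magnitude must exceed the completeness magnitude")
    return math.log10(math.e) / denom


# Toy catalog: the event below the completeness magnitude of 2.0 is ignored.
offset = 2.0 * math.log10(math.e)
catalog = [1.0, 2.0, 2.0 + offset]
b_estimate = b_value_aki(catalog, completeness_mag=2.0)
```

Recomputing such an estimate over a sliding window of recent events is one simple way to monitor how the statistic changes as a sequence unfolds.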
Evolutionary computing for the design search and optimization of space vehicle power subsystems

NASA Technical Reports Server (NTRS)

Kordon, Mark; Klimeck, Gerhard; Hanks, David; Hua, Hook

2004-01-01

Evolutionary computing has proven to be a straightforward and robust approach for optimizing a wide range of difficult analysis and design problems. This paper discusses the application of these techniques to an existing space vehicle power subsystem resource and performance analysis simulation in a parallel processing environment. Our preliminary results demonstrate that this approach has the potential to improve the space system trade study process by allowing engineers to statistically weight subsystem goals of mass, cost and performance and then automatically size power elements based on anticipated performance of the subsystem rather than on worst-case estimates.
A generalized association test based on U statistics.

PubMed

Wei, Changshuai; Lu, Qing

2017-07-01

Second generation sequencing technologies are being increasingly used for genetic association studies, where the main research interest is to identify sets of genetic variants that contribute to various phenotypes. The phenotype can be univariate disease status, multivariate responses and even high-dimensional outcomes. Considering the genotype and phenotype as two complex objects, this also poses a general statistical problem of testing association between complex objects. We here proposed a similarity-based test, generalized similarity U (GSU), that can test the association between complex objects. We first studied the theoretical properties of the test in a general setting and then focused on the application of the test to sequencing association studies. Based on theoretical analysis, we proposed to use Laplacian Kernel-based similarity for GSU to boost power and enhance robustness. Through simulation, we found that GSU did have advantages over existing methods in terms of power and robustness. We further performed a whole genome sequencing (WGS) scan for Alzheimer's Disease Neuroimaging Initiative data, identifying three genes, APOE, APOC1 and TOMM40, associated with imaging phenotype. We developed a C++ package for analysis of WGS data using GSU. The source code can be downloaded at https://github.com/changshuaiwei/gsu. Contact: weichangshuai@gmail.com; qlu@epi.msu.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
A robust bayesian estimate of the concordance correlation coefficient.

PubMed

Feng, Dai; Baumgartner, Richard; Svetnik, Vladimir

2015-01-01

A need for assessment of agreement arises in many situations including statistical biomarker qualification or assay or method validation. Concordance correlation coefficient (CCC) is one of the most popular scaled indices reported in evaluation of agreement. Robust methods for CCC estimation currently present an important statistical challenge. Here, we propose a novel Bayesian method of robust estimation of CCC based on multivariate Student's t-distribution and compare it with its alternatives. Furthermore, we extend the method to practically relevant settings, enabling incorporation of confounding covariates and replications. The superiority of the new approach is demonstrated using simulation as well as real datasets from biomarker application in electroencephalography (EEG). This biomarker is relevant in neuroscience for development of treatments for insomnia.
A Statistics-based Platform for Quantitative N-terminome Analysis and Identification of Protease Cleavage Products

PubMed Central

auf dem Keller, Ulrich; Prudova, Anna; Gioia, Magda; Butler, Georgina S.; Overall, Christopher M.

2010-01-01

Terminal amine isotopic labeling of substrates (TAILS), our recently introduced platform for quantitative N-terminome analysis, enables wide dynamic range identification of original mature protein N-termini and protease cleavage products. Modifying TAILS by use of isobaric tag for relative and absolute quantification (iTRAQ)-like labels for quantification together with a robust statistical classifier derived from experimental protease cleavage data, we report reliable and statistically valid identification of proteolytic events in complex biological systems in MS2 mode. The statistical classifier is supported by a novel parameter evaluating ion intensity-dependent quantification confidences of single peptide quantifications, the quantification confidence factor (QCF). Furthermore, the isoform assignment score (IAS) is introduced, a new scoring system for the evaluation of single peptide-to-protein assignments based on high confidence protein identifications in the same sample prior to negative selection enrichment of N-terminal peptides. By these approaches, we identified and validated, in addition to known substrates, low abundance novel bioactive MMP-2 targets including the plasminogen receptor S100A10 (p11) and the proinflammatory cytokine proEMAP/p43 that were previously undescribed. PMID:20305283
Mapping differential interactomes by affinity purification coupled with data-independent mass spectrometry acquisition.

PubMed

Lambert, Jean-Philippe; Ivosev, Gordana; Couzens, Amber L; Larsen, Brett; Taipale, Mikko; Lin, Zhen-Yuan; Zhong, Quan; Lindquist, Susan; Vidal, Marc; Aebersold, Ruedi; Pawson, Tony; Bonner, Ron; Tate, Stephen; Gingras, Anne-Claude

2013-12-01

Characterizing changes in protein-protein interactions associated with sequence variants (e.g., disease-associated mutations or splice forms) or following exposure to drugs, growth factors or hormones is critical to understanding how protein complexes are built, localized and regulated. Affinity purification (AP) coupled with mass spectrometry permits the analysis of protein interactions under near-physiological conditions, yet monitoring interaction changes requires the development of a robust and sensitive quantitative approach, especially for large-scale studies in which cost and time are major considerations. We have coupled AP to data-independent mass spectrometric acquisition (sequential window acquisition of all theoretical spectra, SWATH) and implemented an automated data extraction and statistical analysis pipeline to score modulated interactions. We used AP-SWATH to characterize changes in protein-protein interactions imparted by the HSP90 inhibitor NVP-AUY922 or melanoma-associated mutations in the human kinase CDK4. We show that AP-SWATH is a robust label-free approach to characterize such changes and propose a scalable pipeline for systems biology studies.
A survey sampling approach for pesticide monitoring of community water systems using groundwater as a drinking water source.

PubMed

Whitmore, Roy W; Chen, Wenlin

2013-12-04

The ability to infer human exposure to substances from drinking water using monitoring data helps determine and/or refine potential risks associated with drinking water consumption. We describe a survey sampling approach and its application to an atrazine groundwater monitoring study to adequately characterize upper exposure centiles and associated confidence intervals with predetermined precision. Study design and data analysis included sampling frame definition, sample stratification, sample size determination, allocation to strata, analysis weights, and weighted population estimates. The sampling frame encompassed 15 840 groundwater community water systems (CWS) in 21 states throughout the U.S. Median and 95th percentile atrazine concentrations were 0.0022 and 0.024 ppb, respectively, for all CWS. Statistical estimates agreed with historical monitoring results, suggesting that the study design was adequate and robust. This methodology makes no assumptions regarding the occurrence distribution (e.g., lognormality); thus analyses based on the design-induced distribution provide the most robust basis for making inferences from the sample to target population.
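A weighted population percentile of the kind mentioned above can be sketched without any distributional assumption by accumulating analysis weights over the sorted sample. The function name and the toy values are hypothetical, and the sketch omits the variance estimation that would be needed for the confidence intervals in the study.

```python
def weighted_percentile(values, weights, pct):
    """Smallest value whose cumulative weight reaches pct percent of the total.

    values: sampled measurements (e.g. concentrations).
    weights: analysis weights (how many population units each sample represents).
    """
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    target = pct / 100.0 * total
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= target:
            return value
    return pairs[-1][0]


# Toy sample: the last unit represents far more of the population.
sample_values = [1.0, 2.0, 3.0, 4.0]
equal_median = weighted_percentile(sample_values, [1, 1, 1, 1], 50)
weighted_median = weighted_percentile(sample_values, [1, 1, 1, 7], 50)
```

The two medians differ because the weights, not the raw sample counts, determine how much of the target population each measurement stands for.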

Fiction reading has a small positive impact on social cognition: A meta-analysis.

PubMed

Dodell-Feder, David; Tamir, Diana I

2018-02-26

Scholars from both the social sciences and the humanities have credited fiction reading with a range of positive real-world social effects. Research in psychology has suggested that readers may make good citizens because fiction reading is associated with better social cognition. But does fiction reading causally improve social cognition? Here, we meta-analyze extant published and unpublished experimental data to address this question. Multilevel random-effects meta-analysis of 53 effect sizes from 14 studies demonstrated that it does: compared to nonfiction reading and no reading, fiction reading leads to a small, statistically significant improvement in social-cognitive performance (g = .15-.16). This effect is robust across sensitivity analyses and does not appear to be the result of publication bias. We recommend that in future work, researchers use more robust reading manipulations, assess whether the effects transfer to improved real-world social functioning, and investigate mechanisms. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
THESEUS: maximum likelihood superpositioning and analysis of macromolecular structures

PubMed Central

Theobald, Douglas L.; Wuttke, Deborah S.

2008-01-01

Summary: THESEUS is a command line program for performing maximum likelihood (ML) superpositions and analysis of macromolecular structures. While conventional superpositioning methods use ordinary least-squares (LS) as the optimization criterion, ML superpositions provide substantially improved accuracy by down-weighting variable structural regions and by correcting for correlations among atoms. ML superpositioning is robust and insensitive to the specific atoms included in the analysis, and thus it does not require subjective pruning of selected variable atomic coordinates. Output includes both likelihood-based and frequentist statistics for accurate evaluation of the adequacy of a superposition and for reliable analysis of structural similarities and differences. THESEUS performs principal components analysis for analyzing the complex correlations found among atoms within a structural ensemble. PMID:16777907
Complexity quantification of dense array EEG using sample entropy analysis.

PubMed

Ramanand, Pravitha; Nampoori, V P N; Sreenivasan, R

2004-09-01

In this paper, a time series complexity analysis of dense array electroencephalogram signals is carried out using the recently introduced Sample Entropy (SampEn) measure. This statistic quantifies the regularity in signals recorded from systems that can vary from the purely deterministic to purely stochastic realm. The present analysis is conducted with an objective of gaining insight into complexity variations related to changing brain dynamics for EEG recorded from the three cases of passive, eyes closed condition, a mental arithmetic task and the same mental task carried out after a physical exertion task. It is observed that the statistic is a robust quantifier of complexity suited for short physiological signals such as the EEG and it points to the specific brain regions that exhibit lowered complexity during the mental task state as compared to a passive, relaxed state. In the case of mental tasks carried out before and after the performance of a physical exercise, the statistic can detect the variations brought in by the intermediate fatigue inducing exercise period. This enhances its utility in detecting subtle changes in the brain state that can find wider scope for applications in EEG based brain studies.
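
For readers unfamiliar with the statistic, the following Python sketch shows one common way to compute Sample Entropy, SampEn(m, r) = -ln(A/B), where B counts pairs of length-m templates within tolerance r and A counts the same for length m+1, with self-matches excluded. The function name and defaults are assumptions for illustration, r is treated as an absolute tolerance (in practice it is often set to a fraction of the signal's standard deviation), and the authors' implementation may differ in such details.

```python
import math


def sample_entropy(signal, m=2, r=0.2):
    """Sample Entropy of a 1-D sequence using the Chebyshev (max-abs) distance."""
    n = len(signal)

    def count_matches(length):
        # Use the same n - m starting points for both template lengths
        templates = [signal[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):    # j > i: no self-matches
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist <= r:
                    count += 1
        return count

    b = count_matches(m)        # matches at length m
    a = count_matches(m + 1)    # matches that persist at length m + 1
    if a == 0 or b == 0:
        return float("inf")     # undefined: too few matches for this m and r
    return -math.log(a / b)


# A perfectly regular (alternating) signal has zero sample entropy
regular = sample_entropy([0.0, 1.0] * 10, m=2, r=0.1)
```

Lower values indicate more regular signals, which is the sense in which the abstract speaks of lowered complexity during the mental task.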
Compositional data analysis for physical activity, sedentary time and sleep research.

PubMed

Dumuid, Dorothea; Stanford, Tyman E; Martin-Fernández, Josep-Antoni; Pedišić, Željko; Maher, Carol A; Lewis, Lucy K; Hron, Karel; Katzmarzyk, Peter T; Chaput, Jean-Philippe; Fogelholm, Mikael; Hu, Gang; Lambert, Estelle V; Maia, José; Sarmiento, Olga L; Standage, Martyn; Barreira, Tiago V; Broyles, Stephanie T; Tudor-Locke, Catrine; Tremblay, Mark S; Olds, Timothy

2017-01-01

The health effects of daily activity behaviours (physical activity, sedentary time and sleep) are widely studied. While previous research has largely examined activity behaviours in isolation, recent studies have adjusted for multiple behaviours. However, the inclusion of all activity behaviours in traditional multivariate analyses has not been possible due to the perfect multicollinearity of 24-h time budget data. The ensuing lack of adjustment for known effects on the outcome undermines the validity of study findings. We describe a statistical approach that enables the inclusion of all daily activity behaviours, based on the principles of compositional data analysis. Using data from the International Study of Childhood Obesity, Lifestyle and the Environment, we demonstrate the application of compositional multiple linear regression to estimate adiposity from children's daily activity behaviours expressed as isometric log-ratio coordinates. We present a novel method for predicting change in a continuous outcome based on relative changes within a composition, and for calculating associated confidence intervals to allow for statistical inference. The compositional data analysis presented overcomes the lack of adjustment that has plagued traditional statistical methods in the field, and provides robust and reliable insights into the health effects of daily activity behaviours.
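
The isometric log-ratio (ilr) coordinates mentioned above can be sketched as follows. This Python example uses the "pivot" form of the ilr transformation, in which each coordinate contrasts one part against the geometric mean of the remaining parts; the function name and the example time budget are hypothetical, and the study may have used a different basis.

```python
import math


def ilr_pivot(parts):
    """Map a D-part composition to D-1 isometric log-ratio (pivot) coordinates."""
    d = len(parts)
    total = sum(parts)
    logs = [math.log(p / total) for p in parts]    # closure, then logs
    coords = []
    for i in range(d - 1):
        rest = logs[i + 1:]                        # parts after part i
        k = len(rest)
        mean_rest = sum(rest) / k                  # log of their geometric mean
        coords.append(math.sqrt(k / (k + 1)) * (logs[i] - mean_rest))
    return coords


# Hypothetical 24-h budget in hours: sleep, sedentary time, physical activity
z = ilr_pivot([9.0, 10.0, 5.0])
```

Because the D-1 coordinates are free of the sum-to-24-hours constraint, they can be entered together in a linear regression without the perfect multicollinearity described in the abstract.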
On the Computation of the RMSEA and CFI from the Mean-And-Variance Corrected Test Statistic with Nonnormal Data in SEM.

PubMed

Savalei, Victoria

2018-01-01

A new type of nonnormality correction to the RMSEA has recently been developed, which has several advantages over existing corrections. In particular, the new correction adjusts the sample estimate of the RMSEA for the inflation due to nonnormality, while leaving its population value unchanged, so that established cutoff criteria can still be used to judge the degree of approximate fit. A confidence interval (CI) for the new robust RMSEA based on the mean-corrected ("Satorra-Bentler") test statistic has also been proposed. Follow up work has provided the same type of nonnormality correction for the CFI (Brosseau-Liard & Savalei, 2014). These developments have recently been implemented in lavaan. This note has three goals: a) to show how to compute the new robust RMSEA and CFI from the mean-and-variance corrected test statistic; b) to offer a new CI for the robust RMSEA based on the mean-and-variance corrected test statistic; and c) to caution that the logic of the new nonnormality corrections to RMSEA and CFI is most appropriate for the maximum likelihood (ML) estimator, and cannot easily be generalized to the most commonly used categorical data estimators.
Advanced Vibration Analysis Tool Developed for Robust Engine Rotor Designs

NASA Technical Reports Server (NTRS)

Min, James B.

2005-01-01

The primary objective of this research program is to develop vibration analysis tools, design tools, and design strategies to significantly improve the safety and robustness of turbine engine rotors. Bladed disks in turbine engines always feature small, random blade-to-blade differences, or mistuning. Mistuning can lead to a dramatic increase in blade forced-response amplitudes and stresses. Ultimately, this results in high-cycle fatigue, which is a major safety and cost concern. In this research program, the necessary steps will be taken to transform a state-of-the-art vibration analysis tool, the Turbo-Reduce forced-response prediction code, into an effective design tool by enhancing and extending the underlying modeling and analysis methods. Furthermore, novel techniques will be developed to assess the safety of a given design. In particular, a procedure will be established for using natural-frequency curve veerings to identify ranges of operating conditions (rotational speeds and engine orders) in which there is a great risk that the rotor blades will suffer high stresses. This work also will aid statistical studies of the forced response by reducing the necessary number of simulations. Finally, new strategies for improving the design of rotors will be pursued.
Nontargeted metabolomic analysis and "commercial-homophyletic" comparison-induced biomarkers verification for the systematic chemical differentiation of five different parts of Panax ginseng.

PubMed

Qiu, Shi; Yang, Wen-Zhi; Yao, Chang-Liang; Qiu, Zhi-Dong; Shi, Xiao-Jian; Zhang, Jing-Xian; Hou, Jin-Jun; Wang, Qiu-Rong; Wu, Wan-Ying; Guo, De-An

2016-07-01

A key segment in authentication of herbal medicines is the establishment of robust biomarkers that embody the intrinsic metabolite differences independent of the growing environment or processing techniques. We present a strategy by nontargeted metabolomics and "Commercial-homophyletic" comparison-induced biomarkers verification with new bioinformatic vehicles, to improve the efficiency and reliability in authentication of herbal medicines. The chemical differentiation of five different parts (root, leaf, flower bud, berry, and seed) of Panax ginseng was illustrated as a case study. First, an optimized ultra-performance liquid chromatography/quadrupole time-of-flight-MS(E) (UPLC/QTOF-MS(E)) approach was established for global metabolites profiling. Second, UNIFI™ combined with search of an in-house library was employed to automatically characterize the metabolites. Third, pattern recognition multivariate statistical analysis of the MS(E) data of different parts of commercial and homophyletic samples was separately performed to explore potential biomarkers. Fourth, potential biomarkers deduced from commercial and homophyletic root and leaf samples were cross-compared to infer robust biomarkers. Fifth, discriminating models by artificial neural network (ANN) were established to identify different parts of P. ginseng. Consequently, 164 compounds were characterized, and 11 robust biomarkers enabling the differentiation among root, leaf, flower bud, and berry, were discovered by removing those structurally unstable and possibly processing-related ones. The ANN models using the robust biomarkers managed to exactly discriminate four different parts and root adulterant with leaf as well. Conclusively, biomarkers verification using homophyletic samples conduces to the discovery of robust biomarkers. The integrated strategy facilitates authentication of herbal medicines in a more efficient and more intelligent manner. Copyright © 2016 Elsevier B.V. All rights reserved.
Time-variant random interval natural frequency analysis of structures

NASA Astrophysics Data System (ADS)

Wu, Binhua; Wu, Di; Gao, Wei; Song, Chongmin

2018-02-01

This paper presents a new robust method, namely the unified interval Chebyshev-based random perturbation method, to tackle the hybrid random interval structural natural frequency problem. In the proposed approach, the random perturbation method is implemented to furnish the statistical features (i.e., mean and standard deviation) and a Chebyshev surrogate model strategy is incorporated to formulate the statistical information of natural frequency with regard to the interval inputs. The comprehensive analysis framework combines the strengths of both methods in a way that computational cost is dramatically reduced. The presented method is thus capable of investigating the day-to-day based time-variant natural frequency of structures accurately and efficiently under concrete intrinsic creep effect with probabilistic and interval uncertain variables. The extreme bounds of the mean and standard deviation of natural frequency are captured through the embedded optimization strategy within the analysis procedure. Three particularly motivated numerical examples with progressive relationship in perspective of both structure type and uncertainty variables are demonstrated to justify the computational applicability, accuracy and efficiency of the proposed method.
Parasites as valuable stock markers for fisheries in Australasia, East Asia and the Pacific Islands.

PubMed

Lester, R J G; Moore, B R

2015-01-01

Over 30 studies in Australasia, East Asia and the Pacific Islands region have collected and analysed parasite data to determine the ranges of individual fish, many leading to conclusions about stock delineation. Parasites used as biological tags have included both those known to have long residence times in the fish and those thought to be relatively transient. In many cases the parasitological conclusions have been supported by other methods especially analysis of the chemical constituents of otoliths, and to a lesser extent, genetic data. In analysing parasite data, authors have applied multiple different statistical methodologies, including summary statistics, and univariate and multivariate approaches. Recently, a growing number of researchers have found non-parametric methods, such as analysis of similarities and cluster analysis, to be valuable. Future studies into the residence times, life cycles and geographical distributions of parasites together with more robust analytical methods will yield much important information to clarify stock structures in the area.
A global compilation of coral sea-level benchmarks: Implications and new challenges

NASA Astrophysics Data System (ADS)

Medina-Elizalde, Martín

2013-01-01

I present a quality-controlled compilation of sea-level data from U-Th dated corals, encompassing 30 studies of 13 locations around the world. The compilation contains relative sea level (RSL) data from each location based on both conventional and open-system U-Th ages. I have applied a commonly used age quality control criterion based on the initial 234U/238U activity ratios of corals in order to select reliable ages and to reconstruct sea level histories for the last 150,000 yr. This analysis reveals scatter of RSL estimates among coeval coral benchmarks both within individual locations and between locations, particularly during Marine Isotope Stage (MIS) 5a and the glacial inception following the last interglacial. The character of data scatter during these time intervals implies that uncertainties still exist regarding tectonics, glacio-isostasy, U-series dating, and/or coral position. To elucidate robust underlying patterns, with confidence limits, I performed a Monte Carlo-style statistical analysis of the compiled coral data considering appropriate age and sea-level uncertainties. By its nature, such an analysis has the tendency to smooth/obscure millennial-scale (and finer) details that may be important in individual datasets, and favour the major underlying patterns that are supported by all datasets. This statistical analysis is thus functional to illustrate major trends that are statistically robust ('what we know'), trends that are suggested but still are supported by few data ('what we might know, subject to addition of more supporting data and improved corrections'), and which patterns/data are clear outliers ('unlikely to be realistic given the rest of the global data and possibly needing further adjustments').
Prior to the last glacial maximum and with the possible exception of the 130-120 ka period, available coral data generally have insufficient temporal resolution and unexplained scatter, which hinders identification of a well-defined pattern with usefully narrow confidence limits. This analysis thus provides a framework that objectively identifies critical targets for new data collection, improved corrections, and integration of coral data with independent, stratigraphically continuous methods of sea-level reconstruction.
Sensitivity Analyses of the Change in FVC in a Phase 3 Trial of Pirfenidone for Idiopathic Pulmonary Fibrosis

PubMed Central

Bradford, Williamson Z.; Fagan, Elizabeth A.; Glaspole, Ian; Glassberg, Marilyn K.; Glasscock, Kenneth F.; King, Talmadge E.; Lancaster, Lisa H.; Nathan, Steven D.; Pereira, Carlos A.; Sahn, Steven A.; Swigris, Jeffrey J.; Noble, Paul W.

2015-01-01

BACKGROUND: FVC outcomes in clinical trials on idiopathic pulmonary fibrosis (IPF) can be substantially influenced by the analytic methodology and the handling of missing data. We conducted a series of sensitivity analyses to assess the robustness of the statistical finding and the stability of the estimate of the magnitude of treatment effect on the primary end point of FVC change in a phase 3 trial evaluating pirfenidone in adults with IPF. METHODS: Source data included all 555 study participants randomized to treatment with pirfenidone or placebo in the Assessment of Pirfenidone to Confirm Efficacy and Safety in Idiopathic Pulmonary Fibrosis (ASCEND) study. Sensitivity analyses were conducted to assess whether alternative statistical tests and methods for handling missing data influenced the observed magnitude of treatment effect on the primary end point of change from baseline to week 52 in FVC. RESULTS: The distribution of FVC change at week 52 was systematically different between the two treatment groups and favored pirfenidone in each analysis. The method used to impute missing data due to death had a marked effect on the magnitude of change in FVC in both treatment groups; however, the magnitude of treatment benefit was generally consistent on a relative basis, with an approximate 50% reduction in FVC decline observed in the pirfenidone group in each analysis. CONCLUSIONS: Our results confirm the robustness of the statistical finding on the primary end point of change in FVC in the ASCEND trial and corroborate the estimated magnitude of the pirfenidone treatment effect in patients with IPF. TRIAL REGISTRY: ClinicalTrials.gov; No.: NCT01366209; URL: www.clinicaltrials.gov PMID:25856121
Simulation-Based Probabilistic Tsunami Hazard Analysis: Empirical and Robust Hazard Predictions

NASA Astrophysics Data System (ADS)

De Risi, Raffaele; Goda, Katsuichiro

2017-08-01

Probabilistic tsunami hazard analysis (PTHA) is the prerequisite for rigorous risk assessment and thus for decision-making regarding risk mitigation strategies. This paper proposes a new simulation-based methodology for tsunami hazard assessment for a specific site of an engineering project along the coast, or, more broadly, for a wider tsunami-prone region. The methodology incorporates numerous uncertain parameters that are related to geophysical processes by adopting new scaling relationships for tsunamigenic seismic regions. Through the proposed methodology it is possible to obtain either a tsunami hazard curve for a single location, that is the representation of a tsunami intensity measure (such as inundation depth) versus its mean annual rate of occurrence, or tsunami hazard maps, representing the expected tsunami intensity measures within a geographical area, for a specific probability of occurrence in a given time window. In addition to the conventional tsunami hazard curve that is based on an empirical statistical representation of the simulation-based PTHA results, this study presents a robust tsunami hazard curve, which is based on a Bayesian fitting methodology. The robust approach allows a significant reduction of the number of simulations and, therefore, a reduction of the computational effort. Both methods produce a central estimate of the hazard as well as a confidence interval, facilitating the rigorous quantification of the hazard uncertainties.
Temporal assessment of radiomic features on clinical mammography in a high-risk population

NASA Astrophysics Data System (ADS)

Mendel, Kayla R.; Li, Hui; Lan, Li; Chan, Chun-Wai; King, Lauren M.; Tayob, Nabihah; Whitman, Gary; El-Zein, Randa; Bedrosian, Isabelle; Giger, Maryellen L.

2018-02-01

Extraction of high-dimensional quantitative data from medical images has become necessary in disease risk assessment, diagnostics and prognostics. Radiomic workflows for mammography typically involve a single medical image for each patient although medical images may exist for multiple imaging exams, especially in screening protocols. Our study takes advantage of the availability of mammograms acquired over multiple years for the prediction of cancer onset. This study included 841 images from 328 patients who developed subsequent mammographic abnormalities, which were confirmed as either cancer (n=173) or non-cancer (n=155) through diagnostic core needle biopsy. Quantitative radiomic analysis was conducted on antecedent FFDMs acquired a year or more prior to diagnostic biopsy. Analysis was limited to the breast contralateral to that in which the abnormality arose. Novel metrics were used to identify robust radiomic features. The most robust features were evaluated in the task of predicting future malignancies on a subset of 72 subjects (23 cancer cases and 49 non-cancer controls) with mammograms over multiple years. Using linear discriminant analysis, the robust radiomic features were merged into predictive signatures by: (i) using features from only the most recent contralateral mammogram, (ii) change in feature values between mammograms, and (iii) ratio of feature values over time, yielding AUCs of 0.57 (SE=0.07), 0.63 (SE=0.06), and 0.66 (SE=0.06), respectively. The AUCs for temporal radiomics (ratio) statistically differed from chance, suggesting that changes in radiomics over time may be critical for risk assessment. Overall, we found that our two-stage process of robustness assessment followed by performance evaluation served well in our investigation on the role of temporal radiomics in risk assessment.
Scout 2008 Version 1.0 User Guide

EPA Science Inventory

The Scout 2008 version 1.0 software package provides a wide variety of classical and robust statistical methods that are not typically available in other commercial software packages. A major part of Scout deals with classical, robust, and resistant univariate and multivariate ou...
Evaluation of a New Mean Scaled and Moment Adjusted Test Statistic for SEM

ERIC Educational Resources Information Center

Tong, Xiaoxiao; Bentler, Peter M.

2013-01-01

Recently a new mean scaled and skewness adjusted test statistic was developed for evaluating structural equation models in small samples and with potentially nonnormal data, but this statistic has received only limited evaluation. The performance of this statistic is compared to normal theory maximum likelihood and 2 well-known robust test…
Adaptive transmission disequilibrium test for family trio design.

PubMed

Yuan, Min; Tian, Xin; Zheng, Gang; Yang, Yaning

2009-01-01

The transmission disequilibrium test (TDT) is a standard method to detect association using family trio design. It is optimal for an additive genetic model. Other TDT-type tests optimal for recessive and dominant models have also been developed. Association tests using family data, including the TDT-type statistics, have been unified to a class of more comprehensive and flexible family-based association tests (FBAT). TDT-type tests have high efficiency when the genetic model is known or correctly specified, but may lose power if the model is mis-specified. Hence tests that are robust to genetic model mis-specification yet efficient are preferred. Constrained likelihood ratio test (CLRT) and MAX-type test have been shown to be efficiency robust. In this paper we propose a new efficiency robust procedure, referred to as adaptive TDT (aTDT). It uses the Hardy-Weinberg disequilibrium coefficient to identify the potential genetic model underlying the data and then applies the TDT-type test (or FBAT for general applications) corresponding to the selected model. Simulation demonstrates that aTDT is efficiency robust to model mis-specifications and generally outperforms the MAX test and CLRT in terms of power. We also show that aTDT has power close to that of the optimal TDT-type test based on a single genetic model, but is much more robust. Applications to real and simulated data from Genetic Analysis Workshop (GAW) illustrate the use of our adaptive TDT.
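
For orientation, the classical (additive-model) TDT that the adaptive procedure builds on can be sketched in a few lines of Python. This is the standard McNemar-type statistic, not the aTDT proposed in the paper; the function name and the transmission counts below are hypothetical.

```python
import math


def classical_tdt(transmitted, untransmitted):
    """Classical TDT for a biallelic marker.

    transmitted   -- times heterozygous parents passed the candidate allele to
                     an affected child
    untransmitted -- times heterozygous parents passed the other allele instead
    Returns (chi_square, p_value) with 1 degree of freedom.
    """
    b, c = transmitted, untransmitted
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts from heterozygous parents in trios
chi2, p_value = classical_tdt(60, 40)
```

Under the null hypothesis of no association, each heterozygous parent transmits either allele with probability one half, so a large imbalance between the two counts is evidence of association.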
Template protection and its implementation in 3D face recognition systems

NASA Astrophysics Data System (ADS)

Zhou, Xuebing

2007-04-01

As biometric recognition systems are widely applied in various application areas, security and privacy risks have recently attracted the attention of the biometric community. Template protection techniques prevent stored reference data from revealing private biometric information and enhance the security of biometrics systems against attacks such as identity theft and cross matching. This paper concentrates on a template protection algorithm that merges methods from cryptography, error correction coding and biometrics. The key component of the algorithm is to convert biometric templates into binary vectors. It is shown that the binary vectors should be robust, uniformly distributed, statistically independent and collision-free so that authentication performance can be optimized and information leakage can be avoided. Depending on statistical character of the biometric template, different approaches for transforming biometric templates into compact binary vectors are presented. The proposed methods are integrated into a 3D face recognition system and tested on the 3D facial images of the FRGC database. It is shown that the resulting binary vectors provide an authentication performance that is similar to the original 3D face templates. A high security level is achieved with reasonable false acceptance and false rejection rates of the system, based on an efficient statistical analysis. The algorithm estimates the statistical character of biometric templates from a number of biometric samples in the enrollment database. For the FRGC 3D face database, the small distinction of robustness and discriminative power between the classification results under the assumption of uniquely distributed templates and the ones under the assumption of Gaussian distributed templates is shown in our tests.
Secondary electrospray ionization-mass spectrometry and a novel statistical bioinformatic approach identifies a cancer-related profile in exhaled breath of breast cancer patients: a pilot study.

PubMed

Martinez-Lozano Sinues, Pablo; Landoni, Elena; Miceli, Rosalba; Dibari, Vincenza F; Dugo, Matteo; Agresti, Roberto; Tagliabue, Elda; Cristoni, Simone; Orlandi, Rosaria

2015-09-21

Breath analysis represents a new frontier in medical diagnosis and a powerful tool for cancer biomarker discovery due to the recent development of analytical platforms for the detection and identification of human exhaled volatile compounds. Statistical and bioinformatic tools may represent an effective complement to the technical and instrumental enhancements needed to fully exploit clinical applications of breath analysis. Our exploratory study in a cohort of 14 breast cancer patients and 11 healthy volunteers used secondary electrospray ionization-mass spectrometry (SESI-MS) to detect a cancer-related volatile profile. SESI-MS full-scan spectra were acquired in a range of 40-350 mass-to-charge ratio (m/z), converted to matrix data and analyzed using a procedure integrating data pre-processing for quality control, and a two-step class prediction based on machine-learning techniques, including a robust feature selection, and a classifier development with internal validation. MS spectra from exhaled breath showed an individual-specific breath profile and high reciprocal homogeneity among samples, with strong agreement among technical replicates, suggesting a robust responsiveness of SESI-MS. Supervised analysis of breath data identified a support vector machine (SVM) model including 8 features corresponding to m/z 106, 126, 147, 78, 148, 52, 128, 315 and able to discriminate exhaled breath from breast cancer patients from that of healthy individuals, with sensitivity and specificity above 0.9. Our data highlight the significance of SESI-MS as an analytical technique for clinical studies of breath analysis and provide evidence that our noninvasive strategy detects volatile signatures that may support existing technologies to diagnose breast cancer.
Avoid lost discoveries, because of violations of standard assumptions, by using modern robust statistical methods.

PubMed

Wilcox, Rand; Carlson, Mike; Azen, Stan; Clark, Florence

2013-03-01

Recently, there have been major advances in statistical techniques for assessing central tendency and measures of association. The practical utility of modern methods has been documented extensively in the statistics literature, but they remain underused and relatively unknown in clinical trials. Our objective was to address this issue. STUDY DESIGN AND PURPOSE: The first purpose was to review common problems associated with standard methodologies (low power, lack of control over type I errors, and incorrect assessments of the strength of the association). The second purpose was to summarize some modern methods that can be used to circumvent such problems. The third purpose was to illustrate the practical utility of modern robust methods using data from the Well Elderly 2 randomized controlled trial. In multiple instances, robust methods uncovered differences among groups and associations among variables that were not detected by classic techniques. In particular, the results demonstrated that details of the nature and strength of the association were sometimes overlooked when using ordinary least squares regression and Pearson correlation. Modern robust methods can make a practical difference in detecting and describing differences between groups and associations between variables. Such procedures should be applied more frequently when analyzing trial-based data. Copyright © 2013 Elsevier Inc. All rights reserved.
Robustness of movement models: can models bridge the gap between temporal scales of data sets and behavioural processes?

PubMed

Schlägel, Ulrike E; Lewis, Mark A

2016-12-01

Discrete-time random walks and their extensions are common tools for analyzing animal movement data. In these analyses, resolution of temporal discretization is a critical feature. Ideally, a model both mirrors the relevant temporal scale of the biological process of interest and matches the data sampling rate. Challenges arise when resolution of data is too coarse due to technological constraints, or when we wish to extrapolate results or compare results obtained from data with different resolutions. Drawing loosely on the concept of robustness in statistics, we propose a rigorous mathematical framework for studying movement models' robustness against changes in temporal resolution. In this framework, we define varying levels of robustness as formal model properties, focusing on random walk models with spatially-explicit component. With the new framework, we can investigate whether models can validly be applied to data across varying temporal resolutions and how we can account for these different resolutions in statistical inference results. We apply the new framework to movement-based resource selection models, demonstrating both analytical and numerical calculations, as well as a Monte Carlo simulation approach. While exact robustness is rare, the concept of approximate robustness provides a promising new direction for analyzing movement models.

Combining synthetic controls and interrupted time series analysis to improve causal inference in program evaluation.

PubMed

Linden, Ariel

2018-04-01

Interrupted time series analysis (ITSA) is an evaluation methodology in which a single treatment unit's outcome is studied over time and the intervention is expected to "interrupt" the level and/or trend of the outcome. The internal validity is strengthened considerably when the treated unit is contrasted with a comparable control group. In this paper, we introduce a robust evaluation framework that combines the synthetic controls method (SYNTH) to generate a comparable control group and ITSA regression to assess covariate balance and estimate treatment effects. We evaluate the effect of California's Proposition 99 for reducing cigarette sales, by comparing California to other states not exposed to smoking reduction initiatives. SYNTH is used to reweight nontreated units to make them comparable to the treated unit. These weights are then used in ITSA regression models to assess covariate balance and estimate treatment effects. Covariate balance was achieved for all but one covariate. While California experienced a significant decrease in the annual trend of cigarette sales after Proposition 99, there was no statistically significant treatment effect when compared to synthetic controls. The advantage of using this framework over regression alone is that it ensures that a comparable control group is generated. Additionally, it offers a common set of statistical measures familiar to investigators, the capability for assessing covariate balance, and enhancement of the evaluation with a comprehensive set of postestimation measures. Therefore, this robust framework should be considered as a primary approach for evaluating treatment effects in multiple group time series analysis. © 2018 John Wiley & Sons, Ltd.
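
The "level and/or trend" interruption described above is commonly modelled with a segmented regression of the form Y_t = b0 + b1*t + b2*X_t + b3*X_t*(t - t0), where X_t is 1 from the intervention time t0 onward. The Python sketch below fits this single-group, unweighted form by ordinary least squares; it does not include the synthetic-control weights or the comparison group used in the paper, and all function names and data are hypothetical.

```python
def solve_linear(a, b):
    """Solve a small dense system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_itsa(y, start):
    """Return [b0, b1, b2, b3]: baseline level, baseline trend, level change, trend change."""
    rows = []
    for t in range(len(y)):
        post = 1.0 if t >= start else 0.0
        rows.append([1.0, float(t), post, post * (t - start)])
    k = 4
    # Normal equations (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)


# Hypothetical noise-free series: level drops by 5 and slope steepens by 2 at t = 10
series = [100.0 - t if t < 10 else 100.0 - t - 5.0 - 2.0 * (t - 10) for t in range(20)]
coefs = fit_itsa(series, start=10)
```

In a real analysis the errors are typically autocorrelated, so standard errors would need an adjustment that this sketch omits.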
Gene set analysis approaches for RNA-seq data: performance evaluation and application guideline

PubMed Central

Rahmatallah, Yasir; Emmert-Streib, Frank

2016-01-01

Transcriptome sequencing (RNA-seq) is gradually replacing microarrays for high-throughput studies of gene expression. The main challenge of analyzing microarray data is not in finding differentially expressed genes, but in gaining insights into the biological processes underlying phenotypic differences. To interpret experimental results from microarrays, gene set analysis (GSA) has become the method of choice, in particular because it incorporates pre-existing biological knowledge (in a form of functionally related gene sets) into the analysis. Here we provide a brief review of several statistically different GSA approaches (competitive and self-contained) that can be adapted from microarrays practice as well as those specifically designed for RNA-seq. We evaluate their performance (in terms of Type I error rate, power, robustness to the sample size and heterogeneity, as well as the sensitivity to different types of selection biases) on simulated and real RNA-seq data. Not surprisingly, the performance of various GSA approaches depends only on the statistical hypothesis they test and does not depend on whether the test was developed for microarrays or RNA-seq data. Interestingly, we found that competitive methods have lower power as well as robustness to the samples heterogeneity than self-contained methods, leading to poor results reproducibility. We also found that the power of unsupervised competitive methods depends on the balance between up- and down-regulated genes in tested gene sets. These properties of competitive methods have been overlooked before. Our evaluation provides a concise guideline for selecting GSA approaches, best performing under particular experimental settings in the context of RNA-seq. PMID:26342128
Engineering diverse changes in beta-turn propensities in the N-terminal beta-hairpin of ubiquitin reveals significant effects on stability and kinetics but a robust folding transition state.

PubMed

Simpson, Emma R; Meldrum, Jill K; Searle, Mark S

2006-04-04

Using the N-terminal 17-residue beta-hairpin of ubiquitin as a "host" for mutational studies, we have investigated the influence of the beta-turn sequence on protein stability and folding kinetics by replacing the native G-bulged turn (TLTGK) with more flexible analogues (TG3K and TG5K) and a series of four-residue type I' beta-turn sequences, commonly found in beta-hairpins. Although a statistical analysis of type I' turns demonstrates residue preferences at specific sites, the frequency of occurrence appears to only broadly correlate with experimentally determined protein stabilities. The subsequent engineering of context-dependent non-native tertiary contacts involving turn residues is shown to produce large changes in stability. Relatively few point mutations have been described that probe secondary structure formation in ubiquitin in a manner that is independent of tertiary contacts. To this end, we have used the more rigorous rate-equilibrium free energy relationship (Leffler analysis), rather than the two-point phi value analysis, to show for a family of engineered beta-turn mutants that stability (range of approximately 20 kJ/mol) and folding kinetics (190-fold variation in refolding rate) are linearly correlated (alpha(f) = 0.74 +/- 0.08). The data are consistent with a transition state that is robust with regard to a wide range of statistically favored and disfavored beta-turn mutations and implicate a loosely assembled beta-hairpin as a key template in transition state stabilization with the beta-turn playing a central role.
Completely automated modal analysis procedure based on the combination of different OMA methods

NASA Astrophysics Data System (ADS)

Ripamonti, Francesco; Bussini, Alberto; Resta, Ferruccio

2018-03-01

In this work a completely automated output-only Modal Analysis procedure is presented and all its benefits are listed. Based on the merging of different Operational Modal Analysis methods and a statistical approach, the identification process has been improved becoming more robust and giving as results only the real natural frequencies, damping ratios and mode shapes of the system. The effect of the temperature can be taken into account as well, leading to the creation of a better tool for automated Structural Health Monitoring. The algorithm has been developed and tested on a numerical model of a scaled three-story steel building present in the laboratories of Politecnico di Milano.
A Robust Alternative to the Normal Distribution.

DTIC Science & Technology

1982-07-07

for any Purpose of the United States Government. DEPARTMENT OF STATISTICS, STANFORD UNIVERSITY, STANFORD, CALIFORNIA. A Robust Alternative to the...Stanford University Technical Report No. 3. [5] Bhattacharya, S. K. (1966). A Modified Bessel Function Model in Life Testing. Metrika 10, 133-144
Notes on power of normality tests of error terms in regression models

DOE Office of Scientific and Technical Information (OSTI.GOV)

Střelec, Luboš

2015-03-10

Normality is one of the basic assumptions in applying statistical procedures. For example, in linear regression most of the inferential procedures are based on the assumption of normality, i.e. the disturbance vector is assumed to be normally distributed. Failure to assess non-normality of the error terms may lead to incorrect results of usual statistical inference techniques such as the t-test or F-test. Thus, error terms should be normally distributed in order to allow us to make exact inferences. As a consequence, normally distributed stochastic errors are necessary in order to avoid misleading inferences, which explains the necessity and importance of robust tests of normality. Therefore, the aim of this contribution is to discuss normality testing of error terms in regression models. In this contribution, we introduce the general RT class of robust tests for normality, and present and discuss the trade-off between power and robustness of selected classical and robust normality tests of error terms in regression models.
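As a point of reference for normality testing of regression residuals, the following minimal sketch computes the classical Jarque-Bera statistic from sample skewness and kurtosis. It is a classical, non-robust test and is not the RT class introduced in the abstract, whose definition is not given here; the function name `jarque_bera` and the toy residuals are assumptions for illustration.

```python
def jarque_bera(residuals):
    """Classical Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4)."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)


# Toy residuals (illustrative only); symmetric, so skewness is zero.
toy_residuals = [-2.0, -1.0, 0.0, 1.0, 2.0]
jb = jarque_bera(toy_residuals)
print(jb)
```

Under normality the statistic is asymptotically chi-square with 2 degrees of freedom. Because it relies on third and fourth moments, a single outlier can dominate it, which is the sensitivity that robust normality tests aim to reduce.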
Robust Statistics: What They Are, and Why They Are So Important

ERIC Educational Resources Information Center

Corlu, Sencer M.

2009-01-01

The problem with "classical" statistics all invoking the mean is that these estimates are notoriously influenced by atypical scores (outliers), partly because the mean itself is differentially influenced by outliers. In theory, "modern" statistics may generate more replicable characterizations of data, because at least in some…
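The sensitivity of the mean to atypical scores can be seen in a small sketch. The data values and the helper name `mad` are made up for illustration: one outlier moves the mean far from the bulk of the data, while the median and the median absolute deviation change only slightly.

```python
import statistics


def mad(values):
    """Median absolute deviation from the median (unscaled)."""
    center = statistics.median(values)
    return statistics.median([abs(v - center) for v in values])


clean = [10, 11, 12, 13, 14]
contaminated = clean + [1000]  # one atypical score

for label, data in (("clean", clean), ("contaminated", contaminated)):
    print(label, statistics.mean(data), statistics.median(data), mad(data))
```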
Robust Proton Pencil Beam Scanning Treatment Planning for Rectal Cancer Radiation Therapy

DOE Office of Scientific and Technical Information (OSTI.GOV)

Blanco Kiely, Janid Patricia, E-mail: jkiely@sas.upenn.edu; White, Benjamin M.

2016-05-01

Purpose: To investigate, in a treatment plan design and robustness study, whether proton pencil beam scanning (PBS) has the potential to offer advantages, relative to interfraction uncertainties, over photon volumetric modulated arc therapy (VMAT) in a locally advanced rectal cancer patient population. Methods and Materials: Ten patients received a planning CT scan, followed by an average of 4 weekly offline verification CT scans, which were rigidly co-registered to the planning CT. Clinical PBS plans were generated on the planning CT, using a single-field uniform-dose technique with single-posterior and parallel-opposed (LAT) field geometries. The VMAT plans were generated on the planning CT using two 6-MV, 220° coplanar arcs. Clinical plans were forward-calculated on verification CTs to assess robustness relative to anatomic changes. Setup errors were assessed by forward-calculating clinical plans with a ±5-mm (left–right, anterior–posterior, superior–inferior) isocenter shift on the planning CT. Differences in clinical target volume and organ at risk dose–volume histogram (DVH) indicators between plans were tested for significance using an appropriate Wilcoxon test (P<.05). Results: Dosimetrically, PBS plans were statistically different from VMAT plans, showing greater organ at risk sparing. However, the bladder was statistically identical among LAT and VMAT plans. The clinical target volume coverage was statistically identical among all plans. The robustness test found that all DVH indicators for PBS and VMAT plans were robust, except the LAT's genitalia (V5, V35). The verification CT plans showed that all DVH indicators were robust. Conclusions: Pencil beam scanning plans were found to be as robust as VMAT plans relative to interfractional changes during treatment when posterior beam angles and appropriate range margins are used. Pencil beam scanning dosimetric gains in the bowel (V15, V20) over VMAT suggest that using PBS to treat rectal cancer may reduce radiation treatment–related toxicity.
A statistical framework for evaluating neural networks to predict recurrent events in breast cancer

NASA Astrophysics Data System (ADS)

Gorunescu, Florin; Gorunescu, Marina; El-Darzi, Elia; Gorunescu, Smaranda

2010-07-01

Breast cancer is the second leading cause of cancer deaths in women today. Sometimes, breast cancer can return after primary treatment. A medical diagnosis of recurrent cancer is often a more challenging task than the initial one. In this paper, we investigate the potential contribution of neural networks (NNs) to support health professionals in diagnosing such events. The NN algorithms are tested and applied to two different datasets. An extensive statistical analysis has been performed to verify our experiments. The results show that a simple network structure for both the multi-layer perceptron and radial basis function can produce equally good results, not all attributes are needed to train these algorithms and, finally, the classification performances of all algorithms are statistically robust. Moreover, we have shown that the best performing algorithm will strongly depend on the features of the datasets, and hence, there is not necessarily a single best classifier.
Neural Correlates of Morphology Acquisition through a Statistical Learning Paradigm.

PubMed

Sandoval, Michelle; Patterson, Dianne; Dai, Huanping; Vance, Christopher J; Plante, Elena

2017-01-01

The neural basis of statistical learning as it occurs over time was explored with stimuli drawn from a natural language (Russian nouns). The input reflected the "rules" for marking categories of gendered nouns, without making participants explicitly aware of the nature of what they were to learn. Participants were scanned while listening to a series of gender-marked nouns during four sequential scans, and were tested for their learning immediately after each scan. Although participants were not told the nature of the learning task, they exhibited learning after their initial exposure to the stimuli. Independent component analysis of the brain data revealed five task-related sub-networks. Unlike prior statistical learning studies of word segmentation, this morphological learning task robustly activated the inferior frontal gyrus during the learning period. This region was represented in multiple independent components, suggesting it functions as a network hub for this type of learning. Moreover, the results suggest that subnetworks activated by statistical learning are driven by the nature of the input, rather than reflecting a general statistical learning system.
Neural Correlates of Morphology Acquisition through a Statistical Learning Paradigm

PubMed Central

Sandoval, Michelle; Patterson, Dianne; Dai, Huanping; Vance, Christopher J.; Plante, Elena

2017-01-01

The neural basis of statistical learning as it occurs over time was explored with stimuli drawn from a natural language (Russian nouns). The input reflected the “rules” for marking categories of gendered nouns, without making participants explicitly aware of the nature of what they were to learn. Participants were scanned while listening to a series of gender-marked nouns during four sequential scans, and were tested for their learning immediately after each scan. Although participants were not told the nature of the learning task, they exhibited learning after their initial exposure to the stimuli. Independent component analysis of the brain data revealed five task-related sub-networks. Unlike prior statistical learning studies of word segmentation, this morphological learning task robustly activated the inferior frontal gyrus during the learning period. This region was represented in multiple independent components, suggesting it functions as a network hub for this type of learning. Moreover, the results suggest that subnetworks activated by statistical learning are driven by the nature of the input, rather than reflecting a general statistical learning system. PMID:28798703
QSAR study of curcumine derivatives as HIV-1 integrase inhibitors.

PubMed

Gupta, Pawan; Sharma, Anju; Garg, Prabha; Roy, Nilanjan

2013-03-01

A QSAR study was performed on curcumine derivatives as HIV-1 integrase inhibitors using multiple linear regression. The statistically significant model was developed with squared correlation coefficients (r(2)) 0.891 and cross validated r(2) (r(2) cv) 0.825. The developed model revealed that electronic, shape, size, geometry, substitution's information and hydrophilicity were important atomic properties for determining the inhibitory activity of these molecules. The model was also tested successfully for external validation (r(2) pred = 0.849) as well as Tropsha's test for model predictability. Furthermore, the domain analysis was carried out to evaluate the prediction reliability of external set molecules. The model was statistically robust and had good predictive power which can be successfully utilized for screening of new molecules.
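The difference between the fitted r(2) and the cross-validated r(2) reported above can be illustrated with a leave-one-out sketch. This is a simplification under stated assumptions: it uses a single made-up descriptor rather than the multiple linear regression of the study, the data are synthetic, and the helper names `fit_line` and `r2_and_q2` are hypothetical. The cross-validated value is computed as 1 - PRESS / SS_tot.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for one descriptor."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def r2_and_q2(x, y):
    """Return (fitted r^2, leave-one-out cross-validated r^2)."""
    n = len(x)
    my = sum(y) / n
    ss_tot = sum((yi - my) ** 2 for yi in y)
    slope, intercept = fit_line(x, y)
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    press = 0.0
    for i in range(n):
        # Refit without molecule i, then predict it.
        s, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (b + s * x[i])) ** 2
    return 1 - rss / ss_tot, 1 - press / ss_tot


# Synthetic descriptor and activity values (illustrative only).
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
activity = [1.1, 1.9, 3.2, 3.9, 5.1, 6.2, 6.8, 8.1]
r2, q2 = r2_and_q2(descriptor, activity)
print(r2, q2)
```

The cross-validated value cannot exceed the fitted one for ordinary least squares, so a large gap between the two is a common warning sign of overfitting.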
Quality Assessments of Long-Term Quantitative Proteomic Analysis of Breast Cancer Xenograft Tissues

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhou, Jian-Ying; Chen, Lijun; Zhang, Bai

The identification of protein biomarkers requires large-scale analysis of human specimens to achieve statistical significance. In this study, we evaluated the long-term reproducibility of an iTRAQ (isobaric tags for relative and absolute quantification) based quantitative proteomics strategy using one channel for universal normalization across all samples. A total of 307 liquid chromatography tandem mass spectrometric (LC-MS/MS) analyses were completed, generating 107 one-dimensional (1D) LC-MS/MS datasets and 8 offline two-dimensional (2D) LC-MS/MS datasets (25 fractions for each set) for human-in-mouse breast cancer xenograft tissues representative of basal and luminal subtypes. Such large-scale studies require the implementation of robust metrics to assess the contributions of technical and biological variability in the qualitative and quantitative data. Accordingly, we developed a quantification confidence score based on the quality of each peptide-spectrum match (PSM) to remove quantification outliers from each analysis. After combining confidence score filtering and statistical analysis, reproducible protein identification and quantitative results were achieved from LC-MS/MS datasets collected over a 16 month period.
Factor analysis in optimization of formulation of high content uniformity tablets containing low dose active substance.

PubMed

Lukášová, Ivana; Muselík, Jan; Franc, Aleš; Goněc, Roman; Mika, Filip; Vetchý, David

2017-11-15

Warfarin is an intensively discussed drug with a narrow therapeutic range. There have been cases of bleeding attributed to varying content or altered quality of the active substance. Factor analysis is useful for finding suitable technological parameters leading to high content uniformity of tablets containing a low amount of active substance. The composition of the tabletting blend and the technological procedure were set with respect to factor analysis of previously published results. The correctness of the set parameters was checked by manufacturing and evaluation of tablets containing 1-10 mg of warfarin sodium. The robustness of the suggested technology was checked by using a "worst case scenario" and statistical evaluation of European Pharmacopoeia (EP) content uniformity limits with respect to Bergum division and process capability index (Cpk). To evaluate the quality of the active substance and tablets, a dissolution method was developed (water; EP apparatus II; 25 rpm), allowing for statistical comparison of dissolution profiles. The obtained results prove the suitability of factor analysis to optimize the composition with respect to batches manufactured previously, and thus the use of meta-analysis under industrial conditions is feasible. Copyright © 2017 Elsevier B.V. All rights reserved.
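The process capability index mentioned above has a standard textbook form, Cpk = min(USL - mean, mean - LSL) / (3 s), which the following minimal sketch computes. The content values and the specification limits of 85% and 115% are placeholders chosen for illustration and are not the EP limits or data used in the study; the function name `cpk` is likewise hypothetical.

```python
import statistics


def cpk(values, lsl, usl):
    """Process capability index from sample mean and sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return min(usl - mean, mean - lsl) / (3 * sd)


# Hypothetical tablet content results, in percent of label claim.
content = [98.0, 99.0, 100.0, 101.0, 102.0]
print(cpk(content, lsl=85.0, usl=115.0))
```

Because the index uses the distance to the nearer limit, a process that drifts off center is penalized even when its spread is unchanged.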
Robust detection-isolation-accommodation for sensor failures

NASA Technical Reports Server (NTRS)

Weiss, J. L.; Pattipati, K. R.; Willsky, A. S.; Eterno, J. S.; Crawford, J. T.

1985-01-01

The results of a one year study to: (1) develop a theory for Robust Failure Detection and Identification (FDI) in the presence of model uncertainty, (2) develop a design methodology which utilizes the robust FDI theory, (3) apply the methodology to a sensor FDI problem for the F-100 jet engine, and (4) demonstrate the application of the theory to the evaluation of alternative FDI schemes are presented. Theoretical results in statistical discrimination are used to evaluate the robustness of residual signals (or parity relations) in terms of their usefulness for FDI. Furthermore, optimally robust parity relations are derived through the optimization of robustness metrics. The result is viewed as decentralization of the FDI process. A general structure for decentralized FDI is proposed and robustness metrics are used for determining various parameters of the algorithm.
Marginal Structural Models with Counterfactual Effect Modifiers.

PubMed

Zheng, Wenjing; Luo, Zhehui; van der Laan, Mark J

2018-06-08

In health and social sciences, research questions often involve systematic assessment of the modification of treatment causal effect by patient characteristics. In longitudinal settings, time-varying or post-intervention effect modifiers are also of interest. In this work, we investigate the robust and efficient estimation of the Counterfactual-History-Adjusted Marginal Structural Model (van der Laan MJ, Petersen M. Statistical learning of origin-specific statically optimal individualized treatment rules. Int J Biostat. 2007;3), which models the conditional intervention-specific mean outcome given a counterfactual modifier history in an ideal experiment. We establish the semiparametric efficiency theory for these models, and present a substitution-based, semiparametric efficient and doubly robust estimator using the targeted maximum likelihood estimation methodology (TMLE, e.g. van der Laan MJ, Rubin DB. Targeted maximum likelihood learning. Int J Biostat. 2006;2, van der Laan MJ, Rose S. Targeted learning: causal inference for observational and experimental data, 1st ed. Springer Series in Statistics. Springer, 2011). To facilitate implementation in applications where the effect modifier is high dimensional, our third contribution is a projected influence function (and the corresponding projected TMLE estimator), which retains most of the robustness of its efficient peer and can be easily implemented in applications where the use of the efficient influence function becomes taxing. We compare the projected TMLE estimator with an Inverse Probability of Treatment Weighted estimator (e.g. Robins JM. Marginal structural models. In: Proceedings of the American Statistical Association. Section on Bayesian Statistical Science, 1-10. 1997a, Hernan MA, Brumback B, Robins JM. Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men. 2000;11:561-570), and a non-targeted G-computation estimator (Robins JM. A new approach to causal inference in mortality studies with sustained exposure periods - application to control of the healthy worker survivor effect. Math Modell. 1986;7:1393-1512.). The comparative performance of these estimators is assessed in a simulation study. The use of the projected TMLE estimator is illustrated in a secondary data analysis for the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) trial where effect modifiers are subject to missing at random.
Novel features and enhancements in BioBin, a tool for the biologically inspired binning and association analysis of rare variants

PubMed Central

Byrska-Bishop, Marta; Wallace, John; Frase, Alexander T; Ritchie, Marylyn D

2018-01-01

Motivation: BioBin is an automated bioinformatics tool for the multi-level biological binning of sequence variants. Herein, we present a significant update to BioBin which expands the software to facilitate a comprehensive rare variant analysis and incorporates novel features and analysis enhancements. Results: In BioBin 2.3, we extend our software tool by implementing statistical association testing, updating the binning algorithm, as well as incorporating novel analysis features providing for a robust, highly customizable, and unified rare variant analysis tool. Availability and implementation: The BioBin software package is open source and freely available to users at http://www.ritchielab.com/software/biobin-download. Contact: mdritchie@geisinger.edu. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28968757
Manufacturing Execution Systems: Examples of Performance Indicator and Operational Robustness Tools.

PubMed

Gendre, Yannick; Waridel, Gérard; Guyon, Myrtille; Demuth, Jean-François; Guelpa, Hervé; Humbert, Thierry

Manufacturing Execution Systems (MES) are computerized systems used to measure production performance in terms of productivity, yield, and quality. In the first part, performance indicator and overall equipment effectiveness (OEE), process robustness tools and statistical process control are described. The second part details some tools to help process robustness and control by operators by preventing deviations from target control charts. MES was developed by Syngenta together with CIMO for automation.
Use of the Generating Options for Active Risk Control (GO-ARC) Technique can lead to more robust risk control options.

PubMed

Card, Alan J; Simsekler, Mecit Can Emre; Clark, Michael; Ward, James R; Clarkson, P John

2014-01-01

Risk assessment is widely used to improve patient safety, but healthcare workers are not trained to design robust solutions to the risks they uncover. This leads to an overreliance on the weakest category of risk control recommendations: administrative controls. Increasing the proportion of non-administrative risk control options (NARCOs) generated would enable (though not ensure) the adoption of more robust solutions. Experimentally assess a method for generating stronger risk controls: The Generating Options for Active Risk Control (GO-ARC) Technique. Participants generated risk control options in response to two patient safety scenarios. Scenario 1 (baseline): All participants used current practice (unstructured brainstorming). Scenario 2: Control group used current practice; intervention group used the GO-ARC Technique. To control for individual differences between participants, analysis focused on the change in the proportion of NARCOs for each group. Control group: Proportion of NARCOs decreased from 0.18 at baseline to 0.12. Intervention group: Proportion increased from 0.10 at baseline to 0.29 using the GO-ARC Technique. Results were statistically significant. There was no decrease in the number of administrative controls generated by the intervention group. The Generating Options for Active Risk Control (GO-ARC) Technique appears to lead to more robust risk control options.
Change Detection in Rough Time Series

DTIC Science & Technology

2014-09-01

Business Statistics : An Inferential Approach, Dellen: San Francisco. [18] Winston, W. (1997) Operations Research Applications and Algorithms, Duxbury...distribution that can present significant challenges to conventional statistical tracking techniques. To address this problem the proposed method...applies hybrid fuzzy statistical techniques to series granules instead of to individual measures. Three examples demonstrated the robust nature of the

Multivariate meta-analysis: a robust approach based on the theory of U-statistic.

PubMed

Ma, Yan; Mazumdar, Madhu

2011-10-30

Meta-analysis is the methodology for combining findings from similar research studies asking the same question. When the question of interest involves multiple outcomes, multivariate meta-analysis is used to synthesize the outcomes simultaneously taking into account the correlation between the outcomes. Likelihood-based approaches, in particular restricted maximum likelihood (REML) method, are commonly utilized in this context. REML assumes a multivariate normal distribution for the random-effects model. This assumption is difficult to verify, especially for meta-analysis with small number of component studies. The use of REML also requires iterative estimation between parameters, needing moderately high computation time, especially when the dimension of outcomes is large. A multivariate method of moments (MMM) is available and is shown to perform equally well to REML. However, there is a lack of information on the performance of these two methods when the true data distribution is far from normality. In this paper, we propose a new nonparametric and non-iterative method for multivariate meta-analysis on the basis of the theory of U-statistic and compare the properties of these three procedures under both normal and skewed data through simulation studies. It is shown that the effect on estimates from REML because of non-normal data distribution is marginal and that the estimates from MMM and U-statistic-based approaches are very similar. Therefore, we conclude that for performing multivariate meta-analysis, the U-statistic estimation procedure is a viable alternative to REML and MMM. Easy implementation of all three methods are illustrated by their application to data from two published meta-analysis from the fields of hip fracture and periodontal disease. We discuss ideas for future research based on U-statistic for testing significance of between-study heterogeneity and for extending the work to meta-regression setting. Copyright © 2011 John Wiley & Sons, Ltd.
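The non-iterative character of a method-of-moments estimator can be seen in the familiar univariate case. The sketch below implements the DerSimonian-Laird moment estimator of the between-study variance for a single outcome; it is offered only as a simplified analogue of the multivariate MMM discussed above and is not the U-statistic method proposed in the paper. The function name and the input values are hypothetical.

```python
def dersimonian_laird(effects, variances):
    """Univariate random-effects pooling with a moment estimate of tau^2."""
    k = len(effects)
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q measures dispersion around the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Re-weight with the between-study variance added to each study.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return pooled, tau2


# Two hypothetical studies with equal within-study variance.
pooled, tau2 = dersimonian_laird([0.0, 1.0], [0.01, 0.01])
print(pooled, tau2)
```

Everything is computed in closed form from weighted sums, so no iteration between parameters is needed, in contrast with REML.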
A Complete Color Normalization Approach to Histopathology Images Using Color Cues Computed From Saturation-Weighted Statistics.

PubMed

Li, Xingyu; Plataniotis, Konstantinos N

2015-07-01

In digital histopathology, tasks of segmentation and disease diagnosis are achieved by quantitative analysis of image content. However, color variation in image samples makes it challenging to produce reliable results. This paper introduces a complete normalization scheme to address the problem of color variation in histopathology images jointly caused by inconsistent biopsy staining and nonstandard imaging condition. Method : Different from existing normalization methods that either address partial cause of color variation or lump them together, our method identifies causes of color variation based on a microscopic imaging model and addresses inconsistency in biopsy imaging and staining by an illuminant normalization module and a spectral normalization module, respectively. In evaluation, we use two public datasets that are representative of histopathology images commonly received in clinics to examine the proposed method from the aspects of robustness to system settings, performance consistency against achromatic pixels, and normalization effectiveness in terms of histological information preservation. As the saturation-weighted statistics proposed in this study generates stable and reliable color cues for stain normalization, our scheme is robust to system parameters and insensitive to image content and achromatic colors. Extensive experimentation suggests that our approach outperforms state-of-the-art normalization methods as the proposed method is the only approach that succeeds to preserve histological information after normalization. The proposed color normalization solution would be useful to mitigate effects of color variation in pathology images on subsequent quantitative analysis.
Flood risk assessment and robust management under deep uncertainty: Application to Dhaka City

NASA Astrophysics Data System (ADS)

Mojtahed, Vahid; Gain, Animesh Kumar; Giupponi, Carlo

2014-05-01

The socio-economic changes as well as climatic changes have been the main drivers of uncertainty in environmental risk assessment and in particular flood. The level of future uncertainty that researchers face when dealing with problems in a future perspective with focus on climate change is known as Deep Uncertainty (also known as Knightian uncertainty), since nobody has already experienced and undergone those changes before and our knowledge is limited to the extent that we have no notion of probabilities, and therefore consolidated risk management approaches have limited potential. Deep uncertainty refers to circumstances in which analysts and experts do not know, or parties to decision making cannot agree on: i) the appropriate models describing the interaction among system variables, ii) probability distributions to represent uncertainty about key parameters in the model, and iii) how to value the desirability of alternative outcomes. The need thus emerges to assist policy-makers by providing them with not a single and optimal solution to the problem at hand, such as crisp estimates for the costs of damages of natural hazards considered, but instead ranges of possible future costs, based on the outcomes of ensembles of assessment models and sets of plausible scenarios. Accordingly, we need to substitute optimality as a decision criterion with robustness. Under conditions of deep uncertainty, the decision-makers do not have statistical and mathematical bases to identify optimal solutions, while instead they should prefer to implement "robust" decisions that perform relatively well over all conceivable outcomes out of all future unknown scenarios. Under deep uncertainty, analysts cannot employ probability theory or other statistics that usually can be derived from observed historical data and therefore, we turn to non-statistical measures such as scenario analysis.
We construct several plausible scenarios with each scenario being a full description of what may happen in future and based on a meaningful synthesis of parameters' values with control of their correlations for maintaining internal consistencies. This paper aims at incorporating a set of data mining and sampling tools to assess uncertainty of model outputs under future climatic and socio-economic changes for Dhaka city and providing a decision support system for robust flood management and mitigation policies. After constructing an uncertainty matrix to identify the main sources of uncertainty for Dhaka City, we identify several hazard and vulnerability maps based on future climatic and socio-economic scenarios. The vulnerability of each flood management alternative under different set of scenarios is determined and finally the robustness of each plausible solution considered is defined based on the above assessment.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Soffientini, Chiara Dolores, E-mail: chiaradolores.soffientini@polimi.it; Baselli, Giuseppe; De Bernardi, Elisabetta

Purpose: Quantitative {sup 18}F-fluorodeoxyglucose positron emission tomography is limited by the uncertainty in lesion delineation due to poor SNR, low resolution, and partial volume effects, subsequently impacting oncological assessment, treatment planning, and follow-up. The present work develops and validates a segmentation algorithm based on statistical clustering. The introduction of constraints based on background features and contiguity priors is expected to improve robustness vs clinical image characteristics such as lesion dimension, noise, and contrast level. Methods: An eight-class Gaussian mixture model (GMM) clustering algorithm was modified by constraining the mean and variance parameters of four background classes according to the previousmore » analysis of a lesion-free background volume of interest (background modeling). Hence, expectation maximization operated only on the four classes dedicated to lesion detection. To favor the segmentation of connected objects, a further variant was introduced by inserting priors relevant to the classification of neighbors. The algorithm was applied to simulated datasets and acquired phantom data. Feasibility and robustness toward initialization were assessed on a clinical dataset manually contoured by two expert clinicians. Comparisons were performed with respect to a standard eight-class GMM algorithm and to four different state-of-the-art methods in terms of volume error (VE), Dice index, classification error (CE), and Hausdorff distance (HD). Results: The proposed GMM segmentation with background modeling outperformed standard GMM and all the other tested methods. Medians of accuracy indexes were VE <3%, Dice >0.88, CE <0.25, and HD <1.2 in simulations; VE <23%, Dice >0.74, CE <0.43, and HD <1.77 in phantom data. Robustness toward image statistic changes (±15%) was shown by the low index changes: <26% for VE, <17% for Dice, and <15% for CE. 
Finally, robustness toward the user-dependent volume initialization was demonstrated. The inclusion of the spatial prior improved segmentation accuracy only for lesions surrounded by heterogeneous background: in the relevant simulation subset, the median VE significantly decreased from 13% to 7%. Results on clinical data were found in accordance with simulations, with absolute VE <7%, Dice >0.85, CE <0.30, and HD <0.81. Conclusions: The sole introduction of constraints based on background modeling outperformed standard GMM and the other tested algorithms. Insertion of a spatial prior improved the accuracy for realistic cases of objects in heterogeneous backgrounds. Moreover, robustness against initialization supports the applicability in a clinical setting. In conclusion, application-driven constraints can generally improve the capabilities of GMM and statistical clustering algorithms.
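The background-modeling idea in the record above (classes whose mean and variance are fixed in advance, with expectation maximization acting on the remaining classes) can be illustrated with a minimal one-dimensional sketch. This is not the authors' implementation: the function name `constrained_gmm_em`, the two-class setup, the synthetic intensities, and the choice to keep updating the mixing weights of the fixed classes are all assumptions made for illustration.

```python
import math
import random


def gauss_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def constrained_gmm_em(data, means, variances, weights, fixed, n_iter=50, var_floor=1e-6):
    """EM for a 1-D Gaussian mixture. Components flagged in `fixed` keep their
    mean and variance (assumption: only their mixing weight is re-estimated)."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            p = [weights[j] * gauss_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: weights for all components, mean/variance for free ones only.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            if fixed[j] or nj < 1e-12:
                continue
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                var_floor,
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj,
            )
    return means, variances, weights


# Hypothetical data: a "background" class known beforehand and a "lesion" class.
random.seed(0)
background = [random.gauss(1.0, 0.2) for _ in range(400)]
lesion = [random.gauss(4.0, 0.5) for _ in range(100)]
means, variances, weights = constrained_gmm_em(
    background + lesion,
    means=[1.0, 3.0],
    variances=[0.04, 1.0],
    weights=[0.5, 0.5],
    fixed=[True, False],
)
```

In this sketch the background component stays exactly where the prior analysis put it, so the free component only has to explain what the background model cannot. The contiguity prior described in the abstract is not modeled here.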
No-Reference Video Quality Assessment Based on Statistical Analysis in 3D-DCT Domain.

PubMed

Li, Xuelong; Guo, Qun; Lu, Xiaoqiang

2016-05-13

Designing models for universal no-reference video quality assessment (NR-VQA) is an important task in multiple video processing and computer vision applications. However, most existing NR-VQA metrics are designed for specific distortion types, which are often not known in practical applications. A further deficiency is that the spatial and temporal information of videos is rarely considered simultaneously. In this paper, we propose a new NR-VQA metric based on the spatiotemporal natural video statistics (NVS) in the 3D discrete cosine transform (3D-DCT) domain. In the proposed method, a set of features is first extracted based on the statistical analysis of 3D-DCT coefficients to characterize the spatiotemporal statistics of videos in different views. These features are then used to predict the perceived video quality via an efficient linear support vector regression (SVR) model. The contributions of this paper are: 1) we explore the spatiotemporal statistics of videos in the 3D-DCT domain, which has an inherent spatiotemporal encoding advantage over other widely used 2D transformations; 2) we extract a small set of simple but effective statistical features for video visual quality prediction; 3) the proposed method is universal for multiple types of distortions and robust to different databases. The proposed method is tested on four widely used video databases. Extensive experimental results demonstrate that the proposed method is competitive with the state-of-the-art NR-VQA metrics and the top-performing FR-VQA and RR-VQA metrics.
The case for increasing the statistical power of eddy covariance ecosystem studies: why, where and how?

PubMed

Hill, Timothy; Chocholek, Melanie; Clement, Robert

2017-06-01

Eddy covariance (EC) continues to provide invaluable insights into the dynamics of Earth's surface processes. However, despite its many strengths, spatial replication of EC at the ecosystem scale is rare. High equipment costs are likely to be partially responsible. This contributes to the low sampling, and even lower replication, of ecoregions in Africa, Oceania (excluding Australia) and South America. The level of replication matters as it directly affects statistical power. While the ergodicity of turbulence and temporal replication allow an EC tower to provide statistically robust flux estimates for its footprint, these principles do not extend to larger ecosystem scales. Despite the challenge of spatially replicating EC, it is clearly of interest to be able to use EC to provide statistically robust flux estimates for larger areas. We ask: How much spatial replication of EC is required for statistical confidence in our flux estimates of an ecosystem? We provide the reader with tools to estimate the number of EC towers needed to achieve a given statistical power. We show that for a typical ecosystem, around four EC towers are needed to have 95% statistical confidence that the annual flux of an ecosystem is nonzero. Furthermore, if the true flux is small relative to instrument noise and spatial variability, the number of towers needed can rise dramatically. We discuss approaches for improving statistical power and describe one solution: an inexpensive EC system that could help by making spatial replication more affordable. However, we note that diverting limited resources from other key measurements in order to allow spatial replication may not be optimal, and a balance needs to be struck. While individual EC towers are well suited to providing fluxes from the flux footprint, we emphasize that spatial replication is essential for statistically robust fluxes if a wider ecosystem is being studied. 
© 2016 The Authors Global Change Biology Published by John Wiley & Sons Ltd.
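The link between spatial replication and statistical power described in the record above can be sketched with a textbook calculation. The code below is not the tool provided by the authors: it assumes a two-sided test of a nonzero mean annual flux with a known between-tower standard deviation and a normal approximation (which is optimistic for very few towers), and the function names and numeric inputs are hypothetical.

```python
import math
from statistics import NormalDist


def power_nonzero_mean(n_towers, true_flux, between_tower_sd, alpha=0.05):
    """Approximate power of a two-sided z-test that the mean flux differs from zero."""
    z = NormalDist()
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)
    # Standardized effect: true mean divided by the standard error of the tower mean.
    shift = abs(true_flux) * math.sqrt(n_towers) / between_tower_sd
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


def towers_needed(true_flux, between_tower_sd, target_power=0.8, alpha=0.05, max_towers=1000):
    """Smallest number of towers reaching the target power, or None if not reached."""
    for n in range(1, max_towers + 1):
        if power_nonzero_mean(n, true_flux, between_tower_sd, alpha) >= target_power:
            return n
    return None


# Hypothetical example: halving the flux relative to its spatial variability
# roughly quadruples the number of towers required.
n_large_flux = towers_needed(true_flux=1.0, between_tower_sd=1.0)
n_small_flux = towers_needed(true_flux=0.5, between_tower_sd=1.0)
```

Under these assumptions the required replication scales with the square of the ratio of spatial variability to true flux, which is consistent with the abstract's remark that the number of towers can rise dramatically when the flux is small.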
An empirical likelihood ratio test robust to individual heterogeneity for differential expression analysis of RNA-seq.

PubMed

Xu, Maoqi; Chen, Liang

2018-01-01

Individual sample heterogeneity is one of the biggest obstacles in biomarker identification for complex diseases such as cancers. Current statistical models to identify differentially expressed genes between disease and control groups often overlook the substantial human sample heterogeneity. Meanwhile, traditional nonparametric tests lose detailed data information and sacrifice analysis power, although they are distribution free and robust to heterogeneity. Here, we propose an empirical likelihood ratio test with a mean-variance relationship constraint (ELTSeq) for the differential expression analysis of RNA sequencing (RNA-seq). As a distribution-free nonparametric model, ELTSeq handles individual heterogeneity by estimating an empirical probability for each observation without making any assumption about read-count distribution. It also incorporates a constraint for the read-count overdispersion, which is widely observed in RNA-seq data. ELTSeq demonstrates a significant improvement over existing methods such as edgeR, DESeq, t-tests, Wilcoxon tests and the classic empirical likelihood-ratio test when handling heterogeneous groups. It will significantly advance the transcriptomics studies of cancers and other complex diseases. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Two-dimensional statistical linear discriminant analysis for real-time robust vehicle-type recognition

NASA Astrophysics Data System (ADS)

Zafar, I.; Edirisinghe, E. A.; Acar, S.; Bez, H. E.

2007-02-01

Automatic vehicle Make and Model Recognition (MMR) systems provide useful performance enhancements to vehicle recognition systems that are based solely on Automatic License Plate Recognition (ALPR). Several car MMR systems have been proposed in the literature. However, these approaches are based on feature detection algorithms that can perform sub-optimally under adverse lighting and/or occlusion conditions. In this paper we propose a real-time, appearance-based car MMR approach using Two-Dimensional Linear Discriminant Analysis (2D-LDA) that is capable of addressing this limitation. We provide experimental results to analyse the proposed algorithm's robustness under varying illumination and occlusion conditions. We show that the best performance with the proposed 2D-LDA based car MMR approach is obtained when the eigenvectors of lower significance are ignored. For the given database of 200 car images of 25 different make-model classifications, a best accuracy of 91% was obtained with the 2D-LDA approach. We use a direct Principal Component Analysis (PCA) based approach as a benchmark to compare and contrast the performance of the proposed 2D-LDA approach to car MMR. We conclude that in general the 2D-LDA based algorithm outperforms the PCA based approach.
Implementation of statistical process control for proteomic experiments via LC MS/MS.

PubMed

Bereman, Michael S; Johnson, Richard; Bollinger, James; Boss, Yuval; Shulman, Nick; MacLean, Brendan; Hoofnagle, Andrew N; MacCoss, Michael J

2014-04-01

Statistical process control (SPC) is a robust set of tools that aids in the visualization, detection, and identification of assignable causes of variation in any process that creates products, services, or information. A tool has been developed termed Statistical Process Control in Proteomics (SProCoP) which implements aspects of SPC (e.g., control charts and Pareto analysis) into the Skyline proteomics software. It monitors five quality control metrics in a shotgun or targeted proteomic workflow. None of these metrics require peptide identification. The source code, written in the R statistical language, runs directly from the Skyline interface, which supports the use of raw data files from several of the mass spectrometry vendors. It provides real time evaluation of the chromatographic performance (e.g., retention time reproducibility, peak asymmetry, and resolution), and mass spectrometric performance (targeted peptide ion intensity and mass measurement accuracy for high resolving power instruments) via control charts. Thresholds are experiment- and instrument-specific and are determined empirically from user-defined quality control standards that enable the separation of random noise and systematic error. Finally, Pareto analysis provides a summary of performance metrics and guides the user to metrics with high variance. The utility of these charts to evaluate proteomic experiments is illustrated in two case studies.
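The control-chart logic mentioned in the record above can be sketched in a few lines. SProCoP itself is written in R and runs inside Skyline; the Python below is only an illustration of a Shewhart-style individuals chart, with thresholds derived empirically from baseline quality control runs. The function names, the three-sigma rule, and the retention-time values are assumptions, not taken from the tool.

```python
from statistics import mean, stdev


def control_limits(baseline, n_sigma=3.0):
    """Lower limit, center line, and upper limit estimated from baseline QC runs."""
    center = mean(baseline)
    spread = stdev(baseline)
    return center - n_sigma * spread, center, center + n_sigma * spread


def out_of_control(values, limits):
    """Indices of runs whose metric falls outside the control limits."""
    lower, _, upper = limits
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical retention times (minutes) for one QC peptide.
baseline_runs = [25.1, 24.9, 25.0, 25.2, 24.8]
new_runs = [25.1, 25.6, 24.9, 24.3]

limits = control_limits(baseline_runs)
flagged = out_of_control(new_runs, limits)
```

Runs at positions 1 and 3 fall outside the limits in this example and would be the ones to inspect for an assignable cause.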
Gene coexpression measures in large heterogeneous samples using count statistics.

PubMed

Wang, Y X Rachel; Waterman, Michael S; Huang, Haiyan

2014-11-18

With the advent of high-throughput technologies making large-scale gene expression data readily available, developing appropriate computational tools to process these data and distill insights into systems biology has been an important part of the "big data" challenge. Gene coexpression is one of the earliest techniques developed that is still widely in use for functional annotation, pathway analysis, and, most importantly, the reconstruction of gene regulatory networks, based on gene expression data. However, most coexpression measures do not specifically account for local features in expression profiles. For example, it is very likely that the patterns of gene association may change or only exist in a subset of the samples, especially when the samples are pooled from a range of experiments. We propose two new gene coexpression statistics based on counting local patterns of gene expression ranks to take into account the potentially diverse nature of gene interactions. In particular, one of our statistics is designed for time-course data with local dependence structures, such as time series coupled over a subregion of the time domain. We provide asymptotic analysis of their distributions and power, and evaluate their performance against a wide range of existing coexpression measures on simulated and real data. Our new statistics are fast to compute, robust against outliers, and show comparable and often better general performance.
Selection vector filter framework

NASA Astrophysics Data System (ADS)

Lukac, Rastislav; Plataniotis, Konstantinos N.; Smolka, Bogdan; Venetsanopoulos, Anastasios N.

2003-10-01

We provide a unified framework of nonlinear vector techniques outputting the lowest ranked vector. The proposed framework constitutes a generalized filter class for multichannel signal processing. The new class of nonlinear selection filters is based on robust order-statistic theory and the minimization of a weighted distance function to the other input samples. The proposed method can be designed to perform a variety of filtering operations, including previously developed filtering techniques such as the vector median, basic vector directional filter, directional distance filter, weighted vector median filters and weighted directional filters. A wide range of filtering operations is guaranteed by the filter structure, with two independent weight vectors for the angular and distance domains of the vector space. In order to adapt the filter parameters to varying signal and noise statistics, we also provide generalized optimization algorithms that take advantage of weighted median filters and of the relationship between the standard median filter and the vector median filter. Thus, we can deal with both statistical and deterministic aspects of the filter design process. It will be shown that the proposed method holds the required properties, such as the capability of modelling the underlying system in the application at hand, robustness with respect to errors in the model of the underlying system, the availability of a training procedure and, finally, simplicity of filter representation, analysis, design and implementation. Simulation studies also indicate that the new filters are computationally attractive and have excellent performance in environments corrupted by bit errors and impulsive noise.
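One of the special cases named in the record above, the (weighted) vector median, selects the input sample with the smallest aggregated distance to the other samples in the window. The sketch below shows only that special case with Euclidean distances; it does not implement the angular-domain weights or the optimization procedures of the proposed framework, and the function name and pixel values are hypothetical.

```python
import math


def vector_median(window, weights=None):
    """Return the sample minimizing the weighted sum of Euclidean distances
    to all samples in the window (all weights equal to 1 by default)."""
    if weights is None:
        weights = [1.0] * len(window)

    def aggregated_distance(candidate):
        return sum(w * math.dist(candidate, other) for w, other in zip(weights, window))

    return min(window, key=aggregated_distance)


# Hypothetical RGB window containing one impulsive-noise pixel.
window = [(10, 10, 10), (12, 11, 10), (11, 10, 12), (255, 0, 255), (10, 12, 11)]
filtered_pixel = vector_median(window)
```

Because the output is always one of the input vectors, the filter cannot introduce colors that were not present in the window, and an isolated impulse is never selected while the remaining samples agree with each other.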
Optimizing ELISAs for precision and robustness using laboratory automation and statistical design of experiments.

PubMed

Joelsson, Daniel; Moravec, Phil; Troutman, Matthew; Pigeon, Joseph; DePhillips, Pete

2008-08-20

Transferring manual ELISAs to automated platforms requires optimizing the assays for each particular robotic platform. These optimization experiments are often time consuming and difficult to perform using a traditional one-factor-at-a-time strategy. In this manuscript we describe the development of an automated process using statistical design of experiments (DOE) to quickly optimize immunoassays for precision and robustness on the Tecan EVO liquid handler. By using fractional factorials and a split-plot design, five incubation time variables and four reagent concentration variables can be optimized in a short period of time.
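To make the term "fractional factorial" in the record above concrete, the following sketch generates a two-level half-fraction design in which the last factor is set to the product of the others. It is a generic textbook construction, not the design used in the study (which combined fractional factorials with a split-plot structure over nine variables); the function name and the factor labels are hypothetical.

```python
from itertools import product


def half_fraction(factor_names):
    """Two-level half-fraction design: a full factorial in all but the last
    factor, with the last factor generated as the product of the others."""
    *base, last = factor_names
    runs = []
    for levels in product((-1, 1), repeat=len(base)):
        generated = 1
        for level in levels:
            generated *= level
        run = dict(zip(base, levels))
        run[last] = generated
        runs.append(run)
    return runs


# Hypothetical factors coded as low (-1) and high (+1) levels.
design = half_fraction(["A", "B", "C", "D"])
```

Four factors are covered in eight runs instead of sixteen, at the price of confounding the main effect of the generated factor with the three-way interaction of the others.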
Status of selected ion flow tube MS: accomplishments and challenges in breath analysis and other areas.

PubMed

Smith, David; Španěl, Patrik

2016-06-01

This article reflects our observations of recent accomplishments made using selected ion flow tube MS (SIFT-MS). Only brief descriptions are given of SIFT-MS as an analytical method and of the recent extensions to the underpinning analytical ion chemistry required to realize more robust analyses. The challenge of breath analysis is given special attention because, when achieved, it renders analysis of other air media relatively straightforward. Brief overviews are given of recent SIFT-MS breath analyses by leading research groups, noting the desirability of detection and quantification of single volatile biomarkers rather than reliance on statistical analyses, if breath analysis is to be accepted into clinical practice. A 'strengths, weaknesses, opportunities and threats' analysis of SIFT-MS is made, which should help to increase its utility for trace gas analysis.
A new feedback image encryption scheme based on perturbation with dynamical compound chaotic sequence cipher generator

NASA Astrophysics Data System (ADS)

Tong, Xiaojun; Cui, Minggen; Wang, Zhu

2009-07-01

A new compound two-dimensional chaotic function is designed by exploiting two one-dimensional chaotic functions that switch randomly; the design is used as a chaotic sequence generator and is proved chaotic under Devaney's definition of chaos. The properties of compound chaotic functions are also proved rigorously. To improve robustness against differential cryptanalysis and to produce an avalanche effect, a new feedback image encryption scheme is proposed that uses the new compound chaos by randomly selecting one of the two one-dimensional chaotic functions, and a new pixel permutation and substitution method is designed in detail, with random control of array rows and columns based on the compound chaos. The results of entropy analysis, difference analysis, statistical analysis, sequence randomness analysis, and cipher sensitivity analysis with respect to key and plaintext show that the compound chaotic sequence cipher can resist cryptanalytic, statistical and brute-force attacks, and in particular that it accelerates encryption and achieves a higher level of security. Using dynamical compound chaos and perturbation technology, the paper addresses the low computer precision of one-dimensional chaotic functions.
An empirical comparison of methods for analyzing correlated data from a discrete choice survey to elicit patient preference for colorectal cancer screening

PubMed Central

2012-01-01

Background A discrete choice experiment (DCE) is a preference survey which asks participants to make a choice among product portfolios comparing the key product characteristics by performing several choice tasks. Analyzing DCE data needs to account for within-participant correlation because choices from the same participant are likely to be similar. In this study, we empirically compared some commonly-used statistical methods for analyzing DCE data while accounting for within-participant correlation based on a survey of patient preference for colorectal cancer (CRC) screening tests conducted in Hamilton, Ontario, Canada in 2002. Methods A two-stage DCE design was used to investigate the impact of six attributes on participants' preferences for CRC screening test and willingness to undertake the test. We compared six models for clustered binary outcomes (logistic and probit regressions using cluster-robust standard error (SE), random-effects and generalized estimating equation approaches) and three models for clustered nominal outcomes (multinomial logistic and probit regressions with cluster-robust SE and random-effects multinomial logistic model). We also fitted a bivariate probit model with cluster-robust SE treating the choices from two stages as two correlated binary outcomes. The rank of relative importance between attributes and the estimates of β coefficient within attributes were used to assess the model robustness. Results In total 468 participants with each completing 10 choices were analyzed. Similar results were reported for the rank of relative importance and β coefficients across models for stage-one data on evaluating participants' preferences for the test. The six attributes ranked from high to low as follows: cost, specificity, process, sensitivity, preparation and pain. However, the results differed across models for stage-two data on evaluating participants' willingness to undertake the tests. 
Little within-patient correlation (ICC ≈ 0) was found in stage-one data, but substantial within-patient correlation existed (ICC = 0.659) in stage-two data. Conclusions When the clustering effect in DCE data was small, results remained robust across statistical models. However, results varied when the clustering effect was larger. Therefore, it is important to assess the robustness of the estimates via sensitivity analysis using different models for analyzing clustered data from DCE studies. PMID:22348526
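The intraclass correlation coefficient (ICC) quoted in the record above measures how much of the outcome variance lies between participants rather than within them. The sketch below uses the one-way ANOVA estimator for balanced clusters; the study may have estimated its ICC differently (for example from a random-effects model), so this is an illustration only, and the function name and data are hypothetical.

```python
from statistics import mean


def icc_oneway(clusters):
    """One-way ANOVA estimate of the ICC for equally sized clusters."""
    k = len(clusters[0])
    if k < 2 or any(len(c) != k for c in clusters):
        raise ValueError("clusters must have the same size, at least 2")
    n = len(clusters)
    grand = mean(x for c in clusters for x in c)
    # Mean squares between and within clusters.
    msb = k * sum((mean(c) - grand) ** 2 for c in clusters) / (n - 1)
    msw = sum((x - mean(c)) ** 2 for c in clusters for x in c) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical binary choices: each participant always answers the same way.
perfectly_clustered = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
icc_high = icc_oneway(perfectly_clustered)

# Hypothetical choices with no participant effect beyond chance.
mixed = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
icc_low = icc_oneway(mixed)
```

A value near 1 means that choices from the same participant are nearly interchangeable, which is the situation where ignoring the clustering is most likely to distort standard errors.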
A New Color Image Encryption Scheme Using CML and a Fractional-Order Chaotic System

PubMed Central

Wu, Xiangjun; Li, Yang; Kurths, Jürgen

2015-01-01

Chaos-based image cryptosystems have been widely investigated in recent years to provide real-time encryption and transmission. In this paper, a novel color image encryption algorithm using coupled-map lattices (CML) and a fractional-order chaotic system is proposed to enhance the security and robustness of encryption algorithms with a permutation-diffusion structure. To make the encryption procedure more confusing and complex, an image division-shuffling process is put forward, where the plain-image is first divided into four sub-images, and then the position of the pixels in the whole image is shuffled. In order to generate initial conditions and parameters of two chaotic systems, a 280-bit long external secret key is employed. Key space analysis, various statistical analyses, information entropy analysis, differential analysis and key sensitivity analysis are introduced to test the security of the new image encryption algorithm. The cryptosystem speed is analyzed and tested as well. Experimental results confirm that, in comparison to other image encryption schemes, the new algorithm has higher security and is fast for practical image encryption. Moreover, an extensive tolerance analysis of some common image processing operations such as noise adding, cropping, JPEG compression, rotation, brightening and darkening, has been performed on the proposed image encryption technique. Corresponding results reveal that the proposed image encryption method has good robustness against some image processing operations and geometric attacks. PMID:25826602
A wavelet-based statistical analysis of FMRI data: I. motivation and data distribution modeling.

PubMed

Dinov, Ivo D; Boscardin, John W; Mega, Michael S; Sowell, Elizabeth L; Toga, Arthur W

2005-01-01

We propose a new method for statistical analysis of functional magnetic resonance imaging (fMRI) data. The discrete wavelet transformation is employed as a tool for efficient and robust signal representation. We use structural magnetic resonance imaging (MRI) and fMRI to empirically estimate the distribution of the wavelet coefficients of the data both across individuals and spatial locations. An anatomical subvolume probabilistic atlas is used to tessellate the structural and functional signals into smaller regions each of which is processed separately. A frequency-adaptive wavelet shrinkage scheme is employed to obtain essentially optimal estimations of the signals in the wavelet space. The empirical distributions of the signals on all the regions are computed in a compressed wavelet space. These are modeled by heavy-tail distributions because their histograms exhibit slower tail decay than the Gaussian. We discovered that the Cauchy, Bessel K Forms, and Pareto distributions provide the most accurate asymptotic models for the distribution of the wavelet coefficients of the data. Finally, we propose a new model for statistical analysis of functional MRI data using this atlas-based wavelet space representation. In the second part of our investigation, we will apply this technique to analyze a large fMRI dataset involving repeated presentation of sensory-motor response stimuli in young, elderly, and demented subjects.
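Wavelet shrinkage, as referred to in the record above, transforms a signal, shrinks the detail coefficients toward zero, and transforms back. The sketch below uses a single level of the orthonormal Haar transform with one fixed soft threshold. The frequency-adaptive scheme and the wavelet family used by the authors are not reproduced; all function names and the sample signal are hypothetical.

```python
import math


def haar_forward(signal):
    """One level of the orthonormal Haar transform (even-length input)."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


def soft_threshold(coeffs, threshold):
    """Shrink each coefficient toward zero by `threshold`, zeroing small ones."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coeffs]


def denoise(signal, threshold):
    """Haar transform, soft-threshold the detail coefficients, transform back."""
    approx, detail = haar_forward(signal)
    return haar_inverse(approx, soft_threshold(detail, threshold))


signal = [1.0, 3.0, 5.0, 7.0]
unchanged = denoise(signal, threshold=0.0)   # no shrinkage: perfect reconstruction
smoothed = denoise(signal, threshold=10.0)   # all detail removed: pairwise averages
```

With a zero threshold the transform is exactly invertible, and with a very large threshold each pair of samples collapses to its mean; useful thresholds lie between these extremes and would normally depend on the noise level in each frequency band.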
Model based inference from microvascular measurements: Combining experimental measurements and model predictions using a Bayesian probabilistic approach

PubMed Central

Rasmussen, Peter M.; Smith, Amy F.; Sakadžić, Sava; Boas, David A.; Pries, Axel R.; Secomb, Timothy W.; Østergaard, Leif

2017-01-01

Objective In vivo imaging of the microcirculation and network-oriented modeling have emerged as powerful means of studying microvascular function and understanding its physiological significance. Network-oriented modeling may provide the means of summarizing vast amounts of data produced by high-throughput imaging techniques in terms of key, physiological indices. To estimate such indices with sufficient certainty, however, network-oriented analysis must be robust to the inevitable presence of uncertainty due to measurement errors as well as model errors. Methods We propose the Bayesian probabilistic data analysis framework as a means of integrating experimental measurements and network model simulations into a combined and statistically coherent analysis. The framework naturally handles noisy measurements and provides posterior distributions of model parameters as well as physiological indices associated with uncertainty. Results We applied the analysis framework to experimental data from three rat mesentery networks and one mouse brain cortex network. We inferred distributions for more than five hundred unknown pressure and hematocrit boundary conditions. Model predictions were consistent with previous analyses, and remained robust when measurements were omitted from model calibration. Conclusion Our Bayesian probabilistic approach may be suitable for optimizing data acquisition and for analyzing and reporting large datasets acquired as part of microvascular imaging studies. PMID:27987383
Biologically-inspired data decorrelation for hyper-spectral imaging

NASA Astrophysics Data System (ADS)

Picon, Artzai; Ghita, Ovidiu; Rodriguez-Vaamonde, Sergio; Iriondo, Pedro Ma; Whelan, Paul F.

2011-12-01

Hyper-spectral data allows the construction of more robust statistical models to sample the material properties than the standard tri-chromatic color representation. However, because of the large dimensionality and complexity of the hyper-spectral data, the extraction of robust features (image descriptors) is not a trivial issue. Thus, to facilitate efficient feature extraction, decorrelation techniques are commonly applied to reduce the dimensionality of the hyper-spectral data with the aim of generating compact and highly discriminative image descriptors. Current methodologies for data decorrelation such as principal component analysis (PCA), linear discriminant analysis (LDA), wavelet decomposition (WD), or band selection methods require complex and subjective training procedures and, in addition, the compressed spectral information is not directly related to the physical (spectral) characteristics associated with the analyzed materials. The major objective of this article is to introduce and evaluate a new data decorrelation methodology using an approach that closely emulates human vision. The proposed data decorrelation scheme has been employed to minimize the amount of redundant information contained in the highly correlated hyper-spectral bands and has been comprehensively evaluated in the context of non-ferrous material classification.
Optimization of an electromagnetic linear actuator using a network and a finite element model

NASA Astrophysics Data System (ADS)

Neubert, Holger; Kamusella, Alfred; Lienig, Jens

2011-03-01

Model-based design optimization leads to robust solutions only if the statistical deviations of design, load and ambient parameters from nominal values are considered. We describe an optimization methodology that involves these deviations as stochastic variables for an exemplary electromagnetic actuator used to drive a Braille printer. A combined model simulates the dynamic behavior of the actuator and its non-linear load. It consists of a dynamic network model and a stationary magnetic finite element (FE) model. The network model utilizes lookup tables of the magnetic force and the flux linkage computed by the FE model. After a sensitivity analysis using design of experiments (DoE) methods and a nominal optimization based on gradient methods, a robust design optimization is performed. Selected design variables are involved in the form of their density functions. To reduce the computational effort, we use response surfaces instead of the combined system model in all stochastic analysis steps. Thus, Monte-Carlo simulations can be applied. As a result we found an optimum system design meeting our requirements with regard to function and reliability.

Longitudinal Mediation Analysis with Time-varying Mediators and Exposures, with Application to Survival Outcomes

PubMed Central

Zheng, Wenjing; van der Laan, Mark

2017-01-01

In this paper, we study the effect of a time-varying exposure mediated by a time-varying intermediate variable. We consider general longitudinal settings, including survival outcomes. At a given time point, the exposure and mediator of interest are influenced by past covariates, mediators and exposures, and affect future covariates, mediators and exposures. Right censoring, if present, occurs in response to past history. To address the challenges in mediation analysis that are unique to these settings, we propose a formulation in terms of random interventions based on conditional distributions for the mediator. This formulation, in particular, allows for well-defined natural direct and indirect effects in the survival setting, and natural decomposition of the standard total effect. Upon establishing identifiability and the corresponding statistical estimands, we derive the efficient influence curves and establish their robustness properties. Applying Targeted Maximum Likelihood Estimation, we use these efficient influence curves to construct multiply robust and efficient estimators. We also present an inverse probability weighted estimator and a nested non-targeted substitution estimator for these parameters. PMID:29387520
Integrating hidden Markov model and PRAAT: a toolbox for robust automatic speech transcription

NASA Astrophysics Data System (ADS)

Kabir, A.; Barker, J.; Giurgiu, M.

2010-09-01

An automatic time-aligned phone transcription toolbox for English speech corpora has been developed. The toolbox is particularly useful for generating robust automatic transcriptions, and it can produce phone-level transcriptions using speaker-independent as well as speaker-dependent models without manual intervention. The system is based on the standard Hidden Markov Model (HMM) approach and was successfully tested on a large audiovisual speech corpus, the GRID corpus. One of the most powerful features of the toolbox is its increased flexibility in speech processing: the speech community can import the automatic transcription generated by the HMM Toolkit (HTK) into a popular transcription software, PRAAT, and vice versa. The toolbox has been evaluated through statistical analysis on GRID data, which shows that automatic transcription deviates by an average of 20 ms from manual transcription.
Standard and Robust Methods in Regression Imputation

ERIC Educational Resources Information Center

Moraveji, Behjat; Jafarian, Koorosh

2014-01-01

The aim of this paper is to introduce new imputation algorithms for estimating missing values, or handling outliers, in larger official-statistics data sets during data pre-processing. The goal is to propose a new algorithm called IRMI (iterative robust model-based imputation). This algorithm is able to deal with all challenges like…
Probability density function formalism for optical coherence tomography signal analysis: a controlled phantom study.

PubMed

Weatherbee, Andrew; Sugita, Mitsuro; Bizheva, Kostadinka; Popov, Ivan; Vitkin, Alex

2016-06-15

The distribution of backscattered intensities as described by the probability density function (PDF) of tissue-scattered light contains information that may be useful for tissue assessment and diagnosis, including characterization of its pathology. In this Letter, we examine the PDF description of the light scattering statistics in a well characterized tissue-like particulate medium using optical coherence tomography (OCT). It is shown that for low scatterer density, the governing statistics depart considerably from a Gaussian description and follow the K distribution for both OCT amplitude and intensity. The PDF formalism is shown to be independent of the scatterer flow conditions; this is expected from theory, and suggests robustness and motion independence of the OCT amplitude (and OCT intensity) PDF metrics in the context of potential biomedical applications.
Analysis of multi-fragmentation reactions induced by relativistic heavy ions using the statistical multi-fragmentation model

NASA Astrophysics Data System (ADS)

Ogawa, T.; Sato, T.; Hashimoto, S.; Niita, K.

2013-09-01

The fragmentation cross-sections of relativistic energy nucleus-nucleus collisions were analyzed using the statistical multi-fragmentation model (SMM) incorporated with the Monte-Carlo radiation transport simulation code particle and heavy ion transport code system (PHITS). Comparison with the literature data showed that PHITS-SMM reproduces fragmentation cross-sections of heavy nuclei at relativistic energies better than the original PHITS by up to two orders of magnitude. It was also found that SMM does not degrade the neutron production cross-sections in heavy ion collisions or the fragmentation cross-sections of light nuclei, for which SMM has not been benchmarked. Therefore, SMM is a robust model that can supplement conventional nucleus-nucleus reaction models, enabling more accurate prediction of fragmentation cross-sections.
Fundamentals of Counting Statistics in Digital PCR: I Just Measured Two Target Copies-What Does It Mean?

PubMed

Tzonev, Svilen

2018-01-01

Current commercially available digital PCR (dPCR) systems and assays are capable of detecting individual target molecules with considerable reliability. As tests are developed and validated for use on clinical samples, the need to understand and develop robust statistical analysis routines increases. This chapter covers the fundamental processes and limitations of detecting and reporting on single molecule detection. We cover the basics of quantification of targets and sources of imprecision. We describe the basic test concepts: sensitivity, specificity, limit of blank, limit of detection, and limit of quantification in the context of dPCR. We provide basic guidelines on how to determine those, how to choose and interpret the operating point, and what factors may influence overall test performance in practice.
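The counting statistics this chapter refers to are commonly built on a Poisson model of how target molecules are distributed across partitions. The following is a minimal sketch under that assumption, not the chapter's own procedure; the function names and the partition count of 20000 are hypothetical and chosen only for illustration.

```python
import math

def copies_per_partition(positive: int, total: int) -> float:
    """Mean copies per partition, assuming Poisson loading of partitions.

    Uses lambda = -ln(1 - p), where p is the fraction of positive partitions.
    """
    if not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total")
    return -math.log(1.0 - positive / total)

def prob_zero_detected(expected_copies: float) -> float:
    """Poisson probability of sampling no copies when `expected_copies` are expected."""
    return math.exp(-expected_copies)

# Hypothetical run: 2 positive partitions out of 20000.
lam = copies_per_partition(positive=2, total=20000)
estimated_copies = lam * 20000  # very close to 2 at such low occupancy

# With 3 copies expected on average, about 5% of samples contain none at all,
# which is why sampling noise alone limits the achievable limit of detection.
miss_rate = prob_zero_detected(3.0)  # about 0.05
```

At low occupancy the Poisson correction barely changes the raw count, so the dominant uncertainty in a "two copies" result is the sampling variation itself rather than the correction.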
Mortality and long-term exposure to ambient air pollution: ongoing analyses based on the American Cancer Society cohort.

PubMed

Krewski, Daniel; Burnett, Richard; Jerrett, Michael; Pope, C Arden; Rainham, Daniel; Calle, Eugenia; Thurston, George; Thun, Michael

This article provides an overview of previous analysis and reanalysis of the American Cancer Society (ACS) cohort, along with an indication of current ongoing analyses of the cohort with additional follow-up information through to 2000. Results of the first analysis conducted by Pope et al. (1995) showed that higher average sulfate levels were associated with increased mortality, particularly from cardiopulmonary disease. A reanalysis of the ACS cohort, undertaken by Krewski et al. (2000), found the original risk estimates for fine-particle and sulfate air pollution to be highly robust against alternative statistical techniques and spatial modeling approaches. A detailed investigation of covariate effects found a significant modifying effect of education, with the risk of mortality associated with fine particles declining with increasing educational attainment. Pope et al. (2002) subsequently reported results of a further study using an additional 10 yr of follow-up of the ACS cohort. This updated analysis included gaseous copollutant and new fine-particle measurements, more comprehensive information on occupational exposures, dietary variables, and the most recent developments in statistical modeling integrating random effects and nonparametric spatial smoothing into the Cox proportional hazards model. Robust associations between ambient fine particulate air pollution and elevated risks of cardiopulmonary and lung cancer mortality were clearly evident, providing the strongest evidence to date that long-term exposure to fine particles is an important health risk. Current ongoing analysis using the extended follow-up information will explore the role of ecologic, economic, and demographic covariates in the particulate air pollution and mortality association.
This analysis will also provide insight into the role of spatial autocorrelation at multiple geographic scales, and whether critical instances in time of exposure to fine particles influence the risk of mortality from cardiopulmonary and lung cancer. Information on the influence of covariates at multiple scales and of critical exposure time windows can assist policymakers in establishing timelines for regulatory interventions that maximize population health benefits.
The Role of Design-of-Experiments in Managing Flow in Compact Air Vehicle Inlets

NASA Technical Reports Server (NTRS)

Anderson, Bernhard H.; Miller, Daniel N.; Gridley, Marvin C.; Agrell, Johan

2003-01-01

It is the purpose of this study to demonstrate the viability and economy of Design-of-Experiments methodologies to arrive at microscale secondary flow control array designs that maintain optimal inlet performance over a wide range of the mission variables and to explore how these statistical methods provide a better understanding of the management of flow in compact air vehicle inlets. These statistical design concepts were used to investigate the robustness properties of low unit strength micro-effector arrays. Low unit strength micro-effectors are micro-vanes set at very low angles-of-incidence with very long chord lengths. They were designed to influence the near wall inlet flow over an extended streamwise distance, and their advantage lies in low total pressure loss and high effectiveness in managing engine face distortion. The term robustness is used in this paper in the same sense as it is used in the industrial problem solving community. It refers to minimizing the effects of the hard-to-control factors that influence the development of a product or process. In Robustness Engineering, the effects of the hard-to-control factors are often called noise, and the hard-to-control factors themselves are referred to as the environmental variables or sometimes as the Taguchi noise variables. Hence Robust Optimization refers to minimizing the effects of the environmental or noise variables on the development (design) of a product or process. In the management of flow in compact inlets, the environmental or noise variables can be identified with the mission variables. Therefore this paper formulates a statistical design methodology that minimizes the impact of variations in the mission variables on inlet performance and demonstrates that these statistical design concepts can lead to simpler inlet flow management systems.
Kinetic analysis of single molecule FRET transitions without trajectories

NASA Astrophysics Data System (ADS)

Schrangl, Lukas; Göhring, Janett; Schütz, Gerhard J.

2018-03-01

Single molecule Förster resonance energy transfer (smFRET) is a popular tool to study biological systems that undergo topological transitions on the nanometer scale. smFRET experiments typically require recording of long smFRET trajectories and subsequent statistical analysis to extract parameters such as the states' lifetimes. Alternatively, analysis of probability distributions exploits the shapes of smFRET distributions at well chosen exposure times and hence works without the acquisition of time traces. Here, we describe a variant that utilizes statistical tests to compare experimental datasets with Monte Carlo simulations. For a given model, parameters are varied to cover the full realistic parameter space. As output, the method yields p-values which quantify the likelihood for each parameter setting to be consistent with the experimental data. The method provides suitable results even if the actual lifetimes differ by an order of magnitude. We also demonstrated the robustness of the method to inaccurately determined input parameters. As proof of concept, the new method was applied to the determination of transition rate constants for Holliday junctions.
A graphical user interface for RAId, a knowledge integrated proteomics analysis suite with accurate statistics.

PubMed

Joyce, Brendan; Lee, Danny; Rubio, Alex; Ogurtsov, Aleksey; Alves, Gelio; Yu, Yi-Kuo

2018-03-15

RAId is a software package that has been actively developed for the past 10 years for computationally and visually analyzing MS/MS data. Founded on rigorous statistical methods, RAId's core program computes accurate E-values for peptides and proteins identified during database searches. Making this robust tool readily accessible for the proteomics community by developing a graphical user interface (GUI) is our main goal here. We have constructed a graphical user interface to facilitate the use of RAId on users' local machines. Written in Java, RAId_GUI not only makes RAId easy to run but also provides tools for data/spectra visualization, MS-product analysis, molecular isotopic distribution analysis, and graphing the retrieval versus the proportion of false discoveries. The results viewer displays the analysis results and allows users to download them. Both the knowledge-integrated organismal databases and the code package (containing source code, the graphical user interface, and a user manual) are available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads/raid.html .
Continuous EEG signal analysis for asynchronous BCI application.

PubMed

Hsu, Wei-Yen

2011-08-01

In this study, we propose a two-stage recognition system for continuous analysis of electroencephalogram (EEG) signals. An independent component analysis (ICA) and correlation coefficient are used to automatically eliminate the electrooculography (EOG) artifacts. Based on the continuous wavelet transform (CWT) and Student's two-sample t-statistics, active segment selection then detects the location of the active segment in the time-frequency domain. Next, multiresolution fractal feature vectors (MFFVs) are extracted with the proposed modified fractal dimension from wavelet data. Finally, the support vector machine (SVM) is adopted for the robust classification of MFFVs. The EEG signals are continuously analyzed in 1-s segments, with the analysis window advancing every 0.5 s to simulate asynchronous BCI operation in the two-stage recognition architecture. The segment is first recognized as lifted or not in the first stage, and then is classified as left or right finger lifting at stage two if the segment is recognized as lifting in the first stage. Several statistical analyses are used to evaluate the performance of the proposed system. The results indicate that it is a promising system in the applications of asynchronous BCI work.
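The continuous segmentation described above (1-s segments advanced in 0.5-s steps) can be sketched as index arithmetic over a sampled signal. This is only an illustration of the windowing scheme; the function name and the 100 Hz sampling rate used in the example are assumptions, not values from the study.

```python
def sliding_segments(n_samples, fs, window_s=1.0, step_s=0.5):
    """Return (start, stop) sample indices of overlapping analysis windows.

    n_samples: length of the recording in samples
    fs:        sampling rate in Hz (hypothetical value supplied by the caller)
    """
    win = int(round(window_s * fs))
    step = int(round(step_s * fs))
    if win <= 0 or step <= 0:
        raise ValueError("window and step must cover at least one sample")
    # Only full-length windows are emitted; a trailing partial window is dropped.
    return [(start, start + win) for start in range(0, n_samples - win + 1, step)]

# Hypothetical 5-s recording sampled at 100 Hz.
segments = sliding_segments(n_samples=500, fs=100)
```

Each segment would then be passed to the first-stage classifier, and only segments recognized as active would go on to the second stage.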
On adaptive robustness approach to Anti-Jam signal processing

NASA Astrophysics Data System (ADS)

Poberezhskiy, Y. S.; Poberezhskiy, G. Y.

An effective approach to exploiting statistical differences between desired and jamming signals named adaptive robustness is proposed and analyzed in this paper. It combines conventional Bayesian, adaptive, and robust approaches that are complementary to each other. This combining strengthens the advantages and mitigates the drawbacks of the conventional approaches. Adaptive robustness is equally applicable to both jammers and their victim systems. The capabilities required for realization of adaptive robustness in jammers and victim systems are determined. The employment of a specific nonlinear robust algorithm for anti-jam (AJ) processing is described and analyzed. Its effectiveness in practical situations has been proven analytically and confirmed by simulation. Since adaptive robustness can be used by both sides in electronic warfare, it is more advantageous for the fastest and most intelligent side. Many results obtained and discussed in this paper are also applicable to commercial applications such as communications in unregulated or poorly regulated frequency ranges and systems with cognitive capabilities.
Measuring continuous baseline covariate imbalances in clinical trial data

PubMed Central

Ciolino, Jody D.; Martin, Renee’ H.; Zhao, Wenle; Hill, Michael D.; Jauch, Edward C.; Palesch, Yuko Y.

2014-01-01

This paper presents and compares several methods of measuring continuous baseline covariate imbalance in clinical trial data. Simulations illustrate that though the t-test is an inappropriate method of assessing continuous baseline covariate imbalance, the test statistic itself is a robust measure in capturing imbalance in continuous covariate distributions. Guidelines to assess effects of imbalance on bias, type I error rate, and power for hypothesis test for treatment effect on continuous outcomes are presented, and the benefit of covariate-adjusted analysis (ANCOVA) is also illustrated. PMID:21865270
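The distinction drawn above is between using the t statistic as a descriptive measure of imbalance and using the t-test as a significance test. A minimal sketch of the former, assuming the pooled-variance (Student) form of the two-sample statistic; the function name and the example values are hypothetical, and the paper may use a different variant.

```python
import math
import statistics

def two_sample_t(a, b):
    """Pooled-variance two-sample t statistic for a continuous baseline covariate."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))

# Hypothetical baseline values in two treatment arms.
treatment = [1.0, 2.0, 3.0]
control = [2.0, 3.0, 4.0]
imbalance = two_sample_t(treatment, control)  # about -1.22
```

Here the magnitude of the statistic is read as a measure of how far apart the arms are at baseline, without converting it to a p-value.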
Applying robust design to study the effects of stratigraphic characteristics on brittle failure and bump potential in a coal mine

PubMed Central

Kim, Bo-Hyun; Larson, Mark K.; Lawson, Heather E.

2018-01-01

Bumps and other types of dynamic failure have been a persistent, worldwide problem in the underground coal mining industry, spanning decades. For example, in just five states in the U.S. from 1983 to 2014, there were 388 reportable bumps. Despite significant advances in mine design tools and mining practices, these events continue to occur. Many conditions have been associated with bump potential, such as the presence of stiff units in the local geology. The effect of a stiff sandstone unit on the potential for coal bumps depends on the location of the stiff unit in the stratigraphic column, the relative stiffness and strength of other structural members, and stress concentrations caused by mining. This study describes the results of a robust design to consider the impact of different lithologic risk factors impacting dynamic failure risk. Because the inherent variability of stratigraphic characteristics in sedimentary formations, such as thickness, engineering material properties, and location, is significant and the number of influential parameters in determining a parametric study is large, it is impractical to consider every simulation case by varying each parameter individually. Therefore, to save time and honor the statistical distributions of the parameters, it is necessary to develop a robust design to collect sufficient sample data and develop a statistical analysis method to draw accurate conclusions from the collected data. In this study, orthogonal arrays, which were developed using the robust design, are used to define the combination of the (a) thickness of a stiff sandstone inserted on the top and bottom of a coal seam in a massive shale mine roof and floor, (b) location of the stiff sandstone inserted on the top and bottom of the coal seam, and (c) material properties of the stiff sandstone and contacts as interfaces using the 3-dimensional numerical model, FLAC3D. 
After completion of the numerical experiments, statistical and multivariate analysis are performed using the calculated results from the orthogonal arrays to analyze the effect of these variables. As a consequence, the impact of each of the parameters on the potential for bumps is quantitatively classified in terms of a normalized intensity of plastic dissipated energy. By multiple regression, the intensity of plastic dissipated energy and migration of the risk from the roof to the floor via the pillars is predicted based on the value of the variables. The results demonstrate and suggest a possible capability to predict the bump potential in a given rock mass adjacent to the underground excavations and pillars. Assessing the risk of bumps is important to preventing fatalities and injuries resulting from bumps. PMID:29416902
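The role of orthogonal arrays in cutting down the number of simulation cases can be illustrated with the smallest standard example. The sketch below uses the textbook L4 array for three two-level factors and checks the defining balance property; it is not the array used in the study, and the mapping of columns to factors such as thickness, location and material properties is hypothetical.

```python
from collections import Counter
from itertools import combinations

# Standard L4 orthogonal array: 4 runs cover three two-level factors,
# compared with 2**3 = 8 runs for the full factorial design.
L4 = [
    (0, 0, 0),
    (0, 1, 1),
    (1, 0, 1),
    (1, 1, 0),
]

def is_orthogonal(array):
    """True if every pair of columns contains each level combination equally often."""
    n_cols = len(array[0])
    for a, b in combinations(range(n_cols), 2):
        counts = Counter((row[a], row[b]) for row in array)
        levels_a = {row[a] for row in array}
        levels_b = {row[b] for row in array}
        all_present = len(counts) == len(levels_a) * len(levels_b)
        balanced = len(set(counts.values())) == 1
        if not (all_present and balanced):
            return False
    return True

balanced_design = is_orthogonal(L4)
```

Because every pair of factors is balanced, the main effect of each factor can be estimated from the reduced set of runs, which is what makes the subsequent regression analysis feasible.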
Exploratory study on a statistical method to analyse time resolved data obtained during nanomaterial exposure measurements

NASA Astrophysics Data System (ADS)

Clerc, F.; Njiki-Menga, G.-H.; Witschger, O.

2013-04-01

Most of the measurement strategies that are suggested at the international level to assess workplace exposure to nanomaterials rely on devices measuring, in real time, airborne particle concentrations (according to different metrics). Since none of the instruments used to measure aerosols can distinguish a particle of interest from the background aerosol, the statistical analysis of time resolved data requires special attention. So far, very few approaches have been used for statistical analysis in the literature. These range from simple qualitative analysis of graphs to the implementation of more complex statistical models. To date, there is still no consensus on a particular approach, and the search for an appropriate and robust method continues. In this context, this exploratory study investigates a statistical method to analyse time resolved data based on a Bayesian probabilistic approach. To investigate and illustrate the use of this statistical method, particle number concentration data from a workplace study that investigated the potential for exposure via inhalation from cleanout operations by sandpapering of a reactor producing nanocomposite thin films have been used. In this workplace study, the background issue has been addressed through the near-field and far-field approaches and several size integrated and time resolved devices have been used. The analysis of the results presented here focuses only on data obtained with two handheld condensation particle counters. While one was measuring at the source of the released particles, the other one was measuring in parallel far-field. The Bayesian probabilistic approach allows a probabilistic modelling of data series, and the observed task is modelled in the form of probability distributions. 
The probability distributions issuing from time resolved data obtained at the source can be compared with the probability distributions issuing from the time resolved data obtained far-field, leading to a quantitative estimation of the airborne particles released at the source when the task is performed. Beyond the results obtained, this exploratory study indicates that the analysis of the results requires specific experience in statistics.
Spatial Statistics for Tumor Cell Counting and Classification

NASA Astrophysics Data System (ADS)

Wirjadi, Oliver; Kim, Yoo-Jin; Breuel, Thomas

To count and classify cells in histological sections is a standard task in histology. One example is the grading of meningiomas, benign tumors of the meninges, which requires assessing the fraction of proliferating cells in an image. As this process is very time consuming when performed manually, automation is required. To address such problems, we propose a novel application of Markov point process methods in computer vision, leading to algorithms for computing the locations of circular objects in images. In contrast to previous algorithms using such spatial statistics methods in image analysis, the present one is fully trainable. This is achieved by combining point process methods with statistical classifiers. Using simulated data, the method proposed in this paper will be shown to be more accurate and more robust to noise than standard image processing methods. On the publicly available SIMCEP benchmark for cell image analysis algorithms, the cell count performance of the present paper is significantly more accurate than results published elsewhere, especially when cells form dense clusters. Furthermore, the proposed system performs as well as a state-of-the-art algorithm for the computer-aided histological grading of meningiomas when combined with a simple k-nearest neighbor classifier for identifying proliferating cells.
Noise-robust unsupervised spike sorting based on discriminative subspace learning with outlier handling.

PubMed

Keshtkaran, Mohammad Reza; Yang, Zhi

2017-06-01

Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. Most of the feature extraction and dimensionality reduction techniques that have been used for spike sorting give a projection subspace which is not necessarily the most discriminative one. Therefore, the clusters which appear inherently separable in some discriminative subspace may overlap if projected using conventional feature extraction approaches leading to a poor sorting accuracy especially when the noise level is high. In this paper, we propose a noise-robust and unsupervised spike sorting algorithm based on learning discriminative spike features for clustering. The proposed algorithm uses discriminative subspace learning to extract low dimensional and most discriminative features from the spike waveforms and perform clustering with automatic detection of the number of the clusters. The core part of the algorithm involves iterative subspace selection using linear discriminant analysis and clustering using Gaussian mixture model with outlier detection. A statistical test in the discriminative subspace is proposed to automatically detect the number of the clusters. Comparative results on publicly available simulated and real in vivo datasets demonstrate that our algorithm achieves substantially improved cluster distinction leading to higher sorting accuracy and more reliable detection of clusters which are highly overlapping and not detectable using conventional feature extraction techniques such as principal component analysis or wavelets. By providing more accurate information about the activity of more number of individual neurons with high robustness to neural noise and outliers, the proposed unsupervised spike sorting algorithm facilitates more detailed and accurate analysis of single- and multi-unit activities in neuroscience and brain machine interface studies.
Noise-robust unsupervised spike sorting based on discriminative subspace learning with outlier handling

NASA Astrophysics Data System (ADS)

Keshtkaran, Mohammad Reza; Yang, Zhi

2017-06-01

Objective. Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. Most of the feature extraction and dimensionality reduction techniques that have been used for spike sorting give a projection subspace which is not necessarily the most discriminative one. Therefore, the clusters which appear inherently separable in some discriminative subspace may overlap if projected using conventional feature extraction approaches leading to a poor sorting accuracy especially when the noise level is high. In this paper, we propose a noise-robust and unsupervised spike sorting algorithm based on learning discriminative spike features for clustering. Approach. The proposed algorithm uses discriminative subspace learning to extract low dimensional and most discriminative features from the spike waveforms and perform clustering with automatic detection of the number of the clusters. The core part of the algorithm involves iterative subspace selection using linear discriminant analysis and clustering using Gaussian mixture model with outlier detection. A statistical test in the discriminative subspace is proposed to automatically detect the number of the clusters. Main results. Comparative results on publicly available simulated and real in vivo datasets demonstrate that our algorithm achieves substantially improved cluster distinction leading to higher sorting accuracy and more reliable detection of clusters which are highly overlapping and not detectable using conventional feature extraction techniques such as principal component analysis or wavelets. Significance. By providing more accurate information about the activity of more number of individual neurons with high robustness to neural noise and outliers, the proposed unsupervised spike sorting algorithm facilitates more detailed and accurate analysis of single- and multi-unit activities in neuroscience and brain machine interface studies.
Mapping probabilities of extreme continental water storage changes from space gravimetry

NASA Astrophysics Data System (ADS)

Kusche, J.; Eicker, A.; Forootan, E.; Springer, A.; Longuevergne, L.

2016-08-01

Using data from the Gravity Recovery And Climate Experiment (GRACE) mission, we derive statistically robust "hot spot" regions of high probability of peak anomalous—i.e., with respect to the seasonal cycle—water storage (of up to 0.7 m one-in-five-year return level) and flux (up to 0.14 m/month). Analysis of, and comparison with, up to 32 years of ERA-Interim reanalysis fields reveals generally good agreement of these hot spot regions to GRACE results and that most exceptions are located in the tropics. However, a simulation experiment reveals that differences observed by GRACE are statistically significant, and further error analysis suggests that by around the year 2020, it will be possible to detect temporal changes in the frequency of extreme total fluxes (i.e., combined effects of mainly precipitation and floods) for at least 10-20% of the continental area, assuming that we have a continuation of GRACE by its follow-up GRACE Follow-On (GRACE-FO) mission.
Conceptual and statistical problems associated with the use of diversity indices in ecology.

PubMed

Barrantes, Gilbert; Sandoval, Luis

2009-09-01

Diversity indices, particularly the Shannon-Wiener index, have extensively been used in analyzing patterns of diversity at different geographic and ecological scales. These indices have serious conceptual and statistical problems which make comparisons of species richness or species abundances across communities nearly impossible. There is often no single statistical method that retains all information needed to answer even a simple question. However, multivariate analyses, such as cluster analyses or multiple regressions, could be used instead of diversity indices. More complex multivariate analyses, such as Canonical Correspondence Analysis, provide very valuable information on environmental variables associated with the presence and abundance of the species in a community. In addition, particular hypotheses associated with changes in species richness across localities, or change in abundance of one, or a group of species can be tested using univariate, bivariate, and/or rarefaction statistical tests. The rarefaction method has proved to be robust for standardizing all samples to a common size. Even the simplest method, such as reporting the number of species per taxonomic category, possibly provides more information than a diversity index value.
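Standardizing samples to a common size by rarefaction is usually done by computing the expected number of species in a random subsample of fixed size. A minimal sketch, assuming the classical individual-based (hypergeometric) rarefaction formula rather than any specific procedure from the article; the function name and the abundance values are hypothetical.

```python
import math

def rarefied_richness(abundances, n):
    """Expected number of species in a random subsample of n individuals.

    abundances: individuals counted per species in the full sample
    n:          common subsample size used to standardize samples
    """
    total = sum(abundances)
    if not 0 < n <= total:
        raise ValueError("subsample size must satisfy 0 < n <= total individuals")
    denom = math.comb(total, n)
    # For each species, 1 minus the probability that the subsample misses it entirely.
    return sum(1 - math.comb(total - count, n) / denom for count in abundances)

# Hypothetical community: two species with 5 individuals each.
expected_species = rarefied_richness([5, 5], n=1)  # exactly 1.0 for a single individual
```

Comparing communities at the same n removes the dependence of observed richness on sampling effort, which is the comparison problem the abstract attributes to raw index values.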

Adjustment of geochemical background by robust multivariate statistics

USGS Publications Warehouse

Zhou, D.

1985-01-01

Conventional analyses of exploration geochemical data assume that the background is a constant or slowly changing value, equivalent to a plane or a smoothly curved surface. However, it is better to regard the geochemical background as a rugged surface, varying with changes in geology and environment. This rugged surface can be estimated from observed geological, geochemical and environmental properties by using multivariate statistics. A method of background adjustment was developed and applied to groundwater and stream sediment reconnaissance data collected from the Hot Springs Quadrangle, South Dakota, as part of the National Uranium Resource Evaluation (NURE) program. Source-rock lithology appears to be a dominant factor controlling the chemical composition of groundwater or stream sediments. The most efficacious adjustment procedure is to regress uranium concentration on selected geochemical and environmental variables for each lithologic unit, and then to delineate anomalies by a common threshold set as a multiple of the standard deviation of the combined residuals. Robust versions of regression and RQ-mode principal components analysis techniques were used rather than ordinary techniques to guard against distortion caused by outliers. Anomalies delineated by this background adjustment procedure correspond with uranium prospects much better than do anomalies delineated by conventional procedures. The procedure should be applicable to geochemical exploration at different scales for other metals. © 1985.
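The adjustment procedure described above (a regression per lithologic unit, followed by a single threshold on the pooled residuals) can be sketched as follows. Note the simplification: this sketch uses ordinary least squares with one predictor for brevity, whereas the study used robust regression with several variables. The function names, the multiplier k = 2 and the data are all hypothetical.

```python
import statistics

def fit_line(x, y):
    """Ordinary least-squares slope and intercept (a non-robust stand-in)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return slope, my - slope * mx

def flag_anomalies(samples, k=2.0):
    """samples: list of (unit, predictor, concentration); returns one flag per sample."""
    by_unit = {}
    for idx, (unit, x, y) in enumerate(samples):
        by_unit.setdefault(unit, []).append((idx, x, y))
    residuals = [0.0] * len(samples)
    for rows in by_unit.values():
        # Separate background model for each lithologic unit.
        slope, intercept = fit_line([r[1] for r in rows], [r[2] for r in rows])
        for idx, x, y in rows:
            residuals[idx] = y - (intercept + slope * x)
    # One common threshold from the standard deviation of the combined residuals.
    threshold = k * statistics.pstdev(residuals)
    return [r > threshold for r in residuals]

# Hypothetical data: unit B has a higher background than unit A throughout.
samples = [("A", 0, 0), ("A", 1, 2), ("A", 2, 14), ("A", 3, 6), ("A", 4, 8),
           ("B", 0, 10), ("B", 1, 11), ("B", 2, 12), ("B", 3, 13), ("B", 4, 14)]
flags = flag_anomalies(samples)
```

In this toy data only the third sample of unit A is flagged, even though most raw values in unit B are higher, which illustrates why a single constant background would mislead. The single outlier also visibly shifts the fitted line for unit A, which is the distortion robust regression is meant to limit.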
Shrinkage covariance matrix approach based on robust trimmed mean in gene sets detection

NASA Astrophysics Data System (ADS)

Karjanto, Suryaefiza; Ramli, Norazan Mohamed; Ghani, Nor Azura Md; Aripin, Rasimah; Yusop, Noorezatty Mohd

2015-02-01

Microarray technology involves placing an orderly arrangement of thousands of gene sequences in a grid on a suitable surface. The technology has enabled novel discoveries since its development and has attracted increasing attention among researchers. The widespread use of microarray technology is largely due to its ability to perform simultaneous analysis of thousands of genes in a massively parallel manner in one experiment. Hence, it provides valuable knowledge on gene interaction and function. The microarray data set typically consists of tens of thousands of genes (variables) from just dozens of samples due to various constraints. Therefore, the sample covariance matrix in Hotelling's T² statistic is not positive definite and becomes singular, thus it cannot be inverted. In this research, Hotelling's T² statistic is combined with a shrinkage approach as an alternative way to estimate the covariance matrix to detect significant gene sets. The use of a shrinkage covariance matrix overcomes the singularity problem by converting an unbiased estimator of the covariance matrix into an improved biased one. A robust trimmed mean is integrated into the shrinkage matrix to reduce the influence of outliers and consequently increase its efficiency. The performance of the proposed method is measured using several simulation designs. The results are expected to outperform existing techniques in many tested conditions.
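The singularity problem and its repair by shrinkage can be seen on a very small example with more variables than samples. The sketch below assumes a diagonal shrinkage target and a fixed intensity of 0.5; the abstract does not state the target or how the intensity is chosen, and the trimmed-mean step is not shown. All names and values are hypothetical.

```python
def sample_cov(rows):
    """Unbiased sample covariance matrix of rows (samples) by columns (genes)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]

def shrink_to_diagonal(cov, lam):
    """(1 - lam) * S + lam * diag(S): variances are kept, covariances are damped."""
    p = len(cov)
    return [[cov[i][j] if i == j else (1 - lam) * cov[i][j] for j in range(p)]
            for i in range(p)]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# Hypothetical data: 2 samples of 3 genes, so the sample covariance has rank 1.
data = [[1.0, 2.0, 3.0], [2.0, 4.0, 7.0]]
S = sample_cov(data)
S_shrunk = shrink_to_diagonal(S, lam=0.5)
singular_det = det3(S)       # zero: S cannot be inverted
regular_det = det3(S_shrunk) # positive: the shrunk matrix can be inverted
```

The shrunk estimator is biased because the off-diagonal entries are pulled toward zero, but it is invertible and can therefore be used inside a Hotelling-type statistic.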
Robust statistical methods for hit selection in RNA interference high-throughput screening experiments.

PubMed

Zhang, Xiaohua Douglas; Yang, Xiting Cindy; Chung, Namjin; Gates, Adam; Stec, Erica; Kunapuli, Priya; Holder, Dan J; Ferrer, Marc; Espeseth, Amy S

2006-04-01

RNA interference (RNAi) high-throughput screening (HTS) experiments carried out using large (>5000 short interfering [si]RNA) libraries generate a huge amount of data. In order to use these data to identify the most effective siRNAs tested, it is critical to adopt and develop appropriate statistical methods. To address the questions in hit selection of RNAi HTS, we proposed a quartile-based method which is robust to outliers, true hits and nonsymmetrical data. We compared it with the more traditional tests, mean +/- k standard deviation (SD) and median +/- 3 median of absolute deviation (MAD). The results suggested that the quartile-based method selected more hits than mean +/- k SD under the same preset error rate. The number of hits selected by median +/- k MAD was close to that by the quartile-based method. Further analysis suggested that the quartile-based method had the greatest power in detecting true hits, especially weak or moderate true hits. Our investigation also suggested that platewise analysis (determining effective siRNAs on a plate-by-plate basis) can adjust for systematic errors in different plates, while an experimentwise analysis, in which effective siRNAs are identified in an analysis of the entire experiment, cannot. However, experimentwise analysis may detect a cluster of true positive hits placed together in one or several plates, while platewise analysis may not. To display hit selection results, we designed a specific figure called a plate-well series plot. We thus suggest the following strategy for hit selection in RNAi HTS experiments. First, choose the quartile-based method, or median +/- k MAD, for identifying effective siRNAs. Second, perform the chosen method experimentwise on transformed/normalized data, such as percentage inhibition, to check the possibility of hit clusters. If a cluster of selected hits is observed, repeat the analysis based on untransformed data to determine whether the cluster is due to an artifact in the data. 
If no clusters of hits are observed, select hits by performing platewise analysis on transformed data. Third, adopt the plate-well series plot to visualize both the data and the hit selection results, as well as to check for artifacts.
Novel Kalman filter algorithm for statistical monitoring of extensive landscapes with synoptic sensor data

Treesearch

Raymond L. Czaplewski

2015-01-01

Wall-to-wall remotely sensed data are increasingly available to monitor landscape dynamics over large geographic areas. However, statistical monitoring programs that use post-stratification cannot fully utilize those sensor data. The Kalman filter (KF) is an alternative statistical estimator. I develop a new KF algorithm that is numerically robust with large numbers of...
Variable selection for marginal longitudinal generalized linear models.

PubMed

Cantoni, Eva; Flemming, Joanna Mills; Ronchetti, Elvezio

2005-06-01

Variable selection is an essential part of any statistical analysis and yet has been somewhat neglected in the context of longitudinal data analysis. In this article, we propose a generalized version of Mallows's C(p) (GC(p)) suitable for use with both parametric and nonparametric models. GC(p) provides an estimate of a measure of model's adequacy for prediction. We examine its performance with popular marginal longitudinal models (fitted using GEE) and contrast results with what is typically done in practice: variable selection based on Wald-type or score-type tests. An application to real data further demonstrates the merits of our approach while at the same time emphasizing some important robust features inherent to GC(p).
Directional variance adjustment: bias reduction in covariance matrices based on factor analysis with an application to portfolio optimization.

PubMed

Bartz, Daniel; Hatrick, Kerr; Hesse, Christian W; Müller, Klaus-Robert; Lemm, Steven

2013-01-01

Robust and reliable covariance estimates play a decisive role in financial and many other applications. An important class of estimators is based on factor models. Here, we show by extensive Monte Carlo simulations that covariance matrices derived from the statistical Factor Analysis model exhibit a systematic error, which is similar to the well-known systematic error of the spectrum of the sample covariance matrix. Moreover, we introduce the Directional Variance Adjustment (DVA) algorithm, which diminishes the systematic error. In a thorough empirical study for the US, European, and Hong Kong stock market we show that our proposed method leads to improved portfolio allocation.
A simple white noise analysis of neuronal light responses.

PubMed

Chichilnisky, E J

2001-05-01

A white noise technique is presented for estimating the response properties of spiking visual system neurons. The technique is simple, robust, efficient and well suited to simultaneous recordings from multiple neurons. It provides a complete and easily interpretable model of light responses even for neurons that display a common form of response nonlinearity that precludes classical linear systems analysis. A theoretical justification of the technique is presented that relies only on elementary linear algebra and statistics. Implementation is described with examples. The technique and the underlying model of neural responses are validated using recordings from retinal ganglion cells, and in principle are applicable to other neurons. Advantages and disadvantages of the technique relative to classical approaches are discussed.
Three-dimensional analysis of scoliosis surgery using stereophotogrammetry

NASA Astrophysics Data System (ADS)

Jang, Stanley B.; Booth, Kellogg S.; Reilly, Chris W.; Sawatzky, Bonita J.; Tredwell, Stephen J.

1994-04-01

A new stereophotogrammetric analysis and 3D visualization allow accurate assessment of the scoliotic spine during instrumentation. Stereophoto pairs taken at each stage of the operation and robust statistical techniques are used to compute 3D transformations of the vertebrae between stages. These determine rotation, translation, goodness of fit, and overall spinal contour. A polygonal model of the spine built with a commercial 3D modeling package is used to produce an animation sequence of the transformation. The visualizations have provided some important observations. Correction of the scoliosis is achieved largely through vertebral translation and coronal plane rotation, contrary to claims that large axial rotations are required. The animations provide valuable qualitative information for surgeons assessing the results of scoliotic correction.
Application of Partial Least Square (PLS) Analysis on Fluorescence Data of 8-Anilinonaphthalene-1-Sulfonic Acid, a Polarity Dye, for Monitoring Water Adulteration in Ethanol Fuel.

PubMed

Kumar, Keshav; Mishra, Ashok Kumar

2015-07-01

The fluorescence characteristics of 8-anilinonaphthalene-1-sulfonic acid (ANS) in ethanol-water mixtures, in combination with partial least square (PLS) analysis, were used to propose a simple and sensitive analytical procedure for monitoring the adulteration of ethanol by water. The proposed analytical procedure was found to be capable of detecting even small levels of adulteration of ethanol by water. The robustness of the procedure is evident from statistical parameters such as the square of the correlation coefficient (R^2), the root mean square error of calibration (RMSEC) and the root mean square error of prediction (RMSEP), which were found to be well within the acceptable limits.
Directional Variance Adjustment: Bias Reduction in Covariance Matrices Based on Factor Analysis with an Application to Portfolio Optimization

PubMed Central

Bartz, Daniel; Hatrick, Kerr; Hesse, Christian W.; Müller, Klaus-Robert; Lemm, Steven

2013-01-01

Robust and reliable covariance estimates play a decisive role in financial and many other applications. An important class of estimators is based on factor models. Here, we show by extensive Monte Carlo simulations that covariance matrices derived from the statistical Factor Analysis model exhibit a systematic error, which is similar to the well-known systematic error of the spectrum of the sample covariance matrix. Moreover, we introduce the Directional Variance Adjustment (DVA) algorithm, which diminishes the systematic error. In a thorough empirical study for the US, European, and Hong Kong stock market we show that our proposed method leads to improved portfolio allocation. PMID:23844016
Model Robust Calibration: Method and Application to Electronically-Scanned Pressure Transducers

NASA Technical Reports Server (NTRS)

Walker, Eric L.; Starnes, B. Alden; Birch, Jeffery B.; Mays, James E.

2010-01-01

This article presents the application of a recently developed statistical regression method to the controlled instrument calibration problem. The statistical method of Model Robust Regression (MRR), developed by Mays, Birch, and Starnes, is shown to improve instrument calibration by reducing the reliance of the calibration on a predetermined parametric (e.g. polynomial, exponential, logarithmic) model. This is accomplished by allowing fits from the predetermined parametric model to be augmented by a certain portion of a fit to the residuals from the initial regression using a nonparametric (locally parametric) regression technique. The method is demonstrated for the absolute scale calibration of silicon-based pressure transducers.
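The idea described above (a parametric fit augmented by a portion of a nonparametric fit to its residuals) can be illustrated with a small sketch. It is not the authors' implementation: the straight-line parametric model, the Gaussian kernel smoother, the mixing portion `lam`, the bandwidth `h`, and the synthetic data are all assumptions chosen for illustration.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares intercept and slope (the parametric part)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx, slope


def kernel_smooth(x, r, x0, h):
    """Gaussian-kernel weighted average of residuals r around x0."""
    w = [math.exp(-0.5 * ((xi - x0) / h) ** 2) for xi in x]
    return sum(wi * ri for wi, ri in zip(w, r)) / sum(w)


def mrr_predict(x, y, x0, lam=0.5, h=0.1):
    """Parametric prediction plus lam times a smooth fit to the residuals."""
    a, b = fit_line(x, y)
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return a + b * x0 + lam * kernel_smooth(x, resid, x0, h)


# Synthetic calibration data: a line plus a smooth departure from linearity.
x = [i / 20 for i in range(21)]
y = [xi + 0.3 * math.sin(2 * math.pi * xi) for xi in x]


def sse(lam):
    """Sum of squared errors of the mixed fit at the observed points."""
    return sum((yi - mrr_predict(x, y, xi, lam)) ** 2 for xi, yi in zip(x, y))


print(sse(0.0), sse(1.0))
```

With `lam=0` the prediction is the purely parametric fit; increasing `lam` lets the residual smoother absorb structure that the parametric model misses.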
Robust Covariate-Adjusted Log-Rank Statistics and Corresponding Sample Size Formula for Recurrent Events Data

PubMed Central

Song, Rui; Kosorok, Michael R.; Cai, Jianwen

2009-01-01

Summary Recurrent events data are frequently encountered in clinical trials. This article develops robust covariate-adjusted log-rank statistics applied to recurrent events data with arbitrary numbers of events under independent censoring and the corresponding sample size formula. The proposed log-rank tests are robust with respect to different data-generating processes and are adjusted for predictive covariates. It reduces to the Kong and Slud (1997, Biometrika 84, 847–862) setting in the case of a single event. The sample size formula is derived based on the asymptotic normality of the covariate-adjusted log-rank statistics under certain local alternatives and a working model for baseline covariates in the recurrent event data context. When the effect size is small and the baseline covariates do not contain significant information about event times, it reduces to the same form as that of Schoenfeld (1983, Biometrics 39, 499–503) for cases of a single event or independent event times within a subject. We carry out simulations to study the control of type I error and the comparison of powers between several methods in finite samples. The proposed sample size formula is illustrated using data from an rhDNase study. PMID:18162107
Relations between Brain Structure and Attentional Function in Spina Bifida: Utilization of Robust Statistical Approaches

PubMed Central

Kulesz, Paulina A.; Tian, Siva; Juranek, Jenifer; Fletcher, Jack M.; Francis, David J.

2015-01-01

Objective: Weak structure-function relations for brain and behavior may stem from problems in estimating these relations in small clinical samples with frequently occurring outliers. In the current project, we focused on the utility of using alternative statistics to estimate these relations. Method: Fifty-four children with spina bifida meningomyelocele performed attention tasks and received MRI of the brain. Using a bootstrap sampling process, the Pearson product moment correlation was compared with four robust correlations: the percentage bend correlation, the Winsorized correlation, the skipped correlation using the Donoho-Gasko median, and the skipped correlation using the minimum volume ellipsoid estimator. Results: All methods yielded similar estimates of the relations between measures of brain volume and attention performance. The similarity of estimates across correlation methods suggested that the weak structure-function relations previously found in many studies are not readily attributable to the presence of outlying observations and other factors that violate the assumptions behind the Pearson correlation. Conclusions: Given the difficulty of assembling large samples for brain-behavior studies, estimating correlations using multiple, robust methods may enhance the statistical conclusion validity of studies yielding small, but often clinically significant, correlations. PMID:25495830
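As a small illustration of one of the robust correlations listed above, the sketch below compares the Pearson correlation with a Winsorized correlation on data containing a single outlier. The Winsorizing proportion `gamma=0.2` and the toy data are assumptions, and the bootstrap step used in the study is omitted.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def winsorize(values, gamma=0.2):
    """Pull the lowest and highest gamma fraction in to the nearest kept value."""
    s = sorted(values)
    g = int(gamma * len(s))
    lo, hi = s[g], s[len(s) - 1 - g]
    return [min(max(v, lo), hi) for v in values]


def winsorized_corr(x, y, gamma=0.2):
    """Pearson correlation computed on separately Winsorized variables."""
    return pearson(winsorize(x, gamma), winsorize(y, gamma))


# Toy data: a perfect increasing relation except for one gross outlier.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1, 2, 3, 4, 5, 6, 7, 8, 9, -50]
print(pearson(x, y), winsorized_corr(x, y))
```

Here the single outlier flips the sign of the Pearson estimate, while the Winsorized estimate stays positive; when the two agree, as reported in the abstract, outliers are unlikely to be driving the result.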
Relations between volumetric measures of brain structure and attentional function in spina bifida: utilization of robust statistical approaches.

PubMed

Kulesz, Paulina A; Tian, Siva; Juranek, Jenifer; Fletcher, Jack M; Francis, David J

2015-03-01

Weak structure-function relations for brain and behavior may stem from problems in estimating these relations in small clinical samples with frequently occurring outliers. In the current project, we focused on the utility of using alternative statistics to estimate these relations. Fifty-four children with spina bifida meningomyelocele performed attention tasks and received MRI of the brain. Using a bootstrap sampling process, the Pearson product-moment correlation was compared with 4 robust correlations: the percentage bend correlation, the Winsorized correlation, the skipped correlation using the Donoho-Gasko median, and the skipped correlation using the minimum volume ellipsoid estimator. All methods yielded similar estimates of the relations between measures of brain volume and attention performance. The similarity of estimates across correlation methods suggested that the weak structure-function relations previously found in many studies are not readily attributable to the presence of outlying observations and other factors that violate the assumptions behind the Pearson correlation. Given the difficulty of assembling large samples for brain-behavior studies, estimating correlations using multiple, robust methods may enhance the statistical conclusion validity of studies yielding small, but often clinically significant, correlations. PsycINFO Database Record (c) 2015 APA, all rights reserved.
Utilizing Wavelet Analysis to assess hydrograph change in northwestern North America

NASA Astrophysics Data System (ADS)

Tang, W.; Carey, S. K.

2017-12-01

Historical streamflow data in the mountainous regions of northwestern North America suggest that changes in flows are driven by warming temperature, declining snowpack and glacier extent, and large-scale teleconnections. However, few sites exist that have robust long-term records for statistical analysis, and previous research has focussed on high and low-flow indices along with trend analysis using the Mann-Kendall test and other similar approaches. Furthermore, there has been less emphasis on ascertaining the drivers of changes in the shape of the streamflow hydrograph compared with traditional flow metrics. In this work, we utilize wavelet analysis to evaluate changes in hydrograph characteristics for snowmelt driven rivers in northwestern North America across a range of scales. Results suggest that wavelets can be used to detect a lengthening and advancement of freshet with a corresponding decline in peak flows. Furthermore, the gradual transition of flows from nival to pluvial regimes in more southerly catchments is evident in the wavelet spectral power through time. A challenge for this method of change detection is evaluating the statistical significance of changes in wavelet spectra as related to hydrograph form, yet ongoing work seeks to link these patterns to driving weather and climate along with larger scale teleconnections.
Hypothesis Testing, "p" Values, Confidence Intervals, Measures of Effect Size, and Bayesian Methods in Light of Modern Robust Techniques

ERIC Educational Resources Information Center

Wilcox, Rand R.; Serang, Sarfaraz

2017-01-01

The article provides perspectives on p values, null hypothesis testing, and alternative techniques in light of modern robust statistical methods. Null hypothesis testing and "p" values can provide useful information provided they are interpreted in a sound manner, which includes taking into account insights and advances that have…
Median statistics estimates of Hubble and Newton's constants

NASA Astrophysics Data System (ADS)

Bethapudi, Suryarao; Desai, Shantanu

2017-02-01

The robustness of any statistic depends upon the number of assumptions it makes about the measured data. We point out the advantages of median statistics using toy numerical experiments and demonstrate its robustness when the number of assumptions we can make about the data is limited. We then apply the median statistics technique to obtain estimates of two constants of nature, the Hubble constant (H0) and Newton's gravitational constant (G), both of which show significant differences between different measurements. For H0, we update the analyses done by Chen and Ratra (2011) and Gott et al. (2001) using 576 measurements. We find, after grouping the different results according to their primary type of measurement, that the median estimates are given by H0 = 72.5^{+2.5}_{-8} km/sec/Mpc with errors corresponding to 95% c.l. (2σ) and G = 6.674702^{+0.0014}_{-0.0009} × 10^{-11} N m^2 kg^{-2} corresponding to 68% c.l. (1σ).
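A minimal sketch of a median estimate with a distribution-free confidence range follows. It assumes only that each measurement independently falls above or below the true median with probability 1/2, so the number of measurements below the true median is binomial. The function name and the toy data are hypothetical, and the grouping by measurement type described in the abstract is not reproduced.

```python
from math import comb
import statistics


def median_interval(values, level=0.95):
    """Median plus an order-statistic interval with coverage >= level.

    Returns (median, lower, upper, coverage). If even the full range of the
    data cannot reach the requested level, the full range is returned.
    """
    xs = sorted(values)
    n = len(xs)
    lo = 0
    tail = comb(n, 0) / 2 ** n  # P(B <= lo) for B ~ Binomial(n, 1/2)
    while lo + 1 <= (n - 1) // 2:
        next_tail = tail + comb(n, lo + 1) / 2 ** n
        if 1 - 2 * next_tail < level:
            break  # trimming one more point from each end loses coverage
        lo, tail = lo + 1, next_tail
    return statistics.median(xs), xs[lo], xs[n - 1 - lo], 1 - 2 * tail


# Toy measurements (not real H0 or G values).
print(median_interval([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

No error bars on the individual measurements enter the calculation, which is what makes the approach insensitive to misestimated uncertainties.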
Accurate landmarking of three-dimensional facial data in the presence of facial expressions and occlusions using a three-dimensional statistical facial feature model.

PubMed

Zhao, Xi; Dellandréa, Emmanuel; Chen, Liming; Kakadiaris, Ioannis A

2011-10-01

Three-dimensional face landmarking aims at automatically localizing facial landmarks and has a wide range of applications (e.g., face recognition, face tracking, and facial expression analysis). Existing methods assume neutral facial expressions and unoccluded faces. In this paper, we propose a general learning-based framework for reliable landmark localization on 3-D facial data under challenging conditions (i.e., facial expressions and occlusions). Our approach relies on a statistical model, called 3-D statistical facial feature model, which learns both the global variations in configurational relationships between landmarks and the local variations of texture and geometry around each landmark. Based on this model, we further propose an occlusion classifier and a fitting algorithm. Results from experiments on three publicly available 3-D face databases (FRGC, BU-3-DFE, and Bosphorus) demonstrate the effectiveness of our approach, in terms of landmarking accuracy and robustness, in the presence of expressions and occlusions.
Data-adaptive test statistics for microarray data.

PubMed

Mukherjee, Sach; Roberts, Stephen J; van der Laan, Mark J

2005-09-01

An important task in microarray data analysis is the selection of genes that are differentially expressed between different tissue samples, such as healthy and diseased. However, microarray data contain an enormous number of dimensions (genes) and very few samples (arrays), a mismatch which poses fundamental statistical problems for the selection process that have defied easy resolution. In this paper, we present a novel approach to the selection of differentially expressed genes in which test statistics are learned from data using a simple notion of reproducibility in selection results as the learning criterion. Reproducibility, as we define it, can be computed without any knowledge of the 'ground-truth', but takes advantage of certain properties of microarray data to provide an asymptotically valid guide to expected loss under the true data-generating distribution. We are therefore able to indirectly minimize expected loss, and obtain results substantially more robust than conventional methods. We apply our method to simulated and oligonucleotide array data. By request to the corresponding author.
Effect of filter type on the statistics of energy transfer between resolved and subfilter scales from a-priori analysis of direct numerical simulations of isotropic turbulence

NASA Astrophysics Data System (ADS)

Buzzicotti, M.; Linkmann, M.; Aluie, H.; Biferale, L.; Brasseur, J.; Meneveau, C.

2018-02-01

The effects of different filtering strategies on the statistical properties of the resolved-to-subfilter scale (SFS) energy transfer are analysed in forced homogeneous and isotropic turbulence. We carry out a-priori analyses of the statistical characteristics of SFS energy transfer by filtering data obtained from direct numerical simulations with up to 2048^3 grid points as a function of the filter cutoff scale. In order to quantify the dependence of extreme events and anomalous scaling on the filter, we compare a sharp Fourier Galerkin projector, a Gaussian filter and a novel class of Galerkin projectors with non-sharp spectral filter profiles. Of interest is the importance of Galilean invariance and we confirm that local SFS energy transfer displays intermittency scaling in both skewness and flatness as a function of the cutoff scale. Furthermore, we quantify the robustness of scaling as a function of the filtering type.

[Quantitative structure-gas chromatographic retention relationship of polycyclic aromatic sulfur heterocycles using molecular electronegativity-distance vector].

PubMed

Li, Zhenghua; Cheng, Fansheng; Xia, Zhining

2011-01-01

The chemical structures of 114 polycyclic aromatic sulfur heterocycles (PASHs) have been studied by molecular electronegativity-distance vector (MEDV). The linear relationships between gas chromatographic retention index and the MEDV have been established by a multiple linear regression (MLR) model. The results of variable selection by stepwise multiple regression (SMR) and the predictive ability of the optimized model, appraised by leave-one-out cross-validation, showed that the optimized model, with a correlation coefficient (R) of 0.9947 and a cross-validated correlation coefficient (Rcv) of 0.9940, possessed the best statistical quality. Furthermore, when the 114 PASHs were divided into calibration and test sets in the ratio of 2:1, the statistical analysis showed that the resulting models possess almost equal statistical quality, very similar regression coefficients and good robustness. The quantitative structure-retention relationship (QSRR) model established may provide a convenient and powerful method for predicting the gas chromatographic retention of PASHs.
Uncertainty Quantification and Statistical Convergence Guidelines for PIV Data

NASA Astrophysics Data System (ADS)

Stegmeir, Matthew; Kassen, Dan

2016-11-01

As Particle Image Velocimetry has continued to mature, it has developed into a robust and flexible technique for velocimetry used by expert and non-expert users. While historical estimates of PIV accuracy have typically relied heavily on "rules of thumb" and analysis of idealized synthetic images, recently increased emphasis has been placed on better quantifying real-world PIV measurement uncertainty. Multiple techniques have been developed to provide per-vector instantaneous uncertainty estimates for PIV measurements. Often real-world experimental conditions introduce complications in collecting "optimal" data, and the effect of these conditions is important to consider when planning an experimental campaign. The current work utilizes the results of PIV Uncertainty Quantification techniques to develop a framework for PIV users to utilize estimated PIV confidence intervals to compute reliable data convergence criteria for optimal sampling of flow statistics. Results are compared using experimental and synthetic data, and recommended guidelines and procedures leveraging estimated PIV confidence intervals for efficient sampling for converged statistics are provided.
Robust DEA under discrete uncertain data: a case study of Iranian electricity distribution companies

NASA Astrophysics Data System (ADS)

Hafezalkotob, Ashkan; Haji-Sami, Elham; Omrani, Hashem

2015-06-01

Crisp input and output data are fundamentally indispensable in traditional data envelopment analysis (DEA). However, real-world problems often deal with imprecise or ambiguous data. In this paper, we propose a novel robust data envelopment model (RDEA) to investigate the efficiencies of decision-making units (DMU) when there are discrete uncertain input and output data. The method is based upon the discrete robust optimization approaches proposed by Mulvey et al. (1995) that utilize probable scenarios to capture the effect of ambiguous data in the case study. Our primary concern in this research is evaluating electricity distribution companies under uncertainty about input/output data. To illustrate the ability of the proposed model, a numerical example of 38 Iranian electricity distribution companies is investigated. There is a large amount of ambiguous data about these companies. Some electricity distribution companies may not report clear and real statistics to the government. Thus, a prominent approach is needed to deal with this uncertainty. The results reveal that the RDEA model is suitable and reliable for target setting based on decision makers' (DMs') preferences when there are uncertain input/output data.
Optimization of Robust HPLC Method for Quantitation of Ambroxol Hydrochloride and Roxithromycin Using a DoE Approach.

PubMed

Patel, Rashmin B; Patel, Nilay M; Patel, Mrunali R; Solanki, Ajay B

2017-03-01

The aim of this work was to develop and optimize a robust HPLC method for the separation and quantitation of ambroxol hydrochloride and roxithromycin utilizing a Design of Experiments (DoE) approach. The Plackett-Burman design was used to assess the impact of independent variables (concentration of organic phase, mobile phase pH, flow rate and column temperature) on peak resolution, USP tailing and number of plates. A central composite design was utilized to evaluate the main, interaction, and quadratic effects of independent variables on the selected dependent variables. The optimized HPLC method was validated based on the ICH Q2(R1) guideline and was used to separate and quantify ambroxol hydrochloride and roxithromycin in tablet formulations. The findings showed that the DoE approach could be effectively applied to optimize a robust HPLC method for quantification of ambroxol hydrochloride and roxithromycin in tablet formulations. Statistical comparison between results of the proposed and reported HPLC methods revealed no significant difference, indicating the suitability of the proposed HPLC method for analysis of ambroxol hydrochloride and roxithromycin in pharmaceutical formulations. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Adaptive and robust statistical methods for processing near-field scanning microwave microscopy images.

PubMed

Coakley, K J; Imtiaz, A; Wallis, T M; Weber, J C; Berweger, S; Kabos, P

2015-03-01

Near-field scanning microwave microscopy offers great potential to facilitate characterization, development and modeling of materials. By acquiring microwave images at multiple frequencies and amplitudes (along with the other modalities) one can study material and device physics at different lateral and depth scales. Images are typically noisy and contaminated by artifacts that can vary from scan line to scan line and planar-like trends due to sample tilt errors. Here, we level images based on an estimate of a smooth 2-d trend determined with a robust implementation of a local regression method. In this robust approach, features and outliers which are not due to the trend are automatically downweighted. We denoise images with the Adaptive Weights Smoothing method. This method smooths out additive noise while preserving edge-like features in images. We demonstrate the feasibility of our methods on topography images and microwave |S11| images. For one challenging test case, we demonstrate that our method outperforms alternative methods from the scanning probe microscopy data analysis software package Gwyddion. Our methods should be useful for massive image data sets where manual selection of landmarks or image subsets by a user is impractical. Published by Elsevier B.V.
Ecological Momentary Assessments and Automated Time Series Analysis to Promote Tailored Health Care: A Proof-of-Principle Study.

PubMed

van der Krieke, Lian; Emerencia, Ando C; Bos, Elisabeth H; Rosmalen, Judith Gm; Riese, Harriëtte; Aiello, Marco; Sytema, Sjoerd; de Jonge, Peter

2015-08-07

Health promotion can be tailored by combining ecological momentary assessments (EMA) with time series analysis. This combined method allows for studying the temporal order of dynamic relationships among variables, which may provide concrete indications for intervention. However, application of this method in health care practice is hampered because analyses are conducted manually and advanced statistical expertise is required. This study aims to show how this limitation can be overcome by introducing automated vector autoregressive modeling (VAR) of EMA data and to evaluate its feasibility through comparisons with results of previously published manual analyses. We developed a Web-based open source application, called AutoVAR, which automates time series analyses of EMA data and provides output that is intended to be interpretable by nonexperts. The statistical technique we used was VAR. AutoVAR tests and evaluates all possible VAR models within a given combinatorial search space and summarizes their results, thereby replacing the researcher's tasks of conducting the analysis, making an informed selection of models, and choosing the best model. We compared the output of AutoVAR to the output of a previously published manual analysis (n=4). An illustrative example consisting of 4 analyses was provided. Compared to the manual output, the AutoVAR output presents similar model characteristics and statistical results in terms of the Akaike information criterion, the Bayesian information criterion, and the test statistic of the Granger causality test. Results suggest that automated analysis and interpretation of time series is feasible. Compared to a manual procedure, the automated procedure is more robust and can save days of time. These findings may pave the way for using time series analysis for health promotion on a larger scale. AutoVAR was evaluated using the results of a previously conducted manual analysis.
Analysis of additional datasets is needed in order to validate and refine the application for general use.
Ecological Momentary Assessments and Automated Time Series Analysis to Promote Tailored Health Care: A Proof-of-Principle Study

PubMed Central

Emerencia, Ando C; Bos, Elisabeth H; Rosmalen, Judith GM; Riese, Harriëtte; Aiello, Marco; Sytema, Sjoerd; de Jonge, Peter

2015-01-01

Background: Health promotion can be tailored by combining ecological momentary assessments (EMA) with time series analysis. This combined method allows for studying the temporal order of dynamic relationships among variables, which may provide concrete indications for intervention. However, application of this method in health care practice is hampered because analyses are conducted manually and advanced statistical expertise is required. Objective: This study aims to show how this limitation can be overcome by introducing automated vector autoregressive modeling (VAR) of EMA data and to evaluate its feasibility through comparisons with results of previously published manual analyses. Methods: We developed a Web-based open source application, called AutoVAR, which automates time series analyses of EMA data and provides output that is intended to be interpretable by nonexperts. The statistical technique we used was VAR. AutoVAR tests and evaluates all possible VAR models within a given combinatorial search space and summarizes their results, thereby replacing the researcher’s tasks of conducting the analysis, making an informed selection of models, and choosing the best model. We compared the output of AutoVAR to the output of a previously published manual analysis (n=4). Results: An illustrative example consisting of 4 analyses was provided. Compared to the manual output, the AutoVAR output presents similar model characteristics and statistical results in terms of the Akaike information criterion, the Bayesian information criterion, and the test statistic of the Granger causality test. Conclusions: Results suggest that automated analysis and interpretation of time series is feasible. Compared to a manual procedure, the automated procedure is more robust and can save days of time. These findings may pave the way for using time series analysis for health promotion on a larger scale. AutoVAR was evaluated using the results of a previously conducted manual analysis.
Analysis of additional datasets is needed in order to validate and refine the application for general use. PMID:26254160
A statistical framework for neuroimaging data analysis based on mutual information estimated via a gaussian copula

PubMed Central

Giordano, Bruno L.; Kayser, Christoph; Rousselet, Guillaume A.; Gross, Joachim; Schyns, Philippe G.

2016-01-01

Abstract We begin by reviewing the statistical framework of information theory as applicable to neuroimaging data analysis. A major factor hindering wider adoption of this framework in neuroimaging is the difficulty of estimating information theoretic quantities in practice. We present a novel estimation technique that combines the statistical theory of copulas with the closed form solution for the entropy of Gaussian variables. This results in a general, computationally efficient, flexible, and robust multivariate statistical framework that provides effect sizes on a common meaningful scale, allows for unified treatment of discrete, continuous, unidimensional and multidimensional variables, and enables direct comparisons of representations from behavioral and brain responses across any recording modality. We validate the use of this estimate as a statistical test within a neuroimaging context, considering both discrete stimulus classes and continuous stimulus features. We also present examples of analyses facilitated by these developments, including application of multivariate analyses to MEG planar magnetic field gradients, and pairwise temporal interactions in evoked EEG responses. We show the benefit of considering the instantaneous temporal derivative together with the raw values of M/EEG signals as a multivariate response, how we can separately quantify modulations of amplitude and direction for vector quantities, and how we can measure the emergence of novel information over time in evoked responses. Open‐source Matlab and Python code implementing the new methods accompanies this article. Hum Brain Mapp 38:1541–1573, 2017. © 2016 Wiley Periodicals, Inc. PMID:27860095
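The core idea described above (copula normalization followed by the closed-form Gaussian expression) can be sketched for two continuous one-dimensional variables. This is a simplified illustration rather than the article's accompanying code: the function names are hypothetical, ties are broken by order of appearance, no bias correction is applied, and the example data are simulated.

```python
import math
import random
from statistics import NormalDist


def copula_normalize(values):
    """Map each value to the standard-normal quantile of its rank/(n+1)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    std_normal = NormalDist()
    for rank, i in enumerate(order, start=1):
        out[i] = std_normal.inv_cdf(rank / (n + 1))
    return out


def gcmi_bits(x, y):
    """Gaussian-copula mutual information estimate in bits (1-D variables)."""
    gx, gy = copula_normalize(x), copula_normalize(y)
    n = len(gx)
    mx, my = sum(gx) / n, sum(gy) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(gx, gy))
    sxx = sum((a - mx) ** 2 for a in gx)
    syy = sum((b - my) ** 2 for b in gy)
    r2 = sxy * sxy / (sxx * syy)
    if r2 >= 1.0:
        return math.inf  # perfectly monotone relation
    # Closed form for bivariate Gaussians: I = -0.5 * log2(1 - r^2).
    return -0.5 * math.log2(1.0 - r2)


# Simulated data: y depends on x plus noise.
random.seed(1)
x = [random.gauss(0, 1) for _ in range(200)]
y = [a + 0.5 * random.gauss(0, 1) for a in x]
print(gcmi_bits(x, y))
print(gcmi_bits(x, [math.exp(v) for v in y]))  # same ranks, same estimate
```

Because only ranks enter the calculation, the estimate is unchanged by any strictly increasing transformation of either variable, which is one source of the robustness the abstract refers to.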
Change-point analysis of geophysical time-series: application to landslide displacement rate (Séchilienne rock avalanche, France)

NASA Astrophysics Data System (ADS)

Amorese, D.; Grasso, J.-R.; Garambois, S.; Font, M.

2018-05-01

The rank-sum multiple change-point method is a robust statistical procedure designed to search for the optimal number and the location of change points in an arbitrary continue or discrete sequence of values. As such, this procedure can be used to analyse time-series data. Twelve years of robust data sets for the Séchilienne (French Alps) rockslide show a continuous increase in average displacement rate from 50 to 280 mm per month, in the 2004-2014 period, followed by a strong decrease back to 50 mm per month in the 2014-2015 period. When possible kinematic phases are tentatively suggested in previous studies, its solely rely on the basis of empirical threshold values. In this paper, we analyse how the use of a statistical algorithm for change-point detection helps to better understand time phases in landslide kinematics. First, we test the efficiency of the statistical algorithm on geophysical benchmark data, these data sets (stream flows and Northern Hemisphere temperatures) being already analysed by independent statistical tools. Second, we apply the method to 12-yr daily time-series of the Séchilienne landslide, for rainfall and displacement data, from 2003 December to 2015 December, in order to quantitatively extract changes in landslide kinematics. We find two strong significant discontinuities in the weekly cumulated rainfall values: an average rainfall rate increase is resolved in 2012 April and a decrease in 2014 August. Four robust changes are highlighted in the displacement time-series (2008 May, 2009 November-December-2010 January, 2012 September and 2014 March), the 2010 one being preceded by a significant but weak rainfall rate increase (in 2009 November). Accordingly, we are able to quantitatively define five kinematic stages for the Séchilienne rock avalanche during this period. 
The synchronization between the rainfall and displacement rate, only resolved at the end of 2009 and beginning of 2010, corresponds to a remarkable change (fourfold increase in mean displacement rate) in the landslide kinematics. This suggests that an increase of the rainfall is able to drive an increase of the landslide displacement rate, but that most of the kinematics of the landslide is not directly attributable to rainfall amount. The detailed exploration of the characteristics of the five kinematic stages suggests that the weekly averaged displacement rates are more tied to the frequency of rainy days than to the rainfall rate values. These results suggest that the pattern of the Séchilienne rock avalanche is consistent with previous findings that landslide kinematics depends not only on rainfall but also on soil moisture conditions (known to be more strongly related to precipitation frequency than to precipitation amount). Finally, our analysis of the displacement rate time-series pinpoints a change in the susceptibility of the slope response to rainfall, the response being slower before the end of 2009 than after. The kinematic history as depicted by statistical tools opens new routes to understand the apparent complexity of the Séchilienne landslide kinematics.
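As a rough illustration of the rank-based idea behind change-point detection, the following Python sketch scans every possible split of a sequence and keeps the one with the largest standardized Wilcoxon rank-sum statistic. It is a simplified single-change-point search under stated assumptions, not the authors' multiple change-point procedure: the function names, the absence of a tie correction in the variance, and the displacement-rate values are all hypothetical.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def best_single_change_point(values):
    """Return (k, z): the split values[:k] | values[k:] with the largest |z|."""
    n = len(values)
    r = ranks(values)
    best_k, best_z = None, 0.0
    running = 0.0
    for k in range(1, n):
        running += r[k - 1]                     # rank sum of the first k values
        expected = k * (n + 1) / 2              # mean of that rank sum if no change
        variance = k * (n - k) * (n + 1) / 12   # assumption: no tie correction
        z = (running - expected) / variance ** 0.5
        if best_k is None or abs(z) > abs(best_z):
            best_k, best_z = k, z
    return best_k, best_z


# Hypothetical monthly displacement rates (mm per month), invented for illustration.
rates = [50, 52, 49, 51, 50, 53, 200, 210, 205, 198, 207, 202]
k, z = best_single_change_point(rates)
print(k, round(z, 2))
```

Because only ranks enter the statistic, a single extreme value moves the result far less than it would move a comparison of segment means, which is one sense in which such a procedure can be called robust.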
Robust Methods for Moderation Analysis with a Two-Level Regression Model.

PubMed

Yang, Miao; Yuan, Ke-Hai

2016-01-01

Moderation analysis has many applications in social sciences. Most widely used estimation methods for moderation analysis assume that errors are normally distributed and homoscedastic. When these assumptions are not met, the results from a classical moderation analysis can be misleading. For more reliable moderation analysis, this article proposes two robust methods with a two-level regression model when the predictors do not contain measurement error. One method is based on maximum likelihood with Student's t distribution and the other is based on M-estimators with Huber-type weights. An algorithm for obtaining the robust estimators is developed. Consistent estimates of standard errors of the robust estimators are provided. The robust approaches are compared against normal-distribution-based maximum likelihood (NML) with respect to power and accuracy of parameter estimates through a simulation study. Results show that the robust approaches outperform NML under various distributional conditions. Application of the robust methods is illustrated through a real data example. An R program is developed and documented to facilitate the application of the robust methods.
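To make the notion of Huber-type weights concrete, here is a minimal Python sketch of iteratively reweighted least squares for a single-predictor line. It is not the two-level moderation model or the R program described in the abstract; the function name, the tuning constant 1.345, the MAD-based scale, and the data are assumptions chosen for illustration.

```python
import statistics


def huber_line_fit(x, y, c=1.345, iterations=50):
    """Fit y = a + b*x by iteratively reweighted least squares with Huber weights."""
    w = [1.0] * len(x)
    a = b = 0.0
    for _ in range(iterations):
        # Weighted least-squares solution for a straight line.
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        b = sxy / sxx
        a = my - b * mx
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust residual scale: median absolute residual, rescaled so that it
        # estimates the standard deviation under normal errors.
        scale = 1.4826 * statistics.median(abs(r) for r in resid)
        if scale == 0:
            break
        # Huber weights: full weight for small residuals, c*scale/|r| otherwise.
        w = [1.0 if abs(r) <= c * scale else c * scale / abs(r) for r in resid]
    return a, b


# Hypothetical data: y is close to 1 + 2x, with one gross outlier at x = 9.
x = list(range(10))
noise = [0.1, -0.2, 0.15, -0.1, 0.05, -0.15, 0.2, -0.05, 0.1, 0.0]
y = [1 + 2 * xi + e for xi, e in zip(x, noise)]
y[9] = 100.0

a_ols, b_ols = huber_line_fit(x, y, iterations=1)  # one pass is plain least squares
a_rob, b_rob = huber_line_fit(x, y)
print(round(b_ols, 2), round(b_rob, 2))
```

With a single iteration the function returns the ordinary least-squares line, so the two calls show how much the outlier pulls the unweighted slope compared with the reweighted one.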
Improving Incremental Balance in the GSI 3DVAR Analysis System

NASA Technical Reports Server (NTRS)

Errico, Ronald M.; Yang, Runhua; Kleist, Daryl T.; Parrish, David F.; Derber, John C.; Treadon, Russ

2008-01-01

The Gridpoint Statistical Interpolation (GSI) analysis system is a unified global/regional 3DVAR analysis code that has been under development for several years at the National Centers for Environmental Prediction (NCEP)/Environmental Modeling Center. It has recently been implemented into operations at NCEP in both the global and North American data assimilation systems (GDAS and NDAS). An important aspect of this development has been improving the balance of the analysis produced by GSI. The improved balance between variables has been achieved through the inclusion of a Tangent Linear Normal Mode Constraint (TLNMC). The TLNMC method has proven to be very robust and effective. The TLNMC as part of the global GSI system has resulted in substantial improvement in data assimilation both at NCEP and at the NASA Global Modeling and Assimilation Office (GMAO).
Sensitivity of wildlife habitat models to uncertainties in GIS data

NASA Technical Reports Server (NTRS)

Stoms, David M.; Davis, Frank W.; Cogan, Christopher B.

1992-01-01

Decision makers need to know the reliability of output products from GIS analysis. For many GIS applications, it is not possible to compare these products to an independent measure of 'truth'. Sensitivity analysis offers an alternative means of estimating reliability. In this paper, we present a GIS-based statistical procedure for estimating the sensitivity of wildlife habitat models to uncertainties in input data and model assumptions. The approach is demonstrated in an analysis of habitat associations derived from a GIS database for the endangered California condor. Alternative data sets were generated to compare results over a reasonable range of assumptions about several sources of uncertainty. Sensitivity analysis indicated that condor habitat associations are relatively robust, and the results have increased our confidence in our initial findings. Uncertainties and methods described in the paper have general relevance for many GIS applications.
ARTiiFACT: a tool for heart rate artifact processing and heart rate variability analysis.

PubMed

Kaufmann, Tobias; Sütterlin, Stefan; Schulz, Stefan M; Vögele, Claus

2011-12-01

The importance of appropriate handling of artifacts in interbeat interval (IBI) data must not be underestimated. Even a single artifact may cause unreliable heart rate variability (HRV) results. Thus, a robust artifact detection algorithm and the option for manual intervention by the researcher form key components for confident HRV analysis. Here, we present ARTiiFACT, a software tool for processing electrocardiogram and IBI data. Both automated and manual artifact detection and correction are available in a graphical user interface. In addition, ARTiiFACT includes time- and frequency-based HRV analyses and descriptive statistics, thus offering the basic tools for HRV analysis. Notably, all program steps can be executed separately and allow for data export, thus offering high flexibility and interoperability with a whole range of applications.
Accounting for immunoprecipitation efficiencies in the statistical analysis of ChIP-seq data.

PubMed

Bao, Yanchun; Vinciotti, Veronica; Wit, Ernst; 't Hoen, Peter A C

2013-05-30

ImmunoPrecipitation (IP) efficiencies may vary largely between different antibodies and between repeated experiments with the same antibody. These differences have a large impact on the quality of ChIP-seq data: a more efficient experiment will necessarily lead to a higher signal to background ratio, and therefore to an apparently larger number of enriched regions, compared to a less efficient experiment. In this paper, we show how IP efficiencies can be explicitly accounted for in the joint statistical modelling of ChIP-seq data. We fit a latent mixture model to eight experiments on two proteins, from two laboratories where different antibodies are used for the two proteins. We use the model parameters to estimate the efficiencies of individual experiments, and find that these are clearly different for the different laboratories, and amongst technical replicates from the same lab. When we account for ChIP efficiency, we find more regions bound in the more efficient experiments than in the less efficient ones, at the same false discovery rate. A priori knowledge of the same number of binding sites across experiments can also be included in the model for a more robust detection of differentially bound regions between two different proteins. We propose a statistical model for the detection of enriched and differentially bound regions from multiple ChIP-seq data sets. The framework that we present accounts explicitly for IP efficiencies in ChIP-seq data, and allows replicates and experiments from different proteins to be modelled jointly, rather than individually, leading to more robust biological conclusions.
An Alternative Flight Software Trigger Paradigm: Applying Multivariate Logistic Regression to Sense Trigger Conditions Using Inaccurate or Scarce Information

NASA Technical Reports Server (NTRS)

Smith, Kelly M.; Gay, Robert S.; Stachowiak, Susan J.

2013-01-01

In late 2014, NASA will fly the Orion capsule on a Delta IV-Heavy rocket for the Exploration Flight Test-1 (EFT-1) mission. For EFT-1, the Orion capsule will be flying with a new GPS receiver and new navigation software. Given the experimental nature of the flight, the flight software must be robust to the loss of GPS measurements. Once the high-speed entry is complete, the drogue parachutes must be deployed within the proper conditions to stabilize the vehicle prior to deploying the main parachutes. When GPS is available in nominal operations, the vehicle will deploy the drogue parachutes based on an altitude trigger. However, when GPS is unavailable, the navigated altitude errors become excessively large, driving the need for a backup barometric altimeter to improve altitude knowledge. In order to increase overall robustness, the vehicle also has an alternate method of triggering the parachute deployment sequence based on planet-relative velocity if both the GPS and the barometric altimeter fail. However, this backup trigger results in large altitude errors relative to the targeted altitude. Motivated by this challenge, this paper demonstrates how logistic regression may be employed to semi-automatically generate robust triggers based on statistical analysis. Logistic regression is used as a ground processor pre-flight to develop a statistical classifier. The classifier would then be implemented in flight software and executed in real-time. This technique offers improved performance even in the face of highly inaccurate measurements. Although the logistic regression-based trigger approach will not be implemented within EFT-1 flight software, the methodology can be carried forward for future missions and vehicles.
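The general pattern described above (fit a logistic regression classifier on the ground, then evaluate it in real time as a trigger) can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions: the two normalized features, the training values, the learning rate, and the 0.5 threshold are hypothetical and are not taken from the EFT-1 work.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(features, labels, rate=0.1, epochs=2000):
    """Fit weights (bias first) by batch gradient descent on the log-loss."""
    weights = [0.0] * (len(features[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for row, label in zip(features, labels):
            p = sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], row)))
            err = p - label
            grad[0] += err
            for j, v in enumerate(row):
                grad[j + 1] += err * v
        weights = [w - rate * g / len(features) for w, g in zip(weights, grad)]
    return weights


def trigger(weights, row, threshold=0.5):
    """Return True when the estimated probability of the trigger condition is high."""
    p = sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], row)))
    return p >= threshold


# Hypothetical training set: two normalized sensor features per sample, with
# label 1 meaning "trigger condition met" and 0 meaning "not yet".
features = [[-1.5, -1.0], [-1.0, -0.5], [-0.8, -1.2], [-0.3, -0.6],
            [0.4, 0.2], [0.9, 1.1], [1.2, 0.7], [1.6, 1.4]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

weights = fit_logistic(features, labels)   # done once, before flight
print(trigger(weights, [-1.0, -1.0]), trigger(weights, [1.0, 1.0]))
```

Only the fitted weights and the cheap `trigger` evaluation would need to run in real time in such a scheme; the iterative fitting stays on the ground.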
Robust scoring functions for protein-ligand interactions with quantum chemical charge models.

PubMed

Wang, Jui-Chih; Lin, Jung-Hsin; Chen, Chung-Ming; Perryman, Alex L; Olson, Arthur J

2011-10-24

Ordinary least-squares (OLS) regression has been used widely for constructing the scoring functions for protein-ligand interactions. However, OLS is very sensitive to the existence of outliers, and models constructed using it are easily affected by the outliers or even the choice of the data set. On the other hand, determination of atomic charges is regarded as of central importance, because the electrostatic interaction is known to be a key contributing factor for biomolecular association. In the development of the AutoDock4 scoring function, only OLS was conducted, and the simple Gasteiger method was adopted. It is therefore of considerable interest to see whether more rigorous charge models could improve the statistical performance of the AutoDock4 scoring function. In this study, we have employed two well-established quantum chemical approaches, namely the restrained electrostatic potential (RESP) and the Austin-model 1-bond charge correction (AM1-BCC) methods, to obtain atomic partial charges, and we have compared how different charge models affect the performance of AutoDock4 scoring functions. In combination with robust regression analysis and outlier exclusion, our new protein-ligand free energy regression model with AM1-BCC charges for ligands and Amber99SB charges for proteins achieves the lowest root-mean-squared error of 1.637 kcal/mol for the training set of 147 complexes and 2.176 kcal/mol for the external test set of 1427 complexes. The assessment for binding pose prediction with the 100 external decoy sets indicates a very high success rate of 87% with the criterion of a predicted root-mean-squared deviation of less than 2 Å. The success rates and statistical performance of our robust scoring functions are only weakly class-dependent (hydrophobic, hydrophilic, or mixed).
A pre-operative planning for endoprosthetic human tracheal implantation: a decision support system based on robust design of experiments.

PubMed

Trabelsi, O; Villalobos, J L López; Ginel, A; Cortes, E Barrot; Doblaré, M

2014-05-01

Swallowing depends on physiological variables that have a decisive influence on the swallowing capacity and on the tracheal stress distribution. Prosthetic implantation modifies these values and the overall performance of the trachea. The objective of this work was to develop a decision support system based on experimental, numerical and statistical approaches, with clinical verification, to help the thoracic surgeon in deciding the position and appropriate dimensions of a Dumon prosthesis for a specific patient in an optimal time and with sufficient robustness. A code for mesh adaptation to any tracheal geometry was implemented and used to develop a robust experimental design, based on Taguchi's method and the analysis of variance. This design was able to establish the main factors influencing swallowing. The equations to fit the stress and the vertical displacement distributions were obtained. The resulting fitted values were compared to those calculated directly by the finite element method (FEM). Finally, the statistical study was checked and clinically validated by studying two cases of real patients. The vertical displacements and principal stress distribution obtained for the specific tracheal model were in agreement with those calculated by FE simulations with a maximum absolute error of 1.2 mm and 0.17 MPa, respectively. It was concluded that the resulting decision support tool provides a fast, accurate and simple tool for the thoracic surgeon to predict the stress state of the trachea and the reduction in the ability to swallow after implantation. Thus, it will help them in making decisions during pre-operative planning of tracheal interventions.
Validation tools for image segmentation

NASA Astrophysics Data System (ADS)

Padfield, Dirk; Ross, James

2009-02-01

A large variety of image analysis tasks require the segmentation of various regions in an image. For example, segmentation is required to generate accurate models of brain pathology that are important components of modern diagnosis and therapy. While the manual delineation of such structures gives accurate information, the automatic segmentation of regions such as the brain and tumors from such images greatly enhances the speed and repeatability of quantifying such structures. The ubiquitous need for such algorithms has led to a wide range of image segmentation algorithms with various assumptions, parameters, and robustness. The evaluation of such algorithms is an important step in determining their effectiveness. Therefore, rather than developing new segmentation algorithms, we here describe validation methods for segmentation algorithms. Using similarity metrics comparing the automatic to manual segmentations, we demonstrate methods for optimizing the parameter settings for individual cases and across a collection of datasets using the Design of Experiment framework. We then employ statistical analysis methods to compare the effectiveness of various algorithms. We investigate several region-growing algorithms from the Insight Toolkit and compare their accuracy to that of a separate statistical segmentation algorithm. The segmentation algorithms are used with their optimized parameters to automatically segment the brain and tumor regions in MRI images of 10 patients. The validation tools indicate that none of the ITK algorithms studied is able to outperform the statistical segmentation algorithm with statistical significance, although they perform reasonably well considering their simplicity.
Statistical analysis of water-quality data containing multiple detection limits II: S-language software for nonparametric distribution modeling and hypothesis testing

USGS Publications Warehouse

Lee, L.; Helsel, D.

2007-01-01

Analysis of low concentrations of trace contaminants in environmental media often results in left-censored data that are below some limit of analytical precision. Interpretation of values becomes complicated when there are multiple detection limits in the data, perhaps as a result of changing analytical precision over time. Parametric and semi-parametric methods, such as maximum likelihood estimation and robust regression on order statistics, can be employed to model distributions of multiply censored data and provide estimates of summary statistics. However, these methods are based on assumptions about the underlying distribution of data. Nonparametric methods provide an alternative that does not require such assumptions. A standard nonparametric method for estimating summary statistics of multiply censored data is the Kaplan-Meier (K-M) method. This method has seen widespread usage in the medical sciences within a general framework termed "survival analysis" where it is employed with right-censored time-to-failure data. However, K-M methods are equally valid for the left-censored data common in the geosciences. Our S-language software provides an analytical framework based on K-M methods that is tailored to the needs of the earth and environmental sciences community. This includes routines for the generation of empirical cumulative distribution functions, prediction or exceedance probabilities, and related confidence limits computation. Additionally, our software contains K-M-based routines for nonparametric hypothesis testing among an unlimited number of grouping variables. A primary characteristic of K-M methods is that they do not perform extrapolation and interpolation. Thus, these routines cannot be used to model statistics beyond the observed data range or when linear interpolation is desired. For such applications, the aforementioned parametric and semi-parametric methods must be used.
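One common way to apply K-M methods to left-censored data is to "flip" the values about a constant larger than the maximum, which turns left-censoring into right-censoring, and then flip the result back. The Python sketch below illustrates that idea only; it is not the S-language software described in the abstract, and the function names, the flipping constant, and the concentrations are hypothetical.

```python
def kaplan_meier(times, observed):
    """Right-censored K-M: return [(t, S(t))] at each distinct event time,
    where S(t) estimates P(T > t)."""
    survival = 1.0
    curve = []
    for t in sorted({u for u, obs in zip(times, observed) if obs}):
        at_risk = sum(1 for u in times if u >= t)
        events = sum(1 for u, obs in zip(times, observed) if obs and u == t)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


def left_censored_cdf(values, detected):
    """Estimate P(X < x) at each detected value x for left-censored data.

    values[i] is a measured concentration when detected[i] is True, and a
    detection limit (meaning "less than this value") when it is False.
    """
    flip = max(values) + 1.0                 # any constant above the maximum
    flipped = [flip - v for v in values]     # "< limit" becomes "> flip - limit"
    curve = kaplan_meier(flipped, detected)
    # S_flipped(t) = P(flip - X > t) = P(X < flip - t)
    return sorted((flip - t, s) for t, s in curve)


# Hypothetical concentrations with two detection limits (0.5 and 1.0).
values = [0.5, 0.5, 1.0, 1.5, 2.0, 3.0]
detected = [False, True, False, True, True, True]
for x, p in left_censored_cdf(values, detected):
    print(x, round(p, 3))
```

Consistent with the limitation noted in the abstract, the estimate is defined only at observed detected values; the sketch makes no attempt to interpolate between them or extrapolate beyond them.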

Robustness of location estimators under t-distributions: a literature review

NASA Astrophysics Data System (ADS)

Sumarni, C.; Sadik, K.; Notodiputro, K. A.; Sartono, B.

2017-03-01

The assumption of normality is commonly used in estimation of parameters in statistical modelling, but this assumption is very sensitive to outliers. The t-distribution is more robust than the normal distribution since the t-distributions have longer tails. The robustness measures of location estimators under t-distributions are reviewed and discussed in this paper. For the purpose of illustration, we use onion yield data that include outliers as a case study and show that the t model produces a better fit than the normal model.
Standard reference water samples for rare earth element determinations

USGS Publications Warehouse

Verplanck, P.L.; Antweiler, Ronald C.; Nordstrom, D. Kirk; Taylor, Howard E.

2001-01-01

Standard reference water samples (SRWS) were collected from two mine sites, one near Ophir, CO, USA and the other near Redding, CA, USA. The samples were filtered, preserved, and analyzed for rare earth element (REE) concentrations (La, Ce, Pr, Nd, Sm, Eu, Gd, Tb, Dy, Ho, Er, Tm, Yb, and Lu) by inductively coupled plasma-mass spectrometry (ICP-MS). These two samples were acid mine waters with elevated concentrations of REEs (0.45-161 µg/L). Seventeen international laboratories participated in a 'round-robin' chemical analysis program, which made it possible to evaluate the data by robust statistical procedures that are insensitive to outliers. The resulting most probable values are reported. Ten to 15 of the participants also reported values for Ba, Y, and Sc. Field parameters, major ion, and other trace element concentrations, not subject to statistical evaluation, are provided.
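The abstract does not say which robust procedures were used. As one familiar example of an outlier-insensitive summary for interlaboratory data, the Python sketch below computes the median and a scaled median absolute deviation (MAD). The function name and the reported values are hypothetical and do not come from the study.

```python
import statistics


def robust_summary(values):
    """Return (median, scaled MAD); 1.4826 makes the MAD comparable to a
    standard deviation when the bulk of the data is roughly normal."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    return med, 1.4826 * mad


# Hypothetical results reported by seven laboratories for one element;
# the last laboratory is a gross outlier.
reported = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 25.0]
centre, spread = robust_summary(reported)
print(centre, round(spread, 3), round(statistics.mean(reported), 2))
```

The single discordant result shifts the arithmetic mean well away from the cluster of the other six values, while the median and MAD stay with the cluster.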
The Graphical Display of Simulation Results, with Applications to the Comparison of Robust IRT Estimators of Ability.

ERIC Educational Resources Information Center

Thissen, David; Wainer, Howard

Simulation studies of the performance of (potentially) robust statistical estimation produce large quantities of numbers in the form of performance indices of the various estimators under various conditions. This report presents a multivariate graphical display used to aid in the digestion of the plentiful results in a current study of Item…
THESEUS: maximum likelihood superpositioning and analysis of macromolecular structures.

PubMed

Theobald, Douglas L; Wuttke, Deborah S

2006-09-01

THESEUS is a command line program for performing maximum likelihood (ML) superpositions and analysis of macromolecular structures. While conventional superpositioning methods use ordinary least-squares (LS) as the optimization criterion, ML superpositions provide substantially improved accuracy by down-weighting variable structural regions and by correcting for correlations among atoms. ML superpositioning is robust and insensitive to the specific atoms included in the analysis, and thus it does not require subjective pruning of selected variable atomic coordinates. Output includes both likelihood-based and frequentist statistics for accurate evaluation of the adequacy of a superposition and for reliable analysis of structural similarities and differences. THESEUS performs principal components analysis for analyzing the complex correlations found among atoms within a structural ensemble. ANSI C source code and selected binaries for various computing platforms are available under the GNU open source license from http://monkshood.colorado.edu/theseus/ or http://www.theseus3d.org.
Sampling stored product insect pests: a comparison of four statistical sampling models for probability of pest detection

USDA-ARS's Scientific Manuscript database

Statistically robust sampling strategies form an integral component of grain storage and handling activities throughout the world. Developing sampling strategies to target biological pests such as insects in stored grain is inherently difficult due to species biology and behavioral characteristics. ...
Impact of genotyping errors on statistical power of association tests in genomic analyses: A case study

PubMed Central

Hou, Lin; Sun, Ning; Mane, Shrikant; Sayward, Fred; Rajeevan, Nallakkandi; Cheung, Kei-Hoi; Cho, Kelly; Pyarajan, Saiju; Aslan, Mihaela; Miller, Perry; Harvey, Philip D.; Gaziano, J. Michael; Concato, John; Zhao, Hongyu

2017-01-01

A key step in genomic studies is to assess high throughput measurements across millions of markers for each participant’s DNA, either using microarrays or sequencing techniques. Accurate genotype calling is essential for downstream statistical analysis of genotype-phenotype associations, and next generation sequencing (NGS) has recently become a more common approach in genomic studies. How the accuracy of variant calling in NGS-based studies affects downstream association analysis has not, however, been studied using empirical data in which both microarrays and NGS were available. In this article, we investigate the impact of variant calling errors on the statistical power to identify associations between single nucleotides and disease, and on associations between multiple rare variants and disease. Both differential and nondifferential genotyping errors are considered. Our results show that the power of burden tests for rare variants is strongly influenced by the specificity in variant calling, but is rather robust with regard to sensitivity. By using the variant calling accuracies estimated from a substudy of a Cooperative Studies Program project conducted by the Department of Veterans Affairs, we show that the power of association tests is mostly retained with commonly adopted variant calling pipelines. An R package, GWAS.PC, is provided to accommodate power analysis that takes account of genotyping errors (http://zhaocenter.org/software/). PMID:28019059
Ankle plantarflexion strength in rearfoot and forefoot runners: a novel clusteranalytic approach.

PubMed

Liebl, Dominik; Willwacher, Steffen; Hamill, Joseph; Brüggemann, Gert-Peter

2014-06-01

The purpose of the present study was to test for differences in ankle plantarflexion strengths of habitually rearfoot and forefoot runners. In order to approach this issue, we revisit the problem of classifying different footfall patterns in human runners. A dataset of 119 subjects running shod and barefoot (speed 3.5 m/s) was analyzed. The footfall patterns were clustered by a novel statistical approach, which is motivated by advances in the statistical literature on functional data analysis. We explain the novel statistical approach in detail and compare it to the classically used strike index of Cavanagh and Lafortune (1980). The two groups found by the new cluster approach are well interpretable as a forefoot and a rearfoot footfall group. The subsequent comparison study of the clustered subjects reveals that runners with a forefoot footfall pattern are capable of producing significantly higher joint moments in a maximum voluntary contraction (MVC) of their ankle plantarflexor muscle-tendon units; difference in means: 0.28 Nm/kg. This effect remains significant after controlling for an additional gender effect and for differences in training levels. Our analysis confirms the hypothesis that forefoot runners have a higher mean MVC plantarflexion strength than rearfoot runners. Furthermore, we demonstrate that our proposed stochastic cluster analysis provides a robust and useful framework for clustering foot strikes. Copyright © 2014 Elsevier B.V. All rights reserved.
Nonindependence and sensitivity analyses in ecological and evolutionary meta-analyses.

PubMed

Noble, Daniel W A; Lagisz, Malgorzata; O'dea, Rose E; Nakagawa, Shinichi

2017-05-01

Meta-analysis is an important tool for synthesizing research on a variety of topics in ecology and evolution, including molecular ecology, but can be susceptible to nonindependence. Nonindependence can affect two major interrelated components of a meta-analysis: (i) the calculation of effect size statistics and (ii) the estimation of overall meta-analytic estimates and their uncertainty. While some solutions to nonindependence exist at the statistical analysis stages, there is little advice on what to do when complex analyses are not possible, or when studies with nonindependent experimental designs exist in the data. Here we argue that exploring the effects of procedural decisions in a meta-analysis (e.g. inclusion of different quality data, choice of effect size) and statistical assumptions (e.g. assuming no phylogenetic covariance) using sensitivity analyses are extremely important in assessing the impact of nonindependence. Sensitivity analyses can provide greater confidence in results and highlight important limitations of empirical work (e.g. impact of study design on overall effects). Despite their importance, sensitivity analyses are seldom applied to problems of nonindependence. To encourage better practice for dealing with nonindependence in meta-analytic studies, we present accessible examples demonstrating the impact that ignoring nonindependence can have on meta-analytic estimates. We also provide pragmatic solutions for dealing with nonindependent study designs, and for analysing dependent effect sizes. Additionally, we offer reporting guidelines that will facilitate disclosure of the sources of nonindependence in meta-analyses, leading to greater transparency and more robust conclusions. © 2017 John Wiley & Sons Ltd.
A comparative study of multivariable robustness analysis methods as applied to integrated flight and propulsion control

NASA Technical Reports Server (NTRS)

Schierman, John D.; Lovell, T. A.; Schmidt, David K.

1993-01-01

Three multivariable robustness analysis methods are compared and contrasted. The focus of the analysis is on system stability and performance robustness to uncertainty in the coupling dynamics between two interacting subsystems. Of particular interest are interacting airframe and engine subsystems, and an example airframe/engine vehicle configuration is utilized in the demonstration of these approaches. The singular value (SV) and structured singular value (SSV) analysis methods are compared to a method especially well suited for analysis of robustness to uncertainties in subsystem interactions. This approach is referred to here as the interacting subsystem (IS) analysis method. This method has been used previously to analyze airframe/engine systems, emphasizing the study of stability robustness. However, performance robustness is also investigated here, and a new measure of allowable uncertainty for acceptable performance robustness is introduced. The IS methodology does not require plant uncertainty models to measure the robustness of the system, and is shown to yield valuable information regarding the effects of subsystem interactions. In contrast, the SV and SSV methods allow for the evaluation of the robustness of the system to particular models of uncertainty, and do not directly indicate how the airframe (engine) subsystem interacts with the engine (airframe) subsystem.
Robust Statistical Detection of Power-Law Cross-Correlation.

PubMed

Blythe, Duncan A J; Nikulin, Vadim V; Müller, Klaus-Robert

2016-06-02

We show that widely used approaches in statistical physics incorrectly indicate the existence of power-law cross-correlations between financial stock market fluctuations measured over several years and the neuronal activity of the human brain lasting for only a few minutes. While such cross-correlations are nonsensical, no current methodology allows them to be reliably discarded, leaving researchers at greater risk when the spurious nature of cross-correlations is not clear from the unrelated origin of the time series and rather requires careful statistical estimation. Here we propose a theory and method (PLCC-test) which allows us to rigorously and robustly test for power-law cross-correlations, correctly detecting genuine and discarding spurious cross-correlations, thus establishing meaningful relationships between processes in complex physical systems. Our method reveals for the first time the presence of power-law cross-correlations between amplitudes of the alpha and beta frequency ranges of the human electroencephalogram.
Robust Statistical Detection of Power-Law Cross-Correlation

PubMed Central

Blythe, Duncan A. J.; Nikulin, Vadim V.; Müller, Klaus-Robert

2016-01-01

We show that widely used approaches in statistical physics incorrectly indicate the existence of power-law cross-correlations between financial stock market fluctuations measured over several years and the neuronal activity of the human brain lasting for only a few minutes. While such cross-correlations are nonsensical, no current methodology allows them to be reliably discarded, leaving researchers at greater risk when the spurious nature of cross-correlations is not clear from the unrelated origin of the time series and rather requires careful statistical estimation. Here we propose a theory and method (PLCC-test) which allows us to rigorously and robustly test for power-law cross-correlations, correctly detecting genuine and discarding spurious cross-correlations, thus establishing meaningful relationships between processes in complex physical systems. Our method reveals for the first time the presence of power-law cross-correlations between amplitudes of the alpha and beta frequency ranges of the human electroencephalogram. PMID:27250630
Rigorous force field optimization principles based on statistical distance minimization

DOE PAGES

Vlcek, Lukas; Chialvo, Ariel A.

2015-10-12

We use the concept of statistical distance to define a measure of distinguishability between a pair of statistical mechanical systems, i.e., a model and its target, and show that its minimization leads to general convergence of the model’s static measurable properties to those of the target. Here we exploit this feature to define a rigorous basis for the development of accurate and robust effective molecular force fields that are inherently compatible with coarse-grained experimental data. The new model optimization principles and their efficient implementation are illustrated through selected examples, whose outcome demonstrates the higher robustness and predictive accuracy of the approach compared to other currently used methods, such as force matching and relative entropy minimization. We also discuss relations between the newly developed principles and established thermodynamic concepts, which include the Gibbs-Bogoliubov inequality and the thermodynamic length.
Combining band recovery data and Pollock's robust design to model temporary and permanent emigration

USGS Publications Warehouse

Lindberg, M.S.; Kendall, W.L.; Hines, J.E.; Anderson, M.G.

2001-01-01

Capture-recapture models are widely used to estimate demographic parameters of marked populations. Recently, this statistical theory has been extended to modeling dispersal of open populations. Multistate models can be used to estimate movement probabilities among subdivided populations if multiple sites are sampled. Frequently, however, sampling is limited to a single site. Models described by Burnham (1993, in Marked Individuals in the Study of Bird Populations, 199-213), which combined open population capture-recapture and band-recovery models, can be used to estimate permanent emigration when sampling is limited to a single population. Similarly, Kendall, Nichols, and Hines (1997, Ecology 51, 563-578) developed models to estimate temporary emigration under Pollock's (1982, Journal of Wildlife Management 46, 757-760) robust design. We describe a likelihood-based approach to simultaneously estimate temporary and permanent emigration when sampling is limited to a single population. We use a sampling design that combines the robust design and recoveries of individuals obtained immediately following each sampling period. We present a general form for our model where temporary emigration is a first-order Markov process, and we discuss more restrictive models. We illustrate these models with analysis of data on marked Canvasback ducks. Our analysis indicates that probability of permanent emigration for adult female Canvasbacks was 0.193 (SE = 0.082) and that birds that were present at the study area in year i - 1 had a higher probability of presence in year i than birds that were not present in year i - 1.
Accounting for technical noise in differential expression analysis of single-cell RNA sequencing data.

PubMed

Jia, Cheng; Hu, Yu; Kelly, Derek; Kim, Junhyong; Li, Mingyao; Zhang, Nancy R

2017-11-02

Recent technological breakthroughs have made it possible to measure RNA expression at the single-cell level, thus paving the way for exploring expression heterogeneity among individual cells. Current single-cell RNA sequencing (scRNA-seq) protocols are complex and introduce technical biases that vary across cells, which can bias downstream analysis without proper adjustment. To account for cell-to-cell technical differences, we propose a statistical framework, TASC (Toolkit for Analysis of Single Cell RNA-seq), an empirical Bayes approach to reliably model the cell-specific dropout rates and amplification bias by use of external RNA spike-ins. TASC incorporates the technical parameters, which reflect cell-to-cell batch effects, into a hierarchical mixture model to estimate the biological variance of a gene and detect differentially expressed genes. More importantly, TASC is able to adjust for covariates to further eliminate confounding that may originate from cell size and cell cycle differences. In simulation and real scRNA-seq data, TASC achieves accurate Type I error control and displays competitive sensitivity and improved robustness to batch effects in differential expression analysis, compared to existing methods. TASC is programmed to be computationally efficient, taking advantage of multi-threaded parallelization. We believe that TASC will provide a robust platform for researchers to leverage the power of scRNA-seq. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Accounting for technical noise in differential expression analysis of single-cell RNA sequencing data

PubMed Central

Jia, Cheng; Hu, Yu; Kelly, Derek; Kim, Junhyong

2017-01-01

Recent technological breakthroughs have made it possible to measure RNA expression at the single-cell level, thus paving the way for exploring expression heterogeneity among individual cells. Current single-cell RNA sequencing (scRNA-seq) protocols are complex and introduce technical biases that vary across cells, which can bias downstream analysis without proper adjustment. To account for cell-to-cell technical differences, we propose a statistical framework, TASC (Toolkit for Analysis of Single Cell RNA-seq), an empirical Bayes approach to reliably model the cell-specific dropout rates and amplification bias by use of external RNA spike-ins. TASC incorporates the technical parameters, which reflect cell-to-cell batch effects, into a hierarchical mixture model to estimate the biological variance of a gene and detect differentially expressed genes. More importantly, TASC is able to adjust for covariates to further eliminate confounding that may originate from cell size and cell cycle differences. In simulation and real scRNA-seq data, TASC achieves accurate Type I error control and displays competitive sensitivity and improved robustness to batch effects in differential expression analysis, compared to existing methods. TASC is programmed to be computationally efficient, taking advantage of multi-threaded parallelization. We believe that TASC will provide a robust platform for researchers to leverage the power of scRNA-seq. PMID:29036714
Robust matching for voice recognition

NASA Astrophysics Data System (ADS)

Higgins, Alan; Bahler, L.; Porter, J.; Blais, P.

1994-10-01

This paper describes an automated method of comparing a voice sample of an unknown individual with samples from known speakers in order to establish or verify the individual's identity. The method is based on a statistical pattern matching approach that employs a simple training procedure, requires no human intervention (transcription, word or phonetic marking, etc.), and makes no assumptions regarding the expected form of the statistical distributions of the observations. The content of the speech material (vocabulary, grammar, etc.) is not assumed to be constrained in any way. An algorithm is described which incorporates frame pruning and channel equalization processes designed to achieve robust performance with reasonable computational resources. An experimental implementation demonstrating the feasibility of the concept is described.
Multivariate Methods for Meta-Analysis of Genetic Association Studies.

PubMed

Dimou, Niki L; Pantavou, Katerina G; Braliou, Georgia G; Bagos, Pantelis G

2018-01-01

Multivariate meta-analysis of genetic association studies and genome-wide association studies has received remarkable attention as it improves the precision of the analysis. Here, we review, summarize and present in a unified framework methods for multivariate meta-analysis of genetic association studies and genome-wide association studies. Starting with the statistical methods used for robust analysis and genetic model selection, we present in brief univariate methods for meta-analysis and we then scrutinize multivariate methodologies. Multivariate models of meta-analysis for single gene-disease association studies, including models for haplotype association studies, multiple linked polymorphisms and multiple outcomes, are discussed. The popular Mendelian randomization approach and special cases of meta-analysis addressing issues such as the assumption of the mode of inheritance, deviation from Hardy-Weinberg Equilibrium and gene-environment interactions are also presented. All available methods are enriched with practical applications, and methodologies that could be developed in the future are discussed. Links for all available software implementing multivariate meta-analysis methods are also provided.
STATISTICS AND INTELLIGENCE IN DEVELOPING COUNTRIES: A NOTE.

PubMed

Kodila-Tedika, Oasis; Asongu, Simplice A; Azia-Dimbu, Florentin

2017-05-01

The purpose of this study is to assess the relationship between intelligence (or human capital) and the statistical capacity of developing countries. The line of inquiry is motivated essentially by the scarce literature on poor statistics in developing countries and an evolving stream of literature on the knowledge economy. A positive association is established between intelligence quotient (IQ) and statistical capacity. The relationship is robust to alternative specifications with varying conditioning information sets and control for outliers. Policy implications are discussed.
HIFI-C: a robust and fast method for determining NMR couplings from adaptive 3D to 2D projections.

PubMed

Cornilescu, Gabriel; Bahrami, Arash; Tonelli, Marco; Markley, John L; Eghbalnia, Hamid R

2007-08-01

We describe a novel method for the robust, rapid, and reliable determination of J couplings in multi-dimensional NMR coupling data, including small couplings from larger proteins. The method, "High-resolution Iterative Frequency Identification of Couplings" (HIFI-C) is an extension of the adaptive and intelligent data collection approach introduced earlier in HIFI-NMR. HIFI-C collects one or more optimally tilted two-dimensional (2D) planes of a 3D experiment, identifies peaks, and determines couplings with high resolution and precision. The HIFI-C approach, demonstrated here for the 3D quantitative J method, offers vital features that advance the goal of rapid and robust collection of NMR coupling data. (1) Tilted plane residual dipolar couplings (RDC) data are collected adaptively in order to offer an intelligent trade-off between data collection time and accuracy. (2) Data from independent planes can provide a statistical measure of reliability for each measured coupling. (3) Fast data collection enables measurements in cases where sample stability is a limiting factor (for example in the presence of an orienting medium required for residual dipolar coupling measurements). (4) For samples that are stable, or in experiments involving relatively stronger couplings, robust data collection enables more reliable determinations of couplings in shorter time, particularly for larger biomolecules. As a proof of principle, we have applied the HIFI-C approach to the 3D quantitative J experiment to determine N-C' RDC values for three proteins ranging from 56 to 159 residues (including a homodimer with 111 residues in each subunit). A number of factors influence the robustness and speed of data collection. These factors include the size of the protein, the experimental setup, and the coupling being measured, among others. To exhibit a lower bound on robustness and the potential for time saving, the measurement of dipolar couplings for the N-C' vector represents a realistic "worst case analysis". These couplings are among the smallest currently measured, and their determination in both isotropic and anisotropic media demands the highest measurement precision. The new approach yielded excellent quantitative agreement with values determined independently by the conventional 3D quantitative J NMR method (in cases where sample stability in oriented media permitted these measurements) but with a factor of 2-5 in time savings. The statistical measure of reliability, measuring the quality of each RDC value, offers valuable adjunct information even in cases where modest time savings may be realized.
Robustness of Reconstructed Ancestral Protein Functions to Statistical Uncertainty.

PubMed

Eick, Geeta N; Bridgham, Jamie T; Anderson, Douglas P; Harms, Michael J; Thornton, Joseph W

2017-02-01

Hypotheses about the functions of ancient proteins and the effects of historical mutations on them are often tested using ancestral protein reconstruction (APR): phylogenetic inference of ancestral sequences followed by synthesis and experimental characterization. Usually, some sequence sites are ambiguously reconstructed, with two or more statistically plausible states. The extent to which the inferred functions and mutational effects are robust to uncertainty about the ancestral sequence has not been studied systematically. To address this issue, we reconstructed ancestral proteins in three domain families that have different functions, architectures, and degrees of uncertainty; we then experimentally characterized the functional robustness of these proteins when uncertainty was incorporated using several approaches, including sampling amino acid states from the posterior distribution at each site and incorporating the alternative amino acid state at every ambiguous site in the sequence into a single "worst plausible case" protein. In every case, qualitative conclusions about the ancestral proteins' functions and the effects of key historical mutations were robust to sequence uncertainty, with similar functions observed even when scores of alternate amino acids were incorporated. There was some variation in quantitative descriptors of function among plausible sequences, suggesting that experimentally characterizing robustness is particularly important when quantitative estimates of ancient biochemical parameters are desired. The worst plausible case method appears to provide an efficient strategy for characterizing the functional robustness of ancestral proteins to large amounts of sequence uncertainty. Sampling from the posterior distribution sometimes produced artifactually nonfunctional proteins for sequences reconstructed with substantial ambiguity. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Robust transceiver design for reciprocal M × N interference channel based on statistical linearization approximation

NASA Astrophysics Data System (ADS)

Mayvan, Ali D.; Aghaeinia, Hassan; Kazemi, Mohammad

2017-12-01

This paper focuses on robust transceiver design for throughput enhancement on the interference channel (IC), under imperfect channel state information (CSI). In this paper, two algorithms are proposed to improve the throughput of the multi-input multi-output (MIMO) IC. Each transmitter and receiver has, respectively, M and N antennas and IC operates in a time division duplex mode. In the first proposed algorithm, each transceiver adjusts its filter to maximize the expected value of signal-to-interference-plus-noise ratio (SINR). On the other hand, the second algorithm tries to minimize the variances of the SINRs to hedge against the variability due to CSI error. Taylor expansion is exploited to approximate the effect of CSI imperfection on mean and variance. The proposed robust algorithms utilize the reciprocity of wireless networks to optimize the estimated statistical properties in two different working modes. Monte Carlo simulations are employed to investigate sum rate performance of the proposed algorithms and the advantage of incorporating variation minimization into the transceiver design.
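The Taylor-expansion step in the abstract above can be illustrated with a minimal sketch. It is not the authors' algorithm: it only shows the generic second-order approximation of a mean and first-order approximation of a variance for a function of a noisy quantity. The function `toy_sinr`, its parameters, and the numerical values are hypothetical and chosen purely for illustration.

```python
def taylor_mean_var(g, mu, var, step=1e-4):
    """Approximate E[g(X)] and Var[g(X)] for X with mean mu and variance var."""
    d1 = (g(mu + step) - g(mu - step)) / (2 * step)             # g'(mu), central difference
    d2 = (g(mu + step) - 2 * g(mu) + g(mu - step)) / step ** 2  # g''(mu)
    mean_approx = g(mu) + 0.5 * d2 * var   # second-order Taylor correction to the mean
    var_approx = d1 ** 2 * var             # first-order (delta method) variance
    return mean_approx, var_approx


def toy_sinr(gain, power=1.0, interference_plus_noise=0.5):
    """Hypothetical scalar SINR: power * gain^2 / (interference + noise)."""
    return power * gain ** 2 / interference_plus_noise


# Estimated channel gain 1.0 with CSI error variance 0.01 (illustrative values).
approx_mean, approx_var = taylor_mean_var(toy_sinr, mu=1.0, var=0.01)
```

For this quadratic toy function the second-order mean is exact, (1.0 + 0.01) / 0.5 = 2.02, while the variance is only a first-order approximation. In a robust design of the kind described, the first quantity would be maximized and the second minimized.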
Measuring hospital efficiency--comparing four European countries.

PubMed

Mateus, Céu; Joaquim, Inês; Nunes, Carla

2015-02-01

Performing international comparisons on efficiency usually has two main drawbacks: the lack of comparability of data from different countries and the appropriateness and adequacy of data selected for efficiency measurement. With inpatient discharges for four countries, some of the problems of data comparability usually found in international comparisons were mitigated. The objectives are to assess and compare hospital efficiency levels within and between countries, using stochastic frontier analysis with both cross-sectional and panel data. Data from English (2005-2008), Portuguese (2002-2009), Spanish (2003-2009) and Slovenian (2005-2009) hospital discharges and characteristics are used. Weighted hospital discharges were considered as outputs while the number of employees, physicians, nurses and beds were selected as inputs of the production function. Stochastic frontier analysis using both cross-sectional and panel data were performed, as well as ordinary least squares (OLS) analysis. The adequacy of the data was assessed with Kolmogorov-Smirnov and Breusch-Pagan/Cook-Weisberg tests. Data available results were redundant to perform efficiency measurements using stochastic frontier analysis with cross-sectional data. The likelihood ratio test reveals that in cross-sectional data stochastic frontier analysis (SFA) is not statistically different from OLS in Portuguese data, while SFA and OLS estimates are statistically different for Spanish, Slovenian and English data. In the panel data, the inefficiency term is statistically different from 0 in the four countries in analysis, though for Portugal it is still close to 0. Panel data are preferred over cross-section analysis because results are more robust. For all countries except Slovenia, beds and employees are relevant inputs for the production process. © The Author 2015. Published by Oxford University Press on behalf of the European Public Health Association. All rights reserved.
Segmenting lung fields in serial chest radiographs using both population-based and patient-specific shape statistics.

PubMed

Shi, Y; Qi, F; Xue, Z; Chen, L; Ito, K; Matsuo, H; Shen, D

2008-04-01

This paper presents a new deformable model using both population-based and patient-specific shape statistics to segment lung fields from serial chest radiographs. There are two novelties in the proposed deformable model. First, a modified scale invariant feature transform (SIFT) local descriptor, which is more distinctive than the general intensity and gradient features, is used to characterize the image features in the vicinity of each pixel. Second, the deformable contour is constrained by both population-based and patient-specific shape statistics, and it yields more robust and accurate segmentation of lung fields for serial chest radiographs. In particular, for segmenting the initial time-point images, the population-based shape statistics is used to constrain the deformable contour; as more subsequent images of the same patient are acquired, the patient-specific shape statistics online collected from the previous segmentation results gradually takes more roles. Thus, this patient-specific shape statistics is updated each time when a new segmentation result is obtained, and it is further used to refine the segmentation results of all the available time-point images. Experimental results show that the proposed method is more robust and accurate than other active shape models in segmenting the lung fields from serial chest radiographs.
Tolerancing aspheres based on manufacturing knowledge

NASA Astrophysics Data System (ADS)

Wickenhagen, S.; Kokot, S.; Fuchs, U.

2017-10-01

A standard way of tolerancing optical elements or systems is to perform a Monte Carlo based analysis within a common optical design software package. Although different weightings and distributions are assumed, these analyses all rely on statistics, which usually means several hundred or several thousand systems are needed for reliable results. Thus, employing these methods for small batch sizes is unreliable, especially when aspheric surfaces are involved. The large database of asphericon was used to investigate the correlation between the given tolerance values and measured data sets. The resulting probability distributions of these measured data were analyzed with the aim of a robust optical tolerancing process.
Skeletal Correlates for Body Mass Estimation in Modern and Fossil Flying Birds

PubMed Central

Field, Daniel J.; Lynner, Colton; Brown, Christian; Darroch, Simon A. F.

2013-01-01

Scaling relationships between skeletal dimensions and body mass in extant birds are often used to estimate body mass in fossil crown-group birds, as well as in stem-group avialans. However, useful statistical measurements for constraining the precision and accuracy of fossil mass estimates are rarely provided, which prevents the quantification of robust upper and lower bound body mass estimates for fossils. Here, we generate thirteen body mass correlations and associated measures of statistical robustness using a sample of 863 extant flying birds. By providing robust body mass regressions with upper- and lower-bound prediction intervals for individual skeletal elements, we address the longstanding problem of body mass estimation for highly fragmentary fossil birds. We demonstrate that the most precise proxy for estimating body mass in the overall dataset, measured both as the coefficient of determination of ordinary least squares regression and as percent prediction error, is the maximum diameter of the coracoid’s humeral articulation facet (the glenoid). We further demonstrate that this result is consistent among the majority of investigated avian orders (10 out of 18). As a result, we suggest that, in the majority of cases, this proxy may provide the most accurate estimates of body mass for volant fossil birds. Additionally, by presenting statistical measurements of body mass prediction error for thirteen different body mass regressions, this study provides a much-needed quantitative framework for the accurate estimation of body mass and associated ecological correlates in fossil birds. The application of these regressions will enhance the precision and robustness of many mass-based inferences in future paleornithological studies. PMID:24312392
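The two precision measures named in the abstract above, the coefficient of determination of an ordinary least squares fit and percent prediction error, can be sketched as follows. This is an illustration only. The measurements are synthetic, noiseless toy values rather than data from the study, the fit is done on log10-transformed values as is common for scaling relationships, and the percent prediction error formula used here (absolute difference divided by the predicted value) is one common definition that is assumed rather than taken from the paper. Prediction intervals are not computed.

```python
import math


def fit_loglog(x, y):
    """OLS fit of log10(y) = intercept + slope * log10(x); returns slope, intercept, R^2."""
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(lx, ly))
    ss_tot = sum((b - my) ** 2 for b in ly)
    return slope, intercept, 1.0 - ss_res / ss_tot


def percent_prediction_error(observed, predicted):
    """Assumed definition: 100 * |observed - predicted| / predicted."""
    return 100.0 * abs(observed - predicted) / predicted


# Synthetic toy data following mass = 2 * diameter^2.5 exactly (not real measurements).
diameters = [2.0, 3.0, 5.0, 8.0, 12.0]
masses = [2.0 * d ** 2.5 for d in diameters]

slope, intercept, r_squared = fit_loglog(diameters, masses)
predicted_mass = 10 ** (intercept + slope * math.log10(4.0))  # prediction for diameter 4
ppe = percent_prediction_error(70.0, predicted_mass)          # hypothetical observed mass 70
```

Because the toy data follow an exact power law, the fit recovers the exponent 2.5 with an R² of essentially 1. With real measurements both statistics would reflect scatter about the regression line.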
Robust Selectivity for Faces in the Human Amygdala in the Absence of Expressions

PubMed Central

Mende-Siedlecki, Peter; Verosky, Sara C.; Turk-Browne, Nicholas B.; Todorov, Alexander

2014-01-01

There is a well-established posterior network of cortical regions that plays a central role in face processing and that has been investigated extensively. In contrast, although responsive to faces, the amygdala is not considered a core face-selective region, and its face selectivity has never been a topic of systematic research in human neuroimaging studies. Here, we conducted a large-scale group analysis of fMRI data from 215 participants. We replicated the posterior network observed in prior studies but found equally robust and reliable responses to faces in the amygdala. These responses were detectable in most individual participants, but they were also highly sensitive to the initial statistical threshold and habituated more rapidly than the responses in posterior face-selective regions. A multivariate analysis showed that the pattern of responses to faces across voxels in the amygdala had high reliability over time. Finally, functional connectivity analyses showed stronger coupling between the amygdala and posterior face-selective regions during the perception of faces than during the perception of control visual categories. These findings suggest that the amygdala should be considered a core face-selective region. PMID:23984945
DOE Office of Scientific and Technical Information (OSTI.GOV)

Shumway, R.H.; McQuarrie, A.D.

Robust statistical approaches to the problem of discriminating between regional earthquakes and explosions are developed. We compare linear discriminant analysis using descriptive features like amplitude and spectral ratios with signal discrimination techniques using the original signal waveforms and spectral approximations to the log likelihood function. Robust information theoretic techniques are proposed and all methods are applied to 8 earthquakes and 8 mining explosions in Scandinavia and to an event from Novaya Zemlya of unknown origin. It is noted that signal discrimination approaches based on discrimination information and Renyi entropy perform better in the test sample than conventional methods based on spectral ratios involving the P and S phases. Two techniques for identifying the ripple-firing pattern for typical mining explosions are proposed and shown to work well on simulated data and on several Scandinavian earthquakes and explosions. We use both cepstral analysis in the frequency domain and a time domain method based on the autocorrelation and partial autocorrelation functions. The proposed approach strips off underlying smooth spectral and seasonal spectral components corresponding to the echo pattern induced by two simple ripple-fired models. For two mining explosions, a pattern is identified whereas for two earthquakes, no pattern is evident.
VBA: A Probabilistic Treatment of Nonlinear Models for Neurobiological and Behavioural Data

PubMed Central

Daunizeau, Jean; Adam, Vincent; Rigoux, Lionel

2014-01-01

This work is in line with an on-going effort tending toward a computational (quantitative and refutable) understanding of human neuro-cognitive processes. Many sophisticated models for behavioural and neurobiological data have flourished during the past decade. Most of these models are partly unspecified (i.e. they have unknown parameters) and nonlinear. This makes them difficult to peer with a formal statistical data analysis framework. In turn, this compromises the reproducibility of model-based empirical studies. This work exposes a software toolbox that provides generic, efficient and robust probabilistic solutions to the three problems of model-based analysis of empirical data: (i) data simulation, (ii) parameter estimation/model selection, and (iii) experimental design optimization. PMID:24465198
Robust spectral-domain optical coherence tomography speckle model and its cross-correlation coefficient analysis

PubMed Central

Liu, Xuan; Ramella-Roman, Jessica C.; Huang, Yong; Guo, Yuan; Kang, Jin U.

2013-01-01

In this study, we proposed a generic speckle simulation for the optical coherence tomography (OCT) signal, by convolving the point spread function (PSF) of the OCT system with the numerically synthesized random sample field. We validated our model and used the simulation method to study the statistical properties of cross-correlation coefficients (XCC) between A-scans, which have recently been applied in transverse motion analysis by our group. The simulation results show that oversampling is essential for accurate motion tracking; exponential decay of the OCT signal leads to an underestimate of motion, which can be corrected; and lateral heterogeneity of the sample leads to an overestimate of motion for a few pixels corresponding to the structural boundary. PMID:23456001
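The simulation idea in the abstract above, convolving a PSF with a random sample field and then correlating neighbouring A-scans, can be sketched in a heavily simplified form. The sketch assumes a lateral-only Gaussian PSF, a complex Gaussian scatterer field, no axial PSF, and no signal decay with depth. All function names and parameter values are hypothetical and are not taken from the paper.

```python
import math
import random


def gaussian_psf(sigma, half_width):
    """Lateral Gaussian PSF taps, normalized to unit energy (assumed PSF shape)."""
    taps = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-half_width, half_width + 1)]
    norm = math.sqrt(sum(t * t for t in taps))
    return [t / norm for t in taps]


def simulate_ascans(n_lateral, n_depth, psf, rng):
    """Return amplitude A-scans: at each depth, the PSF is convolved laterally with a
    random complex field (the PSF is symmetric, so convolution equals correlation)."""
    width = n_lateral + len(psf) - 1
    field = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(width)]
             for _ in range(n_depth)]
    return [[abs(sum(p * row[x + k] for k, p in enumerate(psf))) for row in field]
            for x in range(n_lateral)]


def xcc(a, b):
    """Pearson cross-correlation coefficient between two A-scans."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))


rng = random.Random(0)  # fixed seed so the run is repeatable
ascans = simulate_ascans(n_lateral=30, n_depth=400, psf=gaussian_psf(2.0, 6), rng=rng)
xcc_near = xcc(ascans[0], ascans[1])   # adjacent A-scans share most scatterers
xcc_far = xcc(ascans[0], ascans[25])   # separation exceeds the PSF support
```

In this toy setting the XCC is high for A-scans closer together than the PSF width and falls toward zero once the PSF footprints no longer overlap. This decay with lateral displacement is the property that XCC-based motion estimation relies on.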
Meteor tracking via local pattern clustering in spatio-temporal domain

NASA Astrophysics Data System (ADS)

Kukal, Jaromír; Klimt, Martin; Švihlík, Jan; Fliegel, Karel

2016-09-01

Reliable meteor detection is one of the crucial disciplines in astronomy. A variety of imaging systems is used for meteor path reconstruction. The traditional approach is based on analysis of 2D image sequences obtained from a double station video observation system. Precise localization of meteor path is difficult due to atmospheric turbulence and other factors causing spatio-temporal fluctuations of the image background. The proposed technique performs non-linear preprocessing of image intensity using Box-Cox transform as recommended in our previous work. Both symmetric and asymmetric spatio-temporal differences are designed to be robust in the statistical sense. Resulting local patterns are processed by data whitening technique and obtained vectors are classified via cluster analysis and Self-Organized Map (SOM).
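The Box-Cox preprocessing step mentioned in the abstract above has a standard closed form, sketched below for a single positive intensity value. The value of the exponent `lam` used by the authors is not given in the abstract, so the values in the example are arbitrary. The robust spatio-temporal differences, whitening, and clustering stages are not reproduced.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform of a positive value x with exponent parameter lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive input")
    if abs(lam) < 1e-12:
        return math.log(x)          # limiting case lam -> 0
    return (x ** lam - 1.0) / lam


# Arbitrary illustrative intensities and exponent.
transformed = [box_cox(v, 0.5) for v in (1.0, 4.0, 9.0)]
```

With `lam = 0.5` the transform compresses bright pixels relative to dark ones, which is one way such preprocessing can stabilize intensity-dependent noise.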
Hidden Markov model analysis of force/torque information in telemanipulation

NASA Technical Reports Server (NTRS)

Hannaford, Blake; Lee, Paul

1991-01-01

A model for the prediction and analysis of sensor information recorded during robotic performance of telemanipulation tasks is presented. The model uses the hidden Markov model to describe the task structure, the operator's or intelligent controller's goal structure, and the sensor signals. A methodology for constructing the model parameters based on engineering knowledge of the task is described. It is concluded that the model and its optimal state estimation algorithm, the Viterbi algorithm, are very successful at the task of segmenting the data record into phases corresponding to subgoals of the task. The model provides a rich modeling structure within a statistical framework, which enables it to represent complex systems and be robust to real-world sensory signals.
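The segmentation described in the abstract above rests on the Viterbi algorithm, which can be sketched for a small discrete hidden Markov model. The two task phases, the quantized force readings, and all probabilities below are invented for illustration and do not come from the paper.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden-state sequence for a discrete HMM (log-domain Viterbi)."""
    log = math.log
    score = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    back = []
    for obs in observations[1:]:
        prev = score[-1]
        col, ptr = {}, {}
        for s in states:
            # Best predecessor of state s at this time step.
            best = max(states, key=lambda r: prev[r] + log(trans_p[r][s]))
            col[s] = prev[best] + log(trans_p[best][s]) + log(emit_p[s][obs])
            ptr[s] = best
        score.append(col)
        back.append(ptr)
    # Backtrack from the best final state.
    path = [max(states, key=lambda s: score[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Hypothetical two-phase task with quantized force readings.
states = ["free_motion", "contact"]
start_p = {"free_motion": 0.9, "contact": 0.1}
trans_p = {"free_motion": {"free_motion": 0.8, "contact": 0.2},
           "contact": {"free_motion": 0.1, "contact": 0.9}}
emit_p = {"free_motion": {"low": 0.9, "high": 0.1},
          "contact": {"low": 0.2, "high": 0.8}}

phases = viterbi(["low", "low", "high", "high"], states, start_p, trans_p, emit_p)
# phases == ["free_motion", "free_motion", "contact", "contact"]
```

The returned sequence marks where the record switches from one phase to the next, which is the segmentation role the abstract attributes to the algorithm.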
Development and optimization of SPE-HPLC-UV/ELSD for simultaneous determination of nine bioactive components in Shenqi Fuzheng Injection based on Quality by Design principles.

PubMed

Wang, Lu; Qu, Haibin

2016-03-01

A method combining solid phase extraction, high performance liquid chromatography, and ultraviolet/evaporative light scattering detection (SPE-HPLC-UV/ELSD) was developed according to Quality by Design (QbD) principles and used to assay nine bioactive compounds within a botanical drug, Shenqi Fuzheng Injection. Risk assessment and a Plackett-Burman design were utilized to evaluate the impact of 11 factors on the resolutions and signal-to-noise of chromatographic peaks. Multiple regression and Pareto ranking analysis indicated that the sorbent mass, sample volume, flow rate, column temperature, evaporator temperature, and gas flow rate were statistically significant (p < 0.05) in this procedure. Furthermore, a Box-Behnken design combined with response surface analysis was employed to study the relationships between the quality of SPE-HPLC-UV/ELSD analysis and four significant factors, i.e., flow rate, column temperature, evaporator temperature, and gas flow rate. An analytical design space of SPE-HPLC-UV/ELSD was then constructed by calculated Monte Carlo probability. In the presented approach, the operating parameters of sample preparation, chromatographic separation, and compound detection were investigated simultaneously. Eight terms of method validation, i.e., system-suitability tests, method robustness/ruggedness, sensitivity, precision, repeatability, linearity, accuracy, and stability, were accomplished at a selected working point. These results revealed that the QbD principles were suitable in the development of analytical procedures for samples in complex matrices. Meanwhile, the analytical quality and method robustness were validated by the analytical design space. The presented strategy provides a tutorial on the development of a robust QbD-compliant quantitative method for samples in complex matrices.
Use of Robust z in Detecting Unstable Items in Item Response Theory Models

ERIC Educational Resources Information Center

Huynh, Huynh; Meyer, Patrick

2010-01-01

The first part of this paper describes the use of the robust z[subscript R] statistic to link test forms using the Rasch (or one-parameter logistic) model. The procedure is then extended to the two-parameter and three-parameter logistic and two-parameter partial credit (2PPC) models. A real set of data was used to illustrate the extension. The…
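The truncated abstract does not give the formula, so the following is a sketch under a stated assumption: the robust z is taken in its commonly cited form, the difference in item parameter estimates minus the median difference, divided by 0.74 times the interquartile range. The data and the 1.645 cutoff are hypothetical illustrations, not values from the paper.

```python
import numpy as np

def robust_z(differences):
    """Robust z: (d - median) / (0.74 * IQR). Assumed form, see lead-in."""
    d = np.asarray(differences, dtype=float)
    q1, med, q3 = np.percentile(d, [25, 50, 75])
    return (d - med) / (0.74 * (q3 - q1))

# Hypothetical differences in item difficulty between two test forms.
d = np.array([0.05, 0.12, 0.08, 0.10, 0.15, 1.50, 0.07, 0.11])
z = robust_z(d)
# Hypothetical cutoff: flag items whose |z| exceeds 1.645 as unstable.
unstable = np.abs(z) > 1.645
print(np.round(z, 2), unstable)
```

Because the median and interquartile range are barely affected by the single drifting item, that item stands out clearly instead of inflating the scale used to judge it.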
A 3D interactive multi-object segmentation tool using local robust statistics driven active contours.

PubMed

Gao, Yi; Kikinis, Ron; Bouix, Sylvain; Shenton, Martha; Tannenbaum, Allen

2012-08-01

Extracting anatomically and functionally significant structures is one of the important tasks for both the theoretical study of medical image analysis and the clinical and practical community. In the past, much work has been dedicated only to algorithmic development. Nevertheless, for clinical end users, a well designed algorithm with interactive software is necessary for the algorithm to be utilized in their daily work. Furthermore, the software should preferably be open source so that it can be used and validated not only by the authors but also by the entire community. Therefore, the contribution of the present work is twofold: first, we propose a new robust statistics based conformal metric and the conformal area driven multiple active contour framework, to simultaneously extract multiple targets from MR and CT medical imagery in 3D. Second, an open source graphically interactive 3D segmentation tool based on the aforementioned contour evolution is implemented and is publicly available for end users on multiple platforms. In using this software for the segmentation task, the process is initiated by user-drawn strokes (seeds) in the target region of the image. Then, the local robust statistics are used to describe the object features, and such features are learned adaptively from the seeds under a non-parametric estimation scheme. Subsequently, several active contours evolve simultaneously, with their interactions motivated by the principles of action and reaction. This not only guarantees mutual exclusiveness among the contours, but also no longer relies upon the assumption that the multiple objects fill the entire image domain, which was tacitly or explicitly assumed in many previous works. In doing so, the contours interact and converge to equilibrium at the desired positions of the desired multiple objects. Furthermore, with the aim of not only validating the algorithm and the software, but also demonstrating how the tool is to be used, we provide the reader with reproducible experiments that demonstrate the capability of the proposed segmentation tool on several publicly available data sets. Copyright © 2012 Elsevier B.V. All rights reserved.
A 3D Interactive Multi-object Segmentation Tool using Local Robust Statistics Driven Active Contours

PubMed Central

Gao, Yi; Kikinis, Ron; Bouix, Sylvain; Shenton, Martha; Tannenbaum, Allen

2012-01-01

Extracting anatomically and functionally significant structures is one of the important tasks for both the theoretical study of medical image analysis and the clinical and practical community. In the past, much work has been dedicated only to algorithmic development. Nevertheless, for clinical end users, a well designed algorithm with interactive software is necessary for the algorithm to be utilized in their daily work. Furthermore, the software should preferably be open source so that it can be used and validated not only by the authors but also by the entire community. Therefore, the contribution of the present work is twofold: first, we propose a new robust statistics based conformal metric and the conformal area driven multiple active contour framework, to simultaneously extract multiple targets from MR and CT medical imagery in 3D. Second, an open source graphically interactive 3D segmentation tool based on the aforementioned contour evolution is implemented and is publicly available for end users on multiple platforms. In using this software for the segmentation task, the process is initiated by user-drawn strokes (seeds) in the target region of the image. Then, the local robust statistics are used to describe the object features, and such features are learned adaptively from the seeds under a non-parametric estimation scheme. Subsequently, several active contours evolve simultaneously, with their interactions motivated by the principles of action and reaction. This not only guarantees mutual exclusiveness among the contours, but also no longer relies upon the assumption that the multiple objects fill the entire image domain, which was tacitly or explicitly assumed in many previous works. In doing so, the contours interact and converge to equilibrium at the desired positions of the desired multiple objects. Furthermore, with the aim of not only validating the algorithm and the software, but also demonstrating how the tool is to be used, we provide the reader with reproducible experiments that demonstrate the capability of the proposed segmentation tool on several publicly available data sets. PMID:22831773
Variability and robustness of scatterers in HRR/ISAR ground target data and its influence on the ATR performance

NASA Astrophysics Data System (ADS)

Schumacher, R.; Schimpf, H.; Schiller, J.

2011-06-01

The most challenging problem of Automatic Target Recognition (ATR) is the extraction of robust and independent target features which describe the target unambiguously. These features have to be robust and invariant in different senses: in time, between aspect views (azimuth and elevation angle), between target motions (translation and rotation) and between different target variants. Especially for ground moving targets in military applications, irregular target motion is typical, so that a strong variation of the backscattered radar signal with azimuth and elevation angle makes the extraction of stable and robust features most difficult. For ATR based on High Range Resolution (HRR) profiles and/or Inverse Synthetic Aperture Radar (ISAR) images it is crucial that the reference dataset consists of stable and robust features, which will depend on, among other things, the target aspect and depression angle. Here it is important to find an adequate data grid for efficient data coverage in the reference dataset for ATR. In this paper the variability of the backscattered radar signals of target scattering centers is analyzed for different HRR profiles and ISAR images from measured turntable datasets of ground targets under controlled conditions. In particular, the dependency of the features on the elevation angle is analyzed with regard to the ATR of large strip SAR data with a large range of depression angles, using available (I)SAR datasets as reference. In this work the robustness of these scattering centers is analyzed by extracting their amplitude, phase and position. To this end, turntable measurements under controlled conditions were performed on an artificial military reference object called STANDCAM. Measures referring to variability, similarity, robustness and separability regarding the scattering centers are defined. The dependency of the scattering behaviour with respect to azimuth and elevation variations is analyzed. Additionally, generic types of features (geometrical, statistical), which can be derived especially from (I)SAR images, are applied to the ATR task. Subsequently, the dependence of individual feature values as well as the feature statistics on aspect (i.e. azimuth and elevation) is presented. The Kolmogorov-Smirnov distance is used to show how the feature statistics are influenced by varying elevation angles. Finally, confusion matrices are computed between the STANDCAM target at all eleven elevation angles. This helps to assess the robustness of ATR performance under the influence of aspect angle deviations between training set and test set.
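The Kolmogorov-Smirnov distance used here to compare feature statistics can be sketched as the largest gap between two empirical distribution functions. The feature samples below are simulated and hypothetical; they merely stand in for one feature measured at two elevation angles.

```python
import numpy as np

def ks_distance(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov distance: max |F_a(x) - F_b(x)|."""
    a = np.sort(np.asarray(sample_a, dtype=float))
    b = np.sort(np.asarray(sample_b, dtype=float))
    # The maximum gap between the two step functions occurs at a data point.
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

rng = np.random.default_rng(0)
feat_low = rng.normal(0.0, 1.0, 500)   # hypothetical feature values, elevation 1
feat_high = rng.normal(1.0, 1.0, 500)  # hypothetical feature values, elevation 2
d_same = ks_distance(feat_low, feat_low)
d_shift = ks_distance(feat_low, feat_high)
print(d_same, d_shift)
```

A distance of 0 means the two samples have identical empirical distributions and a distance of 1 means they do not overlap at all, so the value gives a scale-free measure of how much a feature distribution shifts between aspect angles.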
Water quality and non-point sources of risk: the Jiulong River Watershed, P. R. of China.

PubMed

Zhang, Jingjing; Zhang, Luoping; Ricci, Paolo F

2012-01-01

Retrospective water quality assessment plays an essential role in identifying trends and causal associations between exposures and risks, thus it can be a guide for water resources management. We have developed empirical relationships between several time-varying social and economic factors of economic development, water quality variables such as nitrate-nitrogen, COD(Mn), BOD(5), and DO, in the Jiulong River Watershed and its main tributary, the West River. Our analyses used alternative statistical methods to reduce the dimensionality of the analysis first and then strengthen the study's causal associations. The statistical methods included: factor analysis (FA), trend analysis, Monte Carlo/bootstrap simulations, robust regressions and a coupled equations model, integrated into a framework that allows an investigation and resolution of the issues that may affect the estimated results. After resolving these, we found that the concentrations of nitrogen compounds increased over time in the West River region, and that fertilizer used in agricultural fruit crops was the main risk with regard to nitrogen pollution. The relationships we developed can identify hazards and explain the impact of sources of different types of pollution, such as urbanization, and agriculture.
Neandertal admixture in Eurasia confirmed by maximum-likelihood analysis of three genomes.

PubMed

Lohse, Konrad; Frantz, Laurent A F

2014-04-01

Although there has been much interest in estimating histories of divergence and admixture from genomic data, it has proved difficult to distinguish recent admixture from long-term structure in the ancestral population. Thus, recent genome-wide analyses based on summary statistics have sparked controversy about the possibility of interbreeding between Neandertals and modern humans in Eurasia. Here we derive the probability of full mutational configurations in nonrecombining sequence blocks under both admixture and ancestral structure scenarios. Dividing the genome into short blocks gives an efficient way to compute maximum-likelihood estimates of parameters. We apply this likelihood scheme to triplets of human and Neandertal genomes and compare the relative support for a model of admixture from Neandertals into Eurasian populations after their expansion out of Africa against a history of persistent structure in their common ancestral population in Africa. Our analysis allows us to conclusively reject a model of ancestral structure in Africa and instead reveals strong support for Neandertal admixture in Eurasia at a higher rate (3.4-7.3%) than suggested previously. Using analysis and simulations we show that our inference is more powerful than previous summary statistics and robust to realistic levels of recombination.
Neandertal Admixture in Eurasia Confirmed by Maximum-Likelihood Analysis of Three Genomes

PubMed Central

Lohse, Konrad; Frantz, Laurent A. F.

2014-01-01

Although there has been much interest in estimating histories of divergence and admixture from genomic data, it has proved difficult to distinguish recent admixture from long-term structure in the ancestral population. Thus, recent genome-wide analyses based on summary statistics have sparked controversy about the possibility of interbreeding between Neandertals and modern humans in Eurasia. Here we derive the probability of full mutational configurations in nonrecombining sequence blocks under both admixture and ancestral structure scenarios. Dividing the genome into short blocks gives an efficient way to compute maximum-likelihood estimates of parameters. We apply this likelihood scheme to triplets of human and Neandertal genomes and compare the relative support for a model of admixture from Neandertals into Eurasian populations after their expansion out of Africa against a history of persistent structure in their common ancestral population in Africa. Our analysis allows us to conclusively reject a model of ancestral structure in Africa and instead reveals strong support for Neandertal admixture in Eurasia at a higher rate (3.4−7.3%) than suggested previously. Using analysis and simulations we show that our inference is more powerful than previous summary statistics and robust to realistic levels of recombination. PMID:24532731
An online sleep apnea detection method based on recurrence quantification analysis.

PubMed

Nguyen, Hoa Dinh; Wilkins, Brek A; Cheng, Qi; Benjamin, Bruce Allen

2014-07-01

This paper introduces an online sleep apnea detection method based on heart rate complexity as measured by recurrence quantification analysis (RQA) statistics of heart rate variability (HRV) data. RQA statistics can capture nonlinear dynamics of a complex cardiorespiratory system during obstructive sleep apnea. In order to obtain a more robust measurement of the nonstationarity of the cardiorespiratory system, we use different fixed amount of neighbor thresholdings for recurrence plot calculation. We integrate a feature selection algorithm based on conditional mutual information to select the most informative RQA features for classification, and hence, to speed up the real-time classification process without degrading the performance of the system. Two types of binary classifiers, i.e., support vector machine and neural network, are used to differentiate apnea from normal sleep. A soft decision fusion rule is developed to combine the results of these classifiers in order to improve the classification performance of the whole system. Experimental results show that our proposed method achieves better classification results compared with the previous recurrence analysis-based approach. We also show that our method is flexible and a strong candidate for a real efficient sleep apnea detection system.
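The recurrence plot with a fixed amount of neighbors mentioned in this abstract can be sketched as follows. This is a simplified illustration and not the authors' pipeline: the embedding dimension, delay, neighbor count, and the simulated RR-interval series are all hypothetical choices.

```python
import numpy as np

def fan_recurrence_matrix(series, k, dim=2, delay=1):
    """Recurrence matrix where each state is recurrent with its k nearest neighbors."""
    x = np.asarray(series, dtype=float)
    n = x.size - (dim - 1) * delay
    # Time-delay embedding: row i is (x[i], x[i + delay], ...).
    emb = np.column_stack([x[j * delay : j * delay + n] for j in range(dim)])
    dist = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a state is not counted as its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]
    rec = np.zeros((n, n), dtype=bool)
    rec[np.arange(n)[:, None], nearest] = True
    return rec

rng = np.random.default_rng(3)
t = np.arange(200)
# Hypothetical RR-interval series in seconds: slow oscillation plus noise.
rr = 0.8 + 0.05 * np.sin(2 * np.pi * t / 25) + 0.01 * rng.normal(size=t.size)
rec = fan_recurrence_matrix(rr, k=10)
print(rec.shape, rec.mean())
```

With this thresholding every row contains exactly `k` recurrent points, so the recurrence rate is fixed by construction. That makes plots from different segments comparable even when the signal amplitude drifts.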

A systematic review and meta-analysis of tract-based spatial statistics studies regarding attention-deficit/hyperactivity disorder.

PubMed

Chen, Lizhou; Hu, Xinyu; Ouyang, Luo; He, Ning; Liao, Yi; Liu, Qi; Zhou, Ming; Wu, Min; Huang, Xiaoqi; Gong, Qiyong

2016-09-01

Diffusion tensor imaging (DTI) studies that use tract-based spatial statistics (TBSS) have demonstrated the microstructural abnormalities of white matter (WM) in patients with attention-deficit/hyperactivity disorder (ADHD); however, robust conclusions have not yet been drawn. The present study integrated the findings of previous TBSS studies to determine the most consistent WM alterations in ADHD via a narrative review and meta-analysis. The literature search was conducted through October 2015 to identify TBSS studies that compared fractional anisotropy (FA) between ADHD patients and healthy controls. FA reductions were identified in the splenium of the corpus callosum (CC) that extended to the right cingulum, right sagittal stratum, and left tapetum. The first two clusters retained significance in the sensitivity analysis and in all subgroup analyses. The FA reduction in the CC splenium was negatively associated with the mean age of the ADHD group. We hypothesize that, in addition to the fronto-striatal-cerebellar circuit, the disturbed WM tracts that integrate the bilateral hemispheres and posterior-brain circuitries play a crucial role in the pathophysiology of ADHD. Copyright © 2016 Elsevier Ltd. All rights reserved.
Symbol recognition via statistical integration of pixel-level constraint histograms: a new descriptor.

PubMed

Yang, Su

2005-02-01

A new descriptor for symbol recognition is proposed. 1) A histogram is constructed for every pixel to figure out the distribution of the constraints among the other pixels. 2) All the histograms are statistically integrated to form a feature vector with fixed dimension. The robustness and invariance were experimentally confirmed.
The Robustness of the Studentized Range Statistic to Violations of the Normality and Homogeneity of Variance Assumptions.

ERIC Educational Resources Information Center

Ramseyer, Gary C.; Tcheng, Tse-Kia

The present study was directed at determining the extent to which the Type I Error rate is affected by violations in the basic assumptions of the q statistic. Monte Carlo methods were employed, and a variety of departures from the assumptions were examined. (Author)
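A Monte Carlo check of this kind can be sketched in a few lines. The sketch below is an assumption-laden illustration, not a reconstruction of the original study: the number of groups, the group size, the replicate count, and the variance pattern are hypothetical, and the critical value is itself estimated by simulation rather than read from tables.

```python
import numpy as np

def q_statistics(samples):
    """Studentized range q per replicate; samples has shape (reps, groups, n)."""
    n = samples.shape[2]
    means = samples.mean(axis=2)
    # Pooled within-group variance (equal group sizes assumed).
    mse = samples.var(axis=2, ddof=1).mean(axis=1)
    return (means.max(axis=1) - means.min(axis=1)) / np.sqrt(mse / n)

rng = np.random.default_rng(1)
reps, groups, n, alpha = 20000, 4, 10, 0.05

# Critical value estimated under normality and equal variances.
null = rng.normal(size=(reps, groups, n))
q_crit = np.quantile(q_statistics(null), 1 - alpha)

# Violation: equal means but one group with three times the standard deviation.
sds = np.array([1.0, 1.0, 1.0, 3.0]).reshape(1, groups, 1)
hetero = rng.normal(size=(reps, groups, n)) * sds
type1_hetero = np.mean(q_statistics(hetero) > q_crit)
print(q_crit, type1_hetero)
```

Comparing `type1_hetero` with the nominal `alpha` shows how far the actual rejection rate drifts when the homogeneity assumption fails. Swapping the normal generator for a skewed one would probe the normality assumption in the same way.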
Identification of Major Histocompatibility Complex-Regulated Body Odorants by Statistical Analysis of a Comparative Gas Chromatography/Mass Spectrometry Experiment

DOE Office of Scientific and Technical Information (OSTI.GOV)

Willse, Alan R.; Belcher, Ann; Preti, George

2005-04-15

Gas chromatography (GC), combined with mass spectrometry (MS) detection, is a powerful analytical technique that can be used to separate, quantify, and identify volatile compounds in complex mixtures. This paper examines the application of GC-MS in a comparative experiment to identify volatiles that differ in concentration between two groups. A complex mixture might comprise several hundred or even thousands of volatile compounds. Because their number and location in a chromatogram generally are unknown, and because components overlap in populous chromatograms, the statistical problems offer significant challenges beyond traditional two-group screening procedures. We describe a statistical procedure to compare two-dimensional GC-MS profiles between groups, which entails (1) signal processing: baseline correction and peak detection in single ion chromatograms; (2) aligning chromatograms in time; (3) normalizing differences in overall signal intensities; and (4) detecting chromatographic regions that differ between groups. Compared to existing approaches, the proposed method is robust to errors made at earlier stages of analysis, such as missed peaks or slightly misaligned chromatograms. To illustrate the method, we identify differences in GC-MS chromatograms of ether-extracted urine collected from two nearly identical inbred groups of mice, to investigate the relationship between odor and genetics of the major histocompatibility complex.
Replicability of time-varying connectivity patterns in large resting state fMRI samples.

PubMed

Abrol, Anees; Damaraju, Eswar; Miller, Robyn L; Stephen, Julia M; Claus, Eric D; Mayer, Andrew R; Calhoun, Vince D

2017-12-01

The past few years have seen an emergence of approaches that leverage temporal changes in whole-brain patterns of functional connectivity (the chronnectome). In this chronnectome study, we investigate the replicability of the human brain's inter-regional coupling dynamics during rest by evaluating two different dynamic functional network connectivity (dFNC) analysis frameworks using 7 500 functional magnetic resonance imaging (fMRI) datasets. To quantify the extent to which the emergent functional connectivity (FC) patterns are reproducible, we characterize the temporal dynamics by deriving several summary measures across multiple large, independent age-matched samples. Reproducibility was demonstrated through the existence of basic connectivity patterns (FC states) amidst an ensemble of inter-regional connections. Furthermore, application of the methods to conservatively configured (statistically stationary, linear and Gaussian) surrogate datasets revealed that some of the studied state summary measures were indeed statistically significant and also suggested that this class of null model did not explain the fMRI data fully. This extensive testing of reproducibility of similarity statistics also suggests that the estimated FC states are robust against variation in data quality, analysis, grouping, and decomposition methods. We conclude that future investigations probing the functional and neurophysiological relevance of time-varying connectivity assume critical importance. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
Replicability of time-varying connectivity patterns in large resting state fMRI samples

PubMed Central

Abrol, Anees; Damaraju, Eswar; Miller, Robyn L.; Stephen, Julia M.; Claus, Eric D.; Mayer, Andrew R.; Calhoun, Vince D.

2018-01-01

The past few years have seen an emergence of approaches that leverage temporal changes in whole-brain patterns of functional connectivity (the chronnectome). In this chronnectome study, we investigate the replicability of the human brain’s inter-regional coupling dynamics during rest by evaluating two different dynamic functional network connectivity (dFNC) analysis frameworks using 7 500 functional magnetic resonance imaging (fMRI) datasets. To quantify the extent to which the emergent functional connectivity (FC) patterns are reproducible, we characterize the temporal dynamics by deriving several summary measures across multiple large, independent age-matched samples. Reproducibility was demonstrated through the existence of basic connectivity patterns (FC states) amidst an ensemble of inter-regional connections. Furthermore, application of the methods to conservatively configured (statistically stationary, linear and Gaussian) surrogate datasets revealed that some of the studied state summary measures were indeed statistically significant and also suggested that this class of null model did not explain the fMRI data fully. This extensive testing of reproducibility of similarity statistics also suggests that the estimated FC states are robust against variation in data quality, analysis, grouping, and decomposition methods. We conclude that future investigations probing the functional and neurophysiological relevance of time-varying connectivity assume critical importance. PMID:28916181
Bayesian approach for counting experiment statistics applied to a neutrino point source analysis

NASA Astrophysics Data System (ADS)

Bose, D.; Brayeur, L.; Casier, M.; de Vries, K. D.; Golup, G.; van Eijndhoven, N.

2013-12-01

In this paper we present a model independent analysis method following Bayesian statistics to analyse data from a generic counting experiment and apply it to the search for neutrinos from point sources. We discuss a test statistic defined following a Bayesian framework that will be used in the search for a signal. In case no signal is found, we derive an upper limit without the introduction of approximations. The Bayesian approach allows us to obtain the full probability density function for both the background and the signal rate. As such, we have direct access to any signal upper limit. The upper limit derivation directly compares with a frequentist approach and is robust in the case of low-counting observations. Furthermore, it allows also to account for previous upper limits obtained by other analyses via the concept of prior information without the need of the ad hoc application of trial factors. To investigate the validity of the presented Bayesian approach, we have applied this method to the public IceCube 40-string configuration data for 10 nearby blazars and we have obtained a flux upper limit, which is in agreement with the upper limits determined via a frequentist approach. Furthermore, the upper limit obtained compares well with the previously published result of IceCube, using the same data set.
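The idea of reading an upper limit directly off the posterior can be sketched for the simplest counting case. This is a deliberate simplification and not the authors' derivation: it assumes a known background expectation, a flat prior on the signal rate, and a numerical grid. The function name and its parameters are hypothetical.

```python
import numpy as np

def bayesian_upper_limit(n_obs, background, cl=0.90, s_max=50.0, steps=200001):
    """Credible upper limit on a Poisson signal with known background > 0."""
    s = np.linspace(0.0, s_max, steps)
    # Posterior for a flat prior on s >= 0: proportional to (s+b)^n * exp(-(s+b)).
    log_post = n_obs * np.log(s + background) - (s + background)
    post = np.exp(log_post - log_post.max())
    cdf = np.cumsum(post)
    cdf /= cdf[-1]
    # Smallest grid value whose cumulative posterior probability reaches cl.
    return float(s[np.searchsorted(cdf, cl)])

ul_zero = bayesian_upper_limit(n_obs=0, background=1.0)
ul_five = bayesian_upper_limit(n_obs=5, background=1.0)
print(ul_zero, ul_five)
```

With zero observed events the posterior reduces to an exponential in the signal rate, so the 90% limit lies close to -ln(0.1), about 2.30, whatever the background. This illustrates the well-behaved low-count regime that the abstract refers to.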
An asymptotic theory for cross-correlation between auto-correlated sequences and its application on neuroimaging data.

PubMed

Zhou, Yunyi; Tao, Chenyang; Lu, Wenlian; Feng, Jianfeng

2018-04-20

Functional connectivity is among the most important tools to study brain. The correlation coefficient, between time series of different brain areas, is the most popular method to quantify functional connectivity. Correlation coefficient in practical use assumes the data to be temporally independent. However, the time series data of brain can manifest significant temporal auto-correlation. A widely applicable method is proposed for correcting temporal auto-correlation. We considered two types of time series models: (1) auto-regressive-moving-average model, (2) nonlinear dynamical system model with noisy fluctuations, and derived their respective asymptotic distributions of correlation coefficient. These two types of models are most commonly used in neuroscience studies. We show the respective asymptotic distributions share a unified expression. We have verified the validity of our method, and shown our method exhibited sufficient statistical power for detecting true correlation on numerical experiments. Employing our method on real dataset yields more robust functional network and higher classification accuracy than conventional methods. Our method robustly controls the type I error while maintaining sufficient statistical power for detecting true correlation in numerical experiments, where existing methods measuring association (linear and nonlinear) fail. In this work, we proposed a widely applicable approach for correcting the effect of temporal auto-correlation on functional connectivity. Empirical results favor the use of our method in functional network analysis. Copyright © 2018. Published by Elsevier B.V.
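To make the effect of temporal auto-correlation concrete, the sketch below uses the classical Bartlett-style effective sample size. This is a swapped-in textbook correction and not the asymptotic theory derived in the paper. The AR(1) test signals, the lag cutoff, and the clamping of the correction factor at 1 are hypothetical choices.

```python
import numpy as np

def autocorr(x, max_lag):
    """Sample autocorrelation at lags 1..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[: x.size - k], x[k:]) / denom
                     for k in range(1, max_lag + 1)])

def effective_sample_size(x, y, max_lag=20):
    """Bartlett-style N_eff = N / (1 + 2 * sum_k rho_x(k) * rho_y(k))."""
    factor = 1.0 + 2.0 * np.sum(autocorr(x, max_lag) * autocorr(y, max_lag))
    # Clamp at 1 so noise in the estimate never inflates N (a conservative choice).
    return len(x) / max(factor, 1.0)

def ar1(phi, n, rng):
    """Simulate an AR(1) series x[t] = phi * x[t-1] + e[t]."""
    e = rng.normal(size=n)
    x = np.empty(n)
    x[0] = e[0]
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x

rng = np.random.default_rng(7)
n = 2000
n_eff_ar = effective_sample_size(ar1(0.8, n, rng), ar1(0.8, n, rng))
n_eff_white = effective_sample_size(rng.normal(size=n), rng.normal(size=n))
print(n_eff_ar, n_eff_white)
```

Two independent but strongly auto-correlated series behave as if they had far fewer independent observations. A significance test on their correlation coefficient that uses the nominal N would therefore reject too often.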
Inverse probability weighting and doubly robust methods in correcting the effects of non-response in the reimbursed medication and self-reported turnout estimates in the ATH survey.

PubMed

Härkänen, Tommi; Kaikkonen, Risto; Virtala, Esa; Koskinen, Seppo

2014-11-06

To assess the nonresponse rates in a questionnaire survey with respect to administrative register data, and to correct the bias statistically. The Finnish Regional Health and Well-being Study (ATH) in 2010 was based on a national sample and several regional samples. Missing data analysis was based on socio-demographic register data covering the whole sample. Inverse probability weighting (IPW) and doubly robust (DR) methods were estimated using the logistic regression model, which was selected using the Bayesian information criteria. The crude, weighted and true self-reported turnout in the 2008 municipal election and prevalences of entitlements to specially reimbursed medication, and the crude and weighted body mass index (BMI) means were compared. The IPW method appeared to remove a relatively large proportion of the bias compared to the crude prevalence estimates of the turnout and the entitlements to specially reimbursed medication. Several demographic factors were shown to be associated with missing data, but few interactions were found. Our results suggest that the IPW method can improve the accuracy of results of a population survey, and the model selection provides insight into the structure of missing data. However, health-related missing data mechanisms are beyond the scope of statistical methods, which mainly rely on socio-demographic information to correct the results.
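Inverse probability weighting can be illustrated with simulated data. The sketch is a simplification of what the abstract describes: response propensities are estimated as response rates within one register-based group rather than with a selected logistic regression model, and every number below is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20000
# Register covariate known for the whole sample (hypothetical age groups 0..2).
age_group = rng.integers(0, 3, size=n)
# Outcome and response probability both depend on the covariate.
turnout = (rng.random(n) < np.array([0.45, 0.60, 0.75])[age_group]).astype(float)
responded = rng.random(n) < np.array([0.30, 0.50, 0.70])[age_group]

# Estimated response propensity: observed response rate within each group.
group_rate = np.array([responded[age_group == g].mean() for g in range(3)])
weights = 1.0 / group_rate[age_group][responded]

true_mean = turnout.mean()                        # known here only because data are simulated
crude = turnout[responded].mean()                 # respondents only, unweighted
ipw = np.average(turnout[responded], weights=weights)
print(true_mean, crude, ipw)
```

Respondents from groups that rarely answer receive larger weights, which restores the composition of the full sample. As the abstract cautions, this only removes bias that is explained by the covariates used to model response.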
Robust variable selection method for nonparametric differential equation models with application to nonlinear dynamic gene regulatory network analysis.

PubMed

Lu, Tao

2016-01-01

The gene regulation network (GRN) evaluates the interactions between genes and looks for models to describe the gene expression behavior. These models have many applications; for instance, by characterizing the gene expression mechanisms that cause certain disorders, it would be possible to target those genes to block the progress of the disease. Many biological processes are driven by nonlinear dynamic GRNs. In this article, we propose a nonparametric ordinary differential equation (ODE) to model the nonlinear dynamic GRN. Specifically, we address the following tasks simultaneously: (i) extract information from noisy time course gene expression data; (ii) model the nonlinear ODE through a nonparametric smoothing function; (iii) identify the important regulatory gene(s) through a group smoothly clipped absolute deviation (SCAD) approach; (iv) test the robustness of the model against possible shortening of experimental duration. We illustrate the usefulness of the model and associated statistical methods through a simulation and a real application example.
Argon-oxygen atmospheric pressure plasma treatment on carbon fiber reinforced polymer for improved bonding

NASA Astrophysics Data System (ADS)

Chartosias, Marios

Acceptance of Carbon Fiber Reinforced Polymer (CFRP) structures requires a robust surface preparation method with improved process controls capable of ensuring high bond quality. Surface preparation in a production clean room environment prior to applying adhesive for bonding would minimize risk of contamination and reduce cost. Plasma treatment is a robust surface preparation process capable of being applied in a production clean room environment with process parameters that are easily controlled and documented. Repeatable and consistent processing is enabled through the development of a process parameter window utilizing techniques such as Design of Experiments (DOE) tailored to specific adhesive and substrate bonding applications. Insight from respective plasma treatment Original Equipment Manufacturers (OEMs) and screening tests determined critical process factors from non-factors and set the associated factor levels prior to execution of the DOE. Results from mode I Double Cantilever Beam (DCB) testing per ASTM D 5528 [1] standard and DOE statistical analysis software are used to produce a regression model and determine appropriate optimum settings for each factor.
Robust continuous clustering

PubMed Central

Shah, Sohil Atul

2017-01-01

Clustering is a fundamental procedure in the analysis of scientific data. It is used ubiquitously across the sciences. Despite decades of research, existing clustering algorithms have limited effectiveness in high dimensions and often require tuning parameters for different domains and datasets. We present a clustering algorithm that achieves high accuracy across multiple domains and scales efficiently to high dimensions and large datasets. The presented algorithm optimizes a smooth continuous objective, which is based on robust statistics and allows heavily mixed clusters to be untangled. The continuous nature of the objective also allows clustering to be integrated as a module in end-to-end feature learning pipelines. We demonstrate this by extending the algorithm to perform joint clustering and dimensionality reduction by efficiently optimizing a continuous global objective. The presented approach is evaluated on large datasets of faces, hand-written digits, objects, newswire articles, sensor readings from the Space Shuttle, and protein expression levels. Our method achieves high accuracy across all datasets, outperforming the best prior algorithm by a factor of 3 in average rank. PMID:28851838
Revised upper limb module for spinal muscular atrophy: Development of a new module.

PubMed

Mazzone, Elena S; Mayhew, Anna; Montes, Jacqueline; Ramsey, Danielle; Fanelli, Lavinia; Young, Sally Dunaway; Salazar, Rachel; De Sanctis, Roberto; Pasternak, Amy; Glanzman, Allan; Coratti, Giorgia; Civitello, Matthew; Forcina, Nicola; Gee, Richard; Duong, Tina; Pane, Marika; Scoto, Mariacristina; Pera, Maria Carmela; Messina, Sonia; Tennekoon, Gihan; Day, John W; Darras, Basil T; De Vivo, Darryl C; Finkel, Richard; Muntoni, Francesco; Mercuri, Eugenio

2017-06-01

There is a growing need for a robust clinical measure to assess upper limb motor function in spinal muscular atrophy (SMA), as the available scales lack sensitivity at the extremes of the clinical spectrum. We report the development of the Revised Upper Limb Module (RULM), an assessment specifically designed for upper limb function in SMA patients. An international panel with specific neuromuscular expertise performed a thorough review of scales currently available to assess upper limb function in SMA. This review facilitated a revision of the existing upper limb function scales to make a more robust clinical scale. Multiple revisions of the scale included statistical analysis and captured clinically relevant changes to fulfill requirements by regulators and advocacy groups. The resulting RULM scale shows good reliability and validity, making it a suitable tool to assess upper extremity function in the SMA population for multi-center clinical research. Muscle Nerve 55: 869-874, 2017. © 2016 Wiley Periodicals, Inc.
Planck intermediate results. XVI. Profile likelihoods for cosmological parameters

NASA Astrophysics Data System (ADS)

Planck Collaboration; Ade, P. A. R.; Aghanim, N.; Arnaud, M.; Ashdown, M.; Aumont, J.; Baccigalupi, C.; Banday, A. J.; Barreiro, R. B.; Bartlett, J. G.; Battaner, E.; Benabed, K.; Benoit-Lévy, A.; Bernard, J.-P.; Bersanelli, M.; Bielewicz, P.; Bobin, J.; Bonaldi, A.; Bond, J. R.; Bouchet, F. R.; Burigana, C.; Cardoso, J.-F.; Catalano, A.; Chamballu, A.; Chiang, H. C.; Christensen, P. R.; Clements, D. L.; Colombi, S.; Colombo, L. P. L.; Couchot, F.; Cuttaia, F.; Danese, L.; Davies, R. D.; Davis, R. J.; de Bernardis, P.; de Rosa, A.; de Zotti, G.; Delabrouille, J.; Dickinson, C.; Diego, J. M.; Dole, H.; Donzelli, S.; Doré, O.; Douspis, M.; Dupac, X.; Enßlin, T. A.; Eriksen, H. K.; Finelli, F.; Forni, O.; Frailis, M.; Franceschi, E.; Galeotta, S.; Galli, S.; Ganga, K.; Giard, M.; Giraud-Héraud, Y.; González-Nuevo, J.; Górski, K. M.; Gregorio, A.; Gruppuso, A.; Hansen, F. K.; Harrison, D. L.; Henrot-Versillé, S.; Hernández-Monteagudo, C.; Herranz, D.; Hildebrandt, S. R.; Hivon, E.; Hobson, M.; Holmes, W. A.; Hornstrup, A.; Hovest, W.; Huffenberger, K. M.; Jaffe, A. H.; Jaffe, T. R.; Jones, W. C.; Juvela, M.; Keihänen, E.; Keskitalo, R.; Kisner, T. S.; Kneissl, R.; Knoche, J.; Knox, L.; Kunz, M.; Kurki-Suonio, H.; Lagache, G.; Lähteenmäki, A.; Lamarre, J.-M.; Lasenby, A.; Lawrence, C. R.; Leonardi, R.; Liddle, A.; Liguori, M.; Lilje, P. B.; Linden-Vørnle, M.; López-Caniego, M.; Lubin, P. M.; Macías-Pérez, J. F.; Maffei, B.; Maino, D.; Mandolesi, N.; Maris, M.; Martin, P. G.; Martínez-González, E.; Masi, S.; Massardi, M.; Matarrese, S.; Mazzotta, P.; Melchiorri, A.; Mendes, L.; Mennella, A.; Migliaccio, M.; Mitra, S.; Miville-Deschênes, M.-A.; Moneti, A.; Montier, L.; Morgante, G.; Munshi, D.; Murphy, J. A.; Naselsky, P.; Nati, F.; Natoli, P.; Noviello, F.; Novikov, D.; Novikov, I.; Oxborrow, C. A.; Pagano, L.; Pajot, F.; Paoletti, D.; Pasian, F.; Perdereau, O.; Perotto, L.; Perrotta, F.; Pettorino, V.; Piacentini, F.; Piat, M.; Pierpaoli, E.; Pietrobon, D.; Plaszczynski, S.; Pointecouteau, E.; Polenta, G.; Popa, L.; Pratt, G. W.; Puget, J.-L.; Rachen, J. P.; Rebolo, R.; Reinecke, M.; Remazeilles, M.; Renault, C.; Ricciardi, S.; Riller, T.; Ristorcelli, I.; Rocha, G.; Rosset, C.; Roudier, G.; Rouillé d'Orfeuil, B.; Rubiño-Martín, J. A.; Rusholme, B.; Sandri, M.; Savelainen, M.; Savini, G.; Spencer, L. D.; Spinelli, M.; Starck, J.-L.; Sureau, F.; Sutton, D.; Suur-Uski, A.-S.; Sygnet, J.-F.; Tauber, J. A.; Terenzi, L.; Toffolatti, L.; Tomasi, M.; Tristram, M.; Tucci, M.; Umana, G.; Valenziano, L.; Valiviita, J.; Van Tent, B.; Vielva, P.; Villa, F.; Wade, L. A.; Wandelt, B. D.; White, M.; Yvon, D.; Zacchei, A.; Zonca, A.

2014-06-01

We explore the 2013 Planck likelihood function with a high-precision multi-dimensional minimizer (Minuit). This allows a refinement of the ΛCDM best-fit solution with respect to previously-released results, and the construction of frequentist confidence intervals using profile likelihoods. The agreement with the cosmological results from the Bayesian framework is excellent, demonstrating the robustness of the Planck results to the statistical methodology. We investigate the inclusion of neutrino masses, where more significant differences may appear due to the non-Gaussian nature of the posterior mass distribution. By applying the Feldman-Cousins prescription, we again obtain results very similar to those of the Bayesian methodology. However, the profile-likelihood analysis of the cosmic microwave background (CMB) combination (Planck+WP+highL) reveals a minimum well within the unphysical negative-mass region. We show that inclusion of the Planck CMB-lensing information regularizes this issue, and provide a robust frequentist upper limit ∑ mν ≤ 0.26 eV (95% confidence) from the CMB+lensing+BAO data combination.
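The profile-likelihood construction mentioned in this abstract can be illustrated with a toy problem. The sketch below is not the Planck analysis and does not use Minuit: it assumes a Gaussian sample with an unknown mean (the parameter of interest) and an unknown variance (the nuisance parameter), profiles the variance out analytically, and scans a grid for the region where the profile statistic stays below 3.84, the 95% point of a chi-square distribution with one degree of freedom. The function names, the grid scan and the scan range are all assumptions of this example, and bounded parameters (where a Feldman-Cousins treatment would be needed) are not handled.

```python
from math import log, sqrt


def profile_delta(mu, data):
    """-2 ln(profile likelihood ratio) for mu, with the variance profiled out."""
    n = len(data)
    xbar = sum(data) / n
    s2_best = sum((x - xbar) ** 2 for x in data) / n  # variance at the global best fit
    s2_mu = sum((x - mu) ** 2 for x in data) / n      # best-fit variance at fixed mu
    return n * log(s2_mu / s2_best)


def profile_interval(data, threshold=3.84, steps=4001):
    """Scan mu on a grid and return the range where profile_delta <= threshold."""
    n = len(data)
    xbar = sum(data) / n
    s = sqrt(sum((x - xbar) ** 2 for x in data) / n)
    half_range = 5.0 * s  # hypothetical scan range, wide enough for this toy case
    grid = [xbar - half_range + 2.0 * half_range * i / (steps - 1)
            for i in range(steps)]
    inside = [mu for mu in grid if profile_delta(mu, data) <= threshold]
    return inside[0], inside[-1]


sample = [0.12, 0.31, 0.05, 0.22, 0.18, 0.27, 0.09, 0.35, 0.15, 0.24]
low, high = profile_interval(sample)
print(f"95% profile-likelihood interval for the mean: [{low:.3f}, {high:.3f}]")
```

For this toy model the interval also has a closed form, |mu - mean| <= s * sqrt(exp(3.84 / n) - 1), which gives a quick check on the grid scan.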
eSACP - a new Nordic initiative towards developing statistical climate services

NASA Astrophysics Data System (ADS)

Thorarinsdottir, Thordis; Thejll, Peter; Drews, Martin; Guttorp, Peter; Venälainen, Ari; Uotila, Petteri; Benestad, Rasmus; Mesquita, Michel d. S.; Madsen, Henrik; Fox Maule, Cathrine

2015-04-01

The Nordic research council NordForsk has recently announced its support for a new 3-year research initiative on "statistical analysis of climate projections" (eSACP). eSACP will focus on developing e-science tools and services based on statistical analysis of climate projections for the purpose of helping decision-makers and planners in the face of expected future challenges in regional climate change. The motivation behind the project is the growing recognition in our society that forecasts of future climate change are associated with various sources of uncertainty, and that any long-term planning and decision-making dependent on a changing climate must account for this. At the same time there is an obvious gap between scientists from different fields and between practitioners in terms of understanding how climate information relates to different parts of the "uncertainty cascade". In eSACP we will develop generic e-science tools and statistical climate services to facilitate the use of climate projections by decision-makers and scientists from all fields for climate impact analyses and for the development of robust adaptation strategies, which properly (in a statistical sense) account for the inherent uncertainty. The new tool will be publicly available and include functionality to utilize the extensive and dynamically growing repositories of data and use state-of-the-art statistical techniques to quantify the uncertainty and innovative approaches to visualize the results. Such a tool will not only be valuable for future assessments and underpin the development of dedicated climate services, but will also assist the scientific community in making its case on the consequences of our changing climate more clearly to policy makers and the general public.
The eSACP project is led by Thordis Thorarinsdottir, Norwegian Computing Center, and also includes the Finnish Meteorological Institute, the Norwegian Meteorological Institute, the Technical University of Denmark and the Bjerknes Centre for Climate Research, Norway. This poster will present details of focus areas in the project and show some examples of the expected analysis tools.
Mars Exploration Rover Six-Degree-Of-Freedom Entry Trajectory Analysis

NASA Technical Reports Server (NTRS)

Desai, Prasun N.; Schoenenberger, Mark; Cheatwood, F. M.

2003-01-01

The Mars Exploration Rover mission will be the next opportunity for surface exploration of Mars in January 2004. Two rovers will be delivered to the surface of Mars using the same entry, descent, and landing scenario that was developed and successfully implemented by Mars Pathfinder. This investigation describes the trajectory analysis that was performed for the hypersonic portion of the MER entry. In this analysis, a six-degree-of-freedom trajectory simulation of the entry is performed to determine the entry characteristics of the capsules. In addition, a Monte Carlo analysis is performed to statistically assess the robustness of the entry design to off-nominal conditions to assure that all entry requirements are satisfied. The results show that the attitude at peak heating and at parachute deployment is well within entry limits. The parachute deployment dynamic pressure and Mach number are also well within the design requirements.
Comprehensive machine learning analysis of Hydra behavior reveals a stable basal behavioral repertoire

PubMed Central

Taralova, Ekaterina; Dupre, Christophe; Yuste, Rafael

2018-01-01

Animal behavior has been studied for centuries, but few efficient methods are available to automatically identify and classify it. Quantitative behavioral studies have been hindered by the subjective and imprecise nature of human observation, and the slow speed of annotating behavioral data. Here, we developed an automatic behavior analysis pipeline for the cnidarian Hydra vulgaris using machine learning. We imaged freely behaving Hydra, extracted motion and shape features from the videos, and constructed a dictionary of visual features to classify pre-defined behaviors. We also identified unannotated behaviors with unsupervised methods. Using this analysis pipeline, we quantified 6 basic behaviors and found surprisingly similar behavior statistics across animals within the same species, regardless of experimental conditions. Our analysis indicates that the fundamental behavioral repertoire of Hydra is stable. This robustness could reflect a homeostatic neural control of "housekeeping" behaviors which could have been already present in the earliest nervous systems. PMID:29589829
The fragility of statistically significant findings from randomized trials in head and neck surgery.

PubMed

Noel, Christopher W; McMullen, Caitlin; Yao, Christopher; Monteiro, Eric; Goldstein, David P; Eskander, Antoine; de Almeida, John R

2018-04-23

The Fragility Index (FI) is a novel tool for evaluating the robustness of statistically significant findings in a randomized controlled trial (RCT). It measures the number of events upon which statistical significance depends. We sought to calculate the FI scores for RCTs in the head and neck cancer literature where surgery was a primary intervention. Potential articles were identified in PubMed (MEDLINE), Embase, and Cochrane without publication date restrictions. Two reviewers independently screened eligible RCTs reporting at least one dichotomous and statistically significant outcome. The data from each trial were extracted and the FI scores were calculated. Associations between trial characteristics and FI were determined. In total, 27 articles were identified. The median sample size was 67.5 (interquartile range [IQR] = 42-143) and the median number of events per trial was 8 (IQR = 2.25-18.25). The median FI score was 1 (IQR = 0-2.5), meaning that changing one patient from a nonevent to an event in the treatment arm would change the result to a statistically nonsignificant result, or P > .05. The FI score was less than the number of patients lost to follow-up in 71% of cases. The FI score was found to be moderately correlated with P value (ρ = -0.52, P = .007) and with journal impact factor (ρ = 0.49, P = .009) on univariable analysis. On multivariable analysis, only the P value was found to be a predictor of FI score (P = .001). Randomized trials in the head and neck cancer literature where surgery is a primary modality are relatively nonrobust statistically with low FI scores. Laryngoscope, 2018. © 2018 The American Laryngological, Rhinological and Otological Society, Inc.
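The calculation described in this abstract can be sketched in a few lines. The example below is a minimal illustration and not the authors' code: it assumes a two-arm trial with a dichotomous outcome, uses a two-sided Fisher exact test written with the standard library, and converts nonevents to events one patient at a time in the arm with the lower event rate (a convention chosen for this sketch) until the P value reaches 0.05. The function names and the trial counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in row 1 with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    p_value = sum(prob(x) for x in range(lowest, highest + 1)
                  if prob(x) <= p_observed * (1 + 1e-9))
    return min(1.0, p_value)


def fragility_index(events_a, n_a, events_b, n_b, alpha=0.05):
    """Count nonevent-to-event switches needed before P >= alpha.

    Returns 0 when the Fisher exact P value is already >= alpha.
    """
    flip_a = events_a * n_b <= events_b * n_a  # arm with the lower event rate
    flips = 0
    while fisher_exact_two_sided(events_a, n_a - events_a,
                                 events_b, n_b - events_b) < alpha:
        if flip_a and events_a < n_a:
            events_a += 1
        elif not flip_a and events_b < n_b:
            events_b += 1
        else:
            break  # no nonevents left to convert
        flips += 1
    return flips


# Hypothetical trial: 1/20 events in one arm, 9/20 in the other.
print(fragility_index(1, 20, 9, 20))
```

A result of 0 corresponds to a trial whose reported significance does not survive the Fisher exact test at all, which is how an FI of 0 can appear in the lower quartile reported above.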
Detection and Evaluation of Spatio-Temporal Spike Patterns in Massively Parallel Spike Train Data with SPADE

PubMed Central

Quaglio, Pietro; Yegenoglu, Alper; Torre, Emiliano; Endres, Dominik M.; Grün, Sonja

2017-01-01

Repeated, precise sequences of spikes are largely considered a signature of activation of cell assemblies. These repeated sequences are commonly known under the name of spatio-temporal patterns (STPs). STPs are hypothesized to play a role in the communication of information in the computational process operated by the cerebral cortex. A variety of statistical methods for the detection of STPs have been developed and applied to electrophysiological recordings, but such methods scale poorly with the current size of available parallel spike train recordings (more than 100 neurons). In this work, we introduce a novel method capable of overcoming the computational and statistical limits of existing analysis techniques in detecting repeating STPs within massively parallel spike trains (MPST). We employ advanced data mining techniques to efficiently extract repeating sequences of spikes from the data. Then, we introduce and compare two alternative approaches to distinguish statistically significant patterns from chance sequences. The first approach uses a measure known as conceptual stability, of which we investigate a computationally cheap approximation for applications to such large data sets. The second approach is based on the evaluation of pattern statistical significance. In particular, we provide an extension to STPs of a method we recently introduced for the evaluation of statistical significance of synchronous spike patterns. The performance of the two approaches is evaluated in terms of computational load and statistical power on a variety of artificial data sets that replicate specific features of experimental data. Both methods provide an effective and robust procedure for detection of STPs in MPST data. The method based on significance evaluation shows the best overall performance, although at a higher computational cost. We name the novel procedure the spatio-temporal Spike PAttern Detection and Evaluation (SPADE) analysis. PMID:28596729

Robust control for fractional variable-order chaotic systems with non-singular kernel

NASA Astrophysics Data System (ADS)

Zuñiga-Aguilar, C. J.; Gómez-Aguilar, J. F.; Escobar-Jiménez, R. F.; Romero-Ugalde, H. M.

2018-01-01

This paper investigates chaos control for a class of variable-order fractional chaotic systems using a robust control strategy. The variable-order fractional models of the non-autonomous biological system, the King Cobra chaotic system, Halvorsen's attractor and the Burke-Shaw system have been derived using the fractional-order derivative with Mittag-Leffler kernel in the Liouville-Caputo sense. The fractional differential equations and the control law were solved using the Adams-Bashforth-Moulton algorithm. To test the control stability efficiency, different statistical indicators were introduced. Finally, simulation results demonstrate the effectiveness of the proposed robust control.
Efficient Robust Regression via Two-Stage Generalized Empirical Likelihood

PubMed Central

Bondell, Howard D.; Stefanski, Leonard A.

2013-01-01

Large- and finite-sample efficiency and resistance to outliers are the key goals of robust statistics. Although these goals are often not simultaneously attainable, we develop and study a linear regression estimator that comes close. Efficiency obtains from the estimator’s close connection to generalized empirical likelihood, and its favorable robustness properties are obtained by constraining the associated sum of (weighted) squared residuals. We prove maximum attainable finite-sample replacement breakdown point, and full asymptotic efficiency for normal errors. Simulation evidence shows that compared to existing robust regression estimators, the new estimator has relatively high efficiency for small sample sizes, and comparable outlier resistance. The estimator is further illustrated and compared to existing methods via application to a real data set with purported outliers. PMID:23976805
Hierarchical modeling and robust synthesis for the preliminary design of large scale complex systems

NASA Astrophysics Data System (ADS)

Koch, Patrick Nathan

Large-scale complex systems are characterized by multiple interacting subsystems and the analysis of multiple disciplines. The design and development of such systems inevitably requires the resolution of multiple conflicting objectives. The size of complex systems, however, prohibits the development of comprehensive system models, and thus these systems must be partitioned into their constituent parts. Because simultaneous solution of individual subsystem models is often not manageable, iteration is inevitable and often excessive. In this dissertation these issues are addressed through the development of a method for hierarchical robust preliminary design exploration to facilitate concurrent system and subsystem design exploration, for the concurrent generation of robust system and subsystem specifications for the preliminary design of multi-level, multi-objective, large-scale complex systems. This method is developed through the integration and expansion of current design techniques: (1) hierarchical partitioning and modeling techniques for partitioning large-scale complex systems into more tractable parts, and allowing integration of subproblems for system synthesis; (2) statistical experimentation and approximation techniques for increasing both the efficiency and the comprehensiveness of preliminary design exploration; and (3) noise modeling techniques for implementing robust preliminary design when approximate models are employed. The method developed and associated approaches are illustrated through their application to the preliminary design of a commercial turbofan turbine propulsion system; the turbofan system-level problem is partitioned into engine cycle and configuration design and a compressor module is integrated for more detailed subsystem-level design exploration, improving system evaluation.
A statistical framework for neuroimaging data analysis based on mutual information estimated via a gaussian copula.

PubMed

Ince, Robin A A; Giordano, Bruno L; Kayser, Christoph; Rousselet, Guillaume A; Gross, Joachim; Schyns, Philippe G

2017-03-01

We begin by reviewing the statistical framework of information theory as applicable to neuroimaging data analysis. A major factor hindering wider adoption of this framework in neuroimaging is the difficulty of estimating information theoretic quantities in practice. We present a novel estimation technique that combines the statistical theory of copulas with the closed form solution for the entropy of Gaussian variables. This results in a general, computationally efficient, flexible, and robust multivariate statistical framework that provides effect sizes on a common meaningful scale, allows for unified treatment of discrete, continuous, unidimensional and multidimensional variables, and enables direct comparisons of representations from behavioral and brain responses across any recording modality. We validate the use of this estimate as a statistical test within a neuroimaging context, considering both discrete stimulus classes and continuous stimulus features. We also present examples of analyses facilitated by these developments, including application of multivariate analyses to MEG planar magnetic field gradients, and pairwise temporal interactions in evoked EEG responses. We show the benefit of considering the instantaneous temporal derivative together with the raw values of M/EEG signals as a multivariate response, how we can separately quantify modulations of amplitude and direction for vector quantities, and how we can measure the emergence of novel information over time in evoked responses. Open-source Matlab and Python code implementing the new methods accompanies this article. Hum Brain Mapp 38:1541-1573, 2017. © 2016 Wiley Periodicals, Inc. © 2016 The Authors. Human Brain Mapping published by Wiley Periodicals, Inc.
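The core idea in this abstract, combining a copula transform with the closed-form Gaussian entropy, can be sketched for the simplest case of two continuous one-dimensional variables. The code below is an illustration written for this listing, not the authors' released Matlab or Python code: each variable is rank-transformed to standard-normal scores, and the mutual information of the resulting bivariate Gaussian is computed from its correlation r as -0.5 * log2(1 - r^2). The function names are hypothetical, ties are not treated specially, and no bias correction is applied.

```python
import random
from math import log2
from statistics import NormalDist


def copula_normalize(values):
    """Replace each value by the standard-normal quantile of its empirical rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    standard_normal = NormalDist()
    scores = [0.0] * n
    for rank, index in enumerate(order, start=1):
        scores[index] = standard_normal.inv_cdf(rank / (n + 1))
    return scores


def gaussian_copula_mi(x, y):
    """Mutual information estimate in bits between two continuous 1-D variables."""
    gx, gy = copula_normalize(x), copula_normalize(y)
    n = len(gx)
    mean_x, mean_y = sum(gx) / n, sum(gy) / n
    sxx = sum((a - mean_x) ** 2 for a in gx)
    syy = sum((b - mean_y) ** 2 for b in gy)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(gx, gy))
    r_squared = sxy * sxy / (sxx * syy)
    # Closed-form mutual information of a bivariate Gaussian with correlation r.
    return -0.5 * log2(1.0 - r_squared)


random.seed(0)
signal = [random.gauss(0.0, 1.0) for _ in range(2000)]
response = [s + random.gauss(0.0, 1.0) for s in signal]
print(f"estimated MI: {gaussian_copula_mi(signal, response):.3f} bits")
```

Because only ranks enter the estimate, any strictly increasing transformation of either variable leaves the result unchanged, which is one reason a rank-based estimate of this kind is robust to outliers and to the marginal distributions.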
IsoMAP (Isoscape Modeling, Analysis, and Prediction)

NASA Astrophysics Data System (ADS)

Miller, C. C.; Bowen, G. J.; Zhang, T.; Zhao, L.; West, J. B.; Liu, Z.; Rapolu, N.

2009-12-01

IsoMAP is a TeraGrid-based web portal aimed at building the infrastructure that brings together distributed multi-scale and multi-format geospatial datasets to enable statistical analysis and modeling of environmental isotopes. A typical workflow enabled by the portal includes (1) data source exploration and selection; (2) statistical analysis and model development; (3) predictive simulation of isotope distributions using models developed in (1) and (2); and (4) analysis and interpretation of simulated spatial isotope distributions (e.g., comparison with independent observations, pattern analysis). The gridded models and data products created by one user can be shared and reused among users within the portal, enabling collaboration and knowledge transfer. This infrastructure and the research it fosters can lead to fundamental changes in our knowledge of the water cycle and ecological and biogeochemical processes through analysis of network-based isotope data, but it will be important (A) that those with whom the data and models are shared can be sure of the origin, quality, inputs, and processing history of these products, and (B) that the system is agile and intuitive enough to facilitate this sharing (rather than just ‘allow’ it). IsoMAP researchers are therefore building into the portal’s architecture several components meant to increase the amount of metadata about users’ products and to repurpose those metadata to make sharing and discovery more intuitive and robust for both expected, professional users and unforeseeable populations from other sectors.
On the Interplay between the Evolvability and Network Robustness in an Evolutionary Biological Network: A Systems Biology Approach

PubMed Central

Chen, Bor-Sen; Lin, Ying-Po

2011-01-01

In the evolutionary process, the random transmission and mutation of genes provide biological diversities for natural selection. In order to preserve functional phenotypes between generations, gene networks need to evolve robustly under the influence of random perturbations. Therefore, the robustness of the phenotype, in the evolutionary process, exerts a selection force on gene networks to keep network functions. However, gene networks need to adjust, by variations in genetic content, to generate phenotypes for new challenges in the network’s evolution, i.e., the evolvability. Hence, there should be some interplay between the evolvability and network robustness in evolutionary gene networks. In this study, the interplay between the evolvability and network robustness of a gene network and a biochemical network is discussed from a nonlinear stochastic system point of view. It was found that if the genetic robustness plus environmental robustness is less than the network robustness, the phenotype of the biological network is robust in evolution. The tradeoff between the genetic robustness and environmental robustness in evolution is discussed from the stochastic stability robustness and sensitivity of the nonlinear stochastic biological network, which may be relevant to the statistical tradeoff between bias and variance, the so-called bias/variance dilemma. Further, the tradeoff could be considered as an antagonistic pleiotropic action of a gene network and discussed from the systems biology perspective. PMID:22084563
Comparing Networks from a Data Analysis Perspective

NASA Astrophysics Data System (ADS)

Li, Wei; Yang, Jing-Yu

To probe network characteristics, two predominant ways of network comparison are global property statistics and subgraph enumeration. However, they suffer from limited information and exhaustive computing. Here, we present an approach to compare networks from the perspective of data analysis. Initially, the approach projects each node of the original network as a high-dimensional data point, and the network is seen as clouds of data points. Then the dispersion information of the principal component analysis (PCA) projection of the generated data clouds can be used to distinguish networks. We applied this node projection method to the yeast protein-protein interaction networks and the Internet Autonomous System networks, two types of networks with several similar higher properties. The method can efficiently distinguish one from the other. The identical results obtained for different datasets from independent sources also indicated that the method is a robust and universal framework.
DOE Office of Scientific and Technical Information (OSTI.GOV)

P-Mart was designed specifically to allow cancer researchers to perform robust statistical processing of publicly available cancer proteomic datasets. To date an online statistical processing suite for proteomics does not exist. The P-Mart software is designed to allow statistical programmers to utilize these algorithms through packages in the R programming language as well as offering a web-based interface using the Azure cloud technology. The Azure cloud technology also allows the release of the software via Docker containers.
Data Analysis and Statistical Methods for the Assessment and Interpretation of Geochronologic Data

NASA Astrophysics Data System (ADS)

Reno, B. L.; Brown, M.; Piccoli, P. M.

2007-12-01

Ages are traditionally reported as a weighted mean with an uncertainty based on least squares analysis of analytical error on individual dates. This method does not take into account geological uncertainties, and cannot accommodate asymmetries in the data. In most instances, this method will understate uncertainty on a given age, which may lead to overinterpretation of age data. Geologic uncertainty is difficult to quantify, but is typically greater than analytical uncertainty. These factors make traditional statistical approaches inadequate to fully evaluate geochronologic data. We propose a protocol to assess populations within multi-event datasets and to calculate age and uncertainty from each population of dates interpreted to represent a single geologic event using robust and resistant statistical methods. To assess whether populations thought to represent different events are statistically separate, exploratory data analysis is undertaken using a box plot, where the range of the data is represented by a 'box' of length given by the interquartile range, divided at the median of the data, with 'whiskers' that extend to the furthest datapoint that lies within 1.5 times the interquartile range beyond the box. If the boxes representing the populations do not overlap, they are interpreted to represent statistically different sets of dates. Ages are calculated from statistically distinct populations using a robust tool such as the tanh method of Kelsey et al. (2003, CMP, 146, 326-340), which is insensitive to any assumptions about the underlying probability distribution from which the data are drawn. Therefore, this method takes into account the full range of data, and is not drastically affected by outliers. The interquartile range of each population of dates gives a first pass at expressing uncertainty, which accommodates asymmetry in the dataset; outliers have a minor effect on the uncertainty.
To better quantify the uncertainty, a resistant tool that is insensitive to local misbehavior of data is preferred, such as the normalized median absolute deviations proposed by Powell et al. (2002, Chem Geol, 185, 191-204). We illustrate the method using a dataset of 152 monazite dates determined using EPMA chemical data from a single sample from the Neoproterozoic Brasília Belt, Brazil. Results are compared with ages and uncertainties calculated using traditional methods to demonstrate the differences. The dataset was manually culled into three populations representing discrete compositional domains within chemically-zoned monazite grains. The weighted mean ages and least squares uncertainties for these populations are 633±6 (2σ) Ma for a core domain, 614±5 (2σ) Ma for an intermediate domain and 595±6 (2σ) Ma for a rim domain. Probability distribution plots indicate asymmetric distributions of all populations, which cannot be accounted for with traditional statistical tools. These three domains record distinct ages outside the interquartile range for each population of dates, with the core domain lying in the subrange 642-624 Ma, the intermediate domain 617-609 Ma and the rim domain 606-589 Ma. The tanh estimator yields ages of 631±7 (2σ) for the core domain, 616±7 (2σ) for the intermediate domain and 601±8 (2σ) for the rim domain. Whereas the uncertainties derived using a resistant statistical tool are larger than those derived from traditional statistical tools, the method yields more realistic uncertainties that better address the spread in the dataset and account for asymmetry in the data.
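Two steps of the protocol in this abstract lend themselves to a short sketch: the box-plot test for separating populations and a resistant spread estimate. The code below is an illustration only. The dates are invented for the example rather than taken from the monazite dataset, the quartile convention (the "inclusive" method of Python's statistics module) is an arbitrary choice, and the factor 1.4826 is the usual constant that makes the median absolute deviation comparable to a standard deviation for normally distributed data, assumed here to correspond to the normalization the abstract refers to. The tanh estimator is not reproduced.

```python
from statistics import median, quantiles


def box(dates):
    """Return (first quartile, median, third quartile) of a population of dates."""
    q1, _, q3 = quantiles(dates, n=4, method="inclusive")
    return q1, median(dates), q3


def boxes_overlap(dates_a, dates_b):
    """True if the interquartile boxes of the two populations overlap."""
    a_q1, _, a_q3 = box(dates_a)
    b_q1, _, b_q3 = box(dates_b)
    return a_q1 <= b_q3 and b_q1 <= a_q3


def normalized_mad(dates):
    """Median absolute deviation scaled by 1.4826 (resistant spread estimate)."""
    center = median(dates)
    return 1.4826 * median([abs(d - center) for d in dates])


# Hypothetical dates in Ma, each population containing one outlier.
core = [640, 636, 633, 631, 628, 625, 660]
rim = [606, 600, 597, 595, 592, 589, 560]

print("boxes overlap:", boxes_overlap(core, rim))
print("core median and nMAD:", median(core), round(normalized_mad(core), 1))
print("rim median and nMAD:", median(rim), round(normalized_mad(rim), 1))
```

The outlier in each hypothetical population moves the mean but barely changes the median, the quartiles or the normalized MAD, which is the behaviour the abstract describes as resistant.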
Principal Component Analysis in the Spectral Analysis of the Dynamic Laser Speckle Patterns

NASA Astrophysics Data System (ADS)

Ribeiro, K. M.; Braga, R. A., Jr.; Horgan, G. W.; Ferreira, D. D.; Safadi, T.

2014-02-01

Dynamic laser speckle is an optical phenomenon in which interference patterns are formed when a surface undergoing change is illuminated with coherent light. When the dynamic change of the speckle patterns is caused by biological material, the phenomenon is known as biospeckle. Usually, these patterns of optical interference evolving in time are analyzed by graphical or numerical methods. Analysis in the frequency domain has also been an option, but it involves large computational requirements, which demands new approaches to filter the images in time. Principal component analysis (PCA) works with the statistical decorrelation of data and can be used as a data filter. In this context, the present work evaluated the PCA technique for filtering biospeckle image data in time, aiming to reduce computing time and improve the robustness of the filtering. We used 64 biospeckle images observed over time in a maize seed. The images were arranged in a data matrix and statistically decorrelated by the PCA technique, and the reconstructed signals were analyzed using the routine graphical and numerical methods for biospeckle analysis. Results showed the potential of the PCA tool in filtering the dynamic laser speckle data, with the definition of markers of principal components related to the biological phenomena and with the advantage of fast computational processing.
Extracting neuronal functional network dynamics via adaptive Granger causality analysis.

PubMed

Sheikhattar, Alireza; Miran, Sina; Liu, Ji; Fritz, Jonathan B; Shamma, Shihab A; Kanold, Patrick O; Babadi, Behtash

2018-04-24

Quantifying the functional relations between the nodes in a network based on local observations is a key challenge in studying complex systems. Most existing time series analysis techniques for this purpose provide static estimates of the network properties, pertain to stationary Gaussian data, or do not take into account the ubiquitous sparsity in the underlying functional networks. When applied to spike recordings from neuronal ensembles undergoing rapid task-dependent dynamics, they thus hinder a precise statistical characterization of the dynamic neuronal functional networks underlying adaptive behavior. We develop a dynamic estimation and inference paradigm for extracting functional neuronal network dynamics in the sense of Granger, by integrating techniques from adaptive filtering, compressed sensing, point process theory, and high-dimensional statistics. We demonstrate the utility of our proposed paradigm through theoretical analysis, algorithm development, and application to synthetic and real data. Application of our techniques to two-photon Ca2+ imaging experiments from the mouse auditory cortex reveals unique features of the functional neuronal network structures underlying spontaneous activity at unprecedented spatiotemporal resolution. Our analysis of simultaneous recordings from the ferret auditory and prefrontal cortical areas suggests evidence for the role of rapid top-down and bottom-up functional dynamics across these areas involved in robust attentive behavior.
AG Channel Measurement and Modeling Results for Over-Water and Hilly Terrain Conditions

NASA Technical Reports Server (NTRS)

Matolak, David W.; Sun, Ruoyu

2015-01-01

This report describes work completed over the past year on our project, entitled "Unmanned Aircraft Systems (UAS) Research: The AG Channel, Robust Waveforms, and Aeronautical Network Simulations." This project is funded under the NASA project "Unmanned Aircraft Systems (UAS) in the National Airspace System (NAS)." In this report we provide the following: an update on project progress; a description of the over-freshwater and hilly terrain initial results on path loss, delay spread, small-scale fading, and correlations; complete path loss models for the over-water AG channels; analysis for obtaining parameter statistics required for development of accurate wideband AG channel models; and analysis of an atypical AG channel in which the aircraft flies out of the ground site antenna main beam. We have modeled the small-scale fading of these channels with Ricean statistics, and have quantified the behavior of the Ricean K-factor. We also provide some results for correlations of signal components, both intra-band and inter-band. An updated literature review, and a summary that also describes future work, are also included.
Coherent instability in wall-bounded turbulence

NASA Astrophysics Data System (ADS)

Hack, M. J. Philipp

2017-11-01

Hairpin vortices are commonly considered one of the major classes of coherent fluid motions in shear layers, even as their significance in the grand scheme of turbulence has remained an openly debated question. The statistical prevalence of the dynamic process that gives rise to the hairpins across different types of flows suggests an origin in a robust common mechanism triggered by conditions widespread in wall-bounded shear layers. This study seeks to shed light on the physical process which drives the generation of hairpin vortices. It is primarily facilitated through an algorithm based on concepts developed in the field of computer vision which allows the topological identification and analysis of coherent flow processes across multiple scales. Application to direct numerical simulations of boundary layers enables the time-resolved sampling and exploration of the hairpin process in natural flow. The analysis yields rich statistical results which lead to a refined characterization of the hairpin process. Linear stability theory offers further insight into the flow physics and especially into the connection between the hairpin and exponential amplification mechanisms. The results also provide a sharpened understanding of the underlying causality of events.
Measuring outcome from vestibular rehabilitation, part II: refinement and validation of a new self-report measure.

PubMed

Morris, Anna E; Lutman, Mark E; Yardley, Lucy

2009-01-01

A prototype self-report measure of vestibular rehabilitation outcome is described in a previous paper. The objectives of the present work were to identify the most useful items and assess their psychometric properties. Stage 1: One hundred fifty-five participants completed a prototype 36-item Vestibular Rehabilitation Benefit Questionnaire (VRBQ). Statistical analysis demonstrated its subscale structure and identified redundant items. Stage 2: One hundred twenty-four participants completed a refined 22-item VRBQ and three established questionnaires (Dizziness Handicap Inventory, DHI; Vertigo Symptom Scale short form, VSS-sf; Medical Outcomes Study short form 36, SF-36) in a longitudinal study. Statistical analysis revealed four internally consistent subscales of the VRBQ: Dizziness, Anxiety, Motion-Provoked Dizziness, and Quality of Life. Correlations with the DHI, VSS-sf, and SF-36 support the validity of the VRBQ, and effect size estimates suggest that the VRBQ is more responsive than comparable questionnaires. Twenty participants completed the VRBQ twice in a 24-hour period, indicating excellent test-retest reliability. The VRBQ appears to be a concise and psychometrically robust questionnaire that addresses the main aspects of dizziness impact.
Mapping probabilities of extreme continental water storage changes from space gravimetry

NASA Astrophysics Data System (ADS)

Kusche, J.; Eicker, A.; Forootan, E.; Springer, A.; Longuevergne, L.

2016-12-01

Using data from the Gravity Recovery and Climate Experiment (GRACE) mission, we derive statistically robust 'hotspot' regions of high probability of peak anomalous - i.e. with respect to the seasonal cycle - water storage (of up to 0.7 m one-in-five-year return level) and flux (up to 0.14 m/mon). Analysis of, and comparison with, up to 32 years of ERA-Interim reanalysis fields reveals generally good agreement of these hotspot regions to GRACE results, and that most exceptions are located in the Tropics. However, a simulation experiment reveals that differences observed by GRACE are statistically significant, and further error analysis suggests that by around the year 2020 it will be possible to detect temporal changes in the frequency of extreme total fluxes (i.e. combined effects of mainly precipitation and floods) for at least 10-20% of the continental area, assuming that we have a continuation of GRACE by its follow-up GRACE-FO. J. Kusche et al. (2016): Mapping probabilities of extreme continental water storage changes from space gravimetry, Geophysical Research Letters, accepted online, doi:10.1002/2016GL069538
Regional Patterns and Spatial Clusters of Nonstationarities in Annual Peak Instantaneous Streamflow

NASA Astrophysics Data System (ADS)

White, K. D.; Baker, B.; Mueller, C.; Villarini, G.; Foley, P.; Friedman, D.

2017-12-01

Information about hydrologic changes resulting from changes in climate, land use, and land cover is a necessity for planning and design of water resources infrastructure. The United States Army Corps of Engineers (USACE) evaluated and selected 12 methods to detect abrupt and slowly varying nonstationarities in records of maximum peak annual flows. They deployed a publicly available tool [1] in 2016 and a guidance document in 2017 to support identification of nonstationarities in a reproducible manner using a robust statistical framework. This statistical framework has now been applied to streamflow records across the continental United States to explore the presence of regional patterns and spatial clusters of nonstationarities in peak annual flow. Incorporating this geographic dimension into the detection of nonstationarities provides valuable insight for the process of attribution of these significant changes. This poster summarizes the methods used and provides the results of the regional analysis. [1] Available here - http://www.corpsclimate.us/ptcih.cfm
The contribution of executive functions to emergent mathematic skills in preschool children.

PubMed

Espy, Kimberly Andrews; McDiarmid, Melanie M; Cwik, Mary F; Stalets, Melissa Meade; Hamby, Arlena; Senn, Theresa E

2004-01-01

Mathematical ability is related to both activation of the prefrontal cortex in neuroimaging studies of adults and to executive functions in school-age children. The purpose of this study was to determine whether executive functions were related to emergent mathematical proficiency in preschool children. Preschool children (N = 96) were administered an executive function battery that was reduced empirically to working memory (WM), inhibitory control (IC), and shifting abilities by calculating composite scores derived from principal component analysis. Both WM and IC predicted early arithmetic competency, with the observed relations robust after controlling statistically for child age, maternal education, and child vocabulary. Only IC accounted for unique variance in mathematical skills after the contribution of other executive functions was controlled statistically as well. Specific executive functions are related to emergent mathematical proficiency in this age range. Longitudinal studies using structural equation modeling are necessary to better characterize these ontogenetic relations.
Replication, lies and lesser-known truths regarding experimental design in environmental microbiology.

PubMed

Lennon, Jay T

2011-06-01

A recent analysis revealed that most environmental microbiologists neglect replication in their science (Prosser, 2010). Of all peer-reviewed papers published during 2009 in the field's leading journals, slightly more than 70% lacked replication when it came to analyzing microbial community data. The paucity of replication is viewed as an 'endemic' and 'embarrassing' problem that amounts to 'bad science', or worse yet, as the title suggests, lying (Prosser, 2010). Although replication is an important component of experimental design, it is possible to do good science without replication. There are various quantitative techniques - some old, some new - that, when used properly, will allow environmental microbiologists to make strong statistical conclusions from experimental and comparative data. Here, I provide examples where unreplicated data can be used to test hypotheses and yield novel information in a statistically robust manner. © 2011 Society for Applied Microbiology and Blackwell Publishing Ltd.
An On-Demand Optical Quantum Random Number Generator with In-Future Action and Ultra-Fast Response

PubMed Central

Stipčević, Mario; Ursin, Rupert

2015-01-01

Random numbers are essential for our modern information-based society, e.g. in cryptography. Unlike frequently used pseudo-random generators, physical random number generators do not depend on complex algorithms but rather on a physical process to provide true randomness. Quantum random number generators (QRNG) rely on a process which, even in principle, can be described only by a probabilistic theory. Here we present a conceptually simple implementation, which offers 100% efficiency in producing a random bit upon request and simultaneously exhibits ultra-low latency. A careful technical and statistical analysis demonstrates its robustness against imperfections of the actual implemented technology and makes it possible to quickly estimate the randomness of very long sequences. Generated random numbers pass standard statistical tests without any post-processing. The setup described, as well as the theory presented here, demonstrate the maturity and overall understanding of the technology. PMID:26057576
Asymmetry of projected increases in extreme temperature distributions

PubMed Central

Kodra, Evan; Ganguly, Auroop R.

2014-01-01

A statistical analysis reveals projections of consistently larger increases in the highest percentiles of summer and winter temperature maxima and minima versus the respective lowest percentiles, resulting in a wider range of temperature extremes in the future. These asymmetric changes in tail distributions of temperature appear robust when explored through 14 CMIP5 climate models and three reanalysis datasets. Asymmetry of projected increases in temperature extremes generalizes widely. Magnitude of the projected asymmetry depends significantly on region, season, land-ocean contrast, and climate model variability as well as whether the extremes of consideration are seasonal minima or maxima events. An assessment of potential physical mechanisms provides support for asymmetric tail increases and hence wider temperature extremes ranges, especially for northern winter extremes. These results offer statistically grounded perspectives on projected changes in the IPCC-recommended extremes indices relevant for impacts and adaptation studies. PMID:25073751

A generalized Grubbs-Beck test statistic for detecting multiple potentially influential low outliers in flood series

USGS Publications Warehouse

Cohn, T.A.; England, J.F.; Berenbrock, C.E.; Mason, R.R.; Stedinger, J.R.; Lamontagne, J.R.

2013-01-01

The Grubbs-Beck test is recommended by the federal guidelines for detection of low outliers in flood flow frequency computation in the United States. This paper presents a generalization of the Grubbs-Beck test for normal data (similar to the Rosner (1983) test; see also Spencer and McCuen (1996)) that can provide a consistent standard for identifying multiple potentially influential low flows. In cases where low outliers have been identified, they can be represented as “less-than” values, and a frequency distribution can be developed using censored-data statistical techniques, such as the Expected Moments Algorithm. This approach can improve the fit of the right-hand tail of a frequency distribution and provide protection from lack-of-fit due to unimportant but potentially influential low flows (PILFs) in a flood series, thus making the flood frequency analysis procedure more robust.
Reference aquaplanet climate in the Community Atmosphere Model, Version 5

DOE PAGES

Medeiros, Brian; Williamson, David L.; Olson, Jerry G.

2016-03-18

In this study, fundamental characteristics of the aquaplanet climate simulated by the Community Atmosphere Model, Version 5.3 (CAM5.3) are presented. The assumptions and simplifications of the configuration are described. A 16 year long, perpetual equinox integration with prescribed SST using the model’s standard 1° grid spacing is presented as a reference simulation. Statistical analysis is presented that shows similar aquaplanet configurations can be run for about 2 years to obtain robust climatological structures, including global and zonal means, eddy statistics, and precipitation distributions. Such a simulation can be compared to the reference simulation to discern differences in the climate, including an assessment of confidence in the differences. To aid such comparisons, the reference simulation has been made available via earthsystemgrid.org. Examples are shown comparing the reference simulation with simulations from the CAM5 series that make different microphysical assumptions and use a different dynamical core.
A Statistics-Based Cracking Criterion of Resin-Bonded Silica Sand for Casting Process Simulation

NASA Astrophysics Data System (ADS)

Wang, Huimin; Lu, Yan; Ripplinger, Keith; Detwiler, Duane; Luo, Alan A.

2017-02-01

Cracking of sand molds/cores can result in many casting defects such as veining. A robust cracking criterion is needed in casting process simulation for predicting/controlling such defects. A cracking probability map, relating to fracture stress and effective volume, was proposed for resin-bonded silica sand based on Weibull statistics. Three-point bending test results of sand samples were used to generate the cracking map and set up a safety line for cracking criterion. Tensile test results confirmed the accuracy of the safety line for cracking prediction. A laboratory casting experiment was designed and carried out to predict cracking of a cup mold during aluminum casting. The stress-strain behavior and the effective volume of the cup molds were calculated using a finite element analysis code ProCAST®. Furthermore, an energy dispersive spectroscopy fractographic examination of the sand samples confirmed the binder cracking in resin-bonded silica sand.
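A cracking probability that depends on fracture stress and effective volume can be illustrated with the textbook two-parameter Weibull form, P = 1 - exp(-(V_eff/V_0)(sigma/sigma_0)^m). The sketch below assumes that form; the parameter names and example values are hypothetical, and the abstract does not state the exact expression or fitted values used in the study.

```python
import math

def weibull_failure_probability(stress, v_eff, sigma0, m, v0=1.0):
    """Textbook Weibull failure probability for a brittle material.

    stress: applied (fracture) stress
    v_eff:  effective volume under stress
    sigma0: characteristic strength for reference volume v0 (assumed known)
    m:      Weibull modulus (assumed known)
    """
    return 1.0 - math.exp(-(v_eff / v0) * (stress / sigma0) ** m)

# Hypothetical values: a larger stressed volume raises the cracking probability.
p_small = weibull_failure_probability(stress=2.0, v_eff=1.0, sigma0=3.0, m=8.0)
p_large = weibull_failure_probability(stress=2.0, v_eff=10.0, sigma0=3.0, m=8.0)
```

A contour of constant probability in the stress-volume plane would play the role of a safety line in this simplified picture.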
A generalized Grubbs-Beck test statistic for detecting multiple potentially influential low outliers in flood series

NASA Astrophysics Data System (ADS)

Cohn, T. A.; England, J. F.; Berenbrock, C. E.; Mason, R. R.; Stedinger, J. R.; Lamontagne, J. R.

2013-08-01

The Grubbs-Beck test is recommended by the federal guidelines for detection of low outliers in flood flow frequency computation in the United States. This paper presents a generalization of the Grubbs-Beck test for normal data (similar to the Rosner (1983) test; see also Spencer and McCuen (1996)) that can provide a consistent standard for identifying multiple potentially influential low flows. In cases where low outliers have been identified, they can be represented as "less-than" values, and a frequency distribution can be developed using censored-data statistical techniques, such as the Expected Moments Algorithm. This approach can improve the fit of the right-hand tail of a frequency distribution and provide protection from lack-of-fit due to unimportant but potentially influential low flows (PILFs) in a flood series, thus making the flood frequency analysis procedure more robust.
Temperature- and composition-dependent hydrogen diffusivity in palladium from statistically-averaged molecular dynamics

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhou, Xiaowang; Heo, Tae Wook; Wood, Brandon C.

Solid-state hydrogen storage materials undergo complex phase transformations whose kinetics is often limited by hydrogen diffusion. Among metal hydrides, palladium hydride undergoes a diffusional phase transformation upon hydrogen uptake, during which the hydrogen diffusivity varies with hydrogen composition and temperature. Here we perform robust statistically-averaged molecular dynamics simulations to obtain a well-converged analytical expression for hydrogen diffusivity in bulk palladium that is valid throughout all stages of the reaction. Our studies confirm significant dependence of the diffusivity on composition and temperature that elucidate key trends in the available experimental measurements. Whereas at low hydrogen compositions, a single process dominates, at high hydrogen compositions, diffusion is found to exhibit behavior consistent with multiple hopping barriers. Further analysis, supported by nudged elastic band computations, suggests that the multi-barrier diffusion can be interpreted as two distinct mechanisms corresponding to hydrogen-rich and hydrogen-poor local environments.
Temperature- and composition-dependent hydrogen diffusivity in palladium from statistically-averaged molecular dynamics

DOE PAGES

Zhou, Xiaowang; Heo, Tae Wook; Wood, Brandon C.; ...

2018-03-09

Solid-state hydrogen storage materials undergo complex phase transformations whose kinetics is often limited by hydrogen diffusion. Among metal hydrides, palladium hydride undergoes a diffusional phase transformation upon hydrogen uptake, during which the hydrogen diffusivity varies with hydrogen composition and temperature. Here we perform robust statistically-averaged molecular dynamics simulations to obtain a well-converged analytical expression for hydrogen diffusivity in bulk palladium that is valid throughout all stages of the reaction. Our studies confirm significant dependence of the diffusivity on composition and temperature that elucidate key trends in the available experimental measurements. Whereas at low hydrogen compositions, a single process dominates, at high hydrogen compositions, diffusion is found to exhibit behavior consistent with multiple hopping barriers. Further analysis, supported by nudged elastic band computations, suggests that the multi-barrier diffusion can be interpreted as two distinct mechanisms corresponding to hydrogen-rich and hydrogen-poor local environments.
Novel Kalman Filter Algorithm for Statistical Monitoring of Extensive Landscapes with Synoptic Sensor Data

PubMed Central

Czaplewski, Raymond L.

2015-01-01

Wall-to-wall remotely sensed data are increasingly available to monitor landscape dynamics over large geographic areas. However, statistical monitoring programs that use post-stratification cannot fully utilize those sensor data. The Kalman filter (KF) is an alternative statistical estimator. I develop a new KF algorithm that is numerically robust with large numbers of study variables and auxiliary sensor variables. A National Forest Inventory (NFI) illustrates application within an official statistics program. Practical recommendations regarding remote sensing and statistical issues are offered. This algorithm has the potential to increase the value of synoptic sensor data for statistical monitoring of large geographic areas. PMID:26393588
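For readers unfamiliar with the estimator, the sketch below shows a standard textbook Kalman filter measurement update. It is not the new algorithm developed in the paper. The covariance update uses the Joseph form, a commonly used formulation that helps keep the covariance symmetric and positive semi-definite; that choice, the function name `kalman_update`, and all variable names are assumptions made for illustration.

```python
import numpy as np

def kalman_update(x, P, z, H, R):
    """One textbook Kalman measurement update.

    x: prior state estimate, shape (n,)
    P: prior state covariance, shape (n, n)
    z: observation vector (e.g. auxiliary sensor data), shape (k,)
    H: observation matrix, shape (k, n)
    R: observation error covariance, shape (k, k)
    """
    S = H @ P @ H.T + R                      # innovation covariance
    # Gain K = P H^T S^-1, computed with a linear solve rather than an inverse.
    K = np.linalg.solve(S, H @ P).T
    x_new = x + K @ (z - H @ x)              # corrected state estimate
    I_KH = np.eye(len(x)) - K @ H
    P_new = I_KH @ P @ I_KH.T + K @ R @ K.T  # Joseph-form covariance update
    return x_new, P_new

# Scalar example: prior 0 with variance 1, observation 2 with variance 1.
x1, P1 = kalman_update(np.array([0.0]), np.array([[1.0]]),
                       np.array([2.0]), np.array([[1.0]]), np.array([[1.0]]))
```

In the scalar example the prior and the observation carry equal weight, so the update lands halfway between them and halves the variance.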
Robust logistic regression to narrow down the winner's curse for rare and recessive susceptibility variants.

PubMed

Kesselmeier, Miriam; Lorenzo Bermejo, Justo

2017-11-01

Logistic regression is the most common technique used for genetic case-control association studies. A disadvantage of standard maximum likelihood estimators of the genotype relative risk (GRR) is their strong dependence on outlier subjects, for example, patients diagnosed at unusually young age. Robust methods are available to constrain outlier influence, but they are scarcely used in genetic studies. This article provides a non-intimidating introduction to robust logistic regression, and investigates its benefits and limitations in genetic association studies. We applied the bounded Huber and extended the R package 'robustbase' with the re-descending Hampel functions to down-weight outlier influence. Computer simulations were carried out to assess the type I error rate, mean squared error (MSE) and statistical power according to major characteristics of the genetic study and investigated markers. Simulations were complemented with the analysis of real data. Both standard and robust estimation controlled type I error rates. Standard logistic regression showed the highest power but standard GRR estimates also showed the largest bias and MSE, in particular for associated rare and recessive variants. For illustration, a recessive variant with a true GRR=6.32 and a minor allele frequency=0.05 investigated in a 1000 case/1000 control study by standard logistic regression resulted in power=0.60 and MSE=16.5. The corresponding figures for Huber-based estimation were power=0.51 and MSE=0.53. Overall, Hampel- and Huber-based GRR estimates did not differ much. Robust logistic regression may represent a valuable alternative to standard maximum likelihood estimation when the focus lies on risk prediction rather than identification of susceptibility variants. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Quantized correlation coefficient for measuring reproducibility of ChIP-chip data.

PubMed

Peng, Shouyong; Kuroda, Mitzi I; Park, Peter J

2010-07-27

Chromatin immunoprecipitation followed by microarray hybridization (ChIP-chip) is used to study protein-DNA interactions and histone modifications on a genome-scale. To ensure data quality, these experiments are usually performed in replicates, and a correlation coefficient between replicates is often used to assess reproducibility. However, the correlation coefficient can be misleading because it is affected not only by the reproducibility of the signal but also by the amount of binding signal present in the data. We develop the Quantized correlation coefficient (QCC) that is much less dependent on the amount of signal. This involves discretization of data into a set of quantiles (quantization), a merging procedure to group the background probes, and recalculation of the Pearson correlation coefficient. This procedure reduces the influence of the background noise on the statistic, which then properly focuses more on the reproducibility of the signal. The performance of this procedure is tested in both simulated and real ChIP-chip data. For replicates with different levels of enrichment over background and coverage, we find that QCC reflects reproducibility more accurately and is more robust than the standard Pearson or Spearman correlation coefficients. The quantization and the merging procedure can also suggest a proper quantile threshold for separating signal from background for further analysis. To measure reproducibility of ChIP-chip data correctly, a correlation coefficient that is robust to the amount of signal present should be used. QCC is one such measure. The QCC statistic can also be applied in a variety of other contexts for measuring reproducibility, including analysis of array CGH data for DNA copy number and gene expression data.
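The three steps named in this abstract (quantization, merging of background probes, Pearson correlation) can be sketched as follows. This is a simplified illustration only: the function name `quantized_corr`, the number of bins, and especially the fixed number of low bins treated as background are assumptions, whereas the published procedure determines the merging from the data.

```python
import numpy as np

def quantized_corr(a, b, n_bins=20, background_bins=10):
    """Pearson correlation of quantile-binned data with low bins merged.

    a, b:            replicate measurements of equal length
    n_bins:          number of quantile bins (assumed value)
    background_bins: number of lowest bins merged into one background bin
                     (fixed here for illustration)
    """
    def quantize(v):
        ranks = np.argsort(np.argsort(v))      # rank of each probe, 0..n-1
        bins = (ranks * n_bins) // len(v)      # quantile bin, 0..n_bins-1
        # Collapse the lowest `background_bins` bins into a single bin 0.
        return np.maximum(bins - (background_bins - 1), 0)

    qa = quantize(np.asarray(a))
    qb = quantize(np.asarray(b))
    return float(np.corrcoef(qa, qb)[0, 1])

# Two noisy synthetic replicates sharing the same underlying signal.
rng = np.random.default_rng(0)
signal = rng.normal(size=1000)
qcc = quantized_corr(signal + 0.1 * rng.normal(size=1000),
                     signal + 0.1 * rng.normal(size=1000))
```

Because only bin membership enters the correlation, fluctuations among the merged background probes no longer contribute to the statistic.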
Robust Statistical Approaches for RSS-Based Floor Detection in Indoor Localization.

PubMed

Razavi, Alireza; Valkama, Mikko; Lohan, Elena Simona

2016-05-31

Floor detection for indoor 3D localization of mobile devices is currently an important challenge in the wireless world. Many approaches currently exist, but usually the robustness of such approaches is not addressed or investigated. The goal of this paper is to show how to robustify the floor estimation when probabilistic approaches with a low number of parameters are employed. Indeed, such an approach would allow a building-independent estimation and a lower computing power at the mobile side. Four robustified algorithms are to be presented: a robust weighted centroid localization method, a robust linear trilateration method, a robust nonlinear trilateration method, and a robust deconvolution method. The proposed approaches use the received signal strengths (RSS) measured by the Mobile Station (MS) from various heard WiFi access points (APs) and provide an estimate of the vertical position of the MS, which can be used for floor detection. We will show that robustification can indeed increase the performance of the RSS-based floor detection algorithms.
Effect size measures in a two-independent-samples case with nonnormal and nonhomogeneous data.

PubMed

Li, Johnson Ching-Hong

2016-12-01

In psychological science, the "new statistics" refer to the new statistical practices that focus on effect size (ES) evaluation instead of conventional null-hypothesis significance testing (Cumming, Psychological Science, 25, 7-29, 2014). In a two-independent-samples scenario, Cohen's (1988) standardized mean difference (d) is the most popular ES, but its accuracy relies on two assumptions: normality and homogeneity of variances. Five other ESs, namely the unscaled robust d (d_r*; Hogarty & Kromrey, 2001), scaled robust d (d_r; Algina, Keselman, & Penfield, Psychological Methods, 10, 317-328, 2005), point-biserial correlation (r_pb; McGrath & Meyer, Psychological Methods, 11, 386-401, 2006), common-language ES (CL; Cliff, Psychological Bulletin, 114, 494-509, 1993), and nonparametric estimator for CL (A_w; Ruscio, Psychological Methods, 13, 19-30, 2008), may be robust to violations of these assumptions, but no study has systematically evaluated their performance. Thus, in this simulation study the performance of these six ESs was examined across five factors: data distribution, sample, base rate, variance ratio, and sample size. The results showed that A_w and d_r were generally robust to these violations, and A_w slightly outperformed d_r. Implications for the use of A_w and d_r in real-world research are discussed.
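Two of the effect sizes discussed here are simple enough to sketch. The code below computes Cohen's d with a pooled standard deviation and a nonparametric probability-of-superiority estimate (the share of cross-group pairs in which the first group's value is larger, with ties counted as one half). Whether the second function matches the exact estimator evaluated in the study is an assumption, and the function names are hypothetical.

```python
import numpy as np

def cohens_d(x, y):
    """Standardized mean difference using the pooled standard deviation."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * x.var(ddof=1) + (n2 - 1) * y.var(ddof=1)) / (n1 + n2 - 2)
    return (x.mean() - y.mean()) / np.sqrt(pooled_var)

def prob_superiority(x, y):
    """Estimated probability that a value from x exceeds a value from y.

    Counts all cross-group pairs; a tie contributes one half.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    diff = x[:, None] - y[None, :]             # all pairwise differences
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

# Small illustrative samples (made up for the example).
group_a = [4.0, 5.0, 6.0]
group_b = [1.0, 2.0, 3.0]
d = cohens_d(group_a, group_b)
a = prob_superiority(group_a, group_b)
```

The second statistic depends only on the ordering of observations, which is why it is insensitive to the scale and shape of the two distributions.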
Robust multiscale prediction of Po River discharge using a twofold AR-NN approach

NASA Astrophysics Data System (ADS)

Alessio, Silvia; Taricco, Carla; Rubinetti, Sara; Zanchettin, Davide; Rubino, Angelo; Mancuso, Salvatore

2017-04-01

The Mediterranean area is among the regions most exposed to hydroclimatic changes, with a likely increase of frequency and duration of droughts in the last decades and potentially substantial future drying according to climate projections. However, significant decadal variability is often superposed on, or even dominates, these long-term hydrological trends, as observed, for instance, in North Italian precipitation and river discharge records. The capability to accurately predict such decadal changes is, therefore, of utmost environmental and social importance. In order to forecast short and noisy hydroclimatic time series, we apply a twofold statistical approach that we improved with respect to previous works [1]. Our prediction strategy consists in the application of two independent methods that use autoregressive models and feed-forward neural networks. Since all prediction methods work better on clean signals, the predictions are not performed directly on the series, but rather on each significant variability component extracted with Singular Spectrum Analysis (SSA). In this contribution, we will illustrate the multiscale prediction approach and its application to the case of decadal prediction of annual-average Po River discharges (Italy). The discharge record is available for the last 209 years and allows working with both interannual and decadal time-scale components. Fifteen-year forecasts obtained with both methods robustly indicate a prominent dry period in the second half of the 2020s. We will discuss advantages and limitations of the proposed statistical approach in the light of the current capabilities of decadal climate prediction systems based on numerical climate models, toward an integrated dynamical and statistical approach for the interannual-to-decadal prediction of hydroclimate variability in medium-size river basins. [1] Alessio et al., Natural variability and anthropogenic effects in a Central Mediterranean core, Clim. of the Past, 8, 831-839, 2012.
Investigation of Magnetotelluric Source Effect Based on Twenty Years of Telluric and Geomagnetic Observation

NASA Astrophysics Data System (ADS)

Kis, A.; Lemperger, I.; Wesztergom, V.; Menvielle, M.; Szalai, S.; Novák, A.; Hada, T.; Matsukiyo, S.; Lethy, A. M.

2016-12-01

The magnetotelluric method is widely applied to investigate subsurface structures by imaging the spatial distribution of electric conductivity. The method is based on the experimental determination of the surface electromagnetic impedance tensor (Z) from surface geomagnetic and telluric registrations in two perpendicular orientations. In practical explorations the accurate estimation of Z necessitates the application of robust statistical methods for two reasons: 1) the geomagnetic and telluric time series are contaminated by man-made noise components, and 2) the non-homogeneous behavior of ionospheric current systems in the period range of interest (ELF-ULF and longer periods) results in systematic deviation of the impedance of individual time windows. Robust statistics handle both of these burdens on Z for the purpose of subsurface investigations. However, accurate analysis of the long-term temporal variation of the first and second statistical moments of Z may provide valuable information about the characteristics of the ionospheric source current systems. Temporal variation of the extent, spatial variability and orientation of the ionospheric source currents has specific effects on the surface impedance tensor. Twenty-year-long geomagnetic and telluric recordings of the Nagycenk Geophysical Observatory provide a unique opportunity to reconstruct the so-called magnetotelluric source effect and obtain information about the spatial and temporal behavior of ionospheric source currents at mid-latitudes. Detailed investigation of time series of the surface electromagnetic impedance tensor has been carried out in different frequency classes of the ULF range. The presentation aims to provide a brief review of our results related to long-term periodic modulations, up to solar cycle scale, and to eventual deviations of the electromagnetic impedance and thus of the reconstructed equivalent ionospheric source effects.
Robustness Analysis and Optimally Robust Control Design via Sum-of-Squares

NASA Technical Reports Server (NTRS)

Dorobantu, Andrei; Crespo, Luis G.; Seiler, Peter J.

2012-01-01

A control analysis and design framework is proposed for systems subject to parametric uncertainty. The underlying strategies are based on sum-of-squares (SOS) polynomial analysis and nonlinear optimization to design an optimally robust controller. The approach determines a maximum uncertainty range for which the closed-loop system satisfies a set of stability and performance requirements. These requirements, defined as inequality constraints on several metrics, are restricted to polynomial functions of the uncertainty. To quantify robustness, SOS analysis is used to prove that the closed-loop system complies with the requirements for a given uncertainty range. The maximum uncertainty range, calculated by assessing a sequence of increasingly larger ranges, serves as a robustness metric for the closed-loop system. To optimize the control design, nonlinear optimization is used to enlarge the maximum uncertainty range by tuning the controller gains. Hence, the resulting controller is optimally robust to parametric uncertainty. This approach balances the robustness margins corresponding to each requirement in order to maximize the aggregate system robustness. The proposed framework is applied to a simple linear short-period aircraft model with uncertain aerodynamic coefficients.
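The idea of growing the uncertainty range until a requirement can no longer be certified can be sketched as follows. This is only an illustration under loud assumptions: the scalar plant, the gain `k`, the stability `margin`, and the function names are hypothetical, and the `certify` step uses a dense grid check as a stand-in for the SOS proof described in the abstract (a grid check is not a proof).

```python
# Hypothetical scalar plant: x' = (A0 + delta) * x + u, with feedback u = -k * x.
# Requirement (assumed for illustration): closed-loop pole <= -margin for all |delta| <= r.
A0 = 1.0

def certify(r, k, margin=0.45, n_grid=201):
    """Stand-in for an SOS certificate: check the requirement on a grid of delta values."""
    deltas = [-r + 2.0 * r * j / (n_grid - 1) for j in range(n_grid)]
    return all((A0 + d - k) <= -margin for d in deltas)

def max_uncertainty_range(k, step=0.1, r_limit=10.0):
    """Assess increasingly larger ranges; return the largest one that still passes."""
    r_ok = 0.0
    i = 1
    while i * step <= r_limit:
        if not certify(i * step, k):
            break
        r_ok = i * step
        i += 1
    return r_ok

r_k3 = max_uncertainty_range(k=3.0)   # robustness metric for gain 3
r_k4 = max_uncertainty_range(k=4.0)   # a larger gain tolerates a larger range here
print(r_k3, r_k4)
```

In this toy setting the robustness metric grows with the gain, which is the quantity a gain-tuning optimizer would try to enlarge; the real framework balances several requirements at once.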
Robust algebraic image enhancement for intelligent control systems

NASA Technical Reports Server (NTRS)

Lerner, Bao-Ting; Morrelli, Michael

1993-01-01

Robust vision capability for intelligent control systems has been an elusive goal in image processing. The computationally intensive techniques necessary for conventional image processing make real-time applications, such as object tracking and collision avoidance, difficult. In order to endow an intelligent control system with the needed vision robustness, an adequate image enhancement subsystem capable of compensating for the wide variety of real-world degradations must exist between the image capturing and the object recognition subsystems. This enhancement stage must be adaptive and must operate with consistency in the presence of both statistical and shape-based noise. To deal with this problem, we have developed an innovative algebraic approach which provides a sound mathematical framework for image representation and manipulation. Our image model provides a natural platform from which to pursue dynamic scene analysis, and its incorporation into a vision system would serve as the front-end to an intelligent control system. We have developed a unique polynomial representation of gray level imagery and applied this representation to develop polynomial operators on complex gray level scenes. This approach is highly advantageous since polynomials can be manipulated very easily, and are readily understood, thus providing a very convenient environment for image processing. Our model presents a highly structured and compact algebraic representation of grey-level images which can be viewed as fuzzy sets.
A Bayesian blind survey for cold molecular gas in the Universe

NASA Astrophysics Data System (ADS)

Lentati, L.; Carilli, C.; Alexander, P.; Walter, F.; Decarli, R.

2014-10-01

A new Bayesian method for performing an image domain search for line-emitting galaxies is presented. The method uses both spatial and spectral information to robustly determine the source properties, employing either simple Gaussian, or other physically motivated models whilst using the evidence to determine the probability that the source is real. In this paper, we describe the method, and its application to both a simulated data set, and a blind survey for cold molecular gas using observations of the Hubble Deep Field-North taken with the Plateau de Bure Interferometer. We make a total of six robust detections in the survey, five of which have counterparts in other observing bands. We identify the most secure detections found in a previous investigation, while finding one new probable line source with an optical ID not seen in the previous analysis. This study acts as a pilot application of Bayesian statistics to future searches to be carried out both for low-J CO transitions of high-redshift galaxies using the Jansky Very Large Array (JVLA), and at millimetre wavelengths with Atacama Large Millimeter/submillimeter Array (ALMA), enabling the inference of robust scientific conclusions about the history of the molecular gas properties of star-forming galaxies in the Universe through cosmic time.
A novel bi-level meta-analysis approach: applied to biological pathway analysis.

PubMed

Nguyen, Tin; Tagett, Rebecca; Donato, Michele; Mitrea, Cristina; Draghici, Sorin

2016-02-01

The accumulation of high-throughput data in public repositories creates a pressing need for integrative analysis of multiple datasets from independent experiments. However, study heterogeneity, study bias, outliers and the lack of power of available methods present real challenges in integrating genomic data. One practical drawback of many P-value-based meta-analysis methods, including Fisher's, Stouffer's, minP and maxP, is that they are sensitive to outliers. Another drawback is that, because they perform just one statistical test for each individual experiment, they may not fully exploit the potentially large number of samples within each study. We propose a novel bi-level meta-analysis approach that employs the additive method and the Central Limit Theorem within each individual experiment and also across multiple experiments. We prove that the bi-level framework is robust against bias, less sensitive to outliers than other methods, and more sensitive to small changes in signal. For comparative analysis, we demonstrate that the intra-experiment analysis has more power than the equivalent statistical test performed on a single large experiment. For pathway analysis, we compare the proposed framework versus classical meta-analysis approaches (Fisher's, Stouffer's and the additive method) as well as against a dedicated pathway meta-analysis package (MetaPath), using 1252 samples from 21 datasets related to three human diseases, acute myeloid leukemia (9 datasets), type II diabetes (5 datasets) and Alzheimer's disease (7 datasets). Our framework outperforms its competitors to correctly identify pathways relevant to the phenotypes. The framework is sufficiently general to be applied to any type of statistical meta-analysis. The R scripts are available on demand from the authors. Contact: sorin@wayne.edu. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved.
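The three classical P-value combination rules named in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' bi-level implementation: the function names and the example P-values are hypothetical, and the additive method is approximated here with the Central Limit Theorem (the sum of n uniform P-values has mean n/2 and variance n/12 under the null).

```python
import math
from statistics import NormalDist

def fisher(pvals):
    """Fisher's method: X = -2 * sum(ln p) follows chi-square with 2k degrees of freedom."""
    k = len(pvals)
    half_x = -sum(math.log(p) for p in pvals)
    # Closed-form chi-square survival function for even degrees of freedom 2k.
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total

def stouffer(pvals):
    """Stouffer's method: average the z-scores of the individual P-values."""
    nd = NormalDist()
    z = sum(nd.inv_cdf(1.0 - p) for p in pvals) / math.sqrt(len(pvals))
    return 1.0 - nd.cdf(z)

def additive_clt(pvals):
    """Additive method with a normal (CLT) approximation to the sum of P-values."""
    n = len(pvals)
    z = (sum(pvals) - n / 2.0) / math.sqrt(n / 12.0)
    return NormalDist().cdf(z)

# Hypothetical example: one extreme P-value among otherwise unremarkable ones.
with_outlier = [1e-10, 0.6, 0.7, 0.8]
print(fisher(with_outlier), stouffer(with_outlier), additive_clt(with_outlier))
```

With these made-up inputs a single extreme value dominates Fisher's statistic, while the additive rule stays near the bulk of the P-values, which is the kind of outlier sensitivity the abstract refers to.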
Statistical analysis and handling of missing data in cluster randomized trials: a systematic review.

PubMed

Fiero, Mallorie H; Huang, Shuang; Oren, Eyal; Bell, Melanie L

2016-02-09

Cluster randomized trials (CRTs) randomize participants in groups, rather than as individuals and are key tools used to assess interventions in health research where treatment contamination is likely or if individual randomization is not feasible. Two potential major pitfalls exist regarding CRTs, namely handling missing data and not accounting for clustering in the primary analysis. The aim of this review was to evaluate approaches for handling missing data and statistical analysis with respect to the primary outcome in CRTs. We systematically searched for CRTs published between August 2013 and July 2014 using PubMed, Web of Science, and PsycINFO. For each trial, two independent reviewers assessed the extent of the missing data and method(s) used for handling missing data in the primary and sensitivity analyses. We evaluated the primary analysis and determined whether it was at the cluster or individual level. Of the 86 included CRTs, 80 (93%) trials reported some missing outcome data. Of those reporting missing data, the median percent of individuals with a missing outcome was 19% (range 0.5 to 90%). The most common way to handle missing data in the primary analysis was complete case analysis (44, 55%), whereas 18 (22%) used mixed models, six (8%) used single imputation, four (5%) used unweighted generalized estimating equations, and two (2%) used multiple imputation. Fourteen (16%) trials reported a sensitivity analysis for missing data, but most assumed the same missing data mechanism as in the primary analysis. Overall, 67 (78%) trials accounted for clustering in the primary analysis. High rates of missing outcome data are present in the majority of CRTs, yet handling missing data in practice remains suboptimal. Researchers and applied statisticians should carry out appropriate missing data methods, which are valid under plausible assumptions in order to increase statistical power in trials and reduce the possibility of bias. 
Sensitivity analysis should be performed, with weakened assumptions regarding the missing data mechanism to explore the robustness of results reported in the primary analysis.
Improved Doubly Robust Estimation when Data are Monotonely Coarsened, with Application to Longitudinal Studies with Dropout

PubMed Central

Tsiatis, Anastasios A.; Davidian, Marie; Cao, Weihua

2010-01-01

Summary A routine challenge is that of making inference on parameters in a statistical model of interest from longitudinal data subject to drop out, which are a special case of the more general setting of monotonely coarsened data. Considerable recent attention has focused on doubly robust estimators, which in this context involve positing models for both the missingness (more generally, coarsening) mechanism and aspects of the distribution of the full data, that have the appealing property of yielding consistent inferences if only one of these models is correctly specified. Doubly robust estimators have been criticized for potentially disastrous performance when both of these models are even only mildly misspecified. We propose a doubly robust estimator applicable in general monotone coarsening problems that achieves comparable or improved performance relative to existing doubly robust methods, which we demonstrate via simulation studies and by application to data from an AIDS clinical trial. PMID:20731640

Evaluation of Ares-I Control System Robustness to Uncertain Aerodynamics and Flex Dynamics

NASA Technical Reports Server (NTRS)

Jang, Jiann-Woei; VanTassel, Chris; Bedrossian, Nazareth; Hall, Charles; Spanos, Pol

2008-01-01

This paper discusses the application of robust control theory to evaluate robustness of the Ares-I control systems. Three techniques for estimating upper and lower bounds of uncertain parameters which yield stable closed-loop response are used here: (1) Monte Carlo analysis, (2) mu analysis, and (3) characteristic frequency response analysis. All three methods are used to evaluate stability envelopes of the Ares-I control systems with uncertain aerodynamics and flex dynamics. The results show that characteristic frequency response analysis is the most effective of these methods for assessing robustness.
Comparative Efficacy of Tongxinluo Capsule and Beta-Blockers in Treating Angina Pectoris: Meta-Analysis of Randomized Controlled Trials.

PubMed

Jia, Yongliang; Leung, Siu-wai

2015-11-01

There have been no systematic reviews, let alone meta-analyses, of randomized controlled trials (RCTs) comparing tongxinluo capsule (TXL) and beta-blockers in treating angina pectoris. This study aimed to evaluate the efficacy of TXL and beta-blockers in treating angina pectoris by a meta-analysis of eligible RCTs. The RCTs comparing TXL with beta-blockers (including metoprolol) in treating angina pectoris were searched and retrieved from databases including PubMed, Chinese National Knowledge Infrastructure, and WanFang Data. Eligible RCTs were selected according to prespecified criteria. Meta-analysis was performed on the odds ratios (OR) of symptomatic and electrocardiographic (ECG) improvements after treatment. Subgroup analysis, sensitivity analysis, meta-regression, and publication bias analysis were conducted to evaluate the robustness of the results. Seventy-three RCTs published between 2000 and 2014 with 7424 participants were eligible. Overall ORs comparing TXL with beta-blockers were 3.40 (95% confidence interval [CI], 2.97-3.89; p<0.0001) for symptomatic improvement and 2.63 (95% CI, 2.29-3.02; p<0.0001) for ECG improvement. Subgroup analysis and sensitivity analysis found no statistically significant dependence of overall ORs on specific study characteristics except efficacy criteria. Meta-regression found no significant dependence except on sample sizes for data on symptomatic improvement. Publication biases were statistically significant. TXL seems to be more effective than beta-blockers in treating angina pectoris, on the basis of the eligible RCTs. Further RCTs are warranted to reduce publication bias and verify efficacy.
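To make the pooled odds ratio and its confidence interval concrete, here is a minimal sketch of fixed-effect inverse-variance pooling of log odds ratios. The abstract does not state which pooling model was used, so this choice is an assumption, and the 2x2 tables below are invented purely for illustration (they are not data from the study).

```python
import math

def pooled_odds_ratio(tables, z=1.96):
    """Fixed-effect inverse-variance pooling of log odds ratios.

    Each table is (a, b, c, d): improved / not improved in the treatment arm,
    then improved / not improved in the control arm.
    Returns (pooled OR, lower 95% CI bound, upper 95% CI bound).
    """
    weights, log_ors = [], []
    for a, b, c, d in tables:
        log_ors.append(math.log((a * d) / (b * c)))
        variance = 1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d  # Woolf variance of log OR
        weights.append(1.0 / variance)
    pooled = sum(w * l for w, l in zip(weights, log_ors)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return math.exp(pooled), math.exp(pooled - z * se), math.exp(pooled + z * se)

# Hypothetical trials, for illustration only.
hypothetical_tables = [(40, 10, 30, 20), (45, 15, 35, 25)]
or_hat, ci_low, ci_high = pooled_odds_ratio(hypothetical_tables)
print(or_hat, ci_low, ci_high)
```

A sensitivity analysis in this setting would repeat the call while leaving out one table at a time and check how much the pooled estimate moves.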
Understanding Evaluation of Learning Support in Mathematics and Statistics

ERIC Educational Resources Information Center

MacGillivray, Helen; Croft, Tony

2011-01-01

With rapid and continuing growth of learning support initiatives in mathematics and statistics found in many parts of the world, and with the likelihood that this trend will continue, there is a need to ensure that robust and coherent measures are in place to evaluate the effectiveness of these initiatives. The nature of learning support brings…
Forensic Discrimination of Latent Fingerprints Using Laser-Induced Breakdown Spectroscopy (LIBS) and Chemometric Approaches.

PubMed

Yang, Jun-Ho; Yoh, Jack J

2018-01-01

A novel technique is reported for separating overlapping latent fingerprints using chemometric approaches that combine laser-induced breakdown spectroscopy (LIBS) and multivariate analysis. The LIBS technique provides the capability of real time analysis and high frequency scanning as well as the data regarding the chemical composition of overlapping latent fingerprints. These spectra offer valuable information for the classification and reconstruction of overlapping latent fingerprints by implementing appropriate statistical multivariate analysis. The current study employs principal component analysis and partial least square methods for the classification of latent fingerprints from the LIBS spectra. This technique was successfully demonstrated through a classification study of four distinct latent fingerprints using classification methods such as soft independent modeling of class analogy (SIMCA) and partial least squares discriminant analysis (PLS-DA). The novel method yielded an accuracy of more than 85% and was proven to be sufficiently robust. Furthermore, through laser scanning analysis at a spatial interval of 125 µm, the overlapping fingerprints were reconstructed as separate two-dimensional forms.
Origin of the correlations between exit times in pedestrian flows through a bottleneck

NASA Astrophysics Data System (ADS)

Nicolas, Alexandre; Touloupas, Ioannis

2018-01-01

Robust statistical features have emerged from the microscopic analysis of dense pedestrian flows through a bottleneck, notably with respect to the time gaps between successive passages. We pinpoint the mechanisms at the origin of these features thanks to simple models that we develop and analyse quantitatively. We disprove the idea that anticorrelations between successive time gaps (i.e. an alternation between shorter ones and longer ones) are a hallmark of a zipper-like intercalation of pedestrian lines and show that they simply result from the possibility that pedestrians from distinct ‘lines’ or directions cross the bottleneck within a short time interval. A second feature concerns the bursts of escapes, i.e. egresses that come in fast succession. Despite the ubiquity of exponential distributions of burst sizes, entailed by a Poisson process, we argue that anomalous (power-law) statistics arise if the bottleneck is nearly congested, albeit only in a tiny portion of parameter space. The generality of the proposed mechanisms implies that similar statistical features should also be observed for other types of particulate flows.
Intelligent Condition Diagnosis Method Based on Adaptive Statistic Test Filter and Diagnostic Bayesian Network

PubMed Central

Li, Ke; Zhang, Qiuju; Wang, Kun; Chen, Peng; Wang, Huaqing

2016-01-01

A new fault diagnosis method for rotating machinery based on an adaptive statistic test filter (ASTF) and a Diagnostic Bayesian Network (DBN) is presented in this paper. ASTF is proposed to obtain weak fault features under background noise. It is based on statistical hypothesis testing in the frequency domain to evaluate the similarity between a reference signal (noise signal) and the original signal, and removes the components of high similarity. The optimal level of significance α is obtained using particle swarm optimization (PSO). To evaluate the performance of the ASTF, an evaluation factor Ipq is also defined. In addition, a simulation experiment is designed to verify the effectiveness and robustness of ASTF. A sensitivity evaluation method using principal component analysis (PCA) is proposed to evaluate the sensitivity of symptom parameters (SPs) for condition diagnosis. In this way, the good SPs that have high sensitivity for condition diagnosis can be selected. A three-layer DBN is developed to identify the condition of rotating machinery based on Bayesian Belief Network (BBN) theory. A condition diagnosis experiment for rolling element bearings demonstrates the effectiveness of the proposed method. PMID:26761006
Intelligent Condition Diagnosis Method Based on Adaptive Statistic Test Filter and Diagnostic Bayesian Network.

PubMed

Li, Ke; Zhang, Qiuju; Wang, Kun; Chen, Peng; Wang, Huaqing

2016-01-08

A new fault diagnosis method for rotating machinery based on an adaptive statistic test filter (ASTF) and a Diagnostic Bayesian Network (DBN) is presented in this paper. ASTF is proposed to obtain weak fault features under background noise. It is based on statistical hypothesis testing in the frequency domain to evaluate the similarity between a reference signal (noise signal) and the original signal, and removes the components of high similarity. The optimal level of significance α is obtained using particle swarm optimization (PSO). To evaluate the performance of the ASTF, an evaluation factor Ipq is also defined. In addition, a simulation experiment is designed to verify the effectiveness and robustness of ASTF. A sensitivity evaluation method using principal component analysis (PCA) is proposed to evaluate the sensitivity of symptom parameters (SPs) for condition diagnosis. In this way, the good SPs that have high sensitivity for condition diagnosis can be selected. A three-layer DBN is developed to identify the condition of rotating machinery based on Bayesian Belief Network (BBN) theory. A condition diagnosis experiment for rolling element bearings demonstrates the effectiveness of the proposed method.
Fluctuations and Noise in Stochastic Spread of Respiratory Infection Epidemics in Social Networks

NASA Astrophysics Data System (ADS)

Yulmetyev, Renat; Emelyanova, Natalya; Demin, Sergey; Gafarov, Fail; Hänggi, Peter; Yulmetyeva, Dinara

2003-05-01

For the analysis of the complexity of epidemic and disease dynamics, it is necessary to understand the basic principles and notions of its spreading in long-time memory media. Here we consider the problem from a theoretical and practical viewpoint, presenting quantitative evidence confirming the existence of stochastic long-range memory and robust chaos in a real time series of infections of the human upper respiratory tract. In this work we present a new statistical method for analyzing the epidemic spread of grippe and acute infections of the human upper respiratory tract by means of the theory of discrete non-Markov stochastic processes. We use the results of our recent theory (Phys. Rev. E 65, 046107 (2002)) for the study of statistical effects of memory in real data series, describing the epidemic dynamics of human acute respiratory tract infections and grippe. The obtained results testify to the possibility of a strict quantitative description of the regular and stochastic components in epidemic dynamics of social networks, taking into account time discreteness and effects of statistical memory.
Automated finite element modeling of the lumbar spine: Using a statistical shape model to generate a virtual population of models.

PubMed

Campbell, J Q; Petrella, A J

2016-09-06

Population-based modeling of the lumbar spine has the potential to be a powerful clinical tool. However, developing a fully parameterized model of the lumbar spine with accurate geometry has remained a challenge. The current study used automated methods for landmark identification to create a statistical shape model of the lumbar spine. The shape model was evaluated using compactness, generalization ability, and specificity. The primary shape modes were analyzed visually, quantitatively, and biomechanically. The biomechanical analysis was performed by using the statistical shape model with an automated method for finite element model generation to create a fully parameterized finite element model of the lumbar spine. Functional finite element models of the mean shape and the extreme shapes (±3 standard deviations) of all 17 shape modes were created demonstrating the robust nature of the methods. This study represents an advancement in finite element modeling of the lumbar spine and will allow population-based modeling in the future. Copyright © 2016 Elsevier Ltd. All rights reserved.
QSAR models for anti-malarial activity of 4-aminoquinolines.

PubMed

Masand, Vijay H; Toropov, Andrey A; Toropova, Alla P; Mahajan, Devidas T

2014-03-01

In the present study, predictive quantitative structure-activity relationship (QSAR) models for anti-malarial activity of 4-aminoquinolines have been developed. CORAL, which is freely available on the internet (http://www.insilico.eu/coral), has been used as a tool of QSAR analysis to establish a statistically robust QSAR model of anti-malarial activity of 4-aminoquinolines. Six random splits into the visible sub-system of the training and invisible subsystem of validation were examined. Statistical qualities for these splits vary, but in all these cases, statistical quality of prediction for anti-malarial activity was quite good. The optimal SMILES-based descriptor was used to derive the single descriptor based QSAR model for a data set of 112 aminoquinolones. All the splits had r² > 0.85 and r² > 0.78 for subtraining and validation sets, respectively. The three parametric multilinear regression (MLR) QSAR model has Q² = 0.83, R² = 0.84 and F = 190.39. The anti-malarial activity has strong correlation with presence/absence of nitrogen and oxygen at a topological distance of six.
Evaluating performances of simplified physically based landslide susceptibility models.

NASA Astrophysics Data System (ADS)

Capparelli, Giovanna; Formetta, Giuseppe; Versace, Pasquale

2015-04-01

Rainfall-induced shallow landslides cause significant damage, including loss of life and property. Prediction of locations susceptible to shallow landslides is a complex task that involves many disciplines: hydrology, geotechnical science, geomorphology, and statistics. Usually two main approaches are used to accomplish this task: statistical or physically based models. This paper presents a package of GIS-based models for landslide susceptibility analysis. It was integrated in the NewAge-JGrass hydrological model using the Object Modeling System (OMS) modeling framework. The package includes three simplified physically based models for landslide susceptibility analysis (M1, M2, and M3) and a component for model verification. It computes eight goodness-of-fit (GOF) indices by comparing model results and measurement data pixel by pixel. Moreover, the package integration in NewAge-JGrass allows the use of other components such as geographic information system tools to manage input-output processes, and automatic calibration algorithms to estimate model parameters. The system offers the possibility to investigate and fairly compare the quality and the robustness of models and model parameters, according to a procedure that includes: i) model parameter estimation by optimizing each of the GOF indices separately, ii) model evaluation in the ROC plane by using each of the optimal parameter sets, and iii) GOF robustness evaluation by assessing their sensitivity to the input parameter variation. This procedure was repeated for all three models. The system was applied to a case study in Calabria (Italy) along the Salerno-Reggio Calabria highway, between the Cosenza and Altilia municipalities. The analysis showed that, among all the optimized indices and all three models, Average Index (AI) optimization coupled with model M3 is the best modeling solution for our test case. This research was funded by PON Project No. 01_01503 "Integrated Systems for Hydrogeological Risk Monitoring, Early Warning and Mitigation Along the Main Lifelines", CUP B31H11000370005, in the framework of the National Operational Program for "Research and Competitiveness" 2007-2013.
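The pixel-by-pixel comparison that places a model in the ROC plane can be sketched as below. This is an illustrative sketch only: the function names and the tiny example maps are hypothetical, and it computes just the two ROC coordinates plus accuracy, not the eight GOF indices of the package (whose exact definitions the abstract does not give).

```python
def confusion_counts(predicted, observed):
    """Compare two flattened binary maps pixel by pixel (1 = unstable/landslide)."""
    tp = sum(1 for p, o in zip(predicted, observed) if p == 1 and o == 1)
    fp = sum(1 for p, o in zip(predicted, observed) if p == 1 and o == 0)
    fn = sum(1 for p, o in zip(predicted, observed) if p == 0 and o == 1)
    tn = sum(1 for p, o in zip(predicted, observed) if p == 0 and o == 0)
    return tp, fp, fn, tn

def roc_point(predicted, observed):
    """Return (false positive rate, true positive rate), i.e. one point in the ROC plane."""
    tp, fp, fn, tn = confusion_counts(predicted, observed)
    return fp / (fp + tn), tp / (tp + fn)

def accuracy(predicted, observed):
    tp, fp, fn, tn = confusion_counts(predicted, observed)
    return (tp + tn) / (tp + fp + fn + tn)

# Hypothetical six-pixel maps, for illustration only.
model_map = [1, 1, 0, 0, 1, 0]
observed_map = [1, 0, 0, 0, 1, 1]
fpr, tpr = roc_point(model_map, observed_map)
print(fpr, tpr, accuracy(model_map, observed_map))
```

A perfect model sits at (0, 1) in this plane, so comparing parameter sets amounts to asking which one lands closest to that corner.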
Sampling design considerations for demographic studies: a case of colonial seabirds

USGS Publications Warehouse

Kendall, William L.; Converse, Sarah J.; Doherty, Paul F.; Naughton, Maura B.; Anders, Angela; Hines, James E.; Flint, Elizabeth

2009-01-01

For the purposes of making many informed conservation decisions, the main goal for data collection is to assess population status and allow prediction of the consequences of candidate management actions. Reducing the bias and variance of estimates of population parameters reduces uncertainty in population status and projections, thereby reducing the overall uncertainty under which a population manager must make a decision. In capture-recapture studies, imperfect detection of individuals, unobservable life-history states, local movement outside study areas, and tag loss can cause bias or precision problems with estimates of population parameters. Furthermore, excessive disturbance to individuals during capture-recapture sampling may be of concern because disturbance may have demographic consequences. We address these problems using as an example a monitoring program for Black-footed Albatross (Phoebastria nigripes) and Laysan Albatross (Phoebastria immutabilis) nesting populations in the northwestern Hawaiian Islands. To mitigate these estimation problems, we describe a synergistic combination of sampling design and modeling approaches. Solutions include multiple capture periods per season and multistate, robust design statistical models, dead recoveries and incidental observations, telemetry and data loggers, buffer areas around study plots to neutralize the effect of local movements outside study plots, and double banding and statistical models that account for band loss. We also present a variation on the robust capture-recapture design and a corresponding statistical model that minimizes disturbance to individuals. For the albatross case study, this less invasive robust design was more time efficient and, when used in combination with a traditional robust design, reduced the standard error of detection probability by 14% with only two hours of additional effort in the field. 
These field techniques and associated modeling approaches are applicable to studies of most taxa being marked and in some cases have individually been applied to studies of birds, fish, herpetofauna, and mammals.
Can power-law scaling and neuronal avalanches arise from stochastic dynamics?

PubMed

Touboul, Jonathan; Destexhe, Alain

2010-02-11

The presence of self-organized criticality in biology is often evidenced by a power-law scaling of event size distributions, which can be measured by linear regression on logarithmic axes. We show here that such a procedure does not necessarily mean that the system exhibits self-organized criticality. We first provide an analysis of multisite local field potential (LFP) recordings of brain activity and show that event size distributions defined as negative LFP peaks can be close to power-law distributions. However, this result is not robust to change in detection threshold, or when tested using more rigorous statistical analyses such as the Kolmogorov-Smirnov test. Similar power-law scaling is observed for surrogate signals, suggesting that power-law scaling may be a generic property of thresholded stochastic processes. We next investigate this problem analytically, and show that, indeed, stochastic processes can produce spurious power-law scaling without the presence of underlying self-organized criticality. However, this power-law is only apparent in logarithmic representations, and does not survive more rigorous analysis such as the Kolmogorov-Smirnov test. The same analysis was also performed on an artificial network known to display self-organized criticality. In this case, both the graphical representations and the rigorous statistical analysis reveal with no ambiguity that the avalanche size is distributed as a power-law. We conclude that logarithmic representations can lead to spurious power-law scaling induced by the stochastic nature of the phenomenon. This apparent power-law scaling does not constitute a proof of self-organized criticality, which should be demonstrated by more stringent statistical tests.
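The contrast between an apparent power law and a more stringent check can be illustrated with a small sketch. Assumptions are stated loudly: the fit uses the maximum-likelihood estimator for a continuous power law (a swapped-in technique, not the log-log regression discussed in the abstract), only the Kolmogorov-Smirnov distance is computed (a full test would also need a P-value, for example by resampling), and both data sets are synthetic.

```python
import math
import random

def fit_power_law(xs, xmin):
    """Maximum-likelihood exponent for a continuous power law p(x) ~ x**(-alpha), x >= xmin."""
    return 1.0 + len(xs) / sum(math.log(x / xmin) for x in xs)

def ks_distance(xs, xmin, alpha):
    """Largest gap between the empirical CDF and the fitted power-law CDF."""
    ordered = sorted(xs)
    n = len(ordered)
    gap = 0.0
    for i, x in enumerate(ordered):
        model_cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        gap = max(gap, abs(model_cdf - i / n), abs(model_cdf - (i + 1) / n))
    return gap

rng = random.Random(0)
XMIN, TRUE_ALPHA, N = 1.0, 2.5, 5000

# Synthetic event sizes: a genuine power law (inverse-transform sampling) ...
power_law_sizes = [XMIN * (1.0 - rng.random()) ** (-1.0 / (TRUE_ALPHA - 1.0)) for _ in range(N)]
# ... and a shifted exponential, which is not a power law.
exponential_sizes = [XMIN + rng.expovariate(1.0) for _ in range(N)]

alpha_pl = fit_power_law(power_law_sizes, XMIN)
ks_pl = ks_distance(power_law_sizes, XMIN, alpha_pl)
alpha_exp = fit_power_law(exponential_sizes, XMIN)
ks_exp = ks_distance(exponential_sizes, XMIN, alpha_exp)
print(alpha_pl, ks_pl, alpha_exp, ks_exp)
```

Both data sets yield a plausible-looking exponent, but the distance between data and fitted model is much larger for the exponential sample, which is the kind of discrepancy a goodness-of-fit test is designed to expose.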
Characterization of Strong Light-Matter Coupling in Semiconductor Quantum-Dot Microcavities via Photon-Statistics Spectroscopy

NASA Astrophysics Data System (ADS)

Schneebeli, L.; Kira, M.; Koch, S. W.

2008-08-01

It is shown that spectrally resolved photon-statistics measurements of the resonance fluorescence from realistic semiconductor quantum-dot systems allow for high contrast identification of the two-photon strong-coupling states. Using a microscopic theory, the second-rung resonance of Jaynes-Cummings ladder is analyzed and optimum excitation conditions are determined. The computed photon-statistics spectrum displays gigantic, experimentally robust resonances at the energetic positions of the second-rung emission.
Robust spike classification based on frequency domain neural waveform features.

PubMed

Yang, Chenhui; Yuan, Yuan; Si, Jennie

2013-12-01

We introduce a new spike classification algorithm based on frequency domain features of the spike snippets. The goal for the algorithm is to provide high classification accuracy, low false misclassification, ease of implementation, robustness to signal degradation, and objectivity in classification outcomes. In this paper, we propose a spike classification algorithm based on frequency domain features (CFDF). It makes use of frequency domain contents of the recorded neural waveforms for spike classification. The self-organizing map (SOM) is used as a tool to determine the cluster number intuitively and directly by viewing the SOM output map. After that, spike classification can be easily performed using clustering algorithms such as the k-Means. In conjunction with our previously developed multiscale correlation of wavelet coefficient (MCWC) spike detection algorithm, we show that the MCWC and CFDF detection and classification system is robust when tested on several sets of artificial and real neural waveforms. The CFDF is comparable to or outperforms some popular automatic spike classification algorithms with artificial and real neural data. The detection and classification of neural action potentials or neural spikes is an important step in single-unit-based neuroscientific studies and applications. After the detection of neural snippets potentially containing neural spikes, a robust classification algorithm is applied for the analysis of the snippets to (1) extract similar waveforms into one class for them to be considered coming from one unit, and to (2) remove noise snippets if they do not contain any features of an action potential. Usually, a snippet is a small 2 or 3 ms segment of the recorded waveform, and differences in neural action potentials can be subtle from one unit to another. Therefore, a robust, high performance classification system like the CFDF is necessary. 
In addition, the proposed algorithm does not require any assumptions on statistical properties of the noise and proves to be robust under noise contamination.
Real-time movement detection and analysis for video surveillance applications

NASA Astrophysics Data System (ADS)

Hueber, Nicolas; Hennequin, Christophe; Raymond, Pierre; Moeglin, Jean-Pierre

2014-06-01

Pedestrian movement along critical infrastructures like pipes, railways or highways, as well as pedestrian behavior in urban environments, is of major interest in surveillance applications. The goal is to anticipate illicit or dangerous human activities. For this purpose, we propose an all-in-one small autonomous system which delivers high level statistics and reports alerts in specific cases. This situational awareness project leads us to manage the scene efficiently by performing movement analysis. A dynamic background extraction algorithm is developed to achieve the required degree of robustness against natural and urban environment perturbations and also to match the embedded implementation constraints. When changes are detected in the scene, specific patterns are applied to detect and highlight relevant movements. Depending on the application, specific descriptors can be extracted and fused in order to reach a high level of interpretation. In this paper, our approach is applied to two operational use cases: pedestrian urban statistics and railway surveillance. In the first case, a grid of prototypes is deployed over a city centre to collect pedestrian movement statistics up to a macroscopic level of analysis. The results demonstrate the relevance of the delivered information; in particular, the flow density map highlights pedestrian preferential paths along the streets. In the second case, one prototype is set next to high speed train tracks to secure the area. The results exhibit a low false alarm rate and assess our approach of a large sensor network for delivering a precise operational picture without overwhelming a supervisor.
[Analysis of the technical efficiency of hospitals in the Spanish National Health Service].

PubMed

Pérez-Romero, Carmen; Ortega-Díaz, M Isabel; Ocaña-Riola, Ricardo; Martín-Martín, José Jesús

To analyse the technical efficiency and productivity of general hospitals in the Spanish National Health Service (NHS) (2010-2012) and identify explanatory hospital and regional variables. 230 NHS hospitals were analysed by data envelopment analysis for overall, technical and scale efficiency, and Malmquist index. The robustness of the analysis is contrasted with alternative input-output models. A fixed effects multilevel cross-sectional linear model was used to analyse the explanatory efficiency variables. The average rate of overall technical efficiency (OTE) was 0.736 in 2012; there was considerable variability by region. Malmquist index (2010-2012) is 1.013. A 23% variability in OTE is attributable to the region in question. Statistically significant exogenous variables (residents per 100 physicians, aging index, average annual income per household, essential public service expenditure and public health expenditure per capita) explain 42% of the OTE variability between hospitals and 64% between regions. The number of residents showed a statistically significant relationship. As regards regions, there is a statistically significant direct linear association between OTE and annual income per capita and essential public service expenditure, and an indirect association with the aging index and annual public health expenditure per capita. The significant room for improvement in the efficiency of hospitals is conditioned by region-specific characteristics, specifically aging, wealth and the public expenditure policies of each one. Copyright © 2016 SESPAS. Publicado por Elsevier España, S.L.U. All rights reserved.
Fully automatic and precise data analysis developed for time-of-flight mass spectrometry.

PubMed

Meyer, Stefan; Riedo, Andreas; Neuland, Maike B; Tulej, Marek; Wurz, Peter

2017-09-01

Scientific objectives of current and future space missions are focused on the investigation of the origin and evolution of the solar system with the particular emphasis on habitability and signatures of past and present life. For in situ measurements of the chemical composition of solid samples on planetary surfaces, the neutral atmospheric gas and the thermal plasma of planetary atmospheres, the application of mass spectrometers making use of time-of-flight mass analysers is a technique widely used. However, such investigations imply measurements with good statistics and, thus, a large amount of data to be analysed. Therefore, faster and especially robust automated data analysis with enhanced accuracy is required. In this contribution, an automatic data analysis software, which allows fast and precise quantitative data analysis of time-of-flight mass spectrometric data, is presented and discussed in detail. A crucial part of this software is a robust and fast peak finding algorithm with a consecutive numerical integration method allowing precise data analysis. We tested our analysis software with data from different time-of-flight mass spectrometers and different measurement campaigns thereof. The quantitative analysis of isotopes, using automatic data analysis, yields results with an accuracy of isotope ratios up to 100 ppm for a signal-to-noise ratio (SNR) of 10^4. We show that the accuracy of isotope ratios is in fact proportional to SNR^-1. Furthermore, we observe that the accuracy of isotope ratios is inversely proportional to the mass resolution. Additionally, we show that the accuracy of isotope ratios depends on the sample width T_s as T_s^0.5. Copyright © 2017 John Wiley & Sons, Ltd.
Safety and efficacy of Cerebrolysin in early post-stroke recovery: a meta-analysis of nine randomized clinical trials.

PubMed

Bornstein, Natan M; Guekht, Alla; Vester, Johannes; Heiss, Wolf-Dieter; Gusev, Eugene; Hömberg, Volker; Rahlfs, Volker W; Bajenaru, Ovidiu; Popescu, Bogdan O; Muresanu, Dafin

2018-04-01

This meta-analysis combines the results of nine ischemic stroke trials, assessing efficacy of Cerebrolysin on global neurological improvement during early post-stroke period. Cerebrolysin is a parenterally administered neuropeptide preparation approved for treatment of stroke. All included studies had a prospective, randomized, double-blind, placebo-controlled design. The patients were treated with 30-50 ml Cerebrolysin once daily for 10-21 days, with treatment initiation within 72 h after onset of ischemic stroke. For five studies, original analysis data were available for meta-analysis (individual patient data analysis); for four studies, aggregate data were used. The combination by meta-analytic procedures was pre-planned and the methods of synthesis were pre-defined under blinded conditions. Search deadline for the present meta-analysis was December 31, 2016. The nonparametric Mann-Whitney (MW) effect size for National Institutes of Health Stroke Scale (NIHSS) on day 30 (or 21), combining the results of nine randomized, controlled trials by means of the robust Wei-Lachin pooling procedure (maximin-efficient robust test), indicated superiority of Cerebrolysin as compared with placebo (MW 0.60, P < 0.0001, N = 1879). The combined number needed to treat for clinically relevant changes in early NIHSS was 7.7 (95% CI 5.2 to 15.0). The additional full-scale ordinal analysis of modified Rankin Scale at day 90 in moderate to severe patients resulted in MW 0.61 with statistical significance in favor of Cerebrolysin (95% CI 0.52 to 0.69, P = 0.0118, N = 314). Safety aspects were comparable to placebo. Our meta-analysis confirms previous evidence that Cerebrolysin has a beneficial effect on early global neurological deficits in patients with acute ischemic stroke.
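The Mann-Whitney effect size quoted in the abstract above can be illustrated with a minimal sketch. It assumes the usual definition, P(X < Y) + 0.5 P(X = Y), estimated over all pairs of observations from two groups, so that 0.5 means no difference. The function name and the direction convention are illustrative assumptions, and the Wei-Lachin pooling across trials is not shown.

```python
def mw_effect_size(x, y):
    """Estimate P(X < Y) + 0.5 * P(X == Y) over all pairs (x_i, y_j).

    A value of 0.5 means neither group tends to have larger values.
    """
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    wins = 0.0
    for xi in x:
        for yj in y:
            if xi < yj:
                wins += 1.0   # y_j is strictly larger than x_i
            elif xi == yj:
                wins += 0.5   # ties count as half a win
    return wins / (len(x) * len(y))


# Hypothetical ordinal scores, not data from the meta-analysis.
group_a = [2, 4, 4, 6, 9]
group_b = [3, 5, 7, 8, 9]
print(round(mw_effect_size(group_a, group_b), 3))
```

Because the measure depends only on the ordering of the observations, it suits ordinal scales such as the NIHSS and is insensitive to extreme values.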
Objective definition of rosette shape variation using a combined computer vision and data mining approach.

PubMed

Camargo, Anyela; Papadopoulou, Dimitra; Spyropoulou, Zoi; Vlachonasios, Konstantinos; Doonan, John H; Gay, Alan P

2014-01-01

Computer-vision based measurements of phenotypic variation have implications for crop improvement and food security because they are intrinsically objective. It should be possible therefore to use such approaches to select robust genotypes. However, plants are morphologically complex and identification of meaningful traits from automatically acquired image data is not straightforward. Bespoke algorithms can be designed to capture and/or quantitate specific features but this approach is inflexible and is not generally applicable to a wide range of traits. In this paper, we have used industry-standard computer vision techniques to extract a wide range of features from images of genetically diverse Arabidopsis rosettes growing under non-stimulated conditions, and then used statistical analysis to identify those features that provide good discrimination between ecotypes. This analysis indicates that almost all the observed shape variation can be described by 5 principal components. We describe an easily implemented pipeline including image segmentation, feature extraction and statistical analysis. This pipeline provides a cost-effective and inherently scalable method to parameterise and analyse variation in rosette shape. The acquisition of images does not require any specialised equipment and the computer routines for image processing and data analysis have been implemented using open source software. Source code for data analysis is written using the R package. The equations to calculate image descriptors have been also provided.

Topologically protected modes in non-equilibrium stochastic systems.

PubMed

Murugan, Arvind; Vaikuntanathan, Suriyanarayanan

2017-01-10

Non-equilibrium driving of biophysical processes is believed to enable their robust functioning despite the presence of thermal fluctuations and other sources of disorder. Such robust functions include sensory adaptation, enhanced enzymatic specificity and maintenance of coherent oscillations. Elucidating the relation between energy consumption and organization remains an important and open question in non-equilibrium statistical mechanics. Here we report that steady states of systems with non-equilibrium fluxes can support topologically protected boundary modes that resemble similar modes in electronic and mechanical systems. Akin to their electronic and mechanical counterparts, topological-protected boundary steady states in non-equilibrium systems are robust and are largely insensitive to local perturbations. We argue that our work provides a framework for how biophysical systems can use non-equilibrium driving to achieve robust function.
Deformable image registration as a tool to improve survival prediction after neoadjuvant chemotherapy for breast cancer: results from the ACRIN 6657/I-SPY-1 trial

NASA Astrophysics Data System (ADS)

Jahani, Nariman; Cohen, Eric; Hsieh, Meng-Kang; Weinstein, Susan P.; Pantalone, Lauren; Davatzikos, Christos; Kontos, Despina

2018-02-01

We examined the ability of DCE-MRI longitudinal features to give early prediction of recurrence-free survival (RFS) in women undergoing neoadjuvant chemotherapy for breast cancer, in a retrospective analysis of 106 women from the ISPY 1 cohort. These features were based on the voxel-wise changes seen in registered images taken before treatment and after the first round of chemotherapy. We computed the transformation field using a robust deformable image registration technique to match breast images from these two visits. Using the deformation field, parametric response maps (PRM) — a voxel-based feature analysis of longitudinal changes in images between visits — was computed for maps of four kinetic features (signal enhancement ratio, peak enhancement, and wash-in/wash-out slopes). A two-level discrete wavelet transform was applied to these PRMs to extract heterogeneity information about tumor change between visits. To estimate survival, a Cox proportional hazard model was applied with the C statistic as the measure of success in predicting RFS. The best PRM feature (as determined by C statistic in univariable analysis) was determined for each of the four kinetic features. The baseline model, incorporating functional tumor volume, age, race, and hormone response status, had a C statistic of 0.70 in predicting RFS. The model augmented with the four PRM features had a C statistic of 0.76. Thus, our results suggest that adding information on the texture of voxel-level changes in tumor kinetic response between registered images of first and second visits could improve early RFS prediction in breast cancer after neoadjuvant chemotherapy.
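As a rough illustration of the C statistic used above to score survival models, the sketch below computes Harrell's concordance index from follow-up times, event indicators, and model risk scores. The abstract does not say which variant of the C statistic was used, so this definition, the function name, and the toy inputs are assumptions; tied follow-up times are simply treated as not comparable.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs ranked correctly by risk.

    A pair (i, j) is comparable when subject i had an observed event
    before subject j's follow-up time. It is concordant when i, who
    failed earlier, also has the higher risk score.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up data: months to recurrence, event flag, risk score.
times = [5, 12, 20, 30, 41]
events = [1, 1, 0, 1, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.1]
print(round(concordance_index(times, events, scores), 3))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported values of 0.70 and 0.76 are read.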
Novel Methods for Analysing Bacterial Tracks Reveal Persistence in Rhodobacter sphaeroides

PubMed Central

Rosser, Gabriel; Fletcher, Alexander G.; Wilkinson, David A.; de Beyer, Jennifer A.; Yates, Christian A.; Armitage, Judith P.; Maini, Philip K.; Baker, Ruth E.

2013-01-01

Tracking bacteria using video microscopy is a powerful experimental approach to probe their motile behaviour. The trajectories obtained contain much information relating to the complex patterns of bacterial motility. However, methods for the quantitative analysis of such data are limited. Most swimming bacteria move in approximately straight lines, interspersed with random reorientation phases. It is therefore necessary to segment observed tracks into swimming and reorientation phases to extract useful statistics. We present novel robust analysis tools to discern these two phases in tracks. Our methods comprise a simple and effective protocol for removing spurious tracks from tracking datasets, followed by analysis based on a two-state hidden Markov model, taking advantage of the availability of mutant strains that exhibit swimming-only or reorientating-only motion to generate an empirical prior distribution. Using simulated tracks with varying levels of added noise, we validate our methods and compare them with an existing heuristic method. To our knowledge this is the first example of a systematic assessment of analysis methods in this field. The new methods are substantially more robust to noise and introduce less systematic bias than the heuristic method. We apply our methods to tracks obtained from the bacterial species Rhodobacter sphaeroides and Escherichia coli. Our results demonstrate that R. sphaeroides exhibits persistence over the course of a tumbling event, which is a novel result with important implications in the study of this and similar species. PMID:24204227
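The segmentation of tracks into swimming and reorientation phases with a two-state hidden Markov model can be sketched with a plain Viterbi decoder. This is a minimal sketch under stated assumptions, not the authors' method: the state names, the discretised "fast"/"slow" observations, and all probabilities are hypothetical, whereas the abstract describes empirical priors derived from mutant strains.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most probable state sequence for the observations."""
    # best[s]: log-probability of the best path that ends in state s
    best = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    back = []
    for o in obs[1:]:
        prev, best, pointers = best, {}, {}
        for s in states:
            # best predecessor of s at this step
            p = max(states, key=lambda r: prev[r] + log_trans[r][s])
            best[s] = prev[p] + log_trans[p][s] + log_emit[s][o]
            pointers[s] = p
        back.append(pointers)
    # trace back from the best final state
    path = [max(states, key=lambda s: best[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def logs(table):
    """Convert a nested dict of probabilities to log-probabilities."""
    return {k: {j: math.log(v) for j, v in row.items()} for k, row in table.items()}


# Hypothetical model: sticky states, noisy speed observations.
STATES = ("run", "stop")
LOG_START = {"run": math.log(0.5), "stop": math.log(0.5)}
LOG_TRANS = logs({"run": {"run": 0.9, "stop": 0.1},
                  "stop": {"run": 0.1, "stop": 0.9}})
LOG_EMIT = logs({"run": {"fast": 0.9, "slow": 0.1},
                 "stop": {"fast": 0.1, "slow": 0.9}})

track = ["fast", "fast", "slow", "slow", "slow", "fast", "fast"]
print(viterbi(track, STATES, LOG_START, LOG_TRANS, LOG_EMIT))
```

With these toy parameters a sustained slow stretch is labelled as a stop, while a single slow frame inside a fast run is absorbed into the run, which is the kind of noise tolerance a threshold on instantaneous speed lacks.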
Advances in computer simulation of genome evolution: toward more realistic evolutionary genomics analysis by approximate bayesian computation.

PubMed

Arenas, Miguel

2015-04-01

NGS technologies present a fast and cheap generation of genomic data. Nevertheless, ancestral genome inference is not so straightforward due to complex evolutionary processes acting on this material such as inversions, translocations, and other genome rearrangements that, in addition to their implicit complexity, can co-occur and confound ancestral inferences. Recently, models of genome evolution that accommodate such complex genomic events are emerging. This letter explores these novel evolutionary models and proposes their incorporation into robust statistical approaches based on computer simulations, such as approximate Bayesian computation, that may produce a more realistic evolutionary analysis of genomic data. Advantages and pitfalls in using these analytical methods are discussed. Potential applications of these ancestral genomic inferences are also pointed out.
DIGE Analysis of Human Tissues.

PubMed

Gelfi, Cecilia; Capitanio, Daniele

2018-01-01

Two-dimensional difference gel electrophoresis (2-D DIGE) is an advanced and elegant gel electrophoretic analytical tool for comparative protein assessment. It is based on two-dimensional gel electrophoresis (2-DE) separation of fluorescently labeled protein extracts. The tagging procedures are designed to not interfere with the chemical properties of proteins with respect to their pI and electrophoretic mobility, once a proper labeling protocol is followed. The two-dye or three-dye systems can be adopted and their choice depends on specific applications. Furthermore, the use of an internal pooled standard makes 2-D DIGE a highly accurate quantitative method enabling multiple protein samples to be separated on the same two-dimensional gel. The image matching and cross-gel statistical analysis generates robust quantitative results making data validation by independent technologies successful.
Statistical analysis and machine learning algorithms for optical biopsy

NASA Astrophysics Data System (ADS)

Wu, Binlin; Liu, Cheng-hui; Boydston-White, Susie; Beckman, Hugh; Sriramoju, Vidyasagar; Sordillo, Laura; Zhang, Chunyuan; Zhang, Lin; Shi, Lingyan; Smith, Jason; Bailin, Jacob; Alfano, Robert R.

2018-02-01

Analyzing spectral or imaging data collected with various optical biopsy methods is often difficult due to the complexity of the biological basis. Robust methods that can utilize the spectral or imaging data and detect the characteristic spectral or spatial signatures for different types of tissue are challenging to develop but highly desired. In this study, we used various machine learning algorithms to analyze a spectral dataset acquired from normal and cancerous human skin tissue samples using resonance Raman spectroscopy with 532 nm excitation. The algorithms, including principal component analysis, nonnegative matrix factorization, and an autoencoder artificial neural network, are used to reduce the dimension of the dataset and detect features. A support vector machine with a linear kernel is used to classify the normal tissue and cancerous tissue samples. The efficacies of the methods are compared.
The power and limits of a rule-based morpho-semantic parser.

PubMed Central

Baud, R. H.; Rassinoux, A. M.; Ruch, P.; Lovis, C.; Scherrer, J. R.

1999-01-01

The advent of the Electronic Patient Record (EPR) implies an increasing amount of medical texts readily available for processing, as soon as convenient tools are made available. The chief application is text analysis, from which one can derive other disciplines like indexing for retrieval, knowledge representation, translation and inferencing for medical intelligent systems. Prerequisites for a convenient analyzer of medical texts are: building the lexicon, developing semantic representation of the domain, having a large corpus of texts available for statistical analysis, and finally mastering robust and powerful parsing techniques in order to satisfy the constraints of the medical domain. This article aims at presenting an easy-to-use parser ready to be adapted in different settings. It describes its power together with its practical limitations as experienced by the authors. PMID:10566313
The power and limits of a rule-based morpho-semantic parser.

PubMed

Baud, R H; Rassinoux, A M; Ruch, P; Lovis, C; Scherrer, J R

1999-01-01

The advent of the Electronic Patient Record (EPR) implies an increasing amount of medical texts readily available for processing, as soon as convenient tools are made available. The chief application is text analysis, from which one can derive other disciplines like indexing for retrieval, knowledge representation, translation and inferencing for medical intelligent systems. Prerequisites for a convenient analyzer of medical texts are: building the lexicon, developing semantic representation of the domain, having a large corpus of texts available for statistical analysis, and finally mastering robust and powerful parsing techniques in order to satisfy the constraints of the medical domain. This article aims at presenting an easy-to-use parser ready to be adapted in different settings. It describes its power together with its practical limitations as experienced by the authors.
Implementing informative priors for heterogeneity in meta-analysis using meta-regression and pseudo data.

PubMed

Rhodes, Kirsty M; Turner, Rebecca M; White, Ian R; Jackson, Dan; Spiegelhalter, David J; Higgins, Julian P T

2016-12-20

Many meta-analyses combine results from only a small number of studies, a situation in which the between-study variance is imprecisely estimated when standard methods are applied. Bayesian meta-analysis allows incorporation of external evidence on heterogeneity, providing the potential for more robust inference on the effect size of interest. We present a method for performing Bayesian meta-analysis using data augmentation, in which we represent an informative conjugate prior for between-study variance by pseudo data and use meta-regression for estimation. To assist in this, we derive predictive inverse-gamma distributions for the between-study variance expected in future meta-analyses. These may serve as priors for heterogeneity in new meta-analyses. In a simulation study, we compare approximate Bayesian methods using meta-regression and pseudo data against fully Bayesian approaches based on importance sampling techniques and Markov chain Monte Carlo (MCMC). We compare the frequentist properties of these Bayesian methods with those of the commonly used frequentist DerSimonian and Laird procedure. The method is implemented in standard statistical software and provides a less complex alternative to standard MCMC approaches. An importance sampling approach produces almost identical results to standard MCMC approaches, and results obtained through meta-regression and pseudo data are very similar. On average, data augmentation provides closer results to MCMC, if implemented using restricted maximum likelihood estimation rather than DerSimonian and Laird or maximum likelihood estimation. The methods are applied to real datasets, and an extension to network meta-analysis is described. The proposed method facilitates Bayesian meta-analysis in a way that is accessible to applied researchers. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
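For orientation, the frequentist DerSimonian and Laird procedure that the abstract above uses as a comparator can be sketched as follows. This shows only the standard moment estimator of the between-study variance and the resulting random-effects pooled estimate; it is not the Bayesian data-augmentation method the paper proposes, and the function name and example numbers are hypothetical.

```python
def dersimonian_laird(effects, variances):
    """Return (tau2, pooled_effect, standard_error) for a random-effects model."""
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies with matching variances")
    # Fixed-effect (inverse-variance) weights and pooled mean
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q heterogeneity statistic
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    # Moment estimator of the between-study variance, truncated at zero
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights add tau2 to each within-study variance
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = (1.0 / sum(w_star)) ** 0.5
    return tau2, pooled, se


# Hypothetical log odds ratios and within-study variances from three studies.
tau2, pooled, se = dersimonian_laird([0.10, 0.45, 0.80], [0.04, 0.05, 0.06])
print(round(tau2, 4), round(pooled, 4), round(se, 4))
```

With only a few studies Q is itself very noisy, so tau2 is poorly determined; that imprecision is the motivation the abstract gives for bringing in external evidence on heterogeneity.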
Substituting values for censored data from Texas, USA, reservoirs inflated and obscured trends in analyses commonly used for water quality target development.

PubMed

Grantz, Erin; Haggard, Brian; Scott, J Thad

2018-06-12

We calculated four median datasets (chlorophyll a, Chl a; total phosphorus, TP; and transparency) using multiple approaches to handling censored observations, including substituting fractions of the quantification limit (QL; dataset 1 = 1QL, dataset 2 = 0.5QL) and statistical methods for censored datasets (datasets 3-4) for approximately 100 Texas, USA reservoirs. Trend analyses of differences between dataset 1 and 3 medians indicated percent difference increased linearly above thresholds in percent censored data (%Cen). This relationship was extrapolated to estimate medians for site-parameter combinations with %Cen > 80%, which were combined with dataset 3 as dataset 4. Changepoint analysis of Chl a- and transparency-TP relationships indicated threshold differences up to 50% between datasets. Recursive analysis identified secondary thresholds in dataset 4. Threshold differences show that information introduced via substitution or missing due to limitations of statistical methods biased values, underestimated error, and inflated the strength of TP thresholds identified in datasets 1-3. Analysis of covariance identified differences in linear regression models relating transparency-TP between datasets 1, 2, and the more statistically robust datasets 3-4. Study findings identify high-risk scenarios for biased analytical outcomes when using substitution. These include high probability of median overestimation when %Cen > 50-60% for a single QL, or when %Cen is as low as 16% for multiple QLs. Changepoint analysis was uniquely vulnerable to substitution effects when using medians from sites with %Cen > 50%. Linear regression analysis was less sensitive to substitution and missing data effects, but differences in model parameters for transparency cannot be discounted and could be magnified by log-transformation of the variables.
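The effect of substitution on a site median, as described above, can be seen in a small sketch. It assumes a single quantification limit and hypothetical concentrations; the function name and the representation of censored observations as `None` are illustrative choices, and the statistical methods for censored data used for datasets 3-4 are not shown.

```python
from statistics import median


def substituted_median(observations, ql, fraction):
    """Median after replacing censored observations with fraction * QL.

    observations: measured values, with None marking a value below the QL.
    """
    filled = [fraction * ql if v is None else v for v in observations]
    return median(filled)


QL = 10.0  # hypothetical quantification limit

# 4 of 7 observations censored (about 57 %Cen): the median is a substituted value.
heavy = [None, None, None, None, 12.0, 15.0, 20.0]
print(substituted_median(heavy, QL, 1.0))   # substitution at 1QL
print(substituted_median(heavy, QL, 0.5))   # substitution at 0.5QL

# 2 of 7 observations censored (about 29 %Cen): the median is a measured value.
light = [None, None, 12.0, 15.0, 20.0, 25.0, 30.0]
print(substituted_median(light, QL, 1.0))
print(substituted_median(light, QL, 0.5))
```

Once more than half of the observations are censored, the reported median is just the substituted constant, so the choice between 1QL and 0.5QL fixes the result regardless of the measured values.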
Food Choice Questionnaire (FCQ) revisited. Suggestions for the development of an enhanced general food motivation model.

PubMed

Fotopoulos, Christos; Krystallis, Athanasios; Vassallo, Marco; Pagiaslis, Anastasios

2009-02-01

Recognising the need for a more statistically robust instrument to investigate general food selection determinants, the research validates and confirms the Food Choice Questionnaire's (FCQ) factorial design, develops ad hoc a more robust FCQ version and tests its ability to discriminate between consumer segments in terms of the importance they assign to the FCQ motivational factors. The original FCQ appears to represent a comprehensive and reliable research instrument. However, the empirical data do not support the robustness of its 9-factorial design. On the other hand, segmentation results at the subpopulation level based on the enhanced FCQ version bring about an optimistic message for the FCQ's ability to predict food selection behaviour. The paper concludes that some of the basic components of the original FCQ can be used as a basis for a new general food motivation typology. The development of such a new instrument, with fewer FCQ-based dimensions of higher abstraction and fewer items per dimension, is a right step forward; yet such a step should be theory-driven, while rigorous statistical testing across and within populations would be necessary.
A spatially informative optic flow model of bee colony with saccadic flight strategy for global optimization.

PubMed

Das, Swagatam; Biswas, Subhodip; Panigrahi, Bijaya K; Kundu, Souvik; Basu, Debabrota

2014-10-01

This paper presents a novel search metaheuristic inspired from the physical interpretation of the optic flow of information in honeybees about the spatial surroundings that help them orient themselves and navigate through search space while foraging. The interpreted behavior combined with the minimal foraging is simulated by the artificial bee colony algorithm to develop a robust search technique that exhibits elevated performance in multidimensional objective space. Through detailed experimental study and rigorous analysis, we highlight the statistical superiority enjoyed by our algorithm over a wide variety of functions as compared to some highly competitive state-of-the-art methods.
A microscopic model of the Stokes-Einstein relation in arbitrary dimension.

PubMed

Charbonneau, Benoit; Charbonneau, Patrick; Szamel, Grzegorz

2018-06-14

The Stokes-Einstein relation (SER) is one of the most robust and widely employed results from the theory of liquids. Yet sizable deviations can be observed for self-solvation, which cannot be explained by the standard hydrodynamic derivation. Here, we revisit the work of Masters and Madden [J. Chem. Phys. 74, 2450-2459 (1981)], who first solved a statistical mechanics model of the SER using the projection operator formalism. By generalizing their analysis to all spatial dimensions and to partially structured solvents, we identify a potential microscopic origin of some of these deviations. We also reproduce the SER-like result from the exact dynamics of infinite-dimensional fluids.
A stochastic simulation method for the assessment of resistive random access memory retention reliability

DOE Office of Scientific and Technical Information (OSTI.GOV)

Berco, Dan; Tseng, Tseung-Yuen

This study presents an evaluation method for resistive random access memory retention reliability based on the Metropolis Monte Carlo algorithm and Gibbs free energy. The method, which does not rely on a time evolution, provides an extremely efficient way to compare the relative retention properties of metal-insulator-metal structures. It requires a small number of iterations and may be used for statistical analysis. The presented approach is used to compare the relative robustness of a single layer ZrO2 device with a double layer ZnO/ZrO2 one, and obtain results which are in good agreement with experimental data.
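The core of a Metropolis Monte Carlo step driven by a free-energy change can be sketched as below. The abstract does not describe the device model, so this shows only the generic acceptance rule; the function names, the use of electronvolts, and the example energies are assumptions made for illustration.

```python
import math
import random

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def metropolis_accept(delta_g_ev, temperature_k, rng):
    """Accept a trial move with probability min(1, exp(-dG / (k_B * T)))."""
    if delta_g_ev <= 0.0:
        return True  # moves that lower the free energy are always accepted
    return rng.random() < math.exp(-delta_g_ev / (K_B_EV * temperature_k))


def acceptance_rate(delta_g_ev, temperature_k, trials=10000, seed=0):
    """Fraction of trial moves accepted for a fixed free-energy change."""
    rng = random.Random(seed)
    accepted = sum(metropolis_accept(delta_g_ev, temperature_k, rng)
                   for _ in range(trials))
    return accepted / trials


# Hypothetical barriers: a higher free-energy cost is accepted less often.
for barrier in (0.01, 0.05, 0.10):
    print(barrier, acceptance_rate(barrier, 300.0))
```

Comparing acceptance rates for the moves that degrade a stored state gives a relative ranking of configurations without simulating their time evolution, which is the kind of comparison the abstract describes.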
Immunophenotype Discovery, Hierarchical Organization, and Template-Based Classification of Flow Cytometry Samples

DOE PAGES

Azad, Ariful; Rajwa, Bartek; Pothen, Alex

2016-08-31

We describe algorithms for discovering immunophenotypes from large collections of flow cytometry samples and using them to organize the samples into a hierarchy based on phenotypic similarity. The hierarchical organization is helpful for effective and robust cytometry data mining, including the creation of collections of cell populations characteristic of different classes of samples, robust classification, and anomaly detection. We summarize a set of samples belonging to a biological class or category with a statistically derived template for the class. Whereas individual samples are represented in terms of their cell populations (clusters), a template consists of generic meta-populations (a group of homogeneous cell populations obtained from the samples in a class) that describe key phenotypes shared among all those samples. We organize an FC data collection in a hierarchical data structure that supports the identification of immunophenotypes relevant to clinical diagnosis. A robust template-based classification scheme is also developed, but our primary focus is in the discovery of phenotypic signatures and inter-sample relationships in an FC data collection. This collective analysis approach is more efficient and robust since templates describe phenotypic signatures common to cell populations in several samples while ignoring noise and small sample-specific variations. We have applied the template-based scheme to analyze several datasets, including one representing a healthy immune system and one of acute myeloid leukemia (AML) samples. The last task is challenging due to the phenotypic heterogeneity of the several subtypes of AML. However, we identified thirteen immunophenotypes corresponding to subtypes of AML and were able to distinguish acute promyelocytic leukemia (APL) samples with the markers provided. Clinically, this is helpful since APL has a different treatment regimen from other subtypes of AML. Core algorithms used in our data analysis are available in the flowMatch package at www.bioconductor.org. It has been downloaded nearly 6,000 times since 2014.
Immunophenotype Discovery, Hierarchical Organization, and Template-Based Classification of Flow Cytometry Samples

DOE Office of Scientific and Technical Information (OSTI.GOV)

Azad, Ariful; Rajwa, Bartek; Pothen, Alex

We describe algorithms for discovering immunophenotypes from large collections of flow cytometry samples and using them to organize the samples into a hierarchy based on phenotypic similarity. The hierarchical organization is helpful for effective and robust cytometry data mining, including the creation of collections of cell populations characteristic of different classes of samples, robust classification, and anomaly detection. We summarize a set of samples belonging to a biological class or category with a statistically derived template for the class. Whereas individual samples are represented in terms of their cell populations (clusters), a template consists of generic meta-populations (a group of homogeneous cell populations obtained from the samples in a class) that describe key phenotypes shared among all those samples. We organize an FC data collection in a hierarchical data structure that supports the identification of immunophenotypes relevant to clinical diagnosis. A robust template-based classification scheme is also developed, but our primary focus is in the discovery of phenotypic signatures and inter-sample relationships in an FC data collection. This collective analysis approach is more efficient and robust since templates describe phenotypic signatures common to cell populations in several samples while ignoring noise and small sample-specific variations. We have applied the template-based scheme to analyze several datasets, including one representing a healthy immune system and one of acute myeloid leukemia (AML) samples. The last task is challenging due to the phenotypic heterogeneity of the several subtypes of AML. However, we identified thirteen immunophenotypes corresponding to subtypes of AML and were able to distinguish acute promyelocytic leukemia (APL) samples with the markers provided. Clinically, this is helpful since APL has a different treatment regimen from other subtypes of AML. Core algorithms used in our data analysis are available in the flowMatch package at www.bioconductor.org. It has been downloaded nearly 6,000 times since 2014.
Neural network uncertainty assessment using Bayesian statistics: a remote sensing application

NASA Technical Reports Server (NTRS)

Aires, F.; Prigent, C.; Rossow, W. B.

2004-01-01

Neural network (NN) techniques have proved successful for many regression problems, in particular for remote sensing; however, uncertainty estimates are rarely provided. In this article, a Bayesian technique to evaluate uncertainties of the NN parameters (i.e., synaptic weights) is first presented. In contrast to more traditional approaches based on point estimation of the NN weights, we assess uncertainties on such estimates to monitor the robustness of the NN model. These theoretical developments are illustrated by applying them to the problem of retrieving surface skin temperature, microwave surface emissivities, and integrated water vapor content from a combined analysis of satellite microwave and infrared observations over land. The weight uncertainty estimates are then used to compute analytically the uncertainties in the network outputs (i.e., error bars and correlation structure of these errors). Such quantities are very important for evaluating any application of an NN model. The uncertainties on the NN Jacobians are then considered in the third part of this article. Used for regression fitting, NN models can effectively represent highly nonlinear, multivariate functions. In this situation, most emphasis is put on estimating the output errors, but almost no attention has been given to errors associated with the internal structure of the regression model. The complex structure of dependency inside the NN is the essence of the model, and assessing its quality, coherency, and physical character makes all the difference between a black-box model with small output errors and a reliable, robust, and physically coherent model. Such dependency structures are described to the first order by the NN Jacobians: they indicate the sensitivity of one output with respect to the inputs of the model for given input data. We use a Monte Carlo integration procedure to estimate the robustness of the NN Jacobians.
A regularization strategy based on principal component analysis is proposed to suppress the multicollinearities in order to make these Jacobians robust and physically meaningful.
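The idea of estimating Jacobian robustness by Monte Carlo sampling can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes a toy one-hidden-layer tanh network with a single output, and it assumes independent Gaussian perturbations of each weight with a common standard deviation (`weight_sd`) in place of a Bayesian posterior over the weights. All function and parameter names are hypothetical.

```python
import math
import random
import statistics


def jacobian(w_hidden, w_out, x):
    """Analytic d(output)/d(x_i) for y = sum_j w_out[j] * tanh(w_hidden[j] . x)."""
    grads = [0.0] * len(x)
    for w_row, v in zip(w_hidden, w_out):
        activation = sum(w * xi for w, xi in zip(w_row, x))
        slope = v * (1.0 - math.tanh(activation) ** 2)  # derivative of tanh
        for i, w in enumerate(w_row):
            grads[i] += slope * w
    return grads


def mc_jacobian_stats(w_hidden, w_out, x, weight_sd, n_draws=2000, seed=0):
    """Mean and standard deviation of each Jacobian entry over sampled weights.

    Assumption: weights are drawn independently from a Gaussian centred on
    the point estimate; a real posterior would generally be correlated.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        wh = [[rng.gauss(w, weight_sd) for w in row] for row in w_hidden]
        wo = [rng.gauss(v, weight_sd) for v in w_out]
        draws.append(jacobian(wh, wo, x))
    columns = list(zip(*draws))  # one tuple of samples per input variable
    means = [statistics.fmean(c) for c in columns]
    sds = [statistics.stdev(c) for c in columns]
    return means, sds
```

A large standard deviation relative to the mean for a given entry would suggest that the sensitivity of the output to that input is poorly determined by the weights, which is the kind of instability the abstract describes.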
Low power and type II errors in recent ophthalmology research.

PubMed

Khan, Zainab; Milko, Jordan; Iqbal, Munir; Masri, Moness; Almeida, David R P

2016-10-01

To investigate the power of unpaired t tests in prospective, randomized controlled trials when these tests failed to detect a statistically significant difference and to determine the frequency of type II errors. Systematic review and meta-analysis. We examined all prospective, randomized controlled trials published between 2010 and 2012 in 4 major ophthalmology journals (Archives of Ophthalmology, British Journal of Ophthalmology, Ophthalmology, and American Journal of Ophthalmology). Studies that used unpaired t tests were included. Power was calculated using the number of subjects in each group, standard deviations, and α = 0.05. The difference between control and experimental means was set to be (1) 20% and (2) 50% of the absolute value of the control's initial conditions. Power and Precision version 4.0 software was used to carry out calculations. Finally, the proportion of articles with type II errors was calculated. β = 0.3 was set as the largest acceptable value for the probability of type II errors. In total, 280 articles were screened. Final analysis included 50 prospective, randomized controlled trials using unpaired t tests. The median power of tests to detect a 50% difference between means was 0.9 and was the same for all 4 journals regardless of the statistical significance of the test. The median power of tests to detect a 20% difference between means ranged from 0.26 to 0.9 for the 4 journals. The median power of these tests to detect a 50% and 20% difference between means was 0.9 and 0.5, respectively, for tests that did not achieve statistical significance. A total of 14% and 57% of articles with negative unpaired t tests contained results with β > 0.3 when power was calculated for differences between means of 50% and 20%, respectively. A large portion of studies demonstrate high probabilities of type II errors when detecting small differences between means. The power to detect small differences between means varies across journals.
It is, therefore, worthwhile for authors to mention the minimum clinically important difference for individual studies. Journals can consider publishing statistical guidelines for authors to use. Day-to-day clinical decisions rely heavily on the evidence base formed by the plethora of studies available to clinicians. Prospective, randomized controlled clinical trials are highly regarded as a robust study design and are used to make important clinical decisions that directly affect patient care. The quality of study designs and statistical methods in major clinical journals is improving over time, and researchers and journals are being more attentive to statistical methodologies incorporated by studies. The results of well-designed ophthalmic studies with robust methodologies, therefore, have the ability to modify the ways in which diseases are managed. Copyright © 2016 Canadian Ophthalmological Society. Published by Elsevier Inc. All rights reserved.
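The power calculation described above (group sizes, standard deviations, α = 0.05, and a mean difference set to 20% or 50% of the control value) can be sketched roughly as follows. This is not the Power and Precision software used in the study. It assumes a two-sided test and replaces the noncentral t distribution with a normal approximation, which tends to overstate power slightly for small groups. The function name and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power_unpaired(mean_diff, sd1, sd2, n1, n2, alpha=0.05):
    """Approximate power of a two-sided unpaired test for a given true difference.

    Assumption: normal approximation to the test statistic, so the result
    is only indicative when group sizes are small.
    """
    std_normal = NormalDist()
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)       # standard error of the difference
    z_crit = std_normal.inv_cdf(1 - alpha / 2)     # two-sided critical value
    shift = abs(mean_diff) / se                    # standardized true difference
    # Probability of rejecting in either tail when the true shift is `shift`.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical trial: control mean 10, SD 4 in both arms, 30 subjects per arm.
control_mean = 10.0
power_20 = approx_power_unpaired(0.20 * control_mean, 4.0, 4.0, 30, 30)
power_50 = approx_power_unpaired(0.50 * control_mean, 4.0, 4.0, 30, 30)
beta_20 = 1 - power_20  # probability of a type II error for the 20% difference
```

With these made-up inputs the type II error probability for the 20% difference exceeds the β = 0.3 threshold used in the abstract, while the 50% difference is detected almost always, which mirrors the pattern the authors report.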
New approach in the quantum statistical parton distribution

NASA Astrophysics Data System (ADS)

Sohaily, Sozha; Vaziri (Khamedi), Mohammad

2017-12-01

An attempt to find simple parton distribution functions (PDFs) based on a quantum statistical approach is presented. The PDFs described by the statistical model have very interesting physical properties which help to understand the structure of partons. The longitudinal portion of the distribution functions is given by applying the maximum entropy principle. An interesting and simple approach to determine the statistical variables exactly without fitting and fixing parameters is surveyed. Analytic expressions of the x-dependent PDFs are obtained in the whole x region [0, 1], and the computed distributions are consistent with the experimental observations. The agreement with experimental data gives robust confirmation of our simple statistical model.
Differentiation of wines according to grape variety and geographical origin based on volatiles profiling using SPME-MS and SPME-GC/MS methods.

PubMed

Ziółkowska, Angelika; Wąsowicz, Erwin; Jeleń, Henryk H

2016-12-15

Among methods to detect wine adulteration, profiling volatiles is one with great potential regarding robustness, analysis time and abundance of information for subsequent data treatment. Volatile fraction fingerprinting by solid-phase microextraction with direct analysis by mass spectrometry without compound separation (SPME-MS) was used for differentiation of white as well as red wines. The aim was to differentiate between varieties used for wine production and to also differentiate wines by country of origin. The results obtained were compared to SPME-GC/MS analysis in which compounds were resolved by gas chromatography. For both approaches the same type of statistical procedure was used to compare samples: principal component analysis (PCA) followed by linear discriminant analysis (LDA). White wines (38) and red wines (41) representing different grape varieties and various regions of origin were analysed. SPME-MS proved to be advantageous in use due to better discrimination and higher sample throughput. Copyright © 2016 Elsevier Ltd. All rights reserved.
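The PCA-then-LDA pipeline named above can be sketched in a few lines. This is only an illustration under strong simplifications, not the authors' workflow: it assumes two classes rather than several varieties or countries, keeps exactly two principal components, finds them by power iteration with deflation, and fits a Fisher discriminant on the resulting 2-D scores. All function names are hypothetical.

```python
import random


def _matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def pca_components(rows, k=2, iters=500, seed=0):
    """Column means and top-k principal axes (power iteration with deflation)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    rng = random.Random(seed)
    axes = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(p)]
        for _ in range(iters):
            w = _matvec(cov, v)
            norm = sum(x * x for x in w) ** 0.5
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, _matvec(cov, v)))  # eigenvalue estimate
        # Deflate: remove the found axis so the next pass finds the next one.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
        axes.append(v)
    return means, axes


def project(row, means, axes):
    """PCA scores of one sample."""
    c = [x - m for x, m in zip(row, means)]
    return [sum(a * b for a, b in zip(axis, c)) for axis in axes]


def fit_lda_2d(scores, labels):
    """Two-class Fisher discriminant on 2-D scores; returns (weights, threshold)."""
    groups = {0: [], 1: []}
    for s, y in zip(scores, labels):
        groups[y].append(s)
    mu = {y: [sum(s[d] for s in g) / len(g) for d in range(2)]
          for y, g in groups.items()}
    sw = [[0.0, 0.0], [0.0, 0.0]]  # pooled within-class scatter
    for y, g in groups.items():
        for s in g:
            d = [s[0] - mu[y][0], s[1] - mu[y][1]]
            for a in range(2):
                for b in range(2):
                    sw[a][b] += d[a] * d[b]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    diff = [mu[1][0] - mu[0][0], mu[1][1] - mu[0][1]]
    # w = Sw^{-1} (mu1 - mu0), using the closed-form 2x2 inverse.
    w = [(sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det,
         (-sw[1][0] * diff[0] + sw[0][0] * diff[1]) / det]
    mid = [(mu[0][d] + mu[1][d]) / 2 for d in range(2)]
    return w, w[0] * mid[0] + w[1] * mid[1]


def predict(score, w, threshold):
    """Class 1 if the discriminant score lies above the midpoint threshold."""
    return 1 if w[0] * score[0] + w[1] * score[1] > threshold else 0
```

In practice each row would be one wine's spectral or chromatographic fingerprint. Reducing dimensionality with PCA first keeps the within-class scatter matrix invertible when there are far more variables than samples, which is the usual reason for chaining the two methods.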
