Sample records for robust statistical model

  1. New robust statistical procedures for the polytomous logistic regression models.

    PubMed

    Castilla, Elena; Ghosh, Abhik; Martin, Nirian; Pardo, Leandro

    2018-05-17

    This article derives a new family of estimators, namely the minimum density power divergence estimators, as a robust generalization of the maximum likelihood estimator for the polytomous logistic regression model. Based on these estimators, a family of Wald-type test statistics for linear hypotheses is introduced. Robustness properties of both the proposed estimators and the test statistics are theoretically studied through the classical influence function analysis. Appropriate real life examples are presented to justify the requirement of suitable robust statistical procedures in place of the likelihood based inference for the polytomous logistic regression model. The validity of the theoretical results established in the article are further confirmed empirically through suitable simulation studies. Finally, an approach for the data-driven selection of the robustness tuning parameter is proposed with empirical justifications. © 2018, The International Biometric Society.

  2. Scaled test statistics and robust standard errors for non-normal data in covariance structure analysis: a Monte Carlo study.

    PubMed

    Chou, C P; Bentler, P M; Satorra, A

    1991-11-01

    Research studying robustness of maximum likelihood (ML) statistics in covariance structure analysis has concluded that test statistics and standard errors are biased under severe non-normality. An estimation procedure known as asymptotic distribution free (ADF), making no distributional assumption, has been suggested to avoid these biases. Corrections to the normal theory statistics to yield more adequate performance have also been proposed. This study compares the performance of a scaled test statistic and robust standard errors for two models under several non-normal conditions and also compares these with the results from ML and ADF methods. Both ML and ADF test statistics performed rather well in one model and considerably worse in the other. In general, the scaled test statistic seemed to behave better than the ML test statistic and the ADF statistic performed the worst. The robust and ADF standard errors yielded more appropriate estimates of sampling variability than the ML standard errors, which were usually downward biased, in both models under most of the non-normal conditions. ML test statistics and standard errors were found to be quite robust to the violation of the normality assumption when data had either symmetric and platykurtic distributions, or non-symmetric and zero kurtotic distributions.

  3. Robust kernel representation with statistical local features for face recognition.

    PubMed

    Yang, Meng; Zhang, Lei; Shiu, Simon Chi-Keung; Zhang, David

    2013-06-01

    Factors such as misalignment, pose variation, and occlusion make robust face recognition a difficult problem. It is known that statistical features such as local binary pattern are effective for local feature extraction, whereas the recently proposed sparse or collaborative representation-based classification has shown interesting results in robust face recognition. In this paper, we propose a novel robust kernel representation model with statistical local features (SLF) for robust face recognition. Initially, multipartition max pooling is used to enhance the invariance of SLF to image registration error. Then, a kernel-based representation model is proposed to fully exploit the discrimination information embedded in the SLF, and robust regression is adopted to effectively handle the occlusion in face images. Extensive experiments are conducted on benchmark face databases, including extended Yale B, AR (A. Martinez and R. Benavente), multiple pose, illumination, and expression (multi-PIE), facial recognition technology (FERET), face recognition grand challenge (FRGC), and labeled faces in the wild (LFW), which have different variations of lighting, expression, pose, and occlusions, demonstrating the promising performance of the proposed method.

  4. Robustness of movement models: can models bridge the gap between temporal scales of data sets and behavioural processes?

    PubMed

    Schlägel, Ulrike E; Lewis, Mark A

    2016-12-01

    Discrete-time random walks and their extensions are common tools for analyzing animal movement data. In these analyses, resolution of temporal discretization is a critical feature. Ideally, a model both mirrors the relevant temporal scale of the biological process of interest and matches the data sampling rate. Challenges arise when resolution of data is too coarse due to technological constraints, or when we wish to extrapolate results or compare results obtained from data with different resolutions. Drawing loosely on the concept of robustness in statistics, we propose a rigorous mathematical framework for studying movement models' robustness against changes in temporal resolution. In this framework, we define varying levels of robustness as formal model properties, focusing on random walk models with spatially-explicit component. With the new framework, we can investigate whether models can validly be applied to data across varying temporal resolutions and how we can account for these different resolutions in statistical inference results. We apply the new framework to movement-based resource selection models, demonstrating both analytical and numerical calculations, as well as a Monte Carlo simulation approach. While exact robustness is rare, the concept of approximate robustness provides a promising new direction for analyzing movement models.

  5. Statistical robustness of machine-learning estimates for characterizing a groundwater-surface water system, Southland, New Zealand

    NASA Astrophysics Data System (ADS)

    Friedel, M. J.; Daughney, C.

    2016-12-01

    The development of a successful surface-groundwater management strategy depends on the quality of data provided for analysis. This study evaluates the statistical robustness when using a modified self-organizing map (MSOM) technique to estimate missing values for three hypersurface models: synoptic groundwater-surface water hydrochemistry, time-series of groundwater-surface water hydrochemistry, and mixed-survey (combination of groundwater-surface water hydrochemistry and lithologies) hydrostratigraphic unit data. These models of increasing complexity are developed and validated based on observations from the Southland region of New Zealand. In each case, the estimation method is sufficiently robust to cope with groundwater-surface water hydrochemistry vagaries due to sample size and extreme data insufficiency, even when >80% of the data are missing. The estimation of surface water hydrochemistry time series values enabled the evaluation of seasonal variation, and the imputation of lithologies facilitated the evaluation of hydrostratigraphic controls on groundwater-surface water interaction. The robust statistical results for groundwater-surface water models of increasing data complexity provide justification to apply the MSOM technique in other regions of New Zealand and abroad.

  6. Model Uncertainty and Robustness: A Computational Framework for Multimodel Analysis

    ERIC Educational Resources Information Center

    Young, Cristobal; Holsteen, Katherine

    2017-01-01

    Model uncertainty is pervasive in social science. A key question is how robust empirical results are to sensible changes in model specification. We present a new approach and applied statistical software for computational multimodel analysis. Our approach proceeds in two steps: First, we estimate the modeling distribution of estimates across all…

  7. Robust estimation approach for blind denoising.

    PubMed

    Rabie, Tamer

    2005-11-01

    This work develops a new robust statistical framework for blind image denoising. Robust statistics addresses the problem of estimation when the idealized assumptions about a system are occasionally violated. The contaminating noise in an image is considered as a violation of the assumption of spatial coherence of the image intensities and is treated as an outlier random variable. A denoised image is estimated by fitting a spatially coherent stationary image model to the available noisy data using a robust estimator-based regression method within an optimal-size adaptive window. The robust formulation aims at eliminating the noise outliers while preserving the edge structures in the restored image. Several examples demonstrating the effectiveness of this robust denoising technique are reported and a comparison with other standard denoising filters is presented.

  8. Robust Mean and Covariance Structure Analysis through Iteratively Reweighted Least Squares.

    ERIC Educational Resources Information Center

    Yuan, Ke-Hai; Bentler, Peter M.

    2000-01-01

    Adapts robust schemes to mean and covariance structures, providing an iteratively reweighted least squares approach to robust structural equation modeling. Each case is weighted according to its distance, based on first and second order moments. Test statistics and standard error estimators are given. (SLD)

  9. Detection of outliers in the response and explanatory variables of the simple circular regression model

    NASA Astrophysics Data System (ADS)

    Mahmood, Ehab A.; Rana, Sohel; Hussin, Abdul Ghapor; Midi, Habshah

    2016-06-01

    The circular regression model may contain one or more data points which appear to be peculiar or inconsistent with the main part of the model. This may be occur due to recording errors, sudden short events, sampling under abnormal conditions etc. The existence of these data points "outliers" in the data set cause lot of problems in the research results and the conclusions. Therefore, we should identify them before applying statistical analysis. In this article, we aim to propose a statistic to identify outliers in the both of the response and explanatory variables of the simple circular regression model. Our proposed statistic is robust circular distance RCDxy and it is justified by the three robust measurements such as proportion of detection outliers, masking and swamping rates.

  10. Model Robust Calibration: Method and Application to Electronically-Scanned Pressure Transducers

    NASA Technical Reports Server (NTRS)

    Walker, Eric L.; Starnes, B. Alden; Birch, Jeffery B.; Mays, James E.

    2010-01-01

    This article presents the application of a recently developed statistical regression method to the controlled instrument calibration problem. The statistical method of Model Robust Regression (MRR), developed by Mays, Birch, and Starnes, is shown to improve instrument calibration by reducing the reliance of the calibration on a predetermined parametric (e.g. polynomial, exponential, logarithmic) model. This is accomplished by allowing fits from the predetermined parametric model to be augmented by a certain portion of a fit to the residuals from the initial regression using a nonparametric (locally parametric) regression technique. The method is demonstrated for the absolute scale calibration of silicon-based pressure transducers.

  11. Evaluating optimal therapy robustness by virtual expansion of a sample population, with a case study in cancer immunotherapy

    PubMed Central

    Barish, Syndi; Ochs, Michael F.; Sontag, Eduardo D.; Gevertz, Jana L.

    2017-01-01

    Cancer is a highly heterogeneous disease, exhibiting spatial and temporal variations that pose challenges for designing robust therapies. Here, we propose the VEPART (Virtual Expansion of Populations for Analyzing Robustness of Therapies) technique as a platform that integrates experimental data, mathematical modeling, and statistical analyses for identifying robust optimal treatment protocols. VEPART begins with time course experimental data for a sample population, and a mathematical model fit to aggregate data from that sample population. Using nonparametric statistics, the sample population is amplified and used to create a large number of virtual populations. At the final step of VEPART, robustness is assessed by identifying and analyzing the optimal therapy (perhaps restricted to a set of clinically realizable protocols) across each virtual population. As proof of concept, we have applied the VEPART method to study the robustness of treatment response in a mouse model of melanoma subject to treatment with immunostimulatory oncolytic viruses and dendritic cell vaccines. Our analysis (i) showed that every scheduling variant of the experimentally used treatment protocol is fragile (nonrobust) and (ii) discovered an alternative region of dosing space (lower oncolytic virus dose, higher dendritic cell dose) for which a robust optimal protocol exists. PMID:28716945

  12. Robustness of fit indices to outliers and leverage observations in structural equation modeling.

    PubMed

    Yuan, Ke-Hai; Zhong, Xiaoling

    2013-06-01

    Normal-distribution-based maximum likelihood (NML) is the most widely used method in structural equation modeling (SEM), although practical data tend to be nonnormally distributed. The effect of nonnormally distributed data or data contamination on the normal-distribution-based likelihood ratio (LR) statistic is well understood due to many analytical and empirical studies. In SEM, fit indices are used as widely as the LR statistic. In addition to NML, robust procedures have been developed for more efficient and less biased parameter estimates with practical data. This article studies the effect of outliers and leverage observations on fit indices following NML and two robust methods. Analysis and empirical results indicate that good leverage observations following NML and one of the robust methods lead most fit indices to give more support to the substantive model. While outliers tend to make a good model superficially bad according to many fit indices following NML, they have little effect on those following the two robust procedures. Implications of the results to data analysis are discussed, and recommendations are provided regarding the use of estimation methods and interpretation of fit indices. (PsycINFO Database Record (c) 2013 APA, all rights reserved).

  13. A Statistical Analysis of Brain Morphology Using Wild Bootstrapping

    PubMed Central

    Ibrahim, Joseph G.; Tang, Niansheng; Rowe, Daniel B.; Hao, Xuejun; Bansal, Ravi; Peterson, Bradley S.

    2008-01-01

    Methods for the analysis of brain morphology, including voxel-based morphology and surface-based morphometries, have been used to detect associations between brain structure and covariates of interest, such as diagnosis, severity of disease, age, IQ, and genotype. The statistical analysis of morphometric measures usually involves two statistical procedures: 1) invoking a statistical model at each voxel (or point) on the surface of the brain or brain subregion, followed by mapping test statistics (e.g., t test) or their associated p values at each of those voxels; 2) correction for the multiple statistical tests conducted across all voxels on the surface of the brain region under investigation. We propose the use of new statistical methods for each of these procedures. We first use a heteroscedastic linear model to test the associations between the morphological measures at each voxel on the surface of the specified subregion (e.g., cortical or subcortical surfaces) and the covariates of interest. Moreover, we develop a robust test procedure that is based on a resampling method, called wild bootstrapping. This procedure assesses the statistical significance of the associations between a measure of given brain structure and the covariates of interest. The value of this robust test procedure lies in its computationally simplicity and in its applicability to a wide range of imaging data, including data from both anatomical and functional magnetic resonance imaging (fMRI). Simulation studies demonstrate that this robust test procedure can accurately control the family-wise error rate. We demonstrate the application of this robust test procedure to the detection of statistically significant differences in the morphology of the hippocampus over time across gender groups in a large sample of healthy subjects. PMID:17649909

  14. Modeling of a Robust Confidence Band for the Power Curve of a Wind Turbine.

    PubMed

    Hernandez, Wilmar; Méndez, Alfredo; Maldonado-Correa, Jorge L; Balleteros, Francisco

    2016-12-07

    Having an accurate model of the power curve of a wind turbine allows us to better monitor its operation and planning of storage capacity. Since wind speed and direction is of a highly stochastic nature, the forecasting of the power generated by the wind turbine is of the same nature as well. In this paper, a method for obtaining a robust confidence band containing the power curve of a wind turbine under test conditions is presented. Here, the confidence band is bound by two curves which are estimated using parametric statistical inference techniques. However, the observations that are used for carrying out the statistical analysis are obtained by using the binning method, and in each bin, the outliers are eliminated by using a censorship process based on robust statistical techniques. Then, the observations that are not outliers are divided into observation sets. Finally, both the power curve of the wind turbine and the two curves that define the robust confidence band are estimated using each of the previously mentioned observation sets.

  15. Modeling of a Robust Confidence Band for the Power Curve of a Wind Turbine

    PubMed Central

    Hernandez, Wilmar; Méndez, Alfredo; Maldonado-Correa, Jorge L.; Balleteros, Francisco

    2016-01-01

    Having an accurate model of the power curve of a wind turbine allows us to better monitor its operation and planning of storage capacity. Since wind speed and direction is of a highly stochastic nature, the forecasting of the power generated by the wind turbine is of the same nature as well. In this paper, a method for obtaining a robust confidence band containing the power curve of a wind turbine under test conditions is presented. Here, the confidence band is bound by two curves which are estimated using parametric statistical inference techniques. However, the observations that are used for carrying out the statistical analysis are obtained by using the binning method, and in each bin, the outliers are eliminated by using a censorship process based on robust statistical techniques. Then, the observations that are not outliers are divided into observation sets. Finally, both the power curve of the wind turbine and the two curves that define the robust confidence band are estimated using each of the previously mentioned observation sets. PMID:27941604

  16. Notes on power of normality tests of error terms in regression models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Střelec, Luboš

    2015-03-10

    Normality is one of the basic assumptions in applying statistical procedures. For example in linear regression most of the inferential procedures are based on the assumption of normality, i.e. the disturbance vector is assumed to be normally distributed. Failure to assess non-normality of the error terms may lead to incorrect results of usual statistical inference techniques such as t-test or F-test. Thus, error terms should be normally distributed in order to allow us to make exact inferences. As a consequence, normally distributed stochastic errors are necessary in order to make a not misleading inferences which explains a necessity and importancemore » of robust tests of normality. Therefore, the aim of this contribution is to discuss normality testing of error terms in regression models. In this contribution, we introduce the general RT class of robust tests for normality, and present and discuss the trade-off between power and robustness of selected classical and robust normality tests of error terms in regression models.« less

  17. Development of a Comprehensive Digital Avionics Curriculum for the Aeronautical Engineer

    DTIC Science & Technology

    2006-03-01

    able to analyze and design aircraft and missile guidance and control systems, including feedback stabilization schemes and stochastic processes, using ...Uncertainty modeling for robust control; Robust closed-loop stability and performance; Robust H- infinity control; Robustness check using mu-analysis...Controlled feedback (reduces noise) 3. Statistical group response (reduce pressure toward conformity) When used as a tool to study a complex problem

  18. Robustness of location estimators under t-distributions: a literature review

    NASA Astrophysics Data System (ADS)

    Sumarni, C.; Sadik, K.; Notodiputro, K. A.; Sartono, B.

    2017-03-01

    The assumption of normality is commonly used in estimation of parameters in statistical modelling, but this assumption is very sensitive to outliers. The t-distribution is more robust than the normal distribution since the t-distributions have longer tails. The robustness measures of location estimators under t-distributions are reviewed and discussed in this paper. For the purpose of illustration we use the onion yield data which includes outliers as a case study and showed that the t model produces better fit than the normal model.

  19. Use of Robust z in Detecting Unstable Items in Item Response Theory Models

    ERIC Educational Resources Information Center

    Huynh, Huynh; Meyer, Patrick

    2010-01-01

    The first part of this paper describes the use of the robust z[subscript R] statistic to link test forms using the Rasch (or one-parameter logistic) model. The procedure is then extended to the two-parameter and three-parameter logistic and two-parameter partial credit (2PPC) models. A real set of data was used to illustrate the extension. The…

  20. GWAR: robust analysis and meta-analysis of genome-wide association studies.

    PubMed

    Dimou, Niki L; Tsirigos, Konstantinos D; Elofsson, Arne; Bagos, Pantelis G

    2017-05-15

    In the context of genome-wide association studies (GWAS), there is a variety of statistical techniques in order to conduct the analysis, but, in most cases, the underlying genetic model is usually unknown. Under these circumstances, the classical Cochran-Armitage trend test (CATT) is suboptimal. Robust procedures that maximize the power and preserve the nominal type I error rate are preferable. Moreover, performing a meta-analysis using robust procedures is of great interest and has never been addressed in the past. The primary goal of this work is to implement several robust methods for analysis and meta-analysis in the statistical package Stata and subsequently to make the software available to the scientific community. The CATT under a recessive, additive and dominant model of inheritance as well as robust methods based on the Maximum Efficiency Robust Test statistic, the MAX statistic and the MIN2 were implemented in Stata. Concerning MAX and MIN2, we calculated their asymptotic null distributions relying on numerical integration resulting in a great gain in computational time without losing accuracy. All the aforementioned approaches were employed in a fixed or a random effects meta-analysis setting using summary data with weights equal to the reciprocal of the combined cases and controls. Overall, this is the first complete effort to implement procedures for analysis and meta-analysis in GWAS using Stata. A Stata program and a web-server are freely available for academic users at http://www.compgen.org/tools/GWAR. pbagos@compgen.org. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  1. Robust biological parametric mapping: an improved technique for multimodal brain image analysis

    NASA Astrophysics Data System (ADS)

    Yang, Xue; Beason-Held, Lori; Resnick, Susan M.; Landman, Bennett A.

    2011-03-01

    Mapping the quantitative relationship between structure and function in the human brain is an important and challenging problem. Numerous volumetric, surface, region of interest and voxelwise image processing techniques have been developed to statistically assess potential correlations between imaging and non-imaging metrics. Recently, biological parametric mapping has extended the widely popular statistical parametric approach to enable application of the general linear model to multiple image modalities (both for regressors and regressands) along with scalar valued observations. This approach offers great promise for direct, voxelwise assessment of structural and functional relationships with multiple imaging modalities. However, as presented, the biological parametric mapping approach is not robust to outliers and may lead to invalid inferences (e.g., artifactual low p-values) due to slight mis-registration or variation in anatomy between subjects. To enable widespread application of this approach, we introduce robust regression and robust inference in the neuroimaging context of application of the general linear model. Through simulation and empirical studies, we demonstrate that our robust approach reduces sensitivity to outliers without substantial degradation in power. The robust approach and associated software package provides a reliable way to quantitatively assess voxelwise correlations between structural and functional neuroimaging modalities.

  2. Evaluation of a New Mean Scaled and Moment Adjusted Test Statistic for SEM

    ERIC Educational Resources Information Center

    Tong, Xiaoxiao; Bentler, Peter M.

    2013-01-01

    Recently a new mean scaled and skewness adjusted test statistic was developed for evaluating structural equation models in small samples and with potentially nonnormal data, but this statistic has received only limited evaluation. The performance of this statistic is compared to normal theory maximum likelihood and 2 well-known robust test…

  3. Improved Doubly Robust Estimation when Data are Monotonely Coarsened, with Application to Longitudinal Studies with Dropout

    PubMed Central

    Tsiatis, Anastasios A.; Davidian, Marie; Cao, Weihua

    2010-01-01

    Summary A routine challenge is that of making inference on parameters in a statistical model of interest from longitudinal data subject to drop out, which are a special case of the more general setting of monotonely coarsened data. Considerable recent attention has focused on doubly robust estimators, which in this context involve positing models for both the missingness (more generally, coarsening) mechanism and aspects of the distribution of the full data, that have the appealing property of yielding consistent inferences if only one of these models is correctly specified. Doubly robust estimators have been criticized for potentially disastrous performance when both of these models are even only mildly misspecified. We propose a doubly robust estimator applicable in general monotone coarsening problems that achieves comparable or improved performance relative to existing doubly robust methods, which we demonstrate via simulation studies and by application to data from an AIDS clinical trial. PMID:20731640

  4. Robust Linear Models for Cis-eQTL Analysis.

    PubMed

    Rantalainen, Mattias; Lindgren, Cecilia M; Holmes, Christopher C

    2015-01-01

    Expression Quantitative Trait Loci (eQTL) analysis enables characterisation of functional genetic variation influencing expression levels of individual genes. In outbread populations, including humans, eQTLs are commonly analysed using the conventional linear model, adjusting for relevant covariates, assuming an allelic dosage model and a Gaussian error term. However, gene expression data generally have noise that induces heavy-tailed errors relative to the Gaussian distribution and often include atypical observations, or outliers. Such departures from modelling assumptions can lead to an increased rate of type II errors (false negatives), and to some extent also type I errors (false positives). Careful model checking can reduce the risk of type-I errors but often not type II errors, since it is generally too time-consuming to carefully check all models with a non-significant effect in large-scale and genome-wide studies. Here we propose the application of a robust linear model for eQTL analysis to reduce adverse effects of deviations from the assumption of Gaussian residuals. We present results from a simulation study as well as results from the analysis of real eQTL data sets. Our findings suggest that in many situations robust models have the potential to provide more reliable eQTL results compared to conventional linear models, particularly in respect to reducing type II errors due to non-Gaussian noise. Post-genomic data, such as that generated in genome-wide eQTL studies, are often noisy and frequently contain atypical observations. Robust statistical models have the potential to provide more reliable results and increased statistical power under non-Gaussian conditions. The results presented here suggest that robust models should be considered routinely alongside other commonly used methodologies for eQTL analysis.

  5. Robust regression for large-scale neuroimaging studies.

    PubMed

    Fritsch, Virgile; Da Mota, Benoit; Loth, Eva; Varoquaux, Gaël; Banaschewski, Tobias; Barker, Gareth J; Bokde, Arun L W; Brühl, Rüdiger; Butzek, Brigitte; Conrod, Patricia; Flor, Herta; Garavan, Hugh; Lemaitre, Hervé; Mann, Karl; Nees, Frauke; Paus, Tomas; Schad, Daniel J; Schümann, Gunter; Frouin, Vincent; Poline, Jean-Baptiste; Thirion, Bertrand

    2015-05-01

    Multi-subject datasets used in neuroimaging group studies have a complex structure, as they exhibit non-stationary statistical properties across regions and display various artifacts. While studies with small sample sizes can rarely be shown to deviate from standard hypotheses (such as the normality of the residuals) due to the poor sensitivity of normality tests with low degrees of freedom, large-scale studies (e.g. >100 subjects) exhibit more obvious deviations from these hypotheses and call for more refined models for statistical inference. Here, we demonstrate the benefits of robust regression as a tool for analyzing large neuroimaging cohorts. First, we use an analytic test based on robust parameter estimates; based on simulations, this procedure is shown to provide an accurate statistical control without resorting to permutations. Second, we show that robust regression yields more detections than standard algorithms using as an example an imaging genetics study with 392 subjects. Third, we show that robust regression can avoid false positives in a large-scale analysis of brain-behavior relationships with over 1500 subjects. Finally we embed robust regression in the Randomized Parcellation Based Inference (RPBI) method and demonstrate that this combination further improves the sensitivity of tests carried out across the whole brain. Altogether, our results show that robust procedures provide important advantages in large-scale neuroimaging group studies. Copyright © 2015 Elsevier Inc. All rights reserved.

  6. Sampling design considerations for demographic studies: a case of colonial seabirds

    USGS Publications Warehouse

    Kendall, William L.; Converse, Sarah J.; Doherty, Paul F.; Naughton, Maura B.; Anders, Angela; Hines, James E.; Flint, Elizabeth

    2009-01-01

    For the purposes of making many informed conservation decisions, the main goal for data collection is to assess population status and allow prediction of the consequences of candidate management actions. Reducing the bias and variance of estimates of population parameters reduces uncertainty in population status and projections, thereby reducing the overall uncertainty under which a population manager must make a decision. In capture-recapture studies, imperfect detection of individuals, unobservable life-history states, local movement outside study areas, and tag loss can cause bias or precision problems with estimates of population parameters. Furthermore, excessive disturbance to individuals during capture?recapture sampling may be of concern because disturbance may have demographic consequences. We address these problems using as an example a monitoring program for Black-footed Albatross (Phoebastria nigripes) and Laysan Albatross (Phoebastria immutabilis) nesting populations in the northwestern Hawaiian Islands. To mitigate these estimation problems, we describe a synergistic combination of sampling design and modeling approaches. Solutions include multiple capture periods per season and multistate, robust design statistical models, dead recoveries and incidental observations, telemetry and data loggers, buffer areas around study plots to neutralize the effect of local movements outside study plots, and double banding and statistical models that account for band loss. We also present a variation on the robust capture?recapture design and a corresponding statistical model that minimizes disturbance to individuals. For the albatross case study, this less invasive robust design was more time efficient and, when used in combination with a traditional robust design, reduced the standard error of detection probability by 14% with only two hours of additional effort in the field. These field techniques and associated modeling approaches are applicable to studies of most taxa being marked and in some cases have individually been applied to studies of birds, fish, herpetofauna, and mammals.

  7. Standard and Robust Methods in Regression Imputation

    ERIC Educational Resources Information Center

    Moraveji, Behjat; Jafarian, Koorosh

    2014-01-01

    The aim of this paper is to provide an introduction of new imputation algorithms for estimating missing values from official statistics in larger data sets of data pre-processing, or outliers. The goal is to propose a new algorithm called IRMI (iterative robust model-based imputation). This algorithm is able to deal with all challenges like…

  8. The power and robustness of maximum LOD score statistics.

    PubMed

    Yoo, Y J; Mendell, N R

    2008-07-01

    The maximum LOD score statistic is extremely powerful for gene mapping when calculated using the correct genetic parameter value. When the mode of genetic transmission is unknown, the maximum of the LOD scores obtained using several genetic parameter values is reported. This latter statistic requires higher critical value than the maximum LOD score statistic calculated from a single genetic parameter value. In this paper, we compare the power of maximum LOD scores based on three fixed sets of genetic parameter values with the power of the LOD score obtained after maximizing over the entire range of genetic parameter values. We simulate family data under nine generating models. For generating models with non-zero phenocopy rates, LOD scores maximized over the entire range of genetic parameters yielded greater power than maximum LOD scores for fixed sets of parameter values with zero phenocopy rates. No maximum LOD score was consistently more powerful than the others for generating models with a zero phenocopy rate. The power loss of the LOD score maximized over the entire range of genetic parameters, relative to the maximum LOD score calculated using the correct genetic parameter value, appeared to be robust to the generating models.

  9. Segmenting lung fields in serial chest radiographs using both population-based and patient-specific shape statistics.

    PubMed

    Shi, Y; Qi, F; Xue, Z; Chen, L; Ito, K; Matsuo, H; Shen, D

    2008-04-01

    This paper presents a new deformable model using both population-based and patient-specific shape statistics to segment lung fields from serial chest radiographs. There are two novelties in the proposed deformable model. First, a modified scale invariant feature transform (SIFT) local descriptor, which is more distinctive than the general intensity and gradient features, is used to characterize the image features in the vicinity of each pixel. Second, the deformable contour is constrained by both population-based and patient-specific shape statistics, and it yields more robust and accurate segmentation of lung fields for serial chest radiographs. In particular, for segmenting the initial time-point images, the population-based shape statistics is used to constrain the deformable contour; as more subsequent images of the same patient are acquired, the patient-specific shape statistics online collected from the previous segmentation results gradually takes more roles. Thus, this patient-specific shape statistics is updated each time when a new segmentation result is obtained, and it is further used to refine the segmentation results of all the available time-point images. Experimental results show that the proposed method is more robust and accurate than other active shape models in segmenting the lung fields from serial chest radiographs.

  10. Design of robust flow processing networks with time-programmed responses

    NASA Astrophysics Data System (ADS)

    Kaluza, P.; Mikhailov, A. S.

    2012-04-01

    Can artificially designed networks reach the levels of robustness against local damage which are comparable with those of the biochemical networks of a living cell? We consider a simple model where the flow applied to an input node propagates through the network and arrives at different times to the output nodes, thus generating a pattern of coordinated responses. By using evolutionary optimization algorithms, functional networks - with required time-programmed responses - were constructed. Then, continuing the evolution, such networks were additionally optimized for robustness against deletion of individual nodes or links. In this manner, large ensembles of functional networks with different kinds of robustness were obtained, making statistical investigations and comparison of their structural properties possible. We have found that, generally, different architectures are needed for various kinds of robustness. The differences are statistically revealed, for example, in the Laplacian spectra of the respective graphs. On the other hand, motif distributions of robust networks do not differ from those of the merely functional networks; they are found to belong to the first Alon superfamily, the same as that of the gene transcription networks of single-cell organisms.

  11. Rigorous force field optimization principles based on statistical distance minimization

    DOE PAGES

    Vlcek, Lukas; Chialvo, Ariel A.

    2015-10-12

    We use the concept of statistical distance to define a measure of distinguishability between a pair of statistical mechanical systems, i.e., a model and its target, and show that its minimization leads to general convergence of the model’s static measurable properties to those of the target. Here we exploit this feature to define a rigorous basis for the development of accurate and robust effective molecular force fields that are inherently compatible with coarse-grained experimental data. The new model optimization principles and their efficient implementation are illustrated through selected examples, whose outcome demonstrates the higher robustness and predictive accuracy of themore » approach compared to other currently used methods, such as force matching and relative entropy minimization. We also discuss relations between the newly developed principles and established thermodynamic concepts, which include the Gibbs-Bogoliubov inequality and the thermodynamic length.« less

  12. Data, Model, Conclusions, Doing It Again.

    ERIC Educational Resources Information Center

    Molenaar, Ivo W.

    1998-01-01

    Explores the robustness of conclusions from a statistical model against variations in model choice with an illustration from G. Box and G. Tiao (1973). Suggests that simultaneous consideration of a class of models for the same data is sometimes superior to analyzing the data under one model and demonstrates advantages to Adaptive Bayesian…

  13. EVALUATION OF A NEW MEAN SCALED AND MOMENT ADJUSTED TEST STATISTIC FOR SEM.

    PubMed

    Tong, Xiaoxiao; Bentler, Peter M

    2013-01-01

    Recently a new mean scaled and skewness adjusted test statistic was developed for evaluating structural equation models in small samples and with potentially nonnormal data, but this statistic has received only limited evaluation. The performance of this statistic is compared to normal theory maximum likelihood and two well-known robust test statistics. A modification to the Satorra-Bentler scaled statistic is developed for the condition that sample size is smaller than degrees of freedom. The behavior of the four test statistics is evaluated with a Monte Carlo confirmatory factor analysis study that varies seven sample sizes and three distributional conditions obtained using Headrick's fifth-order transformation to nonnormality. The new statistic performs badly in most conditions except under the normal distribution. The goodness-of-fit χ(2) test based on maximum-likelihood estimation performed well under normal distributions as well as under a condition of asymptotic robustness. The Satorra-Bentler scaled test statistic performed best overall, while the mean scaled and variance adjusted test statistic outperformed the others at small and moderate sample sizes under certain distributional conditions.

  14. Robust Combining of Disparate Classifiers Through Order Statistics

    NASA Technical Reports Server (NTRS)

    Tumer, Kagan; Ghosh, Joydeep

    2001-01-01

    Integrating the outputs of multiple classifiers via combiners or meta-learners has led to substantial improvements in several difficult pattern recognition problems. In this article we investigate a family of combiners based on order statistics, for robust handling of situations where there are large discrepancies in performance of individual classifiers. Based on a mathematical modeling of how the decision boundaries are affected by order statistic combiners, we derive expressions for the reductions in error expected when simple output combination methods based on the the median, the maximum and in general, the ith order statistic, are used. Furthermore, we analyze the trim and spread combiners, both based on linear combinations of the ordered classifier outputs, and show that in the presence of uneven classifier performance, they often provide substantial gains over both linear and simple order statistics combiners. Experimental results on both real world data and standard public domain data sets corroborate these findings.

  15. Statistically Assessing Time-Averaged and Paleosecular Variation Field Models Against Paleomagnetic Directional Data Sets. Can Likely non-Zonal Features be Detected in a Robust way ?

    NASA Astrophysics Data System (ADS)

    Hulot, G.; Khokhlov, A.

    2007-12-01

    We recently introduced a method to rigorously test the statistical compatibility of combined time-averaged (TAF) and paleosecular variation (PSV) field models against any lava flow paleomagnetic database (Khokhlov et al., 2001, 2006). Applying this method to test (TAF+PSV) models against synthetic data produced from those shows that the method is very efficient at discriminating models, and very sensitive, provided those data errors are properly taken into account. This prompted us to test a variety of published combined (TAF+PSV) models against a test Bruhnes stable polarity data set extracted from the Quidelleur et al. (1994) data base. Not surprisingly, ignoring data errors leads all models to be rejected. But taking data errors into account leads to the stimulating conclusion that at least one (TAF+PSV) model appears to be compatible with the selected data set, this model being purely axisymmetric. This result shows that in practice also, and with the data bases currently available, the method can discriminate various candidate models and decide which actually best fits a given data set. But it also shows that likely non-zonal signatures of non-homogeneous boundary conditions imposed by the mantle are difficult to identify as statistically robust from paleomagnetic directional data sets. In the present paper, we will discuss the possibility that such signatures could eventually be identified as robust with the help of more recent data sets (such as the one put together under the collaborative "TAFI" effort, see e.g. Johnson et al. abstract #GP21A-0013, AGU Fall Meeting, 2005) or by taking additional information into account (such as the possible coincidence of non-zonal time-averaged field patterns with analogous patterns in the modern field).

  16. Sampling stored product insect pests: a comparison of four statistical sampling models for probability of pest detection

    USDA-ARS?s Scientific Manuscript database

    Statistically robust sampling strategies form an integral component of grain storage and handling activities throughout the world. Developing sampling strategies to target biological pests such as insects in stored grain is inherently difficult due to species biology and behavioral characteristics. ...

  17. Robust functional statistics applied to Probability Density Function shape screening of sEMG data.

    PubMed

    Boudaoud, S; Rix, H; Al Harrach, M; Marin, F

    2014-01-01

    Recent studies pointed out possible shape modifications of the Probability Density Function (PDF) of surface electromyographical (sEMG) data according to several contexts like fatigue and muscle force increase. Following this idea, criteria have been proposed to monitor these shape modifications mainly using High Order Statistics (HOS) parameters like skewness and kurtosis. In experimental conditions, these parameters are confronted with small sample size in the estimation process. This small sample size induces errors in the estimated HOS parameters restraining real-time and precise sEMG PDF shape monitoring. Recently, a functional formalism, the Core Shape Model (CSM), has been used to analyse shape modifications of PDF curves. In this work, taking inspiration from CSM method, robust functional statistics are proposed to emulate both skewness and kurtosis behaviors. These functional statistics combine both kernel density estimation and PDF shape distances to evaluate shape modifications even in presence of small sample size. Then, the proposed statistics are tested, using Monte Carlo simulations, on both normal and Log-normal PDFs that mimic observed sEMG PDF shape behavior during muscle contraction. According to the obtained results, the functional statistics seem to be more robust than HOS parameters to small sample size effect and more accurate in sEMG PDF shape screening applications.

  18. Comparing estimates of climate change impacts from process-based and statistical crop models

    NASA Astrophysics Data System (ADS)

    Lobell, David B.; Asseng, Senthold

    2017-01-01

    The potential impacts of climate change on crop productivity are of widespread interest to those concerned with addressing climate change and improving global food security. Two common approaches to assess these impacts are process-based simulation models, which attempt to represent key dynamic processes affecting crop yields, and statistical models, which estimate functional relationships between historical observations of weather and yields. Examples of both approaches are increasingly found in the scientific literature, although often published in different disciplinary journals. Here we compare published sensitivities to changes in temperature, precipitation, carbon dioxide (CO2), and ozone from each approach for the subset of crops, locations, and climate scenarios for which both have been applied. Despite a common perception that statistical models are more pessimistic, we find no systematic differences between the predicted sensitivities to warming from process-based and statistical models up to +2 °C, with limited evidence at higher levels of warming. For precipitation, there are many reasons why estimates could be expected to differ, but few estimates exist to develop robust comparisons, and precipitation changes are rarely the dominant factor for predicting impacts given the prominent role of temperature, CO2, and ozone changes. A common difference between process-based and statistical studies is that the former tend to include the effects of CO2 increases that accompany warming, whereas statistical models typically do not. Major needs moving forward include incorporating CO2 effects into statistical studies, improving both approaches’ treatment of ozone, and increasing the use of both methods within the same study. At the same time, those who fund or use crop model projections should understand that in the short-term, both approaches when done well are likely to provide similar estimates of warming impacts, with statistical models generally requiring fewer resources to produce robust estimates, especially when applied to crops beyond the major grains.

  19. Full-Duplex Bidirectional Secure Communications Under Perfect and Distributionally Ambiguous Eavesdropper's CSI

    NASA Astrophysics Data System (ADS)

    Li, Qiang; Zhang, Ying; Lin, Jingran; Wu, Sissi Xiaoxiao

    2017-09-01

    Consider a full-duplex (FD) bidirectional secure communication system, where two communication nodes, named Alice and Bob, simultaneously transmit and receive confidential information from each other, and an eavesdropper, named Eve, overhears the transmissions. Our goal is to maximize the sum secrecy rate (SSR) of the bidirectional transmissions by optimizing the transmit covariance matrices at Alice and Bob. To tackle this SSR maximization (SSRM) problem, we develop an alternating difference-of-concave (ADC) programming approach to alternately optimize the transmit covariance matrices at Alice and Bob. We show that the ADC iteration has a semi-closed-form beamforming solution, and is guaranteed to converge to a stationary solution of the SSRM problem. Besides the SSRM design, this paper also deals with a robust SSRM transmit design under a moment-based random channel state information (CSI) model, where only some roughly estimated first and second-order statistics of Eve's CSI are available, but the exact distribution or other high-order statistics is not known. This moment-based error model is new and different from the widely used bounded-sphere error model and the Gaussian random error model. Under the consider CSI error model, the robust SSRM is formulated as an outage probability-constrained SSRM problem. By leveraging the Lagrangian duality theory and DC programming, a tractable safe solution to the robust SSRM problem is derived. The effectiveness and the robustness of the proposed designs are demonstrated through simulations.

  20. A Statistical Approach Reveals Designs for the Most Robust Stochastic Gene Oscillators

    PubMed Central

    2016-01-01

    The engineering of transcriptional networks presents many challenges due to the inherent uncertainty in the system structure, changing cellular context, and stochasticity in the governing dynamics. One approach to address these problems is to design and build systems that can function across a range of conditions; that is they are robust to uncertainty in their constituent components. Here we examine the parametric robustness landscape of transcriptional oscillators, which underlie many important processes such as circadian rhythms and the cell cycle, plus also serve as a model for the engineering of complex and emergent phenomena. The central questions that we address are: Can we build genetic oscillators that are more robust than those already constructed? Can we make genetic oscillators arbitrarily robust? These questions are technically challenging due to the large model and parameter spaces that must be efficiently explored. Here we use a measure of robustness that coincides with the Bayesian model evidence, combined with an efficient Monte Carlo method to traverse model space and concentrate on regions of high robustness, which enables the accurate evaluation of the relative robustness of gene network models governed by stochastic dynamics. We report the most robust two and three gene oscillator systems, plus examine how the number of interactions, the presence of autoregulation, and degradation of mRNA and protein affects the frequency, amplitude, and robustness of transcriptional oscillators. We also find that there is a limit to parametric robustness, beyond which there is nothing to be gained by adding additional feedback. Importantly, we provide predictions on new oscillator systems that can be constructed to verify the theory and advance design and modeling approaches to systems and synthetic biology. PMID:26835539

  1. New approach in the quantum statistical parton distribution

    NASA Astrophysics Data System (ADS)

    Sohaily, Sozha; Vaziri (Khamedi), Mohammad

    2017-12-01

    An attempt to find simple parton distribution functions (PDFs) based on quantum statistical approach is presented. The PDFs described by the statistical model have very interesting physical properties which help to understand the structure of partons. The longitudinal portion of distribution functions are given by applying the maximum entropy principle. An interesting and simple approach to determine the statistical variables exactly without fitting and fixing parameters is surveyed. Analytic expressions of the x-dependent PDFs are obtained in the whole x region [0, 1], and the computed distributions are consistent with the experimental observations. The agreement with experimental data, gives a robust confirm of our simple presented statistical model.

  2. Robust hypothesis tests for detecting statistical evidence of two-dimensional and three-dimensional interactions in single-molecule measurements

    NASA Astrophysics Data System (ADS)

    Calderon, Christopher P.; Weiss, Lucien E.; Moerner, W. E.

    2014-05-01

    Experimental advances have improved the two- (2D) and three-dimensional (3D) spatial resolution that can be extracted from in vivo single-molecule measurements. This enables researchers to quantitatively infer the magnitude and directionality of forces experienced by biomolecules in their native environment. Situations where such force information is relevant range from mitosis to directed transport of protein cargo along cytoskeletal structures. Models commonly applied to quantify single-molecule dynamics assume that effective forces and velocity in the x ,y (or x ,y,z) directions are statistically independent, but this assumption is physically unrealistic in many situations. We present a hypothesis testing approach capable of determining if there is evidence of statistical dependence between positional coordinates in experimentally measured trajectories; if the hypothesis of independence between spatial coordinates is rejected, then a new model accounting for 2D (3D) interactions can and should be considered. Our hypothesis testing technique is robust, meaning it can detect interactions, even if the noise statistics are not well captured by the model. The approach is demonstrated on control simulations and on experimental data (directed transport of intraflagellar transport protein 88 homolog in the primary cilium).

  3. Model Checking Techniques for Assessing Functional Form Specifications in Censored Linear Regression Models.

    PubMed

    León, Larry F; Cai, Tianxi

    2012-04-01

    In this paper we develop model checking techniques for assessing functional form specifications of covariates in censored linear regression models. These procedures are based on a censored data analog to taking cumulative sums of "robust" residuals over the space of the covariate under investigation. These cumulative sums are formed by integrating certain Kaplan-Meier estimators and may be viewed as "robust" censored data analogs to the processes considered by Lin, Wei & Ying (2002). The null distributions of these stochastic processes can be approximated by the distributions of certain zero-mean Gaussian processes whose realizations can be generated by computer simulation. Each observed process can then be graphically compared with a few realizations from the Gaussian process. We also develop formal test statistics for numerical comparison. Such comparisons enable one to assess objectively whether an apparent trend seen in a residual plot reects model misspecification or natural variation. We illustrate the methods with a well known dataset. In addition, we examine the finite sample performance of the proposed test statistics in simulation experiments. In our simulation experiments, the proposed test statistics have good power of detecting misspecification while at the same time controlling the size of the test.

  4. Robust detection-isolation-accommodation for sensor failures

    NASA Technical Reports Server (NTRS)

    Weiss, J. L.; Pattipati, K. R.; Willsky, A. S.; Eterno, J. S.; Crawford, J. T.

    1985-01-01

    The results of a one year study to: (1) develop a theory for Robust Failure Detection and Identification (FDI) in the presence of model uncertainty, (2) develop a design methodology which utilizes the robust FDI ththeory, (3) apply the methodology to a sensor FDI problem for the F-100 jet engine, and (4) demonstrate the application of the theory to the evaluation of alternative FDI schemes are presented. Theoretical results in statistical discrimination are used to evaluate the robustness of residual signals (or parity relations) in terms of their usefulness for FDI. Furthermore, optimally robust parity relations are derived through the optimization of robustness metrics. The result is viewed as decentralization of the FDI process. A general structure for decentralized FDI is proposed and robustness metrics are used for determining various parameters of the algorithm.

  5. Statistical procedures for evaluating daily and monthly hydrologic model predictions

    USGS Publications Warehouse

    Coffey, M.E.; Workman, S.R.; Taraba, J.L.; Fogle, A.W.

    2004-01-01

    The overall study objective was to evaluate the applicability of different qualitative and quantitative methods for comparing daily and monthly SWAT computer model hydrologic streamflow predictions to observed data, and to recommend statistical methods for use in future model evaluations. Statistical methods were tested using daily streamflows and monthly equivalent runoff depths. The statistical techniques included linear regression, Nash-Sutcliffe efficiency, nonparametric tests, t-test, objective functions, autocorrelation, and cross-correlation. None of the methods specifically applied to the non-normal distribution and dependence between data points for the daily predicted and observed data. Of the tested methods, median objective functions, sign test, autocorrelation, and cross-correlation were most applicable for the daily data. The robust coefficient of determination (CD*) and robust modeling efficiency (EF*) objective functions were the preferred methods for daily model results due to the ease of comparing these values with a fixed ideal reference value of one. Predicted and observed monthly totals were more normally distributed, and there was less dependence between individual monthly totals than was observed for the corresponding predicted and observed daily values. More statistical methods were available for comparing SWAT model-predicted and observed monthly totals. The 1995 monthly SWAT model predictions and observed data had a regression Rr2 of 0.70, a Nash-Sutcliffe efficiency of 0.41, and the t-test failed to reject the equal data means hypothesis. The Nash-Sutcliffe coefficient and the R r2 coefficient were the preferred methods for monthly results due to the ability to compare these coefficients to a set ideal value of one.

  6. Uniting statistical and individual-based approaches for animal movement modelling.

    PubMed

    Latombe, Guillaume; Parrott, Lael; Basille, Mathieu; Fortin, Daniel

    2014-01-01

    The dynamic nature of their internal states and the environment directly shape animals' spatial behaviours and give rise to emergent properties at broader scales in natural systems. However, integrating these dynamic features into habitat selection studies remains challenging, due to practically impossible field work to access internal states and the inability of current statistical models to produce dynamic outputs. To address these issues, we developed a robust method, which combines statistical and individual-based modelling. Using a statistical technique for forward modelling of the IBM has the advantage of being faster for parameterization than a pure inverse modelling technique and allows for robust selection of parameters. Using GPS locations from caribou monitored in Québec, caribou movements were modelled based on generative mechanisms accounting for dynamic variables at a low level of emergence. These variables were accessed by replicating real individuals' movements in parallel sub-models, and movement parameters were then empirically parameterized using Step Selection Functions. The final IBM model was validated using both k-fold cross-validation and emergent patterns validation and was tested for two different scenarios, with varying hardwood encroachment. Our results highlighted a functional response in habitat selection, which suggests that our method was able to capture the complexity of the natural system, and adequately provided projections on future possible states of the system in response to different management plans. This is especially relevant for testing the long-term impact of scenarios corresponding to environmental configurations that have yet to be observed in real systems.

  7. Uniting Statistical and Individual-Based Approaches for Animal Movement Modelling

    PubMed Central

    Latombe, Guillaume; Parrott, Lael; Basille, Mathieu; Fortin, Daniel

    2014-01-01

    The dynamic nature of their internal states and the environment directly shape animals' spatial behaviours and give rise to emergent properties at broader scales in natural systems. However, integrating these dynamic features into habitat selection studies remains challenging, due to practically impossible field work to access internal states and the inability of current statistical models to produce dynamic outputs. To address these issues, we developed a robust method, which combines statistical and individual-based modelling. Using a statistical technique for forward modelling of the IBM has the advantage of being faster for parameterization than a pure inverse modelling technique and allows for robust selection of parameters. Using GPS locations from caribou monitored in Québec, caribou movements were modelled based on generative mechanisms accounting for dynamic variables at a low level of emergence. These variables were accessed by replicating real individuals' movements in parallel sub-models, and movement parameters were then empirically parameterized using Step Selection Functions. The final IBM model was validated using both k-fold cross-validation and emergent patterns validation and was tested for two different scenarios, with varying hardwood encroachment. Our results highlighted a functional response in habitat selection, which suggests that our method was able to capture the complexity of the natural system, and adequately provided projections on future possible states of the system in response to different management plans. This is especially relevant for testing the long-term impact of scenarios corresponding to environmental configurations that have yet to be observed in real systems. PMID:24979047

  8. Education in a Research University

    ERIC Educational Resources Information Center

    Arrow, Kenneth J. Ed.; And Others

    This collection of 30 essays on the character, administration, and management of research universities research university emphasizes the perspective of statistics and operations research: The essays are: "A Robust Faculty Planning Model" (Frederick Biedenweg); "Looking Back at Computer Models Employed in the Stanford University…

  9. Correlation techniques to determine model form in robust nonlinear system realization/identification

    NASA Technical Reports Server (NTRS)

    Stry, Greselda I.; Mook, D. Joseph

    1991-01-01

    The fundamental challenge in identification of nonlinear dynamic systems is determining the appropriate form of the model. A robust technique is presented which essentially eliminates this problem for many applications. The technique is based on the Minimum Model Error (MME) optimal estimation approach. A detailed literature review is included in which fundamental differences between the current approach and previous work is described. The most significant feature is the ability to identify nonlinear dynamic systems without prior assumption regarding the form of the nonlinearities, in contrast to existing nonlinear identification approaches which usually require detailed assumptions of the nonlinearities. Model form is determined via statistical correlation of the MME optimal state estimates with the MME optimal model error estimates. The example illustrations indicate that the method is robust with respect to prior ignorance of the model, and with respect to measurement noise, measurement frequency, and measurement record length.

  10. The Utility of Robust Means in Statistics

    ERIC Educational Resources Information Center

    Goodwyn, Fara

    2012-01-01

    Location estimates calculated from heuristic data were examined using traditional and robust statistical methods. The current paper demonstrates the impact outliers have on the sample mean and proposes robust methods to control for outliers in sample data. Traditional methods fail because they rely on the statistical assumptions of normality and…

  11. Efficient Computation of Info-Gap Robustness for Finite Element Models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stull, Christopher J.; Hemez, Francois M.; Williams, Brian J.

    2012-07-05

    A recent research effort at LANL proposed info-gap decision theory as a framework by which to measure the predictive maturity of numerical models. Info-gap theory explores the trade-offs between accuracy, that is, the extent to which predictions reproduce the physical measurements, and robustness, that is, the extent to which predictions are insensitive to modeling assumptions. Both accuracy and robustness are necessary to demonstrate predictive maturity. However, conducting an info-gap analysis can present a formidable challenge, from the standpoint of the required computational resources. This is because a robustness function requires the resolution of multiple optimization problems. This report offers anmore » alternative, adjoint methodology to assess the info-gap robustness of Ax = b-like numerical models solved for a solution x. Two situations that can arise in structural analysis and design are briefly described and contextualized within the info-gap decision theory framework. The treatments of the info-gap problems, using the adjoint methodology are outlined in detail, and the latter problem is solved for four separate finite element models. As compared to statistical sampling, the proposed methodology offers highly accurate approximations of info-gap robustness functions for the finite element models considered in the report, at a small fraction of the computational cost. It is noted that this report considers only linear systems; a natural follow-on study would extend the methodologies described herein to include nonlinear systems.« less

  12. How Reliable is Bayesian Model Averaging Under Noisy Data? Statistical Assessment and Implications for Robust Model Selection

    NASA Astrophysics Data System (ADS)

    Schöniger, Anneli; Wöhling, Thomas; Nowak, Wolfgang

    2014-05-01

    Bayesian model averaging ranks the predictive capabilities of alternative conceptual models based on Bayes' theorem. The individual models are weighted with their posterior probability to be the best one in the considered set of models. Finally, their predictions are combined into a robust weighted average and the predictive uncertainty can be quantified. This rigorous procedure does, however, not yet account for possible instabilities due to measurement noise in the calibration data set. This is a major drawback, since posterior model weights may suffer a lack of robustness related to the uncertainty in noisy data, which may compromise the reliability of model ranking. We present a new statistical concept to account for measurement noise as source of uncertainty for the weights in Bayesian model averaging. Our suggested upgrade reflects the limited information content of data for the purpose of model selection. It allows us to assess the significance of the determined posterior model weights, the confidence in model selection, and the accuracy of the quantified predictive uncertainty. Our approach rests on a brute-force Monte Carlo framework. We determine the robustness of model weights against measurement noise by repeatedly perturbing the observed data with random realizations of measurement error. Then, we analyze the induced variability in posterior model weights and introduce this "weighting variance" as an additional term into the overall prediction uncertainty analysis scheme. We further determine the theoretical upper limit in performance of the model set which is imposed by measurement noise. As an extension to the merely relative model ranking, this analysis provides a measure of absolute model performance. To finally decide, whether better data or longer time series are needed to ensure a robust basis for model selection, we resample the measurement time series and assess the convergence of model weights for increasing time series length. We illustrate our suggested approach with an application to model selection between different soil-plant models following up on a study by Wöhling et al. (2013). Results show that measurement noise compromises the reliability of model ranking and causes a significant amount of weighting uncertainty, if the calibration data time series is not long enough to compensate for its noisiness. This additional contribution to the overall predictive uncertainty is neglected without our approach. Thus, we strongly advertise to include our suggested upgrade in the Bayesian model averaging routine.

  13. Robust mislabel logistic regression without modeling mislabel probabilities.

    PubMed

    Hung, Hung; Jou, Zhi-Yu; Huang, Su-Yun

    2018-03-01

    Logistic regression is among the most widely used statistical methods for linear discriminant analysis. In many applications, we only observe possibly mislabeled responses. Fitting a conventional logistic regression can then lead to biased estimation. One common resolution is to fit a mislabel logistic regression model, which takes into consideration of mislabeled responses. Another common method is to adopt a robust M-estimation by down-weighting suspected instances. In this work, we propose a new robust mislabel logistic regression based on γ-divergence. Our proposal possesses two advantageous features: (1) It does not need to model the mislabel probabilities. (2) The minimum γ-divergence estimation leads to a weighted estimating equation without the need to include any bias correction term, that is, it is automatically bias-corrected. These features make the proposed γ-logistic regression more robust in model fitting and more intuitive for model interpretation through a simple weighting scheme. Our method is also easy to implement, and two types of algorithms are included. Simulation studies and the Pima data application are presented to demonstrate the performance of γ-logistic regression. © 2017, The International Biometric Society.

  14. Vector autoregressive models: A Gini approach

    NASA Astrophysics Data System (ADS)

    Mussard, Stéphane; Ndiaye, Oumar Hamady

    2018-02-01

    In this paper, it is proven that the usual VAR models may be performed in the Gini sense, that is, on a ℓ1 metric space. The Gini regression is robust to outliers. As a consequence, when data are contaminated by extreme values, we show that semi-parametric VAR-Gini regressions may be used to obtain robust estimators. The inference about the estimators is made with the ℓ1 norm. Also, impulse response functions and Gini decompositions for prevision errors are introduced. Finally, Granger's causality tests are properly derived based on U-statistics.

  15. Adaptive transmission disequilibrium test for family trio design.

    PubMed

    Yuan, Min; Tian, Xin; Zheng, Gang; Yang, Yaning

    2009-01-01

    The transmission disequilibrium test (TDT) is a standard method to detect association using family trio design. It is optimal for an additive genetic model. Other TDT-type tests optimal for recessive and dominant models have also been developed. Association tests using family data, including the TDT-type statistics, have been unified to a class of more comprehensive and flexable family-based association tests (FBAT). TDT-type tests have high efficiency when the genetic model is known or correctly specified, but may lose power if the model is mis-specified. Hence tests that are robust to genetic model mis-specification yet efficient are preferred. Constrained likelihood ratio test (CLRT) and MAX-type test have been shown to be efficiency robust. In this paper we propose a new efficiency robust procedure, referred to as adaptive TDT (aTDT). It uses the Hardy-Weinberg disequilibrium coefficient to identify the potential genetic model underlying the data and then applies the TDT-type test (or FBAT for general applications) corresponding to the selected model. Simulation demonstrates that aTDT is efficiency robust to model mis-specifications and generally outperforms the MAX test and CLRT in terms of power. We also show that aTDT has power close to, but much more robust, than the optimal TDT-type test based on a single genetic model. Applications to real and simulated data from Genetic Analysis Workshop (GAW) illustrate the use of our adaptive TDT.

  16. Fully Bayesian inference for structural MRI: application to segmentation and statistical analysis of T2-hypointensities.

    PubMed

    Schmidt, Paul; Schmid, Volker J; Gaser, Christian; Buck, Dorothea; Bührlen, Susanne; Förschler, Annette; Mühlau, Mark

    2013-01-01

    Aiming at iron-related T2-hypointensity, which is related to normal aging and neurodegenerative processes, we here present two practicable approaches, based on Bayesian inference, for preprocessing and statistical analysis of a complex set of structural MRI data. In particular, Markov Chain Monte Carlo methods were used to simulate posterior distributions. First, we rendered a segmentation algorithm that uses outlier detection based on model checking techniques within a Bayesian mixture model. Second, we rendered an analytical tool comprising a Bayesian regression model with smoothness priors (in the form of Gaussian Markov random fields) mitigating the necessity to smooth data prior to statistical analysis. For validation, we used simulated data and MRI data of 27 healthy controls (age: [Formula: see text]; range, [Formula: see text]). We first observed robust segmentation of both simulated T2-hypointensities and gray-matter regions known to be T2-hypointense. Second, simulated data and images of segmented T2-hypointensity were analyzed. We found not only robust identification of simulated effects but also a biologically plausible age-related increase of T2-hypointensity primarily within the dentate nucleus but also within the globus pallidus, substantia nigra, and red nucleus. Our results indicate that fully Bayesian inference can successfully be applied for preprocessing and statistical analysis of structural MRI data.

  17. Machine learning modeling of plant phenology based on coupling satellite and gridded meteorological dataset

    NASA Astrophysics Data System (ADS)

    Czernecki, Bartosz; Nowosad, Jakub; Jabłońska, Katarzyna

    2018-04-01

    Changes in the timing of plant phenological phases are important proxies in contemporary climate research. However, most of the commonly used traditional phenological observations do not give any coherent spatial information. While consistent spatial data can be obtained from airborne sensors and preprocessed gridded meteorological data, not many studies robustly benefit from these data sources. Therefore, the main aim of this study is to create and evaluate different statistical models for reconstructing, predicting, and improving quality of phenological phases monitoring with the use of satellite and meteorological products. A quality-controlled dataset of the 13 BBCH plant phenophases in Poland was collected for the period 2007-2014. For each phenophase, statistical models were built using the most commonly applied regression-based machine learning techniques, such as multiple linear regression, lasso, principal component regression, generalized boosted models, and random forest. The quality of the models was estimated using a k-fold cross-validation. The obtained results showed varying potential for coupling meteorological derived indices with remote sensing products in terms of phenological modeling; however, application of both data sources improves models' accuracy from 0.6 to 4.6 day in terms of obtained RMSE. It is shown that a robust prediction of early phenological phases is mostly related to meteorological indices, whereas for autumn phenophases, there is a stronger information signal provided by satellite-derived vegetation metrics. Choosing a specific set of predictors and applying a robust preprocessing procedures is more important for final results than the selection of a particular statistical model. The average RMSE for the best models of all phenophases is 6.3, while the individual RMSE vary seasonally from 3.5 to 10 days. Models give reliable proxy for ground observations with RMSE below 5 days for early spring and late spring phenophases. For other phenophases, RMSE are higher and rise up to 9-10 days in the case of the earliest spring phenophases.

  18. An empirical comparison of methods for analyzing correlated data from a discrete choice survey to elicit patient preference for colorectal cancer screening

    PubMed Central

    2012-01-01

    Background A discrete choice experiment (DCE) is a preference survey which asks participants to make a choice among product portfolios comparing the key product characteristics by performing several choice tasks. Analyzing DCE data needs to account for within-participant correlation because choices from the same participant are likely to be similar. In this study, we empirically compared some commonly-used statistical methods for analyzing DCE data while accounting for within-participant correlation based on a survey of patient preference for colorectal cancer (CRC) screening tests conducted in Hamilton, Ontario, Canada in 2002. Methods A two-stage DCE design was used to investigate the impact of six attributes on participants' preferences for CRC screening test and willingness to undertake the test. We compared six models for clustered binary outcomes (logistic and probit regressions using cluster-robust standard error (SE), random-effects and generalized estimating equation approaches) and three models for clustered nominal outcomes (multinomial logistic and probit regressions with cluster-robust SE and random-effects multinomial logistic model). We also fitted a bivariate probit model with cluster-robust SE treating the choices from two stages as two correlated binary outcomes. The rank of relative importance between attributes and the estimates of β coefficient within attributes were used to assess the model robustness. Results In total 468 participants with each completing 10 choices were analyzed. Similar results were reported for the rank of relative importance and β coefficients across models for stage-one data on evaluating participants' preferences for the test. The six attributes ranked from high to low as follows: cost, specificity, process, sensitivity, preparation and pain. However, the results differed across models for stage-two data on evaluating participants' willingness to undertake the tests. Little within-patient correlation (ICC ≈ 0) was found in stage-one data, but substantial within-patient correlation existed (ICC = 0.659) in stage-two data. Conclusions When small clustering effect presented in DCE data, results remained robust across statistical models. However, results varied when larger clustering effect presented. Therefore, it is important to assess the robustness of the estimates via sensitivity analysis using different models for analyzing clustered data from DCE studies. PMID:22348526

  19. Robustness of statistical tests for multiplicative terms in the additive main effects and multiplicative interaction model for cultivar trials.

    PubMed

    Piepho, H P

    1995-03-01

    The additive main effects multiplicative interaction model is frequently used in the analysis of multilocation trials. In the analysis of such data it is of interest to decide how many of the multiplicative interaction terms are significant. Several tests for this task are available, all of which assume that errors are normally distributed with a common variance. This paper investigates the robustness of several tests (Gollob, F GH1, FGH2, FR)to departures from these assumptions. It is concluded that, because of its better robustness, the F Rtest is preferable. If the other tests are to be used, preliminary tests for the validity of assumptions should be performed.

  20. Approach for Input Uncertainty Propagation and Robust Design in CFD Using Sensitivity Derivatives

    NASA Technical Reports Server (NTRS)

    Putko, Michele M.; Taylor, Arthur C., III; Newman, Perry A.; Green, Lawrence L.

    2002-01-01

    An implementation of the approximate statistical moment method for uncertainty propagation and robust optimization for quasi 3-D Euler CFD code is presented. Given uncertainties in statistically independent, random, normally distributed input variables, first- and second-order statistical moment procedures are performed to approximate the uncertainty in the CFD output. Efficient calculation of both first- and second-order sensitivity derivatives is required. In order to assess the validity of the approximations, these moments are compared with statistical moments generated through Monte Carlo simulations. The uncertainties in the CFD input variables are also incorporated into a robust optimization procedure. For this optimization, statistical moments involving first-order sensitivity derivatives appear in the objective function and system constraints. Second-order sensitivity derivatives are used in a gradient-based search to successfully execute a robust optimization. The approximate methods used throughout the analyses are found to be valid when considering robustness about input parameter mean values.

  1. A Tutorial for Analyzing Human Reaction Times: How to Filter Data, Manage Missing Values, and Choose a Statistical Model

    ERIC Educational Resources Information Center

    Lachaud, Christian Michel; Renaud, Olivier

    2011-01-01

    This tutorial for the statistical processing of reaction times collected through a repeated-measure design is addressed to researchers in psychology. It aims at making explicit some important methodological issues, at orienting researchers to the existing solutions, and at providing them some evaluation tools for choosing the most robust and…

  2. Assessing the sensitivity and robustness of prediction models for apple firmness using spectral scattering technique

    USDA-ARS?s Scientific Manuscript database

    Spectral scattering is useful for nondestructive sensing of fruit firmness. Prediction models, however, are typically built using multivariate statistical methods such as partial least squares regression (PLSR), whose performance generally depends on the characteristics of the data. The aim of this ...

  3. Marginal Structural Models with Counterfactual Effect Modifiers.

    PubMed

    Zheng, Wenjing; Luo, Zhehui; van der Laan, Mark J

    2018-06-08

    In health and social sciences, research questions often involve systematic assessment of the modification of treatment causal effect by patient characteristics. In longitudinal settings, time-varying or post-intervention effect modifiers are also of interest. In this work, we investigate the robust and efficient estimation of the Counterfactual-History-Adjusted Marginal Structural Model (van der Laan MJ, Petersen M. Statistical learning of origin-specific statically optimal individualized treatment rules. Int J Biostat. 2007;3), which models the conditional intervention-specific mean outcome given a counterfactual modifier history in an ideal experiment. We establish the semiparametric efficiency theory for these models, and present a substitution-based, semiparametric efficient and doubly robust estimator using the targeted maximum likelihood estimation methodology (TMLE, e.g. van der Laan MJ, Rubin DB. Targeted maximum likelihood learning. Int J Biostat. 2006;2, van der Laan MJ, Rose S. Targeted learning: causal inference for observational and experimental data, 1st ed. Springer Series in Statistics. Springer, 2011). To facilitate implementation in applications where the effect modifier is high dimensional, our third contribution is a projected influence function (and the corresponding projected TMLE estimator), which retains most of the robustness of its efficient peer and can be easily implemented in applications where the use of the efficient influence function becomes taxing. We compare the projected TMLE estimator with an Inverse Probability of Treatment Weighted estimator (e.g. Robins JM. Marginal structural models. In: Proceedings of the American Statistical Association. Section on Bayesian Statistical Science, 1-10. 1997a, Hernan MA, Brumback B, Robins JM. Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men. 2000;11:561-570), and a non-targeted G-computation estimator (Robins JM. A new approach to causal inference in mortality studies with sustained exposure periods - application to control of the healthy worker survivor effect. Math Modell. 1986;7:1393-1512.). The comparative performance of these estimators is assessed in a simulation study. The use of the projected TMLE estimator is illustrated in a secondary data analysis for the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) trial where effect modifiers are subject to missing at random.

  4. Ice Water Classification Using Statistical Distribution Based Conditional Random Fields in RADARSAT-2 Dual Polarization Imagery

    NASA Astrophysics Data System (ADS)

    Zhang, Y.; Li, F.; Zhang, S.; Hao, W.; Zhu, T.; Yuan, L.; Xiao, F.

    2017-09-01

    In this paper, Statistical Distribution based Conditional Random Fields (STA-CRF) algorithm is exploited for improving marginal ice-water classification. Pixel level ice concentration is presented as the comparison of methods based on CRF. Furthermore, in order to explore the effective statistical distribution model to be integrated into STA-CRF, five statistical distribution models are investigated. The STA-CRF methods are tested on 2 scenes around Prydz Bay and Adélie Depression, where contain a variety of ice types during melt season. Experimental results indicate that the proposed method can resolve sea ice edge well in Marginal Ice Zone (MIZ) and show a robust distinction of ice and water.

  5. Robust control for fractional variable-order chaotic systems with non-singular kernel

    NASA Astrophysics Data System (ADS)

    Zuñiga-Aguilar, C. J.; Gómez-Aguilar, J. F.; Escobar-Jiménez, R. F.; Romero-Ugalde, H. M.

    2018-01-01

    This paper investigates the chaos control for a class of variable-order fractional chaotic systems using robust control strategy. The variable-order fractional models of the non-autonomous biological system, the King Cobra chaotic system, the Halvorsen's attractor and the Burke-Shaw system, have been derived using the fractional-order derivative with Mittag-Leffler in the Liouville-Caputo sense. The fractional differential equations and the control law were solved using the Adams-Bashforth-Moulton algorithm. To test the control stability efficiency, different statistical indicators were introduced. Finally, simulation results demonstrate the effectiveness of the proposed robust control.

  6. Toward statistical modeling of saccadic eye-movement and visual saliency.

    PubMed

    Sun, Xiaoshuai; Yao, Hongxun; Ji, Rongrong; Liu, Xian-Ming

    2014-11-01

    In this paper, we present a unified statistical framework for modeling both saccadic eye movements and visual saliency. By analyzing the statistical properties of human eye fixations on natural images, we found that human attention is sparsely distributed and usually deployed to locations with abundant structural information. This observations inspired us to model saccadic behavior and visual saliency based on super-Gaussian component (SGC) analysis. Our model sequentially obtains SGC using projection pursuit, and generates eye movements by selecting the location with maximum SGC response. Besides human saccadic behavior simulation, we also demonstrated our superior effectiveness and robustness over state-of-the-arts by carrying out dense experiments on synthetic patterns and human eye fixation benchmarks. Multiple key issues in saliency modeling research, such as individual differences, the effects of scale and blur, are explored in this paper. Based on extensive qualitative and quantitative experimental results, we show promising potentials of statistical approaches for human behavior research.

  7. Smoking and Cancers: Case-Robust Analysis of a Classic Data Set

    ERIC Educational Resources Information Center

    Bentler, Peter M.; Satorra, Albert; Yuan, Ke-Hai

    2009-01-01

    A typical structural equation model is intended to reproduce the means, variances, and correlations or covariances among a set of variables based on parameter estimates of a highly restricted model. It is not widely appreciated that the sample statistics being modeled can be quite sensitive to outliers and influential observations, leading to bias…

  8. Statistical variation in progressive scrambling

    NASA Astrophysics Data System (ADS)

    Clark, Robert D.; Fox, Peter C.

    2004-07-01

    The two methods most often used to evaluate the robustness and predictivity of partial least squares (PLS) models are cross-validation and response randomization. Both methods may be overly optimistic for data sets that contain redundant observations, however. The kinds of perturbation analysis widely used for evaluating model stability in the context of ordinary least squares regression are only applicable when the descriptors are independent of each other and errors are independent and normally distributed; neither assumption holds for QSAR in general and for PLS in particular. Progressive scrambling is a novel, non-parametric approach to perturbing models in the response space in a way that does not disturb the underlying covariance structure of the data. Here, we introduce adjustments for two of the characteristic values produced by a progressive scrambling analysis - the deprecated predictivity (Q_s^{ast^2}) and standard error of prediction (SDEP s * ) - that correct for the effect of introduced perturbation. We also explore the statistical behavior of the adjusted values (Q_0^{ast^2} and SDEP 0 * ) and the sensitivity to perturbation (d q 2/d r yy ' 2). It is shown that the three statistics are all robust for stable PLS models, in terms of the stochastic component of their determination and of their variation due to sampling effects involved in training set selection.

  9. A robust clustering algorithm for identifying problematic samples in genome-wide association studies.

    PubMed

    Bellenguez, Céline; Strange, Amy; Freeman, Colin; Donnelly, Peter; Spencer, Chris C A

    2012-01-01

    High-throughput genotyping arrays provide an efficient way to survey single nucleotide polymorphisms (SNPs) across the genome in large numbers of individuals. Downstream analysis of the data, for example in genome-wide association studies (GWAS), often involves statistical models of genotype frequencies across individuals. The complexities of the sample collection process and the potential for errors in the experimental assay can lead to biases and artefacts in an individual's inferred genotypes. Rather than attempting to model these complications, it has become a standard practice to remove individuals whose genome-wide data differ from the sample at large. Here we describe a simple, but robust, statistical algorithm to identify samples with atypical summaries of genome-wide variation. Its use as a semi-automated quality control tool is demonstrated using several summary statistics, selected to identify different potential problems, and it is applied to two different genotyping platforms and sample collections. The algorithm is written in R and is freely available at www.well.ox.ac.uk/chris-spencer chris.spencer@well.ox.ac.uk Supplementary data are available at Bioinformatics online.

  10. Frequentist Model Averaging in Structural Equation Modelling.

    PubMed

    Jin, Shaobo; Ankargren, Sebastian

    2018-06-04

    Model selection from a set of candidate models plays an important role in many structural equation modelling applications. However, traditional model selection methods introduce extra randomness that is not accounted for by post-model selection inference. In the current study, we propose a model averaging technique within the frequentist statistical framework. Instead of selecting an optimal model, the contributions of all candidate models are acknowledged. Valid confidence intervals and a [Formula: see text] test statistic are proposed. A simulation study shows that the proposed method is able to produce a robust mean-squared error, a better coverage probability, and a better goodness-of-fit test compared to model selection. It is an interesting compromise between model selection and the full model.

  11. A robust interrupted time series model for analyzing complex health care intervention data.

    PubMed

    Cruz, Maricela; Bender, Miriam; Ombao, Hernando

    2017-12-20

    Current health policy calls for greater use of evidence-based care delivery services to improve patient quality and safety outcomes. Care delivery is complex, with interacting and interdependent components that challenge traditional statistical analytic techniques, in particular, when modeling a time series of outcomes data that might be "interrupted" by a change in a particular method of health care delivery. Interrupted time series (ITS) is a robust quasi-experimental design with the ability to infer the effectiveness of an intervention that accounts for data dependency. Current standardized methods for analyzing ITS data do not model changes in variation and correlation following the intervention. This is a key limitation since it is plausible for data variability and dependency to change because of the intervention. Moreover, present methodology either assumes a prespecified interruption time point with an instantaneous effect or removes data for which the effect of intervention is not fully realized. In this paper, we describe and develop a novel robust interrupted time series (robust-ITS) model that overcomes these omissions and limitations. The robust-ITS model formally performs inference on (1) identifying the change point; (2) differences in preintervention and postintervention correlation; (3) differences in the outcome variance preintervention and postintervention; and (4) differences in the mean preintervention and postintervention. We illustrate the proposed method by analyzing patient satisfaction data from a hospital that implemented and evaluated a new nursing care delivery model as the intervention of interest. The robust-ITS model is implemented in an R Shiny toolbox, which is freely available to the community. Copyright © 2017 John Wiley & Sons, Ltd.

  12. Approach for Uncertainty Propagation and Robust Design in CFD Using Sensitivity Derivatives

    NASA Technical Reports Server (NTRS)

    Putko, Michele M.; Newman, Perry A.; Taylor, Arthur C., III; Green, Lawrence L.

    2001-01-01

    This paper presents an implementation of the approximate statistical moment method for uncertainty propagation and robust optimization for a quasi 1-D Euler CFD (computational fluid dynamics) code. Given uncertainties in statistically independent, random, normally distributed input variables, a first- and second-order statistical moment matching procedure is performed to approximate the uncertainty in the CFD output. Efficient calculation of both first- and second-order sensitivity derivatives is required. In order to assess the validity of the approximations, the moments are compared with statistical moments generated through Monte Carlo simulations. The uncertainties in the CFD input variables are also incorporated into a robust optimization procedure. For this optimization, statistical moments involving first-order sensitivity derivatives appear in the objective function and system constraints. Second-order sensitivity derivatives are used in a gradient-based search to successfully execute a robust optimization. The approximate methods used throughout the analyses are found to be valid when considering robustness about input parameter mean values.

  13. The Problem of Size in Robust Design

    NASA Technical Reports Server (NTRS)

    Koch, Patrick N.; Allen, Janet K.; Mistree, Farrokh; Mavris, Dimitri

    1997-01-01

    To facilitate the effective solution of multidisciplinary, multiobjective complex design problems, a departure from the traditional parametric design analysis and single objective optimization approaches is necessary in the preliminary stages of design. A necessary tradeoff becomes one of efficiency vs. accuracy as approximate models are sought to allow fast analysis and effective exploration of a preliminary design space. In this paper we apply a general robust design approach for efficient and comprehensive preliminary design to a large complex system: a high speed civil transport (HSCT) aircraft. Specifically, we investigate the HSCT wing configuration design, incorporating life cycle economic uncertainties to identify economically robust solutions. The approach is built on the foundation of statistical experimentation and modeling techniques and robust design principles, and is specialized through incorporation of the compromise Decision Support Problem for multiobjective design. For large problems however, as in the HSCT example, this robust design approach developed for efficient and comprehensive design breaks down with the problem of size - combinatorial explosion in experimentation and model building with number of variables -and both efficiency and accuracy are sacrificed. Our focus in this paper is on identifying and discussing the implications and open issues associated with the problem of size for the preliminary design of large complex systems.

  14. Baseline estimation in flame's spectra by using neural networks and robust statistics

    NASA Astrophysics Data System (ADS)

    Garces, Hugo; Arias, Luis; Rojas, Alejandro

    2014-09-01

    This work presents a baseline estimation method in flame spectra based on artificial intelligence structure as a neural network, combining robust statistics with multivariate analysis to automatically discriminate measured wavelengths belonging to continuous feature for model adaptation, surpassing restriction of measuring target baseline for training. The main contributions of this paper are: to analyze a flame spectra database computing Jolliffe statistics from Principal Components Analysis detecting wavelengths not correlated with most of the measured data corresponding to baseline; to systematically determine the optimal number of neurons in hidden layers based on Akaike's Final Prediction Error; to estimate baseline in full wavelength range sampling measured spectra; and to train an artificial intelligence structure as a Neural Network which allows to generalize the relation between measured and baseline spectra. The main application of our research is to compute total radiation with baseline information, allowing to diagnose combustion process state for optimization in early stages.

  15. Multilevel Factor Analysis by Model Segregation: New Applications for Robust Test Statistics

    ERIC Educational Resources Information Center

    Schweig, Jonathan

    2014-01-01

    Measures of classroom environments have become central to policy efforts that assess school and teacher quality. This has sparked a wide interest in using multilevel factor analysis to test measurement hypotheses about classroom-level variables. One approach partitions the total covariance matrix and tests models separately on the…

  16. Statistical analysis of water-quality data containing multiple detection limits: S-language software for regression on order statistics

    USGS Publications Warehouse

    Lee, L.; Helsel, D.

    2005-01-01

    Trace contaminants in water, including metals and organics, often are measured at sufficiently low concentrations to be reported only as values below the instrument detection limit. Interpretation of these "less thans" is complicated when multiple detection limits occur. Statistical methods for multiply censored, or multiple-detection limit, datasets have been developed for medical and industrial statistics, and can be employed to estimate summary statistics or model the distributions of trace-level environmental data. We describe S-language-based software tools that perform robust linear regression on order statistics (ROS). The ROS method has been evaluated as one of the most reliable procedures for developing summary statistics of multiply censored data. It is applicable to any dataset that has 0 to 80% of its values censored. These tools are a part of a software library, or add-on package, for the R environment for statistical computing. This library can be used to generate ROS models and associated summary statistics, plot modeled distributions, and predict exceedance probabilities of water-quality standards. ?? 2005 Elsevier Ltd. All rights reserved.

  17. A Third Moment Adjusted Test Statistic for Small Sample Factor Analysis.

    PubMed

    Lin, Johnny; Bentler, Peter M

    2012-01-01

    Goodness of fit testing in factor analysis is based on the assumption that the test statistic is asymptotically chi-square; but this property may not hold in small samples even when the factors and errors are normally distributed in the population. Robust methods such as Browne's asymptotically distribution-free method and Satorra Bentler's mean scaling statistic were developed under the presumption of non-normality in the factors and errors. This paper finds new application to the case where factors and errors are normally distributed in the population but the skewness of the obtained test statistic is still high due to sampling error in the observed indicators. An extension of Satorra Bentler's statistic is proposed that not only scales the mean but also adjusts the degrees of freedom based on the skewness of the obtained test statistic in order to improve its robustness under small samples. A simple simulation study shows that this third moment adjusted statistic asymptotically performs on par with previously proposed methods, and at a very small sample size offers superior Type I error rates under a properly specified model. Data from Mardia, Kent and Bibby's study of students tested for their ability in five content areas that were either open or closed book were used to illustrate the real-world performance of this statistic.

  18. Robust Real-Time Music Transcription with a Compositional Hierarchical Model.

    PubMed

    Pesek, Matevž; Leonardis, Aleš; Marolt, Matija

    2017-01-01

    The paper presents a new compositional hierarchical model for robust music transcription. Its main features are unsupervised learning of a hierarchical representation of input data, transparency, which enables insights into the learned representation, as well as robustness and speed which make it suitable for real-world and real-time use. The model consists of multiple layers, each composed of a number of parts. The hierarchical nature of the model corresponds well to hierarchical structures in music. The parts in lower layers correspond to low-level concepts (e.g. tone partials), while the parts in higher layers combine lower-level representations into more complex concepts (tones, chords). The layers are learned in an unsupervised manner from music signals. Parts in each layer are compositions of parts from previous layers based on statistical co-occurrences as the driving force of the learning process. In the paper, we present the model's structure and compare it to other hierarchical approaches in the field of music information retrieval. We evaluate the model's performance for the multiple fundamental frequency estimation. Finally, we elaborate on extensions of the model towards other music information retrieval tasks.

  19. Landslides, forest fires, and earthquakes: examples of self-organized critical behavior

    NASA Astrophysics Data System (ADS)

    Turcotte, Donald L.; Malamud, Bruce D.

    2004-09-01

    Per Bak conceived self-organized criticality as an explanation for the behavior of the sandpile model. Subsequently, many cellular automata models were found to exhibit similar behavior. Two examples are the forest-fire and slider-block models. Each of these models can be associated with a serious natural hazard: the sandpile model with landslides, the forest-fire model with actual forest fires, and the slider-block model with earthquakes. We examine the noncumulative frequency-area statistics for each natural hazard, and show that each has a robust power-law (fractal) distribution. We propose an inverse-cascade model as a general explanation for the power-law frequency-area statistics of the three cellular-automata models and their ‘associated’ natural hazards.

  20. Robust Covariate-Adjusted Log-Rank Statistics and Corresponding Sample Size Formula for Recurrent Events Data

    PubMed Central

    Song, Rui; Kosorok, Michael R.; Cai, Jianwen

    2009-01-01

    Summary Recurrent events data are frequently encountered in clinical trials. This article develops robust covariate-adjusted log-rank statistics applied to recurrent events data with arbitrary numbers of events under independent censoring and the corresponding sample size formula. The proposed log-rank tests are robust with respect to different data-generating processes and are adjusted for predictive covariates. It reduces to the Kong and Slud (1997, Biometrika 84, 847–862) setting in the case of a single event. The sample size formula is derived based on the asymptotic normality of the covariate-adjusted log-rank statistics under certain local alternatives and a working model for baseline covariates in the recurrent event data context. When the effect size is small and the baseline covariates do not contain significant information about event times, it reduces to the same form as that of Schoenfeld (1983, Biometrics 39, 499–503) for cases of a single event or independent event times within a subject. We carry out simulations to study the control of type I error and the comparison of powers between several methods in finite samples. The proposed sample size formula is illustrated using data from an rhDNase study. PMID:18162107

  1. Neutral face classification using personalized appearance models for fast and robust emotion detection.

    PubMed

    Chiranjeevi, Pojala; Gopalakrishnan, Viswanath; Moogi, Pratibha

    2015-09-01

    Facial expression recognition is one of the open problems in computer vision. Robust neutral face recognition in real time is a major challenge for various supervised learning-based facial expression recognition methods. This is due to the fact that supervised methods cannot accommodate all appearance variability across the faces with respect to race, pose, lighting, facial biases, and so on, in the limited amount of training data. Moreover, processing each and every frame to classify emotions is not required, as user stays neutral for majority of the time in usual applications like video chat or photo album/web browsing. Detecting neutral state at an early stage, thereby bypassing those frames from emotion classification would save the computational power. In this paper, we propose a light-weight neutral versus emotion classification engine, which acts as a pre-processer to the traditional supervised emotion classification approaches. It dynamically learns neutral appearance at key emotion (KE) points using a statistical texture model, constructed by a set of reference neutral frames for each user. The proposed method is made robust to various types of user head motions by accounting for affine distortions based on a statistical texture model. Robustness to dynamic shift of KE points is achieved by evaluating the similarities on a subset of neighborhood patches around each KE point using the prior information regarding the directionality of specific facial action units acting on the respective KE point. The proposed method, as a result, improves emotion recognition (ER) accuracy and simultaneously reduces computational complexity of the ER system, as validated on multiple databases.

  2. Goodness-of-fit tests and model diagnostics for negative binomial regression of RNA sequencing data.

    PubMed

    Mi, Gu; Di, Yanming; Schafer, Daniel W

    2015-01-01

    This work is about assessing model adequacy for negative binomial (NB) regression, particularly (1) assessing the adequacy of the NB assumption, and (2) assessing the appropriateness of models for NB dispersion parameters. Tools for the first are appropriate for NB regression generally; those for the second are primarily intended for RNA sequencing (RNA-Seq) data analysis. The typically small number of biological samples and large number of genes in RNA-Seq analysis motivate us to address the trade-offs between robustness and statistical power using NB regression models. One widely-used power-saving strategy, for example, is to assume some commonalities of NB dispersion parameters across genes via simple models relating them to mean expression rates, and many such models have been proposed. As RNA-Seq analysis is becoming ever more popular, it is appropriate to make more thorough investigations into power and robustness of the resulting methods, and into practical tools for model assessment. In this article, we propose simulation-based statistical tests and diagnostic graphics to address model adequacy. We provide simulated and real data examples to illustrate that our proposed methods are effective for detecting the misspecification of the NB mean-variance relationship as well as judging the adequacy of fit of several NB dispersion models.

  3. ASCS online fault detection and isolation based on an improved MPCA

    NASA Astrophysics Data System (ADS)

    Peng, Jianxin; Liu, Haiou; Hu, Yuhui; Xi, Junqiang; Chen, Huiyan

    2014-09-01

    Multi-way principal component analysis (MPCA) has received considerable attention and been widely used in process monitoring. A traditional MPCA algorithm unfolds multiple batches of historical data into a two-dimensional matrix and cut the matrix along the time axis to form subspaces. However, low efficiency of subspaces and difficult fault isolation are the common disadvantages for the principal component model. This paper presents a new subspace construction method based on kernel density estimation function that can effectively reduce the storage amount of the subspace information. The MPCA model and the knowledge base are built based on the new subspace. Then, fault detection and isolation with the squared prediction error (SPE) statistic and the Hotelling ( T 2) statistic are also realized in process monitoring. When a fault occurs, fault isolation based on the SPE statistic is achieved by residual contribution analysis of different variables. For fault isolation of subspace based on the T 2 statistic, the relationship between the statistic indicator and state variables is constructed, and the constraint conditions are presented to check the validity of fault isolation. Then, to improve the robustness of fault isolation to unexpected disturbances, the statistic method is adopted to set the relation between single subspace and multiple subspaces to increase the corrective rate of fault isolation. Finally fault detection and isolation based on the improved MPCA is used to monitor the automatic shift control system (ASCS) to prove the correctness and effectiveness of the algorithm. The research proposes a new subspace construction method to reduce the required storage capacity and to prove the robustness of the principal component model, and sets the relationship between the state variables and fault detection indicators for fault isolation.

  4. A robust nonlinear filter for image restoration.

    PubMed

    Koivunen, V

    1995-01-01

    A class of nonlinear regression filters based on robust estimation theory is introduced. The goal of the filtering is to recover a high-quality image from degraded observations. Models for desired image structures and contaminating processes are employed, but deviations from strict assumptions are allowed since the assumptions on signal and noise are typically only approximately true. The robustness of filters is usually addressed only in a distributional sense, i.e., the actual error distribution deviates from the nominal one. In this paper, the robustness is considered in a broad sense since the outliers may also be due to inappropriate signal model, or there may be more than one statistical population present in the processing window, causing biased estimates. Two filtering algorithms minimizing a least trimmed squares criterion are provided. The design of the filters is simple since no scale parameters or context-dependent threshold values are required. Experimental results using both real and simulated data are presented. The filters effectively attenuate both impulsive and nonimpulsive noise while recovering the signal structure and preserving interesting details.

  5. Control design and robustness analysis of a ball and plate system by using polynomial chaos

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Colón, Diego; Balthazar, José M.; Reis, Célia A. dos

    2014-12-10

    In this paper, we present a mathematical model of a ball and plate system, a control law and analyze its robustness properties by using the polynomial chaos method. The ball rolls without slipping. There is an auxiliary robot vision system that determines the bodies' positions and velocities, and is used for control purposes. The actuators are to orthogonal DC motors, that changes the plate's angles with the ground. The model is a extension of the ball and beam system and is highly nonlinear. The system is decoupled in two independent equations for coordinates x and y. Finally, the resulting nonlinearmore » closed loop systems are analyzed by the polynomial chaos methodology, which considers that some system parameters are random variables, and generates statistical data that can be used in the robustness analysis.« less

  6. Control design and robustness analysis of a ball and plate system by using polynomial chaos

    NASA Astrophysics Data System (ADS)

    Colón, Diego; Balthazar, José M.; dos Reis, Célia A.; Bueno, Átila M.; Diniz, Ivando S.; de S. R. F. Rosa, Suelia

    2014-12-01

    In this paper, we present a mathematical model of a ball and plate system, a control law and analyze its robustness properties by using the polynomial chaos method. The ball rolls without slipping. There is an auxiliary robot vision system that determines the bodies' positions and velocities, and is used for control purposes. The actuators are to orthogonal DC motors, that changes the plate's angles with the ground. The model is a extension of the ball and beam system and is highly nonlinear. The system is decoupled in two independent equations for coordinates x and y. Finally, the resulting nonlinear closed loop systems are analyzed by the polynomial chaos methodology, which considers that some system parameters are random variables, and generates statistical data that can be used in the robustness analysis.

  7. Robust crop and weed segmentation under uncontrolled outdoor illumination

    USDA-ARS?s Scientific Manuscript database

    A new machine vision for weed detection was developed from RGB color model images. Processes included in the algorithm for the detection were excessive green conversion, threshold value computation by statistical analysis, adaptive image segmentation by adjusting the threshold value, median filter, ...

  8. Robust model predictive control for multi-step short range spacecraft rendezvous

    NASA Astrophysics Data System (ADS)

    Zhu, Shuyi; Sun, Ran; Wang, Jiaolong; Wang, Jihe; Shao, Xiaowei

    2018-07-01

    This work presents a robust model predictive control (MPC) approach for the multi-step short range spacecraft rendezvous problem. During the specific short range phase concerned, the chaser is supposed to be initially outside the line-of-sight (LOS) cone. Therefore, the rendezvous process naturally includes two steps: the first step is to transfer the chaser into the LOS cone and the second step is to transfer the chaser into the aimed region with its motion confined within the LOS cone. A novel MPC framework named after Mixed MPC (M-MPC) is proposed, which is the combination of the Variable-Horizon MPC (VH-MPC) framework and the Fixed-Instant MPC (FI-MPC) framework. The M-MPC framework enables the optimization for the two steps to be implemented jointly rather than to be separated factitiously, and its computation workload is acceptable for the usually low-power processors onboard spacecraft. Then considering that disturbances including modeling error, sensor noise and thrust uncertainty may induce undesired constraint violations, a robust technique is developed and it is attached to the above M-MPC framework to form a robust M-MPC approach. The robust technique is based on the chance-constrained idea, which ensures that constraints can be satisfied with a prescribed probability. It improves the robust technique proposed by Gavilan et al., because it eliminates the unnecessary conservativeness by explicitly incorporating known statistical properties of the navigation uncertainty. The efficacy of the robust M-MPC approach is shown in a simulation study.

  9. Robust geostatistical analysis of spatial data

    NASA Astrophysics Data System (ADS)

    Papritz, Andreas; Künsch, Hans Rudolf; Schwierz, Cornelia; Stahel, Werner A.

    2013-04-01

    Most of the geostatistical software tools rely on non-robust algorithms. This is unfortunate, because outlying observations are rather the rule than the exception, in particular in environmental data sets. Outliers affect the modelling of the large-scale spatial trend, the estimation of the spatial dependence of the residual variation and the predictions by kriging. Identifying outliers manually is cumbersome and requires expertise because one needs parameter estimates to decide which observation is a potential outlier. Moreover, inference after the rejection of some observations is problematic. A better approach is to use robust algorithms that prevent automatically that outlying observations have undue influence. Former studies on robust geostatistics focused on robust estimation of the sample variogram and ordinary kriging without external drift. Furthermore, Richardson and Welsh (1995) proposed a robustified version of (restricted) maximum likelihood ([RE]ML) estimation for the variance components of a linear mixed model, which was later used by Marchant and Lark (2007) for robust REML estimation of the variogram. We propose here a novel method for robust REML estimation of the variogram of a Gaussian random field that is possibly contaminated by independent errors from a long-tailed distribution. It is based on robustification of estimating equations for the Gaussian REML estimation (Welsh and Richardson, 1997). Besides robust estimates of the parameters of the external drift and of the variogram, the method also provides standard errors for the estimated parameters, robustified kriging predictions at both sampled and non-sampled locations and kriging variances. Apart from presenting our modelling framework, we shall present selected simulation results by which we explored the properties of the new method. This will be complemented by an analysis a data set on heavy metal contamination of the soil in the vicinity of a metal smelter. Marchant, B.P. and Lark, R.M. 2007. Robust estimation of the variogram by residual maximum likelihood. Geoderma 140: 62-72. Richardson, A.M. and Welsh, A.H. 1995. Robust restricted maximum likelihood in mixed linear models. Biometrics 51: 1429-1439. Welsh, A.H. and Richardson, A.M. 1997. Approaches to the robust estimation of mixed models. In: Handbook of Statistics Vol. 15, Elsevier, pp. 343-384.

  10. Kendall-Theil Robust Line (KTRLine--version 1.0)-A Visual Basic Program for Calculating and Graphing Robust Nonparametric Estimates of Linear-Regression Coefficients Between Two Continuous Variables

    USGS Publications Warehouse

    Granato, Gregory E.

    2006-01-01

    The Kendall-Theil Robust Line software (KTRLine-version 1.0) is a Visual Basic program that may be used with the Microsoft Windows operating system to calculate parameters for robust, nonparametric estimates of linear-regression coefficients between two continuous variables. The KTRLine software was developed by the U.S. Geological Survey, in cooperation with the Federal Highway Administration, for use in stochastic data modeling with local, regional, and national hydrologic data sets to develop planning-level estimates of potential effects of highway runoff on the quality of receiving waters. The Kendall-Theil robust line was selected because this robust nonparametric method is resistant to the effects of outliers and nonnormality in residuals that commonly characterize hydrologic data sets. The slope of the line is calculated as the median of all possible pairwise slopes between points. The intercept is calculated so that the line will run through the median of input data. A single-line model or a multisegment model may be specified. The program was developed to provide regression equations with an error component for stochastic data generation because nonparametric multisegment regression tools are not available with the software that is commonly used to develop regression models. The Kendall-Theil robust line is a median line and, therefore, may underestimate total mass, volume, or loads unless the error component or a bias correction factor is incorporated into the estimate. Regression statistics such as the median error, the median absolute deviation, the prediction error sum of squares, the root mean square error, the confidence interval for the slope, and the bias correction factor for median estimates are calculated by use of nonparametric methods. These statistics, however, may be used to formulate estimates of mass, volume, or total loads. The program is used to read a two- or three-column tab-delimited input file with variable names in the first row and data in subsequent rows. The user may choose the columns that contain the independent (X) and dependent (Y) variable. A third column, if present, may contain metadata such as the sample-collection location and date. The program screens the input files and plots the data. The KTRLine software is a graphical tool that facilitates development of regression models by use of graphs of the regression line with data, the regression residuals (with X or Y), and percentile plots of the cumulative frequency of the X variable, Y variable, and the regression residuals. The user may individually transform the independent and dependent variables to reduce heteroscedasticity and to linearize data. The program plots the data and the regression line. The program also prints model specifications and regression statistics to the screen. The user may save and print the regression results. The program can accept data sets that contain up to about 15,000 XY data points, but because the program must sort the array of all pairwise slopes, the program may be perceptibly slow with data sets that contain more than about 1,000 points.

  11. New methods in hydrologic modeling and decision support for culvert flood risk under climate change

    NASA Astrophysics Data System (ADS)

    Rosner, A.; Letcher, B. H.; Vogel, R. M.; Rees, P. S.

    2015-12-01

    Assessing culvert flood vulnerability under climate change poses an unusual combination of challenges. We seek a robust method of planning for an uncertain future, and therefore must consider a wide range of plausible future conditions. Culverts in our case study area, northwestern Massachusetts, USA, are predominantly found in small, ungaged basins. The need to predict flows both at numerous sites and under numerous plausible climate conditions requires a statistical model with low data and computational requirements. We present a statistical streamflow model that is driven by precipitation and temperature, allowing us to predict flows without reliance on reference gages of observed flows. The hydrological analysis is used to determine each culvert's risk of failure under current conditions. We also explore the hydrological response to a range of plausible future climate conditions. These results are used to determine the tolerance of each culvert to future increases in precipitation. In a decision support context, current flood risk as well as tolerance to potential climate changes are used to provide a robust assessment and prioritization for culvert replacements.

  12. Invariance in the recurrence of large returns and the validation of models of price dynamics

    NASA Astrophysics Data System (ADS)

    Chang, Lo-Bin; Geman, Stuart; Hsieh, Fushing; Hwang, Chii-Ruey

    2013-08-01

    Starting from a robust, nonparametric definition of large returns (“excursions”), we study the statistics of their occurrences, focusing on the recurrence process. The empirical waiting-time distribution between excursions is remarkably invariant to year, stock, and scale (return interval). This invariance is related to self-similarity of the marginal distributions of returns, but the excursion waiting-time distribution is a function of the entire return process and not just its univariate probabilities. Generalized autoregressive conditional heteroskedasticity (GARCH) models, market-time transformations based on volume or trades, and generalized (Lévy) random-walk models all fail to fit the statistical structure of excursions.

  13. Delayed Majority Game with Heterogeneous Learning Speeds for Financial Markets

    NASA Astrophysics Data System (ADS)

    Yoshimura, Yushi; Yamada, Kenta

    There are two famous statistical laws, so-called stylized facts, in financial markets. One is fat tail where the tail of price returns obeys a power law. The other is volatility clustering in which the autocorrelation function of absolute price returns decays with a power law. In order to understand relationships between the stylized facts and dealers' behaviors, we constructed a new agent-based model based on the grand canonical minority game (GCMG) and the Giardina-Bouchaud (GB) model. The recovery of stylized facts by GCMG and GB lacks of robustness. Therefore, based on the GCMG and GB model, we develop a new model that can reproduce stylized facts robustly. Furthermore, we find that heterogeneity of learning speeds of agents is important to reproduce the stylized facts.

  14. Approximations to the distribution of a test statistic in covariance structure analysis: A comprehensive study.

    PubMed

    Wu, Hao

    2018-05-01

    In structural equation modelling (SEM), a robust adjustment to the test statistic or to its reference distribution is needed when its null distribution deviates from a χ 2 distribution, which usually arises when data do not follow a multivariate normal distribution. Unfortunately, existing studies on this issue typically focus on only a few methods and neglect the majority of alternative methods in statistics. Existing simulation studies typically consider only non-normal distributions of data that either satisfy asymptotic robustness or lead to an asymptotic scaled χ 2 distribution. In this work we conduct a comprehensive study that involves both typical methods in SEM and less well-known methods from the statistics literature. We also propose the use of several novel non-normal data distributions that are qualitatively different from the non-normal distributions widely used in existing studies. We found that several under-studied methods give the best performance under specific conditions, but the Satorra-Bentler method remains the most viable method for most situations. © 2017 The British Psychological Society.

  15. Mapping Quantitative Traits in Unselected Families: Algorithms and Examples

    PubMed Central

    Dupuis, Josée; Shi, Jianxin; Manning, Alisa K.; Benjamin, Emelia J.; Meigs, James B.; Cupples, L. Adrienne; Siegmund, David

    2009-01-01

    Linkage analysis has been widely used to identify from family data genetic variants influencing quantitative traits. Common approaches have both strengths and limitations. Likelihood ratio tests typically computed in variance component analysis can accommodate large families but are highly sensitive to departure from normality assumptions. Regression-based approaches are more robust but their use has primarily been restricted to nuclear families. In this paper, we develop methods for mapping quantitative traits in moderately large pedigrees. Our methods are based on the score statistic which in contrast to the likelihood ratio statistic, can use nonparametric estimators of variability to achieve robustness of the false positive rate against departures from the hypothesized phenotypic model. Because the score statistic is easier to calculate than the likelihood ratio statistic, our basic mapping methods utilize relatively simple computer code that performs statistical analysis on output from any program that computes estimates of identity-by-descent. This simplicity also permits development and evaluation of methods to deal with multivariate and ordinal phenotypes, and with gene-gene and gene-environment interaction. We demonstrate our methods on simulated data and on fasting insulin, a quantitative trait measured in the Framingham Heart Study. PMID:19278016

  16. Rainfall Downscaling Conditional on Upper-air Atmospheric Predictors: Improved Assessment of Rainfall Statistics in a Changing Climate

    NASA Astrophysics Data System (ADS)

    Langousis, Andreas; Mamalakis, Antonis; Deidda, Roberto; Marrocu, Marino

    2015-04-01

    To improve the level skill of Global Climate Models (GCMs) and Regional Climate Models (RCMs) in reproducing the statistics of rainfall at a basin level and at hydrologically relevant temporal scales (e.g. daily), two types of statistical approaches have been suggested. One is the statistical correction of climate model rainfall outputs using historical series of precipitation. The other is the use of stochastic models of rainfall to conditionally simulate precipitation series, based on large-scale atmospheric predictors produced by climate models (e.g. geopotential height, relative vorticity, divergence, mean sea level pressure). The latter approach, usually referred to as statistical rainfall downscaling, aims at reproducing the statistical character of rainfall, while accounting for the effects of large-scale atmospheric circulation (and, therefore, climate forcing) on rainfall statistics. While promising, statistical rainfall downscaling has not attracted much attention in recent years, since the suggested approaches involved complex (i.e. subjective or computationally intense) identification procedures of the local weather, in addition to demonstrating limited success in reproducing several statistical features of rainfall, such as seasonal variations, the distributions of dry and wet spell lengths, the distribution of the mean rainfall intensity inside wet periods, and the distribution of rainfall extremes. In an effort to remedy those shortcomings, Langousis and Kaleris (2014) developed a statistical framework for simulation of daily rainfall intensities conditional on upper air variables, which accurately reproduces the statistical character of rainfall at multiple time-scales. Here, we study the relative performance of: a) quantile-quantile (Q-Q) correction of climate model rainfall products, and b) the statistical downscaling scheme of Langousis and Kaleris (2014), in reproducing the statistical structure of rainfall, as well as rainfall extremes, at a regional level. This is done for an intermediate-sized catchment in Italy, i.e. the Flumendosa catchment, using climate model rainfall and atmospheric data from the ENSEMBLES project (http://ensembleseu.metoffice.com). In doing so, we split the historical rainfall record of mean areal precipitation (MAP) in 15-year calibration and 45-year validation periods, and compare the historical rainfall statistics to those obtained from: a) Q-Q corrected climate model rainfall products, and b) synthetic rainfall series generated by the suggested downscaling scheme. To our knowledge, this is the first time that climate model rainfall and statistically downscaled precipitation are compared to catchment-averaged MAP at a daily resolution. The obtained results are promising, since the proposed downscaling scheme is more accurate and robust in reproducing a number of historical rainfall statistics, independent of the climate model used and the length of the calibration period. This is particularly the case for the yearly rainfall maxima, where direct statistical correction of climate model rainfall outputs shows increased sensitivity to the length of the calibration period and the climate model used. The robustness of the suggested downscaling scheme in modeling rainfall extremes at a daily resolution, is a notable feature that can effectively be used to assess hydrologic risk at a regional level under changing climatic conditions. Acknowledgments The research project is implemented within the framework of the Action «Supporting Postdoctoral Researchers» of the Operational Program "Education and Lifelong Learning" (Action's Beneficiary: General Secretariat for Research and Technology), and is co-financed by the European Social Fund (ESF) and the Greek State. CRS4 highly acknowledges the contribution of the Sardinian regional authorities.

  17. A Third Moment Adjusted Test Statistic for Small Sample Factor Analysis

    PubMed Central

    Lin, Johnny; Bentler, Peter M.

    2012-01-01

    Goodness of fit testing in factor analysis is based on the assumption that the test statistic is asymptotically chi-square; but this property may not hold in small samples even when the factors and errors are normally distributed in the population. Robust methods such as Browne’s asymptotically distribution-free method and Satorra Bentler’s mean scaling statistic were developed under the presumption of non-normality in the factors and errors. This paper finds new application to the case where factors and errors are normally distributed in the population but the skewness of the obtained test statistic is still high due to sampling error in the observed indicators. An extension of Satorra Bentler’s statistic is proposed that not only scales the mean but also adjusts the degrees of freedom based on the skewness of the obtained test statistic in order to improve its robustness under small samples. A simple simulation study shows that this third moment adjusted statistic asymptotically performs on par with previously proposed methods, and at a very small sample size offers superior Type I error rates under a properly specified model. Data from Mardia, Kent and Bibby’s study of students tested for their ability in five content areas that were either open or closed book were used to illustrate the real-world performance of this statistic. PMID:23144511

  18. A Robust Adaptive Autonomous Approach to Optimal Experimental Design

    NASA Astrophysics Data System (ADS)

    Gu, Hairong

    Experimentation is the fundamental tool of scientific inquiries to understand the laws governing the nature and human behaviors. Many complex real-world experimental scenarios, particularly in quest of prediction accuracy, often encounter difficulties to conduct experiments using an existing experimental procedure for the following two reasons. First, the existing experimental procedures require a parametric model to serve as the proxy of the latent data structure or data-generating mechanism at the beginning of an experiment. However, for those experimental scenarios of concern, a sound model is often unavailable before an experiment. Second, those experimental scenarios usually contain a large number of design variables, which potentially leads to a lengthy and costly data collection cycle. Incompetently, the existing experimental procedures are unable to optimize large-scale experiments so as to minimize the experimental length and cost. Facing the two challenges in those experimental scenarios, the aim of the present study is to develop a new experimental procedure that allows an experiment to be conducted without the assumption of a parametric model while still achieving satisfactory prediction, and performs optimization of experimental designs to improve the efficiency of an experiment. The new experimental procedure developed in the present study is named robust adaptive autonomous system (RAAS). RAAS is a procedure for sequential experiments composed of multiple experimental trials, which performs function estimation, variable selection, reverse prediction and design optimization on each trial. Directly addressing the challenges in those experimental scenarios of concern, function estimation and variable selection are performed by data-driven modeling methods to generate a predictive model from data collected during the course of an experiment, thus exempting the requirement of a parametric model at the beginning of an experiment; design optimization is performed to select experimental designs on the fly of an experiment based on their usefulness so that fewest designs are needed to reach useful inferential conclusions. Technically, function estimation is realized by Bayesian P-splines, variable selection is realized by Bayesian spike-and-slab prior, reverse prediction is realized by grid-search and design optimization is realized by the concepts of active learning. The present study demonstrated that RAAS achieves statistical robustness by making accurate predictions without the assumption of a parametric model serving as the proxy of latent data structure while the existing procedures can draw poor statistical inferences if a misspecified model is assumed; RAAS also achieves inferential efficiency by taking fewer designs to acquire useful statistical inferences than non-optimal procedures. Thus, RAAS is expected to be a principled solution to real-world experimental scenarios pursuing robust prediction and efficient experimentation.

  19. Stochastic simulation and robust design optimization of integrated photonic filters

    NASA Astrophysics Data System (ADS)

    Weng, Tsui-Wei; Melati, Daniele; Melloni, Andrea; Daniel, Luca

    2017-01-01

    Manufacturing variations are becoming an unavoidable issue in modern fabrication processes; therefore, it is crucial to be able to include stochastic uncertainties in the design phase. In this paper, integrated photonic coupled ring resonator filters are considered as an example of significant interest. The sparsity structure in photonic circuits is exploited to construct a sparse combined generalized polynomial chaos model, which is then used to analyze related statistics and perform robust design optimization. Simulation results show that the optimized circuits are more robust to fabrication process variations and achieve a reduction of 11%-35% in the mean square errors of the 3 dB bandwidth compared to unoptimized nominal designs.

  20. Integrating prior knowledge in multiple testing under dependence with applications to detecting differential DNA methylation.

    PubMed

    Kuan, Pei Fen; Chiang, Derek Y

    2012-09-01

    DNA methylation has emerged as an important hallmark of epigenetics. Numerous platforms including tiling arrays and next generation sequencing, and experimental protocols are available for profiling DNA methylation. Similar to other tiling array data, DNA methylation data shares the characteristics of inherent correlation structure among nearby probes. However, unlike gene expression or protein DNA binding data, the varying CpG density which gives rise to CpG island, shore and shelf definition provides exogenous information in detecting differential methylation. This article aims to introduce a robust testing and probe ranking procedure based on a nonhomogeneous hidden Markov model that incorporates the above-mentioned features for detecting differential methylation. We revisit the seminal work of Sun and Cai (2009, Journal of the Royal Statistical Society: Series B (Statistical Methodology)71, 393-424) and propose modeling the nonnull using a nonparametric symmetric distribution in two-sided hypothesis testing. We show that this model improves probe ranking and is robust to model misspecification based on extensive simulation studies. We further illustrate that our proposed framework achieves good operating characteristics as compared to commonly used methods in real DNA methylation data that aims to detect differential methylation sites. © 2012, The International Biometric Society.

  1. Ignoring the Innocent: Non-combatants in Urban Operations and in Military Models and Simulations

    DTIC Science & Technology

    2006-01-01

    such a model yields is a sufficiency theorem , a single run does not provide any information on the robustness of such theorems . That is, given that...often formally resolvable via inspection, simple differentiation, the implicit function theorem , comparative statistics, and so on. The only way to... Pythagoras , and Bactowars. For each, Grieger discusses model parameters, data collection, terrain, and other features. Grieger also discusses

  2. kruX: matrix-based non-parametric eQTL discovery.

    PubMed

    Qi, Jianlong; Asl, Hassan Foroughi; Björkegren, Johan; Michoel, Tom

    2014-01-14

    The Kruskal-Wallis test is a popular non-parametric statistical test for identifying expression quantitative trait loci (eQTLs) from genome-wide data due to its robustness against variations in the underlying genetic model and expression trait distribution, but testing billions of marker-trait combinations one-by-one can become computationally prohibitive. We developed kruX, an algorithm implemented in Matlab, Python and R that uses matrix multiplications to simultaneously calculate the Kruskal-Wallis test statistic for several millions of marker-trait combinations at once. KruX is more than ten thousand times faster than computing associations one-by-one on a typical human dataset. We used kruX and a dataset of more than 500k SNPs and 20k expression traits measured in 102 human blood samples to compare eQTLs detected by the Kruskal-Wallis test to eQTLs detected by the parametric ANOVA and linear model methods. We found that the Kruskal-Wallis test is more robust against data outliers and heterogeneous genotype group sizes and detects a higher proportion of non-linear associations, but is more conservative for calling additive linear associations. kruX enables the use of robust non-parametric methods for massive eQTL mapping without the need for a high-performance computing infrastructure and is freely available from http://krux.googlecode.com.

  3. Project risk management in the construction of high-rise buildings

    NASA Astrophysics Data System (ADS)

    Titarenko, Boris; Hasnaoui, Amir; Titarenko, Roman; Buzuk, Liliya

    2018-03-01

    This paper shows the project risk management methods, which allow to better identify risks in the construction of high-rise buildings and to manage them throughout the life cycle of the project. One of the project risk management processes is a quantitative analysis of risks. The quantitative analysis usually includes the assessment of the potential impact of project risks and their probabilities. This paper shows the most popular methods of risk probability assessment and tries to indicate the advantages of the robust approach over the traditional methods. Within the framework of the project risk management model a robust approach of P. Huber is applied and expanded for the tasks of regression analysis of project data. The suggested algorithms used to assess the parameters in statistical models allow to obtain reliable estimates. A review of the theoretical problems of the development of robust models built on the methodology of the minimax estimates was done and the algorithm for the situation of asymmetric "contamination" was developed.

  4. QSAR study of curcumine derivatives as HIV-1 integrase inhibitors.

    PubMed

    Gupta, Pawan; Sharma, Anju; Garg, Prabha; Roy, Nilanjan

    2013-03-01

    A QSAR study was performed on curcumine derivatives as HIV-1 integrase inhibitors using multiple linear regression. The statistically significant model was developed with squared correlation coefficients (r(2)) 0.891 and cross validated r(2) (r(2) cv) 0.825. The developed model revealed that electronic, shape, size, geometry, substitution's information and hydrophilicity were important atomic properties for determining the inhibitory activity of these molecules. The model was also tested successfully for external validation (r(2) pred = 0.849) as well as Tropsha's test for model predictability. Furthermore, the domain analysis was carried out to evaluate the prediction reliability of external set molecules. The model was statistically robust and had good predictive power which can be successfully utilized for screening of new molecules.

  5. Structural damage detection based on stochastic subspace identification and statistical pattern recognition: I. Theory

    NASA Astrophysics Data System (ADS)

    Ren, W. X.; Lin, Y. Q.; Fang, S. E.

    2011-11-01

    One of the key issues in vibration-based structural health monitoring is to extract the damage-sensitive but environment-insensitive features from sampled dynamic response measurements and to carry out the statistical analysis of these features for structural damage detection. A new damage feature is proposed in this paper by using the system matrices of the forward innovation model based on the covariance-driven stochastic subspace identification of a vibrating system. To overcome the variations of the system matrices, a non-singularity transposition matrix is introduced so that the system matrices are normalized to their standard forms. For reducing the effects of modeling errors, noise and environmental variations on measured structural responses, a statistical pattern recognition paradigm is incorporated into the proposed method. The Mahalanobis and Euclidean distance decision functions of the damage feature vector are adopted by defining a statistics-based damage index. The proposed structural damage detection method is verified against one numerical signal and two numerical beams. It is demonstrated that the proposed statistics-based damage index is sensitive to damage and shows some robustness to the noise and false estimation of the system ranks. The method is capable of locating damage of the beam structures under different types of excitations. The robustness of the proposed damage detection method to the variations in environmental temperature is further validated in a companion paper by a reinforced concrete beam tested in the laboratory and a full-scale arch bridge tested in the field.

  6. Improving Robustness of Hydrologic Ensemble Predictions Through Probabilistic Pre- and Post-Processing in Sequential Data Assimilation

    NASA Astrophysics Data System (ADS)

    Wang, S.; Ancell, B. C.; Huang, G. H.; Baetz, B. W.

    2018-03-01

    Data assimilation using the ensemble Kalman filter (EnKF) has been increasingly recognized as a promising tool for probabilistic hydrologic predictions. However, little effort has been made to conduct the pre- and post-processing of assimilation experiments, posing a significant challenge in achieving the best performance of hydrologic predictions. This paper presents a unified data assimilation framework for improving the robustness of hydrologic ensemble predictions. Statistical pre-processing of assimilation experiments is conducted through the factorial design and analysis to identify the best EnKF settings with maximized performance. After the data assimilation operation, statistical post-processing analysis is also performed through the factorial polynomial chaos expansion to efficiently address uncertainties in hydrologic predictions, as well as to explicitly reveal potential interactions among model parameters and their contributions to the predictive accuracy. In addition, the Gaussian anamorphosis is used to establish a seamless bridge between data assimilation and uncertainty quantification of hydrologic predictions. Both synthetic and real data assimilation experiments are carried out to demonstrate feasibility and applicability of the proposed methodology in the Guadalupe River basin, Texas. Results suggest that statistical pre- and post-processing of data assimilation experiments provide meaningful insights into the dynamic behavior of hydrologic systems and enhance robustness of hydrologic ensemble predictions.

  7. Climate Verification Using Running Mann Whitney Z Statistics

    USDA-ARS?s Scientific Manuscript database

    A robust method previously used to detect observed intra- to multi-decadal (IMD) climate regimes was adapted to test whether climate models could reproduce IMD variations in U.S. surface temperatures during 1919-2008. This procedure, called the running Mann Whitney Z (MWZ) method, samples data ranki...

  8. Robust scoring functions for protein-ligand interactions with quantum chemical charge models.

    PubMed

    Wang, Jui-Chih; Lin, Jung-Hsin; Chen, Chung-Ming; Perryman, Alex L; Olson, Arthur J

    2011-10-24

    Ordinary least-squares (OLS) regression has been used widely for constructing the scoring functions for protein-ligand interactions. However, OLS is very sensitive to the existence of outliers, and models constructed using it are easily affected by the outliers or even the choice of the data set. On the other hand, determination of atomic charges is regarded as of central importance, because the electrostatic interaction is known to be a key contributing factor for biomolecular association. In the development of the AutoDock4 scoring function, only OLS was conducted, and the simple Gasteiger method was adopted. It is therefore of considerable interest to see whether more rigorous charge models could improve the statistical performance of the AutoDock4 scoring function. In this study, we have employed two well-established quantum chemical approaches, namely the restrained electrostatic potential (RESP) and the Austin-model 1-bond charge correction (AM1-BCC) methods, to obtain atomic partial charges, and we have compared how different charge models affect the performance of AutoDock4 scoring functions. In combination with robust regression analysis and outlier exclusion, our new protein-ligand free energy regression model with AM1-BCC charges for ligands and Amber99SB charges for proteins achieve lowest root-mean-squared error of 1.637 kcal/mol for the training set of 147 complexes and 2.176 kcal/mol for the external test set of 1427 complexes. The assessment for binding pose prediction with the 100 external decoy sets indicates very high success rate of 87% with the criteria of predicted root-mean-squared deviation of less than 2 Å. The success rates and statistical performance of our robust scoring functions are only weakly class-dependent (hydrophobic, hydrophilic, or mixed).

  9. Present-day constraint for tropical Pacific precipitation changes due to global warming in CMIP5 models

    NASA Astrophysics Data System (ADS)

    Ham, Yoo-Geun; Kug, Jong-Seong

    2016-11-01

    The sensitivity of the precipitation responses to greenhouse warming can depend on the present-day climate. In this study, a robust linkage between the present-day precipitation climatology and precipitation change owing to global warming is examined in inter-model space. A model with drier climatology in the present-day simulation tends to simulate an increase in climatological precipitation owing to global warming. Moreover, the horizontal gradient of the present-day precipitation climatology plays an important role in determining the precipitation changes. On the basis of these robust relationships, future precipitation changes are calibrated by removing the impact of the present-day precipitation bias in the climate models. To validate this result, the perfect model approach is adapted, which treats a particular model's precipitation change as an observed change. The results suggest that the precipitation change pattern can be generally improved by applying the present statistical approach.

  10. Accounting for immunoprecipitation efficiencies in the statistical analysis of ChIP-seq data.

    PubMed

    Bao, Yanchun; Vinciotti, Veronica; Wit, Ernst; 't Hoen, Peter A C

    2013-05-30

    ImmunoPrecipitation (IP) efficiencies may vary largely between different antibodies and between repeated experiments with the same antibody. These differences have a large impact on the quality of ChIP-seq data: a more efficient experiment will necessarily lead to a higher signal to background ratio, and therefore to an apparent larger number of enriched regions, compared to a less efficient experiment. In this paper, we show how IP efficiencies can be explicitly accounted for in the joint statistical modelling of ChIP-seq data. We fit a latent mixture model to eight experiments on two proteins, from two laboratories where different antibodies are used for the two proteins. We use the model parameters to estimate the efficiencies of individual experiments, and find that these are clearly different for the different laboratories, and amongst technical replicates from the same lab. When we account for ChIP efficiency, we find more regions bound in the more efficient experiments than in the less efficient ones, at the same false discovery rate. A priori knowledge of the same number of binding sites across experiments can also be included in the model for a more robust detection of differentially bound regions among two different proteins. We propose a statistical model for the detection of enriched and differentially bound regions from multiple ChIP-seq data sets. The framework that we present accounts explicitly for IP efficiencies in ChIP-seq data, and allows to model jointly, rather than individually, replicates and experiments from different proteins, leading to more robust biological conclusions.

  11. New Approaches to Robust Confidence Intervals for Location: A Simulation Study.

    DTIC Science & Technology

    1984-06-01

    obtain a denominator for the test statistic. Those statistics based on location estimates derived from Hampel’s redescending influence function or v...defined an influence function for a test in terms of the behavior of its P-values when the data are sampled from a model distribution modified by point...proposal could be used for interval estimation as well as hypothesis testing, the extension is immediate. Once an influence function has been defined

  12. Causal Models and Exploratory Analysis in Heterogeneous Information Fusion for Detecting Potential Terrorists

    DTIC Science & Technology

    2015-11-01

    issues and made some of the same distinctions (Walker, Lempert, and Kwakkel ; Bankes, Lempert, and Popper, 2005), but it did appear that we had more than...Statistics,” British Journal of Mathematical Statistics and Pscyhology, 66, pp. 8-38. Jaynes, Edwin T., and G . Larry Bretthorst (ed.) (2003) , Probability...Giroux. Lempert, Robert J., David G . Groves, Steven W. Popper, and Steven C. Bankes (2006), “A General Analytic Method for Generating Robust

  13. Flood risk assessment and robust management under deep uncertainty: Application to Dhaka City

    NASA Astrophysics Data System (ADS)

    Mojtahed, Vahid; Gain, Animesh Kumar; Giupponi, Carlo

    2014-05-01

    The socio-economic changes as well as climatic changes have been the main drivers of uncertainty in environmental risk assessment and in particular flood. The level of future uncertainty that researchers face when dealing with problems in a future perspective with focus on climate change is known as Deep Uncertainty (also known as Knightian uncertainty), since nobody has already experienced and undergone those changes before and our knowledge is limited to the extent that we have no notion of probabilities, and therefore consolidated risk management approaches have limited potential.. Deep uncertainty is referred to circumstances that analysts and experts do not know or parties to decision making cannot agree on: i) the appropriate models describing the interaction among system variables, ii) probability distributions to represent uncertainty about key parameters in the model 3) how to value the desirability of alternative outcomes. The need thus emerges to assist policy-makers by providing them with not a single and optimal solution to the problem at hand, such as crisp estimates for the costs of damages of natural hazards considered, but instead ranges of possible future costs, based on the outcomes of ensembles of assessment models and sets of plausible scenarios. Accordingly, we need to substitute optimality as a decision criterion with robustness. Under conditions of deep uncertainty, the decision-makers do not have statistical and mathematical bases to identify optimal solutions, while instead they should prefer to implement "robust" decisions that perform relatively well over all conceivable outcomes out of all future unknown scenarios. Under deep uncertainty, analysts cannot employ probability theory or other statistics that usually can be derived from observed historical data and therefore, we turn to non-statistical measures such as scenario analysis. We construct several plausible scenarios with each scenario being a full description of what may happen in future and based on a meaningful synthesis of parameters' values with control of their correlations for maintaining internal consistencies. This paper aims at incorporating a set of data mining and sampling tools to assess uncertainty of model outputs under future climatic and socio-economic changes for Dhaka city and providing a decision support system for robust flood management and mitigation policies. After constructing an uncertainty matrix to identify the main sources of uncertainty for Dhaka City, we identify several hazard and vulnerability maps based on future climatic and socio-economic scenarios. The vulnerability of each flood management alternative under different set of scenarios is determined and finally the robustness of each plausible solution considered is defined based on the above assessment.

  14. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Soffientini, Chiara Dolores, E-mail: chiaradolores.soffientini@polimi.it; Baselli, Giuseppe; De Bernardi, Elisabetta

    Purpose: Quantitative {sup 18}F-fluorodeoxyglucose positron emission tomography is limited by the uncertainty in lesion delineation due to poor SNR, low resolution, and partial volume effects, subsequently impacting oncological assessment, treatment planning, and follow-up. The present work develops and validates a segmentation algorithm based on statistical clustering. The introduction of constraints based on background features and contiguity priors is expected to improve robustness vs clinical image characteristics such as lesion dimension, noise, and contrast level. Methods: An eight-class Gaussian mixture model (GMM) clustering algorithm was modified by constraining the mean and variance parameters of four background classes according to the previousmore » analysis of a lesion-free background volume of interest (background modeling). Hence, expectation maximization operated only on the four classes dedicated to lesion detection. To favor the segmentation of connected objects, a further variant was introduced by inserting priors relevant to the classification of neighbors. The algorithm was applied to simulated datasets and acquired phantom data. Feasibility and robustness toward initialization were assessed on a clinical dataset manually contoured by two expert clinicians. Comparisons were performed with respect to a standard eight-class GMM algorithm and to four different state-of-the-art methods in terms of volume error (VE), Dice index, classification error (CE), and Hausdorff distance (HD). Results: The proposed GMM segmentation with background modeling outperformed standard GMM and all the other tested methods. Medians of accuracy indexes were VE <3%, Dice >0.88, CE <0.25, and HD <1.2 in simulations; VE <23%, Dice >0.74, CE <0.43, and HD <1.77 in phantom data. Robustness toward image statistic changes (±15%) was shown by the low index changes: <26% for VE, <17% for Dice, and <15% for CE. Finally, robustness toward the user-dependent volume initialization was demonstrated. The inclusion of the spatial prior improved segmentation accuracy only for lesions surrounded by heterogeneous background: in the relevant simulation subset, the median VE significantly decreased from 13% to 7%. Results on clinical data were found in accordance with simulations, with absolute VE <7%, Dice >0.85, CE <0.30, and HD <0.81. Conclusions: The sole introduction of constraints based on background modeling outperformed standard GMM and the other tested algorithms. Insertion of a spatial prior improved the accuracy for realistic cases of objects in heterogeneous backgrounds. Moreover, robustness against initialization supports the applicability in a clinical setting. In conclusion, application-driven constraints can generally improve the capabilities of GMM and statistical clustering algorithms.« less

  15. Enhanced echolocation via robust statistics and super-resolution of sonar images

    NASA Astrophysics Data System (ADS)

    Kim, Kio

    Echolocation is a process in which an animal uses acoustic signals to exchange information with environments. In a recent study, Neretti et al. have shown that the use of robust statistics can significantly improve the resiliency of echolocation against noise and enhance its accuracy by suppressing the development of sidelobes in the processing of an echo signal. In this research, the use of robust statistics is extended to problems in underwater explorations. The dissertation consists of two parts. Part I describes how robust statistics can enhance the identification of target objects, which in this case are cylindrical containers filled with four different liquids. Particularly, this work employs a variation of an existing robust estimator called an L-estimator, which was first suggested by Koenker and Bassett. As pointed out by Au et al.; a 'highlight interval' is an important feature, and it is closely related with many other important features that are known to be crucial for dolphin echolocation. A varied L-estimator described in this text is used to enhance the detection of highlight intervals, which eventually leads to a successful classification of echo signals. Part II extends the problem into 2 dimensions. Thanks to the advances in material and computer technology, various sonar imaging modalities are available on the market. By registering acoustic images from such video sequences, one can extract more information on the region of interest. Computer vision and image processing allowed application of robust statistics to the acoustic images produced by forward looking sonar systems, such as Dual-frequency Identification Sonar and ProViewer. The first use of robust statistics for sonar image enhancement in this text is in image registration. Random Sampling Consensus (RANSAC) is widely used for image registration. The registration algorithm using RANSAC is optimized for sonar image registration, and the performance is studied. The second use of robust statistics is in fusing the images. It is shown that the maximum a posteriori fusion method can be formulated in a Kalman filter-like manner, and also that the resulting expression is identical to a W-estimator with a specific weight function.

  16. Financial Stylized Facts in the Word of Mouth Model

    NASA Astrophysics Data System (ADS)

    Misawa, Tadanobu; Watanabe, Kyoko; Shimokawa, Tetsuya

    Recently, we proposed an agent-based model called the word of mouth model to analyze the influence of an information transmission process to price formation in financial markets. Especially, the short-term predictability of asset return was focused on and an explanation in the view of information transmission was provided to the question why the predictability was much clearly observed in the small-sized stocks. This paper, to extend the previous study, demonstrates that the word of mouth model also has a consistency with other important financial stylized facts. This strengthens the possibility that the information transmission among investors plays a crucial role in price formation. Concretely, this paper addresses two famous statistical features of returns; the leptokurtic distribution of return and the autocorrelation of return volatility. The reasons why these statistical facts receive especial attentions of researchers among financial stylized facts are their statistical robustness and practical importance, such as the applications to the derivative pricing problems.

  17. Statistical reconstruction for cosmic ray muon tomography.

    PubMed

    Schultz, Larry J; Blanpied, Gary S; Borozdin, Konstantin N; Fraser, Andrew M; Hengartner, Nicolas W; Klimenko, Alexei V; Morris, Christopher L; Orum, Chris; Sossong, Michael J

    2007-08-01

    Highly penetrating cosmic ray muons constantly shower the earth at a rate of about 1 muon per cm2 per minute. We have developed a technique which exploits the multiple Coulomb scattering of these particles to perform nondestructive inspection without the use of artificial radiation. In prior work [1]-[3], we have described heuristic methods for processing muon data to create reconstructed images. In this paper, we present a maximum likelihood/expectation maximization tomographic reconstruction algorithm designed for the technique. This algorithm borrows much from techniques used in medical imaging, particularly emission tomography, but the statistics of muon scattering dictates differences. We describe the statistical model for multiple scattering, derive the reconstruction algorithm, and present simulated examples. We also propose methods to improve the robustness of the algorithm to experimental errors and events departing from the statistical model.

  18. Automated finite element modeling of the lumbar spine: Using a statistical shape model to generate a virtual population of models.

    PubMed

    Campbell, J Q; Petrella, A J

    2016-09-06

    Population-based modeling of the lumbar spine has the potential to be a powerful clinical tool. However, developing a fully parameterized model of the lumbar spine with accurate geometry has remained a challenge. The current study used automated methods for landmark identification to create a statistical shape model of the lumbar spine. The shape model was evaluated using compactness, generalization ability, and specificity. The primary shape modes were analyzed visually, quantitatively, and biomechanically. The biomechanical analysis was performed by using the statistical shape model with an automated method for finite element model generation to create a fully parameterized finite element model of the lumbar spine. Functional finite element models of the mean shape and the extreme shapes (±3 standard deviations) of all 17 shape modes were created demonstrating the robust nature of the methods. This study represents an advancement in finite element modeling of the lumbar spine and will allow population-based modeling in the future. Copyright © 2016 Elsevier Ltd. All rights reserved.

  19. Food Choice Questionnaire (FCQ) revisited. Suggestions for the development of an enhanced general food motivation model.

    PubMed

    Fotopoulos, Christos; Krystallis, Athanasios; Vassallo, Marco; Pagiaslis, Anastasios

    2009-02-01

    Recognising the need for a more statistically robust instrument to investigate general food selection determinants, the research validates and confirms Food Choice Questionnaire (FCQ's) factorial design, develops ad hoc a more robust FCQ version and tests its ability to discriminate between consumer segments in terms of the importance they assign to the FCQ motivational factors. The original FCQ appears to represent a comprehensive and reliable research instrument. However, the empirical data do not support the robustness of its 9-factorial design. On the other hand, segmentation results at the subpopulation level based on the enhanced FCQ version bring about an optimistic message for the FCQ's ability to predict food selection behaviour. The paper concludes that some of the basic components of the original FCQ can be used as a basis for a new general food motivation typology. The development of such a new instrument, with fewer, of higher abstraction FCQ-based dimensions and fewer items per dimension, is a right step forward; yet such a step should be theory-driven, while a rigorous statistical testing across and within population would be necessary.

  20. Validating an Air Traffic Management Concept of Operation Using Statistical Modeling

    NASA Technical Reports Server (NTRS)

    He, Yuning; Davies, Misty Dawn

    2013-01-01

    Validating a concept of operation for a complex, safety-critical system (like the National Airspace System) is challenging because of the high dimensionality of the controllable parameters and the infinite number of states of the system. In this paper, we use statistical modeling techniques to explore the behavior of a conflict detection and resolution algorithm designed for the terminal airspace. These techniques predict the robustness of the system simulation to both nominal and off-nominal behaviors within the overall airspace. They also can be used to evaluate the output of the simulation against recorded airspace data. Additionally, the techniques carry with them a mathematical value of the worth of each prediction-a statistical uncertainty for any robustness estimate. Uncertainty Quantification (UQ) is the process of quantitative characterization and ultimately a reduction of uncertainties in complex systems. UQ is important for understanding the influence of uncertainties on the behavior of a system and therefore is valuable for design, analysis, and verification and validation. In this paper, we apply advanced statistical modeling methodologies and techniques on an advanced air traffic management system, namely the Terminal Tactical Separation Assured Flight Environment (T-TSAFE). We show initial results for a parameter analysis and safety boundary (envelope) detection in the high-dimensional parameter space. For our boundary analysis, we developed a new sequential approach based upon the design of computer experiments, allowing us to incorporate knowledge from domain experts into our modeling and to determine the most likely boundary shapes and its parameters. We carried out the analysis on system parameters and describe an initial approach that will allow us to include time-series inputs, such as the radar track data, into the analysis

  1. A Robust State Estimation Framework Considering Measurement Correlations and Imperfect Synchronization

    DOE PAGES

    Zhao, Junbo; Wang, Shaobu; Mili, Lamine; ...

    2018-01-08

    Here, this paper develops a robust power system state estimation framework with the consideration of measurement correlations and imperfect synchronization. In the framework, correlations of SCADA and Phasor Measurements (PMUs) are calculated separately through unscented transformation and a Vector Auto-Regression (VAR) model. In particular, PMU measurements during the waiting period of two SCADA measurement scans are buffered to develop the VAR model with robustly estimated parameters using projection statistics approach. The latter takes into account the temporal and spatial correlations of PMU measurements and provides redundant measurements to suppress bad data and mitigate imperfect synchronization. In case where the SCADAmore » and PMU measurements are not time synchronized, either the forecasted PMU measurements or the prior SCADA measurements from the last estimation run are leveraged to restore system observability. Then, a robust generalized maximum-likelihood (GM)-estimator is extended to integrate measurement error correlations and to handle the outliers in the SCADA and PMU measurements. Simulation results that stem from a comprehensive comparison with other alternatives under various conditions demonstrate the benefits of the proposed framework.« less

  2. A Robust State Estimation Framework Considering Measurement Correlations and Imperfect Synchronization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhao, Junbo; Wang, Shaobu; Mili, Lamine

    Here, this paper develops a robust power system state estimation framework with the consideration of measurement correlations and imperfect synchronization. In the framework, correlations of SCADA and Phasor Measurements (PMUs) are calculated separately through unscented transformation and a Vector Auto-Regression (VAR) model. In particular, PMU measurements during the waiting period of two SCADA measurement scans are buffered to develop the VAR model with robustly estimated parameters using projection statistics approach. The latter takes into account the temporal and spatial correlations of PMU measurements and provides redundant measurements to suppress bad data and mitigate imperfect synchronization. In case where the SCADAmore » and PMU measurements are not time synchronized, either the forecasted PMU measurements or the prior SCADA measurements from the last estimation run are leveraged to restore system observability. Then, a robust generalized maximum-likelihood (GM)-estimator is extended to integrate measurement error correlations and to handle the outliers in the SCADA and PMU measurements. Simulation results that stem from a comprehensive comparison with other alternatives under various conditions demonstrate the benefits of the proposed framework.« less

  3. Robust multivariate nonparametric tests for detection of two-sample location shift in clinical trials

    PubMed Central

    Jiang, Xuejun; Guo, Xu; Zhang, Ning; Wang, Bo

    2018-01-01

    This article presents and investigates performance of a series of robust multivariate nonparametric tests for detection of location shift between two multivariate samples in randomized controlled trials. The tests are built upon robust estimators of distribution locations (medians, Hodges-Lehmann estimators, and an extended U statistic) with both unscaled and scaled versions. The nonparametric tests are robust to outliers and do not assume that the two samples are drawn from multivariate normal distributions. Bootstrap and permutation approaches are introduced for determining the p-values of the proposed test statistics. Simulation studies are conducted and numerical results are reported to examine performance of the proposed statistical tests. The numerical results demonstrate that the robust multivariate nonparametric tests constructed from the Hodges-Lehmann estimators are more efficient than those based on medians and the extended U statistic. The permutation approach can provide a more stringent control of Type I error and is generally more powerful than the bootstrap procedure. The proposed robust nonparametric tests are applied to detect multivariate distributional difference between the intervention and control groups in the Thai Healthy Choices study and examine the intervention effect of a four-session motivational interviewing-based intervention developed in the study to reduce risk behaviors among youth living with HIV. PMID:29672555

  4. A reliability study on brain activation during active and passive arm movements supported by an MRI-compatible robot.

    PubMed

    Estévez, Natalia; Yu, Ningbo; Brügger, Mike; Villiger, Michael; Hepp-Reymond, Marie-Claude; Riener, Robert; Kollias, Spyros

    2014-11-01

    In neurorehabilitation, longitudinal assessment of arm movement related brain function in patients with motor disability is challenging due to variability in task performance. MRI-compatible robots monitor and control task performance, yielding more reliable evaluation of brain function over time. The main goals of the present study were first to define the brain network activated while performing active and passive elbow movements with an MRI-compatible arm robot (MaRIA) in healthy subjects, and second to test the reproducibility of this activation over time. For the fMRI analysis two models were compared. In model 1 movement onset and duration were included, whereas in model 2 force and range of motion were added to the analysis. Reliability of brain activation was tested with several statistical approaches applied on individual and group activation maps and on summary statistics. The activated network included mainly the primary motor cortex, primary and secondary somatosensory cortex, superior and inferior parietal cortex, medial and lateral premotor regions, and subcortical structures. Reliability analyses revealed robust activation for active movements with both fMRI models and all the statistical methods used. Imposed passive movements also elicited mainly robust brain activation for individual and group activation maps, and reliability was improved by including additional force and range of motion using model 2. These findings demonstrate that the use of robotic devices, such as MaRIA, can be useful to reliably assess arm movement related brain activation in longitudinal studies and may contribute in studies evaluating therapies and brain plasticity following injury in the nervous system.

  5. A Weibull statistics-based lignocellulose saccharification model and a built-in parameter accurately predict lignocellulose hydrolysis performance.

    PubMed

    Wang, Mingyu; Han, Lijuan; Liu, Shasha; Zhao, Xuebing; Yang, Jinghua; Loh, Soh Kheang; Sun, Xiaomin; Zhang, Chenxi; Fang, Xu

    2015-09-01

    Renewable energy from lignocellulosic biomass has been deemed an alternative to depleting fossil fuels. In order to improve this technology, we aim to develop robust mathematical models for the enzymatic lignocellulose degradation process. By analyzing 96 groups of previously published and newly obtained lignocellulose saccharification results and fitting them to Weibull distribution, we discovered Weibull statistics can accurately predict lignocellulose saccharification data, regardless of the type of substrates, enzymes and saccharification conditions. A mathematical model for enzymatic lignocellulose degradation was subsequently constructed based on Weibull statistics. Further analysis of the mathematical structure of the model and experimental saccharification data showed the significance of the two parameters in this model. In particular, the λ value, defined the characteristic time, represents the overall performance of the saccharification system. This suggestion was further supported by statistical analysis of experimental saccharification data and analysis of the glucose production levels when λ and n values change. In conclusion, the constructed Weibull statistics-based model can accurately predict lignocellulose hydrolysis behavior and we can use the λ parameter to assess the overall performance of enzymatic lignocellulose degradation. Advantages and potential applications of the model and the λ value in saccharification performance assessment were discussed. Copyright © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  6. Statistical Analysis of Big Data on Pharmacogenomics

    PubMed Central

    Fan, Jianqing; Liu, Han

    2013-01-01

    This paper discusses statistical methods for estimating complex correlation structure from large pharmacogenomic datasets. We selectively review several prominent statistical methods for estimating large covariance matrix for understanding correlation structure, inverse covariance matrix for network modeling, large-scale simultaneous tests for selecting significantly differently expressed genes and proteins and genetic markers for complex diseases, and high dimensional variable selection for identifying important molecules for understanding molecule mechanisms in pharmacogenomics. Their applications to gene network estimation and biomarker selection are used to illustrate the methodological power. Several new challenges of Big data analysis, including complex data distribution, missing data, measurement error, spurious correlation, endogeneity, and the need for robust statistical methods, are also discussed. PMID:23602905

  7. Suppression of long-branch attraction artefacts in the animal phylogeny using a site-heterogeneous model

    PubMed Central

    Lartillot, Nicolas; Brinkmann, Henner; Philippe, Hervé

    2007-01-01

    Background Thanks to the large amount of signal contained in genome-wide sequence alignments, phylogenomic analyses are converging towards highly supported trees. However, high statistical support does not imply that the tree is accurate. Systematic errors, such as the Long Branch Attraction (LBA) artefact, can be misleading, in particular when the taxon sampling is poor, or the outgroup is distant. In an otherwise consistent probabilistic framework, systematic errors in genome-wide analyses can be traced back to model mis-specification problems, which suggests that better models of sequence evolution should be devised, that would be more robust to tree reconstruction artefacts, even under the most challenging conditions. Methods We focus on a well characterized LBA artefact analyzed in a previous phylogenomic study of the metazoan tree, in which two fast-evolving animal phyla, nematodes and platyhelminths, emerge either at the base of all other Bilateria, or within protostomes, depending on the outgroup. We use this artefactual result as a case study for comparing the robustness of two alternative models: a standard, site-homogeneous model, based on an empirical matrix of amino-acid replacement (WAG), and a site-heterogeneous mixture model (CAT). In parallel, we propose a posterior predictive test, allowing one to measure how well a model acknowledges sequence saturation. Results Adopting a Bayesian framework, we show that the LBA artefact observed under WAG disappears when the site-heterogeneous model CAT is used. Using cross-validation, we further demonstrate that CAT has a better statistical fit than WAG on this data set. Finally, using our statistical goodness-of-fit test, we show that CAT, but not WAG, correctly accounts for the overall level of saturation, and that this is due to a better estimation of site-specific amino-acid preferences. Conclusion The CAT model appears to be more robust than WAG against LBA artefacts, essentially because it correctly anticipates the high probability of convergences and reversions implied by the small effective size of the amino-acid alphabet at each site of the alignment. More generally, our results provide strong evidence that site-specificities in the substitution process need be accounted for in order to obtain more reliable phylogenetic trees. PMID:17288577

  8. Statistical Analyses of Scatterplots to Identify Important Factors in Large-Scale Simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kleijnen, J.P.C.; Helton, J.C.

    1999-04-01

    The robustness of procedures for identifying patterns in scatterplots generated in Monte Carlo sensitivity analyses is investigated. These procedures are based on attempts to detect increasingly complex patterns in the scatterplots under consideration and involve the identification of (1) linear relationships with correlation coefficients, (2) monotonic relationships with rank correlation coefficients, (3) trends in central tendency as defined by means, medians and the Kruskal-Wallis statistic, (4) trends in variability as defined by variances and interquartile ranges, and (5) deviations from randomness as defined by the chi-square statistic. The following two topics related to the robustness of these procedures are consideredmore » for a sequence of example analyses with a large model for two-phase fluid flow: the presence of Type I and Type II errors, and the stability of results obtained with independent Latin hypercube samples. Observations from analysis include: (1) Type I errors are unavoidable, (2) Type II errors can occur when inappropriate analysis procedures are used, (3) physical explanations should always be sought for why statistical procedures identify variables as being important, and (4) the identification of important variables tends to be stable for independent Latin hypercube samples.« less

  9. Variation in reaction norms: Statistical considerations and biological interpretation.

    PubMed

    Morrissey, Michael B; Liefting, Maartje

    2016-09-01

    Analysis of reaction norms, the functions by which the phenotype produced by a given genotype depends on the environment, is critical to studying many aspects of phenotypic evolution. Different techniques are available for quantifying different aspects of reaction norm variation. We examine what biological inferences can be drawn from some of the more readily applicable analyses for studying reaction norms. We adopt a strongly biologically motivated view, but draw on statistical theory to highlight strengths and drawbacks of different techniques. In particular, consideration of some formal statistical theory leads to revision of some recently, and forcefully, advocated opinions on reaction norm analysis. We clarify what simple analysis of the slope between mean phenotype in two environments can tell us about reaction norms, explore the conditions under which polynomial regression can provide robust inferences about reaction norm shape, and explore how different existing approaches may be used to draw inferences about variation in reaction norm shape. We show how mixed model-based approaches can provide more robust inferences than more commonly used multistep statistical approaches, and derive new metrics of the relative importance of variation in reaction norm intercepts, slopes, and curvatures. © 2016 The Author(s). Evolution © 2016 The Society for the Study of Evolution.

  10. [Quantitative structure-gas chromatographic retention relationship of polycyclic aromatic sulfur heterocycles using molecular electronegativity-distance vector].

    PubMed

    Li, Zhenghua; Cheng, Fansheng; Xia, Zhining

    2011-01-01

    The chemical structures of 114 polycyclic aromatic sulfur heterocycles (PASHs) have been studied by molecular electronegativity-distance vector (MEDV). The linear relationships between gas chromatographic retention index and the MEDV have been established by a multiple linear regression (MLR) model. The results of variable selection by stepwise multiple regression (SMR) and the powerful predictive abilities of the optimization model appraised by leave-one-out cross-validation showed that the optimization model with the correlation coefficient (R) of 0.994 7 and the cross-validated correlation coefficient (Rcv) of 0.994 0 possessed the best statistical quality. Furthermore, when the 114 PASHs compounds were divided into calibration and test sets in the ratio of 2:1, the statistical analysis showed our models possesses almost equal statistical quality, the very similar regression coefficients and the good robustness. The quantitative structure-retention relationship (QSRR) model established may provide a convenient and powerful method for predicting the gas chromatographic retention of PASHs.

  11. Ascertainment-adjusted parameter estimation approach to improve robustness against misspecification of health monitoring methods

    NASA Astrophysics Data System (ADS)

    Juesas, P.; Ramasso, E.

    2016-12-01

    Condition monitoring aims at ensuring system safety which is a fundamental requirement for industrial applications and that has become an inescapable social demand. This objective is attained by instrumenting the system and developing data analytics methods such as statistical models able to turn data into relevant knowledge. One difficulty is to be able to correctly estimate the parameters of those methods based on time-series data. This paper suggests the use of the Weighted Distribution Theory together with the Expectation-Maximization algorithm to improve parameter estimation in statistical models with latent variables with an application to health monotonic under uncertainty. The improvement of estimates is made possible by incorporating uncertain and possibly noisy prior knowledge on latent variables in a sound manner. The latent variables are exploited to build a degradation model of dynamical system represented as a sequence of discrete states. Examples on Gaussian Mixture Models, Hidden Markov Models (HMM) with discrete and continuous outputs are presented on both simulated data and benchmarks using the turbofan engine datasets. A focus on the application of a discrete HMM to health monitoring under uncertainty allows to emphasize the interest of the proposed approach in presence of different operating conditions and fault modes. It is shown that the proposed model depicts high robustness in presence of noisy and uncertain prior.

  12. Hierarchical modeling and robust synthesis for the preliminary design of large scale complex systems

    NASA Astrophysics Data System (ADS)

    Koch, Patrick Nathan

    Large-scale complex systems are characterized by multiple interacting subsystems and the analysis of multiple disciplines. The design and development of such systems inevitably requires the resolution of multiple conflicting objectives. The size of complex systems, however, prohibits the development of comprehensive system models, and thus these systems must be partitioned into their constituent parts. Because simultaneous solution of individual subsystem models is often not manageable iteration is inevitable and often excessive. In this dissertation these issues are addressed through the development of a method for hierarchical robust preliminary design exploration to facilitate concurrent system and subsystem design exploration, for the concurrent generation of robust system and subsystem specifications for the preliminary design of multi-level, multi-objective, large-scale complex systems. This method is developed through the integration and expansion of current design techniques: (1) Hierarchical partitioning and modeling techniques for partitioning large-scale complex systems into more tractable parts, and allowing integration of subproblems for system synthesis, (2) Statistical experimentation and approximation techniques for increasing both the efficiency and the comprehensiveness of preliminary design exploration, and (3) Noise modeling techniques for implementing robust preliminary design when approximate models are employed. The method developed and associated approaches are illustrated through their application to the preliminary design of a commercial turbofan turbine propulsion system; the turbofan system-level problem is partitioned into engine cycle and configuration design and a compressor module is integrated for more detailed subsystem-level design exploration, improving system evaluation.

  13. Variational stereo imaging of oceanic waves with statistical constraints.

    PubMed

    Gallego, Guillermo; Yezzi, Anthony; Fedele, Francesco; Benetazzo, Alvise

    2013-11-01

    An image processing observational technique for the stereoscopic reconstruction of the waveform of oceanic sea states is developed. The technique incorporates the enforcement of any given statistical wave law modeling the quasi-Gaussianity of oceanic waves observed in nature. The problem is posed in a variational optimization framework, where the desired waveform is obtained as the minimizer of a cost functional that combines image observations, smoothness priors and a weak statistical constraint. The minimizer is obtained by combining gradient descent and multigrid methods on the necessary optimality equations of the cost functional. Robust photometric error criteria and a spatial intensity compensation model are also developed to improve the performance of the presented image matching strategy. The weak statistical constraint is thoroughly evaluated in combination with other elements presented to reconstruct and enforce constraints on experimental stereo data, demonstrating the improvement in the estimation of the observed ocean surface.

  14. kruX: matrix-based non-parametric eQTL discovery

    PubMed Central

    2014-01-01

    Background The Kruskal-Wallis test is a popular non-parametric statistical test for identifying expression quantitative trait loci (eQTLs) from genome-wide data due to its robustness against variations in the underlying genetic model and expression trait distribution, but testing billions of marker-trait combinations one-by-one can become computationally prohibitive. Results We developed kruX, an algorithm implemented in Matlab, Python and R that uses matrix multiplications to simultaneously calculate the Kruskal-Wallis test statistic for several millions of marker-trait combinations at once. KruX is more than ten thousand times faster than computing associations one-by-one on a typical human dataset. We used kruX and a dataset of more than 500k SNPs and 20k expression traits measured in 102 human blood samples to compare eQTLs detected by the Kruskal-Wallis test to eQTLs detected by the parametric ANOVA and linear model methods. We found that the Kruskal-Wallis test is more robust against data outliers and heterogeneous genotype group sizes and detects a higher proportion of non-linear associations, but is more conservative for calling additive linear associations. Conclusion kruX enables the use of robust non-parametric methods for massive eQTL mapping without the need for a high-performance computing infrastructure and is freely available from http://krux.googlecode.com. PMID:24423115

  15. 3D Face Modeling Using the Multi-Deformable Method

    PubMed Central

    Hwang, Jinkyu; Yu, Sunjin; Kim, Joongrock; Lee, Sangyoun

    2012-01-01

    In this paper, we focus on the problem of the accuracy performance of 3D face modeling techniques using corresponding features in multiple views, which is quite sensitive to feature extraction errors. To solve the problem, we adopt a statistical model-based 3D face modeling approach in a mirror system consisting of two mirrors and a camera. The overall procedure of our 3D facial modeling method has two primary steps: 3D facial shape estimation using a multiple 3D face deformable model and texture mapping using seamless cloning that is a type of gradient-domain blending. To evaluate our method's performance, we generate 3D faces of 30 individuals and then carry out two tests: accuracy test and robustness test. Our method shows not only highly accurate 3D face shape results when compared with the ground truth, but also robustness to feature extraction errors. Moreover, 3D face rendering results intuitively show that our method is more robust to feature extraction errors than other 3D face modeling methods. An additional contribution of our method is that a wide range of face textures can be acquired by the mirror system. By using this texture map, we generate realistic 3D face for individuals at the end of the paper. PMID:23201976

  16. Linear control of oscillator and amplifier flows*

    NASA Astrophysics Data System (ADS)

    Schmid, Peter J.; Sipp, Denis

    2016-08-01

    Linear control applied to fluid systems near an equilibrium point has important applications for many flows of industrial or fundamental interest. In this article we give an exposition of tools and approaches for the design of control strategies for globally stable or unstable flows. For unstable oscillator flows a feedback configuration and a model-based approach is proposed, while for stable noise-amplifier flows a feedforward setup and an approach based on system identification is advocated. Model reduction and robustness issues are addressed for the oscillator case; statistical learning techniques are emphasized for the amplifier case. Effective suppression of global and convective instabilities could be demonstrated for either case, even though the system-identification approach results in a superior robustness to off-design conditions.

  17. A Robust Approach to Risk Assessment Based on Species Sensitivity Distributions.

    PubMed

    Monti, Gianna S; Filzmoser, Peter; Deutsch, Roland C

    2018-05-03

    The guidelines for setting environmental quality standards are increasingly based on probabilistic risk assessment due to a growing general awareness of the need for probabilistic procedures. One of the commonly used tools in probabilistic risk assessment is the species sensitivity distribution (SSD), which represents the proportion of species affected belonging to a biological assemblage as a function of exposure to a specific toxicant. Our focus is on the inverse use of the SSD curve with the aim of estimating the concentration, HCp, of a toxic compound that is hazardous to p% of the biological community under study. Toward this end, we propose the use of robust statistical methods in order to take into account the presence of outliers or apparent skew in the data, which may occur without any ecological basis. A robust approach exploits the full neighborhood of a parametric model, enabling the analyst to account for the typical real-world deviations from ideal models. We examine two classic HCp estimation approaches and consider robust versions of these estimators. In addition, we also use data transformations in conjunction with robust estimation methods in case of heteroscedasticity. Different scenarios using real data sets as well as simulated data are presented in order to illustrate and compare the proposed approaches. These scenarios illustrate that the use of robust estimation methods enhances HCp estimation. © 2018 Society for Risk Analysis.

  18. A terrain-based site characterization map of California with implications for the contiguous United States

    USGS Publications Warehouse

    Yong, Alan K.; Hough, Susan E.; Iwahashi, Junko; Braverman, Amy

    2012-01-01

    We present an approach based on geomorphometry to predict material properties and characterize site conditions using the VS30 parameter (time‐averaged shear‐wave velocity to a depth of 30 m). Our framework consists of an automated terrain classification scheme based on taxonomic criteria (slope gradient, local convexity, and surface texture) that systematically identifies 16 terrain types from 1‐km spatial resolution (30 arcsec) Shuttle Radar Topography Mission digital elevation models (SRTM DEMs). Using 853 VS30 values from California, we apply a simulation‐based statistical method to determine the mean VS30 for each terrain type in California. We then compare the VS30 values with models based on individual proxies, such as mapped surface geology and topographic slope, and show that our systematic terrain‐based approach consistently performs better than semiempirical estimates based on individual proxies. To further evaluate our model, we apply our California‐based estimates to terrains of the contiguous United States. Comparisons of our estimates with 325 VS30 measurements outside of California, as well as estimates based on the topographic slope model, indicate our method to be statistically robust and more accurate. Our approach thus provides an objective and robust method for extending estimates of VS30 for regions where in situ measurements are sparse or not readily available.

  19. Simulating Aqueous-Phase Isoprene-Epoxydiol (IEPOX) Secondary Organic Aerosol Production During the 2013 Southern Oxidant and Aerosol Study (SOAS)

    EPA Science Inventory

    The lack of statistically robust relationships between IEPOX (isoprene epoxydiol)-derived SOA (IEPOX SOA) and aerosol liquid water and pH observed during the 2013 Southern Oxidant and Aerosol Study (SOAS) emphasizes the importance of modeling the whole system to understand the co...

  20. Robust Optimum Invariant Tests for Random MANOVA Models.

    DTIC Science & Technology

    1986-10-01

    are assumed to be independent normal with zero mean and dispersion o2 and o72 respectively, Roy and Gnanadesikan (1959) considered the prob- 2 2 lem of...Part II: The multivariate case. Ann. Math. Statist. 31, 939-968. [7] Roy, S.N. and Gnanadesikan , R. (1959). Some contributions to ANOVA in one or more

  1. Integrating hidden Markov model and PRAAT: a toolbox for robust automatic speech transcription

    NASA Astrophysics Data System (ADS)

    Kabir, A.; Barker, J.; Giurgiu, M.

    2010-09-01

    An automatic time-aligned phone transcription toolbox of English speech corpora has been developed. Especially the toolbox would be very useful to generate robust automatic transcription and able to produce phone level transcription using speaker independent models as well as speaker dependent models without manual intervention. The system is based on standard Hidden Markov Models (HMM) approach and it was successfully experimented over a large audiovisual speech corpus namely GRID corpus. One of the most powerful features of the toolbox is the increased flexibility in speech processing where the speech community would be able to import the automatic transcription generated by HMM Toolkit (HTK) into a popular transcription software, PRAAT, and vice-versa. The toolbox has been evaluated through statistical analysis on GRID data which shows that automatic transcription deviates by an average of 20 ms with respect to manual transcription.

  2. Dynamic modelling of n-of-1 data: powerful and flexible data analytics applied to individualised studies.

    PubMed

    Vieira, Rute; McDonald, Suzanne; Araújo-Soares, Vera; Sniehotta, Falko F; Henderson, Robin

    2017-09-01

    N-of-1 studies are based on repeated observations within an individual or unit over time and are acknowledged as an important research method for generating scientific evidence about the health or behaviour of an individual. Statistical analyses of n-of-1 data require accurate modelling of the outcome while accounting for its distribution, time-related trend and error structures (e.g., autocorrelation) as well as reporting readily usable contextualised effect sizes for decision-making. A number of statistical approaches have been documented but no consensus exists on which method is most appropriate for which type of n-of-1 design. We discuss the statistical considerations for analysing n-of-1 studies and briefly review some currently used methodologies. We describe dynamic regression modelling as a flexible and powerful approach, adaptable to different types of outcomes and capable of dealing with the different challenges inherent to n-of-1 statistical modelling. Dynamic modelling borrows ideas from longitudinal and event history methodologies which explicitly incorporate the role of time and the influence of past on future. We also present an illustrative example of the use of dynamic regression on monitoring physical activity during the retirement transition. Dynamic modelling has the potential to expand researchers' access to robust and user-friendly statistical methods for individualised studies.

  3. Conditionally Unbiased Bounded Influence Robust Regression with Applications to Generalized Linear Models.

    DTIC Science & Technology

    1987-03-01

    some general results and definitions from robust statistics (see Hampel et. al.. 1986). The influence function of an M-Lstimator is IC (y.x.0) = D_...estimator is (4,T T T (\\cond(YXGB) . Yx(x,O.B) ) . where X~xO.) =x T T8 xT-lx 1/2 = x x v(x . b/(x B x) ) - B. The influence function of this...the influence function for (3.7) and (3.8) is equal to -. Eo[\\Pcond(y.X.,B)] -1 cond(Yx. ,B). (4.3) On the other hand, the influence function for the

  4. Recovery after treatment and sensitivity to base rate.

    PubMed

    Doctor, J N

    1999-04-01

    Accurate classification of patients as having recovered after psychotherapy depends largely on the base rate of such recovery. This article presents methods for classifying participants as recovered after therapy. The approach described here considers base rate in the statistical model. These methods can be applied to psychotherapy outcome data for 2 purposes: (a) to determine the robustness of a data set to differing base-rate assumptions and (b) to formulate an appropriate cutoff that is beyond the range of cases that are not robust to plausible base-rate assumptions. Discussion addresses a fundamental premise underlying the study of recovery after psychotherapy.

  5. Robust variable selection method for nonparametric differential equation models with application to nonlinear dynamic gene regulatory network analysis.

    PubMed

    Lu, Tao

    2016-01-01

    The gene regulation network (GRN) evaluates the interactions between genes and look for models to describe the gene expression behavior. These models have many applications; for instance, by characterizing the gene expression mechanisms that cause certain disorders, it would be possible to target those genes to block the progress of the disease. Many biological processes are driven by nonlinear dynamic GRN. In this article, we propose a nonparametric differential equation (ODE) to model the nonlinear dynamic GRN. Specially, we address following questions simultaneously: (i) extract information from noisy time course gene expression data; (ii) model the nonlinear ODE through a nonparametric smoothing function; (iii) identify the important regulatory gene(s) through a group smoothly clipped absolute deviation (SCAD) approach; (iv) test the robustness of the model against possible shortening of experimental duration. We illustrate the usefulness of the model and associated statistical methods through a simulation and a real application examples.

  6. Assessing risk factors for dental caries: a statistical modeling approach.

    PubMed

    Trottini, Mario; Bossù, Maurizio; Corridore, Denise; Ierardo, Gaetano; Luzzi, Valeria; Saccucci, Matteo; Polimeni, Antonella

    2015-01-01

    The problem of identifying potential determinants and predictors of dental caries is of key importance in caries research and it has received considerable attention in the scientific literature. From the methodological side, a broad range of statistical models is currently available to analyze dental caries indices (DMFT, dmfs, etc.). These models have been applied in several studies to investigate the impact of different risk factors on the cumulative severity of dental caries experience. However, in most of the cases (i) these studies focus on a very specific subset of risk factors; and (ii) in the statistical modeling only few candidate models are considered and model selection is at best only marginally addressed. As a result, our understanding of the robustness of the statistical inferences with respect to the choice of the model is very limited; the richness of the set of statistical models available for analysis in only marginally exploited; and inferences could be biased due the omission of potentially important confounding variables in the model's specification. In this paper we argue that these limitations can be overcome considering a general class of candidate models and carefully exploring the model space using standard model selection criteria and measures of global fit and predictive performance of the candidate models. Strengths and limitations of the proposed approach are illustrated with a real data set. In our illustration the model space contains more than 2.6 million models, which require inferences to be adjusted for 'optimism'.

  7. Robust group-wise rigid registration of point sets using t-mixture model

    NASA Astrophysics Data System (ADS)

    Ravikumar, Nishant; Gooya, Ali; Frangi, Alejandro F.; Taylor, Zeike A.

    2016-03-01

    A probabilistic framework for robust, group-wise rigid alignment of point-sets using a mixture of Students t-distribution especially when the point sets are of varying lengths, are corrupted by an unknown degree of outliers or in the presence of missing data. Medical images (in particular magnetic resonance (MR) images), their segmentations and consequently point-sets generated from these are highly susceptible to corruption by outliers. This poses a problem for robust correspondence estimation and accurate alignment of shapes, necessary for training statistical shape models (SSMs). To address these issues, this study proposes to use a t-mixture model (TMM), to approximate the underlying joint probability density of a group of similar shapes and align them to a common reference frame. The heavy-tailed nature of t-distributions provides a more robust registration framework in comparison to state of the art algorithms. Significant reduction in alignment errors is achieved in the presence of outliers, using the proposed TMM-based group-wise rigid registration method, in comparison to its Gaussian mixture model (GMM) counterparts. The proposed TMM-framework is compared with a group-wise variant of the well-known Coherent Point Drift (CPD) algorithm and two other group-wise methods using GMMs, using both synthetic and real data sets. Rigid alignment errors for groups of shapes are quantified using the Hausdorff distance (HD) and quadratic surface distance (QSD) metrics.

  8. Analysis of multi-fragmentation reactions induced by relativistic heavy ions using the statistical multi-fragmentation model

    NASA Astrophysics Data System (ADS)

    Ogawa, T.; Sato, T.; Hashimoto, S.; Niita, K.

    2013-09-01

    The fragmentation cross-sections of relativistic energy nucleus-nucleus collisions were analyzed using the statistical multi-fragmentation model (SMM) incorporated with the Monte-Carlo radiation transport simulation code particle and heavy ion transport code system (PHITS). Comparison with the literature data showed that PHITS-SMM reproduces fragmentation cross-sections of heavy nuclei at relativistic energies better than the original PHITS by up to two orders of magnitude. It was also found that SMM does not degrade the neutron production cross-sections in heavy ion collisions or the fragmentation cross-sections of light nuclei, for which SMM has not been benchmarked. Therefore, SMM is a robust model that can supplement conventional nucleus-nucleus reaction models, enabling more accurate prediction of fragmentation cross-sections.

  9. Statistical Control Paradigm for Aerospace Structures Under Impulsive Disturbances

    DTIC Science & Technology

    2006-08-03

    attitude control system with an innovative and robust statistical controller design shows significant promise for use in attitude hold mode operation...indicate that the existing attitude control system with an innovative and robust statistical controller design shows significant promise for use in...and three thrusters are for use in controlling the attitude of the satellite. Then the angular momentum of the satellite with three thrusters and a

  10. SEGMENTING CT PROSTATE IMAGES USING POPULATION AND PATIENT-SPECIFIC STATISTICS FOR RADIOTHERAPY.

    PubMed

    Feng, Qianjin; Foskey, Mark; Tang, Songyuan; Chen, Wufan; Shen, Dinggang

    2009-08-07

    This paper presents a new deformable model using both population and patient-specific statistics to segment the prostate from CT images. There are two novelties in the proposed method. First, a modified scale invariant feature transform (SIFT) local descriptor, which is more distinctive than general intensity and gradient features, is used to characterize the image features. Second, an online training approach is used to build the shape statistics for accurately capturing intra-patient variation, which is more important than inter-patient variation for prostate segmentation in clinical radiotherapy. Experimental results show that the proposed method is robust and accurate, suitable for clinical application.

  11. SEGMENTING CT PROSTATE IMAGES USING POPULATION AND PATIENT-SPECIFIC STATISTICS FOR RADIOTHERAPY

    PubMed Central

    Feng, Qianjin; Foskey, Mark; Tang, Songyuan; Chen, Wufan; Shen, Dinggang

    2010-01-01

    This paper presents a new deformable model using both population and patient-specific statistics to segment the prostate from CT images. There are two novelties in the proposed method. First, a modified scale invariant feature transform (SIFT) local descriptor, which is more distinctive than general intensity and gradient features, is used to characterize the image features. Second, an online training approach is used to build the shape statistics for accurately capturing intra-patient variation, which is more important than inter-patient variation for prostate segmentation in clinical radiotherapy. Experimental results show that the proposed method is robust and accurate, suitable for clinical application. PMID:21197416

  12. Automatic identification of bacterial types using statistical imaging methods

    NASA Astrophysics Data System (ADS)

    Trattner, Sigal; Greenspan, Hayit; Tepper, Gapi; Abboud, Shimon

    2003-05-01

    The objective of the current study is to develop an automatic tool to identify bacterial types using computer-vision and statistical modeling techniques. Bacteriophage (phage)-typing methods are used to identify and extract representative profiles of bacterial types, such as the Staphylococcus Aureus. Current systems rely on the subjective reading of plaque profiles by human expert. This process is time-consuming and prone to errors, especially as technology is enabling the increase in the number of phages used for typing. The statistical methodology presented in this work, provides for an automated, objective and robust analysis of visual data, along with the ability to cope with increasing data volumes.

  13. Blind calibration of radio interferometric arrays using sparsity constraints and its implications for self-calibration

    NASA Astrophysics Data System (ADS)

    Chiarucci, Simone; Wijnholds, Stefan J.

    2018-02-01

    Blind calibration, i.e. calibration without a priori knowledge of the source model, is robust to the presence of unknown sources such as transient phenomena or (low-power) broad-band radio frequency interference that escaped detection. In this paper, we present a novel method for blind calibration of a radio interferometric array assuming that the observed field only contains a small number of discrete point sources. We show the huge computational advantage over previous blind calibration methods and we assess its statistical efficiency and robustness to noise and the quality of the initial estimate. We demonstrate the method on actual data from a Low-Frequency Array low-band antenna station showing that our blind calibration is able to recover the same gain solutions as the regular calibration approach, as expected from theory and simulations. We also discuss the implications of our findings for the robustness of regular self-calibration to poor starting models.

  14. Robust short-term memory without synaptic learning.

    PubMed

    Johnson, Samuel; Marro, J; Torres, Joaquín J

    2013-01-01

    Short-term memory in the brain cannot in general be explained the way long-term memory can--as a gradual modification of synaptic weights--since it takes place too quickly. Theories based on some form of cellular bistability, however, do not seem able to account for the fact that noisy neurons can collectively store information in a robust manner. We show how a sufficiently clustered network of simple model neurons can be instantly induced into metastable states capable of retaining information for a short time (a few seconds). The mechanism is robust to different network topologies and kinds of neural model. This could constitute a viable means available to the brain for sensory and/or short-term memory with no need of synaptic learning. Relevant phenomena described by neurobiology and psychology, such as local synchronization of synaptic inputs and power-law statistics of forgetting avalanches, emerge naturally from this mechanism, and we suggest possible experiments to test its viability in more biological settings.

  15. Epidemiological characteristics of reported sporadic and outbreak cases of E. coli O157 in people from Alberta, Canada (2000-2002): methodological challenges of comparing clustered to unclustered data.

    PubMed

    Pearl, D L; Louie, M; Chui, L; Doré, K; Grimsrud, K M; Martin, S W; Michel, P; Svenson, L W; McEwen, S A

    2008-04-01

    Using multivariable models, we compared whether there were significant differences between reported outbreak and sporadic cases in terms of their sex, age, and mode and site of disease transmission. We also determined the potential role of administrative, temporal, and spatial factors within these models. We compared a variety of approaches to account for clustering of cases in outbreaks including weighted logistic regression, random effects models, general estimating equations, robust variance estimates, and the random selection of one case from each outbreak. Age and mode of transmission were the only epidemiologically and statistically significant covariates in our final models using the above approaches. Weighing observations in a logistic regression model by the inverse of their outbreak size appeared to be a relatively robust and valid means for modelling these data. Some analytical techniques, designed to account for clustering, had difficulty converging or producing realistic measures of association.

  16. Attitude determination using an adaptive multiple model filtering Scheme

    NASA Technical Reports Server (NTRS)

    Lam, Quang; Ray, Surendra N.

    1995-01-01

    Attitude determination has been considered as a permanent topic of active research and perhaps remaining as a forever-lasting interest for spacecraft system designers. Its role is to provide a reference for controls such as pointing the directional antennas or solar panels, stabilizing the spacecraft or maneuvering the spacecraft to a new orbit. Least Square Estimation (LSE) technique was utilized to provide attitude determination for the Nimbus 6 and G. Despite its poor performance (estimation accuracy consideration), LSE was considered as an effective and practical approach to meet the urgent need and requirement back in the 70's. One reason for this poor performance associated with the LSE scheme is the lack of dynamic filtering or 'compensation'. In other words, the scheme is based totally on the measurements and no attempts were made to model the dynamic equations of motion of the spacecraft. We propose an adaptive filtering approach which employs a bank of Kalman filters to perform robust attitude estimation. The proposed approach, whose architecture is depicted, is essentially based on the latest proof on the interactive multiple model design framework to handle the unknown of the system noise characteristics or statistics. The concept fundamentally employs a bank of Kalman filter or submodel, instead of using fixed values for the system noise statistics for each submodel (per operating condition) as the traditional multiple model approach does, we use an on-line dynamic system noise identifier to 'identify' the system noise level (statistics) and update the filter noise statistics using 'live' information from the sensor model. The advanced noise identifier, whose architecture is also shown, is implemented using an advanced system identifier. To insure the robust performance for the proposed advanced system identifier, it is also further reinforced by a learning system which is implemented (in the outer loop) using neural networks to identify other unknown quantities such as spacecraft dynamics parameters, gyro biases, dynamic disturbances, or environment variations.

  17. Attitude determination using an adaptive multiple model filtering Scheme

    NASA Astrophysics Data System (ADS)

    Lam, Quang; Ray, Surendra N.

    1995-05-01

    Attitude determination has been considered as a permanent topic of active research and perhaps remaining as a forever-lasting interest for spacecraft system designers. Its role is to provide a reference for controls such as pointing the directional antennas or solar panels, stabilizing the spacecraft or maneuvering the spacecraft to a new orbit. Least Square Estimation (LSE) technique was utilized to provide attitude determination for the Nimbus 6 and G. Despite its poor performance (estimation accuracy consideration), LSE was considered as an effective and practical approach to meet the urgent need and requirement back in the 70's. One reason for this poor performance associated with the LSE scheme is the lack of dynamic filtering or 'compensation'. In other words, the scheme is based totally on the measurements and no attempts were made to model the dynamic equations of motion of the spacecraft. We propose an adaptive filtering approach which employs a bank of Kalman filters to perform robust attitude estimation. The proposed approach, whose architecture is depicted, is essentially based on the latest proof on the interactive multiple model design framework to handle the unknown of the system noise characteristics or statistics. The concept fundamentally employs a bank of Kalman filter or submodel, instead of using fixed values for the system noise statistics for each submodel (per operating condition) as the traditional multiple model approach does, we use an on-line dynamic system noise identifier to 'identify' the system noise level (statistics) and update the filter noise statistics using 'live' information from the sensor model. The advanced noise identifier, whose architecture is also shown, is implemented using an advanced system identifier. To insure the robust performance for the proposed advanced system identifier, it is also further reinforced by a learning system which is implemented (in the outer loop) using neural networks to identify other unknown quantities such as spacecraft dynamics parameters, gyro biases, dynamic disturbances, or environment variations.

  18. Robust statistical methods for impulse noise suppressing of spread spectrum induced polarization data, with application to a mine site, Gansu province, China

    NASA Astrophysics Data System (ADS)

    Liu, Weiqiang; Chen, Rujun; Cai, Hongzhu; Luo, Weibin

    2016-12-01

    In this paper, we investigated the robust processing of noisy spread spectrum induced polarization (SSIP) data. SSIP is a new frequency domain induced polarization method that transmits pseudo-random m-sequence as source current where m-sequence is a broadband signal. The potential information at multiple frequencies can be obtained through measurement. Removing the noise is a crucial problem for SSIP data processing. Considering that if the ordinary mean stack and digital filter are not capable of reducing the impulse noise effectively in SSIP data processing, the impact of impulse noise will remain in the complex resistivity spectrum that will affect the interpretation of profile anomalies. We implemented a robust statistical method to SSIP data processing. The robust least-squares regression is used to fit and remove the linear trend from the original data before stacking. The robust M estimate is used to stack the data of all periods. The robust smooth filter is used to suppress the residual noise for data after stacking. For robust statistical scheme, the most appropriate influence function and iterative algorithm are chosen by testing the simulated data to suppress the outliers' influence. We tested the benefits of the robust SSIP data processing using examples of SSIP data recorded in a test site beside a mine in Gansu province, China.

  19. Exploring complex dynamics in multi agent-based intelligent systems: Theoretical and experimental approaches using the Multi Agent-based Behavioral Economic Landscape (MABEL) model

    NASA Astrophysics Data System (ADS)

    Alexandridis, Konstantinos T.

    This dissertation adopts a holistic and detailed approach to modeling spatially explicit agent-based artificial intelligent systems, using the Multi Agent-based Behavioral Economic Landscape (MABEL) model. The research questions that addresses stem from the need to understand and analyze the real-world patterns and dynamics of land use change from a coupled human-environmental systems perspective. Describes the systemic, mathematical, statistical, socio-economic and spatial dynamics of the MABEL modeling framework, and provides a wide array of cross-disciplinary modeling applications within the research, decision-making and policy domains. Establishes the symbolic properties of the MABEL model as a Markov decision process, analyzes the decision-theoretic utility and optimization attributes of agents towards comprising statistically and spatially optimal policies and actions, and explores the probabilogic character of the agents' decision-making and inference mechanisms via the use of Bayesian belief and decision networks. Develops and describes a Monte Carlo methodology for experimental replications of agent's decisions regarding complex spatial parcel acquisition and learning. Recognizes the gap on spatially-explicit accuracy assessment techniques for complex spatial models, and proposes an ensemble of statistical tools designed to address this problem. Advanced information assessment techniques such as the Receiver-Operator Characteristic curve, the impurity entropy and Gini functions, and the Bayesian classification functions are proposed. The theoretical foundation for modular Bayesian inference in spatially-explicit multi-agent artificial intelligent systems, and the ensembles of cognitive and scenario assessment modular tools build for the MABEL model are provided. Emphasizes the modularity and robustness as valuable qualitative modeling attributes, and examines the role of robust intelligent modeling as a tool for improving policy-decisions related to land use change. Finally, the major contributions to the science are presented along with valuable directions for future research.

  20. Accurate landmarking of three-dimensional facial data in the presence of facial expressions and occlusions using a three-dimensional statistical facial feature model.

    PubMed

    Zhao, Xi; Dellandréa, Emmanuel; Chen, Liming; Kakadiaris, Ioannis A

    2011-10-01

    Three-dimensional face landmarking aims at automatically localizing facial landmarks and has a wide range of applications (e.g., face recognition, face tracking, and facial expression analysis). Existing methods assume neutral facial expressions and unoccluded faces. In this paper, we propose a general learning-based framework for reliable landmark localization on 3-D facial data under challenging conditions (i.e., facial expressions and occlusions). Our approach relies on a statistical model, called 3-D statistical facial feature model, which learns both the global variations in configurational relationships between landmarks and the local variations of texture and geometry around each landmark. Based on this model, we further propose an occlusion classifier and a fitting algorithm. Results from experiments on three publicly available 3-D face databases (FRGC, BU-3-DFE, and Bosphorus) demonstrate the effectiveness of our approach, in terms of landmarking accuracy and robustness, in the presence of expressions and occlusions.

  1. Developing appropriate methods for cost-effectiveness analysis of cluster randomized trials.

    PubMed

    Gomes, Manuel; Ng, Edmond S-W; Grieve, Richard; Nixon, Richard; Carpenter, James; Thompson, Simon G

    2012-01-01

    Cost-effectiveness analyses (CEAs) may use data from cluster randomized trials (CRTs), where the unit of randomization is the cluster, not the individual. However, most studies use analytical methods that ignore clustering. This article compares alternative statistical methods for accommodating clustering in CEAs of CRTs. Our simulation study compared the performance of statistical methods for CEAs of CRTs with 2 treatment arms. The study considered a method that ignored clustering--seemingly unrelated regression (SUR) without a robust standard error (SE)--and 4 methods that recognized clustering--SUR and generalized estimating equations (GEEs), both with robust SE, a "2-stage" nonparametric bootstrap (TSB) with shrinkage correction, and a multilevel model (MLM). The base case assumed CRTs with moderate numbers of balanced clusters (20 per arm) and normally distributed costs. Other scenarios included CRTs with few clusters, imbalanced cluster sizes, and skewed costs. Performance was reported as bias, root mean squared error (rMSE), and confidence interval (CI) coverage for estimating incremental net benefits (INBs). We also compared the methods in a case study. Each method reported low levels of bias. Without the robust SE, SUR gave poor CI coverage (base case: 0.89 v. nominal level: 0.95). The MLM and TSB performed well in each scenario (CI coverage, 0.92-0.95). With few clusters, the GEE and SUR (with robust SE) had coverage below 0.90. In the case study, the mean INBs were similar across all methods, but ignoring clustering underestimated statistical uncertainty and the value of further research. MLMs and the TSB are appropriate analytical methods for CEAs of CRTs with the characteristics described. SUR and GEE are not recommended for studies with few clusters.

  2. Developing Appropriate Methods for Cost-Effectiveness Analysis of Cluster Randomized Trials

    PubMed Central

    Gomes, Manuel; Ng, Edmond S.-W.; Nixon, Richard; Carpenter, James; Thompson, Simon G.

    2012-01-01

    Aim. Cost-effectiveness analyses (CEAs) may use data from cluster randomized trials (CRTs), where the unit of randomization is the cluster, not the individual. However, most studies use analytical methods that ignore clustering. This article compares alternative statistical methods for accommodating clustering in CEAs of CRTs. Methods. Our simulation study compared the performance of statistical methods for CEAs of CRTs with 2 treatment arms. The study considered a method that ignored clustering—seemingly unrelated regression (SUR) without a robust standard error (SE)—and 4 methods that recognized clustering—SUR and generalized estimating equations (GEEs), both with robust SE, a “2-stage” nonparametric bootstrap (TSB) with shrinkage correction, and a multilevel model (MLM). The base case assumed CRTs with moderate numbers of balanced clusters (20 per arm) and normally distributed costs. Other scenarios included CRTs with few clusters, imbalanced cluster sizes, and skewed costs. Performance was reported as bias, root mean squared error (rMSE), and confidence interval (CI) coverage for estimating incremental net benefits (INBs). We also compared the methods in a case study. Results. Each method reported low levels of bias. Without the robust SE, SUR gave poor CI coverage (base case: 0.89 v. nominal level: 0.95). The MLM and TSB performed well in each scenario (CI coverage, 0.92–0.95). With few clusters, the GEE and SUR (with robust SE) had coverage below 0.90. In the case study, the mean INBs were similar across all methods, but ignoring clustering underestimated statistical uncertainty and the value of further research. Conclusions. MLMs and the TSB are appropriate analytical methods for CEAs of CRTs with the characteristics described. SUR and GEE are not recommended for studies with few clusters. PMID:22016450

  3. A Review of Some Aspects of Robust Inference for Time Series.

    DTIC Science & Technology

    1984-09-01

    REVIEW OF SOME ASPECTSOF ROBUST INFERNCE FOR TIME SERIES by Ad . Dougla Main TE "iAL REPOW No. 63 Septermber 1984 Department of Statistics University of ...clear. One cannot hope to have a good method for dealing with outliers in time series by using only an instantaneous nonlinear transformation of the data...AI.49 716 A REVIEWd OF SOME ASPECTS OF ROBUST INFERENCE FOR TIME 1/1 SERIES(U) WASHINGTON UNIV SEATTLE DEPT OF STATISTICS R D MARTIN SEP 84 TR-53

  4. The comparison of robust partial least squares regression with robust principal component regression on a real

    NASA Astrophysics Data System (ADS)

    Polat, Esra; Gunay, Suleyman

    2013-10-01

    One of the problems encountered in Multiple Linear Regression (MLR) is multicollinearity, which causes the overestimation of the regression parameters and increase of the variance of these parameters. Hence, in case of multicollinearity presents, biased estimation procedures such as classical Principal Component Regression (CPCR) and Partial Least Squares Regression (PLSR) are then performed. SIMPLS algorithm is the leading PLSR algorithm because of its speed, efficiency and results are easier to interpret. However, both of the CPCR and SIMPLS yield very unreliable results when the data set contains outlying observations. Therefore, Hubert and Vanden Branden (2003) have been presented a robust PCR (RPCR) method and a robust PLSR (RPLSR) method called RSIMPLS. In RPCR, firstly, a robust Principal Component Analysis (PCA) method for high-dimensional data on the independent variables is applied, then, the dependent variables are regressed on the scores using a robust regression method. RSIMPLS has been constructed from a robust covariance matrix for high-dimensional data and robust linear regression. The purpose of this study is to show the usage of RPCR and RSIMPLS methods on an econometric data set, hence, making a comparison of two methods on an inflation model of Turkey. The considered methods have been compared in terms of predictive ability and goodness of fit by using a robust Root Mean Squared Error of Cross-validation (R-RMSECV), a robust R2 value and Robust Component Selection (RCS) statistic.

  5. Fusion And Inference From Multiple And Massive Disparate Distributed Dynamic Data Sets

    DTIC Science & Technology

    2017-07-01

    principled methodology for two-sample graph testing; designed a provably almost-surely perfect vertex clustering algorithm for block model graphs; proved...3.7 Semi-Supervised Clustering Methodology ...................................................................... 9 3.8 Robust Hypothesis Testing...dimensional Euclidean space – allows the full arsenal of statistical and machine learning methodology for multivariate Euclidean data to be deployed for

  6. How to Use Value-Added Analysis to Improve Student Learning: A Field Guide for School and District Leaders

    ERIC Educational Resources Information Center

    Kennedy, Kate; Peters, Mary; Thomas, Mike

    2012-01-01

    Value-added analysis is the most robust, statistically significant method available for helping educators quantify student progress over time. This powerful tool also reveals tangible strategies for improving instruction. Built around the work of Battelle for Kids, this book provides a field-tested continuous improvement model for using…

  7. Passing the Test: Ecological Regression Analysis in the Los Angeles County Case and Beyond.

    ERIC Educational Resources Information Center

    Lichtman, Allan J.

    1991-01-01

    Statistical analysis of racially polarized voting prepared for the Garza v County of Los Angeles (California) (1990) voting rights case is reviewed to demonstrate that ecological regression is a flexible, robust technique that illuminates the reality of ethnic voting, and superior to the neighborhood model supported by the defendants. (SLD)

  8. A Maximum Likelihood Approach to Functional Mapping of Longitudinal Binary Traits

    PubMed Central

    Wang, Chenguang; Li, Hongying; Wang, Zhong; Wang, Yaqun; Wang, Ningtao; Wang, Zuoheng; Wu, Rongling

    2013-01-01

    Despite their importance in biology and biomedicine, genetic mapping of binary traits that change over time has not been well explored. In this article, we develop a statistical model for mapping quantitative trait loci (QTLs) that govern longitudinal responses of binary traits. The model is constructed within the maximum likelihood framework by which the association between binary responses is modeled in terms of conditional log odds-ratios. With this parameterization, the maximum likelihood estimates (MLEs) of marginal mean parameters are robust to the misspecification of time dependence. We implement an iterative procedures to obtain the MLEs of QTL genotype-specific parameters that define longitudinal binary responses. The usefulness of the model was validated by analyzing a real example in rice. Simulation studies were performed to investigate the statistical properties of the model, showing that the model has power to identify and map specific QTLs responsible for the temporal pattern of binary traits. PMID:23183762

  9. Statistical shape (ASM) and appearance (AAM) models for the segmentation of the cerebellum in fetal ultrasound

    NASA Astrophysics Data System (ADS)

    Reyes López, Misael; Arámbula Cosío, Fernando

    2017-11-01

    The cerebellum is an important structure to determine the gestational age of the fetus, moreover most of the abnormalities it presents are related to growth disorders. In this work, we present the results of the segmentation of the fetal cerebellum applying statistical shape and appearance models. Both models were tested on ultrasound images of the fetal brain taken from 23 pregnant women, between 18 and 24 gestational weeks. The accuracy results obtained on 11 ultrasound images show a mean Hausdorff distance of 6.08 mm between the manual segmentation and the segmentation using active shape model, and a mean Hausdorff distance of 7.54 mm between the manual segmentation and the segmentation using active appearance model. The reported results demonstrate that the active shape model is more robust in the segmentation of the fetal cerebellum in ultrasound images.

  10. Linear models: permutation methods

    USGS Publications Warehouse

    Cade, B.S.; Everitt, B.S.; Howell, D.C.

    2005-01-01

    Permutation tests (see Permutation Based Inference) for the linear model have applications in behavioral studies when traditional parametric assumptions about the error term in a linear model are not tenable. Improved validity of Type I error rates can be achieved with properly constructed permutation tests. Perhaps more importantly, increased statistical power, improved robustness to effects of outliers, and detection of alternative distributional differences can be achieved by coupling permutation inference with alternative linear model estimators. For example, it is well-known that estimates of the mean in linear model are extremely sensitive to even a single outlying value of the dependent variable compared to estimates of the median [7, 19]. Traditionally, linear modeling focused on estimating changes in the center of distributions (means or medians). However, quantile regression allows distributional changes to be estimated in all or any selected part of a distribution or responses, providing a more complete statistical picture that has relevance to many biological questions [6]...

  11. Three-Dimensional Color Code Thresholds via Statistical-Mechanical Mapping

    NASA Astrophysics Data System (ADS)

    Kubica, Aleksander; Beverland, Michael E.; Brandão, Fernando; Preskill, John; Svore, Krysta M.

    2018-05-01

    Three-dimensional (3D) color codes have advantages for fault-tolerant quantum computing, such as protected quantum gates with relatively low overhead and robustness against imperfect measurement of error syndromes. Here we investigate the storage threshold error rates for bit-flip and phase-flip noise in the 3D color code (3DCC) on the body-centered cubic lattice, assuming perfect syndrome measurements. In particular, by exploiting a connection between error correction and statistical mechanics, we estimate the threshold for 1D stringlike and 2D sheetlike logical operators to be p3DCC (1 )≃1.9 % and p3DCC (2 )≃27.6 % . We obtain these results by using parallel tempering Monte Carlo simulations to study the disorder-temperature phase diagrams of two new 3D statistical-mechanical models: the four- and six-body random coupling Ising models.

  12. Cosmic shear measurements with Dark Energy Survey Science Verification data

    DOE PAGES

    Becker, M. R.

    2016-07-06

    Here, we present measurements of weak gravitational lensing cosmic shear two-point statistics using Dark Energy Survey Science Verification data. We demonstrate that our results are robust to the choice of shear measurement pipeline, either ngmix or im3shape, and robust to the choice of two-point statistic, including both real and Fourier-space statistics. Our results pass a suite of null tests including tests for B-mode contamination and direct tests for any dependence of the two-point functions on a set of 16 observing conditions and galaxy properties, such as seeing, airmass, galaxy color, galaxy magnitude, etc. We use a large suite of simulationsmore » to compute the covariance matrix of the cosmic shear measurements and assign statistical significance to our null tests. We find that our covariance matrix is consistent with the halo model prediction, indicating that it has the appropriate level of halo sample variance. We also compare the same jackknife procedure applied to the data and the simulations in order to search for additional sources of noise not captured by the simulations. We find no statistically significant extra sources of noise in the data. The overall detection significance with tomography for our highest source density catalog is 9.7σ. Cosmological constraints from the measurements in this work are presented in a companion paper.« less

  13. State estimation and prediction using clustered particle filters.

    PubMed

    Lee, Yoonsang; Majda, Andrew J

    2016-12-20

    Particle filtering is an essential tool to improve uncertain model predictions by incorporating noisy observational data from complex systems including non-Gaussian features. A class of particle filters, clustered particle filters, is introduced for high-dimensional nonlinear systems, which uses relatively few particles compared with the standard particle filter. The clustered particle filter captures non-Gaussian features of the true signal, which are typical in complex nonlinear dynamical systems such as geophysical systems. The method is also robust in the difficult regime of high-quality sparse and infrequent observations. The key features of the clustered particle filtering are coarse-grained localization through the clustering of the state variables and particle adjustment to stabilize the method; each observation affects only neighbor state variables through clustering and particles are adjusted to prevent particle collapse due to high-quality observations. The clustered particle filter is tested for the 40-dimensional Lorenz 96 model with several dynamical regimes including strongly non-Gaussian statistics. The clustered particle filter shows robust skill in both achieving accurate filter results and capturing non-Gaussian statistics of the true signal. It is further extended to multiscale data assimilation, which provides the large-scale estimation by combining a cheap reduced-order forecast model and mixed observations of the large- and small-scale variables. This approach enables the use of a larger number of particles due to the computational savings in the forecast model. The multiscale clustered particle filter is tested for one-dimensional dispersive wave turbulence using a forecast model with model errors.

  14. State estimation and prediction using clustered particle filters

    PubMed Central

    Lee, Yoonsang; Majda, Andrew J.

    2016-01-01

    Particle filtering is an essential tool to improve uncertain model predictions by incorporating noisy observational data from complex systems including non-Gaussian features. A class of particle filters, clustered particle filters, is introduced for high-dimensional nonlinear systems, which uses relatively few particles compared with the standard particle filter. The clustered particle filter captures non-Gaussian features of the true signal, which are typical in complex nonlinear dynamical systems such as geophysical systems. The method is also robust in the difficult regime of high-quality sparse and infrequent observations. The key features of the clustered particle filtering are coarse-grained localization through the clustering of the state variables and particle adjustment to stabilize the method; each observation affects only neighbor state variables through clustering and particles are adjusted to prevent particle collapse due to high-quality observations. The clustered particle filter is tested for the 40-dimensional Lorenz 96 model with several dynamical regimes including strongly non-Gaussian statistics. The clustered particle filter shows robust skill in both achieving accurate filter results and capturing non-Gaussian statistics of the true signal. It is further extended to multiscale data assimilation, which provides the large-scale estimation by combining a cheap reduced-order forecast model and mixed observations of the large- and small-scale variables. This approach enables the use of a larger number of particles due to the computational savings in the forecast model. The multiscale clustered particle filter is tested for one-dimensional dispersive wave turbulence using a forecast model with model errors. PMID:27930332

  15. Racing to learn: statistical inference and learning in a single spiking neuron with adaptive kernels

    PubMed Central

    Afshar, Saeed; George, Libin; Tapson, Jonathan; van Schaik, André; Hamilton, Tara J.

    2014-01-01

    This paper describes the Synapto-dendritic Kernel Adapting Neuron (SKAN), a simple spiking neuron model that performs statistical inference and unsupervised learning of spatiotemporal spike patterns. SKAN is the first proposed neuron model to investigate the effects of dynamic synapto-dendritic kernels and demonstrate their computational power even at the single neuron scale. The rule-set defining the neuron is simple: there are no complex mathematical operations such as normalization, exponentiation or even multiplication. The functionalities of SKAN emerge from the real-time interaction of simple additive and binary processes. Like a biological neuron, SKAN is robust to signal and parameter noise, and can utilize both in its operations. At the network scale neurons are locked in a race with each other with the fastest neuron to spike effectively “hiding” its learnt pattern from its neighbors. The robustness to noise, high speed, and simple building blocks not only make SKAN an interesting neuron model in computational neuroscience, but also make it ideal for implementation in digital and analog neuromorphic systems which is demonstrated through an implementation in a Field Programmable Gate Array (FPGA). Matlab, Python, and Verilog implementations of SKAN are available at: http://www.uws.edu.au/bioelectronics_neuroscience/bens/reproducible_research. PMID:25505378

  16. Racing to learn: statistical inference and learning in a single spiking neuron with adaptive kernels.

    PubMed

    Afshar, Saeed; George, Libin; Tapson, Jonathan; van Schaik, André; Hamilton, Tara J

    2014-01-01

    This paper describes the Synapto-dendritic Kernel Adapting Neuron (SKAN), a simple spiking neuron model that performs statistical inference and unsupervised learning of spatiotemporal spike patterns. SKAN is the first proposed neuron model to investigate the effects of dynamic synapto-dendritic kernels and demonstrate their computational power even at the single neuron scale. The rule-set defining the neuron is simple: there are no complex mathematical operations such as normalization, exponentiation or even multiplication. The functionalities of SKAN emerge from the real-time interaction of simple additive and binary processes. Like a biological neuron, SKAN is robust to signal and parameter noise, and can utilize both in its operations. At the network scale neurons are locked in a race with each other with the fastest neuron to spike effectively "hiding" its learnt pattern from its neighbors. The robustness to noise, high speed, and simple building blocks not only make SKAN an interesting neuron model in computational neuroscience, but also make it ideal for implementation in digital and analog neuromorphic systems which is demonstrated through an implementation in a Field Programmable Gate Array (FPGA). Matlab, Python, and Verilog implementations of SKAN are available at: http://www.uws.edu.au/bioelectronics_neuroscience/bens/reproducible_research.

  17. Optimization of an electromagnetic linear actuator using a network and a finite element model

    NASA Astrophysics Data System (ADS)

    Neubert, Holger; Kamusella, Alfred; Lienig, Jens

    2011-03-01

    Model based design optimization leads to robust solutions only if the statistical deviations of design, load and ambient parameters from nominal values are considered. We describe an optimization methodology that involves these deviations as stochastic variables for an exemplary electromagnetic actuator used to drive a Braille printer. A combined model simulates the dynamic behavior of the actuator and its non-linear load. It consists of a dynamic network model and a stationary magnetic finite element (FE) model. The network model utilizes lookup tables of the magnetic force and the flux linkage computed by the FE model. After a sensitivity analysis using design of experiment (DoE) methods and a nominal optimization based on gradient methods, a robust design optimization is performed. Selected design variables are involved in form of their density functions. In order to reduce the computational effort we use response surfaces instead of the combined system model obtained in all stochastic analysis steps. Thus, Monte-Carlo simulations can be applied. As a result we found an optimum system design meeting our requirements with regard to function and reliability.

  18. An asymptotic theory for cross-correlation between auto-correlated sequences and its application on neuroimaging data.

    PubMed

    Zhou, Yunyi; Tao, Chenyang; Lu, Wenlian; Feng, Jianfeng

    2018-04-20

    Functional connectivity is among the most important tools to study brain. The correlation coefficient, between time series of different brain areas, is the most popular method to quantify functional connectivity. Correlation coefficient in practical use assumes the data to be temporally independent. However, the time series data of brain can manifest significant temporal auto-correlation. A widely applicable method is proposed for correcting temporal auto-correlation. We considered two types of time series models: (1) auto-regressive-moving-average model, (2) nonlinear dynamical system model with noisy fluctuations, and derived their respective asymptotic distributions of correlation coefficient. These two types of models are most commonly used in neuroscience studies. We show the respective asymptotic distributions share a unified expression. We have verified the validity of our method, and shown our method exhibited sufficient statistical power for detecting true correlation on numerical experiments. Employing our method on real dataset yields more robust functional network and higher classification accuracy than conventional methods. Our method robustly controls the type I error while maintaining sufficient statistical power for detecting true correlation in numerical experiments, where existing methods measuring association (linear and nonlinear) fail. In this work, we proposed a widely applicable approach for correcting the effect of temporal auto-correlation on functional connectivity. Empirical results favor the use of our method in functional network analysis. Copyright © 2018. Published by Elsevier B.V.

  19. 3D statistical shape models incorporating 3D random forest regression voting for robust CT liver segmentation

    NASA Astrophysics Data System (ADS)

    Norajitra, Tobias; Meinzer, Hans-Peter; Maier-Hein, Klaus H.

    2015-03-01

    During image segmentation, 3D Statistical Shape Models (SSM) usually conduct a limited search for target landmarks within one-dimensional search profiles perpendicular to the model surface. In addition, landmark appearance is modeled only locally based on linear profiles and weak learners, altogether leading to segmentation errors from landmark ambiguities and limited search coverage. We present a new method for 3D SSM segmentation based on 3D Random Forest Regression Voting. For each surface landmark, a Random Regression Forest is trained that learns a 3D spatial displacement function between the according reference landmark and a set of surrounding sample points, based on an infinite set of non-local randomized 3D Haar-like features. Landmark search is then conducted omni-directionally within 3D search spaces, where voxelwise forest predictions on landmark position contribute to a common voting map which reflects the overall position estimate. Segmentation experiments were conducted on a set of 45 CT volumes of the human liver, of which 40 images were randomly chosen for training and 5 for testing. Without parameter optimization, using a simple candidate selection and a single resolution approach, excellent results were achieved, while faster convergence and better concavity segmentation were observed, altogether underlining the potential of our approach in terms of increased robustness from distinct landmark detection and from better search coverage.

  20. Bayesian truncation errors in chiral effective field theory: model checking and accounting for correlations

    NASA Astrophysics Data System (ADS)

    Melendez, Jordan; Wesolowski, Sarah; Furnstahl, Dick

    2017-09-01

    Chiral effective field theory (EFT) predictions are necessarily truncated at some order in the EFT expansion, which induces an error that must be quantified for robust statistical comparisons to experiment. A Bayesian model yields posterior probability distribution functions for these errors based on expectations of naturalness encoded in Bayesian priors and the observed order-by-order convergence pattern of the EFT. As a general example of a statistical approach to truncation errors, the model was applied to chiral EFT for neutron-proton scattering using various semi-local potentials of Epelbaum, Krebs, and Meißner (EKM). Here we discuss how our model can learn correlation information from the data and how to perform Bayesian model checking to validate that the EFT is working as advertised. Supported in part by NSF PHY-1614460 and DOE NUCLEI SciDAC DE-SC0008533.

  1. Dynamic causal modelling: a critical review of the biophysical and statistical foundations.

    PubMed

    Daunizeau, J; David, O; Stephan, K E

    2011-09-15

    The goal of dynamic causal modelling (DCM) of neuroimaging data is to study experimentally induced changes in functional integration among brain regions. This requires (i) biophysically plausible and physiologically interpretable models of neuronal network dynamics that can predict distributed brain responses to experimental stimuli and (ii) efficient statistical methods for parameter estimation and model comparison. These two key components of DCM have been the focus of more than thirty methodological articles since the seminal work of Friston and colleagues published in 2003. In this paper, we provide a critical review of the current state-of-the-art of DCM. We inspect the properties of DCM in relation to the most common neuroimaging modalities (fMRI and EEG/MEG) and the specificity of inference on neural systems that can be made from these data. We then discuss both the plausibility of the underlying biophysical models and the robustness of the statistical inversion techniques. Finally, we discuss potential extensions of the current DCM framework, such as stochastic DCMs, plastic DCMs and field DCMs. Copyright © 2009 Elsevier Inc. All rights reserved.

  2. QSAR models for anti-malarial activity of 4-aminoquinolines.

    PubMed

    Masand, Vijay H; Toropov, Andrey A; Toropova, Alla P; Mahajan, Devidas T

    2014-03-01

    In the present study, predictive quantitative structure - activity relationship (QSAR) models for anti-malarial activity of 4-aminoquinolines have been developed. CORAL, which is freely available on internet (http://www.insilico.eu/coral), has been used as a tool of QSAR analysis to establish statistically robust QSAR model of anti-malarial activity of 4-aminoquinolines. Six random splits into the visible sub-system of the training and invisible subsystem of validation were examined. Statistical qualities for these splits vary, but in all these cases, statistical quality of prediction for anti-malarial activity was quite good. The optimal SMILES-based descriptor was used to derive the single descriptor based QSAR model for a data set of 112 aminoquinolones. All the splits had r(2)> 0.85 and r(2)> 0.78 for subtraining and validation sets, respectively. The three parametric multilinear regression (MLR) QSAR model has Q(2) = 0.83, R(2) = 0.84 and F = 190.39. The anti-malarial activity has strong correlation with presence/absence of nitrogen and oxygen at a topological distance of six.

  3. The Adequacy of Different Robust Statistical Tests in Comparing Two Independent Groups

    ERIC Educational Resources Information Center

    Pero-Cebollero, Maribel; Guardia-Olmos, Joan

    2013-01-01

    In the current study, we evaluated various robust statistical methods for comparing two independent groups. Two scenarios for simulation were generated: one of equality and another of population mean differences. In each of the scenarios, 33 experimental conditions were used as a function of sample size, standard deviation and asymmetry. For each…

  4. Combining band recovery data and Pollock's robust design to model temporary and permanent emigration

    USGS Publications Warehouse

    Lindberg, M.S.; Kendall, W.L.; Hines, J.E.; Anderson, M.G.

    2001-01-01

    Capture-recapture models are widely used to estimate demographic parameters of marked populations. Recently, this statistical theory has been extended to modeling dispersal of open populations. Multistate models can be used to estimate movement probabilities among subdivided populations if multiple sites are sampled. Frequently, however, sampling is limited to a single site. Models described by Burnham (1993, in Marked Individuals in the Study of Bird Populations, 199-213), which combined open population capture-recapture and band-recovery models, can be used to estimate permanent emigration when sampling is limited to a single population. Similarly, Kendall, Nichols, and Hines (1997, Ecology 51, 563-578) developed models to estimate temporary emigration under Pollock's (1982, Journal of Wildlife Management 46, 757-760) robust design. We describe a likelihood-based approach to simultaneously estimate temporary and permanent emigration when sampling is limited to a single population. We use a sampling design that combines the robust design and recoveries of individuals obtained immediately following each sampling period. We present a general form for our model where temporary emigration is a first-order Markov process, and we discuss more restrictive models. We illustrate these models with analysis of data on marked Canvasback ducks. Our analysis indicates that probability of permanent emigration for adult female Canvasbacks was 0.193 (SE = 0.082) and that birds that were present at the study area in year i - 1 had a higher probability of presence in year i than birds that were not present in year i - 1.

  5. PET image reconstruction: a robust state space approach.

    PubMed

    Liu, Huafeng; Tian, Yi; Shi, Pengcheng

    2005-01-01

    Statistical iterative reconstruction algorithms have shown improved image quality over conventional nonstatistical methods in PET by using accurate system response models and measurement noise models. Strictly speaking, however, PET measurements, pre-corrected for accidental coincidences, are neither Poisson nor Gaussian distributed and thus do not meet basic assumptions of these algorithms. In addition, the difficulty in determining the proper system response model also greatly affects the quality of the reconstructed images. In this paper, we explore the usage of state space principles for the estimation of activity map in tomographic PET imaging. The proposed strategy formulates the organ activity distribution through tracer kinetics models, and the photon-counting measurements through observation equations, thus makes it possible to unify the dynamic reconstruction problem and static reconstruction problem into a general framework. Further, it coherently treats the uncertainties of the statistical model of the imaging system and the noisy nature of measurement data. Since H(infinity) filter seeks minimummaximum-error estimates without any assumptions on the system and data noise statistics, it is particular suited for PET image reconstruction where the statistical properties of measurement data and the system model are very complicated. The performance of the proposed framework is evaluated using Shepp-Logan simulated phantom data and real phantom data with favorable results.

  6. ACCELERATED FAILURE TIME MODELS PROVIDE A USEFUL STATISTICAL FRAMEWORK FOR AGING RESEARCH

    PubMed Central

    Swindell, William R.

    2009-01-01

    Survivorship experiments play a central role in aging research and are performed to evaluate whether interventions alter the rate of aging and increase lifespan. The accelerated failure time (AFT) model is seldom used to analyze survivorship data, but offers a potentially useful statistical approach that is based upon the survival curve rather than the hazard function. In this study, AFT models were used to analyze data from 16 survivorship experiments that evaluated the effects of one or more genetic manipulations on mouse lifespan. Most genetic manipulations were found to have a multiplicative effect on survivorship that is independent of age and well-characterized by the AFT model “deceleration factor”. AFT model deceleration factors also provided a more intuitive measure of treatment effect than the hazard ratio, and were robust to departures from modeling assumptions. Age-dependent treatment effects, when present, were investigated using quantile regression modeling. These results provide an informative and quantitative summary of survivorship data associated with currently known long-lived mouse models. In addition, from the standpoint of aging research, these statistical approaches have appealing properties and provide valuable tools for the analysis of survivorship data. PMID:19007875

  7. Accelerated failure time models provide a useful statistical framework for aging research.

    PubMed

    Swindell, William R

    2009-03-01

    Survivorship experiments play a central role in aging research and are performed to evaluate whether interventions alter the rate of aging and increase lifespan. The accelerated failure time (AFT) model is seldom used to analyze survivorship data, but offers a potentially useful statistical approach that is based upon the survival curve rather than the hazard function. In this study, AFT models were used to analyze data from 16 survivorship experiments that evaluated the effects of one or more genetic manipulations on mouse lifespan. Most genetic manipulations were found to have a multiplicative effect on survivorship that is independent of age and well-characterized by the AFT model "deceleration factor". AFT model deceleration factors also provided a more intuitive measure of treatment effect than the hazard ratio, and were robust to departures from modeling assumptions. Age-dependent treatment effects, when present, were investigated using quantile regression modeling. These results provide an informative and quantitative summary of survivorship data associated with currently known long-lived mouse models. In addition, from the standpoint of aging research, these statistical approaches have appealing properties and provide valuable tools for the analysis of survivorship data.

  8. Relationships between river discharge and abundance of age 0 redhorses (Moxostoma spp.) in the Oconee River, Georgia, USA, with implications for robust redhorse

    USGS Publications Warehouse

    Peterson, R.; Jennings, Cecil A.; Peterson, J.T.

    2013-01-01

    Robust redhorse (Moxostoma robustum) and notchlip redhorse (M. collapsum) are two species of redhorses that reside in the lower Oconee River, Georgia. Robust redhorse is listed as a state endangered species in Georgia and North Carolina, and attempts to investigate factors affecting its reproductive success have met with limited success. Therefore, catch of robust redhorse young were combined with catch of notchlip redhorse to increase sample size. These congeners with similar spawning repertoire were assumed to respond similarly to environmental conditions. River discharge during spawning and rearing seasons may affect abundance of both redhorses in the lower Oconee River. An information-theoretic approach was used to evaluate the relative support of models relating abundance of age 0 redhorses to monthly discharge statistics that represented magnitude, timing, duration, variability and frequency of river discharge events for April through June 1995–2006. The best-approximating model indicated a negative relationship between the abundance of redhorses and mean maximum river discharge and the number of high pulses during June as well as a positive relationship with intermediate duration of low flows during April–June. This model is 9.6 times more plausible than the next best-fitting model, which revealed a negative relationship between the abundance of redhorses and mean maximum river discharge during May and the number of high pulses during June as well as a positive relationship between abundance and intermediate duration of low flows during April–June. Management implications from the results indicate low-stable flows for at least a 2-week period during spawning and rearing may increase reproductive success of robust and notchlip redhorses.

  9. Anyonic braiding in optical lattices

    PubMed Central

    Zhang, Chuanwei; Scarola, V. W.; Tewari, Sumanta; Das Sarma, S.

    2007-01-01

    Topological quantum states of matter, both Abelian and non-Abelian, are characterized by excitations whose wavefunctions undergo nontrivial statistical transformations as one excitation is moved (braided) around another. Topological quantum computation proposes to use the topological protection and the braiding statistics of a non-Abelian topological state to perform quantum computation. The enormous technological prospect of topological quantum computation provides new motivation for experimentally observing a topological state. Here, we explicitly work out a realistic experimental scheme to create and braid the Abelian topological excitations in the Kitaev model built on a tunable robust system, a cold atom optical lattice. We also demonstrate how to detect the key feature of these excitations: their braiding statistics. Observation of this statistics would directly establish the existence of anyons, quantum particles that are neither fermions nor bosons. In addition to establishing topological matter, the experimental scheme we develop here can also be adapted to a non-Abelian topological state, supported by the same Kitaev model but in a different parameter regime, to eventually build topologically protected quantum gates. PMID:18000038

  10. Estimating the mean and standard deviation of environmental data with below detection limit observations: Considering highly skewed data and model misspecification.

    PubMed

    Shoari, Niloofar; Dubé, Jean-Sébastien; Chenouri, Shoja'eddin

    2015-11-01

    In environmental studies, concentration measurements frequently fall below detection limits of measuring instruments, resulting in left-censored data. Some studies employ parametric methods such as the maximum likelihood estimator (MLE), robust regression on order statistic (rROS), and gamma regression on order statistic (GROS), while others suggest a non-parametric approach, the Kaplan-Meier method (KM). Using examples of real data from a soil characterization study in Montreal, we highlight the need for additional investigations that aim at unifying the existing literature. A number of studies have examined this issue; however, those considering data skewness and model misspecification are rare. These aspects are investigated in this paper through simulations. Among other findings, results show that for low skewed data, the performance of different statistical methods is comparable, regardless of the censoring percentage and sample size. For highly skewed data, the performance of the MLE method under lognormal and Weibull distributions is questionable; particularly, when the sample size is small or censoring percentage is high. In such conditions, MLE under gamma distribution, rROS, GROS, and KM are less sensitive to skewness. Related to model misspecification, MLE based on lognormal and Weibull distributions provides poor estimates when the true distribution of data is misspecified. However, the methods of rROS, GROS, and MLE under gamma distribution are generally robust to model misspecifications regardless of skewness, sample size, and censoring percentage. Since the characteristics of environmental data (e.g., type of distribution and skewness) are unknown a priori, we suggest using MLE based on gamma distribution, rROS and GROS. Copyright © 2015 Elsevier Ltd. All rights reserved.

  11. The Impact of US SO2 Emissions on Clouds and the Hydrological Cycle at Global and Regional Scales in Three Coupled Chemistry-Climate Models

    NASA Astrophysics Data System (ADS)

    Westervelt, D. M.

    2016-12-01

    It is widely expected that global and regional emissions of atmospheric aerosols and their precursors will decrease strongly throughout the remainder of the 21st century, due to emission reduction policies enacted to protect human health. Although there is some evidence that these aerosol reductions may lead to significant regional and global climate impacts, we currently lack a full understanding of the magnitude, spatial and temporal pattern, and statistical significance of these influences, especially for clouds and precipitation. Further, we often lack robust understanding of the processes by which regional aerosols influence local and remote climate. Here, we aim to quantify systematically the cloud and hydrological cycle response to regional changes in aerosols through model simulations using three fully coupled chemistry-climate models: NOAA Geophysical Fluid Dynamics Laboratory Coupled Model 3 (GFDL-CM3), NCAR Community Earth System Model (NCAR-CESM1), and NASA Goddard Institute for Space Studies ModelE2 (GISS-E2). The central approach we use is to contrast a long control experiment (400 years) with a collection of long individual perturbation experiments ( 200 years). We perturb emissions of sulfur dioxide (SO2; precursor to sulfate aerosol) in the United States and determine which responses are significant relative to internal variability and robust across the three models. Initial results show robust, statistically significant decreases in cloud droplet number and liquid water path in the source region across the three models due to decreases in sulfate aerosols. Setting SO2 emissions to zero over the U.S. causes both local and remote impacts in precipitation, with notable significant increases in Sahel and Arctic precipitation. In 13 of the 15 regions we analyze, the precipitation response to zero U.S. SO2 emissions agrees in sign, with agreement in magnitude to within one standard deviation in many of those regions. U.S. sulfate also impacts the timing of the arrival of the Sahel rainy season. Our approach enables us to develop a basis for understanding the response of regional emissions of aerosols and their precursors, and will be expanded to other regions and aerosol species in future work.

  12. Robust DEA under discrete uncertain data: a case study of Iranian electricity distribution companies

    NASA Astrophysics Data System (ADS)

    Hafezalkotob, Ashkan; Haji-Sami, Elham; Omrani, Hashem

    2015-06-01

    Crisp input and output data are fundamentally indispensable in traditional data envelopment analysis (DEA). However, the real-world problems often deal with imprecise or ambiguous data. In this paper, we propose a novel robust data envelopment model (RDEA) to investigate the efficiencies of decision-making units (DMU) when there are discrete uncertain input and output data. The method is based upon the discrete robust optimization approaches proposed by Mulvey et al. (1995) that utilizes probable scenarios to capture the effect of ambiguous data in the case study. Our primary concern in this research is evaluating electricity distribution companies under uncertainty about input/output data. To illustrate the ability of proposed model, a numerical example of 38 Iranian electricity distribution companies is investigated. There are a large amount ambiguous data about these companies. Some electricity distribution companies may not report clear and real statistics to the government. Thus, it is needed to utilize a prominent approach to deal with this uncertainty. The results reveal that the RDEA model is suitable and reliable for target setting based on decision makers (DM's) preferences when there are uncertain input/output data.

  13. A new method for detecting signal regions in ordered sequences of real numbers, and application to viral genomic data.

    PubMed

    Gog, Julia R; Lever, Andrew M L; Skittrall, Jordan P

    2018-01-01

    We present a fast, robust and parsimonious approach to detecting signals in an ordered sequence of numbers. Our motivation is in seeking a suitable method to take a sequence of scores corresponding to properties of positions in virus genomes, and find outlying regions of low scores. Suitable statistical methods without using complex models or making many assumptions are surprisingly lacking. We resolve this by developing a method that detects regions of low score within sequences of real numbers. The method makes no assumptions a priori about the length of such a region; it gives the explicit location of the region and scores it statistically. It does not use detailed mechanistic models so the method is fast and will be useful in a wide range of applications. We present our approach in detail, and test it on simulated sequences. We show that it is robust to a wide range of signal morphologies, and that it is able to capture multiple signals in the same sequence. Finally we apply it to viral genomic data to identify regions of evolutionary conservation within influenza and rotavirus.

  14. Competitive repetition suppression (CoRe) clustering: a biologically inspired learning model with application to robust clustering.

    PubMed

    Bacciu, Davide; Starita, Antonina

    2008-11-01

    Determining a compact neural coding for a set of input stimuli is an issue that encompasses several biological memory mechanisms as well as various artificial neural network models. In particular, establishing the optimal network structure is still an open problem when dealing with unsupervised learning models. In this paper, we introduce a novel learning algorithm, named competitive repetition-suppression (CoRe) learning, inspired by a cortical memory mechanism called repetition suppression (RS). We show how such a mechanism is used, at various levels of the cerebral cortex, to generate compact neural representations of the visual stimuli. From the general CoRe learning model, we derive a clustering algorithm, named CoRe clustering, that can automatically estimate the unknown cluster number from the data without using a priori information concerning the input distribution. We illustrate how CoRe clustering, besides its biological plausibility, posses strong theoretical properties in terms of robustness to noise and outliers, and we provide an error function describing CoRe learning dynamics. Such a description is used to analyze CoRe relationships with the state-of-the art clustering models and to highlight CoRe similitude with rival penalized competitive learning (RPCL), showing how CoRe extends such a model by strengthening the rival penalization estimation by means of loss functions from robust statistics.

  15. Statistical Method to Overcome Overfitting Issue in Rational Function Models

    NASA Astrophysics Data System (ADS)

    Alizadeh Moghaddam, S. H.; Mokhtarzade, M.; Alizadeh Naeini, A.; Alizadeh Moghaddam, S. A.

    2017-09-01

    Rational function models (RFMs) are known as one of the most appealing models which are extensively applied in geometric correction of satellite images and map production. Overfitting is a common issue, in the case of terrain dependent RFMs, that degrades the accuracy of RFMs-derived geospatial products. This issue, resulting from the high number of RFMs' parameters, leads to ill-posedness of the RFMs. To tackle this problem, in this study, a fast and robust statistical approach is proposed and compared to Tikhonov regularization (TR) method, as a frequently-used solution to RFMs' overfitting. In the proposed method, a statistical test, namely, significance test is applied to search for the RFMs' parameters that are resistant against overfitting issue. The performance of the proposed method was evaluated for two real data sets of Cartosat-1 satellite images. The obtained results demonstrate the efficiency of the proposed method in term of the achievable level of accuracy. This technique, indeed, shows an improvement of 50-80% over the TR.

  16. Pointwise probability reinforcements for robust statistical inference.

    PubMed

    Frénay, Benoît; Verleysen, Michel

    2014-02-01

    Statistical inference using machine learning techniques may be difficult with small datasets because of abnormally frequent data (AFDs). AFDs are observations that are much more frequent in the training sample that they should be, with respect to their theoretical probability, and include e.g. outliers. Estimates of parameters tend to be biased towards models which support such data. This paper proposes to introduce pointwise probability reinforcements (PPRs): the probability of each observation is reinforced by a PPR and a regularisation allows controlling the amount of reinforcement which compensates for AFDs. The proposed solution is very generic, since it can be used to robustify any statistical inference method which can be formulated as a likelihood maximisation. Experiments show that PPRs can be easily used to tackle regression, classification and projection: models are freed from the influence of outliers. Moreover, outliers can be filtered manually since an abnormality degree is obtained for each observation. Copyright © 2013 Elsevier Ltd. All rights reserved.

  17. Three-Dimensional Color Code Thresholds via Statistical-Mechanical Mapping.

    PubMed

    Kubica, Aleksander; Beverland, Michael E; Brandão, Fernando; Preskill, John; Svore, Krysta M

    2018-05-04

    Three-dimensional (3D) color codes have advantages for fault-tolerant quantum computing, such as protected quantum gates with relatively low overhead and robustness against imperfect measurement of error syndromes. Here we investigate the storage threshold error rates for bit-flip and phase-flip noise in the 3D color code (3DCC) on the body-centered cubic lattice, assuming perfect syndrome measurements. In particular, by exploiting a connection between error correction and statistical mechanics, we estimate the threshold for 1D stringlike and 2D sheetlike logical operators to be p_{3DCC}^{(1)}≃1.9% and p_{3DCC}^{(2)}≃27.6%. We obtain these results by using parallel tempering Monte Carlo simulations to study the disorder-temperature phase diagrams of two new 3D statistical-mechanical models: the four- and six-body random coupling Ising models.

  18. Nowcasting and Forecasting the Monthly Food Stamps Data in the US Using Online Search Data

    PubMed Central

    Fantazzini, Dean

    2014-01-01

    We propose the use of Google online search data for nowcasting and forecasting the number of food stamps recipients. We perform a large out-of-sample forecasting exercise with almost 3000 competing models with forecast horizons up to 2 years ahead, and we show that models including Google search data statistically outperform the competing models at all considered horizons. These results hold also with several robustness checks, considering alternative keywords, a falsification test, different out-of-samples, directional accuracy and forecasts at the state-level. PMID:25369315

  19. Identification of robust statistical downscaling methods based on a comprehensive suite of performance metrics for South Korea

    NASA Astrophysics Data System (ADS)

    Eum, H. I.; Cannon, A. J.

    2015-12-01

    Climate models are a key provider to investigate impacts of projected future climate conditions on regional hydrologic systems. However, there is a considerable mismatch of spatial resolution between GCMs and regional applications, in particular a region characterized by complex terrain such as Korean peninsula. Therefore, a downscaling procedure is an essential to assess regional impacts of climate change. Numerous statistical downscaling methods have been used mainly due to the computational efficiency and simplicity. In this study, four statistical downscaling methods [Bias-Correction/Spatial Disaggregation (BCSD), Bias-Correction/Constructed Analogue (BCCA), Multivariate Adaptive Constructed Analogs (MACA), and Bias-Correction/Climate Imprint (BCCI)] are applied to downscale the latest Climate Forecast System Reanalysis data to stations for precipitation, maximum temperature, and minimum temperature over South Korea. By split sampling scheme, all methods are calibrated with observational station data for 19 years from 1973 to 1991 are and tested for the recent 19 years from 1992 to 2010. To assess skill of the downscaling methods, we construct a comprehensive suite of performance metrics that measure an ability of reproducing temporal correlation, distribution, spatial correlation, and extreme events. In addition, we employ Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) to identify robust statistical downscaling methods based on the performance metrics for each season. The results show that downscaling skill is considerably affected by the skill of CFSR and all methods lead to large improvements in representing all performance metrics. According to seasonal performance metrics evaluated, when TOPSIS is applied, MACA is identified as the most reliable and robust method for all variables and seasons. Note that such result is derived from CFSR output which is recognized as near perfect climate data in climate studies. Therefore, the ranking of this study may be changed when various GCMs are downscaled and evaluated. Nevertheless, it may be informative for end-users (i.e. modelers or water resources managers) to understand and select more suitable downscaling methods corresponding to priorities on regional applications.

  20. Inverse Optimization: A New Perspective on the Black-Litterman Model.

    PubMed

    Bertsimas, Dimitris; Gupta, Vishal; Paschalidis, Ioannis Ch

    2012-12-11

    The Black-Litterman (BL) model is a widely used asset allocation model in the financial industry. In this paper, we provide a new perspective. The key insight is to replace the statistical framework in the original approach with ideas from inverse optimization. This insight allows us to significantly expand the scope and applicability of the BL model. We provide a richer formulation that, unlike the original model, is flexible enough to incorporate investor information on volatility and market dynamics. Equally importantly, our approach allows us to move beyond the traditional mean-variance paradigm of the original model and construct "BL"-type estimators for more general notions of risk such as coherent risk measures. Computationally, we introduce and study two new "BL"-type estimators and their corresponding portfolios: a Mean Variance Inverse Optimization (MV-IO) portfolio and a Robust Mean Variance Inverse Optimization (RMV-IO) portfolio. These two approaches are motivated by ideas from arbitrage pricing theory and volatility uncertainty. Using numerical simulation and historical backtesting, we show that both methods often demonstrate a better risk-reward tradeoff than their BL counterparts and are more robust to incorrect investor views.

  1. How robust are the estimated effects of air pollution on health? Accounting for model uncertainty using Bayesian model averaging.

    PubMed

    Pannullo, Francesca; Lee, Duncan; Waclawski, Eugene; Leyland, Alastair H

    2016-08-01

    The long-term impact of air pollution on human health can be estimated from small-area ecological studies in which the health outcome is regressed against air pollution concentrations and other covariates, such as socio-economic deprivation. Socio-economic deprivation is multi-factorial and difficult to measure, and includes aspects of income, education, and housing as well as others. However, these variables are potentially highly correlated, meaning one can either create an overall deprivation index, or use the individual characteristics, which can result in a variety of pollution-health effects. Other aspects of model choice may affect the pollution-health estimate, such as the estimation of pollution, and spatial autocorrelation model. Therefore, we propose a Bayesian model averaging approach to combine the results from multiple statistical models to produce a more robust representation of the overall pollution-health effect. We investigate the relationship between nitrogen dioxide concentrations and cardio-respiratory mortality in West Central Scotland between 2006 and 2012. Copyright © 2016 The Authors. Published by Elsevier Ltd.. All rights reserved.

  2. Analysis of the dependence of extreme rainfalls

    NASA Astrophysics Data System (ADS)

    Padoan, Simone; Ancey, Christophe; Parlange, Marc

    2010-05-01

    The aim of spatial analysis is to quantitatively describe the behavior of environmental phenomena such as precipitation levels, wind speed or daily temperatures. A number of generic approaches to spatial modeling have been developed[1], but these are not necessarily ideal for handling extremal aspects given their focus on mean process levels. The areal modelling of the extremes of a natural process observed at points in space is important in environmental statistics; for example, understanding extremal spatial rainfall is crucial in flood protection. In light of recent concerns over climate change, the use of robust mathematical and statistical methods for such analyses has grown in importance. Multivariate extreme value models and the class of maxstable processes [2] have a similar asymptotic motivation to the univariate Generalized Extreme Value (GEV) distribution , but providing a general approach to modeling extreme processes incorporating temporal or spatial dependence. Statistical methods for max-stable processes and data analyses of practical problems are discussed by [3] and [4]. This work illustrates methods to the statistical modelling of spatial extremes and gives examples of their use by means of a real extremal data analysis of Switzerland precipitation levels. [1] Cressie, N. A. C. (1993). Statistics for Spatial Data. Wiley, New York. [2] de Haan, L and Ferreria A. (2006). Extreme Value Theory An Introduction. Springer, USA. [3] Padoan, S. A., Ribatet, M and Sisson, S. A. (2009). Likelihood-Based Inference for Max-Stable Processes. Journal of the American Statistical Association, Theory & Methods. In press. [4] Davison, A. C. and Gholamrezaee, M. (2009), Geostatistics of extremes. Journal of the Royal Statistical Society, Series B. To appear.

  3. Joint Seasonal ARMA Approach for Modeling of Load Forecast Errors in Planning Studies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hafen, Ryan P.; Samaan, Nader A.; Makarov, Yuri V.

    2014-04-14

    To make informed and robust decisions in the probabilistic power system operation and planning process, it is critical to conduct multiple simulations of the generated combinations of wind and load parameters and their forecast errors to handle the variability and uncertainty of these time series. In order for the simulation results to be trustworthy, the simulated series must preserve the salient statistical characteristics of the real series. In this paper, we analyze day-ahead load forecast error data from multiple balancing authority locations and characterize statistical properties such as mean, standard deviation, autocorrelation, correlation between series, time-of-day bias, and time-of-day autocorrelation.more » We then construct and validate a seasonal autoregressive moving average (ARMA) model to model these characteristics, and use the model to jointly simulate day-ahead load forecast error series for all BAs.« less

  4. SU-F-T-187: Quantifying Normal Tissue Sparing with 4D Robust Optimization of Intensity Modulated Proton Therapy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Newpower, M; Ge, S; Mohan, R

    Purpose: To report an approach to quantify the normal tissue sparing for 4D robustly-optimized versus PTV-optimized IMPT plans. Methods: We generated two sets of 90 DVHs from a patient’s 10-phase 4D CT set; one by conventional PTV-based optimization done in the Eclipse treatment planning system, and the other by an in-house robust optimization algorithm. The 90 DVHs were created for the following scenarios in each of the ten phases of the 4DCT: ± 5mm shift along x, y, z; ± 3.5% range uncertainty and a nominal scenario. A Matlab function written by Gay and Niemierko was modified to calculate EUDmore » for each DVH for the following structures: esophagus, heart, ipsilateral lung and spinal cord. An F-test determined whether or not the variances of each structure’s DVHs were statistically different. Then a t-test determined if the average EUDs for each optimization algorithm were statistically significantly different. Results: T-test results showed each structure had a statistically significant difference in average EUD when comparing robust optimization versus PTV-based optimization. Under robust optimization all structures except the spinal cord received lower EUDs than PTV-based optimization. Using robust optimization the average EUDs decreased 1.45% for the esophagus, 1.54% for the heart and 5.45% for the ipsilateral lung. The average EUD to the spinal cord increased 24.86% but was still well below tolerance. Conclusion: This work has helped quantify a qualitative relationship noted earlier in our work: that robust optimization leads to plans with greater normal tissue sparing compared to PTV-based optimization. Except in the case of the spinal cord all structures received a lower EUD under robust optimization and these results are statistically significant. While the average EUD to the spinal cord increased to 25.06 Gy under robust optimization it is still well under the TD50 value of 66.5 Gy from Emami et al. Supported in part by the NCI U19 CA021239.« less

  5. Statistics-based model for prediction of chemical biosynthesis yield from Saccharomyces cerevisiae

    PubMed Central

    2011-01-01

    Background The robustness of Saccharomyces cerevisiae in facilitating industrial-scale production of ethanol extends its utilization as a platform to synthesize other metabolites. Metabolic engineering strategies, typically via pathway overexpression and deletion, continue to play a key role for optimizing the conversion efficiency of substrates into the desired products. However, chemical production titer or yield remains difficult to predict based on reaction stoichiometry and mass balance. We sampled a large space of data of chemical production from S. cerevisiae, and developed a statistics-based model to calculate production yield using input variables that represent the number of enzymatic steps in the key biosynthetic pathway of interest, metabolic modifications, cultivation modes, nutrition and oxygen availability. Results Based on the production data of about 40 chemicals produced from S. cerevisiae, metabolic engineering methods, nutrient supplementation, and fermentation conditions described therein, we generated mathematical models with numerical and categorical variables to predict production yield. Statistically, the models showed that: 1. Chemical production from central metabolic precursors decreased exponentially with increasing number of enzymatic steps for biosynthesis (>30% loss of yield per enzymatic step, P-value = 0); 2. Categorical variables of gene overexpression and knockout improved product yield by 2~4 folds (P-value < 0.1); 3. Addition of notable amount of intermediate precursors or nutrients improved product yield by over five folds (P-value < 0.05); 4. Performing the cultivation in a well-controlled bioreactor enhanced the yield of product by three folds (P-value < 0.05); 5. Contribution of oxygen to product yield was not statistically significant. Yield calculations for various chemicals using the linear model were in fairly good agreement with the experimental values. The model generally underestimated the ethanol production as compared to other chemicals, which supported the notion that the metabolism of Saccharomyces cerevisiae has historically evolved for robust alcohol fermentation. Conclusions We generated simple mathematical models for first-order approximation of chemical production yield from S. cerevisiae. These linear models provide empirical insights to the effects of strain engineering and cultivation conditions toward biosynthetic efficiency. These models may not only provide guidelines for metabolic engineers to synthesize desired products, but also be useful to compare the biosynthesis performance among different research papers. PMID:21689458

  6. Impact analysis of two kinds of failure strategies in Beijing road transportation network

    NASA Astrophysics Data System (ADS)

    Zhang, Zundong; Xu, Xiaoyang; Zhang, Zhaoran; Zhou, Huijuan

    The Beijing road transportation network (BRTN), as a large-scale technological network, exhibits very complex and complicate features during daily periods. And it has been widely highlighted that how statistical characteristics (i.e. average path length and global network efficiency) change while the network evolves. In this paper, by using different modeling concepts, three kinds of network models of BRTN namely the abstract network model, the static network model with road mileage as weights and the dynamic network model with travel time as weights — are constructed, respectively, according to the topological data and the real detected flow data. The degree distribution of the three kinds of network models are analyzed, which proves that the urban road infrastructure network and the dynamic network behavior like scale-free networks. By analyzing and comparing the important statistical characteristics of three models under random attacks and intentional attacks, it shows that the urban road infrastructure network and the dynamic network of BRTN are both robust and vulnerable.

  7. Systems Engineering Metrics: Organizational Complexity and Product Quality Modeling

    NASA Technical Reports Server (NTRS)

    Mog, Robert A.

    1997-01-01

    Innovative organizational complexity and product quality models applicable to performance metrics for NASA-MSFC's Systems Analysis and Integration Laboratory (SAIL) missions and objectives are presented. An intensive research effort focuses on the synergistic combination of stochastic process modeling, nodal and spatial decomposition techniques, organizational and computational complexity, systems science and metrics, chaos, and proprietary statistical tools for accelerated risk assessment. This is followed by the development of a preliminary model, which is uniquely applicable and robust for quantitative purposes. Exercise of the preliminary model using a generic system hierarchy and the AXAF-I architectural hierarchy is provided. The Kendall test for positive dependence provides an initial verification and validation of the model. Finally, the research and development of the innovation is revisited, prior to peer review. This research and development effort results in near-term, measurable SAIL organizational and product quality methodologies, enhanced organizational risk assessment and evolutionary modeling results, and 91 improved statistical quantification of SAIL productivity interests.

  8. [Application of Stata software to test heterogeneity in meta-analysis method].

    PubMed

    Wang, Dan; Mou, Zhen-yun; Zhai, Jun-xia; Zong, Hong-xia; Zhao, Xiao-dong

    2008-07-01

    To introduce the application of Stata software to heterogeneity test in meta-analysis. A data set was set up according to the example in the study, and the corresponding commands of the methods in Stata 9 software were applied to test the example. The methods used were Q-test and I2 statistic attached to the fixed effect model forest plot, H statistic and Galbraith plot. The existence of the heterogeneity among studies could be detected by Q-test and H statistic and the degree of the heterogeneity could be detected by I2 statistic. The outliers which were the sources of the heterogeneity could be spotted from the Galbraith plot. Heterogeneity test in meta-analysis can be completed by the four methods in Stata software simply and quickly. H and I2 statistics are more robust, and the outliers of the heterogeneity can be clearly seen in the Galbraith plot among the four methods.

  9. Robust Dynamic Multi-objective Vehicle Routing Optimization Method.

    PubMed

    Guo, Yi-Nan; Cheng, Jian; Luo, Sha; Gong, Dun-Wei

    2017-03-21

    For dynamic multi-objective vehicle routing problems, the waiting time of vehicle, the number of serving vehicles, the total distance of routes were normally considered as the optimization objectives. Except for above objectives, fuel consumption that leads to the environmental pollution and energy consumption was focused on in this paper. Considering the vehicles' load and the driving distance, corresponding carbon emission model was built and set as an optimization objective. Dynamic multi-objective vehicle routing problems with hard time windows and randomly appeared dynamic customers, subsequently, were modeled. In existing planning methods, when the new service demand came up, global vehicle routing optimization method was triggered to find the optimal routes for non-served customers, which was time-consuming. Therefore, robust dynamic multi-objective vehicle routing method with two-phase is proposed. Three highlights of the novel method are: (i) After finding optimal robust virtual routes for all customers by adopting multi-objective particle swarm optimization in the first phase, static vehicle routes for static customers are formed by removing all dynamic customers from robust virtual routes in next phase. (ii)The dynamically appeared customers append to be served according to their service time and the vehicles' statues. Global vehicle routing optimization is triggered only when no suitable locations can be found for dynamic customers. (iii)A metric measuring the algorithms' robustness is given. The statistical results indicated that the routes obtained by the proposed method have better stability and robustness, but may be sub-optimum. Moreover, time-consuming global vehicle routing optimization is avoided as dynamic customers appear.

  10. A robust dataset-agnostic heart disease classifier from Phonocardiogram.

    PubMed

    Banerjee, Rohan; Dutta Choudhury, Anirban; Deshpande, Parijat; Bhattacharya, Sakyajit; Pal, Arpan; Mandana, K M

    2017-07-01

    Automatic classification of normal and abnormal heart sounds is a popular area of research. However, building a robust algorithm unaffected by signal quality and patient demography is a challenge. In this paper we have analysed a wide list of Phonocardiogram (PCG) features in time and frequency domain along with morphological and statistical features to construct a robust and discriminative feature set for dataset-agnostic classification of normal and cardiac patients. The large and open access database, made available in Physionet 2016 challenge was used for feature selection, internal validation and creation of training models. A second dataset of 41 PCG segments, collected using our in-house smart phone based digital stethoscope from an Indian hospital was used for performance evaluation. Our proposed methodology yielded sensitivity and specificity scores of 0.76 and 0.75 respectively on the test dataset in classifying cardiovascular diseases. The methodology also outperformed three popular prior art approaches, when applied on the same dataset.

  11. A robust statistical estimation (RoSE) algorithm jointly recovers the 3D location and intensity of single molecules accurately and precisely

    NASA Astrophysics Data System (ADS)

    Mazidi, Hesam; Nehorai, Arye; Lew, Matthew D.

    2018-02-01

    In single-molecule (SM) super-resolution microscopy, the complexity of a biological structure, high molecular density, and a low signal-to-background ratio (SBR) may lead to imaging artifacts without a robust localization algorithm. Moreover, engineered point spread functions (PSFs) for 3D imaging pose difficulties due to their intricate features. We develop a Robust Statistical Estimation algorithm, called RoSE, that enables joint estimation of the 3D location and photon counts of SMs accurately and precisely using various PSFs under conditions of high molecular density and low SBR.

  12. Horsetail matching: a flexible approach to optimization under uncertainty

    NASA Astrophysics Data System (ADS)

    Cook, L. W.; Jarrett, J. P.

    2018-04-01

    It is important to design engineering systems to be robust with respect to uncertainties in the design process. Often, this is done by considering statistical moments, but over-reliance on statistical moments when formulating a robust optimization can produce designs that are stochastically dominated by other feasible designs. This article instead proposes a formulation for optimization under uncertainty that minimizes the difference between a design's cumulative distribution function and a target. A standard target is proposed that produces stochastically non-dominated designs, but the formulation also offers enough flexibility to recover existing approaches for robust optimization. A numerical implementation is developed that employs kernels to give a differentiable objective function. The method is applied to algebraic test problems and a robust transonic airfoil design problem where it is compared to multi-objective, weighted-sum and density matching approaches to robust optimization; several advantages over these existing methods are demonstrated.

  13. A Comprehensive review of group level model performance in the presence of heteroscedasticity: Can a single model control Type I errors in the presence of outliers?

    PubMed Central

    Mumford, Jeanette A.

    2017-01-01

    Even after thorough preprocessing and a careful time series analysis of functional magnetic resonance imaging (fMRI) data, artifact and other issues can lead to violations of the assumption that the variance is constant across subjects in the group level model. This is especially concerning when modeling a continuous covariate at the group level, as the slope is easily biased by outliers. Various models have been proposed to deal with outliers including models that use the first level variance or that use the group level residual magnitude to differentially weight subjects. The most typically used robust regression, implementing a robust estimator of the regression slope, has been previously studied in the context of fMRI studies and was found to perform well in some scenarios, but a loss of Type I error control can occur for some outlier settings. A second type of robust regression using a heteroscedastic autocorrelation consistent (HAC) estimator, which produces robust slope and variance estimates has been shown to perform well, with better Type I error control, but with large sample sizes (500–1000 subjects). The Type I error control with smaller sample sizes has not been studied in this model and has not been compared to other modeling approaches that handle outliers such as FSL’s Flame 1 and FSL’s outlier de-weighting. Focusing on group level inference with a continuous covariate over a range of sample sizes and degree of heteroscedasticity, which can be driven either by the within- or between-subject variability, both styles of robust regression are compared to ordinary least squares (OLS), FSL’s Flame 1, Flame 1 with outlier de-weighting algorithm and Kendall’s Tau. Additionally, subject omission using the Cook’s Distance measure with OLS and nonparametric inference with the OLS statistic are studied. Pros and cons of these models as well as general strategies for detecting outliers in data and taking precaution to avoid inflated Type I error rates are discussed. PMID:28030782

  14. A brief introduction to mixed effects modelling and multi-model inference in ecology

    PubMed Central

    Donaldson, Lynda; Correa-Cano, Maria Eugenia; Goodwin, Cecily E.D.

    2018-01-01

    The use of linear mixed effects models (LMMs) is increasingly common in the analysis of biological data. Whilst LMMs offer a flexible approach to modelling a broad range of data types, ecological data are often complex and require complex model structures, and the fitting and interpretation of such models is not always straightforward. The ability to achieve robust biological inference requires that practitioners know how and when to apply these tools. Here, we provide a general overview of current methods for the application of LMMs to biological data, and highlight the typical pitfalls that can be encountered in the statistical modelling process. We tackle several issues regarding methods of model selection, with particular reference to the use of information theory and multi-model inference in ecology. We offer practical solutions and direct the reader to key references that provide further technical detail for those seeking a deeper understanding. This overview should serve as a widely accessible code of best practice for applying LMMs to complex biological problems and model structures, and in doing so improve the robustness of conclusions drawn from studies investigating ecological and evolutionary questions. PMID:29844961

  15. A brief introduction to mixed effects modelling and multi-model inference in ecology.

    PubMed

    Harrison, Xavier A; Donaldson, Lynda; Correa-Cano, Maria Eugenia; Evans, Julian; Fisher, David N; Goodwin, Cecily E D; Robinson, Beth S; Hodgson, David J; Inger, Richard

    2018-01-01

    The use of linear mixed effects models (LMMs) is increasingly common in the analysis of biological data. Whilst LMMs offer a flexible approach to modelling a broad range of data types, ecological data are often complex and require complex model structures, and the fitting and interpretation of such models is not always straightforward. The ability to achieve robust biological inference requires that practitioners know how and when to apply these tools. Here, we provide a general overview of current methods for the application of LMMs to biological data, and highlight the typical pitfalls that can be encountered in the statistical modelling process. We tackle several issues regarding methods of model selection, with particular reference to the use of information theory and multi-model inference in ecology. We offer practical solutions and direct the reader to key references that provide further technical detail for those seeking a deeper understanding. This overview should serve as a widely accessible code of best practice for applying LMMs to complex biological problems and model structures, and in doing so improve the robustness of conclusions drawn from studies investigating ecological and evolutionary questions.

  16. A statistical parts-based appearance model of inter-subject variability.

    PubMed

    Toews, Matthew; Collins, D Louis; Arbel, Tal

    2006-01-01

    In this article, we present a general statistical parts-based model for representing the appearance of an image set, applied to the problem of inter-subject MR brain image matching. In contrast with global image representations such as active appearance models, the parts-based model consists of a collection of localized image parts whose appearance, geometry and occurrence frequency are quantified statistically. The parts-based approach explicitly addresses the case where one-to-one correspondence does not exist between subjects due to anatomical differences, as parts are not expected to occur in all subjects. The model can be learned automatically, discovering structures that appear with statistical regularity in a large set of subject images, and can be robustly fit to new images, all in the presence of significant inter-subject variability. As parts are derived from generic scale-invariant features, the framework can be applied in a wide variety of image contexts, in order to study the commonality of anatomical parts or to group subjects according to the parts they share. Experimentation shows that a parts-based model can be learned from a large set of MR brain images, and used to determine parts that are common within the group of subjects. Preliminary results indicate that the model can be used to automatically identify distinctive features for inter-subject image registration despite large changes in appearance.

  17. Statistical Models of Fracture Relevant to Nuclear-Grade Graphite: Review and Recommendations

    NASA Technical Reports Server (NTRS)

    Nemeth, Noel N.; Bratton, Robert L.

    2011-01-01

    The nuclear-grade (low-impurity) graphite needed for the fuel element and moderator material for next-generation (Gen IV) reactors displays large scatter in strength and a nonlinear stress-strain response from damage accumulation. This response can be characterized as quasi-brittle. In this expanded review, relevant statistical failure models for various brittle and quasi-brittle material systems are discussed with regard to strength distribution, size effect, multiaxial strength, and damage accumulation. This includes descriptions of the Weibull, Batdorf, and Burchell models as well as models that describe the strength response of composite materials, which involves distributed damage. Results from lattice simulations are included for a physics-based description of material breakdown. Consideration is given to the predicted transition between brittle and quasi-brittle damage behavior versus the density of damage (level of disorder) within the material system. The literature indicates that weakest-link-based failure modeling approaches appear to be reasonably robust in that they can be applied to materials that display distributed damage, provided that the level of disorder in the material is not too large. The Weibull distribution is argued to be the most appropriate statistical distribution to model the stochastic-strength response of graphite.

  18. Statistical modelling of networked human-automation performance using working memory capacity.

    PubMed

    Ahmed, Nisar; de Visser, Ewart; Shaw, Tyler; Mohamed-Ameen, Amira; Campbell, Mark; Parasuraman, Raja

    2014-01-01

    This study examines the challenging problem of modelling the interaction between individual attentional limitations and decision-making performance in networked human-automation system tasks. Analysis of real experimental data from a task involving networked supervision of multiple unmanned aerial vehicles by human participants shows that both task load and network message quality affect performance, but that these effects are modulated by individual differences in working memory (WM) capacity. These insights were used to assess three statistical approaches for modelling and making predictions with real experimental networked supervisory performance data: classical linear regression, non-parametric Gaussian processes and probabilistic Bayesian networks. It is shown that each of these approaches can help designers of networked human-automated systems cope with various uncertainties in order to accommodate future users by linking expected operating conditions and performance from real experimental data to observable cognitive traits like WM capacity. Practitioner Summary: Working memory (WM) capacity helps account for inter-individual variability in operator performance in networked unmanned aerial vehicle supervisory tasks. This is useful for reliable performance prediction near experimental conditions via linear models; robust statistical prediction beyond experimental conditions via Gaussian process models and probabilistic inference about unknown task conditions/WM capacities via Bayesian network models.

  19. Robust Lee local statistic filter for removal of mixed multiplicative and impulse noise

    NASA Astrophysics Data System (ADS)

    Ponomarenko, Nikolay N.; Lukin, Vladimir V.; Egiazarian, Karen O.; Astola, Jaakko T.

    2004-05-01

    A robust version of Lee local statistic filter able to effectively suppress the mixed multiplicative and impulse noise in images is proposed. The performance of the proposed modification is studied for a set of test images, several values of multiplicative noise variance, Gaussian and Rayleigh probability density functions of speckle, and different characteris-tics of impulse noise. The advantages of the designed filter in comparison to the conventional Lee local statistic filter and some other filters able to cope with mixed multiplicative+impulse noise are demonstrated.

  20. Selection vector filter framework

    NASA Astrophysics Data System (ADS)

    Lukac, Rastislav; Plataniotis, Konstantinos N.; Smolka, Bogdan; Venetsanopoulos, Anastasios N.

    2003-10-01

    We provide a unified framework of nonlinear vector techniques outputting the lowest ranked vector. The proposed framework constitutes a generalized filter class for multichannel signal processing. A new class of nonlinear selection filters are based on the robust order-statistic theory and the minimization of the weighted distance function to other input samples. The proposed method can be designed to perform a variety of filtering operations including previously developed filtering techniques such as vector median, basic vector directional filter, directional distance filter, weighted vector median filters and weighted directional filters. A wide range of filtering operations is guaranteed by the filter structure with two independent weight vectors for angular and distance domains of the vector space. In order to adapt the filter parameters to varying signal and noise statistics, we provide also the generalized optimization algorithms taking the advantage of the weighted median filters and the relationship between standard median filter and vector median filter. Thus, we can deal with both statistical and deterministic aspects of the filter design process. It will be shown that the proposed method holds the required properties such as the capability of modelling the underlying system in the application at hand, the robustness with respect to errors in the model of underlying system, the availability of the training procedure and finally, the simplicity of filter representation, analysis, design and implementation. Simulation studies also indicate that the new filters are computationally attractive and have excellent performance in environments corrupted by bit errors and impulsive noise.

  1. Origin of Pareto-like spatial distributions in ecosystems.

    PubMed

    Manor, Alon; Shnerb, Nadav M

    2008-12-31

    Recent studies of cluster distribution in various ecosystems revealed Pareto statistics for the size of spatial colonies. These results were supported by cellular automata simulations that yield robust criticality for endogenous pattern formation based on positive feedback. We show that this patch statistics is a manifestation of the law of proportionate effect. Mapping the stochastic model to a Markov birth-death process, the transition rates are shown to scale linearly with cluster size. This mapping provides a connection between patch statistics and the dynamics of the ecosystem; the "first passage time" for different colonies emerges as a powerful tool that discriminates between endogenous and exogenous clustering mechanisms. Imminent catastrophic shifts (such as desertification) manifest themselves in a drastic change of the stability properties of spatial colonies.

  2. Robust statistical reconstruction for charged particle tomography

    DOEpatents

    Schultz, Larry Joe; Klimenko, Alexei Vasilievich; Fraser, Andrew Mcleod; Morris, Christopher; Orum, John Christopher; Borozdin, Konstantin N; Sossong, Michael James; Hengartner, Nicolas W

    2013-10-08

    Systems and methods for charged particle detection including statistical reconstruction of object volume scattering density profiles from charged particle tomographic data to determine the probability distribution of charged particle scattering using a statistical multiple scattering model and determine a substantially maximum likelihood estimate of object volume scattering density using expectation maximization (ML/EM) algorithm to reconstruct the object volume scattering density. The presence of and/or type of object occupying the volume of interest can be identified from the reconstructed volume scattering density profile. The charged particle tomographic data can be cosmic ray muon tomographic data from a muon tracker for scanning packages, containers, vehicles or cargo. The method can be implemented using a computer program which is executable on a computer.

  3. A robust and efficient statistical method for genetic association studies using case and control samples from multiple cohorts

    PubMed Central

    2013-01-01

    Background The theoretical basis of genome-wide association studies (GWAS) is statistical inference of linkage disequilibrium (LD) between any polymorphic marker and a putative disease locus. Most methods widely implemented for such analyses are vulnerable to several key demographic factors and deliver a poor statistical power for detecting genuine associations and also a high false positive rate. Here, we present a likelihood-based statistical approach that accounts properly for non-random nature of case–control samples in regard of genotypic distribution at the loci in populations under study and confers flexibility to test for genetic association in presence of different confounding factors such as population structure, non-randomness of samples etc. Results We implemented this novel method together with several popular methods in the literature of GWAS, to re-analyze recently published Parkinson’s disease (PD) case–control samples. The real data analysis and computer simulation show that the new method confers not only significantly improved statistical power for detecting the associations but also robustness to the difficulties stemmed from non-randomly sampling and genetic structures when compared to its rivals. In particular, the new method detected 44 significant SNPs within 25 chromosomal regions of size < 1 Mb but only 6 SNPs in two of these regions were previously detected by the trend test based methods. It discovered two SNPs located 1.18 Mb and 0.18 Mb from the PD candidates, FGF20 and PARK8, without invoking false positive risk. Conclusions We developed a novel likelihood-based method which provides adequate estimation of LD and other population model parameters by using case and control samples, the ease in integration of these samples from multiple genetically divergent populations and thus confers statistically robust and powerful analyses of GWAS. On basis of simulation studies and analysis of real datasets, we demonstrated significant improvement of the new method over the non-parametric trend test, which is the most popularly implemented in the literature of GWAS. PMID:23394771

  4. Modeling the Risk of Radiation-Induced Acute Esophagitis for Combined Washington University and RTOG Trial 93-11 Lung Cancer Patients

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Huang, Ellen X.; Bradley, Jeffrey D.; El Naqa, Issam

    2012-04-01

    Purpose: To construct a maximally predictive model of the risk of severe acute esophagitis (AE) for patients who receive definitive radiation therapy (RT) for non-small-cell lung cancer. Methods and Materials: The dataset includes Washington University and RTOG 93-11 clinical trial data (events/patients: 120/374, WUSTL = 101/237, RTOG9311 = 19/137). Statistical model building was performed based on dosimetric and clinical parameters (patient age, sex, weight loss, pretreatment chemotherapy, concurrent chemotherapy, fraction size). A wide range of dose-volume parameters were extracted from dearchived treatment plans, including Dx, Vx, MOHx (mean of hottest x% volume), MOCx (mean of coldest x% volume), and gEUDmore » (generalized equivalent uniform dose) values. Results: The most significant single parameters for predicting acute esophagitis (RTOG Grade 2 or greater) were MOH85, mean esophagus dose (MED), and V30. A superior-inferior weighted dose-center position was derived but not found to be significant. Fraction size was found to be significant on univariate logistic analysis (Spearman R = 0.421, p < 0.00001) but not multivariate logistic modeling. Cross-validation model building was used to determine that an optimal model size needed only two parameters (MOH85 and concurrent chemotherapy, robustly selected on bootstrap model-rebuilding). Mean esophagus dose (MED) is preferred instead of MOH85, as it gives nearly the same statistical performance and is easier to compute. AE risk is given as a logistic function of (0.0688 Asterisk-Operator MED+1.50 Asterisk-Operator ConChemo-3.13), where MED is in Gy and ConChemo is either 1 (yes) if concurrent chemotherapy was given, or 0 (no). This model correlates to the observed risk of AE with a Spearman coefficient of 0.629 (p < 0.000001). Conclusions: Multivariate statistical model building with cross-validation suggests that a two-variable logistic model based on mean dose and the use of concurrent chemotherapy robustly predicts acute esophagitis risk in combined-data WUSTL and RTOG 93-11 trial datasets.« less

  5. Statistical colour models: an automated digital image analysis method for quantification of histological biomarkers.

    PubMed

    Shu, Jie; Dolman, G E; Duan, Jiang; Qiu, Guoping; Ilyas, Mohammad

    2016-04-27

    Colour is the most important feature used in quantitative immunohistochemistry (IHC) image analysis; IHC is used to provide information relating to aetiology and to confirm malignancy. Statistical modelling is a technique widely used for colour detection in computer vision. We have developed a statistical model of colour detection applicable to detection of stain colour in digital IHC images. Model was first trained by massive colour pixels collected semi-automatically. To speed up the training and detection processes, we removed luminance channel, Y channel of YCbCr colour space and chose 128 histogram bins which is the optimal number. A maximum likelihood classifier is used to classify pixels in digital slides into positively or negatively stained pixels automatically. The model-based tool was developed within ImageJ to quantify targets identified using IHC and histochemistry. The purpose of evaluation was to compare the computer model with human evaluation. Several large datasets were prepared and obtained from human oesophageal cancer, colon cancer and liver cirrhosis with different colour stains. Experimental results have demonstrated the model-based tool achieves more accurate results than colour deconvolution and CMYK model in the detection of brown colour, and is comparable to colour deconvolution in the detection of pink colour. We have also demostrated the proposed model has little inter-dataset variations. A robust and effective statistical model is introduced in this paper. The model-based interactive tool in ImageJ, which can create a visual representation of the statistical model and detect a specified colour automatically, is easy to use and available freely at http://rsb.info.nih.gov/ij/plugins/ihc-toolbox/index.html . Testing to the tool by different users showed only minor inter-observer variations in results.

  6. Defect detection of castings in radiography images using a robust statistical feature.

    PubMed

    Zhao, Xinyue; He, Zaixing; Zhang, Shuyou

    2014-01-01

    One of the most commonly used optical methods for defect detection is radiographic inspection. Compared with methods that extract defects directly from the radiography image, model-based methods deal with the case of an object with complex structure well. However, detection of small low-contrast defects in nonuniformly illuminated images is still a major challenge for them. In this paper, we present a new method based on the grayscale arranging pairs (GAP) feature to detect casting defects in radiography images automatically. First, a model is built using pixel pairs with a stable intensity relationship based on the GAP feature from previously acquired images. Second, defects can be extracted by comparing the difference of intensity-difference signs between the input image and the model statistically. The robustness of the proposed method to noise and illumination variations has been verified on casting radioscopic images with defects. The experimental results showed that the average computation time of the proposed method in the testing stage is 28 ms per image on a computer with a Pentium Core 2 Duo 3.00 GHz processor. For the comparison, we also evaluated the performance of the proposed method as well as that of the mixture-of-Gaussian-based and crossing line profile methods. The proposed method achieved 2.7% and 2.0% false negative rates in the noise and illumination variation experiments, respectively.

  7. Distinguishing transient signals and instrumental disturbances in semi-coherent searches for continuous gravitational waves with line-robust statistics

    NASA Astrophysics Data System (ADS)

    Keitel, David

    2016-05-01

    Non-axisymmetries in rotating neutron stars emit quasi-monochromatic gravitational waves. These long-duration ‘continuous wave’ signals are among the main search targets of ground-based interferometric detectors. However, standard detection methods are susceptible to false alarms from instrumental artefacts that resemble a continuous-wave signal. Past work [Keitel, Prix, Papa, Leaci and Siddiqi 2014, Phys. Rev. D 89 064023] showed that a Bayesian approach, based on an explicit model of persistent single-detector disturbances, improves robustness against such artefacts. Since many strong outliers in semi-coherent searches of LIGO data are caused by transient disturbances that last only a few hours or days, I describe in a recent paper [Keitel D 2015, LIGO-P1500159] how to extend this approach to cover transient disturbances, and demonstrate increased sensitivity in realistic simulated data. Additionally, neutron stars could emit transient signals which, for a limited time, also follow the continuous-wave signal model. As a pragmatic alternative to specialized transient searches, I demonstrate how to make standard semi-coherent continuous-wave searches more sensitive to transient signals. Focusing on the time-scale of a single segment in the semi-coherent search, Bayesian model selection yields a simple detection statistic without a significant increase in computational cost. This proceedings contribution gives a brief overview of both works.

  8. Sequential-Optimization-Based Framework for Robust Modeling and Design of Heterogeneous Catalytic Systems

    DOE PAGES

    Rangarajan, Srinivas; Maravelias, Christos T.; Mavrikakis, Manos

    2017-11-09

    Here, we present a general optimization-based framework for (i) ab initio and experimental data driven mechanistic modeling and (ii) optimal catalyst design of heterogeneous catalytic systems. Both cases are formulated as a nonlinear optimization problem that is subject to a mean-field microkinetic model and thermodynamic consistency requirements as constraints, for which we seek sparse solutions through a ridge (L 2 regularization) penalty. The solution procedure involves an iterative sequence of forward simulation of the differential algebraic equations pertaining to the microkinetic model using a numerical tool capable of handling stiff systems, sensitivity calculations using linear algebra, and gradient-based nonlinear optimization.more » A multistart approach is used to explore the solution space, and a hierarchical clustering procedure is implemented for statistically classifying potentially competing solutions. An example of methanol synthesis through hydrogenation of CO and CO 2 on a Cu-based catalyst is used to illustrate the framework. The framework is fast, is robust, and can be used to comprehensively explore the model solution and design space of any heterogeneous catalytic system.« less

  9. Sequential-Optimization-Based Framework for Robust Modeling and Design of Heterogeneous Catalytic Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rangarajan, Srinivas; Maravelias, Christos T.; Mavrikakis, Manos

    Here, we present a general optimization-based framework for (i) ab initio and experimental data driven mechanistic modeling and (ii) optimal catalyst design of heterogeneous catalytic systems. Both cases are formulated as a nonlinear optimization problem that is subject to a mean-field microkinetic model and thermodynamic consistency requirements as constraints, for which we seek sparse solutions through a ridge (L 2 regularization) penalty. The solution procedure involves an iterative sequence of forward simulation of the differential algebraic equations pertaining to the microkinetic model using a numerical tool capable of handling stiff systems, sensitivity calculations using linear algebra, and gradient-based nonlinear optimization.more » A multistart approach is used to explore the solution space, and a hierarchical clustering procedure is implemented for statistically classifying potentially competing solutions. An example of methanol synthesis through hydrogenation of CO and CO 2 on a Cu-based catalyst is used to illustrate the framework. The framework is fast, is robust, and can be used to comprehensively explore the model solution and design space of any heterogeneous catalytic system.« less

  10. Robust, Adaptive Functional Regression in Functional Mixed Model Framework.

    PubMed

    Zhu, Hongxiao; Brown, Philip J; Morris, Jeffrey S

    2011-09-01

    Functional data are increasingly encountered in scientific studies, and their high dimensionality and complexity lead to many analytical challenges. Various methods for functional data analysis have been developed, including functional response regression methods that involve regression of a functional response on univariate/multivariate predictors with nonparametrically represented functional coefficients. In existing methods, however, the functional regression can be sensitive to outlying curves and outlying regions of curves, so is not robust. In this paper, we introduce a new Bayesian method, robust functional mixed models (R-FMM), for performing robust functional regression within the general functional mixed model framework, which includes multiple continuous or categorical predictors and random effect functions accommodating potential between-function correlation induced by the experimental design. The underlying model involves a hierarchical scale mixture model for the fixed effects, random effect and residual error functions. These modeling assumptions across curves result in robust nonparametric estimators of the fixed and random effect functions which down-weight outlying curves and regions of curves, and produce statistics that can be used to flag global and local outliers. These assumptions also lead to distributions across wavelet coefficients that have outstanding sparsity and adaptive shrinkage properties, with great flexibility for the data to determine the sparsity and the heaviness of the tails. Together with the down-weighting of outliers, these within-curve properties lead to fixed and random effect function estimates that appear in our simulations to be remarkably adaptive in their ability to remove spurious features yet retain true features of the functions. We have developed general code to implement this fully Bayesian method that is automatic, requiring the user to only provide the functional data and design matrices. It is efficient enough to handle large data sets, and yields posterior samples of all model parameters that can be used to perform desired Bayesian estimation and inference. Although we present details for a specific implementation of the R-FMM using specific distributional choices in the hierarchical model, 1D functions, and wavelet transforms, the method can be applied more generally using other heavy-tailed distributions, higher dimensional functions (e.g. images), and using other invertible transformations as alternatives to wavelets.

  11. Robust, Adaptive Functional Regression in Functional Mixed Model Framework

    PubMed Central

    Zhu, Hongxiao; Brown, Philip J.; Morris, Jeffrey S.

    2012-01-01

    Functional data are increasingly encountered in scientific studies, and their high dimensionality and complexity lead to many analytical challenges. Various methods for functional data analysis have been developed, including functional response regression methods that involve regression of a functional response on univariate/multivariate predictors with nonparametrically represented functional coefficients. In existing methods, however, the functional regression can be sensitive to outlying curves and outlying regions of curves, so is not robust. In this paper, we introduce a new Bayesian method, robust functional mixed models (R-FMM), for performing robust functional regression within the general functional mixed model framework, which includes multiple continuous or categorical predictors and random effect functions accommodating potential between-function correlation induced by the experimental design. The underlying model involves a hierarchical scale mixture model for the fixed effects, random effect and residual error functions. These modeling assumptions across curves result in robust nonparametric estimators of the fixed and random effect functions which down-weight outlying curves and regions of curves, and produce statistics that can be used to flag global and local outliers. These assumptions also lead to distributions across wavelet coefficients that have outstanding sparsity and adaptive shrinkage properties, with great flexibility for the data to determine the sparsity and the heaviness of the tails. Together with the down-weighting of outliers, these within-curve properties lead to fixed and random effect function estimates that appear in our simulations to be remarkably adaptive in their ability to remove spurious features yet retain true features of the functions. We have developed general code to implement this fully Bayesian method that is automatic, requiring the user to only provide the functional data and design matrices. It is efficient enough to handle large data sets, and yields posterior samples of all model parameters that can be used to perform desired Bayesian estimation and inference. Although we present details for a specific implementation of the R-FMM using specific distributional choices in the hierarchical model, 1D functions, and wavelet transforms, the method can be applied more generally using other heavy-tailed distributions, higher dimensional functions (e.g. images), and using other invertible transformations as alternatives to wavelets. PMID:22308015

  12. An Intercompany Perspective on Biopharmaceutical Drug Product Robustness Studies.

    PubMed

    Morar-Mitrica, Sorina; Adams, Monica L; Crotts, George; Wurth, Christine; Ihnat, Peter M; Tabish, Tanvir; Antochshuk, Valentyn; DiLuzio, Willow; Dix, Daniel B; Fernandez, Jason E; Gupta, Kapil; Fleming, Michael S; He, Bing; Kranz, James K; Liu, Dingjiang; Narasimhan, Chakravarthy; Routhier, Eric; Taylor, Katherine D; Truong, Nobel; Stokes, Elaine S E

    2018-02-01

    The Biophorum Development Group (BPDG) is an industry-wide consortium enabling networking and sharing of best practices for the development of biopharmaceuticals. To gain a better understanding of current industry approaches for establishing biopharmaceutical drug product (DP) robustness, the BPDG-Formulation Point Share group conducted an intercompany collaboration exercise, which included a bench-marking survey and extensive group discussions around the scope, design, and execution of robustness studies. The results of this industry collaboration revealed several key common themes: (1) overall DP robustness is defined by both the formulation and the manufacturing process robustness; (2) robustness integrates the principles of quality by design (QbD); (3) DP robustness is an important factor in setting critical quality attribute control strategies and commercial specifications; (4) most companies employ robustness studies, along with prior knowledge, risk assessments, and statistics, to develop the DP design space; (5) studies are tailored to commercial development needs and the practices of each company. Three case studies further illustrate how a robustness study design for a biopharmaceutical DP balances experimental complexity, statistical power, scientific understanding, and risk assessment to provide the desired product and process knowledge. The BPDG-Formulation Point Share discusses identified industry challenges with regard to biopharmaceutical DP robustness and presents some recommendations for best practices. Copyright © 2018 American Pharmacists Association®. Published by Elsevier Inc. All rights reserved.

  13. Optimization Methods in Sherpa

    NASA Astrophysics Data System (ADS)

    Siemiginowska, Aneta; Nguyen, Dan T.; Doe, Stephen M.; Refsdal, Brian L.

    2009-09-01

    Forward fitting is a standard technique used to model X-ray data. A statistic, usually assumed weighted chi^2 or Poisson likelihood (e.g. Cash), is minimized in the fitting process to obtain a set of the best model parameters. Astronomical models often have complex forms with many parameters that can be correlated (e.g. an absorbed power law). Minimization is not trivial in such setting, as the statistical parameter space becomes multimodal and finding the global minimum is hard. Standard minimization algorithms can be found in many libraries of scientific functions, but they are usually focused on specific functions. However, Sherpa designed as general fitting and modeling application requires very robust optimization methods that can be applied to variety of astronomical data (X-ray spectra, images, timing, optical data etc.). We developed several optimization algorithms in Sherpa targeting a wide range of minimization problems. Two local minimization methods were built: Levenberg-Marquardt algorithm was obtained from MINPACK subroutine LMDIF and modified to achieve the required robustness; and Nelder-Mead simplex method has been implemented in-house based on variations of the algorithm described in the literature. A global search Monte-Carlo method has been implemented following a differential evolution algorithm presented by Storn and Price (1997). We will present the methods in Sherpa and discuss their usage cases. We will focus on the application to Chandra data showing both 1D and 2D examples. This work is supported by NASA contract NAS8-03060 (CXC).

  14. A general framework for parametric survival analysis.

    PubMed

    Crowther, Michael J; Lambert, Paul C

    2014-12-30

    Parametric survival models are being increasingly used as an alternative to the Cox model in biomedical research. Through direct modelling of the baseline hazard function, we can gain greater understanding of the risk profile of patients over time, obtaining absolute measures of risk. Commonly used parametric survival models, such as the Weibull, make restrictive assumptions of the baseline hazard function, such as monotonicity, which is often violated in clinical datasets. In this article, we extend the general framework of parametric survival models proposed by Crowther and Lambert (Journal of Statistical Software 53:12, 2013), to incorporate relative survival, and robust and cluster robust standard errors. We describe the general framework through three applications to clinical datasets, in particular, illustrating the use of restricted cubic splines, modelled on the log hazard scale, to provide a highly flexible survival modelling framework. Through the use of restricted cubic splines, we can derive the cumulative hazard function analytically beyond the boundary knots, resulting in a combined analytic/numerical approach, which substantially improves the estimation process compared with only using numerical integration. User-friendly Stata software is provided, which significantly extends parametric survival models available in standard software. Copyright © 2014 John Wiley & Sons, Ltd.

  15. Computational Software to Fit Seismic Data Using Epidemic-Type Aftershock Sequence Models and Modeling Performance Comparisons

    NASA Astrophysics Data System (ADS)

    Chu, A.

    2016-12-01

    Modern earthquake catalogs are often analyzed using spatial-temporal point process models such as the epidemic-type aftershock sequence (ETAS) models of Ogata (1998). My work implements three of the homogeneous ETAS models described in Ogata (1998). With a model's log-likelihood function, my software finds the Maximum-Likelihood Estimates (MLEs) of the model's parameters to estimate the homogeneous background rate and the temporal and spatial parameters that govern triggering effects. EM-algorithm is employed for its advantages of stability and robustness (Veen and Schoenberg, 2008). My work also presents comparisons among the three models in robustness, convergence speed, and implementations from theory to computing practice. Up-to-date regional seismic data of seismic active areas such as Southern California and Japan are used to demonstrate the comparisons. Data analysis has been done using computer languages Java and R. Java has the advantages of being strong-typed and easiness of controlling memory resources, while R has the advantages of having numerous available functions in statistical computing. Comparisons are also made between the two programming languages in convergence and stability, computational speed, and easiness of implementation. Issues that may affect convergence such as spatial shapes are discussed.

  16. Statistics based sampling for controller and estimator design

    NASA Astrophysics Data System (ADS)

    Tenne, Dirk

    The purpose of this research is the development of statistical design tools for robust feed-forward/feedback controllers and nonlinear estimators. This dissertation is threefold and addresses the aforementioned topics nonlinear estimation, target tracking and robust control. To develop statistically robust controllers and nonlinear estimation algorithms, research has been performed to extend existing techniques, which propagate the statistics of the state, to achieve higher order accuracy. The so-called unscented transformation has been extended to capture higher order moments. Furthermore, higher order moment update algorithms based on a truncated power series have been developed. The proposed techniques are tested on various benchmark examples. Furthermore, the unscented transformation has been utilized to develop a three dimensional geometrically constrained target tracker. The proposed planar circular prediction algorithm has been developed in a local coordinate framework, which is amenable to extension of the tracking algorithm to three dimensional space. This tracker combines the predictions of a circular prediction algorithm and a constant velocity filter by utilizing the Covariance Intersection. This combined prediction can be updated with the subsequent measurement using a linear estimator. The proposed technique is illustrated on a 3D benchmark trajectory, which includes coordinated turns and straight line maneuvers. The third part of this dissertation addresses the design of controller which include knowledge of parametric uncertainties and their distributions. The parameter distributions are approximated by a finite set of points which are calculated by the unscented transformation. This set of points is used to design robust controllers which minimize a statistical performance of the plant over the domain of uncertainty consisting of a combination of the mean and variance. The proposed technique is illustrated on three benchmark problems. The first relates to the design of prefilters for a linear and nonlinear spring-mass-dashpot system and the second applies a feedback controller to a hovering helicopter. Lastly, the statistical robust controller design is devoted to a concurrent feed-forward/feedback controller structure for a high-speed low tension tape drive.

  17. Mixed-order phase transition in a two-step contagion model with a single infectious seed.

    PubMed

    Choi, Wonjun; Lee, Deokjae; Kahng, B

    2017-02-01

    Percolation is known as one of the most robust continuous transitions, because its occupation rule is intrinsically local. As one of the ways to break the robustness, occupation is allowed to more than one species of particles and they occupy cooperatively. This generalized percolation model undergoes a discontinuous transition. Here we investigate an epidemic model with two contagion steps and characterize its phase transition analytically and numerically. We find that even though the order parameter jumps at a transition point r_{c}, then increases continuously, it does not exhibit any critical behavior: the fluctuations of the order parameter do not diverge at r_{c}. However, critical behavior appears in mean outbreak size, which diverges at the transition point in a manner that the ordinary percolation shows. Such a type of phase transition is regarded as a mixed-order phase transition. We also obtain scaling relations of cascade outbreak statistics when the order parameter jumps at r_{c}.

  18. Multi-stage learning for robust lung segmentation in challenging CT volumes.

    PubMed

    Sofka, Michal; Wetzl, Jens; Birkbeck, Neil; Zhang, Jingdan; Kohlberger, Timo; Kaftan, Jens; Declerck, Jérôme; Zhou, S Kevin

    2011-01-01

    Simple algorithms for segmenting healthy lung parenchyma in CT are unable to deal with high density tissue common in pulmonary diseases. To overcome this problem, we propose a multi-stage learning-based approach that combines anatomical information to predict an initialization of a statistical shape model of the lungs. The initialization first detects the carina of the trachea, and uses this to detect a set of automatically selected stable landmarks on regions near the lung (e.g., ribs, spine). These landmarks are used to align the shape model, which is then refined through boundary detection to obtain fine-grained segmentation. Robustness is obtained through hierarchical use of discriminative classifiers that are trained on a range of manually annotated data of diseased and healthy lungs. We demonstrate fast detection (35s per volume on average) and segmentation of 2 mm accuracy on challenging data.

  19. Encoding Dissimilarity Data for Statistical Model Building.

    PubMed

    Wahba, Grace

    2010-12-01

    We summarize, review and comment upon three papers which discuss the use of discrete, noisy, incomplete, scattered pairwise dissimilarity data in statistical model building. Convex cone optimization codes are used to embed the objects into a Euclidean space which respects the dissimilarity information while controlling the dimension of the space. A "newbie" algorithm is provided for embedding new objects into this space. This allows the dissimilarity information to be incorporated into a Smoothing Spline ANOVA penalized likelihood model, a Support Vector Machine, or any model that will admit Reproducing Kernel Hilbert Space components, for nonparametric regression, supervised learning, or semi-supervised learning. Future work and open questions are discussed. The papers are: F. Lu, S. Keles, S. Wright and G. Wahba 2005. A framework for kernel regularization with application to protein clustering. Proceedings of the National Academy of Sciences 102, 12332-1233.G. Corrada Bravo, G. Wahba, K. Lee, B. Klein, R. Klein and S. Iyengar 2009. Examining the relative influence of familial, genetic and environmental covariate information in flexible risk models. Proceedings of the National Academy of Sciences 106, 8128-8133F. Lu, Y. Lin and G. Wahba. Robust manifold unfolding with kernel regularization. TR 1008, Department of Statistics, University of Wisconsin-Madison.

  20. Defining surfaces for skewed, highly variable data

    USGS Publications Warehouse

    Helsel, D.R.; Ryker, S.J.

    2002-01-01

    Skewness of environmental data is often caused by more than simply a handful of outliers in an otherwise normal distribution. Statistical procedures for such datasets must be sufficiently robust to deal with distributions that are strongly non-normal, containing both a large proportion of outliers and a skewed main body of data. In the field of water quality, skewness is commonly associated with large variation over short distances. Spatial analysis of such data generally requires either considerable effort at modeling or the use of robust procedures not strongly affected by skewness and local variability. Using a skewed dataset of 675 nitrate measurements in ground water, commonly used methods for defining a surface (least-squares regression and kriging) are compared to a more robust method (loess). Three choices are critical in defining a surface: (i) is the surface to be a central mean or median surface? (ii) is either a well-fitting transformation or a robust and scale-independent measure of center used? (iii) does local spatial autocorrelation assist in or detract from addressing objectives? Published in 2002 by John Wiley & Sons, Ltd.

  1. Evaluating performances of simplified physically based landslide susceptibility models.

    NASA Astrophysics Data System (ADS)

    Capparelli, Giovanna; Formetta, Giuseppe; Versace, Pasquale

    2015-04-01

    Rainfall induced shallow landslides cause significant damages involving loss of life and properties. Prediction of shallow landslides susceptible locations is a complex task that involves many disciplines: hydrology, geotechnical science, geomorphology, and statistics. Usually to accomplish this task two main approaches are used: statistical or physically based model. This paper presents a package of GIS based models for landslide susceptibility analysis. It was integrated in the NewAge-JGrass hydrological model using the Object Modeling System (OMS) modeling framework. The package includes three simplified physically based models for landslides susceptibility analysis (M1, M2, and M3) and a component for models verifications. It computes eight goodness of fit indices (GOF) by comparing pixel-by-pixel model results and measurements data. Moreover, the package integration in NewAge-JGrass allows the use of other components such as geographic information system tools to manage inputs-output processes, and automatic calibration algorithms to estimate model parameters. The system offers the possibility to investigate and fairly compare the quality and the robustness of models and models parameters, according a procedure that includes: i) model parameters estimation by optimizing each of the GOF index separately, ii) models evaluation in the ROC plane by using each of the optimal parameter set, and iii) GOF robustness evaluation by assessing their sensitivity to the input parameter variation. This procedure was repeated for all three models. The system was applied for a case study in Calabria (Italy) along the Salerno-Reggio Calabria highway, between Cosenza and Altilia municipality. The analysis provided that among all the optimized indices and all the three models, Average Index (AI) optimization coupled with model M3 is the best modeling solution for our test case. This research was funded by PON Project No. 01_01503 "Integrated Systems for Hydrogeological Risk Monitoring, Early Warning and Mitigation Along the Main Lifelines", CUP B31H11000370005, in the framework of the National Operational Program for "Research and Competitiveness" 2007-2013.

  2. A novel latent gaussian copula framework for modeling spatial correlation in quantized SAR imagery with applications to ATR

    NASA Astrophysics Data System (ADS)

    Thelen, Brian T.; Xique, Ismael J.; Burns, Joseph W.; Goley, G. Steven; Nolan, Adam R.; Benson, Jonathan W.

    2017-04-01

    With all of the new remote sensing modalities available, and with ever increasing capabilities and frequency of collection, there is a desire to fundamentally understand/quantify the information content in the collected image data relative to various exploitation goals, such as detection/classification. A fundamental approach for this is the framework of Bayesian decision theory, but a daunting challenge is to have significantly flexible and accurate multivariate models for the features and/or pixels that capture a wide assortment of distributions and dependen- cies. In addition, data can come in the form of both continuous and discrete representations, where the latter is often generated based on considerations of robustness to imaging conditions and occlusions/degradations. In this paper we propose a novel suite of "latent" models fundamentally based on multivariate Gaussian copula models that can be used for quantized data from SAR imagery. For this Latent Gaussian Copula (LGC) model, we derive an approximate, maximum-likelihood estimation algorithm and demonstrate very reasonable estimation performance even for the larger images with many pixels. However applying these LGC models to large dimen- sions/images within a Bayesian decision/classification theory is infeasible due to the computational/numerical issues in evaluating the true full likelihood, and we propose an alternative class of novel pseudo-likelihoood detection statistics that are computationally feasible. We show in a few simple examples that these statistics have the potential to provide very good and robust detection/classification performance. All of this framework is demonstrated on a simulated SLICY data set, and the results show the importance of modeling the dependencies, and of utilizing the pseudo-likelihood methods.

  3. A Statistical Model for Multilingual Entity Detection and Tracking

    DTIC Science & Technology

    2004-01-01

    tomatic Content Extraction ( ACE ) evaluation achieved top-tier results in all three evaluation languages. 1 Introduction Detecting entities, whether named...of com- bining the detected mentions into groups of references to the same object. The work presented here is motivated by the ACE eval- uation...Entropy (MaxEnt henceforth) (Berger et al., 1996) and Robust Risk Minimization (RRM henceforth) 1For a description of the ACE program see http

  4. VHSIC/VHSIC-Like Reliability Prediction Modeling

    DTIC Science & Technology

    1989-10-01

    prediction would require ’ kowledge of event statistics as well as device robustness. Ii1 Additionally, although this is primarily a theoretical, bottom...Degradation in Section 5.3 P = Power PDIP = Plastic DIP P(f) = Probability of Failure due to EOS or ESD P(flc) = Probability of Failure given Contact from an...the results of those stresses: Device Stress Part Number Power Dissipation Manufacturer Test Type Part Description Junction Teniperatune Package Type

  5. Method for factor analysis of GC/MS data

    DOEpatents

    Van Benthem, Mark H; Kotula, Paul G; Keenan, Michael R

    2012-09-11

    The method of the present invention provides a fast, robust, and automated multivariate statistical analysis of gas chromatography/mass spectroscopy (GC/MS) data sets. The method can involve systematic elimination of undesired, saturated peak masses to yield data that follow a linear, additive model. The cleaned data can then be subjected to a combination of PCA and orthogonal factor rotation followed by refinement with MCR-ALS to yield highly interpretable results.

  6. Neural network uncertainty assessment using Bayesian statistics: a remote sensing application

    NASA Technical Reports Server (NTRS)

    Aires, F.; Prigent, C.; Rossow, W. B.

    2004-01-01

    Neural network (NN) techniques have proved successful for many regression problems, in particular for remote sensing; however, uncertainty estimates are rarely provided. In this article, a Bayesian technique to evaluate uncertainties of the NN parameters (i.e., synaptic weights) is first presented. In contrast to more traditional approaches based on point estimation of the NN weights, we assess uncertainties on such estimates to monitor the robustness of the NN model. These theoretical developments are illustrated by applying them to the problem of retrieving surface skin temperature, microwave surface emissivities, and integrated water vapor content from a combined analysis of satellite microwave and infrared observations over land. The weight uncertainty estimates are then used to compute analytically the uncertainties in the network outputs (i.e., error bars and correlation structure of these errors). Such quantities are very important for evaluating any application of an NN model. The uncertainties on the NN Jacobians are then considered in the third part of this article. Used for regression fitting, NN models can be used effectively to represent highly nonlinear, multivariate functions. In this situation, most emphasis is put on estimating the output errors, but almost no attention has been given to errors associated with the internal structure of the regression model. The complex structure of dependency inside the NN is the essence of the model, and assessing its quality, coherency, and physical character makes all the difference between a blackbox model with small output errors and a reliable, robust, and physically coherent model. Such dependency structures are described to the first order by the NN Jacobians: they indicate the sensitivity of one output with respect to the inputs of the model for given input data. We use a Monte Carlo integration procedure to estimate the robustness of the NN Jacobians. A regularization strategy based on principal component analysis is proposed to suppress the multicollinearities in order to make these Jacobians robust and physically meaningful.

  7. Robust small area estimation of poverty indicators using M-quantile approach (Case study: Sub-district level in Bogor district)

    NASA Astrophysics Data System (ADS)

    Girinoto, Sadik, Kusman; Indahwati

    2017-03-01

    The National Socio-Economic Survey samples are designed to produce estimates of parameters of planned domains (provinces and districts). The estimation of unplanned domains (sub-districts and villages) has its limitation to obtain reliable direct estimates. One of the possible solutions to overcome this problem is employing small area estimation techniques. The popular choice of small area estimation is based on linear mixed models. However, such models need strong distributional assumptions and do not easy allow for outlier-robust estimation. As an alternative approach for this purpose, M-quantile regression approach to small area estimation based on modeling specific M-quantile coefficients of conditional distribution of study variable given auxiliary covariates. It obtained outlier-robust estimation from influence function of M-estimator type and also no need strong distributional assumptions. In this paper, the aim of study is to estimate the poverty indicator at sub-district level in Bogor District-West Java using M-quantile models for small area estimation. Using data taken from National Socioeconomic Survey and Villages Potential Statistics, the results provide a detailed description of pattern of incidence and intensity of poverty within Bogor district. We also compare the results with direct estimates. The results showed the framework may be preferable when direct estimate having no incidence of poverty at all in the small area.

  8. Distinguishing synchronous and time-varying synergies using point process interval statistics: motor primitives in frog and rat

    PubMed Central

    Hart, Corey B.; Giszter, Simon F.

    2013-01-01

    We present and apply a method that uses point process statistics to discriminate the forms of synergies in motor pattern data, prior to explicit synergy extraction. The method uses electromyogram (EMG) pulse peak timing or onset timing. Peak timing is preferable in complex patterns where pulse onsets may be overlapping. An interval statistic derived from the point processes of EMG peak timings distinguishes time-varying synergies from synchronous synergies (SS). Model data shows that the statistic is robust for most conditions. Its application to both frog hindlimb EMG and rat locomotion hindlimb EMG show data from these preparations is clearly most consistent with synchronous synergy models (p < 0.001). Additional direct tests of pulse and interval relations in frog data further bolster the support for synchronous synergy mechanisms in these data. Our method and analyses support separated control of rhythm and pattern of motor primitives, with the low level execution primitives comprising pulsed SS in both frog and rat, and both episodic and rhythmic behaviors. PMID:23675341

  9. An Intelligent Model for Pairs Trading Using Genetic Algorithms.

    PubMed

    Huang, Chien-Feng; Hsu, Chi-Jen; Chen, Chi-Chung; Chang, Bao Rong; Li, Chen-An

    2015-01-01

    Pairs trading is an important and challenging research area in computational finance, in which pairs of stocks are bought and sold in pair combinations for arbitrage opportunities. Traditional methods that solve this set of problems mostly rely on statistical methods such as regression. In contrast to the statistical approaches, recent advances in computational intelligence (CI) are leading to promising opportunities for solving problems in the financial applications more effectively. In this paper, we present a novel methodology for pairs trading using genetic algorithms (GA). Our results showed that the GA-based models are able to significantly outperform the benchmark and our proposed method is capable of generating robust models to tackle the dynamic characteristics in the financial application studied. Based upon the promising results obtained, we expect this GA-based method to advance the research in computational intelligence for finance and provide an effective solution to pairs trading for investment in practice.

  10. An Intelligent Model for Pairs Trading Using Genetic Algorithms

    PubMed Central

    Hsu, Chi-Jen; Chen, Chi-Chung; Li, Chen-An

    2015-01-01

    Pairs trading is an important and challenging research area in computational finance, in which pairs of stocks are bought and sold in pair combinations for arbitrage opportunities. Traditional methods that solve this set of problems mostly rely on statistical methods such as regression. In contrast to the statistical approaches, recent advances in computational intelligence (CI) are leading to promising opportunities for solving problems in the financial applications more effectively. In this paper, we present a novel methodology for pairs trading using genetic algorithms (GA). Our results showed that the GA-based models are able to significantly outperform the benchmark and our proposed method is capable of generating robust models to tackle the dynamic characteristics in the financial application studied. Based upon the promising results obtained, we expect this GA-based method to advance the research in computational intelligence for finance and provide an effective solution to pairs trading for investment in practice. PMID:26339236

  11. Sampling methods to the statistical control of the production of blood components.

    PubMed

    Pereira, Paulo; Seghatchian, Jerard; Caldeira, Beatriz; Santos, Paula; Castro, Rosa; Fernandes, Teresa; Xavier, Sandra; de Sousa, Gracinda; de Almeida E Sousa, João Paulo

    2017-12-01

    The control of blood components specifications is a requirement generalized in Europe by the European Commission Directives and in the US by the AABB standards. The use of a statistical process control methodology is recommended in the related literature, including the EDQM guideline. The control reliability is dependent of the sampling. However, a correct sampling methodology seems not to be systematically applied. Commonly, the sampling is intended to comply uniquely with the 1% specification to the produced blood components. Nevertheless, on a purely statistical viewpoint, this model could be argued not to be related to a consistent sampling technique. This could be a severe limitation to detect abnormal patterns and to assure that the production has a non-significant probability of producing nonconforming components. This article discusses what is happening in blood establishments. Three statistical methodologies are proposed: simple random sampling, sampling based on the proportion of a finite population, and sampling based on the inspection level. The empirical results demonstrate that these models are practicable in blood establishments contributing to the robustness of sampling and related statistical process control decisions for the purpose they are suggested for. Copyright © 2017 Elsevier Ltd. All rights reserved.

  12. A system for learning statistical motion patterns.

    PubMed

    Hu, Weiming; Xiao, Xuejuan; Fu, Zhouyu; Xie, Dan; Tan, Tieniu; Maybank, Steve

    2006-09-01

    Analysis of motion patterns is an effective approach for anomaly detection and behavior prediction. Current approaches for the analysis of motion patterns depend on known scenes, where objects move in predefined ways. It is highly desirable to automatically construct object motion patterns which reflect the knowledge of the scene. In this paper, we present a system for automatically learning motion patterns for anomaly detection and behavior prediction based on a proposed algorithm for robustly tracking multiple objects. In the tracking algorithm, foreground pixels are clustered using a fast accurate fuzzy K-means algorithm. Growing and prediction of the cluster centroids of foreground pixels ensure that each cluster centroid is associated with a moving object in the scene. In the algorithm for learning motion patterns, trajectories are clustered hierarchically using spatial and temporal information and then each motion pattern is represented with a chain of Gaussian distributions. Based on the learned statistical motion patterns, statistical methods are used to detect anomalies and predict behaviors. Our system is tested using image sequences acquired, respectively, from a crowded real traffic scene and a model traffic scene. Experimental results show the robustness of the tracking algorithm, the efficiency of the algorithm for learning motion patterns, and the encouraging performance of algorithms for anomaly detection and behavior prediction.

  13. Detecting associated single-nucleotide polymorphisms on the X chromosome in case control genome-wide association studies.

    PubMed

    Chen, Zhongxue; Ng, Hon Keung Tony; Li, Jing; Liu, Qingzhong; Huang, Hanwen

    2017-04-01

    In the past decade, hundreds of genome-wide association studies have been conducted to detect the significant single-nucleotide polymorphisms that are associated with certain diseases. However, most of the data from the X chromosome were not analyzed and only a few significant associated single-nucleotide polymorphisms from the X chromosome have been identified from genome-wide association studies. This is mainly due to the lack of powerful statistical tests. In this paper, we propose a novel statistical approach that combines the information of single-nucleotide polymorphisms on the X chromosome from both males and females in an efficient way. The proposed approach avoids the need of making strong assumptions about the underlying genetic models. Our proposed statistical test is a robust method that only makes the assumption that the risk allele is the same for both females and males if the single-nucleotide polymorphism is associated with the disease for both genders. Through simulation study and a real data application, we show that the proposed procedure is robust and have excellent performance compared to existing methods. We expect that many more associated single-nucleotide polymorphisms on the X chromosome will be identified if the proposed approach is applied to current available genome-wide association studies data.

  14. Robustness of S1 statistic with Hodges-Lehmann for skewed distributions

    NASA Astrophysics Data System (ADS)

    Ahad, Nor Aishah; Yahaya, Sharipah Soaad Syed; Yin, Lee Ping

    2016-10-01

    Analysis of variance (ANOVA) is a common use parametric method to test the differences in means for more than two groups when the populations are normally distributed. ANOVA is highly inefficient under the influence of non- normal and heteroscedastic settings. When the assumptions are violated, researchers are looking for alternative such as Kruskal-Wallis under nonparametric or robust method. This study focused on flexible method, S1 statistic for comparing groups using median as the location estimator. S1 statistic was modified by substituting the median with Hodges-Lehmann and the default scale estimator with the variance of Hodges-Lehmann and MADn to produce two different test statistics for comparing groups. Bootstrap method was used for testing the hypotheses since the sampling distributions of these modified S1 statistics are unknown. The performance of the proposed statistic in terms of Type I error was measured and compared against the original S1 statistic, ANOVA and Kruskal-Wallis. The propose procedures show improvement compared to the original statistic especially under extremely skewed distribution.

  15. Disconcordance in Statistical Models of Bisphenol A and Chronic Disease Outcomes in NHANES 2003-08

    PubMed Central

    Casey, Martin F.; Neidell, Matthew

    2013-01-01

    Background Bisphenol A (BPA), a high production chemical commonly found in plastics, has drawn great attention from researchers due to the substance’s potential toxicity. Using data from three National Health and Nutrition Examination Survey (NHANES) cycles, we explored the consistency and robustness of BPA’s reported effects on coronary heart disease and diabetes. Methods And Findings We report the use of three different statistical models in the analysis of BPA: (1) logistic regression, (2) log-linear regression, and (3) dose-response logistic regression. In each variation, confounders were added in six blocks to account for demographics, urinary creatinine, source of BPA exposure, healthy behaviours, and phthalate exposure. Results were sensitive to the variations in functional form of our statistical models, but no single model yielded consistent results across NHANES cycles. Reported ORs were also found to be sensitive to inclusion/exclusion criteria. Further, observed effects, which were most pronounced in NHANES 2003-04, could not be explained away by confounding. Conclusions Limitations in the NHANES data and a poor understanding of the mode of action of BPA have made it difficult to develop informative statistical models. Given the sensitivity of effect estimates to functional form, researchers should report results using multiple specifications with different assumptions about BPA measurement, thus allowing for the identification of potential discrepancies in the data. PMID:24223205

  16. An entropy-based nonparametric test for the validation of surrogate endpoints.

    PubMed

    Miao, Xiaopeng; Wang, Yong-Cheng; Gangopadhyay, Ashis

    2012-06-30

    We present a nonparametric test to validate surrogate endpoints based on measure of divergence and random permutation. This test is a proposal to directly verify the Prentice statistical definition of surrogacy. The test does not impose distributional assumptions on the endpoints, and it is robust to model misspecification. Our simulation study shows that the proposed nonparametric test outperforms the practical test of the Prentice criterion in terms of both robustness of size and power. We also evaluate the performance of three leading methods that attempt to quantify the effect of surrogate endpoints. The proposed method is applied to validate magnetic resonance imaging lesions as the surrogate endpoint for clinical relapses in a multiple sclerosis trial. Copyright © 2012 John Wiley & Sons, Ltd.

  17. The road map towards providing a robust Raman spectroscopy-based cancer diagnostic platform and integration into clinic

    NASA Astrophysics Data System (ADS)

    Lau, Katherine; Isabelle, Martin; Lloyd, Gavin R.; Old, Oliver; Shepherd, Neil; Bell, Ian M.; Dorney, Jennifer; Lewis, Aaran; Gaifulina, Riana; Rodriguez-Justo, Manuel; Kendall, Catherine; Stone, Nicolas; Thomas, Geraint; Reece, David

    2016-03-01

    Despite the demonstrated potential as an accurate cancer diagnostic tool, Raman spectroscopy (RS) is yet to be adopted by the clinic for histopathology reviews. The Stratified Medicine through Advanced Raman Technologies (SMART) consortium has begun to address some of the hurdles in its adoption for cancer diagnosis. These hurdles include awareness and acceptance of the technology, practicality of integration into the histopathology workflow, data reproducibility and availability of transferrable models. We have formed a consortium, in joint efforts, to develop optimised protocols for tissue sample preparation, data collection and analysis. These protocols will be supported by provision of suitable hardware and software tools to allow statistically sound classification models to be built and transferred for use on different systems. In addition, we are building a validated gastrointestinal (GI) cancers model, which can be trialled as part of the histopathology workflow at hospitals, and a classification tool. At the end of the project, we aim to deliver a robust Raman based diagnostic platform to enable clinical researchers to stage cancer, define tumour margin, build cancer diagnostic models and discover novel disease bio markers.

  18. Universal Capacitance Model for Real-Time Biomass in Cell Culture.

    PubMed

    Konakovsky, Viktor; Yagtu, Ali Civan; Clemens, Christoph; Müller, Markus Michael; Berger, Martina; Schlatter, Stefan; Herwig, Christoph

    2015-09-02

    : Capacitance probes have the potential to revolutionize bioprocess control due to their safe and robust use and ability to detect even the smallest capacitors in the form of biological cells. Several techniques have evolved to model biomass statistically, however, there are problems with model transfer between cell lines and process conditions. Errors of transferred models in the declining phase of the culture range for linear models around +100% or worse, causing unnecessary delays with test runs during bioprocess development. The goal of this work was to develop one single universal model which can be adapted by considering a potentially mechanistic factor to estimate biomass in yet untested clones and scales. The novelty of this work is a methodology to select sensitive frequencies to build a statistical model which can be shared among fermentations with an error between 9% and 38% (mean error around 20%) for the whole process, including the declining phase. A simple linear factor was found to be responsible for the transferability of biomass models between cell lines, indicating a link to their phenotype or physiology.

  19. Noise level and MPEG-2 encoder statistics

    NASA Astrophysics Data System (ADS)

    Lee, Jungwoo

    1997-01-01

    Most software in the movie and broadcasting industries are still in analog film or tape format, which typically contains random noise that originated from film, CCD camera, and tape recording. The performance of the MPEG-2 encoder may be significantly degraded by the noise. It is also affected by the scene type that includes spatial and temporal activity. The statistical property of noise originating from camera and tape player is analyzed and the models for the two types of noise are developed. The relationship between the noise, the scene type, and encoder statistics of a number of MPEG-2 parameters such as motion vector magnitude, prediction error, and quant scale are discussed. This analysis is intended to be a tool for designing robust MPEG encoding algorithms such as preprocessing and rate control.

  20. [Statistical validity of the Mexican Food Security Scale and the Latin American and Caribbean Food Security Scale].

    PubMed

    Villagómez-Ornelas, Paloma; Hernández-López, Pedro; Carrasco-Enríquez, Brenda; Barrios-Sánchez, Karina; Pérez-Escamilla, Rafael; Melgar-Quiñónez, Hugo

    2014-01-01

    This article validates the statistical consistency of two food security scales: the Mexican Food Security Scale (EMSA) and the Latin American and Caribbean Food Security Scale (ELCSA). Validity tests were conducted in order to verify that both scales were consistent instruments, conformed by independent, properly calibrated and adequately sorted items, arranged in a continuum of severity. The following tests were developed: sorting of items; Cronbach's alpha analysis; parallelism of prevalence curves; Rasch models; sensitivity analysis through mean differences' hypothesis test. The tests showed that both scales meet the required attributes and are robust statistical instruments for food security measurement. This is relevant given that the lack of access to food indicator, included in multidimensional poverty measurement in Mexico, is calculated with EMSA.

  1. Integrating chronological uncertainties for annually laminated lake sediments using layer counting, independent chronologies and Bayesian age modelling (Lake Ohau, South Island, New Zealand)

    NASA Astrophysics Data System (ADS)

    Vandergoes, Marcus J.; Howarth, Jamie D.; Dunbar, Gavin B.; Turnbull, Jocelyn C.; Roop, Heidi A.; Levy, Richard H.; Li, Xun; Prior, Christine; Norris, Margaret; Keller, Liz D.; Baisden, W. Troy; Ditchburn, Robert; Fitzsimons, Sean J.; Bronk Ramsey, Christopher

    2018-05-01

    Annually resolved (varved) lake sequences are important palaeoenvironmental archives as they offer a direct incremental dating technique for high-frequency reconstruction of environmental and climate change. Despite the importance of these records, establishing a robust chronology and quantifying its precision and accuracy (estimations of error) remains an essential but challenging component of their development. We outline an approach for building reliable independent chronologies, testing the accuracy of layer counts and integrating all chronological uncertainties to provide quantitative age and error estimates for varved lake sequences. The approach incorporates (1) layer counts and estimates of counting precision; (2) radiometric and biostratigrapic dating techniques to derive independent chronology; and (3) the application of Bayesian age modelling to produce an integrated age model. This approach is applied to a case study of an annually resolved sediment record from Lake Ohau, New Zealand. The most robust age model provides an average error of 72 years across the whole depth range. This represents a fractional uncertainty of ∼5%, higher than the <3% quoted for most published varve records. However, the age model and reported uncertainty represent the best fit between layer counts and independent chronology and the uncertainties account for both layer counting precision and the chronological accuracy of the layer counts. This integrated approach provides a more representative estimate of age uncertainty and therefore represents a statistically more robust chronology.

  2. Enhancing Dairy Manufacturing through customer feedback: A statistical approach

    NASA Astrophysics Data System (ADS)

    Vineesh, D.; Anbuudayasankar, S. P.; Narassima, M. S.

    2018-02-01

    Dairy products have become inevitable of habitual diet. This study aims to investigate the consumers’ satisfaction towards dairy products so as to provide useful information for the manufacturers which would serve as useful inputs for enriching the quality of products delivered. The study involved consumers of dairy products from various demographical backgrounds across South India. The questionnaire focussed on quality aspects of dairy products and also the service provided. A customer satisfaction model was developed based on various factors identified, with robust hypotheses that govern the use of the product. The developed model proved to be statistically significant as it passed the required statistical tests for reliability, construct validity and interdependency between the constructs. Some major concerns detected were regarding the fat content, taste and odour of packaged milk. A minor proportion of people (15.64%) were unsatisfied with the quality of service provided, which is another issue to be addressed to eliminate the sense of dissatisfaction in the minds of consumers.

  3. Inverse Optimization: A New Perspective on the Black-Litterman Model

    PubMed Central

    Bertsimas, Dimitris; Gupta, Vishal; Paschalidis, Ioannis Ch.

    2014-01-01

    The Black-Litterman (BL) model is a widely used asset allocation model in the financial industry. In this paper, we provide a new perspective. The key insight is to replace the statistical framework in the original approach with ideas from inverse optimization. This insight allows us to significantly expand the scope and applicability of the BL model. We provide a richer formulation that, unlike the original model, is flexible enough to incorporate investor information on volatility and market dynamics. Equally importantly, our approach allows us to move beyond the traditional mean-variance paradigm of the original model and construct “BL”-type estimators for more general notions of risk such as coherent risk measures. Computationally, we introduce and study two new “BL”-type estimators and their corresponding portfolios: a Mean Variance Inverse Optimization (MV-IO) portfolio and a Robust Mean Variance Inverse Optimization (RMV-IO) portfolio. These two approaches are motivated by ideas from arbitrage pricing theory and volatility uncertainty. Using numerical simulation and historical backtesting, we show that both methods often demonstrate a better risk-reward tradeoff than their BL counterparts and are more robust to incorrect investor views. PMID:25382873

  4. Modeling and replicating statistical topology and evidence for CMB nonhomogeneity

    PubMed Central

    Agami, Sarit

    2017-01-01

    Under the banner of “big data,” the detection and classification of structure in extremely large, high-dimensional, data sets are two of the central statistical challenges of our times. Among the most intriguing new approaches to this challenge is “TDA,” or “topological data analysis,” one of the primary aims of which is providing nonmetric, but topologically informative, preanalyses of data which make later, more quantitative, analyses feasible. While TDA rests on strong mathematical foundations from topology, in applications, it has faced challenges due to difficulties in handling issues of statistical reliability and robustness, often leading to an inability to make scientific claims with verifiable levels of statistical confidence. We propose a methodology for the parametric representation, estimation, and replication of persistence diagrams, the main diagnostic tool of TDA. The power of the methodology lies in the fact that even if only one persistence diagram is available for analysis—the typical case for big data applications—the replications permit conventional statistical hypothesis testing. The methodology is conceptually simple and computationally practical, and provides a broadly effective statistical framework for persistence diagram TDA analysis. We demonstrate the basic ideas on a toy example, and the power of the parametric approach to TDA modeling in an analysis of cosmic microwave background (CMB) nonhomogeneity. PMID:29078301

  5. A robust TDT-type association test under informative parental missingness.

    PubMed

    Chen, J H; Cheng, K F

    2011-02-10

    Many family-based association tests rely on the random transmission of alleles from parents to offspring. Among them, the transmission/disequilibrium test (TDT) may be considered to be the most popular statistical test. The TDT statistic and its variations were proposed to evaluate nonrandom transmission of alleles from parents to the diseased children. However, in family studies, parental genotypes may be missing due to parental death, loss, divorce, or other reasons. Under some missingness conditions, nonrandom transmission of alleles may still occur even when the gene and disease are not associated. As a consequence, the usual TDT-type tests would produce excessive false positive conclusions in association studies. In this paper, we propose a novel TDT-type association test which is not only simple in computation but also robust to the joint effect of population stratification and informative parental missingness. Our test is model-free and allows for different mechanisms of parental missingness across subpopulations. We use a simulation study to compare the performance of the new test with TDT and point out the advantage of the new method. Copyright © 2010 John Wiley & Sons, Ltd.

  6. The heritability of the functional connectome is robust to common nonlinear registration methods

    NASA Astrophysics Data System (ADS)

    Hafzalla, George W.; Prasad, Gautam; Baboyan, Vatche G.; Faskowitz, Joshua; Jahanshad, Neda; McMahon, Katie L.; de Zubicaray, Greig I.; Wright, Margaret J.; Braskie, Meredith N.; Thompson, Paul M.

    2016-03-01

    Nonlinear registration algorithms are routinely used in brain imaging, to align data for inter-subject and group comparisons, and for voxelwise statistical analyses. To understand how the choice of registration method affects maps of functional brain connectivity in a sample of 611 twins, we evaluated three popular nonlinear registration methods: Advanced Normalization Tools (ANTs), Automatic Registration Toolbox (ART), and FMRIB's Nonlinear Image Registration Tool (FNIRT). Using both structural and functional MRI, we used each of the three methods to align the MNI152 brain template, and 80 regions of interest (ROIs), to each subject's T1-weighted (T1w) anatomical image. We then transformed each subject's ROIs onto the associated resting state functional MRI (rs-fMRI) scans and computed a connectivity network or functional connectome for each subject. Given the different degrees of genetic similarity between pairs of monozygotic (MZ) and same-sex dizygotic (DZ) twins, we used structural equation modeling to estimate the additive genetic influences on the elements of the function networks, or their heritability. The functional connectome and derived statistics were relatively robust to nonlinear registration effects.

  7. Statistical Analysis of Bus Networks in India

    PubMed Central

    2016-01-01

    In this paper, we model the bus networks of six major Indian cities as graphs in L-space, and evaluate their various statistical properties. While airline and railway networks have been extensively studied, a comprehensive study on the structure and growth of bus networks is lacking. In India, where bus transport plays an important role in day-to-day commutation, it is of significant interest to analyze its topological structure and answer basic questions on its evolution, growth, robustness and resiliency. Although the common feature of small-world property is observed, our analysis reveals a wide spectrum of network topologies arising due to significant variation in the degree-distribution patterns in the networks. We also observe that these networks although, robust and resilient to random attacks are particularly degree-sensitive. Unlike real-world networks, such as Internet, WWW and airline, that are virtual, bus networks are physically constrained. Our findings therefore, throw light on the evolution of such geographically and constrained networks that will help us in designing more efficient bus networks in the future. PMID:27992590

  8. Synthetic CO, H2 and H I surveys of the second galactic quadrant, and the properties of molecular gas

    NASA Astrophysics Data System (ADS)

    Duarte-Cabral, A.; Acreman, D. M.; Dobbs, C. L.; Mottram, J. C.; Gibson, S. J.; Brunt, C. M.; Douglas, K. A.

    2015-03-01

    We present CO, H2, H I and HISA (H I self-absorption) distributions from a set of simulations of grand design spirals including stellar feedback, self-gravity, heating and cooling. We replicate the emission of the second galactic quadrant by placing the observer inside the modelled galaxies and post-process the simulations using a radiative transfer code, so as to create synthetic observations. We compare the synthetic data cubes to observations of the second quadrant of the Milky Way to test the ability of the current models to reproduce the basic chemistry of the Galactic interstellar medium (ISM), as well as to test how sensitive such galaxy models are to different recipes of chemistry and/or feedback. We find that models which include feedback and self-gravity can reproduce the production of CO with respect to H2 as observed in our Galaxy, as well as the distribution of the material perpendicular to the Galactic plane. While changes in the chemistry/feedback recipes do not have a huge impact on the statistical properties of the chemistry in the simulated galaxies, we find that the inclusion of both feedback and self-gravity are crucial ingredients, as our test without feedback failed to reproduce all of the observables. Finally, even though the transition from H2 to CO seems to be robust, we find that all models seem to underproduce molecular gas, and have a lower molecular to atomic gas fraction than is observed. Nevertheless, our fiducial model with feedback and self-gravity has shown to be robust in reproducing the statistical properties of the basic molecular gas components of the ISM in our Galaxy.

  9. Functional status predicts acute care readmission in the traumatic spinal cord injury population.

    PubMed

    Huang, Donna; Slocum, Chloe; Silver, Julie K; Morgan, James W; Goldstein, Richard; Zafonte, Ross; Schneider, Jeffrey C

    2018-03-29

    Context/objective Acute care readmission has been identified as an important marker of healthcare quality. Most previous models assessing risk prediction of readmission incorporate variables for medical comorbidity. We hypothesized that functional status is a more robust predictor of readmission in the spinal cord injury population than medical comorbidities. Design Retrospective cross-sectional analysis. Setting Inpatient rehabilitation facilities, Uniform Data System for Medical Rehabilitation data from 2002 to 2012 Participants traumatic spinal cord injury patients. Outcome measures A logistic regression model for predicting acute care readmission based on demographic variables and functional status (Functional Model) was compared with models incorporating demographics, functional status, and medical comorbidities (Functional-Plus) or models including demographics and medical comorbidities (Demographic-Comorbidity). The primary outcomes were 3- and 30-day readmission, and the primary measure of model performance was the c-statistic. Results There were a total of 68,395 patients with 1,469 (2.15%) readmitted at 3 days and 7,081 (10.35%) readmitted at 30 days. The c-statistics for the Functional Model were 0.703 and 0.654 for 3 and 30 days. The Functional Model outperformed Demographic-Comorbidity models at 3 days (c-statistic difference: 0.066-0.096) and outperformed two of the three Demographic-Comorbidity models at 30 days (c-statistic difference: 0.029-0.056). The Functional-Plus models exhibited negligible improvements (0.002-0.010) in model performance compared to the Functional models. Conclusion Readmissions are used as a marker of hospital performance. Function-based readmission models in the spinal cord injury population outperform models incorporating medical comorbidities. Readmission risk models for this population would benefit from the inclusion of functional status.

  10. Hidden Markov model analysis of force/torque information in telemanipulation

    NASA Technical Reports Server (NTRS)

    Hannaford, Blake; Lee, Paul

    1991-01-01

    A model for the prediction and analysis of sensor information recorded during robotic performance of telemanipulation tasks is presented. The model uses the hidden Markov model to describe the task structure, the operator's or intelligent controller's goal structure, and the sensor signals. A methodology for constructing the model parameters based on engineering knowledge of the task is described. It is concluded that the model and its optimal state estimation algorithm, the Viterbi algorithm, are very succesful at the task of segmenting the data record into phases corresponding to subgoals of the task. The model provides a rich modeling structure within a statistical framework, which enables it to represent complex systems and be robust to real-world sensory signals.

  11. Three dimensional measurement of minimum joint space width in the knee from stereo radiographs using statistical shape models.

    PubMed

    van IJsseldijk, E A; Valstar, E R; Stoel, B C; Nelissen, R G H H; Baka, N; Van't Klooster, R; Kaptein, B L

    2016-08-01

    An important measure for the diagnosis and monitoring of knee osteoarthritis is the minimum joint space width (mJSW). This requires accurate alignment of the x-ray beam with the tibial plateau, which may not be accomplished in practice. We investigate the feasibility of a new mJSW measurement method from stereo radiographs using 3D statistical shape models (SSM) and evaluate its sensitivity to changes in the mJSW and its robustness to variations in patient positioning and bone geometry. A validation study was performed using five cadaver specimens. The actual mJSW was varied and images were acquired with variation in the cadaver positioning. For comparison purposes, the mJSW was also assessed from plain radiographs. To study the influence of SSM model accuracy, the 3D mJSW measurement was repeated with models from the actual bones, obtained from CT scans. The SSM-based measurement method was more robust (consistent output for a wide range of input data/consistent output under varying measurement circumstances) than the conventional 2D method, showing that the 3D reconstruction indeed reduces the influence of patient positioning. However, the SSM-based method showed comparable sensitivity to changes in the mJSW with respect to the conventional method. The CT-based measurement was more accurate than the SSM-based measurement (smallest detectable differences 0.55 mm versus 0. 82 mm, respectively). The proposed measurement method is not a substitute for the conventional 2D measurement due to limitations in the SSM model accuracy. However, further improvement of the model accuracy and optimisation technique can be obtained. Combined with the promising options for applications using quantitative information on bone morphology, SSM based 3D reconstructions of natural knees are attractive for further development.Cite this article: E. A. van IJsseldijk, E. R. Valstar, B. C. Stoel, R. G. H. H. Nelissen, N. Baka, R. van't Klooster, B. L. Kaptein. Three dimensional measurement of minimum joint space width in the knee from stereo radiographs using statistical shape models. Bone Joint Res 2016;320-327. DOI: 10.1302/2046-3758.58.2000626. © 2016 van IJsseldijk et al.

  12. Enhancing image classification models with multi-modal biomarkers

    NASA Astrophysics Data System (ADS)

    Caban, Jesus J.; Liao, David; Yao, Jianhua; Mollura, Daniel J.; Gochuico, Bernadette; Yoo, Terry

    2011-03-01

    Currently, most computer-aided diagnosis (CAD) systems rely on image analysis and statistical models to diagnose, quantify, and monitor the progression of a particular disease. In general, CAD systems have proven to be effective at providing quantitative measurements and assisting physicians during the decision-making process. As the need for more flexible and effective CADs continues to grow, questions about how to enhance their accuracy have surged. In this paper, we show how statistical image models can be augmented with multi-modal physiological values to create more robust, stable, and accurate CAD systems. In particular, this paper demonstrates how highly correlated blood and EKG features can be treated as biomarkers and used to enhance image classification models designed to automatically score subjects with pulmonary fibrosis. In our results, a 3-5% improvement was observed when comparing the accuracy of CADs that use multi-modal biomarkers with those that only used image features. Our results show that lab values such as Erythrocyte Sedimentation Rate and Fibrinogen, as well as EKG measurements such as QRS and I:40, are statistically significant and can provide valuable insights about the severity of the pulmonary fibrosis disease.

  13. The comparison between several robust ridge regression estimators in the presence of multicollinearity and multiple outliers

    NASA Astrophysics Data System (ADS)

    Zahari, Siti Meriam; Ramli, Norazan Mohamed; Moktar, Balkiah; Zainol, Mohammad Said

    2014-09-01

    In the presence of multicollinearity and multiple outliers, statistical inference of linear regression model using ordinary least squares (OLS) estimators would be severely affected and produces misleading results. To overcome this, many approaches have been investigated. These include robust methods which were reported to be less sensitive to the presence of outliers. In addition, ridge regression technique was employed to tackle multicollinearity problem. In order to mitigate both problems, a combination of ridge regression and robust methods was discussed in this study. The superiority of this approach was examined when simultaneous presence of multicollinearity and multiple outliers occurred in multiple linear regression. This study aimed to look at the performance of several well-known robust estimators; M, MM, RIDGE and robust ridge regression estimators, namely Weighted Ridge M-estimator (WRM), Weighted Ridge MM (WRMM), Ridge MM (RMM), in such a situation. Results of the study showed that in the presence of simultaneous multicollinearity and multiple outliers (in both x and y-direction), the RMM and RIDGE are more or less similar in terms of superiority over the other estimators, regardless of the number of observation, level of collinearity and percentage of outliers used. However, when outliers occurred in only single direction (y-direction), the WRMM estimator is the most superior among the robust ridge regression estimators, by producing the least variance. In conclusion, the robust ridge regression is the best alternative as compared to robust and conventional least squares estimators when dealing with simultaneous presence of multicollinearity and outliers.

  14. Robust hierarchical state-space models reveal diel variation in travel rates of migrating leatherback turtles.

    PubMed

    Jonsen, Ian D; Myers, Ransom A; James, Michael C

    2006-09-01

    1. Biological and statistical complexity are features common to most ecological data that hinder our ability to extract meaningful patterns using conventional tools. Recent work on implementing modern statistical methods for analysis of such ecological data has focused primarily on population dynamics but other types of data, such as animal movement pathways obtained from satellite telemetry, can also benefit from the application of modern statistical tools. 2. We develop a robust hierarchical state-space approach for analysis of multiple satellite telemetry pathways obtained via the Argos system. State-space models are time-series methods that allow unobserved states and biological parameters to be estimated from data observed with error. We show that the approach can reveal important patterns in complex, noisy data where conventional methods cannot. 3. Using the largest Atlantic satellite telemetry data set for critically endangered leatherback turtles, we show that the diel pattern in travel rates of these turtles changes over different phases of their migratory cycle. While foraging in northern waters the turtles show similar travel rates during day and night, but on their southward migration to tropical waters travel rates are markedly faster during the day. These patterns are generally consistent with diving data, and may be related to changes in foraging behaviour. Interestingly, individuals that migrate southward to breed generally show higher daytime travel rates than individuals that migrate southward in a non-breeding year. 4. Our approach is extremely flexible and can be applied to many ecological analyses that use complex, sequential data.

  15. Vehicle active steering control research based on two-DOF robust internal model control

    NASA Astrophysics Data System (ADS)

    Wu, Jian; Liu, Yahui; Wang, Fengbo; Bao, Chunjiang; Sun, Qun; Zhao, Youqun

    2016-07-01

    Because of vehicle's external disturbances and model uncertainties, robust control algorithms have obtained popularity in vehicle stability control. The robust control usually gives up performance in order to guarantee the robustness of the control algorithm, therefore an improved robust internal model control(IMC) algorithm blending model tracking and internal model control is put forward for active steering system in order to reach high performance of yaw rate tracking with certain robustness. The proposed algorithm inherits the good model tracking ability of the IMC control and guarantees robustness to model uncertainties. In order to separate the design process of model tracking from the robustness design process, the improved 2 degree of freedom(DOF) robust internal model controller structure is given from the standard Youla parameterization. Simulations of double lane change maneuver and those of crosswind disturbances are conducted for evaluating the robust control algorithm, on the basis of a nonlinear vehicle simulation model with a magic tyre model. Results show that the established 2-DOF robust IMC method has better model tracking ability and a guaranteed level of robustness and robust performance, which can enhance the vehicle stability and handling, regardless of variations of the vehicle model parameters and the external crosswind interferences. Contradiction between performance and robustness of active steering control algorithm is solved and higher control performance with certain robustness to model uncertainties is obtained.

  16. Study of subgrid-scale velocity models for reacting and nonreacting flows

    NASA Astrophysics Data System (ADS)

    Langella, I.; Doan, N. A. K.; Swaminathan, N.; Pope, S. B.

    2018-05-01

    A study is conducted to identify advantages and limitations of existing large-eddy simulation (LES) closures for the subgrid-scale (SGS) kinetic energy using a database of direct numerical simulations (DNS). The analysis is conducted for both reacting and nonreacting flows, different turbulence conditions, and various filter sizes. A model, based on dissipation and diffusion of momentum (LD-D model), is proposed in this paper based on the observed behavior of four existing models. Our model shows the best overall agreements with DNS statistics. Two main investigations are conducted for both reacting and nonreacting flows: (i) an investigation on the robustness of the model constants, showing that commonly used constants lead to a severe underestimation of the SGS kinetic energy and enlightening their dependence on Reynolds number and filter size; and (ii) an investigation on the statistical behavior of the SGS closures, which suggests that the dissipation of momentum is the key parameter to be considered in such closures and that dilatation effect is important and must be captured correctly in reacting flows. Additional properties of SGS kinetic energy modeling are identified and discussed.

  17. A statistically harmonized alignment-classification in image space enables accurate and robust alignment of noisy images in single particle analysis.

    PubMed

    Kawata, Masaaki; Sato, Chikara

    2007-06-01

    In determining the three-dimensional (3D) structure of macromolecular assemblies in single particle analysis, a large representative dataset of two-dimensional (2D) average images from huge number of raw images is a key for high resolution. Because alignments prior to averaging are computationally intensive, currently available multireference alignment (MRA) software does not survey every possible alignment. This leads to misaligned images, creating blurred averages and reducing the quality of the final 3D reconstruction. We present a new method, in which multireference alignment is harmonized with classification (multireference multiple alignment: MRMA). This method enables a statistical comparison of multiple alignment peaks, reflecting the similarities between each raw image and a set of reference images. Among the selected alignment candidates for each raw image, misaligned images are statistically excluded, based on the principle that aligned raw images of similar projections have a dense distribution around the correctly aligned coordinates in image space. This newly developed method was examined for accuracy and speed using model image sets with various signal-to-noise ratios, and with electron microscope images of the Transient Receptor Potential C3 and the sodium channel. In every data set, the newly developed method outperformed conventional methods in robustness against noise and in speed, creating 2D average images of higher quality. This statistically harmonized alignment-classification combination should greatly improve the quality of single particle analysis.

  18. Dendritic growth model of multilevel marketing

    NASA Astrophysics Data System (ADS)

    Pang, James Christopher S.; Monterola, Christopher P.

    2017-02-01

    Biologically inspired dendritic network growth is utilized to model the evolving connections of a multilevel marketing (MLM) enterprise. Starting from agents at random spatial locations, a network is formed by minimizing a distance cost function controlled by a parameter, termed the balancing factor bf, that weighs the wiring and the path length costs of connection. The paradigm is compared to an actual MLM membership data and is shown to be successful in statistically capturing the membership distribution, better than the previously reported agent based preferential attachment or analytic branching process models. Moreover, it recovers the known empirical statistics of previously studied MLM, specifically: (i) a membership distribution characterized by the existence of peak levels indicating limited growth, and (ii) an income distribution obeying the 80 - 20 Pareto principle. Extensive types of income distributions from uniform to Pareto to a "winner-take-all" kind are also modeled by varying bf. Finally, the robustness of our dendritic growth paradigm to random agent removals is explored and its implications to MLM income distributions are discussed.

  19. Model based inference from microvascular measurements: Combining experimental measurements and model predictions using a Bayesian probabilistic approach

    PubMed Central

    Rasmussen, Peter M.; Smith, Amy F.; Sakadžić, Sava; Boas, David A.; Pries, Axel R.; Secomb, Timothy W.; Østergaard, Leif

    2017-01-01

    Objective In vivo imaging of the microcirculation and network-oriented modeling have emerged as powerful means of studying microvascular function and understanding its physiological significance. Network-oriented modeling may provide the means of summarizing vast amounts of data produced by high-throughput imaging techniques in terms of key, physiological indices. To estimate such indices with sufficient certainty, however, network-oriented analysis must be robust to the inevitable presence of uncertainty due to measurement errors as well as model errors. Methods We propose the Bayesian probabilistic data analysis framework as a means of integrating experimental measurements and network model simulations into a combined and statistically coherent analysis. The framework naturally handles noisy measurements and provides posterior distributions of model parameters as well as physiological indices associated with uncertainty. Results We applied the analysis framework to experimental data from three rat mesentery networks and one mouse brain cortex network. We inferred distributions for more than five hundred unknown pressure and hematocrit boundary conditions. Model predictions were consistent with previous analyses, and remained robust when measurements were omitted from model calibration. Conclusion Our Bayesian probabilistic approach may be suitable for optimizing data acquisition and for analyzing and reporting large datasets acquired as part of microvascular imaging studies. PMID:27987383

  20. Simulated performance of an order statistic threshold strategy for detection of narrowband signals

    NASA Technical Reports Server (NTRS)

    Satorius, E.; Brady, R.; Deich, W.; Gulkis, S.; Olsen, E.

    1988-01-01

    The application of order statistics to signal detection is becoming an increasingly active area of research. This is due to the inherent robustness of rank estimators in the presence of large outliers that would significantly degrade more conventional mean-level-based detection systems. A detection strategy is presented in which the threshold estimate is obtained using order statistics. The performance of this algorithm in the presence of simulated interference and broadband noise is evaluated. In this way, the robustness of the proposed strategy in the presence of the interference can be fully assessed as a function of the interference, noise, and detector parameters.

  1. Employing Sensitivity Derivatives for Robust Optimization under Uncertainty in CFD

    NASA Technical Reports Server (NTRS)

    Newman, Perry A.; Putko, Michele M.; Taylor, Arthur C., III

    2004-01-01

    A robust optimization is demonstrated on a two-dimensional inviscid airfoil problem in subsonic flow. Given uncertainties in statistically independent, random, normally distributed flow parameters (input variables), an approximate first-order statistical moment method is employed to represent the Computational Fluid Dynamics (CFD) code outputs as expected values with variances. These output quantities are used to form the objective function and constraints. The constraints are cast in probabilistic terms; that is, the probability that a constraint is satisfied is greater than or equal to some desired target probability. Gradient-based robust optimization of this stochastic problem is accomplished through use of both first and second-order sensitivity derivatives. For each robust optimization, the effect of increasing both input standard deviations and target probability of constraint satisfaction are demonstrated. This method provides a means for incorporating uncertainty when considering small deviations from input mean values.

  2. Atlas-based liver segmentation and hepatic fat-fraction assessment for clinical trials.

    PubMed

    Yan, Zhennan; Zhang, Shaoting; Tan, Chaowei; Qin, Hongxing; Belaroussi, Boubakeur; Yu, Hui Jing; Miller, Colin; Metaxas, Dimitris N

    2015-04-01

    Automated assessment of hepatic fat-fraction is clinically important. A robust and precise segmentation would enable accurate, objective and consistent measurement of hepatic fat-fraction for disease quantification, therapy monitoring and drug development. However, segmenting the liver in clinical trials is a challenging task due to the variability of liver anatomy as well as the diverse sources the images were acquired from. In this paper, we propose an automated and robust framework for liver segmentation and assessment. It uses single statistical atlas registration to initialize a robust deformable model to obtain fine segmentation. Fat-fraction map is computed by using chemical shift based method in the delineated region of liver. This proposed method is validated on 14 abdominal magnetic resonance (MR) volumetric scans. The qualitative and quantitative comparisons show that our proposed method can achieve better segmentation accuracy with less variance comparing with two other atlas-based methods. Experimental results demonstrate the promises of our assessment framework. Copyright © 2014 Elsevier Ltd. All rights reserved.

  3. Efficient and robust quantum random number generation by photon number detection

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Applegate, M. J.; Cavendish Laboratory, University of Cambridge, 19 JJ Thomson Avenue, Cambridge CB3 0HE; Thomas, O.

    2015-08-17

    We present an efficient and robust quantum random number generator based upon high-rate room temperature photon number detection. We employ an electric field-modulated silicon avalanche photodiode, a type of device particularly suited to high-rate photon number detection with excellent photon number resolution to detect, without an applied dead-time, up to 4 photons from the optical pulses emitted by a laser. By both measuring and modeling the response of the detector to the incident photons, we are able to determine the illumination conditions that achieve an optimal bit rate that we show is robust against variation in the photon flux. Wemore » extract random bits from the detected photon numbers with an efficiency of 99% corresponding to 1.97 bits per detected photon number yielding a bit rate of 143 Mbit/s, and verify that the extracted bits pass stringent statistical tests for randomness. Our scheme is highly scalable and has the potential of multi-Gbit/s bit rates.« less

  4. Quick Overview Scout 2008 Version 1.0

    EPA Science Inventory

    The Scout 2008 version 1.0 statistical software package has been updated from past DOS and Windows versions to provide classical and robust univariate and multivariate graphical and statistical methods that are not typically available in commercial or freeware statistical softwar...

  5. A model of strength

    USGS Publications Warehouse

    Johnson, Douglas H.; Cook, R.D.

    2013-01-01

    In her AAAS News & Notes piece "Can the Southwest manage its thirst?" (26 July, p. 362), K. Wren quotes Ajay Kalra, who advocates a particular method for predicting Colorado River streamflow "because it eschews complex physical climate models for a statistical data-driven modeling approach." A preference for data-driven models may be appropriate in this individual situation, but it is not so generally, Data-driven models often come with a warning against extrapolating beyond the range of the data used to develop the models. When the future is like the past, data-driven models can work well for prediction, but it is easy to over-model local or transient phenomena, often leading to predictive inaccuracy (1). Mechanistic models are built on established knowledge of the process that connects the response variables with the predictors, using information obtained outside of an extant data set. One may shy away from a mechanistic approach when the underlying process is judged to be too complicated, but good predictive models can be constructed with statistical components that account for ingredients missing in the mechanistic analysis. Models with sound mechanistic components are more generally applicable and robust than data-driven models.

  6. Thermodynamic Controls on Deep Convection in the Tropics: Observations and Applications to Modeling

    NASA Astrophysics Data System (ADS)

    Schiro, Kathleen Anne

    Constraining precipitation processes in climate models with observations is crucial to accurately simulating current climate and reducing uncertainties in future projections. This work presents robust relationships between tropical deep convection, column-integrated water vapor (CWV), and other thermodynamic quantities analyzed with data from the DOE Atmospheric Radiation Measurement (ARM) Mobile Facility in Manacapuru, Brazil as part of the GOAmazon campaign and are directly compared to such relationships at DOE ARM sites in the tropical western Pacific. A robust relationship between CWV and precipitation, as explained by variability in lower tropospheric humidity, exists just as strongly in a tropical continental region as it does in a tropical oceanic region. Given sufficient mixing in the lower troposphere, higher CWV generally results in greater plume buoyancies through a deep convective layer. Although sensitivity of convection to other controls is suggested, such as microphysical processes and dynamical lifting mechanisms, the increase in buoyancy with CWV is consistent with the sharp increase in precipitation observed. Entraining plume buoyancy calculations confirm that CWV is a good proxy for the conditional instability of the environment, yet differences in convective onset as a function of CWV exist over land and ocean, as well as seasonally and diurnally over land. This is largely due to variability in the contribution of lower tropospheric humidity to the total column moisture. Over land, the relationship between deep convection and lower free tropospheric moisture is robust across all seasons and times of day, whereas the relation to boundary layer moisture is robust for the daytime only. Using S-Band radar, these transition statistics are examined separately for unorganized and mesoscale-organized convection, which exhibit sharp increases in probability of occurrence with increasing moisture throughout the column, particularly in the lower free troposphere. An observational basis for an integrated buoyancy measure from a single plume buoyancy formulation that provides a strong relation to precipitation can be useful for constraining convective parameterizations. A mixing scheme corresponding to deep inflow of environmental air into a plume that grows with height provides a weighting of boundary layer and free tropospheric air that yields buoyancies consistent with the observed onset of deep convection across seasons and times of day, across land and ocean sites, and for all convection types. This provides a substantial improvement relative to more traditional constant mixing assumptions, and a dramatic improvement relative to no mixing. Furthermore, it provides relationships that are as strong or stronger for mesoscale-organized convection as for unorganized convection. Downdrafts and their associated parameters are poorly constrained in models, as physical and microphysical processes of leading order importance are difficult to observe with sufficient frequency for development of robust statistics. Downdrafts and cold pool characteristics for mesoscale convective systems (MCSs) and isolated, unorganized deep precipitating convection in the Amazon are composited and both exhibit similar signatures in wind speed, surface fluxes, surface equivalent potential temperature (theta e) and precipitation. For both MCSs and unorganized convection, downdrafts associated with the strongest modifications to surface thermodynamics have increasing probability of occurrence with decreasing height through the lowest 4 km and show similar mean downdraft magnitudes with height. If theta e is approximately conserved following descent, a large fraction of the air reaching the surface likely originates at altitudes in the lowest 2 km. Mixing computations suggest that, on average, air originating at heights greater than 3 km would require substantial mixing, particularly in the case of isolated cells, to match the observed cold pool thetae . Statistics from two years of surface meteorological data at the GOAmazon site and 15 years of data at the DOE ARM site on Manus Island in the tropical western Pacific show that Deltathetae conditioned on precipitation levels off with increasing precipitation rate, bounded by the maximum difference between surface thetae and its minimum in the profile aloft. Robustness of these statistics observed across scales and regions suggests their potential use as model diagnostic tools for the improvement of downdraft parameterizations in climate models.

  7. Granger causality revisited

    PubMed Central

    Friston, Karl J.; Bastos, André M.; Oswal, Ashwini; van Wijk, Bernadette; Richter, Craig; Litvak, Vladimir

    2014-01-01

    This technical paper offers a critical re-evaluation of (spectral) Granger causality measures in the analysis of biological timeseries. Using realistic (neural mass) models of coupled neuronal dynamics, we evaluate the robustness of parametric and nonparametric Granger causality. Starting from a broad class of generative (state-space) models of neuronal dynamics, we show how their Volterra kernels prescribe the second-order statistics of their response to random fluctuations; characterised in terms of cross-spectral density, cross-covariance, autoregressive coefficients and directed transfer functions. These quantities in turn specify Granger causality — providing a direct (analytic) link between the parameters of a generative model and the expected Granger causality. We use this link to show that Granger causality measures based upon autoregressive models can become unreliable when the underlying dynamics is dominated by slow (unstable) modes — as quantified by the principal Lyapunov exponent. However, nonparametric measures based on causal spectral factors are robust to dynamical instability. We then demonstrate how both parametric and nonparametric spectral causality measures can become unreliable in the presence of measurement noise. Finally, we show that this problem can be finessed by deriving spectral causality measures from Volterra kernels, estimated using dynamic causal modelling. PMID:25003817

  8. Robust tissue-air volume segmentation of MR images based on the statistics of phase and magnitude: Its applications in the display of susceptibility-weighted imaging of the brain.

    PubMed

    Du, Yiping P; Jin, Zhaoyang

    2009-10-01

    To develop a robust algorithm for tissue-air segmentation in magnetic resonance imaging (MRI) using the statistics of phase and magnitude of the images. A multivariate measure based on the statistics of phase and magnitude was constructed for tissue-air volume segmentation. The standard deviation of first-order phase difference and the standard deviation of magnitude were calculated in a 3 x 3 x 3 kernel in the image domain. To improve differentiation accuracy, the uniformity of phase distribution in the kernel was also calculated and linear background phase introduced by field inhomogeneity was corrected. The effectiveness of the proposed volume segmentation technique was compared to a conventional approach that uses the magnitude data alone. The proposed algorithm was shown to be more effective and robust in volume segmentation in both synthetic phantom and susceptibility-weighted images of human brain. Using our proposed volume segmentation method, veins in the peripheral regions of the brain were well depicted in the minimum-intensity projection of the susceptibility-weighted images. Using the additional statistics of phase, tissue-air volume segmentation can be substantially improved compared to that using the statistics of magnitude data alone. (c) 2009 Wiley-Liss, Inc.

  9. On the streaming model for redshift-space distortions

    NASA Astrophysics Data System (ADS)

    Kuruvilla, Joseph; Porciani, Cristiano

    2018-06-01

    The streaming model describes the mapping between real and redshift space for 2-point clustering statistics. Its key element is the probability density function (PDF) of line-of-sight pairwise peculiar velocities. Following a kinetic-theory approach, we derive the fundamental equations of the streaming model for ordered and unordered pairs. In the first case, we recover the classic equation while we demonstrate that modifications are necessary for unordered pairs. We then discuss several statistical properties of the pairwise velocities for DM particles and haloes by using a suite of high-resolution N-body simulations. We test the often used Gaussian ansatz for the PDF of pairwise velocities and discuss its limitations. Finally, we introduce a mixture of Gaussians which is known in statistics as the generalised hyperbolic distribution and show that it provides an accurate fit to the PDF. Once inserted in the streaming equation, the fit yields an excellent description of redshift-space correlations at all scales that vastly outperforms the Gaussian and exponential approximations. Using a principal-component analysis, we reduce the complexity of our model for large redshift-space separations. Our results increase the robustness of studies of anisotropic galaxy clustering and are useful for extending them towards smaller scales in order to test theories of gravity and interacting dark-energy models.

  10. VBA: A Probabilistic Treatment of Nonlinear Models for Neurobiological and Behavioural Data

    PubMed Central

    Daunizeau, Jean; Adam, Vincent; Rigoux, Lionel

    2014-01-01

    This work is in line with an on-going effort tending toward a computational (quantitative and refutable) understanding of human neuro-cognitive processes. Many sophisticated models for behavioural and neurobiological data have flourished during the past decade. Most of these models are partly unspecified (i.e. they have unknown parameters) and nonlinear. This makes them difficult to peer with a formal statistical data analysis framework. In turn, this compromises the reproducibility of model-based empirical studies. This work exposes a software toolbox that provides generic, efficient and robust probabilistic solutions to the three problems of model-based analysis of empirical data: (i) data simulation, (ii) parameter estimation/model selection, and (iii) experimental design optimization. PMID:24465198

  11. A statistical approach for isolating fossil fuel emissions in atmospheric inverse problems

    DOE PAGES

    Yadav, Vineet; Michalak, Anna M.; Ray, Jaideep; ...

    2016-10-27

    We study independent verification and quantification of fossil fuel (FF) emissions that constitutes a considerable scientific challenge. By coupling atmospheric observations of CO 2 with models of atmospheric transport, inverse models offer the possibility of overcoming this challenge. However, disaggregating the biospheric and FF flux components of terrestrial fluxes from CO 2 concentration measurements has proven to be difficult, due to observational and modeling limitations. In this study, we propose a statistical inverse modeling scheme for disaggregating winter time fluxes on the basis of their unique error covariances and covariates, where these covariances and covariates are representative of the underlyingmore » processes affecting FF and biospheric fluxes. The application of the method is demonstrated with one synthetic and two real data prototypical inversions by using in situ CO 2 measurements over North America. Also, inversions are performed only for the month of January, as predominance of biospheric CO 2 signal relative to FF CO 2 signal and observational limitations preclude disaggregation of the fluxes in other months. The quality of disaggregation is assessed primarily through examination of a posteriori covariance between disaggregated FF and biospheric fluxes at regional scales. Findings indicate that the proposed method is able to robustly disaggregate fluxes regionally at monthly temporal resolution with a posteriori cross covariance lower than 0.15 µmol m -2 s -1 between FF and biospheric fluxes. Error covariance models and covariates based on temporally varying FF inventory data provide a more robust disaggregation over static proxies (e.g., nightlight intensity and population density). However, the synthetic data case study shows that disaggregation is possible even in absence of detailed temporally varying FF inventory data.« less

  12. A Poisson hierarchical modelling approach to detecting copy number variation in sequence coverage data.

    PubMed

    Sepúlveda, Nuno; Campino, Susana G; Assefa, Samuel A; Sutherland, Colin J; Pain, Arnab; Clark, Taane G

    2013-02-26

    The advent of next generation sequencing technology has accelerated efforts to map and catalogue copy number variation (CNV) in genomes of important micro-organisms for public health. A typical analysis of the sequence data involves mapping reads onto a reference genome, calculating the respective coverage, and detecting regions with too-low or too-high coverage (deletions and amplifications, respectively). Current CNV detection methods rely on statistical assumptions (e.g., a Poisson model) that may not hold in general, or require fine-tuning the underlying algorithms to detect known hits. We propose a new CNV detection methodology based on two Poisson hierarchical models, the Poisson-Gamma and Poisson-Lognormal, with the advantage of being sufficiently flexible to describe different data patterns, whilst robust against deviations from the often assumed Poisson model. Using sequence coverage data of 7 Plasmodium falciparum malaria genomes (3D7 reference strain, HB3, DD2, 7G8, GB4, OX005, and OX006), we showed that empirical coverage distributions are intrinsically asymmetric and overdispersed in relation to the Poisson model. We also demonstrated a low baseline false positive rate for the proposed methodology using 3D7 resequencing data and simulation. When applied to the non-reference isolate data, our approach detected known CNV hits, including an amplification of the PfMDR1 locus in DD2 and a large deletion in the CLAG3.2 gene in GB4, and putative novel CNV regions. When compared to the recently available FREEC and cn.MOPS approaches, our findings were more concordant with putative hits from the highest quality array data for the 7G8 and GB4 isolates. In summary, the proposed methodology brings an increase in flexibility, robustness, accuracy and statistical rigour to CNV detection using sequence coverage data.

  13. Robust effects of cloud superparameterization on simulated daily rainfall intensity statistics across multiple versions of the Community Earth System Model

    DOE PAGES

    Kooperman, Gabriel J.; Pritchard, Michael S.; Burt, Melissa A.; ...

    2016-02-01

    This study evaluates several important statistics of daily rainfall based on frequency and amount distributions as simulated by a global climate model whose precipitation does not depend on convective parameterization—Super-Parameterized Community Atmosphere Model (SPCAM). Three superparameterized and conventional versions of CAM, coupled within the Community Earth System Model (CESM1 and CCSM4), are compared against two modern rainfall products (GPCP 1DD and TRMM 3B42) to discriminate robust effects of superparameterization that emerge across multiple versions. The geographic pattern of annual-mean rainfall is mostly insensitive to superparameterization, with only slight improvements in the double-ITCZ bias. However, unfolding intensity distributions reveal several improvementsmore » in the character of rainfall simulated by SPCAM. The rainfall rate that delivers the most accumulated rain (i.e., amount mode) is systematically too weak in all versions of CAM relative to TRMM 3B42 and does not improve with horizontal resolution. It is improved by superparameterization though, with higher modes in regions of tropical wave, Madden-Julian Oscillation, and monsoon activity. Superparameterization produces better representations of extreme rates compared to TRMM 3B42, without sensitivity to horizontal resolution seen in CAM. SPCAM produces more dry days over land and fewer over the ocean. Updates to CAM’s low cloud parameterizations have narrowed the frequency peak of light rain, converging toward SPCAM. Poleward of 50°, where more rainfall is produced by resolved-scale processes in CAM, few differences discriminate the rainfall properties of the two models. Lastly, these results are discussed in light of their implication for future rainfall changes in response to climate forcing.« less

  14. Uncertainties in Future Regional Sea Level Trends: How to Deal with the Internal Climate Variability?

    NASA Astrophysics Data System (ADS)

    Becker, M.; Karpytchev, M.; Hu, A.; Deser, C.; Lennartz-Sassinek, S.

    2017-12-01

    Today, the Climate models (CM) are the main tools for forecasting sea level rise (SLR) at global and regional scales. The CM forecasts are accompanied by inherent uncertainties. Understanding and reducing these uncertainties is becoming a matter of increasing urgency in order to provide robust estimates of SLR impact on coastal societies, which need sustainable choices of climate adaptation strategy. These CM uncertainties are linked to structural model formulation, initial conditions, emission scenario and internal variability. The internal variability is due to complex non-linear interactions within the Earth Climate System and can induce diverse quasi-periodic oscillatory modes and long-term persistences. To quantify the effects of internal variability, most studies used multi-model ensembles or sea level projections from a single model ran with perturbed initial conditions. However, large ensembles are not generally available, or too small, and computationally expensive. In this study, we use a power-law scaling of sea level fluctuations, as observed in many other geophysical signals and natural systems, which can be used to characterize the internal climate variability. From this specific statistical framework, we (1) use the pre-industrial control run of the National Center for Atmospheric Research Community Climate System Model (NCAR-CCSM) to test the robustness of the power-law scaling hypothesis; (2) employ the power-law statistics as a tool for assessing the spread of regional sea level projections due to the internal climate variability for the 21st century NCAR-CCSM; (3) compare the uncertainties in predicted sea level changes obtained from a NCAR-CCSM multi-member ensemble simulations with estimates derived for power-law processes, and (4) explore the sensitivity of spatial patterns of the internal variability and its effects on regional sea level projections.

  15. Robustness and structure of complex networks

    NASA Astrophysics Data System (ADS)

    Shao, Shuai

    This dissertation covers the two major parts of my PhD research on statistical physics and complex networks: i) modeling a new type of attack -- localized attack, and investigating robustness of complex networks under this type of attack; ii) discovering the clustering structure in complex networks and its influence on the robustness of coupled networks. Complex networks appear in every aspect of our daily life and are widely studied in Physics, Mathematics, Biology, and Computer Science. One important property of complex networks is their robustness under attacks, which depends crucially on the nature of attacks and the structure of the networks themselves. Previous studies have focused on two types of attack: random attack and targeted attack, which, however, are insufficient to describe many real-world damages. Here we propose a new type of attack -- localized attack, and study the robustness of complex networks under this type of attack, both analytically and via simulation. On the other hand, we also study the clustering structure in the network, and its influence on the robustness of a complex network system. In the first part, we propose a theoretical framework to study the robustness of complex networks under localized attack based on percolation theory and generating function method. We investigate the percolation properties, including the critical threshold of the phase transition pc and the size of the giant component Pinfinity. We compare localized attack with random attack and find that while random regular (RR) networks are more robust against localized attack, Erdoḧs-Renyi (ER) networks are equally robust under both types of attacks. As for scale-free (SF) networks, their robustness depends crucially on the degree exponent lambda. The simulation results show perfect agreement with theoretical predictions. We also test our model on two real-world networks: a peer-to-peer computer network and an airline network, and find that the real-world networks are much more vulnerable to localized attack compared with random attack. In the second part, we extend the tree-like generating function method to incorporating clustering structure in complex networks. We study the robustness of a complex network system, especially a network of networks (NON) with clustering structure in each network. We find that the system becomes less robust as we increase the clustering coefficient of each network. For a partially dependent network system, we also find that the influence of the clustering coefficient on network robustness decreases as we decrease the coupling strength, and the critical coupling strength qc, at which the first-order phase transition changes to second-order, increases as we increase the clustering coefficient.

  16. Hierarchical Modeling and Robust Synthesis for the Preliminary Design of Large Scale Complex Systems

    NASA Technical Reports Server (NTRS)

    Koch, Patrick N.

    1997-01-01

    Large-scale complex systems are characterized by multiple interacting subsystems and the analysis of multiple disciplines. The design and development of such systems inevitably requires the resolution of multiple conflicting objectives. The size of complex systems, however, prohibits the development of comprehensive system models, and thus these systems must be partitioned into their constituent parts. Because simultaneous solution of individual subsystem models is often not manageable iteration is inevitable and often excessive. In this dissertation these issues are addressed through the development of a method for hierarchical robust preliminary design exploration to facilitate concurrent system and subsystem design exploration, for the concurrent generation of robust system and subsystem specifications for the preliminary design of multi-level, multi-objective, large-scale complex systems. This method is developed through the integration and expansion of current design techniques: Hierarchical partitioning and modeling techniques for partitioning large-scale complex systems into more tractable parts, and allowing integration of subproblems for system synthesis; Statistical experimentation and approximation techniques for increasing both the efficiency and the comprehensiveness of preliminary design exploration; and Noise modeling techniques for implementing robust preliminary design when approximate models are employed. Hierarchical partitioning and modeling techniques including intermediate responses, linking variables, and compatibility constraints are incorporated within a hierarchical compromise decision support problem formulation for synthesizing subproblem solutions for a partitioned system. Experimentation and approximation techniques are employed for concurrent investigations and modeling of partitioned subproblems. A modified composite experiment is introduced for fitting better predictive models across the ranges of the factors, and an approach for constructing partitioned response surfaces is developed to reduce the computational expense of experimentation for fitting models in a large number of factors. Noise modeling techniques are compared and recommendations are offered for the implementation of robust design when approximate models are sought. These techniques, approaches, and recommendations are incorporated within the method developed for hierarchical robust preliminary design exploration. This method as well as the associated approaches are illustrated through their application to the preliminary design of a commercial turbofan turbine propulsion system. The case study is developed in collaboration with Allison Engine Company, Rolls Royce Aerospace, and is based on the Allison AE3007 existing engine designed for midsize commercial, regional business jets. For this case study, the turbofan system-level problem is partitioned into engine cycle design and configuration design and a compressor modules integrated for more detailed subsystem-level design exploration, improving system evaluation. The fan and low pressure turbine subsystems are also modeled, but in less detail. Given the defined partitioning, these subproblems are investigated independently and concurrently, and response surface models are constructed to approximate the responses of each. These response models are then incorporated within a commercial turbofan hierarchical compromise decision support problem formulation. Five design scenarios are investigated, and robust solutions are identified. The method and solutions identified are verified by comparison with the AE3007 engine. The solutions obtained are similar to the AE3007 cycle and configuration, but are better with respect to many of the requirements.

  17. Statistical Interior Tomography

    PubMed Central

    Xu, Qiong; Wang, Ge; Sieren, Jered; Hoffman, Eric A.

    2011-01-01

    This paper presents a statistical interior tomography (SIT) approach making use of compressed sensing (CS) theory. With the projection data modeled by the Poisson distribution, an objective function with a total variation (TV) regularization term is formulated in the maximization of a posteriori (MAP) framework to solve the interior problem. An alternating minimization method is used to optimize the objective function with an initial image from the direct inversion of the truncated Hilbert transform. The proposed SIT approach is extensively evaluated with both numerical and real datasets. The results demonstrate that SIT is robust with respect to data noise and down-sampling, and has better resolution and less bias than its deterministic counterpart in the case of low count data. PMID:21233044

  18. Variable selection for marginal longitudinal generalized linear models.

    PubMed

    Cantoni, Eva; Flemming, Joanna Mills; Ronchetti, Elvezio

    2005-06-01

    Variable selection is an essential part of any statistical analysis and yet has been somewhat neglected in the context of longitudinal data analysis. In this article, we propose a generalized version of Mallows's C(p) (GC(p)) suitable for use with both parametric and nonparametric models. GC(p) provides an estimate of a measure of model's adequacy for prediction. We examine its performance with popular marginal longitudinal models (fitted using GEE) and contrast results with what is typically done in practice: variable selection based on Wald-type or score-type tests. An application to real data further demonstrates the merits of our approach while at the same time emphasizing some important robust features inherent to GC(p).

  19. A robust method for estimating motorbike count based on visual information learning

    NASA Astrophysics Data System (ADS)

    Huynh, Kien C.; Thai, Dung N.; Le, Sach T.; Thoai, Nam; Hamamoto, Kazuhiko

    2015-03-01

    Estimating the number of vehicles in traffic videos is an important and challenging task in traffic surveillance, especially with a high level of occlusions between vehicles, e.g.,in crowded urban area with people and/or motorbikes. In such the condition, the problem of separating individual vehicles from foreground silhouettes often requires complicated computation [1][2][3]. Thus, the counting problem is gradually shifted into drawing statistical inferences of target objects density from their shape [4], local features [5], etc. Those researches indicate a correlation between local features and the number of target objects. However, they are inadequate to construct an accurate model for vehicles density estimation. In this paper, we present a reliable method that is robust to illumination changes and partial affine transformations. It can achieve high accuracy in case of occlusions. Firstly, local features are extracted from images of the scene using Speed-Up Robust Features (SURF) method. For each image, a global feature vector is computed using a Bag-of-Words model which is constructed from the local features above. Finally, a mapping between the extracted global feature vectors and their labels (the number of motorbikes) is learned. That mapping provides us a strong prediction model for estimating the number of motorbikes in new images. The experimental results show that our proposed method can achieve a better accuracy in comparison to others.

  20. Experimental design and statistical methods for improved hit detection in high-throughput screening.

    PubMed

    Malo, Nathalie; Hanley, James A; Carlile, Graeme; Liu, Jing; Pelletier, Jerry; Thomas, David; Nadon, Robert

    2010-09-01

    Identification of active compounds in high-throughput screening (HTS) contexts can be substantially improved by applying classical experimental design and statistical inference principles to all phases of HTS studies. The authors present both experimental and simulated data to illustrate how true-positive rates can be maximized without increasing false-positive rates by the following analytical process. First, the use of robust data preprocessing methods reduces unwanted variation by removing row, column, and plate biases. Second, replicate measurements allow estimation of the magnitude of the remaining random error and the use of formal statistical models to benchmark putative hits relative to what is expected by chance. Receiver Operating Characteristic (ROC) analyses revealed superior power for data preprocessed by a trimmed-mean polish method combined with the RVM t-test, particularly for small- to moderate-sized biological hits.

  1. A robust bayesian estimate of the concordance correlation coefficient.

    PubMed

    Feng, Dai; Baumgartner, Richard; Svetnik, Vladimir

    2015-01-01

    A need for assessment of agreement arises in many situations including statistical biomarker qualification or assay or method validation. Concordance correlation coefficient (CCC) is one of the most popular scaled indices reported in evaluation of agreement. Robust methods for CCC estimation currently present an important statistical challenge. Here, we propose a novel Bayesian method of robust estimation of CCC based on multivariate Student's t-distribution and compare it with its alternatives. Furthermore, we extend the method to practically relevant settings, enabling incorporation of confounding covariates and replications. The superiority of the new approach is demonstrated using simulation as well as real datasets from biomarker application in electroencephalography (EEG). This biomarker is relevant in neuroscience for development of treatments for insomnia.

  2. Robust LOD scores for variance component-based linkage analysis.

    PubMed

    Blangero, J; Williams, J T; Almasy, L

    2000-01-01

    The variance component method is now widely used for linkage analysis of quantitative traits. Although this approach offers many advantages, the importance of the underlying assumption of multivariate normality of the trait distribution within pedigrees has not been studied extensively. Simulation studies have shown that traits with leptokurtic distributions yield linkage test statistics that exhibit excessive Type I error when analyzed naively. We derive analytical formulae relating the deviation from the expected asymptotic distribution of the lod score to the kurtosis and total heritability of the quantitative trait. A simple correction constant yields a robust lod score for any deviation from normality and for any pedigree structure, and effectively eliminates the problem of inflated Type I error due to misspecification of the underlying probability model in variance component-based linkage analysis.

  3. Comparing geological and statistical approaches for element selection in sediment tracing research

    NASA Astrophysics Data System (ADS)

    Laceby, J. Patrick; McMahon, Joe; Evrard, Olivier; Olley, Jon

    2015-04-01

    Elevated suspended sediment loads reduce reservoir capacity and significantly increase the cost of operating water treatment infrastructure, making the management of sediment supply to reservoirs of increasingly importance. Sediment fingerprinting techniques can be used to determine the relative contributions of different sources of sediment accumulating in reservoirs. The objective of this research is to compare geological and statistical approaches to element selection for sediment fingerprinting modelling. Time-integrated samplers (n=45) were used to obtain source samples from four major subcatchments flowing into the Baroon Pocket Dam in South East Queensland, Australia. The geochemistry of potential sources were compared to the geochemistry of sediment cores (n=12) sampled in the reservoir. The geochemical approach selected elements for modelling that provided expected, observed and statistical discrimination between sediment sources. Two statistical approaches selected elements for modelling with the Kruskal-Wallis H-test and Discriminatory Function Analysis (DFA). In particular, two different significance levels (0.05 & 0.35) for the DFA were included to investigate the importance of element selection on modelling results. A distribution model determined the relative contributions of different sources to sediment sampled in the Baroon Pocket Dam. Elemental discrimination was expected between one subcatchment (Obi Obi Creek) and the remaining subcatchments (Lexys, Falls and Bridge Creek). Six major elements were expected to provide discrimination. Of these six, only Fe2O3 and SiO2 provided expected, observed and statistical discrimination. Modelling results with this geological approach indicated 36% (+/- 9%) of sediment sampled in the reservoir cores were from mafic-derived sources and 64% (+/- 9%) were from felsic-derived sources. The geological and the first statistical approach (DFA0.05) differed by only 1% (σ 5%) for 5 out of 6 model groupings with only the Lexys Creek modelling results differing significantly (35%). The statistical model with expanded elemental selection (DFA0.35) differed from the geological model by an average of 30% for all 6 models. Elemental selection for sediment fingerprinting therefore has the potential to impact modeling results. Accordingly is important to incorporate both robust geological and statistical approaches when selecting elements for sediment fingerprinting. For the Baroon Pocket Dam, management should focus on reducing the supply of sediments derived from felsic sources in each of the subcatchments.

  4. Application of Statistics in Engineering Technology Programs

    ERIC Educational Resources Information Center

    Zhan, Wei; Fink, Rainer; Fang, Alex

    2010-01-01

    Statistics is a critical tool for robustness analysis, measurement system error analysis, test data analysis, probabilistic risk assessment, and many other fields in the engineering world. Traditionally, however, statistics is not extensively used in undergraduate engineering technology (ET) programs, resulting in a major disconnect from industry…

  5. Inverse probability weighting and doubly robust methods in correcting the effects of non-response in the reimbursed medication and self-reported turnout estimates in the ATH survey.

    PubMed

    Härkänen, Tommi; Kaikkonen, Risto; Virtala, Esa; Koskinen, Seppo

    2014-11-06

    To assess the nonresponse rates in a questionnaire survey with respect to administrative register data, and to correct the bias statistically. The Finnish Regional Health and Well-being Study (ATH) in 2010 was based on a national sample and several regional samples. Missing data analysis was based on socio-demographic register data covering the whole sample. Inverse probability weighting (IPW) and doubly robust (DR) methods were estimated using the logistic regression model, which was selected using the Bayesian information criteria. The crude, weighted and true self-reported turnout in the 2008 municipal election and prevalences of entitlements to specially reimbursed medication, and the crude and weighted body mass index (BMI) means were compared. The IPW method appeared to remove a relatively large proportion of the bias compared to the crude prevalence estimates of the turnout and the entitlements to specially reimbursed medication. Several demographic factors were shown to be associated with missing data, but few interactions were found. Our results suggest that the IPW method can improve the accuracy of results of a population survey, and the model selection provides insight into the structure of missing data. However, health-related missing data mechanisms are beyond the scope of statistical methods, which mainly rely on socio-demographic information to correct the results.

  6. Efficient Variable Selection Method for Exposure Variables on Binary Data

    NASA Astrophysics Data System (ADS)

    Ohno, Manabu; Tarumi, Tomoyuki

    In this paper, we propose a new variable selection method for "robust" exposure variables. We define "robust" as property that the same variable can select among original data and perturbed data. There are few studies of effective for the selection method. The problem that selects exposure variables is almost the same as a problem that extracts correlation rules without robustness. [Brin 97] is suggested that correlation rules are possible to extract efficiently using chi-squared statistic of contingency table having monotone property on binary data. But the chi-squared value does not have monotone property, so it's is easy to judge the method to be not independent with an increase in the dimension though the variable set is completely independent, and the method is not usable in variable selection for robust exposure variables. We assume anti-monotone property for independent variables to select robust independent variables and use the apriori algorithm for it. The apriori algorithm is one of the algorithms which find association rules from the market basket data. The algorithm use anti-monotone property on the support which is defined by association rules. But independent property does not completely have anti-monotone property on the AIC of independent probability model, but the tendency to have anti-monotone property is strong. Therefore, selected variables with anti-monotone property on the AIC have robustness. Our method judges whether a certain variable is exposure variable for the independent variable using previous comparison of the AIC. Our numerical experiments show that our method can select robust exposure variables efficiently and precisely.

  7. Interpretable dimensionality reduction of single cell transcriptome data with deep generative models.

    PubMed

    Ding, Jiarui; Condon, Anne; Shah, Sohrab P

    2018-05-21

    Single-cell RNA-sequencing has great potential to discover cell types, identify cell states, trace development lineages, and reconstruct the spatial organization of cells. However, dimension reduction to interpret structure in single-cell sequencing data remains a challenge. Existing algorithms are either not able to uncover the clustering structures in the data or lose global information such as groups of clusters that are close to each other. We present a robust statistical model, scvis, to capture and visualize the low-dimensional structures in single-cell gene expression data. Simulation results demonstrate that low-dimensional representations learned by scvis preserve both the local and global neighbor structures in the data. In addition, scvis is robust to the number of data points and learns a probabilistic parametric mapping function to add new data points to an existing embedding. We then use scvis to analyze four single-cell RNA-sequencing datasets, exemplifying interpretable two-dimensional representations of the high-dimensional single-cell RNA-sequencing data.

  8. The power metric: a new statistically robust enrichment-type metric for virtual screening applications with early recovery capability.

    PubMed

    Lopes, Julio Cesar Dias; Dos Santos, Fábio Mendes; Martins-José, Andrelly; Augustyns, Koen; De Winter, Hans

    2017-01-01

    A new metric for the evaluation of model performance in the field of virtual screening and quantitative structure-activity relationship applications is described. This metric has been termed the power metric and is defined as the fraction of the true positive rate divided by the sum of the true positive and false positive rates, for a given cutoff threshold. The performance of this metric is compared with alternative metrics such as the enrichment factor, the relative enrichment factor, the receiver operating curve enrichment factor, the correct classification rate, Matthews correlation coefficient and Cohen's kappa coefficient. The performance of this new metric is found to be quite robust with respect to variations in the applied cutoff threshold and ratio of the number of active compounds to the total number of compounds, and at the same time being sensitive to variations in model quality. It possesses the correct characteristics for its application in early-recognition virtual screening problems.

  9. Comparison of beam position calculation methods for application in digital acquisition systems

    NASA Astrophysics Data System (ADS)

    Reiter, A.; Singh, R.

    2018-05-01

    Different approaches to the data analysis of beam position monitors in hadron accelerators are compared adopting the perspective of an analog-to-digital converter in a sampling acquisition system. Special emphasis is given to position uncertainty and robustness against bias and interference that may be encountered in an accelerator environment. In a time-domain analysis of data in the presence of statistical noise, the position calculation based on the difference-over-sum method with algorithms like signal integral or power can be interpreted as a least-squares analysis of a corresponding fit function. This link to the least-squares method is exploited in the evaluation of analysis properties and in the calculation of position uncertainty. In an analytical model and experimental evaluations the positions derived from a straight line fit or equivalently the standard deviation are found to be the most robust and to offer the least variance. The measured position uncertainty is consistent with the model prediction in our experiment, and the results of tune measurements improve significantly.

  10. Flow throughout the Earth's core inverted from geomagnetic observations and numerical dynamo models

    NASA Astrophysics Data System (ADS)

    Aubert, Julien

    2013-02-01

    This paper introduces inverse geodynamo modelling, a framework imaging flow throughout the Earth's core from observations of the geomagnetic field and its secular variation. The necessary prior information is provided by statistics from 3-D and self-consistent numerical simulations of the geodynamo. The core method is a linear estimation (or Kalman filtering) procedure, combined with standard frozen-flux core surface flow inversions in order to handle the non-linearity of the problem. The inversion scheme is successfully validated using synthetic test experiments. A set of four numerical dynamo models of increasing physical complexity and similarity to the geomagnetic field is then used to invert for flows at single epochs within the period 1970-2010, using data from the geomagnetic field models CM4 and gufm-sat-Q3. The resulting core surface flows generally provide satisfactory fits to the secular variation within the level of modelled errors, and robustly reproduce the most commonly observed patterns while additionally presenting a high degree of equatorial symmetry. The corresponding deep flows present a robust, highly columnar structure once rotational constraints are enforced to a high level in the prior models, with patterns strikingly similar to the results of quasi-geostrophic inversions. In particular, the presence of a persistent planetary scale, eccentric westward columnar gyre circling around the inner core is confirmed. The strength of the approach is to uniquely determine the trade-off between fit to the data and complexity of the solution by clearly connecting it to first principle physics; statistical deviations observed between the inverted flows and the standard model behaviour can then be used to quantitatively assess the shortcomings of the physical modelling. Such deviations include the (i) westwards and (ii) hemispherical character of the eccentric gyre. A prior model with angular momentum conservation of the core-mantle inner-core system, and gravitational coupling of reasonable strength between the mantle and the inner core, is shown to produce enough westward drift to resolve statistical deviation (i). Deviation (ii) is resolved by a prior with an hemispherical buoyancy release at the inner-core boundary, with excess buoyancy below Asia. This latter result suggests that the recently proposed inner-core translational instability presently transports the solid inner-core material westwards, opposite to the seismologically inferred long-term trend but consistently with the eccentricity of the geomagnetic dipole in recent times.

  11. Robust versus consistent variance estimators in marginal structural Cox models.

    PubMed

    Enders, Dirk; Engel, Susanne; Linder, Roland; Pigeot, Iris

    2018-06-11

    In survival analyses, inverse-probability-of-treatment (IPT) and inverse-probability-of-censoring (IPC) weighted estimators of parameters in marginal structural Cox models are often used to estimate treatment effects in the presence of time-dependent confounding and censoring. In most applications, a robust variance estimator of the IPT and IPC weighted estimator is calculated leading to conservative confidence intervals. This estimator assumes that the weights are known rather than estimated from the data. Although a consistent estimator of the asymptotic variance of the IPT and IPC weighted estimator is generally available, applications and thus information on the performance of the consistent estimator are lacking. Reasons might be a cumbersome implementation in statistical software, which is further complicated by missing details on the variance formula. In this paper, we therefore provide a detailed derivation of the variance of the asymptotic distribution of the IPT and IPC weighted estimator and explicitly state the necessary terms to calculate a consistent estimator of this variance. We compare the performance of the robust and consistent variance estimators in an application based on routine health care data and in a simulation study. The simulation reveals no substantial differences between the 2 estimators in medium and large data sets with no unmeasured confounding, but the consistent variance estimator performs poorly in small samples or under unmeasured confounding, if the number of confounders is large. We thus conclude that the robust estimator is more appropriate for all practical purposes. Copyright © 2018 John Wiley & Sons, Ltd.

  12. On the Computation of the RMSEA and CFI from the Mean-And-Variance Corrected Test Statistic with Nonnormal Data in SEM.

    PubMed

    Savalei, Victoria

    2018-01-01

    A new type of nonnormality correction to the RMSEA has recently been developed, which has several advantages over existing corrections. In particular, the new correction adjusts the sample estimate of the RMSEA for the inflation due to nonnormality, while leaving its population value unchanged, so that established cutoff criteria can still be used to judge the degree of approximate fit. A confidence interval (CI) for the new robust RMSEA based on the mean-corrected ("Satorra-Bentler") test statistic has also been proposed. Follow up work has provided the same type of nonnormality correction for the CFI (Brosseau-Liard & Savalei, 2014). These developments have recently been implemented in lavaan. This note has three goals: a) to show how to compute the new robust RMSEA and CFI from the mean-and-variance corrected test statistic; b) to offer a new CI for the robust RMSEA based on the mean-and-variance corrected test statistic; and c) to caution that the logic of the new nonnormality corrections to RMSEA and CFI is most appropriate for the maximum likelihood (ML) estimator, and cannot easily be generalized to the most commonly used categorical data estimators.

  13. A QSAR study of integrase strand transfer inhibitors based on a large set of pyrimidine, pyrimidone, and pyridopyrazine carboxamide derivatives

    NASA Astrophysics Data System (ADS)

    de Campos, Luana Janaína; de Melo, Eduardo Borges

    2017-08-01

    In the present study, 199 compounds derived from pyrimidine, pyrimidone and pyridopyrazine carboxamides with inhibitory activity against HIV-1 integrase were modeled. Subsequently, a multivariate QSAR study was conducted with 54 molecules employed by Ordered Predictors Selection (OPS) and Partial Least Squares (PLS) for the selection of variables and model construction, respectively. Topological, electrotopological, geometric, and molecular descriptors were used. The selected real model was robust and free from chance correlation; in addition, it demonstrated favorable internal and external statistical quality. Once statistically validated, the training model was used to predict the activity of a second data set (n = 145). The root mean square deviation (RMSD) between observed and predicted values was 0.698. Although it is a value outside of the standards, only 15 (10.34%) of the samples exhibited higher residual values than 1 log unit, a result considered acceptable. Results of Williams and Euclidean applicability domains relative to the prediction showed that the predictions did not occur by extrapolation and that the model is representative of the chemical space of test compounds.

  14. Nine time steps: ultra-fast statistical consistency testing of the Community Earth System Model (pyCECT v3.0)

    NASA Astrophysics Data System (ADS)

    Milroy, Daniel J.; Baker, Allison H.; Hammerling, Dorit M.; Jessup, Elizabeth R.

    2018-02-01

    The Community Earth System Model Ensemble Consistency Test (CESM-ECT) suite was developed as an alternative to requiring bitwise identical output for quality assurance. This objective test provides a statistical measurement of consistency between an accepted ensemble created by small initial temperature perturbations and a test set of CESM simulations. In this work, we extend the CESM-ECT suite with an inexpensive and robust test for ensemble consistency that is applied to Community Atmospheric Model (CAM) output after only nine model time steps. We demonstrate that adequate ensemble variability is achieved with instantaneous variable values at the ninth step, despite rapid perturbation growth and heterogeneous variable spread. We refer to this new test as the Ultra-Fast CAM Ensemble Consistency Test (UF-CAM-ECT) and demonstrate its effectiveness in practice, including its ability to detect small-scale events and its applicability to the Community Land Model (CLM). The new ultra-fast test facilitates CESM development, porting, and optimization efforts, particularly when used to complement information from the original CESM-ECT suite of tools.

  15. An improved empirical dynamic control system model of global mean sea level rise and surface temperature change

    NASA Astrophysics Data System (ADS)

    Wu, Qing; Luu, Quang-Hung; Tkalich, Pavel; Chen, Ge

    2018-04-01

    Having great impacts on human lives, global warming and associated sea level rise are believed to be strongly linked to anthropogenic causes. Statistical approach offers a simple and yet conceptually verifiable combination of remotely connected climate variables and indices, including sea level and surface temperature. We propose an improved statistical reconstruction model based on the empirical dynamic control system by taking into account the climate variability and deriving parameters from Monte Carlo cross-validation random experiments. For the historic data from 1880 to 2001, we yielded higher correlation results compared to those from other dynamic empirical models. The averaged root mean square errors are reduced in both reconstructed fields, namely, the global mean surface temperature (by 24-37%) and the global mean sea level (by 5-25%). Our model is also more robust as it notably diminished the unstable problem associated with varying initial values. Such results suggest that the model not only enhances significantly the global mean reconstructions of temperature and sea level but also may have a potential to improve future projections.

  16. Land-Use and Climate : first results from the LUCID experiments ; implications for experimental design in IPCC-AR5

    NASA Astrophysics Data System (ADS)

    de Noblet, N.; Pitman, A.; Participants, Lucid

    2009-04-01

    The project "Land-Use and Climate, IDentification of robust impacts" (LUCID) was conceived under the auspices of IGBP-iLEAPS and GEWEX-GLASS, to address the robustness of 'local' and possible remote impacts of land-use induced land-cover changes (LCC). LUCID explores, using methodologies that major climate modelling groups recognise, those impacts of LCC that are robust - that is, above the noise generated by model variability and consistent across a suite of climate models. To start with, seven climate models were run, in ensemble mode (5 realisations per 31-years long experiment), with prescribed observed sea-surface temperatures (SSTs) and sea ice extent (SIc). Pre-industrial and present-day simulations were used to explore the impacts of biogeophysical impacts of human-induced land cover change. The imposed LCC perturbation led to statistically significant changes in latent heat flux and near-surface temperature over the regions of land cover change, but few significant changes in precipitation. Our results show no common remote impacts of land cover change. They also highlight a dilemma for both historical hind-casts and future projections; land cover change is regionally important, but it is not feasible within the time frame of the next IPCC (AR5) assessment to implement this change commonly across multiple models. Further analysis are in progress and will be presented to identify the continental regions where changes in LCC may have been more important than the combined changes in SSTs, SIc and CO2 between the pre-industrial times and nowadays.

  17. The use of imputed sibling genotypes in sibship-based association analysis: on modeling alternatives, power and model misspecification.

    PubMed

    Minică, Camelia C; Dolan, Conor V; Hottenga, Jouke-Jan; Willemsen, Gonneke; Vink, Jacqueline M; Boomsma, Dorret I

    2013-05-01

    When phenotypic, but no genotypic data are available for relatives of participants in genetic association studies, previous research has shown that family-based imputed genotypes can boost the statistical power when included in such studies. Here, using simulations, we compared the performance of two statistical approaches suitable to model imputed genotype data: the mixture approach, which involves the full distribution of the imputed genotypes and the dosage approach, where the mean of the conditional distribution features as the imputed genotype. Simulations were run by varying sibship size, size of the phenotypic correlations among siblings, imputation accuracy and minor allele frequency of the causal SNP. Furthermore, as imputing sibling data and extending the model to include sibships of size two or greater requires modeling the familial covariance matrix, we inquired whether model misspecification affects power. Finally, the results obtained via simulations were empirically verified in two datasets with continuous phenotype data (height) and with a dichotomous phenotype (smoking initiation). Across the settings considered, the mixture and the dosage approach are equally powerful and both produce unbiased parameter estimates. In addition, the likelihood-ratio test in the linear mixed model appears to be robust to the considered misspecification in the background covariance structure, given low to moderate phenotypic correlations among siblings. Empirical results show that the inclusion in association analysis of imputed sibling genotypes does not always result in larger test statistic. The actual test statistic may drop in value due to small effect sizes. That is, if the power benefit is small, that the change in distribution of the test statistic under the alternative is relatively small, the probability is greater of obtaining a smaller test statistic. As the genetic effects are typically hypothesized to be small, in practice, the decision on whether family-based imputation could be used as a means to increase power should be informed by prior power calculations and by the consideration of the background correlation.

  18. Optimal designs for copula models

    PubMed Central

    Perrone, E.; Müller, W.G.

    2016-01-01

    Copula modelling has in the past decade become a standard tool in many areas of applied statistics. However, a largely neglected aspect concerns the design of related experiments. Particularly the issue of whether the estimation of copula parameters can be enhanced by optimizing experimental conditions and how robust all the parameter estimates for the model are with respect to the type of copula employed. In this paper an equivalence theorem for (bivariate) copula models is provided that allows formulation of efficient design algorithms and quick checks of whether designs are optimal or at least efficient. Some examples illustrate that in practical situations considerable gains in design efficiency can be achieved. A natural comparison between different copula models with respect to design efficiency is provided as well. PMID:27453616

  19. Scout 2008 Version 1.0 User Guide

    EPA Science Inventory

    The Scout 2008 version 1.0 software package provides a wide variety of classical and robust statistical methods that are not typically available in other commercial software packages. A major part of Scout deals with classical, robust, and resistant univariate and multivariate ou...

  20. Social and economic sustainability of urban systems: comparative analysis of metropolitan statistical areas in Ohio, USA

    EPA Science Inventory

    This article presents a general and versatile methodology for assessing sustainability with Fisher Information as a function of dynamic changes in urban systems. Using robust statistical methods, six Metropolitan Statistical Areas (MSAs) in Ohio were evaluated to comparatively as...

  1. Nontargeted metabolomic analysis and "commercial-homophyletic" comparison-induced biomarkers verification for the systematic chemical differentiation of five different parts of Panax ginseng.

    PubMed

    Qiu, Shi; Yang, Wen-Zhi; Yao, Chang-Liang; Qiu, Zhi-Dong; Shi, Xiao-Jian; Zhang, Jing-Xian; Hou, Jin-Jun; Wang, Qiu-Rong; Wu, Wan-Ying; Guo, De-An

    2016-07-01

    A key segment in authentication of herbal medicines is the establishment of robust biomarkers that embody the intrinsic metabolites difference independent of the growing environment or processing technics. We present a strategy by nontargeted metabolomics and "Commercial-homophyletic" comparison-induced biomarkers verification with new bioinformatic vehicles, to improve the efficiency and reliability in authentication of herbal medicines. The chemical differentiation of five different parts (root, leaf, flower bud, berry, and seed) of Panax ginseng was illustrated as a case study. First, an optimized ultra-performance liquid chromatography/quadrupole time-of-flight-MS(E) (UPLC/QTOF-MS(E)) approach was established for global metabolites profiling. Second, UNIFI™ combined with search of an in-house library was employed to automatically characterize the metabolites. Third, pattern recognition multivariate statistical analysis of the MS(E) data of different parts of commercial and homophyletic samples were separately performed to explore potential biomarkers. Fourth, potential biomarkers deduced from commercial and homophyletic root and leaf samples were cross-compared to infer robust biomarkers. Fifth, discriminating models by artificial neutral network (ANN) were established to identify different parts of P. ginseng. Consequently, 164 compounds were characterized, and 11 robust biomarkers enabling the differentiation among root, leaf, flower bud, and berry, were discovered by removing those structurally unstable and possibly processing-related ones. The ANN models using the robust biomarkers managed to exactly discriminate four different parts and root adulterant with leaf as well. Conclusively, biomarkers verification using homophyletic samples conduces to the discovery of robust biomarkers. The integrated strategy facilitates authentication of herbal medicines in a more efficient and more intelligent manner. Copyright © 2016 Elsevier B.V. All rights reserved.

  2. A Robust Bayesian Random Effects Model for Nonlinear Calibration Problems

    PubMed Central

    Fong, Y.; Wakefield, J.; De Rosa, S.; Frahm, N.

    2013-01-01

    Summary In the context of a bioassay or an immunoassay, calibration means fitting a curve, usually nonlinear, through the observations collected on a set of samples containing known concentrations of a target substance, and then using the fitted curve and observations collected on samples of interest to predict the concentrations of the target substance in these samples. Recent technological advances have greatly improved our ability to quantify minute amounts of substance from a tiny volume of biological sample. This has in turn led to a need to improve statistical methods for calibration. In this paper, we focus on developing calibration methods robust to dependent outliers. We introduce a novel normal mixture model with dependent error terms to model the experimental noise. In addition, we propose a re-parameterization of the five parameter logistic nonlinear regression model that allows us to better incorporate prior information. We examine the performance of our methods with simulation studies and show that they lead to a substantial increase in performance measured in terms of mean squared error of estimation and a measure of the average prediction accuracy. A real data example from the HIV Vaccine Trials Network Laboratory is used to illustrate the methods. PMID:22551415

  3. How allele frequency and study design affect association test statistics with misrepresentation errors.

    PubMed

    Escott-Price, Valentina; Ghodsi, Mansoureh; Schmidt, Karl Michael

    2014-04-01

    We evaluate the effect of genotyping errors on the type-I error of a general association test based on genotypes, showing that, in the presence of errors in the case and control samples, the test statistic asymptotically follows a scaled non-central $\\chi ^2$ distribution. We give explicit formulae for the scaling factor and non-centrality parameter for the symmetric allele-based genotyping error model and for additive and recessive disease models. They show how genotyping errors can lead to a significantly higher false-positive rate, growing with sample size, compared with the nominal significance levels. The strength of this effect depends very strongly on the population distribution of the genotype, with a pronounced effect in the case of rare alleles, and a great robustness against error in the case of large minor allele frequency. We also show how these results can be used to correct $p$-values.

  4. Asymmetry of projected increases in extreme temperature distributions

    PubMed Central

    Kodra, Evan; Ganguly, Auroop R.

    2014-01-01

    A statistical analysis reveals projections of consistently larger increases in the highest percentiles of summer and winter temperature maxima and minima versus the respective lowest percentiles, resulting in a wider range of temperature extremes in the future. These asymmetric changes in tail distributions of temperature appear robust when explored through 14 CMIP5 climate models and three reanalysis datasets. Asymmetry of projected increases in temperature extremes generalizes widely. Magnitude of the projected asymmetry depends significantly on region, season, land-ocean contrast, and climate model variability as well as whether the extremes of consideration are seasonal minima or maxima events. An assessment of potential physical mechanisms provides support for asymmetric tail increases and hence wider temperature extremes ranges, especially for northern winter extremes. These results offer statistically grounded perspectives on projected changes in the IPCC-recommended extremes indices relevant for impacts and adaptation studies. PMID:25073751

  5. Reference aquaplanet climate in the Community Atmosphere Model, Version 5

    DOE PAGES

    Medeiros, Brian; Williamson, David L.; Olson, Jerry G.

    2016-03-18

    In this study, fundamental characteristics of the aquaplanet climate simulated by the Community Atmosphere Model, Version 5.3 (CAM5.3) are presented. The assumptions and simplifications of the configuration are described. A 16 year long, perpetual equinox integration with prescribed SST using the model’s standard 18 grid spacing is presented as a reference simulation. Statistical analysis is presented that shows similar aquaplanet configurations can be run for about 2 years to obtain robust climatological structures, including global and zonal means, eddy statistics, and precipitation distributions. Such a simulation can be compared to the reference simulation to discern differences in the climate, includingmore » an assessment of confidence in the differences. To aid such comparisons, the reference simulation has been made available via earthsystemgrid.org. Examples are shown comparing the reference simulation with simulations from the CAM5 series that make different microphysical assumptions and use a different dynamical core.« less

  6. STATISTICAL SAMPLING AND DATA ANALYSIS

    EPA Science Inventory

    Research is being conducted to develop approaches to improve soil and sediment sampling techniques, measurement design and geostatistics, and data analysis via chemometric, environmetric, and robust statistical methods. Improvements in sampling contaminated soil and other hetero...

  7. Robust multiscale prediction of Po River discharge using a twofold AR-NN approach

    NASA Astrophysics Data System (ADS)

    Alessio, Silvia; Taricco, Carla; Rubinetti, Sara; Zanchettin, Davide; Rubino, Angelo; Mancuso, Salvatore

    2017-04-01

    The Mediterranean area is among the regions most exposed to hydroclimatic changes, with a likely increase of frequency and duration of droughts in the last decades and potentially substantial future drying according to climate projections. However, significant decadal variability is often superposed or even dominates these long-term hydrological trend as observed, for instance, in North Italian precipitation and river discharge records. The capability to accurately predict such decadal changes is, therefore, of utmost environmental and social importance. In order to forecast short and noisy hydroclimatic time series, we apply a twofold statistical approach that we improved with respect to previous works [1]. Our prediction strategy consists in the application of two independent methods that use autoregressive models and feed-forward neural networks. Since all prediction methods work better on clean signals, the predictions are not performed directly on the series, but rather on each significant variability components extracted with Singular Spectrum Analysis (SSA). In this contribution, we will illustrate the multiscale prediction approach and its application to the case of decadal prediction of annual-average Po River discharges (Italy). The discharge record is available for the last 209 years and allows to work with both interannual and decadal time-scale components. Fifteen-year forecasts obtained with both methods robustly indicate a prominent dry period in the second half of the 2020s. We will discuss advantages and limitations of the proposed statistical approach in the light of the current capabilities of decadal climate prediction systems based on numerical climate models, toward an integrated dynamical and statistical approach for the interannual-to-decadal prediction of hydroclimate variability in medium-size river basins. [1] Alessio et. al., Natural variability and anthropogenic effects in a Central Mediterranean core, Clim. of the Past, 8, 831-839, 2012.

  8. A global reconstruction of climate-driven subdecadal water storage variability

    NASA Astrophysics Data System (ADS)

    Humphrey, V.; Gudmundsson, L.; Seneviratne, S. I.

    2017-03-01

    Since 2002, the Gravity Recovery and Climate Experiment (GRACE) mission has provided unprecedented observations of global mass redistribution caused by hydrological processes. However, there are still few sources on pre-2002 global terrestrial water storage (TWS). Classical approaches to retrieve past TWS rely on either land surface models (LSMs) or basin-scale water balance calculations. Here we propose a new approach which statistically relates anomalies in atmospheric drivers to monthly GRACE anomalies. Gridded subdecadal TWS changes and time-dependent uncertainty intervals are reconstructed for the period 1985-2015. Comparisons with model results demonstrate the performance and robustness of the derived data set, which represents a new and valuable source for studying subdecadal TWS variability, closing the ocean/land water budgets and assessing GRACE uncertainties. At midpoint between GRACE observations and LSM simulations, the statistical approach provides TWS estimates (doi:10.5905/ethz-1007-85) that are essentially derived from observations and are based on a limited number of transparent model assumptions.

  9. Statistical distributions of extreme dry spell in Peninsular Malaysia

    NASA Astrophysics Data System (ADS)

    Zin, Wan Zawiah Wan; Jemain, Abdul Aziz

    2010-11-01

    Statistical distributions of annual extreme (AE) series and partial duration (PD) series for dry-spell event are analyzed for a database of daily rainfall records of 50 rain-gauge stations in Peninsular Malaysia, with recording period extending from 1975 to 2004. The three-parameter generalized extreme value (GEV) and generalized Pareto (GP) distributions are considered to model both series. In both cases, the parameters of these two distributions are fitted by means of the L-moments method, which provides a robust estimation of them. The goodness-of-fit (GOF) between empirical data and theoretical distributions are then evaluated by means of the L-moment ratio diagram and several goodness-of-fit tests for each of the 50 stations. It is found that for the majority of stations, the AE and PD series are well fitted by the GEV and GP models, respectively. Based on the models that have been identified, we can reasonably predict the risks associated with extreme dry spells for various return periods.

  10. The Application of FT-IR Spectroscopy for Quality Control of Flours Obtained from Polish Producers

    PubMed Central

    Ceglińska, Alicja; Reder, Magdalena; Ciemniewska-Żytkiewicz, Hanna

    2017-01-01

    Samples of wheat, spelt, rye, and triticale flours produced by different Polish mills were studied by both classic chemical methods and FT-IR MIR spectroscopy. An attempt was made to statistically correlate FT-IR spectral data with reference data with regard to content of various components, for example, proteins, fats, ash, and fatty acids as well as properties such as moisture, falling number, and energetic value. This correlation resulted in calibrated and validated statistical models for versatile evaluation of unknown flour samples. The calibration data set was used to construct calibration models with use of the CSR and the PLS with the leave one-out, cross-validation techniques. The calibrated models were validated with a validation data set. The results obtained confirmed that application of statistical models based on MIR spectral data is a robust, accurate, precise, rapid, inexpensive, and convenient methodology for determination of flour characteristics, as well as for detection of content of selected flour ingredients. The obtained models' characteristics were as follows: R2 = 0.97, PRESS = 2.14; R2 = 0.96, PRESS = 0.69; R2 = 0.95, PRESS = 1.27; R2 = 0.94, PRESS = 0.76, for content of proteins, lipids, ash, and moisture level, respectively. Best results of CSR models were obtained for protein, ash, and crude fat (R2 = 0.86; 0.82; and 0.78, resp.). PMID:28243483

  11. The structure and resilience of financial market networks

    NASA Astrophysics Data System (ADS)

    Kauê Dal'Maso Peron, Thomas; da Fontoura Costa, Luciano; Rodrigues, Francisco A.

    2012-03-01

    Financial markets can be viewed as a highly complex evolving system that is very sensitive to economic instabilities. The complex organization of the market can be represented in a suitable fashion in terms of complex networks, which can be constructed from stock prices such that each pair of stocks is connected by a weighted edge that encodes the distance between them. In this work, we propose an approach to analyze the topological and dynamic evolution of financial networks based on the stock correlation matrices. An entropy-related measurement is adopted to quantify the robustness of the evolving financial market organization. It is verified that the network topological organization suffers strong variation during financial instabilities and the networks in such periods become less robust. A statistical robust regression model is proposed to quantity the relationship between the network structure and resilience. The obtained coefficients of such model indicate that the average shortest path length is the measurement most related to network resilience coefficient. This result indicates that a collective behavior is observed between stocks during financial crisis. More specifically, stocks tend to synchronize their price evolution, leading to a high correlation between pair of stock prices, which contributes to the increase in distance between them and, consequently, decrease the network resilience.

  12. The use of resighting data to estimate the rate of population growth of the snail kite in Florida

    USGS Publications Warehouse

    Dreitz, V.J.; Nichols, J.D.; Hines, J.E.; Bennetts, R.E.; Kitchens, W.M.; DeAngelis, D.L.

    2002-01-01

    The rate of population growth (lambda) is an important demographic parameter used to assess the viability of a population and to develop management and conservation agendas. We examined the use of resighting data to estimate lambda for the snail kite population in Florida from 1997-2000. The analyses consisted of (1) a robust design approach that derives an estimate of lambda from estimates of population size and (2) the Pradel (1996) temporal symmetry (TSM) approach that directly estimates lambda using an open-population capture-recapture model. Besides resighting data, both approaches required information on the number of unmarked individuals that were sighted during the sampling periods. The point estimates of lambda differed between the robust design and TSM approaches, but the 95% confidence intervals overlapped substantially. We believe the differences may be the result of sparse data and do not indicate the inappropriateness of either modelling technique. We focused on the results of the robust design because this approach provided estimates for all study years. Variation among these estimates was smaller than levels of variation among ad hoc estimates based on previously reported index statistics. We recommend that lambda of snail kites be estimated using capture-resighting methods rather than ad hoc counts.

  13. Robust algebraic image enhancement for intelligent control systems

    NASA Technical Reports Server (NTRS)

    Lerner, Bao-Ting; Morrelli, Michael

    1993-01-01

    Robust vision capability for intelligent control systems has been an elusive goal in image processing. The computationally intensive techniques a necessary for conventional image processing make real-time applications, such as object tracking and collision avoidance difficult. In order to endow an intelligent control system with the needed vision robustness, an adequate image enhancement subsystem capable of compensating for the wide variety of real-world degradations, must exist between the image capturing and the object recognition subsystems. This enhancement stage must be adaptive and must operate with consistency in the presence of both statistical and shape-based noise. To deal with this problem, we have developed an innovative algebraic approach which provides a sound mathematical framework for image representation and manipulation. Our image model provides a natural platform from which to pursue dynamic scene analysis, and its incorporation into a vision system would serve as the front-end to an intelligent control system. We have developed a unique polynomial representation of gray level imagery and applied this representation to develop polynomial operators on complex gray level scenes. This approach is highly advantageous since polynomials can be manipulated very easily, and are readily understood, thus providing a very convenient environment for image processing. Our model presents a highly structured and compact algebraic representation of grey-level images which can be viewed as fuzzy sets.

  14. Avoid lost discoveries, because of violations of standard assumptions, by using modern robust statistical methods.

    PubMed

    Wilcox, Rand; Carlson, Mike; Azen, Stan; Clark, Florence

    2013-03-01

    Recently, there have been major advances in statistical techniques for assessing central tendency and measures of association. The practical utility of modern methods has been documented extensively in the statistics literature, but they remain underused and relatively unknown in clinical trials. Our objective was to address this issue. STUDY DESIGN AND PURPOSE: The first purpose was to review common problems associated with standard methodologies (low power, lack of control over type I errors, and incorrect assessments of the strength of the association). The second purpose was to summarize some modern methods that can be used to circumvent such problems. The third purpose was to illustrate the practical utility of modern robust methods using data from the Well Elderly 2 randomized controlled trial. In multiple instances, robust methods uncovered differences among groups and associations among variables that were not detected by classic techniques. In particular, the results demonstrated that details of the nature and strength of the association were sometimes overlooked when using ordinary least squares regression and Pearson correlation. Modern robust methods can make a practical difference in detecting and describing differences between groups and associations between variables. Such procedures should be applied more frequently when analyzing trial-based data. Copyright © 2013 Elsevier Inc. All rights reserved.

  15. Using ventricular modeling to robustly probe significant deep gray matter pathologies: Application to cerebral palsy.

    PubMed

    Pagnozzi, Alex M; Shen, Kaikai; Doecke, James D; Boyd, Roslyn N; Bradley, Andrew P; Rose, Stephen; Dowson, Nicholas

    2016-11-01

    Understanding the relationships between the structure and function of the brain largely relies on the qualitative assessment of Magnetic Resonance Images (MRIs) by expert clinicians. Automated analysis systems can support these assessments by providing quantitative measures of brain injury. However, the assessment of deep gray matter structures, which are critical to motor and executive function, remains difficult as a result of large anatomical injuries commonly observed in children with Cerebral Palsy (CP). Hence, this article proposes a robust surrogate marker of the extent of deep gray matter injury based on impingement due to local ventricular enlargement on surrounding anatomy. Local enlargement was computed using a statistical shape model of the lateral ventricles constructed from 44 healthy subjects. Measures of injury on 95 age-matched CP patients were used to train a regression model to predict six clinical measures of function. The robustness of identifying ventricular enlargement was demonstrated by an area under the curve of 0.91 when tested against a dichotomised expert clinical assessment. The measures also showed strong and significant relationships for multiple clinical scores, including: motor function (r 2  = 0.62, P < 0.005), executive function (r 2  = 0.55, P < 0.005), and communication (r 2  = 0.50, P < 0.005), especially compared to using volumes obtained from standard anatomical segmentation approaches. The lack of reliance on accurate anatomical segmentations and its resulting robustness to large anatomical variations is a key feature of the proposed automated approach. This coupled with its strong correlation with clinically meaningful scores, signifies the potential utility to repeatedly assess MRIs for clinicians diagnosing children with CP. Hum Brain Mapp 37:3795-3809, 2016. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.

  16. Interrogating the topological robustness of gene regulatory circuits by randomization

    PubMed Central

    Levine, Herbert; Onuchic, Jose N.

    2017-01-01

    One of the most important roles of cells is performing their cellular tasks properly for survival. Cells usually achieve robust functionality, for example, cell-fate decision-making and signal transduction, through multiple layers of regulation involving many genes. Despite the combinatorial complexity of gene regulation, its quantitative behavior has been typically studied on the basis of experimentally verified core gene regulatory circuitry, composed of a small set of important elements. It is still unclear how such a core circuit operates in the presence of many other regulatory molecules and in a crowded and noisy cellular environment. Here we report a new computational method, named random circuit perturbation (RACIPE), for interrogating the robust dynamical behavior of a gene regulatory circuit even without accurate measurements of circuit kinetic parameters. RACIPE generates an ensemble of random kinetic models corresponding to a fixed circuit topology, and utilizes statistical tools to identify generic properties of the circuit. By applying RACIPE to simple toggle-switch-like motifs, we observed that the stable states of all models converge to experimentally observed gene state clusters even when the parameters are strongly perturbed. RACIPE was further applied to a proposed 22-gene network of the Epithelial-to-Mesenchymal Transition (EMT), from which we identified four experimentally observed gene states, including the states that are associated with two different types of hybrid Epithelial/Mesenchymal phenotypes. Our results suggest that dynamics of a gene circuit is mainly determined by its topology, not by detailed circuit parameters. Our work provides a theoretical foundation for circuit-based systems biology modeling. We anticipate RACIPE to be a powerful tool to predict and decode circuit design principles in an unbiased manner, and to quantitatively evaluate the robustness and heterogeneity of gene expression. PMID:28362798

  17. A probabilistic approach to aircraft design emphasizing stability and control uncertainties

    NASA Astrophysics Data System (ADS)

    Delaurentis, Daniel Andrew

    In order to address identified deficiencies in current approaches to aerospace systems design, a new method has been developed. This new method for design is based on the premise that design is a decision making activity, and that deterministic analysis and synthesis can lead to poor, or misguided decision making. This is due to a lack of disciplinary knowledge of sufficient fidelity about the product, to the presence of uncertainty at multiple levels of the aircraft design hierarchy, and to a failure to focus on overall affordability metrics as measures of goodness. Design solutions are desired which are robust to uncertainty and are based on the maximum knowledge possible. The new method represents advances in the two following general areas. 1. Design models and uncertainty. The research performed completes a transition from a deterministic design representation to a probabilistic one through a modeling of design uncertainty at multiple levels of the aircraft design hierarchy, including: (1) Consistent, traceable uncertainty classification and representation; (2) Concise mathematical statement of the Probabilistic Robust Design problem; (3) Variants of the Cumulative Distribution Functions (CDFs) as decision functions for Robust Design; (4) Probabilistic Sensitivities which identify the most influential sources of variability. 2. Multidisciplinary analysis and design. Imbedded in the probabilistic methodology is a new approach for multidisciplinary design analysis and optimization (MDA/O), employing disciplinary analysis approximations formed through statistical experimentation and regression. These approximation models are a function of design variables common to the system level as well as other disciplines. For aircraft, it is proposed that synthesis/sizing is the proper avenue for integrating multiple disciplines. Research hypotheses are translated into a structured method, which is subsequently tested for validity. Specifically, the implementation involves the study of the relaxed static stability technology for a supersonic commercial transport aircraft. The probabilistic robust design method is exercised resulting in a series of robust design solutions based on different interpretations of "robustness". Insightful results are obtained and the ability of the method to expose trends in the design space are noted as a key advantage.

  18. A wavelet-based statistical analysis of FMRI data: I. motivation and data distribution modeling.

    PubMed

    Dinov, Ivo D; Boscardin, John W; Mega, Michael S; Sowell, Elizabeth L; Toga, Arthur W

    2005-01-01

    We propose a new method for statistical analysis of functional magnetic resonance imaging (fMRI) data. The discrete wavelet transformation is employed as a tool for efficient and robust signal representation. We use structural magnetic resonance imaging (MRI) and fMRI to empirically estimate the distribution of the wavelet coefficients of the data both across individuals and spatial locations. An anatomical subvolume probabilistic atlas is used to tessellate the structural and functional signals into smaller regions each of which is processed separately. A frequency-adaptive wavelet shrinkage scheme is employed to obtain essentially optimal estimations of the signals in the wavelet space. The empirical distributions of the signals on all the regions are computed in a compressed wavelet space. These are modeled by heavy-tail distributions because their histograms exhibit slower tail decay than the Gaussian. We discovered that the Cauchy, Bessel K Forms, and Pareto distributions provide the most accurate asymptotic models for the distribution of the wavelet coefficients of the data. Finally, we propose a new model for statistical analysis of functional MRI data using this atlas-based wavelet space representation. In the second part of our investigation, we will apply this technique to analyze a large fMRI dataset involving repeated presentation of sensory-motor response stimuli in young, elderly, and demented subjects.

  19. A Robust Alternative to the Normal Distribution.

    DTIC Science & Technology

    1982-07-07

    for any Purpose of the United States Governuent DEPARTMENT OF STATISTICS t -, STANFORD UIVERSITY I STANFORD, CALIFORNIA A Robust Alternative to the...Stanford University Technical Report No. 3. [5] Bhattacharya, S. K. (1966). A Modified Bessel Function lodel in Life Testing. Metrika 10, 133-144

  20. Robust Statistics: What They Are, and Why They Are So Important

    ERIC Educational Resources Information Center

    Corlu, Sencer M.

    2009-01-01

    The problem with "classical" statistics all invoking the mean is that these estimates are notoriously influenced by atypical scores (outliers), partly because the mean itself is differentially influenced by outliers. In theory, "modern" statistics may generate more replicable characterizations of data, because at least in some…

  1. A Temperature-Based Model for Estimating Monthly Average Daily Global Solar Radiation in China

    PubMed Central

    Li, Huashan; Cao, Fei; Wang, Xianlong; Ma, Weibin

    2014-01-01

    Since air temperature records are readily available around the world, the models based on air temperature for estimating solar radiation have been widely accepted. In this paper, a new model based on Hargreaves and Samani (HS) method for estimating monthly average daily global solar radiation is proposed. With statistical error tests, the performance of the new model is validated by comparing with the HS model and its two modifications (Samani model and Chen model) against the measured data at 65 meteorological stations in China. Results show that the new model is more accurate and robust than the HS, Samani, and Chen models in all climatic regions, especially in the humid regions. Hence, the new model can be recommended for estimating solar radiation in areas where only air temperature data are available in China. PMID:24605046

  2. Robust Proton Pencil Beam Scanning Treatment Planning for Rectal Cancer Radiation Therapy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Blanco Kiely, Janid Patricia, E-mail: jkiely@sas.upenn.edu; White, Benjamin M.

    2016-05-01

    Purpose: To investigate, in a treatment plan design and robustness study, whether proton pencil beam scanning (PBS) has the potential to offer advantages, relative to interfraction uncertainties, over photon volumetric modulated arc therapy (VMAT) in a locally advanced rectal cancer patient population. Methods and Materials: Ten patients received a planning CT scan, followed by an average of 4 weekly offline CT verification CT scans, which were rigidly co-registered to the planning CT. Clinical PBS plans were generated on the planning CT, using a single-field uniform-dose technique with single-posterior and parallel-opposed (LAT) fields geometries. The VMAT plans were generated on the planningmore » CT using 2 6-MV, 220° coplanar arcs. Clinical plans were forward-calculated on verification CTs to assess robustness relative to anatomic changes. Setup errors were assessed by forward-calculating clinical plans with a ±5-mm (left–right, anterior–posterior, superior–inferior) isocenter shift on the planning CT. Differences in clinical target volume and organ at risk dose–volume histogram (DHV) indicators between plans were tested for significance using an appropriate Wilcoxon test (P<.05). Results: Dosimetrically, PBS plans were statistically different from VMAT plans, showing greater organ at risk sparing. However, the bladder was statistically identical among LAT and VMAT plans. The clinical target volume coverage was statistically identical among all plans. The robustness test found that all DVH indicators for PBS and VMAT plans were robust, except the LAT's genitalia (V5, V35). The verification CT plans showed that all DVH indicators were robust. Conclusions: Pencil beam scanning plans were found to be as robust as VMAT plans relative to interfractional changes during treatment when posterior beam angles and appropriate range margins are used. Pencil beam scanning dosimetric gains in the bowel (V15, V20) over VMAT suggest that using PBS to treat rectal cancer may reduce radiation treatment–related toxicity.« less

  3. Feature maps driven no-reference image quality prediction of authentically distorted images

    NASA Astrophysics Data System (ADS)

    Ghadiyaram, Deepti; Bovik, Alan C.

    2015-03-01

    Current blind image quality prediction models rely on benchmark databases comprised of singly and synthetically distorted images, thereby learning image features that are only adequate to predict human perceived visual quality on such inauthentic distortions. However, real world images often contain complex mixtures of multiple distortions. Rather than a) discounting the effect of these mixtures of distortions on an image's perceptual quality and considering only the dominant distortion or b) using features that are only proven to be efficient for singly distorted images, we deeply study the natural scene statistics of authentically distorted images, in different color spaces and transform domains. We propose a feature-maps-driven statistical approach which avoids any latent assumptions about the type of distortion(s) contained in an image, and focuses instead on modeling the remarkable consistencies in the scene statistics of real world images in the absence of distortions. We design a deep belief network that takes model-based statistical image features derived from a very large database of authentically distorted images as input and discovers good feature representations by generalizing over different distortion types, mixtures, and severities, which are later used to learn a regressor for quality prediction. We demonstrate the remarkable competence of our features for improving automatic perceptual quality prediction on a benchmark database and on the newly designed LIVE Authentic Image Quality Challenge Database and show that our approach of combining robust statistical features and the deep belief network dramatically outperforms the state-of-the-art.

  4. Internet dynamics

    NASA Astrophysics Data System (ADS)

    Lukose, Rajan Mathew

    The World Wide Web and the Internet are rapidly expanding spaces, of great economic and social significance, which offer an opportunity to study many phenomena, often previously inaccessible, on an unprecedented scale and resolution with relative ease. These phenomena are measurable on the scale of tens of millions of users and hundreds of millions of pages. By virtue of nearly complete electronic mediation, it is possible in principle to observe the time and ``spatial'' evolution of nearly all choices and interactions. This cyber-space therefore provides a view into a number of traditional research questions (from many academic disciplines) and creates its own new phenomena accessible for study. Despite its largely self-organized and dynamic nature, a number of robust quantitative regularities are found in the aggregate statistics of interesting and useful quantities. These regularities can be understood with the help of models that draw on ideas from statistical physics as well as other fields such as economics, psychology and decision theory. This thesis develops models that can account for regularities found in the statistics of Internet congestion and user surfing patterns and discusses some practical consequences. practical consequences.

  5. Design of experiments enhanced statistical process control for wind tunnel check standard testing

    NASA Astrophysics Data System (ADS)

    Phillips, Ben D.

    The current wind tunnel check standard testing program at NASA Langley Research Center is focused on increasing data quality, uncertainty quantification and overall control and improvement of wind tunnel measurement processes. The statistical process control (SPC) methodology employed in the check standard testing program allows for the tracking of variations in measurements over time as well as an overall assessment of facility health. While the SPC approach can and does provide researchers with valuable information, it has certain limitations in the areas of process improvement and uncertainty quantification. It is thought by utilizing design of experiments methodology in conjunction with the current SPC practices that one can efficiently and more robustly characterize uncertainties and develop enhanced process improvement procedures. In this research, methodologies were developed to generate regression models for wind tunnel calibration coefficients, balance force coefficients and wind tunnel flow angularities. The coefficients of these regression models were then tracked in statistical process control charts, giving a higher level of understanding of the processes. The methodology outlined is sufficiently generic such that this research can be applicable to any wind tunnel check standard testing program.

  6. Comparison of optimization strategy and similarity metric in atlas-to-subject registration using statistical deformation model

    NASA Astrophysics Data System (ADS)

    Otake, Y.; Murphy, R. J.; Grupp, R. B.; Sato, Y.; Taylor, R. H.; Armand, M.

    2015-03-01

    A robust atlas-to-subject registration using a statistical deformation model (SDM) is presented. The SDM uses statistics of voxel-wise displacement learned from pre-computed deformation vectors of a training dataset. This allows an atlas instance to be directly translated into an intensity volume and compared with a patient's intensity volume. Rigid and nonrigid transformation parameters were simultaneously optimized via the Covariance Matrix Adaptation - Evolutionary Strategy (CMA-ES), with image similarity used as the objective function. The algorithm was tested on CT volumes of the pelvis from 55 female subjects. A performance comparison of the CMA-ES and Nelder-Mead downhill simplex optimization algorithms with the mutual information and normalized cross correlation similarity metrics was conducted. Simulation studies using synthetic subjects were performed, as well as leave-one-out cross validation studies. Both studies suggested that mutual information and CMA-ES achieved the best performance. The leave-one-out test demonstrated 4.13 mm error with respect to the true displacement field, and 26,102 function evaluations in 180 seconds, on average.

  7. Diagnostic methods for atmospheric inversions of long-lived greenhouse gases

    NASA Astrophysics Data System (ADS)

    Michalak, Anna M.; Randazzo, Nina A.; Chevallier, Frédéric

    2017-06-01

    The ability to predict the trajectory of climate change requires a clear understanding of the emissions and uptake (i.e., surface fluxes) of long-lived greenhouse gases (GHGs). Furthermore, the development of climate policies is driving a need to constrain the budgets of anthropogenic GHG emissions. Inverse problems that couple atmospheric observations of GHG concentrations with an atmospheric chemistry and transport model have increasingly been used to gain insights into surface fluxes. Given the inherent technical challenges associated with their solution, it is imperative that objective approaches exist for the evaluation of such inverse problems. Because direct observation of fluxes at compatible spatiotemporal scales is rarely possible, diagnostics tools must rely on indirect measures. Here we review diagnostics that have been implemented in recent studies and discuss their use in informing adjustments to model setup. We group the diagnostics along a continuum starting with those that are most closely related to the scientific question being targeted, and ending with those most closely tied to the statistical and computational setup of the inversion. We thus begin with diagnostics based on assessments against independent information (e.g., unused atmospheric observations, large-scale scientific constraints), followed by statistical diagnostics of inversion results, diagnostics based on sensitivity tests, and analyses of robustness (e.g., tests focusing on the chemistry and transport model, the atmospheric observations, or the statistical and computational framework), and close with the use of synthetic data experiments (i.e., observing system simulation experiments, OSSEs). We find that existing diagnostics provide a crucial toolbox for evaluating and improving flux estimates but, not surprisingly, cannot overcome the fundamental challenges associated with limited atmospheric observations or the lack of direct flux measurements at compatible scales. As atmospheric inversions are increasingly expected to contribute to national reporting of GHG emissions, the need for developing and implementing robust and transparent evaluation approaches will only grow.

  8. How young water fractions can delineate travel time distributions in contrasting catchments

    NASA Astrophysics Data System (ADS)

    Lutz, Stefanie; Zink, Matthias; Merz, Ralf

    2017-04-01

    Travel time distributions (TTDs) are crucial descriptors of flow and transport processes in catchments. Tracking fluxes of environmental tracers such as stable water isotopes offers a practicable method to determine TTDs. The mean transit time (MTT) is the most commonly reported statistic of TTDs; however, MTT assessments are prone to large aggregation biases resulting from spatial heterogeneity and non-stationarity in real-world catchments. Recently, the young water fraction (Fyw) has been introduced as a more robust statistic that can be derived from seasonal tracer cycles. In this study, we aimed at improving the assessment of TTDs by using Fyw as additional information in lumped isotope models. First, we calculated Fyw from monthly δ18O-samples for 24 contrasting sub-catchments in a meso-scale catchment (3300 km2). Fyw ranged from 0.01 to 0.27 (mean= 0.11) and was not significantly correlated with catchment characteristics (e.g., mean slope, catchment area, and baseflow index) apart from the dominant soil type. Second, assuming gamma-shaped TTDs, we determined time-invariant TTDs for each sub-catchment by optimization of lumped isotope models using the convolution integral method. Whereas multiple optimization runs for the same sub-catchment showed a wide range of TTD parameters, the use of Fyw as additional information allowed constraining this range and thus improving the assessment of MTTs. Hence, the best model fit to observed isotope data might not be the desired solution, as the resulting TTD might define a young water fraction non-consistent with the tracer-cycle based Fyw. Given that the latter is a robust descriptor of fast-flow contribution, isotope models should instead aim at accurately describing both Fyw and the isotope time series in order to improve our understanding of flow and transport in catchments.

  9. Statistical prediction of September Arctic Sea Ice minimum based on stable teleconnections with global climate and oceanic patterns

    NASA Astrophysics Data System (ADS)

    Ionita, M.; Grosfeld, K.; Scholz, P.; Lohmann, G.

    2016-12-01

    Sea ice in both Polar Regions is an important indicator for the expression of global climate change and its polar amplification. Consequently, a broad information interest exists on sea ice, its coverage, variability and long term change. Knowledge on sea ice requires high quality data on ice extent, thickness and its dynamics. However, its predictability depends on various climate parameters and conditions. In order to provide insights into the potential development of a monthly/seasonal signal, we developed a robust statistical model based on ocean heat content, sea surface temperature and atmospheric variables to calculate an estimate of the September minimum sea ice extent for every year. Although previous statistical attempts at monthly/seasonal forecasts of September sea ice minimum show a relatively reduced skill, here it is shown that more than 97% (r = 0.98) of the September sea ice extent can predicted three months in advance by using previous months conditions via a multiple linear regression model based on global sea surface temperature (SST), mean sea level pressure (SLP), air temperature at 850hPa (TT850), surface winds and sea ice extent persistence. The statistical model is based on the identification of regions with stable teleconnections between the predictors (climatological parameters) and the predictand (here sea ice extent). The results based on our statistical model contribute to the sea ice prediction network for the sea ice outlook report (https://www.arcus.org/sipn) and could provide a tool for identifying relevant regions and climate parameters that are important for the sea ice development in the Arctic and for detecting sensitive and critical regions in global coupled climate models with focus on sea ice formation.

  10. Income inequality, poverty, and population health: evidence from recent data for the United States.

    PubMed

    Ram, Rati

    2005-12-01

    In this study, state-level US data for the years 2000 and 1990 are used to provide additional evidence on the roles of income inequality and poverty in population health. Five main points are noted. First, contrary to the suggestion made in several recent studies, the income inequality parameter is observed to be quite robust and carries statistical significance in mortality equations estimated from several observation sets and a fairly wide variety of specificational choices. Second, the evidence does not indicate that significance of income inequality is lost when education variables are included. Third, similarly, the income inequality parameter shows significance when a race variable is added, and also when both race and urbanization terms are entered. Fourth, while poverty is seen to have some mortality-increasing consequence, the role of income inequality appears stronger. Fifth, income inequality retains statistical significance when a quadratic income term is added and also if the log-log version of a fairly inclusive model is estimated. I therefore suggest that the recent skepticism articulated by several scholars in regard to the robustness of the income inequality parameters in mortality equations estimated from the US data should be reconsidered.

  11. Counteracting structural errors in ensemble forecast of influenza outbreaks.

    PubMed

    Pei, Sen; Shaman, Jeffrey

    2017-10-13

    For influenza forecasts generated using dynamical models, forecast inaccuracy is partly attributable to the nonlinear growth of error. As a consequence, quantification of the nonlinear error structure in current forecast models is needed so that this growth can be corrected and forecast skill improved. Here, we inspect the error growth of a compartmental influenza model and find that a robust error structure arises naturally from the nonlinear model dynamics. By counteracting these structural errors, diagnosed using error breeding, we develop a new forecast approach that combines dynamical error correction and statistical filtering techniques. In retrospective forecasts of historical influenza outbreaks for 95 US cities from 2003 to 2014, overall forecast accuracy for outbreak peak timing, peak intensity and attack rate, are substantially improved for predicted lead times up to 10 weeks. This error growth correction method can be generalized to improve the forecast accuracy of other infectious disease dynamical models.Inaccuracy of influenza forecasts based on dynamical models is partly due to nonlinear error growth. Here the authors address the error structure of a compartmental influenza model, and develop a new improved forecast approach combining dynamical error correction and statistical filtering techniques.

  12. Efficient detection of wound-bed and peripheral skin with statistical colour models.

    PubMed

    Veredas, Francisco J; Mesa, Héctor; Morente, Laura

    2015-04-01

    A pressure ulcer is a clinical pathology of localised damage to the skin and underlying tissue caused by pressure, shear or friction. Reliable diagnosis supported by precise wound evaluation is crucial in order to success on treatment decisions. This paper presents a computer-vision approach to wound-area detection based on statistical colour models. Starting with a training set consisting of 113 real wound images, colour histogram models are created for four different tissue types. Back-projections of colour pixels on those histogram models are used, from a Bayesian perspective, to get an estimate of the posterior probability of a pixel to belong to any of those tissue classes. Performance measures obtained from contingency tables based on a gold standard of segmented images supplied by experts have been used for model selection. The resulting fitted model has been validated on a training set consisting of 322 wound images manually segmented and labelled by expert clinicians. The final fitted segmentation model shows robustness and gives high mean performance rates [(AUC: .9426 (SD .0563); accuracy: .8777 (SD .0799); F-score: 0.7389 (SD .1550); Cohen's kappa: .6585 (SD .1787)] when segmenting significant wound areas that include healing tissues.

  13. Predicting the aquatic toxicity mode of action using logistic regression and linear discriminant analysis.

    PubMed

    Ren, Y Y; Zhou, L C; Yang, L; Liu, P Y; Zhao, B W; Liu, H X

    2016-09-01

    The paper highlights the use of the logistic regression (LR) method in the construction of acceptable statistically significant, robust and predictive models for the classification of chemicals according to their aquatic toxic modes of action. Essentials accounting for a reliable model were all considered carefully. The model predictors were selected by stepwise forward discriminant analysis (LDA) from a combined pool of experimental data and chemical structure-based descriptors calculated by the CODESSA and DRAGON software packages. Model predictive ability was validated both internally and externally. The applicability domain was checked by the leverage approach to verify prediction reliability. The obtained models are simple and easy to interpret. In general, LR performs much better than LDA and seems to be more attractive for the prediction of the more toxic compounds, i.e. compounds that exhibit excess toxicity versus non-polar narcotic compounds and more reactive compounds versus less reactive compounds. In addition, model fit and regression diagnostics was done through the influence plot which reflects the hat-values, studentized residuals, and Cook's distance statistics of each sample. Overdispersion was also checked for the LR model. The relationships between the descriptors and the aquatic toxic behaviour of compounds are also discussed.

  14. Distributed video data fusion and mining

    NASA Astrophysics Data System (ADS)

    Chang, Edward Y.; Wang, Yuan-Fang; Rodoplu, Volkan

    2004-09-01

    This paper presents an event sensing paradigm for intelligent event-analysis in a wireless, ad hoc, multi-camera, video surveillance system. In particilar, we present statistical methods that we have developed to support three aspects of event sensing: 1) energy-efficient, resource-conserving, and robust sensor data fusion and analysis, 2) intelligent event modeling and recognition, and 3) rapid deployment, dynamic configuration, and continuous operation of the camera networks. We outline our preliminary results, and discuss future directions that research might take.

  15. Accurate and robust genomic prediction of celiac disease using statistical learning.

    PubMed

    Abraham, Gad; Tye-Din, Jason A; Bhalala, Oneil G; Kowalczyk, Adam; Zobel, Justin; Inouye, Michael

    2014-02-01

    Practical application of genomic-based risk stratification to clinical diagnosis is appealing yet performance varies widely depending on the disease and genomic risk score (GRS) method. Celiac disease (CD), a common immune-mediated illness, is strongly genetically determined and requires specific HLA haplotypes. HLA testing can exclude diagnosis but has low specificity, providing little information suitable for clinical risk stratification. Using six European cohorts, we provide a proof-of-concept that statistical learning approaches which simultaneously model all SNPs can generate robust and highly accurate predictive models of CD based on genome-wide SNP profiles. The high predictive capacity replicated both in cross-validation within each cohort (AUC of 0.87-0.89) and in independent replication across cohorts (AUC of 0.86-0.9), despite differences in ethnicity. The models explained 30-35% of disease variance and up to ∼43% of heritability. The GRS's utility was assessed in different clinically relevant settings. Comparable to HLA typing, the GRS can be used to identify individuals without CD with ≥99.6% negative predictive value however, unlike HLA typing, fine-scale stratification of individuals into categories of higher-risk for CD can identify those that would benefit from more invasive and costly definitive testing. The GRS is flexible and its performance can be adapted to the clinical situation by adjusting the threshold cut-off. Despite explaining a minority of disease heritability, our findings indicate a genomic risk score provides clinically relevant information to improve upon current diagnostic pathways for CD and support further studies evaluating the clinical utility of this approach in CD and other complex diseases.

  16. Model identification using stochastic differential equation grey-box models in diabetes.

    PubMed

    Duun-Henriksen, Anne Katrine; Schmidt, Signe; Røge, Rikke Meldgaard; Møller, Jonas Bech; Nørgaard, Kirsten; Jørgensen, John Bagterp; Madsen, Henrik

    2013-03-01

    The acceptance of virtual preclinical testing of control algorithms is growing and thus also the need for robust and reliable models. Models based on ordinary differential equations (ODEs) can rarely be validated with standard statistical tools. Stochastic differential equations (SDEs) offer the possibility of building models that can be validated statistically and that are capable of predicting not only a realistic trajectory, but also the uncertainty of the prediction. In an SDE, the prediction error is split into two noise terms. This separation ensures that the errors are uncorrelated and provides the possibility to pinpoint model deficiencies. An identifiable model of the glucoregulatory system in a type 1 diabetes mellitus (T1DM) patient is used as the basis for development of a stochastic-differential-equation-based grey-box model (SDE-GB). The parameters are estimated on clinical data from four T1DM patients. The optimal SDE-GB is determined from likelihood-ratio tests. Finally, parameter tracking is used to track the variation in the "time to peak of meal response" parameter. We found that the transformation of the ODE model into an SDE-GB resulted in a significant improvement in the prediction and uncorrelated errors. Tracking of the "peak time of meal absorption" parameter showed that the absorption rate varied according to meal type. This study shows the potential of using SDE-GBs in diabetes modeling. Improved model predictions were obtained due to the separation of the prediction error. SDE-GBs offer a solid framework for using statistical tools for model validation and model development. © 2013 Diabetes Technology Society.

  17. An adaptive two-stage dose-response design method for establishing proof of concept.

    PubMed

    Franchetti, Yoko; Anderson, Stewart J; Sampson, Allan R

    2013-01-01

    We propose an adaptive two-stage dose-response design where a prespecified adaptation rule is used to add and/or drop treatment arms between the stages. We extend the multiple comparison procedures-modeling (MCP-Mod) approach into a two-stage design. In each stage, we use the same set of candidate dose-response models and test for a dose-response relationship or proof of concept (PoC) via model-associated statistics. The stage-wise test results are then combined to establish "global" PoC using a conditional error function. Our simulation studies showed good and more robust power in our design method compared to conventional and fixed designs.

  18. Risk Estimation for Lung Cancer in Libya: Analysis Based on Standardized Morbidity Ratio, Poisson-Gamma Model, BYM Model and Mixture Model

    PubMed

    Alhdiri, Maryam Ahmed; Samat, Nor Azah; Mohamed, Zulkifley

    2017-03-01

    Cancer is the most rapidly spreading disease in the world, especially in developing countries, including Libya. Cancer represents a significant burden on patients, families, and their societies. This disease can be controlled if detected early. Therefore, disease mapping has recently become an important method in the fields of public health research and disease epidemiology. The correct choice of statistical model is a very important step to producing a good map of a disease. Libya was selected to perform this work and to examine its geographical variation in the incidence of lung cancer. The objective of this paper is to estimate the relative risk for lung cancer. Four statistical models to estimate the relative risk for lung cancer and population censuses of the study area for the time period 2006 to 2011 were used in this work. They are initially known as Standardized Morbidity Ratio, which is the most popular statistic, which used in the field of disease mapping, Poisson-gamma model, which is one of the earliest applications of Bayesian methodology, Besag, York and Mollie (BYM) model and Mixture model. As an initial step, this study begins by providing a review of all proposed models, which we then apply to lung cancer data in Libya. Maps, tables and graph, goodness-of-fit (GOF) were used to compare and present the preliminary results. This GOF is common in statistical modelling to compare fitted models. The main general results presented in this study show that the Poisson-gamma model, BYM model, and Mixture model can overcome the problem of the first model (SMR) when there is no observed lung cancer case in certain districts. Results show that the Mixture model is most robust and provides better relative risk estimates across a range of models. Creative Commons Attribution License

  19. Risk Estimation for Lung Cancer in Libya: Analysis Based on Standardized Morbidity Ratio, Poisson-Gamma Model, BYM Model and Mixture Model

    PubMed Central

    Alhdiri, Maryam Ahmed; Samat, Nor Azah; Mohamed, Zulkifley

    2017-01-01

    Cancer is the most rapidly spreading disease in the world, especially in developing countries, including Libya. Cancer represents a significant burden on patients, families, and their societies. This disease can be controlled if detected early. Therefore, disease mapping has recently become an important method in the fields of public health research and disease epidemiology. The correct choice of statistical model is a very important step to producing a good map of a disease. Libya was selected to perform this work and to examine its geographical variation in the incidence of lung cancer. The objective of this paper is to estimate the relative risk for lung cancer. Four statistical models to estimate the relative risk for lung cancer and population censuses of the study area for the time period 2006 to 2011 were used in this work. They are initially known as Standardized Morbidity Ratio, which is the most popular statistic, which used in the field of disease mapping, Poisson-gamma model, which is one of the earliest applications of Bayesian methodology, Besag, York and Mollie (BYM) model and Mixture model. As an initial step, this study begins by providing a review of all proposed models, which we then apply to lung cancer data in Libya. Maps, tables and graph, goodness-of-fit (GOF) were used to compare and present the preliminary results. This GOF is common in statistical modelling to compare fitted models. The main general results presented in this study show that the Poisson-gamma model, BYM model, and Mixture model can overcome the problem of the first model (SMR) when there is no observed lung cancer case in certain districts. Results show that the Mixture model is most robust and provides better relative risk estimates across a range of models. PMID:28440974

  20. A Comparison of Forecast Error Generators for Modeling Wind and Load Uncertainty

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lu, Ning; Diao, Ruisheng; Hafen, Ryan P.

    2013-07-25

    This paper presents four algorithms to generate random forecast error time series. The performance of four algorithms is compared. The error time series are used to create real-time (RT), hour-ahead (HA), and day-ahead (DA) wind and load forecast time series that statistically match historically observed forecasting data sets used in power grid operation to study the net load balancing need in variable generation integration studies. The four algorithms are truncated-normal distribution models, state-space based Markov models, seasonal autoregressive moving average (ARMA) models, and a stochastic-optimization based approach. The comparison is made using historical DA load forecast and actual load valuesmore » to generate new sets of DA forecasts with similar stoical forecast error characteristics (i.e., mean, standard deviation, autocorrelation, and cross-correlation). The results show that all methods generate satisfactory results. One method may preserve one or two required statistical characteristics better the other methods, but may not preserve other statistical characteristics as well compared with the other methods. Because the wind and load forecast error generators are used in wind integration studies to produce wind and load forecasts time series for stochastic planning processes, it is sometimes critical to use multiple methods to generate the error time series to obtain a statistically robust result. Therefore, this paper discusses and compares the capabilities of each algorithm to preserve the characteristics of the historical forecast data sets.« less

  1. On Lack of Robustness in Hydrological Model Development Due to Absence of Guidelines for Selecting Calibration and Evaluation Data: Demonstration for Data-Driven Models

    NASA Astrophysics Data System (ADS)

    Zheng, Feifei; Maier, Holger R.; Wu, Wenyan; Dandy, Graeme C.; Gupta, Hoshin V.; Zhang, Tuqiao

    2018-02-01

    Hydrological models are used for a wide variety of engineering purposes, including streamflow forecasting and flood-risk estimation. To develop such models, it is common to allocate the available data to calibration and evaluation data subsets. Surprisingly, the issue of how this allocation can affect model evaluation performance has been largely ignored in the research literature. This paper discusses the evaluation performance bias that can arise from how available data are allocated to calibration and evaluation subsets. As a first step to assessing this issue in a statistically rigorous fashion, we present a comprehensive investigation of the influence of data allocation on the development of data-driven artificial neural network (ANN) models of streamflow. Four well-known formal data splitting methods are applied to 754 catchments from Australia and the U.S. to develop 902,483 ANN models. Results clearly show that the choice of the method used for data allocation has a significant impact on model performance, particularly for runoff data that are more highly skewed, highlighting the importance of considering the impact of data splitting when developing hydrological models. The statistical behavior of the data splitting methods investigated is discussed and guidance is offered on the selection of the most appropriate data splitting methods to achieve representative evaluation performance for streamflow data with different statistical properties. Although our results are obtained for data-driven models, they highlight the fact that this issue is likely to have a significant impact on all types of hydrological models, especially conceptual rainfall-runoff models.

  2. On the consistency of tomographically imaged lower mantle slabs

    NASA Astrophysics Data System (ADS)

    Shephard, Grace E.; Matthews, Kara J.; Hosseini, Kasra; Domeier, Mathew

    2017-04-01

    Over the last few decades numerous seismic tomography models have been published, each constructed with choices of data input, parameterization and reference model. The broader geoscience community is increasingly utilizing these models, or a selection thereof, to interpret Earth's mantle structure and processes. It follows that seismically identified remnants of subducted slabs have been used to validate, test or refine relative plate motions, absolute plate reference frames, and mantle sinking rates. With an increasing number of models to include, or exclude, the question arises - how robust is a given positive seismic anomaly, inferred to be a slab, across a given suite of tomography models? Here we generate a series of "vote maps" for the lower mantle by comparing 14 seismic tomography models, including 7 s-wave and 7 p-wave. Considerations include the retention or removal of the mean, the use of a consistent or variable reference model, the statistical value which defines the slab "contour", and the effect of depth interpolation. Preliminary results will be presented that address the depth, location and degree of agreement between seismic tomography models, both for the 14 combined, and between the p-waves and s-waves. The analysis also permits a broader discussion of slab volumes and subduction flux. And whilst the location and geometry of slabs, matches some the documented regions of long-lived subduction, other features do not, illustrating the importance of a robust approach to slab identification.

  3. Models are likely to underestimate increase in heavy rainfall in regions with high rainfall intensity

    NASA Astrophysics Data System (ADS)

    Borodina, Aleksandra; Fischer, Erich M.; Knutti, Reto

    2017-04-01

    Model projections of heavy rainfall are uncertain. On timescales of few decades, internal variability plays an important role and therefore poses a challenge to detect robust model responses. We show that spatial aggregation across regions with intense heavy rainfall events, - defined as grid cells with high annual precipitation maxima (Rx1day), - allows to reduce the role of internal variability and thus to detect a robust signal during the historical period. This enables us to evaluate models with observational datasets and to constrain long-term projections of the intensification of heavy rainfall, i.e., to recalibrate full model ensemble consistent with observations resulting in narrower range of projections. In the regions of intense heavy rainfall, we found two present-day metrics that are related to a model's projection. The first metric is the observed relationship between the area-weighted mean of the annual precipitation maxima (Rx1day) and the global land temperatures. The second is the fraction of land exhibiting statistically significant relationships between local annual precipitation maxima (Rx1day) and global land temperatures over the historical period. The models that simulate high values in both metrics are those that are in better agreement with observations and show strong future intensification of heavy rainfall. This implies that changes in heavy rainfall are likely to be more intense than anticipated from the multi-model mean.

  4. AG Channel Measurement and Modeling Results for Over-Water and Hilly Terrain Conditions

    NASA Technical Reports Server (NTRS)

    Matolak, David W.; Sun, Ruoyu

    2015-01-01

    This report describes work completed over the past year on our project, entitled "Unmanned Aircraft Systems (UAS) Research: The AG Channel, Robust Waveforms, and Aeronautical Network Simulations." This project is funded under the NASA project "Unmanned Aircraft Systems (UAS) in the National Airspace System (NAS)." In this report we provide the following: an update on project progress; a description of the over-freshwater and hilly terrain initial results on path loss, delay spread, small-scale fading, and correlations; complete path loss models for the over-water AG channels; analysis for obtaining parameter statistics required for development of accurate wideband AG channel models; and analysis of an atypical AG channel in which the aircraft flies out of the ground site antenna main beam. We have modeled the small-scale fading of these channels with Ricean statistics, and have quantified the behavior of the Ricean K-factor. We also provide some results for correlations of signal components, both intra-band and inter-band. An updated literature review, and a summary that also describes future work, are also included.

  5. Cocaine Dependence Treatment Data: Methods for Measurement Error Problems With Predictors Derived From Stationary Stochastic Processes

    PubMed Central

    Guan, Yongtao; Li, Yehua; Sinha, Rajita

    2011-01-01

    In a cocaine dependence treatment study, we use linear and nonlinear regression models to model posttreatment cocaine craving scores and first cocaine relapse time. A subset of the covariates are summary statistics derived from baseline daily cocaine use trajectories, such as baseline cocaine use frequency and average daily use amount. These summary statistics are subject to estimation error and can therefore cause biased estimators for the regression coefficients. Unlike classical measurement error problems, the error we encounter here is heteroscedastic with an unknown distribution, and there are no replicates for the error-prone variables or instrumental variables. We propose two robust methods to correct for the bias: a computationally efficient method-of-moments-based method for linear regression models and a subsampling extrapolation method that is generally applicable to both linear and nonlinear regression models. Simulations and an application to the cocaine dependence treatment data are used to illustrate the efficacy of the proposed methods. Asymptotic theory and variance estimation for the proposed subsampling extrapolation method and some additional simulation results are described in the online supplementary material. PMID:21984854

  6. Manufacturing Execution Systems: Examples of Performance Indicator and Operational Robustness Tools.

    PubMed

    Gendre, Yannick; Waridel, Gérard; Guyon, Myrtille; Demuth, Jean-François; Guelpa, Hervé; Humbert, Thierry

    Manufacturing Execution Systems (MES) are computerized systems used to measure production performance in terms of productivity, yield, and quality. In the first part, performance indicator and overall equipment effectiveness (OEE), process robustness tools and statistical process control are described. The second part details some tools to help process robustness and control by operators by preventing deviations from target control charts. MES was developed by Syngenta together with CIMO for automation.

  7. Multi-criterion model ensemble of CMIP5 surface air temperature over China

    NASA Astrophysics Data System (ADS)

    Yang, Tiantian; Tao, Yumeng; Li, Jingjing; Zhu, Qian; Su, Lu; He, Xiaojia; Zhang, Xiaoming

    2018-05-01

    The global circulation models (GCMs) are useful tools for simulating climate change, projecting future temperature changes, and therefore, supporting the preparation of national climate adaptation plans. However, different GCMs are not always in agreement with each other over various regions. The reason is that GCMs' configurations, module characteristics, and dynamic forcings vary from one to another. Model ensemble techniques are extensively used to post-process the outputs from GCMs and improve the variability of model outputs. Root-mean-square error (RMSE), correlation coefficient (CC, or R) and uncertainty are commonly used statistics for evaluating the performances of GCMs. However, the simultaneous achievements of all satisfactory statistics cannot be guaranteed in using many model ensemble techniques. In this paper, we propose a multi-model ensemble framework, using a state-of-art evolutionary multi-objective optimization algorithm (termed MOSPD), to evaluate different characteristics of ensemble candidates and to provide comprehensive trade-off information for different model ensemble solutions. A case study of optimizing the surface air temperature (SAT) ensemble solutions over different geographical regions of China is carried out. The data covers from the period of 1900 to 2100, and the projections of SAT are analyzed with regard to three different statistical indices (i.e., RMSE, CC, and uncertainty). Among the derived ensemble solutions, the trade-off information is further analyzed with a robust Pareto front with respect to different statistics. The comparison results over historical period (1900-2005) show that the optimized solutions are superior over that obtained simple model average, as well as any single GCM output. The improvements of statistics are varying for different climatic regions over China. Future projection (2006-2100) with the proposed ensemble method identifies that the largest (smallest) temperature changes will happen in the South Central China (the Inner Mongolia), the North Eastern China (the South Central China), and the North Western China (the South Central China), under RCP 2.6, RCP 4.5, and RCP 8.5 scenarios, respectively.

  8. Predicting carcinogenicity of diverse chemicals using probabilistic neural network modeling approaches

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Singh, Kunwar P., E-mail: kpsingh_52@yahoo.com; Environmental Chemistry Division, CSIR-Indian Institute of Toxicology Research, Post Box 80, Mahatma Gandhi Marg, Lucknow 226 001; Gupta, Shikha

    Robust global models capable of discriminating positive and non-positive carcinogens; and predicting carcinogenic potency of chemicals in rodents were developed. The dataset of 834 structurally diverse chemicals extracted from Carcinogenic Potency Database (CPDB) was used which contained 466 positive and 368 non-positive carcinogens. Twelve non-quantum mechanical molecular descriptors were derived. Structural diversity of the chemicals and nonlinearity in the data were evaluated using Tanimoto similarity index and Brock–Dechert–Scheinkman statistics. Probabilistic neural network (PNN) and generalized regression neural network (GRNN) models were constructed for classification and function optimization problems using the carcinogenicity end point in rat. Validation of the models wasmore » performed using the internal and external procedures employing a wide series of statistical checks. PNN constructed using five descriptors rendered classification accuracy of 92.09% in complete rat data. The PNN model rendered classification accuracies of 91.77%, 80.70% and 92.08% in mouse, hamster and pesticide data, respectively. The GRNN constructed with nine descriptors yielded correlation coefficient of 0.896 between the measured and predicted carcinogenic potency with mean squared error (MSE) of 0.44 in complete rat data. The rat carcinogenicity model (GRNN) applied to the mouse and hamster data yielded correlation coefficient and MSE of 0.758, 0.71 and 0.760, 0.46, respectively. The results suggest for wide applicability of the inter-species models in predicting carcinogenic potency of chemicals. Both the PNN and GRNN (inter-species) models constructed here can be useful tools in predicting the carcinogenicity of new chemicals for regulatory purposes. - Graphical abstract: Figure (a) shows classification accuracies (positive and non-positive carcinogens) in rat, mouse, hamster, and pesticide data yielded by optimal PNN model. Figure (b) shows generalization and predictive abilities of the interspecies GRNN model to predict the carcinogenic potency of diverse chemicals. - Highlights: • Global robust models constructed for carcinogenicity prediction of diverse chemicals. • Tanimoto/BDS test revealed structural diversity of chemicals and nonlinearity in data. • PNN/GRNN successfully predicted carcinogenicity/carcinogenic potency of chemicals. • Developed interspecies PNN/GRNN models for carcinogenicity prediction. • Proposed models can be used as tool to predict carcinogenicity of new chemicals.« less

  9. Modeling landslide susceptibility in data-scarce environments using optimized data mining and statistical methods

    NASA Astrophysics Data System (ADS)

    Lee, Jung-Hyun; Sameen, Maher Ibrahim; Pradhan, Biswajeet; Park, Hyuck-Jin

    2018-02-01

    This study evaluated the generalizability of five models to select a suitable approach for landslide susceptibility modeling in data-scarce environments. In total, 418 landslide inventories and 18 landslide conditioning factors were analyzed. Multicollinearity and factor optimization were investigated before data modeling, and two experiments were then conducted. In each experiment, five susceptibility maps were produced based on support vector machine (SVM), random forest (RF), weight-of-evidence (WoE), ridge regression (Rid_R), and robust regression (RR) models. The highest accuracy (AUC = 0.85) was achieved with the SVM model when either the full or limited landslide inventories were used. Furthermore, the RF and WoE models were severely affected when less landslide samples were used for training. The other models were affected slightly when the training samples were limited.

  10. A prospective earthquake forecast experiment in the western Pacific

    NASA Astrophysics Data System (ADS)

    Eberhard, David A. J.; Zechar, J. Douglas; Wiemer, Stefan

    2012-09-01

    Since the beginning of 2009, the Collaboratory for the Study of Earthquake Predictability (CSEP) has been conducting an earthquake forecast experiment in the western Pacific. This experiment is an extension of the Kagan-Jackson experiments begun 15 years earlier and is a prototype for future global earthquake predictability experiments. At the beginning of each year, seismicity models make a spatially gridded forecast of the number of Mw≥ 5.8 earthquakes expected in the next year. For the three participating statistical models, we analyse the first two years of this experiment. We use likelihood-based metrics to evaluate the consistency of the forecasts with the observed target earthquakes and we apply measures based on Student's t-test and the Wilcoxon signed-rank test to compare the forecasts. Overall, a simple smoothed seismicity model (TripleS) performs the best, but there are some exceptions that indicate continued experiments are vital to fully understand the stability of these models, the robustness of model selection and, more generally, earthquake predictability in this region. We also estimate uncertainties in our results that are caused by uncertainties in earthquake location and seismic moment. Our uncertainty estimates are relatively small and suggest that the evaluation metrics are relatively robust. Finally, we consider the implications of our results for a global earthquake forecast experiment.

  11. Robust optimization of supersonic ORC nozzle guide vanes

    NASA Astrophysics Data System (ADS)

    Bufi, Elio A.; Cinnella, Paola

    2017-03-01

    An efficient Robust Optimization (RO) strategy is developed for the design of 2D supersonic Organic Rankine Cycle turbine expanders. The dense gas effects are not-negligible for this application and they are taken into account describing the thermodynamics by means of the Peng-Robinson-Stryjek-Vera equation of state. The design methodology combines an Uncertainty Quantification (UQ) loop based on a Bayesian kriging model of the system response to the uncertain parameters, used to approximate statistics (mean and variance) of the uncertain system output, a CFD solver, and a multi-objective non-dominated sorting algorithm (NSGA), also based on a Kriging surrogate of the multi-objective fitness function, along with an adaptive infill strategy for surrogate enrichment at each generation of the NSGA. The objective functions are the average and variance of the isentropic efficiency. The blade shape is parametrized by means of a Free Form Deformation (FFD) approach. The robust optimal blades are compared to the baseline design (based on the Method of Characteristics) and to a blade obtained by means of a deterministic CFD-based optimization.

  12. Validating Coherence Measurements Using Aligned and Unaligned Coherence Functions

    NASA Technical Reports Server (NTRS)

    Miles, Jeffrey Hilton

    2006-01-01

    This paper describes a novel approach based on the use of coherence functions and statistical theory for sensor validation in a harsh environment. By the use of aligned and unaligned coherence functions and statistical theory one can test for sensor degradation, total sensor failure or changes in the signal. This advanced diagnostic approach and the novel data processing methodology discussed provides a single number that conveys this information. This number as calculated with standard statistical procedures for comparing the means of two distributions is compared with results obtained using Yuen's robust statistical method to create confidence intervals. Examination of experimental data from Kulite pressure transducers mounted in a Pratt & Whitney PW4098 combustor using spectrum analysis methods on aligned and unaligned time histories has verified the effectiveness of the proposed method. All the procedures produce good results which demonstrates how robust the technique is.

  13. Large ensemble modeling of the last deglacial retreat of the West Antarctic Ice Sheet: comparison of simple and advanced statistical techniques

    NASA Astrophysics Data System (ADS)

    Pollard, David; Chang, Won; Haran, Murali; Applegate, Patrick; DeConto, Robert

    2016-05-01

    A 3-D hybrid ice-sheet model is applied to the last deglacial retreat of the West Antarctic Ice Sheet over the last ˜ 20 000 yr. A large ensemble of 625 model runs is used to calibrate the model to modern and geologic data, including reconstructed grounding lines, relative sea-level records, elevation-age data and uplift rates, with an aggregate score computed for each run that measures overall model-data misfit. Two types of statistical methods are used to analyze the large-ensemble results: simple averaging weighted by the aggregate score, and more advanced Bayesian techniques involving Gaussian process-based emulation and calibration, and Markov chain Monte Carlo. The analyses provide sea-level-rise envelopes with well-defined parametric uncertainty bounds, but the simple averaging method only provides robust results with full-factorial parameter sampling in the large ensemble. Results for best-fit parameter ranges and envelopes of equivalent sea-level rise with the simple averaging method agree well with the more advanced techniques. Best-fit parameter ranges confirm earlier values expected from prior model tuning, including large basal sliding coefficients on modern ocean beds.

  14. Change Detection in Rough Time Series

    DTIC Science & Technology

    2014-09-01

    Business Statistics : An Inferential Approach, Dellen: San Francisco. [18] Winston, W. (1997) Operations Research Applications and Algorithms, Duxbury...distribution that can present significant challenges to conventional statistical tracking techniques. To address this problem the proposed method...applies hybrid fuzzy statistical techniques to series granules instead of to individual measures. Three examples demonstrated the robust nature of the

  15. Robustness of Type I Error and Power in Set Correlation Analysis of Contingency Tables.

    ERIC Educational Resources Information Center

    Cohen, Jacob; Nee, John C. M.

    1990-01-01

    The analysis of contingency tables via set correlation allows the assessment of subhypotheses involving contrast functions of the categories of the nominal scales. The robustness of such methods with regard to Type I error and statistical power was studied via a Monte Carlo experiment. (TJH)

  16. Robust inference for group sequential trials.

    PubMed

    Ganju, Jitendra; Lin, Yunzhi; Zhou, Kefei

    2017-03-01

    For ethical reasons, group sequential trials were introduced to allow trials to stop early in the event of extreme results. Endpoints in such trials are usually mortality or irreversible morbidity. For a given endpoint, the norm is to use a single test statistic and to use that same statistic for each analysis. This approach is risky because the test statistic has to be specified before the study is unblinded, and there is loss in power if the assumptions that ensure optimality for each analysis are not met. To minimize the risk of moderate to substantial loss in power due to a suboptimal choice of a statistic, a robust method was developed for nonsequential trials. The concept is analogous to diversification of financial investments to minimize risk. The method is based on combining P values from multiple test statistics for formal inference while controlling the type I error rate at its designated value.This article evaluates the performance of 2 P value combining methods for group sequential trials. The emphasis is on time to event trials although results from less complex trials are also included. The gain or loss in power with the combination method relative to a single statistic is asymmetric in its favor. Depending on the power of each individual test, the combination method can give more power than any single test or give power that is closer to the test with the most power. The versatility of the method is that it can combine P values from different test statistics for analysis at different times. The robustness of results suggests that inference from group sequential trials can be strengthened with the use of combined tests. Copyright © 2017 John Wiley & Sons, Ltd.

  17. Development of the Statistical Reasoning in Biology Concept Inventory (SRBCI)

    PubMed Central

    Deane, Thomas; Nomme, Kathy; Jeffery, Erica; Pollock, Carol; Birol, Gülnur

    2016-01-01

    We followed established best practices in concept inventory design and developed a 12-item inventory to assess student ability in statistical reasoning in biology (Statistical Reasoning in Biology Concept Inventory [SRBCI]). It is important to assess student thinking in this conceptual area, because it is a fundamental requirement of being statistically literate and associated skills are needed in almost all walks of life. Despite this, previous work shows that non–expert-like thinking in statistical reasoning is common, even after instruction. As science educators, our goal should be to move students along a novice-to-expert spectrum, which could be achieved with growing experience in statistical reasoning. We used item response theory analyses (the one-parameter Rasch model and associated analyses) to assess responses gathered from biology students in two populations at a large research university in Canada in order to test SRBCI’s robustness and sensitivity in capturing useful data relating to the students’ conceptual ability in statistical reasoning. Our analyses indicated that SRBCI is a unidimensional construct, with items that vary widely in difficulty and provide useful information about such student ability. SRBCI should be useful as a diagnostic tool in a variety of biology settings and as a means of measuring the success of teaching interventions designed to improve statistical reasoning skills. PMID:26903497

  18. Exploring the topological sources of robustness against invasion in biological and technological networks.

    PubMed

    Alcalde Cuesta, Fernando; González Sequeiros, Pablo; Lozano Rojo, Álvaro

    2016-02-10

    For a network, the accomplishment of its functions despite perturbations is called robustness. Although this property has been extensively studied, in most cases, the network is modified by removing nodes. In our approach, it is no longer perturbed by site percolation, but evolves after site invasion. The process transforming resident/healthy nodes into invader/mutant/diseased nodes is described by the Moran model. We explore the sources of robustness (or its counterpart, the propensity to spread favourable innovations) of the US high-voltage power grid network, the Internet2 academic network, and the C. elegans connectome. We compare them to three modular and non-modular benchmark networks, and samples of one thousand random networks with the same degree distribution. It is found that, contrary to what happens with networks of small order, fixation probability and robustness are poorly correlated with most of standard statistics, but they depend strongly on the degree distribution. While community detection techniques are able to detect the existence of a central core in Internet2, they are not effective in detecting hierarchical structures whose topological complexity arises from the repetition of a few rules. Box counting dimension and Rent's rule are applied to show a subtle trade-off between topological and wiring complexity.

  19. Exploring the topological sources of robustness against invasion in biological and technological networks

    PubMed Central

    Alcalde Cuesta, Fernando; González Sequeiros, Pablo; Lozano Rojo, Álvaro

    2016-01-01

    For a network, the accomplishment of its functions despite perturbations is called robustness. Although this property has been extensively studied, in most cases, the network is modified by removing nodes. In our approach, it is no longer perturbed by site percolation, but evolves after site invasion. The process transforming resident/healthy nodes into invader/mutant/diseased nodes is described by the Moran model. We explore the sources of robustness (or its counterpart, the propensity to spread favourable innovations) of the US high-voltage power grid network, the Internet2 academic network, and the C. elegans connectome. We compare them to three modular and non-modular benchmark networks, and samples of one thousand random networks with the same degree distribution. It is found that, contrary to what happens with networks of small order, fixation probability and robustness are poorly correlated with most of standard statistics, but they depend strongly on the degree distribution. While community detection techniques are able to detect the existence of a central core in Internet2, they are not effective in detecting hierarchical structures whose topological complexity arises from the repetition of a few rules. Box counting dimension and Rent’s rule are applied to show a subtle trade-off between topological and wiring complexity. PMID:26861189

  20. QSAR modeling for anti-human African trypanosomiasis activity of substituted 2-Phenylimidazopyridines

    NASA Astrophysics Data System (ADS)

    Masand, Vijay H.; El-Sayed, Nahed N. E.; Mahajan, Devidas T.; Mercader, Andrew G.; Alafeefy, Ahmed M.; Shibi, I. G.

    2017-02-01

    In the present work, sixty substituted 2-Phenylimidazopyridines previously reported with potent anti-human African trypanosomiasis (HAT) activity were selected to build genetic algorithm (GA) based QSAR models to determine the structural features that have significant correlation with the activity. Multiple QSAR models were built using easily interpretable descriptors that are directly associated with the presence or the absence of a structural scaffold, or a specific atom. All the QSAR models have been thoroughly validated according to the OECD principles. All the QSAR models are statistically very robust (R2 = 0.80-0.87) with high external predictive ability (CCCex = 0.81-0.92). The QSAR analysis reveals that the HAT activity has good correlation with the presence of five membered rings in the molecule.

  1. Wave Propagation in Non-Stationary Statistical Mantle Models at the Global Scale

    NASA Astrophysics Data System (ADS)

    Meschede, M.; Romanowicz, B. A.

    2014-12-01

    We study the effect of statistically distributed heterogeneities that are smaller than the resolution of current tomographic models on seismic waves that propagate through the Earth's mantle at teleseismic distances. Current global tomographic models are missing small-scale structure as evidenced by the failure of even accurate numerical synthetics to explain enhanced coda in observed body and surface waveforms. One way to characterize small scale heterogeneity is to construct random models and confront observed coda waveforms with predictions from these models. Statistical studies of the coda typically rely on models with simplified isotropic and stationary correlation functions in Cartesian geometries. We show how to construct more complex random models for the mantle that can account for arbitrary non-stationary and anisotropic correlation functions as well as for complex geometries. Although this method is computationally heavy, model characteristics such as translational, cylindrical or spherical symmetries can be used to greatly reduce the complexity such that this method becomes practical. With this approach, we can create 3D models of the full spherical Earth that can be radially anisotropic, i.e. with different horizontal and radial correlation functions, and radially non-stationary, i.e. with radially varying model power and correlation functions. Both of these features are crucial for a statistical description of the mantle in which structure depends to first order on the spherical geometry of the Earth. We combine different random model realizations of S velocity with current global tomographic models that are robust at long wavelengths (e.g. Meschede and Romanowicz, 2014, GJI submitted), and compute the effects of these hybrid models on the wavefield with a spectral element code (SPECFEM3D_GLOBE). We finally analyze the resulting coda waves for our model selection and compare our computations with observations. Based on these observations, we make predictions about the strength of unresolved small-scale structure and extrinsic attenuation.

  2. System level modeling and component level control of fuel cells

    NASA Astrophysics Data System (ADS)

    Xue, Xingjian

    This dissertation investigates the fuel cell systems and the related technologies in three aspects: (1) system-level dynamic modeling of both PEM fuel cell (PEMFC) and solid oxide fuel cell (SOFC); (2) condition monitoring scheme development of PEM fuel cell system using model-based statistical method; and (3) strategy and algorithm development of precision control with potential application in energy systems. The dissertation first presents a system level dynamic modeling strategy for PEM fuel cells. It is well known that water plays a critical role in PEM fuel cell operations. It makes the membrane function appropriately and improves the durability. The low temperature operating conditions, however, impose modeling difficulties in characterizing the liquid-vapor two phase change phenomenon, which becomes even more complex under dynamic operating conditions. This dissertation proposes an innovative method to characterize this phenomenon, and builds a comprehensive model for PEM fuel cell at the system level. The model features the complete characterization of multi-physics dynamic coupling effects with the inclusion of dynamic phase change. The model is validated using Ballard stack experimental result from open literature. The system behavior and the internal coupling effects are also investigated using this model under various operating conditions. Anode-supported tubular SOFC is also investigated in the dissertation. While the Nernst potential plays a central role in characterizing the electrochemical performance, the traditional Nernst equation may lead to incorrect analysis results under dynamic operating conditions due to the current reverse flow phenomenon. This dissertation presents a systematic study in this regard to incorporate a modified Nernst potential expression and the heat/mass transfer into the analysis. The model is used to investigate the limitations and optimal results of various operating conditions; it can also be utilized to perform the optimal design of tubular SOFC. With the system-level dynamic model as a basis, a framework for the robust, online monitoring of PEM fuel cell is developed in the dissertation. The monitoring scheme employs the Hotelling T2 based statistical scheme to handle the measurement noise and system uncertainties and identifies the fault conditions through a series of self-checking and conformal testing. A statistical sampling strategy is also utilized to improve the computation efficiency. Fuel/gas flow control is the fundamental operation for fuel cell energy systems. In the final part of the dissertation, a high-precision and robust tracking control scheme using piezoelectric actuator circuit with direct hysteresis compensation is developed. The key characteristic of the developed control algorithm includes the nonlinear continuous control action with the adaptive boundary layer strategy.

  3. Robust inference from multiple test statistics via permutations: a better alternative to the single test statistic approach for randomized trials.

    PubMed

    Ganju, Jitendra; Yu, Xinxin; Ma, Guoguang Julie

    2013-01-01

    Formal inference in randomized clinical trials is based on controlling the type I error rate associated with a single pre-specified statistic. The deficiency of using just one method of analysis is that it depends on assumptions that may not be met. For robust inference, we propose pre-specifying multiple test statistics and relying on the minimum p-value for testing the null hypothesis of no treatment effect. The null hypothesis associated with the various test statistics is that the treatment groups are indistinguishable. The critical value for hypothesis testing comes from permutation distributions. Rejection of the null hypothesis when the smallest p-value is less than the critical value controls the type I error rate at its designated value. Even if one of the candidate test statistics has low power, the adverse effect on the power of the minimum p-value statistic is not much. Its use is illustrated with examples. We conclude that it is better to rely on the minimum p-value rather than a single statistic particularly when that single statistic is the logrank test, because of the cost and complexity of many survival trials. Copyright © 2013 John Wiley & Sons, Ltd.

  4. A microscopic model of the Stokes-Einstein relation in arbitrary dimension.

    PubMed

    Charbonneau, Benoit; Charbonneau, Patrick; Szamel, Grzegorz

    2018-06-14

    The Stokes-Einstein relation (SER) is one of the most robust and widely employed results from the theory of liquids. Yet sizable deviations can be observed for self-solvation, which cannot be explained by the standard hydrodynamic derivation. Here, we revisit the work of Masters and Madden [J. Chem. Phys. 74, 2450-2459 (1981)], who first solved a statistical mechanics model of the SER using the projection operator formalism. By generalizing their analysis to all spatial dimensions and to partially structured solvents, we identify a potential microscopic origin of some of these deviations. We also reproduce the SER-like result from the exact dynamics of infinite-dimensional fluids.

  5. Semantic Indexing of Multimedia Content Using Visual, Audio, and Text Cues

    NASA Astrophysics Data System (ADS)

    Adams, W. H.; Iyengar, Giridharan; Lin, Ching-Yung; Naphade, Milind Ramesh; Neti, Chalapathy; Nock, Harriet J.; Smith, John R.

    2003-12-01

    We present a learning-based approach to the semantic indexing of multimedia content using cues derived from audio, visual, and text features. We approach the problem by developing a set of statistical models for a predefined lexicon. Novel concepts are then mapped in terms of the concepts in the lexicon. To achieve robust detection of concepts, we exploit features from multiple modalities, namely, audio, video, and text. Concept representations are modeled using Gaussian mixture models (GMM), hidden Markov models (HMM), and support vector machines (SVM). Models such as Bayesian networks and SVMs are used in a late-fusion approach to model concepts that are not explicitly modeled in terms of features. Our experiments indicate promise in the proposed classification and fusion methodologies: our proposed fusion scheme achieves more than 10% relative improvement over the best unimodal concept detector.

  6. Crises and Collective Socio-Economic Phenomena: Simple Models and Challenges

    NASA Astrophysics Data System (ADS)

    Bouchaud, Jean-Philippe

    2013-05-01

    Financial and economic history is strewn with bubbles and crashes, booms and busts, crises and upheavals of all sorts. Understanding the origin of these events is arguably one of the most important problems in economic theory. In this paper, we review recent efforts to include heterogeneities and interactions in models of decision. We argue that the so-called Random Field Ising model ( rfim) provides a unifying framework to account for many collective socio-economic phenomena that lead to sudden ruptures and crises. We discuss different models that can capture potentially destabilizing self-referential feedback loops, induced either by herding, i.e. reference to peers, or trending, i.e. reference to the past, and that account for some of the phenomenology missing in the standard models. We discuss some empirically testable predictions of these models, for example robust signatures of rfim-like herding effects, or the logarithmic decay of spatial correlations of voting patterns. One of the most striking result, inspired by statistical physics methods, is that Adam Smith's invisible hand can fail badly at solving simple coordination problems. We also insist on the issue of time-scales, that can be extremely long in some cases, and prevent socially optimal equilibria from being reached. As a theoretical challenge, the study of so-called "detailed-balance" violating decision rules is needed to decide whether conclusions based on current models (that all assume detailed-balance) are indeed robust and generic.

  7. The case for increasing the statistical power of eddy covariance ecosystem studies: why, where and how?

    PubMed

    Hill, Timothy; Chocholek, Melanie; Clement, Robert

    2017-06-01

    Eddy covariance (EC) continues to provide invaluable insights into the dynamics of Earth's surface processes. However, despite its many strengths, spatial replication of EC at the ecosystem scale is rare. High equipment costs are likely to be partially responsible. This contributes to the low sampling, and even lower replication, of ecoregions in Africa, Oceania (excluding Australia) and South America. The level of replication matters as it directly affects statistical power. While the ergodicity of turbulence and temporal replication allow an EC tower to provide statistically robust flux estimates for its footprint, these principles do not extend to larger ecosystem scales. Despite the challenge of spatially replicating EC, it is clearly of interest to be able to use EC to provide statistically robust flux estimates for larger areas. We ask: How much spatial replication of EC is required for statistical confidence in our flux estimates of an ecosystem? We provide the reader with tools to estimate the number of EC towers needed to achieve a given statistical power. We show that for a typical ecosystem, around four EC towers are needed to have 95% statistical confidence that the annual flux of an ecosystem is nonzero. Furthermore, if the true flux is small relative to instrument noise and spatial variability, the number of towers needed can rise dramatically. We discuss approaches for improving statistical power and describe one solution: an inexpensive EC system that could help by making spatial replication more affordable. However, we note that diverting limited resources from other key measurements in order to allow spatial replication may not be optimal, and a balance needs to be struck. While individual EC towers are well suited to providing fluxes from the flux footprint, we emphasize that spatial replication is essential for statistically robust fluxes if a wider ecosystem is being studied. © 2016 The Authors Global Change Biology Published by John Wiley & Sons Ltd.

  8. Optimizing ELISAs for precision and robustness using laboratory automation and statistical design of experiments.

    PubMed

    Joelsson, Daniel; Moravec, Phil; Troutman, Matthew; Pigeon, Joseph; DePhillips, Pete

    2008-08-20

    Transferring manual ELISAs to automated platforms requires optimizing the assays for each particular robotic platform. These optimization experiments are often time consuming and difficult to perform using a traditional one-factor-at-a-time strategy. In this manuscript we describe the development of an automated process using statistical design of experiments (DOE) to quickly optimize immunoassays for precision and robustness on the Tecan EVO liquid handler. By using fractional factorials and a split-plot design, five incubation time variables and four reagent concentration variables can be optimized in a short period of time.

  9. DARHT Multi-intelligence Seismic and Acoustic Data Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stevens, Garrison Nicole; Van Buren, Kendra Lu; Hemez, Francois M.

    The purpose of this report is to document the analysis of seismic and acoustic data collected at the Dual-Axis Radiographic Hydrodynamic Test (DARHT) facility at Los Alamos National Laboratory for robust, multi-intelligence decision making. The data utilized herein is obtained from two tri-axial seismic sensors and three acoustic sensors, resulting in a total of nine data channels. The goal of this analysis is to develop a generalized, automated framework to determine internal operations at DARHT using informative features extracted from measurements collected external of the facility. Our framework involves four components: (1) feature extraction, (2) data fusion, (3) classification, andmore » finally (4) robustness analysis. Two approaches are taken for extracting features from the data. The first of these, generic feature extraction, involves extraction of statistical features from the nine data channels. The second approach, event detection, identifies specific events relevant to traffic entering and leaving the facility as well as explosive activities at DARHT and nearby explosive testing sites. Event detection is completed using a two stage method, first utilizing signatures in the frequency domain to identify outliers and second extracting short duration events of interest among these outliers by evaluating residuals of an autoregressive exogenous time series model. Features extracted from each data set are then fused to perform analysis with a multi-intelligence paradigm, where information from multiple data sets are combined to generate more information than available through analysis of each independently. The fused feature set is used to train a statistical classifier and predict the state of operations to inform a decision maker. We demonstrate this classification using both generic statistical features and event detection and provide a comparison of the two methods. Finally, the concept of decision robustness is presented through a preliminary analysis where uncertainty is added to the system through noise in the measurements.« less

  10. Effects and detection of raw material variability on the performance of near-infrared calibration models for pharmaceutical products.

    PubMed

    Igne, Benoit; Shi, Zhenqi; Drennen, James K; Anderson, Carl A

    2014-02-01

    The impact of raw material variability on the prediction ability of a near-infrared calibration model was studied. Calibrations, developed from a quaternary mixture design comprising theophylline anhydrous, lactose monohydrate, microcrystalline cellulose, and soluble starch, were challenged by intentional variation of raw material properties. A design with two theophylline physical forms, three lactose particle sizes, and two starch manufacturers was created to test model robustness. Further challenges to the models were accomplished through environmental conditions. Along with full-spectrum partial least squares (PLS) modeling, variable selection by dynamic backward PLS and genetic algorithms was utilized in an effort to mitigate the effects of raw material variability. In addition to evaluating models based on their prediction statistics, prediction residuals were analyzed by analyses of variance and model diagnostics (Hotelling's T(2) and Q residuals). Full-spectrum models were significantly affected by lactose particle size. Models developed by selecting variables gave lower prediction errors and proved to be a good approach to limit the effect of changing raw material characteristics. Hotelling's T(2) and Q residuals provided valuable information that was not detectable when studying only prediction trends. Diagnostic statistics were demonstrated to be critical in the appropriate interpretation of the prediction of quality parameters. © 2013 Wiley Periodicals, Inc. and the American Pharmacists Association.

  11. No-Reference Video Quality Assessment Based on Statistical Analysis in 3D-DCT Domain.

    PubMed

    Li, Xuelong; Guo, Qun; Lu, Xiaoqiang

    2016-05-13

    It is an important task to design models for universal no-reference video quality assessment (NR-VQA) in multiple video processing and computer vision applications. However, most existing NR-VQA metrics are designed for specific distortion types which are not often aware in practical applications. A further deficiency is that the spatial and temporal information of videos is hardly considered simultaneously. In this paper, we propose a new NR-VQA metric based on the spatiotemporal natural video statistics (NVS) in 3D discrete cosine transform (3D-DCT) domain. In the proposed method, a set of features are firstly extracted based on the statistical analysis of 3D-DCT coefficients to characterize the spatiotemporal statistics of videos in different views. These features are used to predict the perceived video quality via the efficient linear support vector regression (SVR) model afterwards. The contributions of this paper are: 1) we explore the spatiotemporal statistics of videos in 3DDCT domain which has the inherent spatiotemporal encoding advantage over other widely used 2D transformations; 2) we extract a small set of simple but effective statistical features for video visual quality prediction; 3) the proposed method is universal for multiple types of distortions and robust to different databases. The proposed method is tested on four widely used video databases. Extensive experimental results demonstrate that the proposed method is competitive with the state-of-art NR-VQA metrics and the top-performing FR-VQA and RR-VQA metrics.

  12. OPTIMA: sensitive and accurate whole-genome alignment of error-prone genomic maps by combinatorial indexing and technology-agnostic statistical analysis.

    PubMed

    Verzotto, Davide; M Teo, Audrey S; Hillmer, Axel M; Nagarajan, Niranjan

    2016-01-01

    Resolution of complex repeat structures and rearrangements in the assembly and analysis of large eukaryotic genomes is often aided by a combination of high-throughput sequencing and genome-mapping technologies (for example, optical restriction mapping). In particular, mapping technologies can generate sparse maps of large DNA fragments (150 kilo base pairs (kbp) to 2 Mbp) and thus provide a unique source of information for disambiguating complex rearrangements in cancer genomes. Despite their utility, combining high-throughput sequencing and mapping technologies has been challenging because of the lack of efficient and sensitive map-alignment algorithms for robustly aligning error-prone maps to sequences. We introduce a novel seed-and-extend glocal (short for global-local) alignment method, OPTIMA (and a sliding-window extension for overlap alignment, OPTIMA-Overlap), which is the first to create indexes for continuous-valued mapping data while accounting for mapping errors. We also present a novel statistical model, agnostic with respect to technology-dependent error rates, for conservatively evaluating the significance of alignments without relying on expensive permutation-based tests. We show that OPTIMA and OPTIMA-Overlap outperform other state-of-the-art approaches (1.6-2 times more sensitive) and are more efficient (170-200 %) and precise in their alignments (nearly 99 % precision). These advantages are independent of the quality of the data, suggesting that our indexing approach and statistical evaluation are robust, provide improved sensitivity and guarantee high precision.

  13. 3D shape recovery from image focus using gray level co-occurrence matrix

    NASA Astrophysics Data System (ADS)

    Mahmood, Fahad; Munir, Umair; Mehmood, Fahad; Iqbal, Javaid

    2018-04-01

    Recovering a precise and accurate 3-D shape of the target object utilizing robust 3-D shape recovery algorithm is an ultimate objective of computer vision community. Focus measure algorithm plays an important role in this architecture which convert the color values of each pixel of the acquired 2-D image dataset into corresponding focus values. After convolving the focus measure filter with the input 2-D image dataset, a 3-D shape recovery approach is applied which will recover the depth map. In this document, we are concerned with proposing Gray Level Co-occurrence Matrix along with its statistical features for computing the focus information of the image dataset. The Gray Level Co-occurrence Matrix quantifies the texture present in the image using statistical features and then applies joint probability distributive function of the gray level pairs of the input image. Finally, we quantify the focus value of the input image using Gaussian Mixture Model. Due to its little computational complexity, sharp focus measure curve, robust to random noise sources and accuracy, it is considered as superior alternative to most of recently proposed 3-D shape recovery approaches. This algorithm is deeply investigated on real image sequences and synthetic image dataset. The efficiency of the proposed scheme is also compared with the state of art 3-D shape recovery approaches. Finally, by means of two global statistical measures, root mean square error and correlation, we claim that this approach -in spite of simplicity generates accurate results.

  14. Logistic regression applied to natural hazards: rare event logistic regression with replications

    NASA Astrophysics Data System (ADS)

    Guns, M.; Vanacker, V.

    2012-06-01

    Statistical analysis of natural hazards needs particular attention, as most of these phenomena are rare events. This study shows that the ordinary rare event logistic regression, as it is now commonly used in geomorphologic studies, does not always lead to a robust detection of controlling factors, as the results can be strongly sample-dependent. In this paper, we introduce some concepts of Monte Carlo simulations in rare event logistic regression. This technique, so-called rare event logistic regression with replications, combines the strength of probabilistic and statistical methods, and allows overcoming some of the limitations of previous developments through robust variable selection. This technique was here developed for the analyses of landslide controlling factors, but the concept is widely applicable for statistical analyses of natural hazards.

  15. Rocks and Rain: orographic precipitation and the form of mountain ranges

    NASA Astrophysics Data System (ADS)

    Roe, G. H.; Anders, A. M.; Durran, D. R.; Montgomery, D. R.; Hallet, B.

    2005-12-01

    In mountainous landscapes patterns of erosion reflect patterns of precipitation that are, in turn, controlled by the orography. Ultimately therefore, the feedbacks between orography and the climate it creates are responsible for the sculpting of mountain ranges. Key questions concerning these interactions are: 1) how robust are patterns of precipitation on geologic time scales? and 2) how do those patterns affect landscape form? Since climate is by definition the statistics of weather, there is tremendous information to be gleaned from how patterns of precipitation vary between different weather events. However up to now sparse measurements and computational limitations have hampered our knowledge of such variations. For the Olympics in Washington State, a characteristic midlatitude mountain range, we report results from a high-resolution, state-of-the-art numerical weather prediction model and a dense network of precipitation gauges. Down to scales around 10 km, the patterns of precipitation are remarkably robust both storm-by-storm and year-to-year, lending confidence that they are indeed persistent on the relevant time scales. Secondly, the consequences of the coupled interactions are presented using a landscape evolution model coupled with a simple model of orographic precipitation that is able to substantially reproduce the observed precipitation patterns.

  16. Incorporating uncertainty into the ranking of SPARROW model nutrient yields from Mississippi/Atchafalaya River basin watersheds

    USGS Publications Warehouse

    Robertson, Dale M.; Schwarz, Gregory E.; Saad, David A.; Alexander, Richard B.

    2009-01-01

    Excessive loads of nutrients transported by tributary rivers have been linked to hypoxia in the Gulf of Mexico. Management efforts to reduce the hypoxic zone in the Gulf of Mexico and improve the water quality of rivers and streams could benefit from targeting nutrient reductions toward watersheds with the highest nutrient yields delivered to sensitive downstream waters. One challenge is that most conventional watershed modeling approaches (e.g., mechanistic models) used in these management decisions do not consider uncertainties in the predictions of nutrient yields and their downstream delivery. The increasing use of parameter estimation procedures to statistically estimate model coefficients, however, allows uncertainties in these predictions to be reliably estimated. Here, we use a robust bootstrapping procedure applied to the results of a previous application of the hybrid statistical/mechanistic watershed model SPARROW (Spatially Referenced Regression On Watershed attributes) to develop a statistically reliable method for identifying “high priority” areas for management, based on a probabilistic ranking of delivered nutrient yields from watersheds throughout a basin. The method is designed to be used by managers to prioritize watersheds where additional stream monitoring and evaluations of nutrient-reduction strategies could be undertaken. Our ranking procedure incorporates information on the confidence intervals of model predictions and the corresponding watershed rankings of the delivered nutrient yields. From this quantified uncertainty, we estimate the probability that individual watersheds are among a collection of watersheds that have the highest delivered nutrient yields. We illustrate the application of the procedure to 818 eight-digit Hydrologic Unit Code watersheds in the Mississippi/Atchafalaya River basin by identifying 150 watersheds having the highest delivered nutrient yields to the Gulf of Mexico. Highest delivered yields were from watersheds in the Central Mississippi, Ohio, and Lower Mississippi River basins. With 90% confidence, only a few watersheds can be reliably placed into the highest 150 category; however, many more watersheds can be removed from consideration as not belonging to the highest 150 category. Results from this ranking procedure provide robust information on watershed nutrient yields that can benefit management efforts to reduce nutrient loadings to downstream coastal waters, such as the Gulf of Mexico, or to local receiving streams and reservoirs.

  17. Weak lensing shear and aperture mass from linear to non-linear scales

    NASA Astrophysics Data System (ADS)

    Munshi, Dipak; Valageas, Patrick; Barber, Andrew J.

    2004-05-01

    We describe the predictions for the smoothed weak lensing shear, γs, and aperture mass,Map, of two simple analytical models of the density field: the minimal tree model and the stellar model. Both models give identical results for the statistics of the three-dimensional density contrast smoothed over spherical cells and only differ by the detailed angular dependence of the many-body density correlations. We have shown in previous work that they also yield almost identical results for the probability distribution function (PDF) of the smoothed convergence, κs. We find that the two models give rather close results for both the shear and the positive tail of the aperture mass. However, we note that at small angular scales (θs<~ 2 arcmin) the tail of the PDF, , for negative Map shows a strong variation between the two models, and the stellar model actually breaks down for θs<~ 0.4 arcmin and Map < 0. This shows that the statistics of the aperture mass provides a very precise probe of the detailed structure of the density field, as it is sensitive to both the amplitude and the detailed angular behaviour of the many-body correlations. On the other hand, the minimal tree model shows good agreement with numerical simulations over all the scales and redshifts of interest, while both models provide a good description of the PDF, , of the smoothed shear components. Therefore, the shear and the aperture mass provide robust and complementary tools to measure the cosmological parameters as well as the detailed statistical properties of the density field.

  18. The VSGB 2.0 Model: A Next Generation Energy Model for High Resolution Protein Structure Modeling

    PubMed Central

    Li, Jianing; Abel, Robert; Zhu, Kai; Cao, Yixiang; Zhao, Suwen; Friesner, Richard A.

    2011-01-01

    A novel energy model (VSGB 2.0) for high resolution protein structure modeling is described, which features an optimized implicit solvent model as well as physics-based corrections for hydrogen bonding, π-π interactions, self-contact interactions and hydrophobic interactions. Parameters of the VSGB 2.0 model were fit to a crystallographic database of 2239 single side chain and 100 11–13 residue loop predictions. Combined with an advanced method of sampling and a robust algorithm for protonation state assignment, the VSGB 2.0 model was validated by predicting 115 super long loops up to 20 residues. Despite the dramatically increasing difficulty in reconstructing longer loops, a high accuracy was achieved: all of the lowest energy conformations have global backbone RMSDs better than 2.0 Å from the native conformations. Average global backbone RMSDs of the predictions are 0.51, 0.63, 0.70, 0.62, 0.80, 1.41, and 1.59 Å for 14, 15, 16, 17, 18, 19, and 20 residue loop predictions, respectively. When these results are corrected for possible statistical bias as explained in the text, the average global backbone RMSDs are 0.61, 0.71, 0.86, 0.62, 1.06, 1.67, and 1.59 Å. Given the precision and robustness of the calculations, we believe that the VSGB 2.0 model is suitable to tackle “real” problems, such as biological function modeling and structure-based drug discovery. PMID:21905107

  19. SSD for R: A Comprehensive Statistical Package to Analyze Single-System Data

    ERIC Educational Resources Information Center

    Auerbach, Charles; Schudrich, Wendy Zeitlin

    2013-01-01

    The need for statistical analysis in single-subject designs presents a challenge, as analytical methods that are applied to group comparison studies are often not appropriate in single-subject research. "SSD for R" is a robust set of statistical functions with wide applicability to single-subject research. It is a comprehensive package…

  20. Analysis of genetic effects of nuclear-cytoplasmic interaction on quantitative traits: genetic model for diploid plants.

    PubMed

    Han, Lide; Yang, Jian; Zhu, Jun

    2007-06-01

    A genetic model was proposed for simultaneously analyzing genetic effects of nuclear, cytoplasm, and nuclear-cytoplasmic interaction (NCI) as well as their genotype by environment (GE) interaction for quantitative traits of diploid plants. In the model, the NCI effects were further partitioned into additive and dominance nuclear-cytoplasmic interaction components. Mixed linear model approaches were used for statistical analysis. On the basis of diallel cross designs, Monte Carlo simulations showed that the genetic model was robust for estimating variance components under several situations without specific effects. Random genetic effects were predicted by an adjusted unbiased prediction (AUP) method. Data on four quantitative traits (boll number, lint percentage, fiber length, and micronaire) in Upland cotton (Gossypium hirsutum L.) were analyzed as a worked example to show the effectiveness of the model.

  1. Robust model selection and the statistical classification of languages

    NASA Astrophysics Data System (ADS)

    García, J. E.; González-López, V. A.; Viola, M. L. L.

    2012-10-01

    In this paper we address the problem of model selection for the set of finite memory stochastic processes with finite alphabet, when the data is contaminated. We consider m independent samples, with more than half of them being realizations of the same stochastic process with law Q, which is the one we want to retrieve. We devise a model selection procedure such that for a sample size large enough, the selected process is the one with law Q. Our model selection strategy is based on estimating relative entropies to select a subset of samples that are realizations of the same law. Although the procedure is valid for any family of finite order Markov models, we will focus on the family of variable length Markov chain models, which include the fixed order Markov chain model family. We define the asymptotic breakdown point (ABDP) for a model selection procedure, and we show the ABDP for our procedure. This means that if the proportion of contaminated samples is smaller than the ABDP, then, as the sample size grows our procedure selects a model for the process with law Q. We also use our procedure in a setting where we have one sample conformed by the concatenation of sub-samples of two or more stochastic processes, with most of the subsamples having law Q. We conducted a simulation study. In the application section we address the question of the statistical classification of languages according to their rhythmic features using speech samples. This is an important open problem in phonology. A persistent difficulty on this problem is that the speech samples correspond to several sentences produced by diverse speakers, corresponding to a mixture of distributions. The usual procedure to deal with this problem has been to choose a subset of the original sample which seems to best represent each language. The selection is made by listening to the samples. In our application we use the full dataset without any preselection of samples. We apply our robust methodology estimating a model which represent the main law for each language. Our findings agree with the linguistic conjecture, related to the rhythm of the languages included on our dataset.

  2. A nonlinear isobologram model with Box-Cox transformation to both sides for chemical mixtures.

    PubMed

    Chen, D G; Pounds, J G

    1998-12-01

    The linear logistical isobologram is a commonly used and powerful graphical and statistical tool for analyzing the combined effects of simple chemical mixtures. In this paper a nonlinear isobologram model is proposed to analyze the joint action of chemical mixtures for quantitative dose-response relationships. This nonlinear isobologram model incorporates two additional new parameters, Ymin and Ymax, to facilitate analysis of response data that are not constrained between 0 and 1, where parameters Ymin and Ymax represent the minimal and the maximal observed toxic response. This nonlinear isobologram model for binary mixtures can be expressed as [formula: see text] In addition, a Box-Cox transformation to both sides is introduced to improve the goodness of fit and to provide a more robust model for achieving homogeneity and normality of the residuals. Finally, a confidence band is proposed for selected isobols, e.g., the median effective dose, to facilitate graphical and statistical analysis of the isobologram. The versatility of this approach is demonstrated using published data describing the toxicity of the binary mixtures of citrinin and ochratoxin as well as a new experimental data from our laboratory for mixtures of mercury and cadmium.

  3. A nonlinear isobologram model with Box-Cox transformation to both sides for chemical mixtures.

    PubMed Central

    Chen, D G; Pounds, J G

    1998-01-01

    The linear logistical isobologram is a commonly used and powerful graphical and statistical tool for analyzing the combined effects of simple chemical mixtures. In this paper a nonlinear isobologram model is proposed to analyze the joint action of chemical mixtures for quantitative dose-response relationships. This nonlinear isobologram model incorporates two additional new parameters, Ymin and Ymax, to facilitate analysis of response data that are not constrained between 0 and 1, where parameters Ymin and Ymax represent the minimal and the maximal observed toxic response. This nonlinear isobologram model for binary mixtures can be expressed as [formula: see text] In addition, a Box-Cox transformation to both sides is introduced to improve the goodness of fit and to provide a more robust model for achieving homogeneity and normality of the residuals. Finally, a confidence band is proposed for selected isobols, e.g., the median effective dose, to facilitate graphical and statistical analysis of the isobologram. The versatility of this approach is demonstrated using published data describing the toxicity of the binary mixtures of citrinin and ochratoxin as well as a new experimental data from our laboratory for mixtures of mercury and cadmium. PMID:9860894

  4. Multi-Hypothesis Modelling Capabilities for Robust Data-Model Integration

    NASA Astrophysics Data System (ADS)

    Walker, A. P.; De Kauwe, M. G.; Lu, D.; Medlyn, B.; Norby, R. J.; Ricciuto, D. M.; Rogers, A.; Serbin, S.; Weston, D. J.; Ye, M.; Zaehle, S.

    2017-12-01

    Large uncertainty is often inherent in model predictions due to imperfect knowledge of how to describe the mechanistic processes (hypotheses) that a model is intended to represent. Yet this model hypothesis uncertainty (MHU) is often overlooked or informally evaluated, as methods to quantify and evaluate MHU are limited. MHU is increased as models become more complex because each additional processes added to a model comes with inherent MHU as well as parametric unceratinty. With the current trend of adding more processes to Earth System Models (ESMs), we are adding uncertainty, which can be quantified for parameters but not MHU. Model inter-comparison projects do allow for some consideration of hypothesis uncertainty but in an ad hoc and non-independent fashion. This has stymied efforts to evaluate ecosystem models against data and intepret the results mechanistically because it is not simple to interpret exactly why a model is producing the results it does and identify which model assumptions are key as they combine models of many sub-systems and processes, each of which may be conceptualised and represented mathematically in various ways. We present a novel modelling framework—the multi-assumption architecture and testbed (MAAT)—that automates the combination, generation, and execution of a model ensemble built with different representations of process. We will present the argument that multi-hypothesis modelling needs to be considered in conjunction with other capabilities (e.g. the Predictive Ecosystem Analyser; PecAn) and statistical methods (e.g. sensitivity anaylsis, data assimilation) to aid efforts in robust data model integration to enhance our predictive understanding of biological systems.

  5. A robust design mark-resight abundance estimator allowing heterogeneity in resighting probabilities

    USGS Publications Warehouse

    McClintock, B.T.; White, Gary C.; Burnham, K.P.

    2006-01-01

    This article introduces the beta-binomial estimator (BBE), a closed-population abundance mark-resight model combining the favorable qualities of maximum likelihood theory and the allowance of individual heterogeneity in sighting probability (p). The model may be parameterized for a robust sampling design consisting of multiple primary sampling occasions where closure need not be met between primary occasions. We applied the model to brown bear data from three study areas in Alaska and compared its performance to the joint hypergeometric estimator (JHE) and Bowden's estimator (BOWE). BBE estimates suggest heterogeneity levels were non-negligible and discourage the use of JHE for these data. Compared to JHE and BOWE, confidence intervals were considerably shorter for the AICc model-averaged BBE. To evaluate the properties of BBE relative to JHE and BOWE when sample sizes are small, simulations were performed with data from three primary occasions generated under both individual heterogeneity and temporal variation in p. All models remained consistent regardless of levels of variation in p. In terms of precision, the AICc model-averaged BBE showed advantages over JHE and BOWE when heterogeneity was present and mean sighting probabilities were similar between primary occasions. Based on the conditions examined, BBE is a reliable alternative to JHE or BOWE and provides a framework for further advances in mark-resight abundance estimation. ?? 2006 American Statistical Association and the International Biometric Society.

  6. The Role of Design-of-Experiments in Managing Flow in Compact Air Vehicle Inlets

    NASA Technical Reports Server (NTRS)

    Anderson, Bernhard H.; Miller, Daniel N.; Gridley, Marvin C.; Agrell, Johan

    2003-01-01

    It is the purpose of this study to demonstrate the viability and economy of Design-of-Experiments methodologies to arrive at microscale secondary flow control array designs that maintain optimal inlet performance over a wide range of the mission variables and to explore how these statistical methods provide a better understanding of the management of flow in compact air vehicle inlets. These statistical design concepts were used to investigate the robustness properties of low unit strength micro-effector arrays. Low unit strength micro-effectors are micro-vanes set at very low angles-of-incidence with very long chord lengths. They were designed to influence the near wall inlet flow over an extended streamwise distance, and their advantage lies in low total pressure loss and high effectiveness in managing engine face distortion. The term robustness is used in this paper in the same sense as it is used in the industrial problem solving community. It refers to minimizing the effects of the hard-to-control factors that influence the development of a product or process. In Robustness Engineering, the effects of the hard-to-control factors are often called noise , and the hard-to-control factors themselves are referred to as the environmental variables or sometimes as the Taguchi noise variables. Hence Robust Optimization refers to minimizing the effects of the environmental or noise variables on the development (design) of a product or process. In the management of flow in compact inlets, the environmental or noise variables can be identified with the mission variables. Therefore this paper formulates a statistical design methodology that minimizes the impact of variations in the mission variables on inlet performance and demonstrates that these statistical design concepts can lead to simpler inlet flow management systems.

  7. Reliability of analog quantum simulation

    NASA Astrophysics Data System (ADS)

    Sarovar, Mohan; Zhang, Jun; Zeng, Lishan

    Analog quantum simulators (AQS) will likely be the first nontrivial application of quantum technology for predictive simulation. However, there remain questions regarding the degree of confidence that can be placed in the results of AQS since they do not naturally incorporate error correction. We formalize the notion of AQS reliability to calibration errors by determining sensitivity of AQS outputs to underlying parameters, and formulate conditions for robust simulation. Our approach connects to the notion of parameter space compression in statistical physics and naturally reveals the importance of model symmetries in dictating the robust properties. This work was supported by the Laboratory Directed Research and Development program at Sandia National Laboratories. Sandia National Laboratories is a multi-mission laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the United States Department of Energy's National Nuclear Security Administration under Contract No. DE-AC04-94AL85000.

  8. The HTM Spatial Pooler-A Neocortical Algorithm for Online Sparse Distributed Coding.

    PubMed

    Cui, Yuwei; Ahmad, Subutai; Hawkins, Jeff

    2017-01-01

    Hierarchical temporal memory (HTM) provides a theoretical framework that models several key computational principles of the neocortex. In this paper, we analyze an important component of HTM, the HTM spatial pooler (SP). The SP models how neurons learn feedforward connections and form efficient representations of the input. It converts arbitrary binary input patterns into sparse distributed representations (SDRs) using a combination of competitive Hebbian learning rules and homeostatic excitability control. We describe a number of key properties of the SP, including fast adaptation to changing input statistics, improved noise robustness through learning, efficient use of cells, and robustness to cell death. In order to quantify these properties we develop a set of metrics that can be directly computed from the SP outputs. We show how the properties are met using these metrics and targeted artificial simulations. We then demonstrate the value of the SP in a complete end-to-end real-world HTM system. We discuss the relationship with neuroscience and previous studies of sparse coding. The HTM spatial pooler represents a neurally inspired algorithm for learning sparse representations from noisy data streams in an online fashion.

  9. PAKDD Data Mining Competition 2009: New Ways of Using Known Methods

    NASA Astrophysics Data System (ADS)

    Linhart, Chaim; Harari, Guy; Abramovich, Sharon; Buchris, Altina

    The PAKDD 2009 competition focuses on the problem of credit risk assessment. As required, we had to confront the problem of the robustness of the credit-scoring model against performance degradation caused by gradual market changes along a few years of business operation. We utilized the following standard models: logistic regression, KNN, SVM, GBM and decision tree. The novelty of our approach is two-fold: the integration of existing models, namely feeding the results of KNN as an input variable to the logistic regression, and re-coding categorical variables as numerical values that represent each category's statistical impact on the target label. The best solution we obtained reached 3rd place in the competition, with an AUC score of 0.655.

  10. A geostatistical extreme-value framework for fast simulation of natural hazard events

    PubMed Central

    Stephenson, David B.

    2016-01-01

    We develop a statistical framework for simulating natural hazard events that combines extreme value theory and geostatistics. Robust generalized additive model forms represent generalized Pareto marginal distribution parameters while a Student’s t-process captures spatial dependence and gives a continuous-space framework for natural hazard event simulations. Efficiency of the simulation method allows many years of data (typically over 10 000) to be obtained at relatively little computational cost. This makes the model viable for forming the hazard module of a catastrophe model. We illustrate the framework by simulating maximum wind gusts for European windstorms, which are found to have realistic marginal and spatial properties, and validate well against wind gust measurements. PMID:27279768

  11. Impact of baryonic physics on intrinsic alignments

    DOE PAGES

    Tenneti, Ananth; Gnedin, Nickolay Y.; Feng, Yu

    2017-01-11

    We explore the effects of specific assumptions in the subgrid models of star formation and stellar and AGN feedback on intrinsic alignments of galaxies in cosmological simulations of "MassiveBlack-II" family. Using smaller volume simulations, we explored the parameter space of the subgrid star formation and feedback model and found remarkable robustness of the observable statistical measures to the details of subgrid physics. The one observational probe most sensitive to modeling details is the distribution of misalignment angles. We hypothesize that the amount of angular momentum carried away by the galactic wind is the primary physical quantity that controls the orientationmore » of the stellar distribution. Finally, our results are also consistent with a similar study by the EAGLE simulation team.« less

  12. Gaussian process-based surrogate modeling framework for process planning in laser powder-bed fusion additive manufacturing of 316L stainless steel

    DOE PAGES

    Tapia, Gustavo; Khairallah, Saad A.; Matthews, Manyalibo J.; ...

    2017-09-22

    Here, Laser Powder-Bed Fusion (L-PBF) metal-based additive manufacturing (AM) is complex and not fully understood. Successful processing for one material, might not necessarily apply to a different material. This paper describes a workflow process that aims at creating a material data sheet standard that describes regimes where the process can be expected to be robust. The procedure consists of building a Gaussian process-based surrogate model of the L-PBF process that predicts melt pool depth in single-track experiments given a laser power, scan speed, and laser beam size combination. The predictions are then mapped onto a power versus scan speed diagrammore » delimiting the conduction from the keyhole melting controlled regimes. This statistical framework is shown to be robust even for cases where experimental training data might be suboptimal in quality, if appropriate physics-based filters are applied. Additionally, it is demonstrated that a high-fidelity simulation model of L-PBF can equally be successfully used for building a surrogate model, which is beneficial since simulations are getting more efficient and are more practical to study the response of different materials, than to re-tool an AM machine for new material powder.« less

  13. A less field-intensive robust design for estimating demographic parameters with Mark-resight data

    USGS Publications Warehouse

    McClintock, B.T.; White, Gary C.

    2009-01-01

    The robust design has become popular among animal ecologists as a means for estimating population abundance and related demographic parameters with mark-recapture data. However, two drawbacks of traditional mark-recapture are financial cost and repeated disturbance to animals. Mark-resight methodology may in many circumstances be a less expensive and less invasive alternative to mark-recapture, but the models developed to date for these data have overwhelmingly concentrated only on the estimation of abundance. Here we introduce a mark-resight model analogous to that used in mark-recapture for the simultaneous estimation of abundance, apparent survival, and transition probabilities between observable and unobservable states. The model may be implemented using standard statistical computing software, but it has also been incorporated into the freeware package Program MARK. We illustrate the use of our model with mainland New Zealand Robin (Petroica australis) data collected to ascertain whether this methodology may be a reliable alternative for monitoring endangered populations of a closely related species inhabiting the Chatham Islands. We found this method to be a viable alternative to traditional mark-recapture when cost or disturbance to species is of particular concern in long-term population monitoring programs. ?? 2009 by the Ecological Society of America.

  14. Gaussian process-based surrogate modeling framework for process planning in laser powder-bed fusion additive manufacturing of 316L stainless steel

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tapia, Gustavo; Khairallah, Saad A.; Matthews, Manyalibo J.

    Here, Laser Powder-Bed Fusion (L-PBF) metal-based additive manufacturing (AM) is complex and not fully understood. Successful processing for one material, might not necessarily apply to a different material. This paper describes a workflow process that aims at creating a material data sheet standard that describes regimes where the process can be expected to be robust. The procedure consists of building a Gaussian process-based surrogate model of the L-PBF process that predicts melt pool depth in single-track experiments given a laser power, scan speed, and laser beam size combination. The predictions are then mapped onto a power versus scan speed diagrammore » delimiting the conduction from the keyhole melting controlled regimes. This statistical framework is shown to be robust even for cases where experimental training data might be suboptimal in quality, if appropriate physics-based filters are applied. Additionally, it is demonstrated that a high-fidelity simulation model of L-PBF can equally be successfully used for building a surrogate model, which is beneficial since simulations are getting more efficient and are more practical to study the response of different materials, than to re-tool an AM machine for new material powder.« less

  15. On adaptive robustness approach to Anti-Jam signal processing

    NASA Astrophysics Data System (ADS)

    Poberezhskiy, Y. S.; Poberezhskiy, G. Y.

    An effective approach to exploiting statistical differences between desired and jamming signals named adaptive robustness is proposed and analyzed in this paper. It combines conventional Bayesian, adaptive, and robust approaches that are complementary to each other. This combining strengthens the advantages and mitigates the drawbacks of the conventional approaches. Adaptive robustness is equally applicable to both jammers and their victim systems. The capabilities required for realization of adaptive robustness in jammers and victim systems are determined. The employment of a specific nonlinear robust algorithm for anti-jam (AJ) processing is described and analyzed. Its effectiveness in practical situations has been proven analytically and confirmed by simulation. Since adaptive robustness can be used by both sides in electronic warfare, it is more advantageous for the fastest and most intelligent side. Many results obtained and discussed in this paper are also applicable to commercial applications such as communications in unregulated or poorly regulated frequency ranges and systems with cognitive capabilities.

  16. Analytical mesoscale modeling of aeolian sand transport

    NASA Astrophysics Data System (ADS)

    Lämmel, Marc; Kroy, Klaus

    2017-11-01

    The mesoscale structure of aeolian sand transport determines a variety of natural phenomena studied in planetary and Earth science. We analyze it theoretically beyond the mean-field level, based on the grain-scale transport kinetics and splash statistics. A coarse-grained analytical model is proposed and verified by numerical simulations resolving individual grain trajectories. The predicted height-resolved sand flux and other important characteristics of the aeolian transport layer agree remarkably well with a comprehensive compilation of field and wind-tunnel data, suggesting that the model robustly captures the essential mesoscale physics. By comparing the predicted saturation length with field data for the minimum sand-dune size, we elucidate the importance of intermittent turbulent wind fluctuations for field measurements and reconcile conflicting previous models for this most enigmatic emergent aeolian scale.

  17. Statistical quality assessment criteria for a linear mixing model with elliptical t-distribution errors

    NASA Astrophysics Data System (ADS)

    Manolakis, Dimitris G.

    2004-10-01

    The linear mixing model is widely used in hyperspectral imaging applications to model the reflectance spectra of mixed pixels in the SWIR atmospheric window or the radiance spectra of plume gases in the LWIR atmospheric window. In both cases it is important to detect the presence of materials or gases and then estimate their amount, if they are present. The detection and estimation algorithms available for these tasks are related but they are not identical. The objective of this paper is to theoretically investigate how the heavy tails observed in hyperspectral background data affect the quality of abundance estimates and how the F-test, used for endmember selection, is robust to the presence of heavy tails when the model fits the data.

  18. The MAX Statistic is Less Powerful for Genome Wide Association Studies Under Most Alternative Hypotheses.

    PubMed

    Shifflett, Benjamin; Huang, Rong; Edland, Steven D

    2017-01-01

    Genotypic association studies are prone to inflated type I error rates if multiple hypothesis testing is performed, e.g., sequentially testing for recessive, multiplicative, and dominant risk. Alternatives to multiple hypothesis testing include the model independent genotypic χ 2 test, the efficiency robust MAX statistic, which corrects for multiple comparisons but with some loss of power, or a single Armitage test for multiplicative trend, which has optimal power when the multiplicative model holds but with some loss of power when dominant or recessive models underlie the genetic association. We used Monte Carlo simulations to describe the relative performance of these three approaches under a range of scenarios. All three approaches maintained their nominal type I error rates. The genotypic χ 2 and MAX statistics were more powerful when testing a strictly recessive genetic effect or when testing a dominant effect when the allele frequency was high. The Armitage test for multiplicative trend was most powerful for the broad range of scenarios where heterozygote risk is intermediate between recessive and dominant risk. Moreover, all tests had limited power to detect recessive genetic risk unless the sample size was large, and conversely all tests were relatively well powered to detect dominant risk. Taken together, these results suggest the general utility of the multiplicative trend test when the underlying genetic model is unknown.

  19. Worrying trends in econophysics

    NASA Astrophysics Data System (ADS)

    Gallegati, Mauro; Keen, Steve; Lux, Thomas; Ormerod, Paul

    2006-10-01

    Econophysics has already made a number of important empirical contributions to our understanding of the social and economic world. These fall mainly into the areas of finance and industrial economics, where in each case there is a large amount of reasonably well-defined data. More recently, Econophysics has also begun to tackle other areas of economics where data is much more sparse and much less reliable. In addition, econophysicists have attempted to apply the theoretical approach of statistical physics to try to understand empirical findings. Our concerns are fourfold. First, a lack of awareness of work that has been done within economics itself. Second, resistance to more rigorous and robust statistical methodology. Third, the belief that universal empirical regularities can be found in many areas of economic activity. Fourth, the theoretical models which are being used to explain empirical phenomena. The latter point is of particular concern. Essentially, the models are based upon models of statistical physics in which energy is conserved in exchange processes. There are examples in economics where the principle of conservation may be a reasonable approximation to reality, such as primitive hunter-gatherer societies. But in the industrialised capitalist economies, income is most definitely not conserved. The process of production and not exchange is responsible for this. Models which focus purely on exchange and not on production cannot by definition offer a realistic description of the generation of income in the capitalist, industrialised economies.

  20. Evaluation of statistical and rainfall-runoff models for predicting historical daily streamflow time series in the Des Moines and Iowa River watersheds

    USGS Publications Warehouse

    Farmer, William H.; Knight, Rodney R.; Eash, David A.; Kasey J. Hutchinson,; Linhart, S. Mike; Christiansen, Daniel E.; Archfield, Stacey A.; Over, Thomas M.; Kiang, Julie E.

    2015-08-24

    Daily records of streamflow are essential to understanding hydrologic systems and managing the interactions between human and natural systems. Many watersheds and locations lack streamgages to provide accurate and reliable records of daily streamflow. In such ungaged watersheds, statistical tools and rainfall-runoff models are used to estimate daily streamflow. Previous work compared 19 different techniques for predicting daily streamflow records in the southeastern United States. Here, five of the better-performing methods are compared in a different hydroclimatic region of the United States, in Iowa. The methods fall into three classes: (1) drainage-area ratio methods, (2) nonlinear spatial interpolations using flow duration curves, and (3) mechanistic rainfall-runoff models. The first two classes are each applied with nearest-neighbor and map-correlated index streamgages. Using a threefold validation and robust rank-based evaluation, the methods are assessed for overall goodness of fit of the hydrograph of daily streamflow, the ability to reproduce a daily, no-fail storage-yield curve, and the ability to reproduce key streamflow statistics. As in the Southeast study, a nonlinear spatial interpolation of daily streamflow using flow duration curves is found to be a method with the best predictive accuracy. Comparisons with previous work in Iowa show that the accuracy of mechanistic models with at-site calibration is substantially degraded in the ungaged framework.

  1. Comparison of robustness to outliers between robust poisson models and log-binomial models when estimating relative risks for common binary outcomes: a simulation study.

    PubMed

    Chen, Wansu; Shi, Jiaxiao; Qian, Lei; Azen, Stanley P

    2014-06-26

    To estimate relative risks or risk ratios for common binary outcomes, the most popular model-based methods are the robust (also known as modified) Poisson and the log-binomial regression. Of the two methods, it is believed that the log-binomial regression yields more efficient estimators because it is maximum likelihood based, while the robust Poisson model may be less affected by outliers. Evidence to support the robustness of robust Poisson models in comparison with log-binomial models is very limited. In this study a simulation was conducted to evaluate the performance of the two methods in several scenarios where outliers existed. The findings indicate that for data coming from a population where the relationship between the outcome and the covariate was in a simple form (e.g. log-linear), the two models yielded comparable biases and mean square errors. However, if the true relationship contained a higher order term, the robust Poisson models consistently outperformed the log-binomial models even when the level of contamination is low. The robust Poisson models are more robust (or less sensitive) to outliers compared to the log-binomial models when estimating relative risks or risk ratios for common binary outcomes. Users should be aware of the limitations when choosing appropriate models to estimate relative risks or risk ratios.

  2. A Simple Simulation Technique for Nonnormal Data with Prespecified Skewness, Kurtosis, and Covariance Matrix.

    PubMed

    Foldnes, Njål; Olsson, Ulf Henning

    2016-01-01

    We present and investigate a simple way to generate nonnormal data using linear combinations of independent generator (IG) variables. The simulated data have prespecified univariate skewness and kurtosis and a given covariance matrix. In contrast to the widely used Vale-Maurelli (VM) transform, the obtained data are shown to have a non-Gaussian copula. We analytically obtain asymptotic robustness conditions for the IG distribution. We show empirically that popular test statistics in covariance analysis tend to reject true models more often under the IG transform than under the VM transform. This implies that overly optimistic evaluations of estimators and fit statistics in covariance structure analysis may be tempered by including the IG transform for nonnormal data generation. We provide an implementation of the IG transform in the R environment.

  3. Statistical analysis of the magnetization signatures of impact basins

    NASA Astrophysics Data System (ADS)

    Gabasova, L. R.; Wieczorek, M. A.

    2017-09-01

    We quantify the magnetic signatures of the largest lunar impact basins using recent mission data and robust statistical bounds, and obtain an early activity timeline for the lunar core dynamo which appears to peak earlier than indicated by Apollo paleointensity measurements.

  4. A UNIFYING CONCEPT FOR ASSESSING TOXICOLOGICAL INTERACTIONS: CHANGES IN SLOPE

    EPA Science Inventory

    Robust statistical methods are important to the evaluation of interactions among chemicals in a mixture. However, different concepts of interaction as applied to the statistical analysis of chemical mixture toxicology data or as used in environmental risk assessment often can ap...

  5. Robust Fixed-Structure Control

    DTIC Science & Technology

    1994-10-30

    Deterministic Foundation for Statistical Energy Analysis ," J. Sound Vibr., to appear. 1.96 D. S. Bernstein and S. P. Bhat, "Lyapunov Stability, Semistability...S. Bernstein, "Power Flow, Energy Balance, and Statistical Energy Analysis for Large Scale, Interconnected Systems," Proc. Amer. Contr. Conf., pp

  6. Field significance of performance measures in the context of regional climate model evaluation. Part 2: precipitation

    NASA Astrophysics Data System (ADS)

    Ivanov, Martin; Warrach-Sagi, Kirsten; Wulfmeyer, Volker

    2018-04-01

    A new approach for rigorous spatial analysis of the downscaling performance of regional climate model (RCM) simulations is introduced. It is based on a multiple comparison of the local tests at the grid cells and is also known as `field' or `global' significance. The block length for the local resampling tests is precisely determined to adequately account for the time series structure. New performance measures for estimating the added value of downscaled data relative to the large-scale forcing fields are developed. The methodology is exemplarily applied to a standard EURO-CORDEX hindcast simulation with the Weather Research and Forecasting (WRF) model coupled with the land surface model NOAH at 0.11 ∘ grid resolution. Daily precipitation climatology for the 1990-2009 period is analysed for Germany for winter and summer in comparison with high-resolution gridded observations from the German Weather Service. The field significance test controls the proportion of falsely rejected local tests in a meaningful way and is robust to spatial dependence. Hence, the spatial patterns of the statistically significant local tests are also meaningful. We interpret them from a process-oriented perspective. While the downscaled precipitation distributions are statistically indistinguishable from the observed ones in most regions in summer, the biases of some distribution characteristics are significant over large areas in winter. WRF-NOAH generates appropriate stationary fine-scale climate features in the daily precipitation field over regions of complex topography in both seasons and appropriate transient fine-scale features almost everywhere in summer. As the added value of global climate model (GCM)-driven simulations cannot be smaller than this perfect-boundary estimate, this work demonstrates in a rigorous manner the clear additional value of dynamical downscaling over global climate simulations. The evaluation methodology has a broad spectrum of applicability as it is distribution-free, robust to spatial dependence, and accounts for time series structure.

  7. A Poisson hierarchical modelling approach to detecting copy number variation in sequence coverage data

    PubMed Central

    2013-01-01

    Background The advent of next generation sequencing technology has accelerated efforts to map and catalogue copy number variation (CNV) in genomes of important micro-organisms for public health. A typical analysis of the sequence data involves mapping reads onto a reference genome, calculating the respective coverage, and detecting regions with too-low or too-high coverage (deletions and amplifications, respectively). Current CNV detection methods rely on statistical assumptions (e.g., a Poisson model) that may not hold in general, or require fine-tuning the underlying algorithms to detect known hits. We propose a new CNV detection methodology based on two Poisson hierarchical models, the Poisson-Gamma and Poisson-Lognormal, with the advantage of being sufficiently flexible to describe different data patterns, whilst robust against deviations from the often assumed Poisson model. Results Using sequence coverage data of 7 Plasmodium falciparum malaria genomes (3D7 reference strain, HB3, DD2, 7G8, GB4, OX005, and OX006), we showed that empirical coverage distributions are intrinsically asymmetric and overdispersed in relation to the Poisson model. We also demonstrated a low baseline false positive rate for the proposed methodology using 3D7 resequencing data and simulation. When applied to the non-reference isolate data, our approach detected known CNV hits, including an amplification of the PfMDR1 locus in DD2 and a large deletion in the CLAG3.2 gene in GB4, and putative novel CNV regions. When compared to the recently available FREEC and cn.MOPS approaches, our findings were more concordant with putative hits from the highest quality array data for the 7G8 and GB4 isolates. Conclusions In summary, the proposed methodology brings an increase in flexibility, robustness, accuracy and statistical rigour to CNV detection using sequence coverage data. PMID:23442253

  8. Statistical modeling of SRAM yield performance and circuit variability

    NASA Astrophysics Data System (ADS)

    Cheng, Qi; Chen, Yijian

    2015-03-01

    In this paper, we develop statistical models to investigate SRAM yield performance and circuit variability in the presence of self-aligned multiple patterning (SAMP) process. It is assumed that SRAM fins are fabricated by a positivetone (spacer is line) self-aligned sextuple patterning (SASP) process which accommodates two types of spacers, while gates are fabricated by a more pitch-relaxed self-aligned quadruple patterning (SAQP) process which only allows one type of spacer. A number of possible inverter and SRAM structures are identified and the related circuit multi-modality is studied using the developed failure-probability and yield models. It is shown that SRAM circuit yield is significantly impacted by the multi-modality of fins' spatial variations in a SRAM cell. The sensitivity of 6-transistor SRAM read/write failure probability to SASP process variations is calculated and the specific circuit type with the highest probability to fail in the reading/writing operation is identified. Our study suggests that the 6-transistor SRAM configuration may not be scalable to 7-nm half pitch and more robust SRAM circuit design needs to be researched.

  9. The Effect of Temperature on the Electricity Demand: An Empirical Investigation

    NASA Astrophysics Data System (ADS)

    Kim, H.; Kim, I. G.; Park, K. J.; Yoo, S. H.

    2015-12-01

    This paper attempts to estimate the electricity demand function in Korea with quarterly data of average temperature, GDP and electricity price over the period 2005-2013. We apply lagged dependent variable model and ordinary least square method as a robust approach to estimating the parameters of the electricity demand function. The results show that short-run price and income elasticities of the electricity demand are estimated to be -0.569 and 0.631 respectively. They are statistically significant at the 1% level. Moreover, long-run income and price elasticities are estimated to be 1.589 and -1.433 respectively. Both of results reveal that the demand for electricity demand is about 15.2℃. It is shown that power of explanation and goodness-of-fit statistics are improved in the use of the lagged dependent variable model rather than conventional model. Acknowledgements: This research was carried out as a part of "Development and application of technology for weather forecast" supported by the 2015 National Institute of Meteorological Research (NIMR) in the Korea Meteorological Administration.

  10. Replication of a gene-environment interaction Via Multimodel inference: additive-genetic variance in adolescents' general cognitive ability increases with family-of-origin socioeconomic status.

    PubMed

    Kirkpatrick, Robert M; McGue, Matt; Iacono, William G

    2015-03-01

    The present study of general cognitive ability attempts to replicate and extend previous investigations of a biometric moderator, family-of-origin socioeconomic status (SES), in a sample of 2,494 pairs of adolescent twins, non-twin biological siblings, and adoptive siblings assessed with individually administered IQ tests. We hypothesized that SES would covary positively with additive-genetic variance and negatively with shared-environmental variance. Important potential confounds unaddressed in some past studies, such as twin-specific effects, assortative mating, and differential heritability by trait level, were found to be negligible. In our main analysis, we compared models by their sample-size corrected AIC, and base our statistical inference on model-averaged point estimates and standard errors. Additive-genetic variance increased with SES-an effect that was statistically significant and robust to model specification. We found no evidence that SES moderated shared-environmental influence. We attempt to explain the inconsistent replication record of these effects, and provide suggestions for future research.

  11. Replication of a Gene-Environment Interaction via Multimodel Inference: Additive-Genetic Variance in Adolescents’ General Cognitive Ability Increases with Family-of-Origin Socioeconomic Status

    PubMed Central

    Kirkpatrick, Robert M.; McGue, Matt; Iacono, William G.

    2015-01-01

    The present study of general cognitive ability attempts to replicate and extend previous investigations of a biometric moderator, family-of-origin socioeconomic status (SES), in a sample of 2,494 pairs of adolescent twins, non-twin biological siblings, and adoptive siblings assessed with individually administered IQ tests. We hypothesized that SES would covary positively with additive-genetic variance and negatively with shared-environmental variance. Important potential confounds unaddressed in some past studies, such as twin-specific effects, assortative mating, and differential heritability by trait level, were found to be negligible. In our main analysis, we compared models by their sample-size corrected AIC, and base our statistical inference on model-averaged point estimates and standard errors. Additive-genetic variance increased with SES—an effect that was statistically significant and robust to model specification. We found no evidence that SES moderated shared-environmental influence. We attempt to explain the inconsistent replication record of these effects, and provide suggestions for future research. PMID:25539975

  12. Statistical molecular design of balanced compound libraries for QSAR modeling.

    PubMed

    Linusson, A; Elofsson, M; Andersson, I E; Dahlgren, M K

    2010-01-01

    A fundamental step in preclinical drug development is the computation of quantitative structure-activity relationship (QSAR) models, i.e. models that link chemical features of compounds with activities towards a target macromolecule associated with the initiation or progression of a disease. QSAR models are computed by combining information on the physicochemical and structural features of a library of congeneric compounds, typically assembled from two or more building blocks, and biological data from one or more in vitro assays. Since the models provide information on features affecting the compounds' biological activity they can be used as guides for further optimization. However, in order for a QSAR model to be relevant to the targeted disease, and drug development in general, the compound library used must contain molecules with balanced variation of the features spanning the chemical space believed to be important for interaction with the biological target. In addition, the assays used must be robust and deliver high quality data that are directly related to the function of the biological target and the associated disease state. In this review, we discuss and exemplify the concept of statistical molecular design (SMD) in the selection of building blocks and final synthetic targets (i.e. compounds to synthesize) to generate information-rich, balanced libraries for biological testing and computation of QSAR models.

  13. A Field Guide to Extra-Tropical Cyclones: Comparing Models to Observations

    NASA Astrophysics Data System (ADS)

    Bauer, M.

    2008-12-01

    Climate it is said is the accumulation of weather. And weather is not the concern of climate models. Justification for this latter sentiment has long hidden behind coarse model resolutions and blunt validation tools based on climatological maps and the like. The spatial-temporal resolutions of today's models and observations are converging onto meteorological scales however, which means that with the correct tools we can test the largely unproven assumption that climate model weather is correct enough, or at least lacks perverting biases, such that its accumulation does in fact result in a robust climate prediction. Towards this effort we introduce a new tool for extracting detailed cyclone statistics from climate model output. These include the usual cyclone distribution statistics (maps, histograms), but also adaptive cyclone- centric composites. We have also created a complementary dataset, The MAP Climatology of Mid-latitude Storminess (MCMS), which provides a detailed 6 hourly assessment of the areas under the influence of mid- latitude cyclones based on Reanalysis products. Using this we then extract complimentary composites from sources such as ISCCP and GPCP to create a large comparative dataset for climate model validation. A demonstration of the potential usefulness of these tools will be shown. dime.giss.nasa.gov/mcms/mcms.html

  14. Consistent integration of experimental and ab initio data into molecular and coarse-grained models

    NASA Astrophysics Data System (ADS)

    Vlcek, Lukas

    As computer simulations are increasingly used to complement or replace experiments, highly accurate descriptions of physical systems at different time and length scales are required to achieve realistic predictions. The questions of how to objectively measure model quality in relation to reference experimental or ab initio data, and how to transition seamlessly between different levels of resolution are therefore of prime interest. To address these issues, we use the concept of statistical distance to define a measure of similarity between statistical mechanical systems, i.e., a model and its target, and show that its minimization leads to general convergence of the systems' measurable properties. Through systematic coarse-graining, we arrive at appropriate expressions for optimization loss functions consistently incorporating microscopic ab initio data as well as macroscopic experimental data. The design of coarse-grained and multiscale models is then based on factoring the model system partition function into terms describing the system at different resolution levels. The optimization algorithm takes advantage of thermodynamic perturbation expressions for fast exploration of the model parameter space, enabling us to scan millions of parameter combinations per hour on a single CPU. The robustness and generality of the new model optimization framework and its efficient implementation are illustrated on selected examples including aqueous solutions, magnetic systems, and metal alloys.

  15. Relaxing the closure assumption in single-season occupancy models: staggered arrival and departure times

    USGS Publications Warehouse

    Kendall, William L.; Hines, James E.; Nichols, James D.; Grant, Evan H. Campbell

    2013-01-01

    Occupancy statistical models that account for imperfect detection have proved very useful in several areas of ecology, including species distribution and spatial dynamics, disease ecology, and ecological responses to climate change. These models are based on the collection of multiple samples at each of a number of sites within a given season, during which it is assumed the species is either absent or present and available for detection while each sample is taken. However, for some species, individuals are only present or available for detection seasonally. We present a statistical model that relaxes the closure assumption within a season by permitting staggered entry and exit times for the species of interest at each site. Based on simulation, our open model eliminates bias in occupancy estimators and in some cases increases precision. The power to detect the violation of closure is high if detection probability is reasonably high. In addition to providing more robust estimation of occupancy, this model permits comparison of phenology across sites, species, or years, by modeling variation in arrival or departure probabilities. In a comparison of four species of amphibians in Maryland we found that two toad species arrived at breeding sites later in the season than a salamander and frog species, and departed from sites earlier.

  16. Open Source Tools for Seismicity Analysis

    NASA Astrophysics Data System (ADS)

    Powers, P.

    2010-12-01

    The spatio-temporal analysis of seismicity plays an important role in earthquake forecasting and is integral to research on earthquake interactions and triggering. For instance, the third version of the Uniform California Earthquake Rupture Forecast (UCERF), currently under development, will use Epidemic Type Aftershock Sequences (ETAS) as a model for earthquake triggering. UCERF will be a "living" model and therefore requires robust, tested, and well-documented ETAS algorithms to ensure transparency and reproducibility. Likewise, as earthquake aftershock sequences unfold, real-time access to high quality hypocenter data makes it possible to monitor the temporal variability of statistical properties such as the parameters of the Omori Law and the Gutenberg Richter b-value. Such statistical properties are valuable as they provide a measure of how much a particular sequence deviates from expected behavior and can be used when assigning probabilities of aftershock occurrence. To address these demands and provide public access to standard methods employed in statistical seismology, we present well-documented, open-source JavaScript and Java software libraries for the on- and off-line analysis of seismicity. The Javascript classes facilitate web-based asynchronous access to earthquake catalog data and provide a framework for in-browser display, analysis, and manipulation of catalog statistics; implementations of this framework will be made available on the USGS Earthquake Hazards website. The Java classes, in addition to providing tools for seismicity analysis, provide tools for modeling seismicity and generating synthetic catalogs. These tools are extensible and will be released as part of the open-source OpenSHA Commons library.

  17. Kidney function changes with aging in adults: comparison between cross-sectional and longitudinal data analyses in renal function assessment.

    PubMed

    Chung, Sang M; Lee, David J; Hand, Austin; Young, Philip; Vaidyanathan, Jayabharathi; Sahajwalla, Chandrahas

    2015-12-01

    The study evaluated whether the renal function decline rate per year with age in adults varies based on two primary statistical analyses: cross-section (CS), using one observation per subject, and longitudinal (LT), using multiple observations per subject over time. A total of 16628 records (3946 subjects; age range 30-92 years) of creatinine clearance and relevant demographic data were used. On average, four samples per subject were collected for up to 2364 days (mean: 793 days). A simple linear regression and random coefficient models were selected for CS and LT analyses, respectively. The renal function decline rates per year were 1.33 and 0.95 ml/min/year for CS and LT analyses, respectively, and were slower when the repeated individual measurements were considered. The study confirms that rates are different based on statistical analyses, and that a statistically robust longitudinal model with a proper sampling design provides reliable individual as well as population estimates of the renal function decline rates per year with age in adults. In conclusion, our findings indicated that one should be cautious in interpreting the renal function decline rate with aging information because its estimation was highly dependent on the statistical analyses. From our analyses, a population longitudinal analysis (e.g. random coefficient model) is recommended if individualization is critical, such as a dose adjustment based on renal function during a chronic therapy. Copyright © 2015 John Wiley & Sons, Ltd.

  18. Novel Kalman filter algorithm for statistical monitoring of extensive landscapes with synoptic sensor data

    Treesearch

    Raymond L. Czaplewski

    2015-01-01

    Wall-to-wall remotely sensed data are increasingly available to monitor landscape dynamics over large geographic areas. However, statistical monitoring programs that use post-stratification cannot fully utilize those sensor data. The Kalman filter (KF) is an alternative statistical estimator. I develop a new KF algorithm that is numerically robust with large numbers of...

  19. Robust detection of multiple sclerosis lesions from intensity-normalized multi-channel MRI

    NASA Astrophysics Data System (ADS)

    Karpate, Yogesh; Commowick, Olivier; Barillot, Christian

    2015-03-01

    Multiple sclerosis (MS) is a disease with heterogeneous evolution among the patients. Quantitative analysis of longitudinal Magnetic Resonance Images (MRI) provides a spatial analysis of the brain tissues which may lead to the discovery of biomarkers of disease evolution. Better understanding of the disease will lead to a better discovery of pathogenic mechanisms, allowing for patient-adapted therapeutic strategies. To characterize MS lesions, we propose a novel paradigm to detect white matter lesions based on a statistical framework. It aims at studying the benefits of using multi-channel MRI to detect statistically significant differences between each individual MS patient and a database of control subjects. This framework consists in two components. First, intensity standardization is conducted to minimize the inter-subject intensity difference arising from variability of the acquisition process and different scanners. The intensity normalization maps parameters obtained using a robust Gaussian Mixture Model (GMM) estimation not affected by the presence of MS lesions. The second part studies the comparison of multi-channel MRI of MS patients with respect to an atlas built from the control subjects, thereby allowing us to look for differences in normal appearing white matter, in and around the lesions of each patient. Experimental results demonstrate that our technique accurately detects significant differences in lesions consequently improving the results of MS lesion detection.

  20. Impact of South American heroin on the US heroin market 1993-2004.

    PubMed

    Ciccarone, Daniel; Unick, George J; Kraus, Allison

    2009-09-01

    The past two decades have seen an increase in heroin-related morbidity and mortality in the United States. We report on trends in US heroin retail price and purity, including the effect of entry of Colombian-sourced heroin on the US heroin market. The average standardized price ($/mg-pure) and purity (% by weight) of heroin from 1993 to 2004 was from obtained from US Drug Enforcement Agency retail purchase data for 20 metropolitan statistical areas. Univariate statistics, robust Ordinary Least Squares regression and mixed fixed and random effect growth curve models were used to predict the price and purity data in each metropolitan statistical area over time. Over the 12 study years, heroin price decreased 62%. The median percentage of all heroin samples that are of South American origin increased an absolute 7% per year. Multivariate models suggest percent South American heroin is a significant predictor of lower heroin price and higher purity adjusting for time and demographics. These analyses reveal trends to historically low-cost heroin in many US cities. These changes correspond to the entrance into and rapid domination of the US heroin market by Colombian-sourced heroin. The implications of these changes are discussed.

  1. Imprints of dynamical interactions on brown dwarf pairing statistics and kinematics

    NASA Astrophysics Data System (ADS)

    Sterzik, M. F.; Durisen, R. H.

    2003-03-01

    We present statistically robust predictions of brown dwarf properties arising from dynamical interactions during their early evolution in small clusters. Our conclusions are based on numerical calculations of the internal cluster dynamics as well as on Monte-Carlo models. Accounting for recent observational constraints on the sub-stellar mass function and initial properties in fragmenting star forming clumps, we derive multiplicity fractions, mass ratios, separation distributions, and velocity dispersions. We compare them with observations of brown dwarfs in the field and in young clusters. Observed brown dwarf companion fractions around 15 +/- 7% for very low-mass stars as reported recently by Close et al. (\\cite{CSFB03}) are consistent with certain dynamical decay models. A significantly smaller mean separation distribution for brown dwarf binaries than for binaries of late-type stars can be explained by similar specific energy at the time of cluster formation for all cluster masses. Due to their higher velocity dispersions, brown-dwarfs and low-mass single stars will undergo time-dependent spatial segregation from higher-mass stars and multiple systems. This will cause mass functions and binary statistics in star forming regions to vary with the age of the region and the volume sampled.

  2. Neuropsychological study of IQ scores in offspring of parents with bipolar I disorder.

    PubMed

    Sharma, Aditya; Camilleri, Nigel; Grunze, Heinz; Barron, Evelyn; Le Couteur, James; Close, Andrew; Rushton, Steven; Kelly, Thomas; Ferrier, Ian Nicol; Le Couteur, Ann

    2017-01-01

    Studies comparing IQ in Offspring of Bipolar Parents (OBP) with Offspring of Healthy Controls (OHC) have reported conflicting findings. They have included OBP with mental health/neurodevelopmental disorders and/or pharmacological treatment which could affect results. This UK study aimed to assess IQ in OBP with no mental health/neurodevelopmental disorder and assess the relationship of sociodemographic variables with IQ. IQ data using the Wechsler Abbreviated Scale of Intelligence (WASI) from 24 OBP and 34 OHC from the North East of England was analysed using mixed-effects modelling. All participants had IQ in the average range. OBP differed statistically significantly from OHC on Full Scale IQ (p = .001), Performance IQ (PIQ) (p = .003) and Verbal IQ (VIQ) (p = .001) but not on the PIQ-VIQ split. OBP and OHC groups did not differ on socio-economic status (SES) and gender. SES made a statistically significant contribution to the variance of IQ scores (p = .001). Using a robust statistical model of analysis, the OBP with no current/past history of mental health/neurodevelopmental disorders had lower IQ scores compared to OHC. This finding should be borne in mind when assessing and recommending interventions for OBP.

  3. How Do You Determine Whether The Earth Is Warming Up?

    NASA Astrophysics Data System (ADS)

    Restrepo, J. M.; Comeau, D.; Flaschka, H.

    2012-12-01

    How does one determine whether the extreme summer temperatures in the North East of the US, or in Moscow during the summer of 2010, was an extreme weather fluctuation or the result of a systematic global climate warming trend? It is only under exceptional circumstances that one can determine whether an observational climate signal belongs to a particular statistical distribution. In fact, observed climate signals are rarely "statistical" and thus there is usually no way to rigorously obtain enough field data to produce a trend or tendency, based upon data alone. Furthermore, this type of data is often multi-scale. We propose a trend or tendency methodology that does not make use of a parametric or a statistical assumption. The most important feature of this trend strategy is that it is defined in very precise mathematical terms. The tendency is easily understood and practical, and its algorithmic realization is fairly robust. In addition to proposing a trend, the methodology can be adopted to generate surrogate statistical models, useful in reduced filtering schemes of time dependent processes.

  4. Statistical guides to estimating the number of undiscovered mineral deposits: an example with porphyry copper deposits

    USGS Publications Warehouse

    Singer, Donald A.; Menzie, W.D.; Cheng, Qiuming; Bonham-Carter, G. F.

    2005-01-01

    Estimating numbers of undiscovered mineral deposits is a fundamental part of assessing mineral resources. Some statistical tools can act as guides to low variance, unbiased estimates of the number of deposits. The primary guide is that the estimates must be consistent with the grade and tonnage models. Another statistical guide is the deposit density (i.e., the number of deposits per unit area of permissive rock in well-explored control areas). Preliminary estimates and confidence limits of the number of undiscovered deposits in a tract of given area may be calculated using linear regression and refined using frequency distributions with appropriate parameters. A Poisson distribution leads to estimates having lower relative variances than the regression estimates and implies a random distribution of deposits. Coefficients of variation are used to compare uncertainties of negative binomial, Poisson, or MARK3 empirical distributions that have the same expected number of deposits as the deposit density. Statistical guides presented here allow simple yet robust estimation of the number of undiscovered deposits in permissive terranes. 

  5. Surveying Europe's Only Cave-Dwelling Chordate Species (Proteus anguinus) Using Environmental DNA.

    PubMed

    Vörös, Judit; Márton, Orsolya; Schmidt, Benedikt R; Gál, Júlia Tünde; Jelić, Dušan

    2017-01-01

    In surveillance of subterranean fauna, especially in the case of rare or elusive aquatic species, traditional techniques used for epigean species are often not feasible. We developed a non-invasive survey method based on environmental DNA (eDNA) to detect the presence of the red-listed cave-dwelling amphibian, Proteus anguinus, in the caves of the Dinaric Karst. We tested the method in fifteen caves in Croatia, from which the species was previously recorded or expected to occur. We successfully confirmed the presence of P. anguinus from ten caves and detected the species for the first time in five others. Using a hierarchical occupancy model we compared the availability and detection probability of eDNA of two water sampling methods, filtration and precipitation. The statistical analysis showed that both availability and detection probability depended on the method and estimates for both probabilities were higher using filter samples than for precipitation samples. Combining reliable field and laboratory methods with robust statistical modeling will give the best estimates of species occurrence.

  6. An Electrochemical Impedance Spectroscopy-Based Technique to Identify and Quantify Fermentable Sugars in Pineapple Waste Valorization for Bioethanol Production

    PubMed Central

    Conesa, Claudia; García-Breijo, Eduardo; Loeff, Edwin; Seguí, Lucía; Fito, Pedro; Laguarda-Miró, Nicolás

    2015-01-01

    Electrochemical Impedance Spectroscopy (EIS) has been used to develop a methodology able to identify and quantify fermentable sugars present in the enzymatic hydrolysis phase of second-generation bioethanol production from pineapple waste. Thus, a low-cost non-destructive system consisting of a stainless double needle electrode associated to an electronic equipment that allows the implementation of EIS was developed. In order to validate the system, different concentrations of glucose, fructose and sucrose were added to the pineapple waste and analyzed both individually and in combination. Next, statistical data treatment enabled the design of specific Artificial Neural Networks-based mathematical models for each one of the studied sugars and their respective combinations. The obtained prediction models are robust and reliable and they are considered statistically valid (CCR% > 93.443%). These results allow us to introduce this EIS-based technique as an easy, fast, non-destructive, and in-situ alternative to the traditional laboratory methods for enzymatic hydrolysis monitoring. PMID:26378537

  7. Estimating population diversity with CatchAll

    PubMed Central

    Bunge, John; Woodard, Linda; Böhning, Dankmar; Foster, James A.; Connolly, Sean; Allen, Heather K.

    2012-01-01

    Motivation: The massive data produced by next-generation sequencing require advanced statistical tools. We address estimating the total diversity or species richness in a population. To date, only relatively simple methods have been implemented in available software. There is a need for software employing modern, computationally intensive statistical analyses including error, goodness-of-fit and robustness assessments. Results: We present CatchAll, a fast, easy-to-use, platform-independent program that computes maximum likelihood estimates for finite-mixture models, weighted linear regression-based analyses and coverage-based non-parametric methods, along with outlier diagnostics. Given sample ‘frequency count’ data, CatchAll computes 12 different diversity estimates and applies a model-selection algorithm. CatchAll also derives discounted diversity estimates to adjust for possibly uncertain low-frequency counts. It is accompanied by an Excel-based graphics program. Availability: Free executable downloads for Linux, Windows and Mac OS, with manual and source code, at www.northeastern.edu/catchall. Contact: jab18@cornell.edu PMID:22333246

  8. Lack of large-angle TT correlations persists in WMAP and Planck

    NASA Astrophysics Data System (ADS)

    Copi, Craig J.; Huterer, Dragan; Schwarz, Dominik J.; Starkman, Glenn D.

    2015-08-01

    The lack of large-angle correlations in the observed microwave background temperature fluctuations persists in the final-year maps from Wilkinson Microwave Anisotropy Probe (WMAP) and the first cosmological data release from Planck. We find a statistically robust and significant result: p-values for the missing correlations lying below 0.24 per cent (i.e. evidence at more than 3σ) for foreground cleaned maps, in complete agreement with previous analyses based upon earlier WMAP data. A cut-sky analysis of the Planck HFI 100 GHz frequency band, the `cleanest CMB channel' of this instrument, returns a p-value as small as 0.03 per cent, based on the conservative mask defined by WMAP. These findings are in stark contrast to expectations from the inflationary Lambda cold dark matter model and still lack a convincing explanation. If this lack of large-angle correlations is a true feature of our Universe, and not just a statistical fluke, then the cosmological dipole must be considerably smaller than that predicted in the best-fitting model.

  9. Relations between Brain Structure and Attentional Function in Spina Bifida: Utilization of Robust Statistical Approaches

    PubMed Central

    Kulesz, Paulina A.; Tian, Siva; Juranek, Jenifer; Fletcher, Jack M.; Francis, David J.

    2015-01-01

    Objective Weak structure-function relations for brain and behavior may stem from problems in estimating these relations in small clinical samples with frequently occurring outliers. In the current project, we focused on the utility of using alternative statistics to estimate these relations. Method Fifty-four children with spina bifida meningomyelocele performed attention tasks and received MRI of the brain. Using a bootstrap sampling process, the Pearson product moment correlation was compared with four robust correlations: the percentage bend correlation, the Winsorized correlation, the skipped correlation using the Donoho-Gasko median, and the skipped correlation using the minimum volume ellipsoid estimator Results All methods yielded similar estimates of the relations between measures of brain volume and attention performance. The similarity of estimates across correlation methods suggested that the weak structure-function relations previously found in many studies are not readily attributable to the presence of outlying observations and other factors that violate the assumptions behind the Pearson correlation. Conclusions Given the difficulty of assembling large samples for brain-behavior studies, estimating correlations using multiple, robust methods may enhance the statistical conclusion validity of studies yielding small, but often clinically significant, correlations. PMID:25495830

  10. Relations between volumetric measures of brain structure and attentional function in spina bifida: utilization of robust statistical approaches.

    PubMed

    Kulesz, Paulina A; Tian, Siva; Juranek, Jenifer; Fletcher, Jack M; Francis, David J

    2015-03-01

    Weak structure-function relations for brain and behavior may stem from problems in estimating these relations in small clinical samples with frequently occurring outliers. In the current project, we focused on the utility of using alternative statistics to estimate these relations. Fifty-four children with spina bifida meningomyelocele performed attention tasks and received MRI of the brain. Using a bootstrap sampling process, the Pearson product-moment correlation was compared with 4 robust correlations: the percentage bend correlation, the Winsorized correlation, the skipped correlation using the Donoho-Gasko median, and the skipped correlation using the minimum volume ellipsoid estimator. All methods yielded similar estimates of the relations between measures of brain volume and attention performance. The similarity of estimates across correlation methods suggested that the weak structure-function relations previously found in many studies are not readily attributable to the presence of outlying observations and other factors that violate the assumptions behind the Pearson correlation. Given the difficulty of assembling large samples for brain-behavior studies, estimating correlations using multiple, robust methods may enhance the statistical conclusion validity of studies yielding small, but often clinically significant, correlations. PsycINFO Database Record (c) 2015 APA, all rights reserved.

  11. Interactions and triggering in a 3D rate and state asperity model

    NASA Astrophysics Data System (ADS)

    Dublanchet, P.; Bernard, P.

    2012-12-01

    Precise relocation of micro-seismicity and careful analysis of seismic source parameters have progressively imposed the concept of seismic asperities embedded in a creeping fault segment as being one of the most important aspect that should appear in a realistic representation of micro-seismic sources. Another important issue concerning micro-seismic activity is the existence of robust empirical laws describing the temporal and magnitude distribution of earthquakes, such as the Omori law, the distribution of inter-event time and the Gutenberg-Richter law. In this framework, this study aims at understanding statistical properties of earthquakes, by generating synthetic catalogs with a 3D, quasi-dynamic continuous rate and state asperity model, that takes into account a realistic geometry of asperities. Our approach contrasts with ETAS models (Kagan and Knopoff, 1981) usually implemented to produce earthquake catalogs, in the sense that the non linearity observed in rock friction experiments (Dieterich, 1979) is fully taken into account by the use of rate and state friction law. Furthermore, our model differs from discrete models of faults (Ziv and Cochard, 2006) because the continuity allows us to define realistic geometries and distributions of asperities by the assembling of sub-critical computational cells that always fail in a single event. Moreover, this model allows us to adress the question of the influence of barriers and distribution of asperities on the event statistics. After recalling the main observations of asperities in the specific case of Parkfield segment of San-Andreas Fault, we analyse earthquake statistical properties computed for this area. Then, we present synthetic statistics obtained by our model that allow us to discuss the role of barriers on clustering and triggering phenomena among a population of sources. It appears that an effective size of barrier, that depends on its frictional strength, controls the presence or the absence, in the synthetic catalog, of statistical laws that are similar to what is observed for real earthquakes. As an application, we attempt to draw a comparison between synthetic statistics and the observed statistics of Parkfield in order to characterize what could be a realistic frictional model of Parkfield area. More generally, we obtained synthetic statistical properties that are in agreement with power-law decays characterized by exponents that match the observations at a global scale, showing that our mechanical model is able to provide new insights into the understanding of earthquake interaction processes in general.

  12. Estimating temporary emigration and breeding proportions using capture-recapture data with Pollock's robust design

    USGS Publications Warehouse

    Kendall, W.L.; Nichols, J.D.; Hines, J.E.

    1997-01-01

    Statistical inference for capture-recapture studies of open animal populations typically relies on the assumption that all emigration from the studied population is permanent. However, there are many instances in which this assumption is unlikely to be met. We define two general models for the process of temporary emigration, completely random and Markovian. We then consider effects of these two types of temporary emigration on Jolly-Seber (Seber 1982) estimators and on estimators arising from the full-likelihood approach of Kendall et al. (1995) to robust design data. Capture-recapture data arising from Pollock's (1982) robust design provide the basis for obtaining unbiased estimates of demographic parameters in the presence of temporary emigration and for estimating the probability of temporary emigration. We present a likelihood-based approach to dealing with temporary emigration that permits estimation under different models of temporary emigration and yields tests for completely random and Markovian emigration. In addition, we use the relationship between capture probability estimates based on closed and open models under completely random temporary emigration to derive three ad hoc estimators for the probability of temporary emigration, two of which should be especially useful in situations where capture probabilities are heterogeneous among individual animals. Ad hoc and full-likelihood estimators are illustrated for small mammal capture-recapture data sets. We believe that these models and estimators will be useful for testing hypotheses about the process of temporary emigration, for estimating demographic parameters in the presence of temporary emigration, and for estimating probabilities of temporary emigration. These latter estimates are frequently of ecological interest as indicators of animal movement and, in some sampling situations, as direct estimates of breeding probabilities and proportions.

  13. Proficiency Testing for Determination of Water Content in Toluene of Chemical Reagents by iteration robust statistic technique

    NASA Astrophysics Data System (ADS)

    Wang, Hao; Wang, Qunwei; He, Ming

    2018-05-01

    In order to investigate and improve the level of detection technology of water content in liquid chemical reagents of domestic laboratories, proficiency testing provider PT0031 (CNAS) has organized proficiency testing program of water content in toluene, 48 laboratories from 18 provinces/cities/municipals took part in the PT. This paper introduces the implementation process of proficiency testing for determination of water content in toluene, including sample preparation, homogeneity and stability test, the results of statistics of iteration robust statistic technique and analysis, summarized and analyzed those of the different test standards which are widely used in the laboratories, put forward the technological suggestions for the improvement of the test quality of water content. Satisfactory results were obtained by 43 laboratories, amounting to 89.6% of the total participating laboratories.

  14. Hypothesis Testing, "p" Values, Confidence Intervals, Measures of Effect Size, and Bayesian Methods in Light of Modern Robust Techniques

    ERIC Educational Resources Information Center

    Wilcox, Rand R.; Serang, Sarfaraz

    2017-01-01

    The article provides perspectives on p values, null hypothesis testing, and alternative techniques in light of modern robust statistical methods. Null hypothesis testing and "p" values can provide useful information provided they are interpreted in a sound manner, which includes taking into account insights and advances that have…

  15. Robust nonlinear system identification: Bayesian mixture of experts using the t-distribution

    NASA Astrophysics Data System (ADS)

    Baldacchino, Tara; Worden, Keith; Rowson, Jennifer

    2017-02-01

    A novel variational Bayesian mixture of experts model for robust regression of bifurcating and piece-wise continuous processes is introduced. The mixture of experts model is a powerful model which probabilistically splits the input space allowing different models to operate in the separate regions. However, current methods have no fail-safe against outliers. In this paper, a robust mixture of experts model is proposed which consists of Student-t mixture models at the gates and Student-t distributed experts, trained via Bayesian inference. The Student-t distribution has heavier tails than the Gaussian distribution, and so it is more robust to outliers, noise and non-normality in the data. Using both simulated data and real data obtained from the Z24 bridge this robust mixture of experts performs better than its Gaussian counterpart when outliers are present. In particular, it provides robustness to outliers in two forms: unbiased parameter regression models, and robustness to overfitting/complex models.

  16. Improved nucleic acid descriptors for siRNA efficacy prediction.

    PubMed

    Sciabola, Simone; Cao, Qing; Orozco, Modesto; Faustino, Ignacio; Stanton, Robert V

    2013-02-01

    Although considerable progress has been made recently in understanding how gene silencing is mediated by the RNAi pathway, the rational design of effective sequences is still a challenging task. In this article, we demonstrate that including three-dimensional descriptors improved the discrimination between active and inactive small interfering RNAs (siRNAs) in a statistical model. Five descriptor types were used: (i) nucleotide position along the siRNA sequence, (ii) nucleotide composition in terms of presence/absence of specific combinations of di- and trinucleotides, (iii) nucleotide interactions by means of a modified auto- and cross-covariance function, (iv) nucleotide thermodynamic stability derived by the nearest neighbor model representation and (v) nucleic acid structure flexibility. The duplex flexibility descriptors are derived from extended molecular dynamics simulations, which are able to describe the sequence-dependent elastic properties of RNA duplexes, even for non-standard oligonucleotides. The matrix of descriptors was analysed using three statistical packages in R (partial least squares, random forest, and support vector machine), and the most predictive model was implemented in a modeling tool we have made publicly available through SourceForge. Our implementation of new RNA descriptors coupled with appropriate statistical algorithms resulted in improved model performance for the selection of siRNA candidates when compared with publicly available siRNA prediction tools and previously published test sets. Additional validation studies based on in-house RNA interference projects confirmed the robustness of the scoring procedure in prospective studies.

  17. Computational statistics using the Bayesian Inference Engine

    NASA Astrophysics Data System (ADS)

    Weinberg, Martin D.

    2013-09-01

    This paper introduces the Bayesian Inference Engine (BIE), a general parallel, optimized software package for parameter inference and model selection. This package is motivated by the analysis needs of modern astronomical surveys and the need to organize and reuse expensive derived data. The BIE is the first platform for computational statistics designed explicitly to enable Bayesian update and model comparison for astronomical problems. Bayesian update is based on the representation of high-dimensional posterior distributions using metric-ball-tree based kernel density estimation. Among its algorithmic offerings, the BIE emphasizes hybrid tempered Markov chain Monte Carlo schemes that robustly sample multimodal posterior distributions in high-dimensional parameter spaces. Moreover, the BIE implements a full persistence or serialization system that stores the full byte-level image of the running inference and previously characterized posterior distributions for later use. Two new algorithms to compute the marginal likelihood from the posterior distribution, developed for and implemented in the BIE, enable model comparison for complex models and data sets. Finally, the BIE was designed to be a collaborative platform for applying Bayesian methodology to astronomy. It includes an extensible object-oriented and easily extended framework that implements every aspect of the Bayesian inference. By providing a variety of statistical algorithms for all phases of the inference problem, a scientist may explore a variety of approaches with a single model and data implementation. Additional technical details and download details are available from http://www.astro.umass.edu/bie. The BIE is distributed under the GNU General Public License.

  18. Median statistics estimates of Hubble and Newton's constants

    NASA Astrophysics Data System (ADS)

    Bethapudi, Suryarao; Desai, Shantanu

    2017-02-01

    Robustness of any statistics depends upon the number of assumptions it makes about the measured data. We point out the advantages of median statistics using toy numerical experiments and demonstrate its robustness, when the number of assumptions we can make about the data are limited. We then apply the median statistics technique to obtain estimates of two constants of nature, Hubble constant (H0) and Newton's gravitational constant ( G , both of which show significant differences between different measurements. For H0, we update the analyses done by Chen and Ratra (2011) and Gott et al. (2001) using 576 measurements. We find after grouping the different results according to their primary type of measurement, the median estimates are given by H0 = 72.5^{+2.5}_{-8} km/sec/Mpc with errors corresponding to 95% c.l. (2 σ) and G=6.674702^{+0.0014}_{-0.0009} × 10^{-11} Nm2kg-2 corresponding to 68% c.l. (1σ).

  19. On the Use of Statistics in Design and the Implications for Deterministic Computer Experiments

    NASA Technical Reports Server (NTRS)

    Simpson, Timothy W.; Peplinski, Jesse; Koch, Patrick N.; Allen, Janet K.

    1997-01-01

    Perhaps the most prevalent use of statistics in engineering design is through Taguchi's parameter and robust design -- using orthogonal arrays to compute signal-to-noise ratios in a process of design improvement. In our view, however, there is an equally exciting use of statistics in design that could become just as prevalent: it is the concept of metamodeling whereby statistical models are built to approximate detailed computer analysis codes. Although computers continue to get faster, analysis codes always seem to keep pace so that their computational time remains non-trivial. Through metamodeling, approximations of these codes are built that are orders of magnitude cheaper to run. These metamodels can then be linked to optimization routines for fast analysis, or they can serve as a bridge for integrating analysis codes across different domains. In this paper we first review metamodeling techniques that encompass design of experiments, response surface methodology, Taguchi methods, neural networks, inductive learning, and kriging. We discuss their existing applications in engineering design and then address the dangers of applying traditional statistical techniques to approximate deterministic computer analysis codes. We conclude with recommendations for the appropriate use of metamodeling techniques in given situations and how common pitfalls can be avoided.

  20. Multiple Phenotype Association Tests Using Summary Statistics in Genome-Wide Association Studies

    PubMed Central

    Liu, Zhonghua; Lin, Xihong

    2017-01-01

    Summary We study in this paper jointly testing the associations of a genetic variant with correlated multiple phenotypes using the summary statistics of individual phenotype analysis from Genome-Wide Association Studies (GWASs). We estimated the between-phenotype correlation matrix using the summary statistics of individual phenotype GWAS analyses, and developed genetic association tests for multiple phenotypes by accounting for between-phenotype correlation without the need to access individual-level data. Since genetic variants often affect multiple phenotypes differently across the genome and the between-phenotype correlation can be arbitrary, we proposed robust and powerful multiple phenotype testing procedures by jointly testing a common mean and a variance component in linear mixed models for summary statistics. We computed the p-values of the proposed tests analytically. This computational advantage makes our methods practically appealing in large-scale GWASs. We performed simulation studies to show that the proposed tests maintained correct type I error rates, and to compare their powers in various settings with the existing methods. We applied the proposed tests to a GWAS Global Lipids Genetics Consortium summary statistics data set and identified additional genetic variants that were missed by the original single-trait analysis. PMID:28653391

  1. Multiple phenotype association tests using summary statistics in genome-wide association studies.

    PubMed

    Liu, Zhonghua; Lin, Xihong

    2018-03-01

    We study in this article jointly testing the associations of a genetic variant with correlated multiple phenotypes using the summary statistics of individual phenotype analysis from Genome-Wide Association Studies (GWASs). We estimated the between-phenotype correlation matrix using the summary statistics of individual phenotype GWAS analyses, and developed genetic association tests for multiple phenotypes by accounting for between-phenotype correlation without the need to access individual-level data. Since genetic variants often affect multiple phenotypes differently across the genome and the between-phenotype correlation can be arbitrary, we proposed robust and powerful multiple phenotype testing procedures by jointly testing a common mean and a variance component in linear mixed models for summary statistics. We computed the p-values of the proposed tests analytically. This computational advantage makes our methods practically appealing in large-scale GWASs. We performed simulation studies to show that the proposed tests maintained correct type I error rates, and to compare their powers in various settings with the existing methods. We applied the proposed tests to a GWAS Global Lipids Genetics Consortium summary statistics data set and identified additional genetic variants that were missed by the original single-trait analysis. © 2017, The International Biometric Society.

  2. Baseline models of trace elements in major aquifers of the United States

    USGS Publications Warehouse

    Lee, L.; Helsel, D.

    2005-01-01

    Trace-element concentrations in baseline samples from a survey of aquifers used as potable-water supplies in the United States are summarized using methods appropriate for data with multiple detection limits. The resulting statistical distribution models are used to develop summary statistics and estimate probabilities of exceeding water-quality standards. The models are based on data from the major aquifer studies of the USGS National Water Quality Assessment (NAWQA) Program. These data were produced with a nationally-consistent sampling and analytical framework specifically designed to determine the quality of the most important potable groundwater resources during the years 1991-2001. The analytical data for all elements surveyed contain values that were below several detection limits. Such datasets are referred to as multiply-censored data. To address this issue, a robust semi-parametric statistical method called regression on order statistics (ROS) is employed. Utilizing the 90th-95th percentile as an arbitrary range for the upper limits of expected baseline concentrations, the models show that baseline concentrations of dissolved Ba and Zn are below 500 ??g/L. For the same percentile range, dissolved As, Cu and Mo concentrations are below 10 ??g/L, and dissolved Ag, Be, Cd, Co, Cr, Ni, Pb, Sb and Se are below 1-5 ??g/L. These models are also used to determine the probabilities that potable ground waters exceed drinking water standards. For dissolved Ba, Cr, Cu, Pb, Ni, Mo and Se, the likelihood of exceeding the US Environmental Protection Agency standards at the well-head is less than 1-1.5%. A notable exception is As, which has approximately a 7% chance of exceeding the maximum contaminant level (10 ??g/L) at the well head.

  3. Bayesian depth estimation from monocular natural images.

    PubMed

    Su, Che-Chun; Cormack, Lawrence K; Bovik, Alan C

    2017-05-01

    Estimating an accurate and naturalistic dense depth map from a single monocular photographic image is a difficult problem. Nevertheless, human observers have little difficulty understanding the depth structure implied by photographs. Two-dimensional (2D) images of the real-world environment contain significant statistical information regarding the three-dimensional (3D) structure of the world that the vision system likely exploits to compute perceived depth, monocularly as well as binocularly. Toward understanding how this might be accomplished, we propose a Bayesian model of monocular depth computation that recovers detailed 3D scene structures by extracting reliable, robust, depth-sensitive statistical features from single natural images. These features are derived using well-accepted univariate natural scene statistics (NSS) models and recent bivariate/correlation NSS models that describe the relationships between 2D photographic images and their associated depth maps. This is accomplished by building a dictionary of canonical local depth patterns from which NSS features are extracted as prior information. The dictionary is used to create a multivariate Gaussian mixture (MGM) likelihood model that associates local image features with depth patterns. A simple Bayesian predictor is then used to form spatial depth estimates. The depth results produced by the model, despite its simplicity, correlate well with ground-truth depths measured by a current-generation terrestrial light detection and ranging (LIDAR) scanner. Such a strong form of statistical depth information could be used by the visual system when creating overall estimated depth maps incorporating stereopsis, accommodation, and other conditions. Indeed, even in isolation, the Bayesian predictor delivers depth estimates that are competitive with state-of-the-art "computer vision" methods that utilize highly engineered image features and sophisticated machine learning algorithms.

  4. September Arctic Sea Ice minimum prediction - a new skillful statistical approach

    NASA Astrophysics Data System (ADS)

    Ionita-Scholz, Monica; Grosfeld, Klaus; Scholz, Patrick; Treffeisen, Renate; Lohmann, Gerrit

    2017-04-01

    Sea ice in both Polar Regions is an important indicator for the expression of global climate change and its polar amplification. Consequently, a broad interest exists on sea ice, its coverage, variability and long term change. Knowledge on sea ice requires high quality data on ice extent, thickness and its dynamics. However, its predictability is complex and it depends on various climate and oceanic parameters and conditions. In order to provide insights into the potential development of a monthly/seasonal signal of sea ice evolution, we developed a robust statistical model based on ocean heat content, sea surface temperature and different atmospheric variables to calculate an estimate of the September Sea ice extent (SSIE) on monthly time scale. Although previous statistical attempts at monthly/seasonal forecasts of SSIE show a relatively reduced skill, we show here that more than 92% (r = 0.96) of the September sea ice extent can be predicted at the end of May by using previous months' climate and oceanic conditions. The skill of the model increases with a decrease in the time lag used for the forecast. At the end of August, our predictions are even able to explain 99% of the SSIE. Our statistical model captures both the general trend as well as the interannual variability of the SSIE. Moreover, it is able to properly forecast the years with extreme high/low SSIE (e.g. 1996/ 2007, 2012, 2013). Besides its forecast skill for SSIE, the model could provide a valuable tool for identifying relevant regions and climate parameters that are important for the sea ice development in the Arctic and for detecting sensitive and critical regions in global coupled climate models with focus on sea ice formation.

  5. Replicating annual North Atlantic hurricane activity 1878-2012 from environmental variables

    NASA Astrophysics Data System (ADS)

    Saunders, Mark A.; Klotzbach, Philip J.; Lea, Adam S. R.

    2017-06-01

    Statistical models can replicate annual North Atlantic hurricane activity from large-scale environmental field data for August and September, the months of peak hurricane activity. We assess how well the six environmental fields used most often in contemporary statistical modeling of seasonal hurricane activity replicate North Atlantic hurricane numbers and Accumulated Cyclone Energy (ACE) over the 135 year period from 1878 to 2012. We find that these fields replicate historical hurricane activity surprisingly well, showing that contemporary statistical models and their seasonal physical links have long-term robustness. We find that August-September zonal trade wind speed over the Caribbean Sea and the tropical North Atlantic is the environmental field which individually replicates long-term hurricane activity the best and that trade wind speed combined with the difference in sea surface temperature between the tropical Atlantic and the tropical mean is the best multi-predictor model. Comparing the performance of the best single-predictor and best multi-predictor models shows that they exhibit little difference in hindcast skill for predicting long-term ACE but that the best multipredictor model offers improved skill for predicting long-term hurricane numbers. We examine whether replicated real-time prediction skill 1983-2012 increases as the model training period lengthens and find evidence that this happens slowly. We identify a dropout in hurricane replication centered on the 1940s and show that this is likely due to a decrease in data quality which affects all data sets but Atlantic sea surface temperatures in particular. Finally, we offer insights on the implications of our findings for seasonal hurricane prediction.

  6. AN INVERSE MODELING APPROACH FOR STRESS ESTIMATION IN MITRAL VALVE ANTERIOR LEAFLET VALVULOPLASTY FOR IN-VIVO VALVULAR BIOMATERIAL ASSESSMENT

    PubMed Central

    Lee, Chung-Hao; Amini, Rouzbeh; Gorman, Robert C.; Gorman, Joseph H.; Sacks, Michael S.

    2013-01-01

    Estimation of regional tissue stresses in the functioning heart valve remains an important goal in our understanding of normal valve function and in developing novel engineered tissue strategies for valvular repair and replacement. Methods to accurately estimate regional tissue stresses are thus needed for this purpose, and in particular to develop accurate, statistically informed means to validate computational models of valve function. Moreover, there exists no currently accepted method to evaluate engineered heart valve tissues and replacement heart valve biomaterials undergoing valvular stresses in blood contact. While we have utilized mitral valve anterior leaflet valvuloplasty as an experimental approach to address this limitation, robust computational techniques to estimate implant stresses are required. In the present study, we developed a novel numerical analysis approach for estimation of the in-vivo stresses of the central region of the mitral valve anterior leaflet (MVAL) delimited by a sonocrystal transducer array. The in-vivo material properties of the MVAL were simulated using an inverse FE modeling approach based on three pseudo-hyperelastic constitutive models: the neo-Hookean, exponential-type isotropic, and full collagen-fiber mapped transversely isotropic models. A series of numerical replications with varying structural configurations were developed by incorporating measured statistical variations in MVAL local preferred fiber directions and fiber splay. These model replications were then used to investigate how known variations in the valve tissue microstructure influence the estimated ROI stresses and its variation at each time point during a cardiac cycle. Simulations were also able to include estimates of the variation in tissue stresses for an individual specimen dataset over the cardiac cycle. Of the three material models, the transversely anisotropic model produced the most accurate results, with ROI averaged stresses at the fully-loaded state of 432.6±46.5 kPa and 241.4±40.5 kPa in the radial and circumferential directions, respectively. We conclude that the present approach can provide robust instantaneous mean and variation estimates of tissue stresses of the central regions of the MVAL. PMID:24275434

  7. Directional variance adjustment: bias reduction in covariance matrices based on factor analysis with an application to portfolio optimization.

    PubMed

    Bartz, Daniel; Hatrick, Kerr; Hesse, Christian W; Müller, Klaus-Robert; Lemm, Steven

    2013-01-01

    Robust and reliable covariance estimates play a decisive role in financial and many other applications. An important class of estimators is based on factor models. Here, we show by extensive Monte Carlo simulations that covariance matrices derived from the statistical Factor Analysis model exhibit a systematic error, which is similar to the well-known systematic error of the spectrum of the sample covariance matrix. Moreover, we introduce the Directional Variance Adjustment (DVA) algorithm, which diminishes the systematic error. In a thorough empirical study for the US, European, and Hong Kong stock market we show that our proposed method leads to improved portfolio allocation.

  8. Structure of turbulent non-premixed flames modeled with two-step chemistry

    NASA Technical Reports Server (NTRS)

    Chen, J. H.; Mahalingam, S.; Puri, I. K.; Vervisch, L.

    1992-01-01

    Direct numerical simulations of turbulent diffusion flames modeled with finite-rate, two-step chemistry, A + B yields I, A + I yields P, were carried out. A detailed analysis of the turbulent flame structure reveals the complex nature of the penetration of various reactive species across two reaction zones in mixture fraction space. Due to this two zone structure, these flames were found to be robust, resisting extinction over the parameter ranges investigated. As in single-step computations, mixture fraction dissipation rate and the mixture fraction were found to be statistically correlated. Simulations involving unequal molecular diffusivities suggest that the small scale mixing process and, hence, the turbulent flame structure is sensitive to the Schmidt number.

  9. A simple white noise analysis of neuronal light responses.

    PubMed

    Chichilnisky, E J

    2001-05-01

    A white noise technique is presented for estimating the response properties of spiking visual system neurons. The technique is simple, robust, efficient and well suited to simultaneous recordings from multiple neurons. It provides a complete and easily interpretable model of light responses even for neurons that display a common form of response nonlinearity that precludes classical linear systems analysis. A theoretical justification of the technique is presented that relies only on elementary linear algebra and statistics. Implementation is described with examples. The technique and the underlying model of neural responses are validated using recordings from retinal ganglion cells, and in principle are applicable to other neurons. Advantages and disadvantages of the technique relative to classical approaches are discussed.

  10. Three-dimensional analysis of scoliosis surgery using stereophotogrammetry

    NASA Astrophysics Data System (ADS)

    Jang, Stanley B.; Booth, Kellogg S.; Reilly, Chris W.; Sawatzky, Bonita J.; Tredwell, Stephen J.

    1994-04-01

    A new stereophotogrammetric analysis and 3D visualization allow accurate assessment of the scoliotic spine during instrumentation. Stereophoto pairs taken at each stage of the operation and robust statistical techniques are used to compute 3D transformations of the vertebrae between stages. These determine rotation, translation, goodness of fit, and overall spinal contour. A polygonal model of the spine using commercial 3D modeling package is used to produce an animation sequence of the transformation. The visualization have provided some important observation. Correction of the scoliosis is achieved largely through vertebral translation and coronal plane rotation, contrary to claims that large axial rotations are required. The animations provide valuable qualitative information for surgeons assessing the results of scoliotic correction.

  11. Directional Variance Adjustment: Bias Reduction in Covariance Matrices Based on Factor Analysis with an Application to Portfolio Optimization

    PubMed Central

    Bartz, Daniel; Hatrick, Kerr; Hesse, Christian W.; Müller, Klaus-Robert; Lemm, Steven

    2013-01-01

    Robust and reliable covariance estimates play a decisive role in financial and many other applications. An important class of estimators is based on factor models. Here, we show by extensive Monte Carlo simulations that covariance matrices derived from the statistical Factor Analysis model exhibit a systematic error, which is similar to the well-known systematic error of the spectrum of the sample covariance matrix. Moreover, we introduce the Directional Variance Adjustment (DVA) algorithm, which diminishes the systematic error. In a thorough empirical study for the US, European, and Hong Kong stock market we show that our proposed method leads to improved portfolio allocation. PMID:23844016

  12. Hierarchical Commensurate and Power Prior Models for Adaptive Incorporation of Historical Information in Clinical Trials

    PubMed Central

    Hobbs, Brian P.; Carlin, Bradley P.; Mandrekar, Sumithra J.; Sargent, Daniel J.

    2011-01-01

    Summary Bayesian clinical trial designs offer the possibility of a substantially reduced sample size, increased statistical power, and reductions in cost and ethical hazard. However when prior and current information conflict, Bayesian methods can lead to higher than expected Type I error, as well as the possibility of a costlier and lengthier trial. This motivates an investigation of the feasibility of hierarchical Bayesian methods for incorporating historical data that are adaptively robust to prior information that reveals itself to be inconsistent with the accumulating experimental data. In this paper, we present several models that allow for the commensurability of the information in the historical and current data to determine how much historical information is used. A primary tool is elaborating the traditional power prior approach based upon a measure of commensurability for Gaussian data. We compare the frequentist performance of several methods using simulations, and close with an example of a colon cancer trial that illustrates a linear models extension of our adaptive borrowing approach. Our proposed methods produce more precise estimates of the model parameters, in particular conferring statistical significance to the observed reduction in tumor size for the experimental regimen as compared to the control regimen. PMID:21361892

  13. Efficient exploration of pan-cancer networks by generalized covariance selection and interactive web content

    PubMed Central

    Kling, Teresia; Johansson, Patrik; Sanchez, José; Marinescu, Voichita D.; Jörnsten, Rebecka; Nelander, Sven

    2015-01-01

    Statistical network modeling techniques are increasingly important tools to analyze cancer genomics data. However, current tools and resources are not designed to work across multiple diagnoses and technical platforms, thus limiting their applicability to comprehensive pan-cancer datasets such as The Cancer Genome Atlas (TCGA). To address this, we describe a new data driven modeling method, based on generalized Sparse Inverse Covariance Selection (SICS). The method integrates genetic, epigenetic and transcriptional data from multiple cancers, to define links that are present in multiple cancers, a subset of cancers, or a single cancer. It is shown to be statistically robust and effective at detecting direct pathway links in data from TCGA. To facilitate interpretation of the results, we introduce a publicly accessible tool (cancerlandscapes.org), in which the derived networks are explored as interactive web content, linked to several pathway and pharmacological databases. To evaluate the performance of the method, we constructed a model for eight TCGA cancers, using data from 3900 patients. The model rediscovered known mechanisms and contained interesting predictions. Possible applications include prediction of regulatory relationships, comparison of network modules across multiple forms of cancer and identification of drug targets. PMID:25953855

  14. Optimization strategies based on sequential quadratic programming applied for a fermentation process for butanol production.

    PubMed

    Pinto Mariano, Adriano; Bastos Borba Costa, Caliane; de Franceschi de Angelis, Dejanira; Maugeri Filho, Francisco; Pires Atala, Daniel Ibraim; Wolf Maciel, Maria Regina; Maciel Filho, Rubens

    2009-11-01

    In this work, the mathematical optimization of a continuous flash fermentation process for the production of biobutanol was studied. The process consists of three interconnected units, as follows: fermentor, cell-retention system (tangential microfiltration), and vacuum flash vessel (responsible for the continuous recovery of butanol from the broth). The objective of the optimization was to maximize butanol productivity for a desired substrate conversion. Two strategies were compared for the optimization of the process. In one of them, the process was represented by a deterministic model with kinetic parameters determined experimentally and, in the other, by a statistical model obtained using the factorial design technique combined with simulation. For both strategies, the problem was written as a nonlinear programming problem and was solved with the sequential quadratic programming technique. The results showed that despite the very similar solutions obtained with both strategies, the problems found with the strategy using the deterministic model, such as lack of convergence and high computational time, make the use of the optimization strategy with the statistical model, which showed to be robust and fast, more suitable for the flash fermentation process, being recommended for real-time applications coupling optimization and control.

  15. A brief introduction to computer-intensive methods, with a view towards applications in spatial statistics and stereology.

    PubMed

    Mattfeldt, Torsten

    2011-04-01

    Computer-intensive methods may be defined as data analytical procedures involving a huge number of highly repetitive computations. We mention resampling methods with replacement (bootstrap methods), resampling methods without replacement (randomization tests) and simulation methods. The resampling methods are based on simple and robust principles and are largely free from distributional assumptions. Bootstrap methods may be used to compute confidence intervals for a scalar model parameter and for summary statistics from replicated planar point patterns, and for significance tests. For some simple models of planar point processes, point patterns can be simulated by elementary Monte Carlo methods. The simulation of models with more complex interaction properties usually requires more advanced computing methods. In this context, we mention simulation of Gibbs processes with Markov chain Monte Carlo methods using the Metropolis-Hastings algorithm. An alternative to simulations on the basis of a parametric model consists of stochastic reconstruction methods. The basic ideas behind the methods are briefly reviewed and illustrated by simple worked examples in order to encourage novices in the field to use computer-intensive methods. © 2010 The Authors Journal of Microscopy © 2010 Royal Microscopical Society.

  16. Computational Motion Phantoms and Statistical Models of Respiratory Motion

    NASA Astrophysics Data System (ADS)

    Ehrhardt, Jan; Klinder, Tobias; Lorenz, Cristian

    Breathing motion is not a robust and 100 % reproducible process, and inter- and intra-fractional motion variations form an important problem in radiotherapy of the thorax and upper abdomen. A widespread consensus nowadays exists that it would be useful to use prior knowledge about respiratory organ motion and its variability to improve radiotherapy planning and treatment delivery. This chapter discusses two different approaches to model the variability of respiratory motion. In the first part, we review computational motion phantoms, i.e. computerized anatomical and physiological models. Computational phantoms are excellent tools to simulate and investigate the effects of organ motion in radiation therapy and to gain insight into methods for motion management. The second part of this chapter discusses statistical modeling techniques to describe the breathing motion and its variability in a population of 4D images. Population-based models can be generated from repeatedly acquired 4D images of the same patient (intra-patient models) and from 4D images of different patients (inter-patient models). The generation of those models is explained and possible applications of those models for motion prediction in radiotherapy are exemplified. Computational models of respiratory motion and motion variability have numerous applications in radiation therapy, e.g. to understand motion effects in simulation studies, to develop and evaluate treatment strategies or to introduce prior knowledge into the patient-specific treatment planning.

  17. Robust approaches to quantification of margin and uncertainty for sparse data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hund, Lauren; Schroeder, Benjamin B.; Rumsey, Kelin

    Characterizing the tails of probability distributions plays a key role in quantification of margins and uncertainties (QMU), where the goal is characterization of low probability, high consequence events based on continuous measures of performance. When data are collected using physical experimentation, probability distributions are typically fit using statistical methods based on the collected data, and these parametric distributional assumptions are often used to extrapolate about the extreme tail behavior of the underlying probability distribution. In this project, we character- ize the risk associated with such tail extrapolation. Specifically, we conducted a scaling study to demonstrate the large magnitude of themore » risk; then, we developed new methods for communicat- ing risk associated with tail extrapolation from unvalidated statistical models; lastly, we proposed a Bayesian data-integration framework to mitigate tail extrapolation risk through integrating ad- ditional information. We conclude that decision-making using QMU is a complex process that cannot be achieved using statistical analyses alone.« less

  18. Statistical detection of systematic election irregularities

    PubMed Central

    Klimek, Peter; Yegorov, Yuri; Hanel, Rudolf; Thurner, Stefan

    2012-01-01

    Democratic societies are built around the principle of free and fair elections, and that each citizen’s vote should count equally. National elections can be regarded as large-scale social experiments, where people are grouped into usually large numbers of electoral districts and vote according to their preferences. The large number of samples implies statistical consequences for the polling results, which can be used to identify election irregularities. Using a suitable data representation, we find that vote distributions of elections with alleged fraud show a kurtosis substantially exceeding the kurtosis of normal elections, depending on the level of data aggregation. As an example, we show that reported irregularities in recent Russian elections are, indeed, well-explained by systematic ballot stuffing. We develop a parametric model quantifying the extent to which fraudulent mechanisms are present. We formulate a parametric test detecting these statistical properties in election results. Remarkably, this technique produces robust outcomes with respect to the resolution of the data and therefore, allows for cross-country comparisons. PMID:23010929

  19. Statistical detection of systematic election irregularities.

    PubMed

    Klimek, Peter; Yegorov, Yuri; Hanel, Rudolf; Thurner, Stefan

    2012-10-09

    Democratic societies are built around the principle of free and fair elections, and that each citizen's vote should count equally. National elections can be regarded as large-scale social experiments, where people are grouped into usually large numbers of electoral districts and vote according to their preferences. The large number of samples implies statistical consequences for the polling results, which can be used to identify election irregularities. Using a suitable data representation, we find that vote distributions of elections with alleged fraud show a kurtosis substantially exceeding the kurtosis of normal elections, depending on the level of data aggregation. As an example, we show that reported irregularities in recent Russian elections are, indeed, well-explained by systematic ballot stuffing. We develop a parametric model quantifying the extent to which fraudulent mechanisms are present. We formulate a parametric test detecting these statistical properties in election results. Remarkably, this technique produces robust outcomes with respect to the resolution of the data and therefore, allows for cross-country comparisons.

  20. Computational Software for Fitting Seismic Data to Epidemic-Type Aftershock Sequence Models

    NASA Astrophysics Data System (ADS)

    Chu, A.

    2014-12-01

    Modern earthquake catalogs are often analyzed using spatial-temporal point process models such as the epidemic-type aftershock sequence (ETAS) models of Ogata (1998). My work introduces software to implement two of ETAS models described in Ogata (1998). To find the Maximum-Likelihood Estimates (MLEs), my software provides estimates of the homogeneous background rate parameter and the temporal and spatial parameters that govern triggering effects by applying the Expectation-Maximization (EM) algorithm introduced in Veen and Schoenberg (2008). Despite other computer programs exist for similar data modeling purpose, using EM-algorithm has the benefits of stability and robustness (Veen and Schoenberg, 2008). Spatial shapes that are very long and narrow cause difficulties in optimization convergence and problems with flat or multi-modal log-likelihood functions encounter similar issues. My program uses a robust method to preset a parameter to overcome the non-convergence computational issue. In addition to model fitting, the software is equipped with useful tools for examining modeling fitting results, for example, visualization of estimated conditional intensity, and estimation of expected number of triggered aftershocks. A simulation generator is also given with flexible spatial shapes that may be defined by the user. This open-source software has a very simple user interface. The user may execute it on a local computer, and the program also has potential to be hosted online. Java language is used for the software's core computing part and an optional interface to the statistical package R is provided.

  1. The anesthetic action of some polyhalogenated ethers-Monte Carlo method based QSAR study.

    PubMed

    Golubović, Mlađan; Lazarević, Milan; Zlatanović, Dragan; Krtinić, Dane; Stoičkov, Viktor; Mladenović, Bojan; Milić, Dragan J; Sokolović, Dušan; Veselinović, Aleksandar M

    2018-04-13

    Up to this date, there has been an ongoing debate about the mode of action of general anesthetics, which have postulated many biological sites as targets for their action. However, postoperative nausea and vomiting are common problems in which inhalational agents may have a role in their development. When a mode of action is unknown, QSAR modelling is essential in drug development. To investigate the aspects of their anesthetic, QSAR models based on the Monte Carlo method were developed for a set of polyhalogenated ethers. Until now, their anesthetic action has not been completely defined, although some hypotheses have been suggested. Therefore, a QSAR model should be developed on molecular fragments that contribute to anesthetic action. QSAR models were built on the basis of optimal molecular descriptors based on the SMILES notation and local graph invariants, whereas the Monte Carlo optimization method with three random splits into the training and test set was applied for model development. Different methods, including novel Index of ideality correlation, were applied for the determination of the robustness of the model and its predictive potential. The Monte Carlo optimization process was capable of being an efficient in silico tool for building up a robust model of good statistical quality. Molecular fragments which have both positive and negative influence on anesthetic action were determined. The presented study can be useful in the search for novel anesthetics. Copyright © 2018 Elsevier Ltd. All rights reserved.

  2. Adaptive and robust statistical methods for processing near-field scanning microwave microscopy images.

    PubMed

    Coakley, K J; Imtiaz, A; Wallis, T M; Weber, J C; Berweger, S; Kabos, P

    2015-03-01

    Near-field scanning microwave microscopy offers great potential to facilitate characterization, development and modeling of materials. By acquiring microwave images at multiple frequencies and amplitudes (along with the other modalities) one can study material and device physics at different lateral and depth scales. Images are typically noisy and contaminated by artifacts that can vary from scan line to scan line and planar-like trends due to sample tilt errors. Here, we level images based on an estimate of a smooth 2-d trend determined with a robust implementation of a local regression method. In this robust approach, features and outliers which are not due to the trend are automatically downweighted. We denoise images with the Adaptive Weights Smoothing method. This method smooths out additive noise while preserving edge-like features in images. We demonstrate the feasibility of our methods on topography images and microwave |S11| images. For one challenging test case, we demonstrate that our method outperforms alternative methods from the scanning probe microscopy data analysis software package Gwyddion. Our methods should be useful for massive image data sets where manual selection of landmarks or image subsets by a user is impractical. Published by Elsevier B.V.

  3. Recent advances in parametric neuroreceptor mapping with dynamic PET: basic concepts and graphical analyses.

    PubMed

    Seo, Seongho; Kim, Su Jin; Lee, Dong Soo; Lee, Jae Sung

    2014-10-01

    Tracer kinetic modeling in dynamic positron emission tomography (PET) has been widely used to investigate the characteristic distribution patterns or dysfunctions of neuroreceptors in brain diseases. Its practical goal has progressed from regional data quantification to parametric mapping that produces images of kinetic-model parameters by fully exploiting the spatiotemporal information in dynamic PET data. Graphical analysis (GA) is a major parametric mapping technique that is independent on any compartmental model configuration, robust to noise, and computationally efficient. In this paper, we provide an overview of recent advances in the parametric mapping of neuroreceptor binding based on GA methods. The associated basic concepts in tracer kinetic modeling are presented, including commonly-used compartment models and major parameters of interest. Technical details of GA approaches for reversible and irreversible radioligands are described, considering both plasma input and reference tissue input models. Their statistical properties are discussed in view of parametric imaging.

  4. Analytical modeling for heat transfer in sheared flows of nanofluids.

    PubMed

    Ferrari, Claudio; Kaoui, Badr; L'vov, Victor S; Procaccia, Itamar; Rudenko, Oleksii; ten Thije Boonkkamp, J H M; Toschi, Federico

    2012-07-01

    We developed a model for the enhancement of the heat flux by spherical and elongated nanoparticles in sheared laminar flows of nanofluids. Besides the heat flux carried by the nanoparticles, the model accounts for the contribution of their rotation to the heat flux inside and outside the particles. The rotation of the nanoparticles has a twofold effect: it induces a fluid advection around the particle and it strongly influences the statistical distribution of particle orientations. These dynamical effects, which were not included in existing thermal models, are responsible for changing the thermal properties of flowing fluids as compared to quiescent fluids. The proposed model is strongly supported by extensive numerical simulations, demonstrating a potential increase of the heat flux far beyond the Maxwell-Garnett limit for the spherical nanoparticles. The road ahead, which should lead toward robust predictive models of heat flux enhancement, is discussed.

  5. Uncertainty quantification of fast sodium current steady-state inactivation for multi-scale models of cardiac electrophysiology.

    PubMed

    Pathmanathan, Pras; Shotwell, Matthew S; Gavaghan, David J; Cordeiro, Jonathan M; Gray, Richard A

    2015-01-01

    Perhaps the most mature area of multi-scale systems biology is the modelling of the heart. Current models are grounded in over fifty years of research in the development of biophysically detailed models of the electrophysiology (EP) of cardiac cells, but one aspect which is inadequately addressed is the incorporation of uncertainty and physiological variability. Uncertainty quantification (UQ) is the identification and characterisation of the uncertainty in model parameters derived from experimental data, and the computation of the resultant uncertainty in model outputs. It is a necessary tool for establishing the credibility of computational models, and will likely be expected of EP models for future safety-critical clinical applications. The focus of this paper is formal UQ of one major sub-component of cardiac EP models, the steady-state inactivation of the fast sodium current, INa. To better capture average behaviour and quantify variability across cells, we have applied for the first time an 'individual-based' statistical methodology to assess voltage clamp data. Advantages of this approach over a more traditional 'population-averaged' approach are highlighted. The method was used to characterise variability amongst cells isolated from canine epi and endocardium, and this variability was then 'propagated forward' through a canine model to determine the resultant uncertainty in model predictions at different scales, such as of upstroke velocity and spiral wave dynamics. Statistically significant differences between epi and endocardial cells (greater half-inactivation and less steep slope of steady state inactivation curve for endo) was observed, and the forward propagation revealed a lack of robustness of the model to underlying variability, but also surprising robustness to variability at the tissue scale. Overall, the methodology can be used to: (i) better analyse voltage clamp data; (ii) characterise underlying population variability; (iii) investigate consequences of variability; and (iv) improve the ability to validate a model. To our knowledge this article is the first to quantify population variability in membrane dynamics in this manner, and the first to perform formal UQ for a component of a cardiac model. The approach is likely to find much wider applicability across systems biology as current application domains reach greater levels of maturity. Published by Elsevier Ltd.

  6. Systematic review of prediction models for delirium in the older adult inpatient.

    PubMed

    Lindroth, Heidi; Bratzke, Lisa; Purvis, Suzanne; Brown, Roger; Coburn, Mark; Mrkobrada, Marko; Chan, Matthew T V; Davis, Daniel H J; Pandharipande, Pratik; Carlsson, Cynthia M; Sanders, Robert D

    2018-04-28

    To identify existing prognostic delirium prediction models and evaluate their validity and statistical methodology in the older adult (≥60 years) acute hospital population. Systematic review. PubMed, CINAHL, PsychINFO, SocINFO, Cochrane, Web of Science and Embase were searched from 1 January 1990 to 31 December 2016. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses and CHARMS Statement guided protocol development. age >60 years, inpatient, developed/validated a prognostic delirium prediction model. alcohol-related delirium, sample size ≤50. The primary performance measures were calibration and discrimination statistics. Two authors independently conducted search and extracted data. The synthesis of data was done by the first author. Disagreement was resolved by the mentoring author. The initial search resulted in 7,502 studies. Following full-text review of 192 studies, 33 were excluded based on age criteria (<60 years) and 27 met the defined criteria. Twenty-three delirium prediction models were identified, 14 were externally validated and 3 were internally validated. The following populations were represented: 11 medical, 3 medical/surgical and 13 surgical. The assessment of delirium was often non-systematic, resulting in varied incidence. Fourteen models were externally validated with an area under the receiver operating curve range from 0.52 to 0.94. Limitations in design, data collection methods and model metric reporting statistics were identified. Delirium prediction models for older adults show variable and typically inadequate predictive capabilities. Our review highlights the need for development of robust models to predict delirium in older inpatients. We provide recommendations for the development of such models. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  7. Robust Magnetotelluric Impedance Estimation

    NASA Astrophysics Data System (ADS)

    Sutarno, D.

    2010-12-01

    Robust magnetotelluric (MT) response function estimators are now in standard use by the induction community. Properly devised and applied, these have ability to reduce the influence of unusual data (outliers). The estimators always yield impedance estimates which are better than the conventional least square (LS) estimation because the `real' MT data almost never satisfy the statistical assumptions of Gaussian distribution and stationary upon which normal spectral analysis is based. This paper discuses the development and application of robust estimation procedures which can be classified as M-estimators to MT data. Starting with the description of the estimators, special attention is addressed to the recent development of a bounded-influence robust estimation, including utilization of the Hilbert Transform (HT) operation on causal MT impedance functions. The resulting robust performances are illustrated using synthetic as well as real MT data.

  8. A comparison of hydrologic models for ecological flows and water availability

    USGS Publications Warehouse

    Caldwell, Peter V; Kennen, Jonathan G.; Sun, Ge; Kiang, Julie E.; Butcher, John B; Eddy, Michelle C; Hay, Lauren E.; LaFontaine, Jacob H.; Hain, Ernie F.; Nelson, Stacy C; McNulty, Steve G

    2015-01-01

    Robust hydrologic models are needed to help manage water resources for healthy aquatic ecosystems and reliable water supplies for people, but there is a lack of comprehensive model comparison studies that quantify differences in streamflow predictions among model applications developed to answer management questions. We assessed differences in daily streamflow predictions by four fine-scale models and two regional-scale monthly time step models by comparing model fit statistics and bias in ecologically relevant flow statistics (ERFSs) at five sites in the Southeastern USA. Models were calibrated to different extents, including uncalibrated (level A), calibrated to a downstream site (level B), calibrated specifically for the site (level C) and calibrated for the site with adjusted precipitation and temperature inputs (level D). All models generally captured the magnitude and variability of observed streamflows at the five study sites, and increasing level of model calibration generally improved performance. All models had at least 1 of 14 ERFSs falling outside a +/−30% range of hydrologic uncertainty at every site, and ERFSs related to low flows were frequently over-predicted. Our results do not indicate that any specific hydrologic model is superior to the others evaluated at all sites and for all measures of model performance. Instead, we provide evidence that (1) model performance is as likely to be related to calibration strategy as it is to model structure and (2) simple, regional-scale models have comparable performance to the more complex, fine-scale models at a monthly time step.

  9. Dark Energy Survey Year 1 Results: Multi-Probe Methodology and Simulated Likelihood Analyses

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Krause, E.; et al.

    We present the methodology for and detail the implementation of the Dark Energy Survey (DES) 3x2pt DES Year 1 (Y1) analysis, which combines configuration-space two-point statistics from three different cosmological probes: cosmic shear, galaxy-galaxy lensing, and galaxy clustering, using data from the first year of DES observations. We have developed two independent modeling pipelines and describe the code validation process. We derive expressions for analytical real-space multi-probe covariances, and describe their validation with numerical simulations. We stress-test the inference pipelines in simulated likelihood analyses that vary 6-7 cosmology parameters plus 20 nuisance parameters and precisely resemble the analysis to be presented in the DES 3x2pt analysis paper, using a variety of simulated input data vectors with varying assumptions. We find that any disagreement between pipelines leads to changes in assigned likelihoodmore » $$\\Delta \\chi^2 \\le 0.045$$ with respect to the statistical error of the DES Y1 data vector. We also find that angular binning and survey mask do not impact our analytic covariance at a significant level. We determine lower bounds on scales used for analysis of galaxy clustering (8 Mpc$$~h^{-1}$$) and galaxy-galaxy lensing (12 Mpc$$~h^{-1}$$) such that the impact of modeling uncertainties in the non-linear regime is well below statistical errors, and show that our analysis choices are robust against a variety of systematics. These tests demonstrate that we have a robust analysis pipeline that yields unbiased cosmological parameter inferences for the flagship 3x2pt DES Y1 analysis. We emphasize that the level of independent code development and subsequent code comparison as demonstrated in this paper is necessary to produce credible constraints from increasingly complex multi-probe analyses of current data.« less

  10. Fully automatic segmentation of the femur from 3D-CT images using primitive shape recognition and statistical shape models.

    PubMed

    Ben Younes, Lassad; Nakajima, Yoshikazu; Saito, Toki

    2014-03-01

    Femur segmentation is well established and widely used in computer-assisted orthopedic surgery. However, most of the robust segmentation methods such as statistical shape models (SSM) require human intervention to provide an initial position for the SSM. In this paper, we propose to overcome this problem and provide a fully automatic femur segmentation method for CT images based on primitive shape recognition and SSM. Femur segmentation in CT scans was performed using primitive shape recognition based on a robust algorithm such as the Hough transform and RANdom SAmple Consensus. The proposed method is divided into 3 steps: (1) detection of the femoral head as sphere and the femoral shaft as cylinder in the SSM and the CT images, (2) rigid registration between primitives of SSM and CT image to initialize the SSM into the CT image, and (3) fitting of the SSM to the CT image edge using an affine transformation followed by a nonlinear fitting. The automated method provided good results even with a high number of outliers. The difference of segmentation error between the proposed automatic initialization method and a manual initialization method is less than 1 mm. The proposed method detects primitive shape position to initialize the SSM into the target image. Based on primitive shapes, this method overcomes the problem of inter-patient variability. Moreover, the results demonstrate that our method of primitive shape recognition can be used for 3D SSM initialization to achieve fully automatic segmentation of the femur.

  11. Identifying positive deviants in healthcare quality and safety: a mixed methods study.

    PubMed

    O'Hara, Jane K; Grasic, Katja; Gutacker, Nils; Street, Andrew; Foy, Robbie; Thompson, Carl; Wright, John; Lawton, Rebecca

    2018-01-01

    Objective Solutions to quality and safety problems exist within healthcare organisations, but to maximise the learning from these positive deviants, we first need to identify them. This study explores using routinely collected, publicly available data in England to identify positively deviant services in one region of the country. Design A mixed methods study undertaken July 2014 to February 2015, employing expert discussion, consensus and statistical modelling to identify indicators of quality and safety, establish a set of criteria to inform decisions about which indicators were robust and useful measures, and whether these could be used to identify positive deviants. Setting Yorkshire and Humber, England. Participants None - analysis based on routinely collected, administrative English hospital data. Main outcome measures We identified 49 indicators of quality and safety from acute care settings across eight data sources. Twenty-six indicators did not allow comparison of quality at the sub-hospital level. Of the 23 remaining indicators, 12 met all criteria and were possible candidates for identifying positive deviants. Results Four indicators (readmission and patient reported outcomes for hip and knee surgery) offered indicators of the same service. These were selected by an expert group as the basis for statistical modelling, which supported identification of one service in Yorkshire and Humber showing a 50% positive deviation from the national average. Conclusion Relatively few indicators of quality and safety relate to a service level, making meaningful comparisons and local improvement based on the measures difficult. It was possible, however, to identify a set of indicators that provided robust measurement of the quality and safety of services providing hip and knee surgery.

  12. Improving Ecological Response Monitoring of Environmental Flows

    NASA Astrophysics Data System (ADS)

    King, Alison J.; Gawne, Ben; Beesley, Leah; Koehn, John D.; Nielsen, Daryl L.; Price, Amina

    2015-05-01

    Environmental flows are now an important restoration technique in flow-degraded rivers, and with the increasing public scrutiny of their effectiveness and value, the importance of undertaking scientifically robust monitoring is now even more critical. Many existing environmental flow monitoring programs have poorly defined objectives, nonjustified indicator choices, weak experimental designs, poor statistical strength, and often focus on outcomes from a single event. These negative attributes make them difficult to learn from. We provide practical recommendations that aim to improve the performance, scientific robustness, and defensibility of environmental flow monitoring programs. We draw on the literature and knowledge gained from working with stakeholders and managers to design, implement, and monitor a range of environmental flow types. We recommend that (1) environmental flow monitoring programs should be implemented within an adaptive management framework; (2) objectives of environmental flow programs should be well defined, attainable, and based on an agreed conceptual understanding of the system; (3) program and intervention targets should be attainable, measurable, and inform program objectives; (4) intervention monitoring programs should improve our understanding of flow-ecological responses and related conceptual models; (5) indicator selection should be based on conceptual models, objectives, and prioritization approaches; (6) appropriate monitoring designs and statistical tools should be used to measure and determine ecological response; (7) responses should be measured within timeframes that are relevant to the indicator(s); (8) watering events should be treated as replicates of a larger experiment; (9) environmental flow outcomes should be reported using a standard suite of metadata. Incorporating these attributes into future monitoring programs should ensure their outcomes are transferable and measured with high scientific credibility.

  13. Practical robustness measures in multivariable control system analysis. Ph.D. Thesis

    NASA Technical Reports Server (NTRS)

    Lehtomaki, N. A.

    1981-01-01

    The robustness of the stability of multivariable linear time invariant feedback control systems with respect to model uncertainty is considered using frequency domain criteria. Available robustness tests are unified under a common framework based on the nature and structure of model errors. These results are derived using a multivariable version of Nyquist's stability theorem in which the minimum singular value of the return difference transfer matrix is shown to be the multivariable generalization of the distance to the critical point on a single input, single output Nyquist diagram. Using the return difference transfer matrix, a very general robustness theorem is presented from which all of the robustness tests dealing with specific model errors may be derived. The robustness tests that explicitly utilized model error structure are able to guarantee feedback system stability in the face of model errors of larger magnitude than those robustness tests that do not. The robustness of linear quadratic Gaussian control systems are analyzed.

  14. A pre-operative planning for endoprosthetic human tracheal implantation: a decision support system based on robust design of experiments.

    PubMed

    Trabelsi, O; Villalobos, J L López; Ginel, A; Cortes, E Barrot; Doblaré, M

    2014-05-01

    Swallowing depends on physiological variables that have a decisive influence on the swallowing capacity and on the tracheal stress distribution. Prosthetic implantation modifies these values and the overall performance of the trachea. The objective of this work was to develop a decision support system based on experimental, numerical and statistical approaches, with clinical verification, to help the thoracic surgeon in deciding the position and appropriate dimensions of a Dumon prosthesis for a specific patient in an optimal time and with sufficient robustness. A code for mesh adaptation to any tracheal geometry was implemented and used to develop a robust experimental design, based on the Taguchi's method and the analysis of variance. This design was able to establish the main swallowing influencing factors. The equations to fit the stress and the vertical displacement distributions were obtained. The resulting fitted values were compared to those calculated directly by the finite element method (FEM). Finally, a checking and clinical validation of the statistical study were made, by studying two cases of real patients. The vertical displacements and principal stress distribution obtained for the specific tracheal model were in agreement with those calculated by FE simulations with a maximum absolute error of 1.2 mm and 0.17 MPa, respectively. It was concluded that the resulting decision support tool provides a fast, accurate and simple tool for the thoracic surgeon to predict the stress state of the trachea and the reduction in the ability to swallow after implantation. Thus, it will help them in taking decisions during pre-operative planning of tracheal interventions.

  15. The Graphical Display of Simulation Results, with Applications to the Comparison of Robust IRT Estimators of Ability.

    ERIC Educational Resources Information Center

    Thissen, David; Wainer, Howard

    Simulation studies of the performance of (potentially) robust statistical estimation produce large quantities of numbers in the form of performance indices of the various estimators under various conditions. This report presents a multivariate graphical display used to aid in the digestion of the plentiful results in a current study of Item…

  16. Robust Mokken Scale Analysis by Means of the Forward Search Algorithm for Outlier Detection

    ERIC Educational Resources Information Center

    Zijlstra, Wobbe P.; van der Ark, L. Andries; Sijtsma, Klaas

    2011-01-01

    Exploratory Mokken scale analysis (MSA) is a popular method for identifying scales from larger sets of items. As with any statistical method, in MSA the presence of outliers in the data may result in biased results and wrong conclusions. The forward search algorithm is a robust diagnostic method for outlier detection, which we adapt here to…

  17. Aggregation models on hypergraphs

    NASA Astrophysics Data System (ADS)

    Alberici, Diego; Contucci, Pierluigi; Mingione, Emanuele; Molari, Marco

    2017-01-01

    Following a newly introduced approach by Rasetti and Merelli we investigate the possibility to extract topological information about the space where interacting systems are modelled. From the statistical datum of their observable quantities, like the correlation functions, we show how to reconstruct the activities of their constitutive parts which embed the topological information. The procedure is implemented on a class of polymer models on hypergraphs with hard-core interactions. We show that the model fulfils a set of iterative relations for the partition function that generalise those introduced by Heilmann and Lieb for the monomer-dimer case. After translating those relations into structural identities for the correlation functions we use them to test the precision and the robustness of the inverse problem. Finally the possible presence of a further interaction of peer-to-peer type is considered and a criterion to discover it is identified.

  18. Robust Detection of Examinees with Aberrant Answer Changes

    ERIC Educational Resources Information Center

    Belov, Dmitry I.

    2015-01-01

    The statistical analysis of answer changes (ACs) has uncovered multiple testing irregularities on large-scale assessments and is now routinely performed at testing organizations. However, AC data has an uncertainty caused by technological or human factors. Therefore, existing statistics (e.g., number of wrong-to-right ACs) used to detect examinees…

  19. Limited data tomographic image reconstruction via dual formulation of total variation minimization

    NASA Astrophysics Data System (ADS)

    Jang, Kwang Eun; Sung, Younghun; Lee, Kangeui; Lee, Jongha; Cho, Seungryong

    2011-03-01

    The X-ray mammography is the primary imaging modality for breast cancer screening. For the dense breast, however, the mammogram is usually difficult to read due to tissue overlap problem caused by the superposition of normal tissues. The digital breast tomosynthesis (DBT) that measures several low dose projections over a limited angle range may be an alternative modality for breast imaging, since it allows the visualization of the cross-sectional information of breast. The DBT, however, may suffer from the aliasing artifact and the severe noise corruption. To overcome these problems, a total variation (TV) regularized statistical reconstruction algorithm is presented. Inspired by the dual formulation of TV minimization in denoising and deblurring problems, we derived a gradient-type algorithm based on statistical model of X-ray tomography. The objective function is comprised of a data fidelity term derived from the statistical model and a TV regularization term. The gradient of the objective function can be easily calculated using simple operations in terms of auxiliary variables. After a descending step, the data fidelity term is renewed in each iteration. Since the proposed algorithm can be implemented without sophisticated operations such as matrix inverse, it provides an efficient way to include the TV regularization in the statistical reconstruction method, which results in a fast and robust estimation for low dose projections over the limited angle range. Initial tests with an experimental DBT system confirmed our finding.

  20. Enhanced detection and visualization of anomalies in spectral imagery

    NASA Astrophysics Data System (ADS)

    Basener, William F.; Messinger, David W.

    2009-05-01

    Anomaly detection algorithms applied to hyperspectral imagery are able to reliably identify man-made objects from a natural environment based on statistical/geometric likelyhood. The process is more robust than target identification, which requires precise prior knowledge of the object of interest, but has an inherently higher false alarm rate. Standard anomaly detection algorithms measure deviation of pixel spectra from a parametric model (either statistical or linear mixing) estimating the image background. The topological anomaly detector (TAD) creates a fully non-parametric, graph theory-based, topological model of the image background and measures deviation from this background using codensity. In this paper we present a large-scale comparative test of TAD against 80+ targets in four full HYDICE images using the entire canonical target set for generation of ROC curves. TAD will be compared against several statistics-based detectors including local RX and subspace RX. Even a perfect anomaly detection algorithm would have a high practical false alarm rate in most scenes simply because the user/analyst is not interested in every anomalous object. To assist the analyst in identifying and sorting objects of interest, we investigate coloring of the anomalies with principle components projections using statistics computed from the anomalies. This gives a very useful colorization of anomalies in which objects of similar material tend to have the same color, enabling an analyst to quickly sort and identify anomalies of highest interest.

Top