Sample records for robust regression analysis

  1. The comparison of robust partial least squares regression with robust principal component regression on a real data set

    NASA Astrophysics Data System (ADS)

    Polat, Esra; Gunay, Suleyman

    2013-10-01

    One of the problems encountered in Multiple Linear Regression (MLR) is multicollinearity, which causes overestimation of the regression parameters and inflation of their variances. Hence, when multicollinearity is present, biased estimation procedures such as classical Principal Component Regression (CPCR) and Partial Least Squares Regression (PLSR) are performed instead. SIMPLS is the leading PLSR algorithm because of its speed and efficiency, and because its results are easier to interpret. However, both CPCR and SIMPLS yield very unreliable results when the data set contains outlying observations. Therefore, Hubert and Vanden Branden (2003) presented a robust PCR (RPCR) method and a robust PLSR (RPLSR) method called RSIMPLS. In RPCR, a robust Principal Component Analysis (PCA) method for high-dimensional data is first applied to the independent variables; the dependent variables are then regressed on the scores using a robust regression method. RSIMPLS is constructed from a robust covariance matrix for high-dimensional data and robust linear regression. The purpose of this study is to demonstrate the use of the RPCR and RSIMPLS methods on an econometric data set by comparing the two methods on an inflation model of Turkey. The methods are compared in terms of predictive ability and goodness of fit by using a robust Root Mean Squared Error of Cross-Validation (R-RMSECV), a robust R2 value and the Robust Component Selection (RCS) statistic.
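
    As a rough illustration of the RPCR recipe described above (robust scores first, robust regression second), the sketch below uses scikit-learn's MinCovDet for the robust covariance of the predictors and a Huber M-estimator from statsmodels for the regression step. These are generic stand-ins for the specific estimators of Hubert and Vanden Branden, and all data and names are illustrative.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.covariance import MinCovDet

def robust_pcr(X, y, n_components=2, random_state=0):
    """Sketch of robust PCR: robust PCA scores, then robust regression."""
    # Robust location and scatter of the predictors via the MCD estimator.
    mcd = MinCovDet(random_state=random_state).fit(X)
    Xc = X - mcd.location_
    # Principal axes of the robust covariance matrix (largest eigenvalues first).
    eigvals, eigvecs = np.linalg.eigh(mcd.covariance_)
    order = np.argsort(eigvals)[::-1][:n_components]
    scores = Xc @ eigvecs[:, order]
    # Robust regression of the response on the robust scores (Huber M-estimator).
    return sm.RLM(y, sm.add_constant(scores), M=sm.robust.norms.HuberT()).fit()

# Synthetic data with a few outlying responses.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 6))
y = X @ rng.normal(size=6) + rng.normal(scale=0.1, size=100)
y[:5] += 20                      # contaminate the response
print(robust_pcr(X, y).params)
```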

  2. Robust Methods for Moderation Analysis with a Two-Level Regression Model.

    PubMed

    Yang, Miao; Yuan, Ke-Hai

    2016-01-01

    Moderation analysis has many applications in social sciences. Most widely used estimation methods for moderation analysis assume that errors are normally distributed and homoscedastic. When these assumptions are not met, the results from a classical moderation analysis can be misleading. For more reliable moderation analysis, this article proposes two robust methods with a two-level regression model when the predictors do not contain measurement error. One method is based on maximum likelihood with Student's t distribution and the other is based on M-estimators with Huber-type weights. An algorithm for obtaining the robust estimators is developed. Consistent estimates of standard errors of the robust estimators are provided. The robust approaches are compared against normal-distribution-based maximum likelihood (NML) with respect to power and accuracy of parameter estimates through a simulation study. Results show that the robust approaches outperform NML under various distributional conditions. Application of the robust methods is illustrated through a real data example. An R program is developed and documented to facilitate the application of the robust methods.
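
    A single-level flavor of the Huber-type M-estimation idea can be sketched with statsmodels' RLM applied to a moderation model with an interaction term. This is not the authors' two-level estimator or their R program, just a minimal sketch of the down-weighting scheme with made-up data.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)                   # predictor
m = rng.normal(size=n)                   # moderator
# Heavy-tailed (t-distributed) errors violate the normality assumption.
y = 0.5 * x + 0.3 * m + 0.4 * x * m + rng.standard_t(df=3, size=n)

X = sm.add_constant(np.column_stack([x, m, x * m]))
# Huber-type weights down-weight observations with large residuals.
fit = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()
print(fit.params)    # the last coefficient is the moderation effect
print(fit.bse)       # asymptotic standard errors of the M-estimator
```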

  3. Multilayer Perceptron for Robust Nonlinear Interval Regression Analysis Using Genetic Algorithms

    PubMed Central

    2014-01-01

    On the basis of fuzzy regression, computational intelligence models such as neural networks can be applied to nonlinear interval regression analysis for dealing with uncertain and imprecise data. When training data are not contaminated by outliers, computational models perform well by including almost all given training data in the data interval. Nevertheless, since training data are often corrupted by outliers, robust learning algorithms employed to resist outliers in interval regression analysis have been an interesting area of research. Several computational intelligence approaches are effective for resisting outliers, but the parameters they require depend on whether the collected data contain outliers or not. Since it seems difficult to prespecify the degree of contamination beforehand, this paper uses a multilayer perceptron to construct a robust nonlinear interval regression model using a genetic algorithm. Outliers beyond or beneath the data interval have only a slight effect on the determination of the data interval. Simulation results demonstrate that the proposed method performs well for contaminated datasets. PMID:25110755

  4. Multilayer perceptron for robust nonlinear interval regression analysis using genetic algorithms.

    PubMed

    Hu, Yi-Chung

    2014-01-01

    On the basis of fuzzy regression, computational intelligence models such as neural networks can be applied to nonlinear interval regression analysis for dealing with uncertain and imprecise data. When training data are not contaminated by outliers, computational models perform well by including almost all given training data in the data interval. Nevertheless, since training data are often corrupted by outliers, robust learning algorithms employed to resist outliers in interval regression analysis have been an interesting area of research. Several computational intelligence approaches are effective for resisting outliers, but the parameters they require depend on whether the collected data contain outliers or not. Since it seems difficult to prespecify the degree of contamination beforehand, this paper uses a multilayer perceptron to construct a robust nonlinear interval regression model using a genetic algorithm. Outliers beyond or beneath the data interval have only a slight effect on the determination of the data interval. Simulation results demonstrate that the proposed method performs well for contaminated datasets.

  5. Robust Mediation Analysis Based on Median Regression

    PubMed Central

    Yuan, Ying; MacKinnon, David P.

    2014-01-01

    Mediation analysis has many applications in psychology and the social sciences. The most prevalent methods typically assume that the error distribution is normal and homoscedastic. However, this assumption may rarely be met in practice, which can affect the validity of the mediation analysis. To address this problem, we propose robust mediation analysis based on median regression. Our approach is robust to various departures from the assumption of homoscedasticity and normality, including heavy-tailed, skewed, contaminated, and heteroscedastic distributions. Simulation studies show that under these circumstances, the proposed method is more efficient and powerful than standard mediation analysis. We further extend the proposed robust method to multilevel mediation analysis, and demonstrate through simulation studies that the new approach outperforms the standard multilevel mediation analysis. We illustrate the proposed method using data from a program designed to increase reemployment and enhance mental health of job seekers. PMID:24079925
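
    The median-regression version of the product-of-coefficients mediation estimate can be sketched with statsmodels' QuantReg at the 0.5 quantile. This is a minimal single-level sketch, not the authors' full procedure, and the data are synthetic.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)                          # treatment/exposure
m = 0.6 * x + rng.standard_t(df=3, size=n)      # mediator, heavy-tailed noise
y = 0.5 * m + 0.2 * x + rng.standard_t(df=3, size=n)

# a-path: median regression of the mediator on the predictor.
a = sm.QuantReg(m, sm.add_constant(x)).fit(q=0.5).params[1]
# b-path: median regression of the outcome on mediator and predictor.
b = sm.QuantReg(y, sm.add_constant(np.column_stack([m, x]))).fit(q=0.5).params[1]
print("indirect (mediated) effect a*b:", a * b)
```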

  6. Robust analysis of trends in noisy tokamak confinement data using geodesic least squares regression

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Verdoolaege, G. (geert.verdoolaege@ugent.be; Laboratory for Plasma Physics, Royal Military Academy, B-1000 Brussels); Shabbir, A.

    Regression analysis is a very common activity in fusion science for unveiling trends and parametric dependencies, but it can be a difficult matter. We have recently developed the method of geodesic least squares (GLS) regression, which is able to handle errors in all variables, is robust against data outliers and uncertainty in the regression model, and can be used with arbitrary distribution models and regression functions. We report here on first results of the application of GLS to the estimation of the multi-machine scaling law for the energy confinement time in tokamaks, demonstrating improved consistency of the GLS results compared to standard least squares.

  7. Least median of squares and iteratively re-weighted least squares as robust linear regression methods for fluorimetric determination of α-lipoic acid in capsules in ideal and non-ideal cases of linearity.

    PubMed

    Korany, Mohamed A; Gazy, Azza A; Khamis, Essam F; Ragab, Marwa A A; Kamal, Miranda F

    2018-06-01

    This study outlines two robust regression approaches, namely least median of squares (LMS) and iteratively re-weighted least squares (IRLS), and investigates their application to the instrumental analysis of nutraceuticals (here, the fluorescence quenching of merbromin reagent upon lipoic acid addition). These robust regression methods were used to calculate calibration data from the fluorescence quenching reaction (∆F and F-ratio) under ideal or non-ideal linearity conditions. For each condition, data were treated using three regression fittings: Ordinary Least Squares (OLS), LMS and IRLS. Linearity, limits of detection (LOD) and quantitation (LOQ), accuracy and precision were carefully assessed for each condition. LMS and IRLS regression line fittings showed significant improvement in correlation coefficients and all regression parameters for both methods and both conditions. In the ideal linearity condition, the intercept and slope changed insignificantly, but a dramatic change was observed for the intercept under the non-ideal linearity condition. Under both linearity conditions, LOD and LOQ values after the robust regression line fitting of data were lower than those obtained before data treatment. The results obtained after statistical treatment indicated that the linearity ranges for drug determination could be expanded to lower limits of quantitation by enhancing the regression equation parameters after data treatment. Analysis results for lipoic acid in capsules, using both fluorimetric methods, treated by parametric OLS and after treatment by robust LMS and IRLS were compared for both linearity conditions. Copyright © 2018 John Wiley & Sons, Ltd.
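
    LMS has no closed-form solution and is usually approximated by searching over small random subsets of the data, while IRLS is available off the shelf (e.g. statsmodels' RLM). The sketch below shows the random-subset idea for a straight-line calibration fit; the trial count and data are illustrative.

```python
import numpy as np

def lms_line(x, y, n_trials=2000, seed=0):
    """Approximate least-median-of-squares fit of y = b0 + b1*x."""
    rng = np.random.default_rng(seed)
    best, best_med = None, np.inf
    for _ in range(n_trials):
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue
        b1 = (y[j] - y[i]) / (x[j] - x[i])   # line through the 2-point subset
        b0 = y[i] - b1 * x[i]
        med = np.median((y - b0 - b1 * x) ** 2)
        if med < best_med:                   # keep the smallest median of squares
            best, best_med = (b0, b1), med
    return best

# Calibration-style data with one gross outlier.
x = np.linspace(1, 10, 10)
y = 2.0 + 0.5 * x + np.random.default_rng(1).normal(scale=0.05, size=10)
y[7] += 5.0
print(lms_line(x, y))
```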

  8. Using Robust Variance Estimation to Combine Multiple Regression Estimates with Meta-Analysis

    ERIC Educational Resources Information Center

    Williams, Ryan

    2013-01-01

    The purpose of this study was to explore the use of robust variance estimation for combining commonly specified multiple regression models and for combining sample-dependent focal slope estimates from diversely specified models. The proposed estimator obviates traditionally required information about the covariance structure of the dependent…

  9. A robust ridge regression approach in the presence of both multicollinearity and outliers in the data

    NASA Astrophysics Data System (ADS)

    Shariff, Nurul Sima Mohamad; Ferdaos, Nur Aqilah

    2017-08-01

    Multicollinearity often leads to inconsistent and unreliable parameter estimates in regression analysis. The situation becomes more severe in the presence of outliers, which produce fatter tails in the error distribution than the normal distribution. The well-known procedure that is robust to the multicollinearity problem is the ridge regression method. This method, however, is expected to be affected by the presence of outliers due to some assumptions imposed in the modeling procedure. Thus, a robust version of the existing ridge method, with some modification in the inverse matrix and the estimated response value, is introduced. The performance of the proposed method is discussed and comparisons are made with several existing estimators, namely Ordinary Least Squares (OLS), ridge regression and robust ridge regression based on GM-estimates. The proposed method is found to produce reliable parameter estimates in the presence of both multicollinearity and outliers in the data.
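
    One common way to combine the two ingredients is ridge-penalized iteratively reweighted least squares with Huber weights, i.e. beta = (X'WX + kI)^(-1) X'Wy with robust weights W. The sketch below implements that generic combination; the paper's specific modification of the inverse matrix may differ, and the contaminated data are synthetic.

```python
import numpy as np

def huber_weights(r, c=1.345):
    """Huber weights: 1 inside the threshold, c/|r| outside."""
    a = np.abs(r)
    return np.where(a <= c, 1.0, c / a)

def robust_ridge(X, y, k=1.0, n_iter=20):
    """IRLS ridge: beta = (X'WX + kI)^-1 X'Wy with Huber weights W."""
    n, p = X.shape
    beta = np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)  # plain ridge start
    for _ in range(n_iter):
        r = y - X @ beta
        s = np.median(np.abs(r)) / 0.6745 + 1e-12   # robust residual scale (MAD)
        w = huber_weights(r / s)
        XtW = X.T * w
        beta = np.linalg.solve(XtW @ X + k * np.eye(p), XtW @ y)
    return beta

# Collinear predictors plus a few response outliers.
rng = np.random.default_rng(0)
z = rng.normal(size=100)
X = np.column_stack([z, z + 0.01 * rng.normal(size=100), rng.normal(size=100)])
y = X @ np.array([1.0, 1.0, -0.5]) + rng.normal(scale=0.1, size=100)
y[:3] += 10
print(robust_ridge(X, y, k=0.5))
```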

  10. A note on variance estimation in random effects meta-regression.

    PubMed

    Sidik, Kurex; Jonkman, Jeffrey N

    2005-01-01

    For random effects meta-regression inference, variance estimation for the parameter estimates is discussed. Because estimated weights are used for meta-regression analysis in practice, the assumed or estimated covariance matrix used in meta-regression is not strictly correct, due to possible errors in estimating the weights. Therefore, this note investigates the use of a robust variance estimation approach for obtaining variances of the parameter estimates in random effects meta-regression inference. This method treats the assumed covariance matrix of the effect measure variables as a working covariance matrix. Using an example of meta-analysis data from clinical trials of a vaccine, the robust variance estimation approach is illustrated in comparison with two other methods of variance estimation. A simulation study is presented, comparing the three methods of variance estimation in terms of bias and coverage probability. We find that, despite the seeming suitability of the robust estimator for random effects meta-regression, the improved variance estimator of Knapp and Hartung (2003) yields the best performance among the three estimators, and thus may provide the best protection against errors in the estimated weights.
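
    The estimators being compared can be sketched directly for a weighted meta-regression: the model-based covariance, a sandwich-type robust covariance that treats the assumed weights as a working choice, and the Knapp-Hartung rescaling of the model-based covariance by a residual-based factor. The sketch assumes a known between-study variance tau2 and uses synthetic data.

```python
import numpy as np

def meta_regression_variances(x, y, v, tau2):
    """WLS meta-regression with model-based, robust, and Knapp-Hartung variances."""
    X = np.column_stack([np.ones_like(x), x])
    w = 1.0 / (v + tau2)                         # inverse-variance weights
    XtW = X.T * w
    cov_model = np.linalg.inv(XtW @ X)           # assumed (model-based) covariance
    beta = cov_model @ (XtW @ y)
    r = y - X @ beta
    # Sandwich-type robust covariance, treating the weights as a working choice.
    meat = (X.T * (w**2 * r**2)) @ X
    cov_robust = cov_model @ meat @ cov_model
    # Knapp-Hartung: rescale the model-based covariance by a residual factor.
    k, p = len(y), X.shape[1]
    q = np.sum(w * r**2) / (k - p)
    cov_kh = q * cov_model
    return beta, cov_model, cov_robust, cov_kh

rng = np.random.default_rng(0)
x = rng.normal(size=12)
v = rng.uniform(0.05, 0.2, size=12)              # within-study variances
y = 0.3 + 0.5 * x + rng.normal(scale=np.sqrt(v + 0.1))
print(meta_regression_variances(x, y, v, tau2=0.1)[0])
```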

  11. Robust mislabel logistic regression without modeling mislabel probabilities.

    PubMed

    Hung, Hung; Jou, Zhi-Yu; Huang, Su-Yun

    2018-03-01

    Logistic regression is among the most widely used statistical methods for linear discriminant analysis. In many applications, we only observe possibly mislabeled responses. Fitting a conventional logistic regression can then lead to biased estimation. One common resolution is to fit a mislabel logistic regression model, which takes mislabeled responses into consideration. Another common method is to adopt robust M-estimation by down-weighting suspected instances. In this work, we propose a new robust mislabel logistic regression based on γ-divergence. Our proposal possesses two advantageous features: (1) It does not need to model the mislabel probabilities. (2) The minimum γ-divergence estimation leads to a weighted estimating equation without the need to include any bias correction term; that is, it is automatically bias-corrected. These features make the proposed γ-logistic regression more robust in model fitting and more intuitive for model interpretation through a simple weighting scheme. Our method is also easy to implement, and two types of algorithms are included. Simulation studies and the Pima data application are presented to demonstrate the performance of γ-logistic regression. © 2017, The International Biometric Society.
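
    The flavor of the weighting scheme can be conveyed by iteratively refitting a weighted logistic regression in which each observation's weight is the γ-power of its fitted likelihood, so suspected mislabels receive small weights. This illustrates the idea only and is not the paper's exact minimum γ-divergence estimating equation; data and names are made up.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def gamma_logistic(X, y, gamma=0.5, n_iter=10):
    """Iteratively reweighted logistic fit; weights ~ fitted likelihood**gamma."""
    clf = LogisticRegression().fit(X, y)
    for _ in range(n_iter):
        p = clf.predict_proba(X)[:, 1]
        lik = np.where(y == 1, p, 1 - p)       # per-observation likelihood
        w = lik ** gamma                       # suspected mislabels get small weight
        clf = LogisticRegression().fit(X, y, sample_weight=w)
    return clf

# Data with a fraction of flipped (mislabeled) responses.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (X @ np.array([2.0, -1.0]) + rng.logistic(size=500) > 0).astype(int)
flip = rng.choice(500, size=25, replace=False)
y[flip] = 1 - y[flip]
print(gamma_logistic(X, y).coef_)
```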

  12. New robust statistical procedures for the polytomous logistic regression models.

    PubMed

    Castilla, Elena; Ghosh, Abhik; Martin, Nirian; Pardo, Leandro

    2018-05-17

    This article derives a new family of estimators, namely the minimum density power divergence estimators, as a robust generalization of the maximum likelihood estimator for the polytomous logistic regression model. Based on these estimators, a family of Wald-type test statistics for linear hypotheses is introduced. Robustness properties of both the proposed estimators and the test statistics are theoretically studied through the classical influence function analysis. Appropriate real-life examples are presented to justify the requirement of suitable robust statistical procedures in place of the likelihood-based inference for the polytomous logistic regression model. The validity of the theoretical results established in the article is further confirmed empirically through suitable simulation studies. Finally, an approach for the data-driven selection of the robustness tuning parameter is proposed with empirical justifications. © 2018, The International Biometric Society.

  13. Gradient descent for robust kernel-based regression

    NASA Astrophysics Data System (ADS)

    Guo, Zheng-Chu; Hu, Ting; Shi, Lei

    2018-06-01

    In this paper, we study the gradient descent algorithm generated by a robust loss function over a reproducing kernel Hilbert space (RKHS). The loss function is defined by a windowing function G and a scale parameter σ, and can include a wide range of commonly used robust losses for regression. There is still a gap between the theoretical analysis and the optimization process of empirical risk minimization based on such losses: the estimator needs to be globally optimal in the theoretical analysis, while the optimization method cannot ensure the global optimality of its solutions. In this paper, we aim to fill this gap by developing a novel theoretical analysis of the performance of estimators generated by the gradient descent algorithm. We demonstrate that with an appropriately chosen scale parameter σ, the gradient update with early stopping rules can approximate the regression function. Our error analysis leads to convergence in both the standard L2 norm and the strong RKHS norm, both of which are optimal in the minimax sense. We show that the scale parameter σ plays an important role in providing robustness as well as fast convergence. Numerical experiments implemented on synthetic examples and a real data set also support our theoretical results.
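
    A minimal numpy sketch of the setup: model f = Kα in an RKHS, a Welsch-type loss (σ²/2)(1 − exp(−r²/σ²)) whose Gaussian windowing function caps the influence of large residuals, and plain gradient descent stopped after a fixed step budget. The kernel, step size and stopping time are illustrative choices, not those analyzed in the paper.

```python
import numpy as np

def gaussian_kernel(A, B, width=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * width**2))

def robust_kernel_gd(X, y, sigma=1.0, eta=0.5, n_steps=200):
    """Gradient descent on the Welsch-type risk (sigma^2/2)(1 - exp(-r^2/sigma^2))."""
    n = len(y)
    K = gaussian_kernel(X, X)
    alpha = np.zeros(n)
    for _ in range(n_steps):                 # early stopping: fixed step budget
        r = y - K @ alpha                    # residuals of f = K alpha
        psi = r * np.exp(-r**2 / sigma**2)   # derivative of the robust loss
        alpha += (eta / n) * (K @ psi)       # K is symmetric, so K^T = K
    return alpha, K

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(80, 1))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.normal(size=80)
y[:4] += 5                                   # gross outliers
alpha, K = robust_kernel_gd(X, y)
print(np.mean((K @ alpha - np.sin(2 * X[:, 0]))**2))  # error vs the clean target
```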

  14. Using Robust Standard Errors to Combine Multiple Regression Estimates with Meta-Analysis

    ERIC Educational Resources Information Center

    Williams, Ryan T.

    2012-01-01

    Combining multiple regression estimates with meta-analysis has continued to be a difficult task. A variety of methods have been proposed and used to combine multiple regression slope estimates with meta-analysis, however, most of these methods have serious methodological and practical limitations. The purpose of this study was to explore the use…

  15. Robust regression for large-scale neuroimaging studies.

    PubMed

    Fritsch, Virgile; Da Mota, Benoit; Loth, Eva; Varoquaux, Gaël; Banaschewski, Tobias; Barker, Gareth J; Bokde, Arun L W; Brühl, Rüdiger; Butzek, Brigitte; Conrod, Patricia; Flor, Herta; Garavan, Hugh; Lemaitre, Hervé; Mann, Karl; Nees, Frauke; Paus, Tomas; Schad, Daniel J; Schümann, Gunter; Frouin, Vincent; Poline, Jean-Baptiste; Thirion, Bertrand

    2015-05-01

    Multi-subject datasets used in neuroimaging group studies have a complex structure, as they exhibit non-stationary statistical properties across regions and display various artifacts. While studies with small sample sizes can rarely be shown to deviate from standard hypotheses (such as the normality of the residuals) due to the poor sensitivity of normality tests with low degrees of freedom, large-scale studies (e.g. >100 subjects) exhibit more obvious deviations from these hypotheses and call for more refined models for statistical inference. Here, we demonstrate the benefits of robust regression as a tool for analyzing large neuroimaging cohorts. First, we use an analytic test based on robust parameter estimates; based on simulations, this procedure is shown to provide an accurate statistical control without resorting to permutations. Second, we show that robust regression yields more detections than standard algorithms using as an example an imaging genetics study with 392 subjects. Third, we show that robust regression can avoid false positives in a large-scale analysis of brain-behavior relationships with over 1500 subjects. Finally we embed robust regression in the Randomized Parcellation Based Inference (RPBI) method and demonstrate that this combination further improves the sensitivity of tests carried out across the whole brain. Altogether, our results show that robust procedures provide important advantages in large-scale neuroimaging group studies. Copyright © 2015 Elsevier Inc. All rights reserved.

  16. Robustness of meta-analyses in finding gene × environment interactions

    PubMed Central

    Shi, Gang; Nehorai, Arye

    2017-01-01

    Meta-analyses that synthesize statistical evidence across studies have become important analytical tools for genetic studies. Inspired by the success of genome-wide association studies of the genetic main effect, researchers are searching for gene × environment interactions. Confounders are routinely included in the genome-wide gene × environment interaction analysis as covariates; however, this does not control for any confounding effects on the results if covariate × environment interactions are present. We carried out simulation studies to evaluate the robustness to the covariate × environment confounder for meta-regression and joint meta-analysis, which are two commonly used meta-analysis methods for testing the gene × environment interaction or the genetic main effect and interaction jointly. Here we show that meta-regression is robust to the covariate × environment confounder while joint meta-analysis is subject to the confounding effect with inflated type I error rates. Given vast sample sizes employed in genome-wide gene × environment interaction studies, non-significant covariate × environment interactions at the study level could substantially elevate the type I error rate at the consortium level. When covariate × environment confounders are present, type I errors can be controlled in joint meta-analysis by including the covariate × environment terms in the analysis at the study level. Alternatively, meta-regression can be applied, which is robust to potential covariate × environment confounders. PMID:28362796

  17. Robust neural network with applications to credit portfolio data analysis.

    PubMed

    Feng, Yijia; Li, Runze; Sudjianto, Agus; Zhang, Yiyun

    2010-01-01

    In this article, we study nonparametric conditional quantile estimation via a neural network structure. We propose an estimation method that combines quantile regression and neural networks (the robust neural network, RNN). It provides good smoothing performance in the presence of outliers and can be used to construct prediction bands. A Majorization-Minimization (MM) algorithm is developed for optimization. A Monte Carlo simulation study is conducted to assess the performance of the RNN. Comparisons with other nonparametric regression methods (e.g., local linear regression and regression splines) in a real data application demonstrate the advantage of the newly proposed procedure.

  18. Robust Variable Selection with Exponential Squared Loss.

    PubMed

    Wang, Xueqin; Jiang, Yunlu; Huang, Mian; Zhang, Heping

    2013-04-01

    Robust variable selection procedures through penalized regression have been gaining increased attention in the literature. They can be used to perform variable selection and are expected to yield robust estimates. However, to the best of our knowledge, the robustness of those penalized regression procedures has not been well characterized. In this paper, we propose a class of penalized robust regression estimators based on exponential squared loss. The motivation for this new procedure is that it enables us to characterize its robustness, which has not been done for the existing procedures, while its performance is near optimal and superior to some recently developed methods. Specifically, under defined regularity conditions, our estimators are √n-consistent and possess the oracle property. Importantly, we show that our estimators can achieve the highest asymptotic breakdown point of 1/2 and that their influence functions are bounded with respect to the outliers in either the response or the covariate domain. We performed simulation studies to compare our proposed method with some recent methods, using the oracle method as the benchmark. We consider common sources of influential points. Our simulation studies reveal that our proposed method performs similarly to the oracle method in terms of the model error and the positive selection rate even in the presence of influential points. In contrast, other existing procedures have a much lower non-causal selection rate. Furthermore, we re-analyze the Boston Housing Price Dataset and the Plasma Beta-Carotene Level Dataset, which are commonly used examples for regression diagnostics of influential points. Our analysis unravels the discrepancies of using our robust method versus the other penalized regression method, underscoring the importance of developing and applying robust penalized regression methods.
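
    The unpenalized core of the estimator can be sketched with scipy: maximizing Σ exp(−r²/γ) is equivalent to minimizing the exponential squared loss Σ(1 − exp(−r²/γ)). The paper uses a nonconvex penalty for variable selection; an L1 term is substituted below purely for illustration, and the data are synthetic.

```python
import numpy as np
from scipy.optimize import minimize

def exp_squared_fit(X, y, gamma=1.0, lam=0.1):
    """Minimize sum(1 - exp(-r^2/gamma)) + lam*||beta||_1 (L1 for illustration)."""
    def objective(beta):
        r = y - X @ beta
        return np.sum(1.0 - np.exp(-r**2 / gamma)) + lam * np.sum(np.abs(beta))
    beta0 = np.linalg.lstsq(X, y, rcond=None)[0]      # least squares warm start
    return minimize(objective, beta0, method="Nelder-Mead").x

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))
beta_true = np.array([2.0, 0.0, -1.0, 0.0])           # sparse truth
y = X @ beta_true + rng.normal(scale=0.2, size=150)
y[:10] += 15                                          # influential points
print(exp_squared_fit(X, y).round(2))
```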

  19. Robust Variable Selection with Exponential Squared Loss

    PubMed Central

    Wang, Xueqin; Jiang, Yunlu; Huang, Mian; Zhang, Heping

    2013-01-01

    Robust variable selection procedures through penalized regression have been gaining increased attention in the literature. They can be used to perform variable selection and are expected to yield robust estimates. However, to the best of our knowledge, the robustness of those penalized regression procedures has not been well characterized. In this paper, we propose a class of penalized robust regression estimators based on exponential squared loss. The motivation for this new procedure is that it enables us to characterize its robustness, which has not been done for the existing procedures, while its performance is near optimal and superior to some recently developed methods. Specifically, under defined regularity conditions, our estimators are √n-consistent and possess the oracle property. Importantly, we show that our estimators can achieve the highest asymptotic breakdown point of 1/2 and that their influence functions are bounded with respect to the outliers in either the response or the covariate domain. We performed simulation studies to compare our proposed method with some recent methods, using the oracle method as the benchmark. We consider common sources of influential points. Our simulation studies reveal that our proposed method performs similarly to the oracle method in terms of the model error and the positive selection rate even in the presence of influential points. In contrast, other existing procedures have a much lower non-causal selection rate. Furthermore, we re-analyze the Boston Housing Price Dataset and the Plasma Beta-Carotene Level Dataset, which are commonly used examples for regression diagnostics of influential points. Our analysis unravels the discrepancies of using our robust method versus the other penalized regression method, underscoring the importance of developing and applying robust penalized regression methods. PMID:23913996

  20. Improving power and robustness for detecting genetic association with extreme-value sampling design.

    PubMed

    Chen, Hua Yun; Li, Mingyao

    2011-12-01

    Extreme-value sampling design that samples subjects with extremely large or small quantitative trait values is commonly used in genetic association studies. Samples in such designs are often treated as "cases" and "controls" and analyzed using logistic regression. Such a case-control analysis ignores the potential dose-response relationship between the quantitative trait and the underlying trait locus and thus may lead to loss of power in detecting genetic association. An alternative approach to analyzing such data is to model the dose-response relationship by a linear regression model. However, parameter estimation from this model can be biased, which may lead to inflated type I errors. We propose a robust and efficient approach that takes into consideration both the biased sampling design and the potential dose-response relationship. Extensive simulations demonstrate that the proposed method is more powerful than the traditional logistic regression analysis and is more robust than the linear regression analysis. We applied our method to the analysis of a candidate gene association study on high-density lipoprotein cholesterol (HDL-C) which includes study subjects with extremely high or low HDL-C levels. Using our method, we identified several SNPs showing a stronger evidence of association with HDL-C than the traditional case-control logistic regression analysis. Our results suggest that it is important to appropriately model the quantitative traits and to adjust for the biased sampling when a dose-response relationship exists in extreme-value sampling designs. © 2011 Wiley Periodicals, Inc.

  1. Linear regression based on Minimum Covariance Determinant (MCD) and TELBS methods on the productivity of phytoplankton

    NASA Astrophysics Data System (ADS)

    Gusriani, N.; Firdaniza

    2018-03-01

    The existence of outliers in multiple linear regression analysis causes the Gaussian assumption to be violated. If the least squares method is nevertheless applied to such data, it produces a model that cannot represent most of the data. A regression method that is robust against outliers is therefore needed. This paper compares the Minimum Covariance Determinant (MCD) method and the TELBS method on secondary data on the productivity of phytoplankton, which contain outliers. Based on the robust coefficient of determination, the MCD method produces a better model than the TELBS method.
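
    MCD-based regression can be sketched by estimating the joint robust location and scatter of (X, y) and reading the coefficients off the partitioned covariance, beta = Sxx⁻¹ Sxy. The sketch uses scikit-learn's MinCovDet and synthetic contaminated data; the TELBS method is not shown.

```python
import numpy as np
from sklearn.covariance import MinCovDet

def mcd_regression(X, y, random_state=0):
    """Slope/intercept from the MCD estimate of the joint (X, y) covariance."""
    Z = np.column_stack([X, y])
    mcd = MinCovDet(random_state=random_state).fit(Z)
    S, mu = mcd.covariance_, mcd.location_
    p = X.shape[1]
    beta = np.linalg.solve(S[:p, :p], S[:p, p])  # Sxx^-1 Sxy
    intercept = mu[p] - mu[:p] @ beta
    return intercept, beta

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 1.0 + X @ np.array([0.8, -0.4]) + rng.normal(scale=0.1, size=200)
y[:10] += 8                                      # outliers in the response
print(mcd_regression(X, y))
```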

  2. Logistic regression applied to natural hazards: rare event logistic regression with replications

    NASA Astrophysics Data System (ADS)

    Guns, M.; Vanacker, V.

    2012-06-01

    Statistical analysis of natural hazards needs particular attention, as most of these phenomena are rare events. This study shows that ordinary rare event logistic regression, as it is now commonly used in geomorphologic studies, does not always lead to a robust detection of controlling factors, as the results can be strongly sample-dependent. In this paper, we introduce some concepts of Monte Carlo simulation into rare event logistic regression. This technique, called rare event logistic regression with replications, combines the strengths of probabilistic and statistical methods, and overcomes some of the limitations of previous developments through robust variable selection. The technique was developed here for the analysis of landslide controlling factors, but the concept is widely applicable to statistical analyses of natural hazards.

  3. Passing the Test: Ecological Regression Analysis in the Los Angeles County Case and Beyond.

    ERIC Educational Resources Information Center

    Lichtman, Allan J.

    1991-01-01

    Statistical analysis of racially polarized voting prepared for the Garza v County of Los Angeles (California) (1990) voting rights case is reviewed to demonstrate that ecological regression is a flexible, robust technique that illuminates the reality of ethnic voting, and superior to the neighborhood model supported by the defendants. (SLD)

  4. Robust variance estimation with dependent effect sizes: practical considerations including a software tutorial in Stata and SPSS.

    PubMed

    Tanner-Smith, Emily E; Tipton, Elizabeth

    2014-03-01

    Methodologists have recently proposed robust variance estimation as one way to handle dependent effect sizes in meta-analysis. Software macros for robust variance estimation in meta-analysis are currently available for Stata (StataCorp LP, College Station, TX, USA) and SPSS (IBM, Armonk, NY, USA), yet there is little guidance for authors regarding the practical application and implementation of those macros. This paper provides a brief tutorial on the implementation of the Stata and SPSS macros and discusses practical issues meta-analysts should consider when estimating meta-regression models with robust variance estimates. Two example databases are used in the tutorial to illustrate the use of meta-analysis with robust variance estimates. Copyright © 2013 John Wiley & Sons, Ltd.
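
    For readers working outside Stata and SPSS, the same robust variance idea can be sketched in Python as a weighted meta-regression with study-clustered sandwich standard errors via statsmodels. This is the generic cluster-robust estimator, not the exact output of the macros, and the data are synthetic.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_studies, es_per_study = 20, 3
study = np.repeat(np.arange(n_studies), es_per_study)   # cluster ids
x = rng.normal(size=n_studies * es_per_study)           # moderator
u = np.repeat(rng.normal(scale=0.2, size=n_studies), es_per_study)
y = 0.3 + 0.2 * x + u + rng.normal(scale=0.1, size=len(x))
v = rng.uniform(0.01, 0.05, size=len(x))                # sampling variances

X = sm.add_constant(x)
wls = sm.WLS(y, X, weights=1.0 / v)
# Cluster-robust (sandwich) covariance with effect sizes clustered by study.
fit = wls.fit(cov_type="cluster", cov_kwds={"groups": study})
print(fit.params, fit.bse)
```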

  5. A subagging regression method for estimating the qualitative and quantitative state of groundwater

    NASA Astrophysics Data System (ADS)

    Jeong, Jina; Park, Eungyu; Han, Weon Shik; Kim, Kue-Young

    2017-08-01

    A subsample aggregating (subagging) regression (SBR) method for the analysis of groundwater data pertaining to trend-estimation-associated uncertainty is proposed. The SBR method is validated against synthetic data competitively with other conventional robust and non-robust methods. From the results, it is verified that the estimation accuracies of the SBR method are consistent and superior to those of the other methods, and the uncertainties are reasonably estimated, whereas the other methods offer no uncertainty analysis option. For further validation, actual groundwater data are employed and analyzed comparatively with Gaussian process regression (GPR). For all cases, the trend and the associated uncertainties are reasonably estimated by both SBR and GPR, regardless of whether the data are Gaussian or non-Gaussian skewed. However, GPR is expected to be limited when applied to data severely corrupted by outliers, owing to its non-robustness. From the implementations, it is determined that the SBR method has the potential to be further developed as an effective tool for anomaly detection or outlier identification in groundwater state data such as the groundwater level and contaminant concentration.
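
    The subagging recipe is simple to sketch: fit the same regressor on many random subsamples drawn without replacement and aggregate, with the spread across subsample fits serving as an uncertainty band. Below, a linear trend stands in for the regressor; the subsample fraction, replicate count and data are illustrative.

```python
import numpy as np

def subagging_trend(t, y, n_sub=200, frac=0.5, seed=0):
    """Median and percentile band of linear trends fitted on random subsamples."""
    rng = np.random.default_rng(seed)
    m = max(2, int(frac * len(t)))
    slopes = np.empty(n_sub)
    for b in range(n_sub):
        idx = rng.choice(len(t), size=m, replace=False)  # subsample w/o replacement
        slopes[b] = np.polyfit(t[idx], y[idx], 1)[0]
    return np.median(slopes), np.percentile(slopes, [2.5, 97.5])

# Synthetic groundwater-level-like series with a few anomalous readings.
t = np.arange(120, dtype=float)
y = 10 - 0.02 * t + np.random.default_rng(1).normal(scale=0.3, size=120)
y[[30, 31, 90]] -= 5.0
print(subagging_trend(t, y))   # robust trend estimate plus uncertainty interval
```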

  6. A robust background regression based score estimation algorithm for hyperspectral anomaly detection

    NASA Astrophysics Data System (ADS)

    Zhao, Rui; Du, Bo; Zhang, Liangpei; Zhang, Lefei

    2016-12-01

    Anomaly detection has become a hot topic in the hyperspectral image analysis and processing fields in recent years. The most important issue for hyperspectral anomaly detection is background estimation and suppression. Unreasonable or non-robust background estimation usually leads to unsatisfactory anomaly detection results. Furthermore, the inherent nonlinearity of hyperspectral images may cover up the intrinsic data structure in anomaly detection. In order to implement robust background estimation, as well as to explore the intrinsic data structure of the hyperspectral image, we propose a robust background regression based score estimation algorithm (RBRSE) for hyperspectral anomaly detection. The Robust Background Regression (RBR) is a label assignment procedure which segments the hyperspectral data into a robust background dataset and a potential anomaly dataset with an intersection boundary. In the RBR, a kernel expansion technique, which explores the nonlinear structure of the hyperspectral data in a reproducing kernel Hilbert space, is utilized to formulate the data as a density feature representation. A minimum squared loss relationship is constructed between the data density feature and the corresponding assigned labels of the hyperspectral data, to form the foundation of the regression. Furthermore, a manifold regularization term, which exploits the manifold smoothness of the hyperspectral data, and a maximization term of the robust background average density, which suppresses the bias caused by the potential anomalies, are jointly appended in the RBR procedure. After this, a paired-dataset based k-nn score estimation method is applied to the robust background and potential anomaly datasets to produce the detection output. The experimental results show that RBRSE achieves superior ROC curves, AUC values, and background-anomaly separation compared with some of the other state-of-the-art anomaly detection methods, and is easy to implement in practice.

  7. Glomerular structural-functional relationship models of diabetic nephropathy are robust in type 1 diabetic patients.

    PubMed

    Mauer, Michael; Caramori, Maria Luiza; Fioretto, Paola; Najafian, Behzad

    2015-06-01

    Studies of structural-functional relationships have improved understanding of the natural history of diabetic nephropathy (DN). However, in order to consider structural end points for clinical trials, the robustness of the resultant models needs to be verified. This study examined whether structural-functional relationship models derived from a large cohort of type 1 diabetic (T1D) patients with a wide range of renal function are robust. The predictability of models derived from multiple regression analysis and piecewise linear regression analysis was also compared. T1D patients (n = 161) with research renal biopsies were divided into two equal groups matched for albumin excretion rate (AER). Models to explain AER and glomerular filtration rate (GFR) by classical DN lesions in one group (T1D-model, or T1D-M) were applied to the other group (T1D-test, or T1D-T) and regression analyses were performed. T1D-M-derived models explained 70 and 63% of AER variance and 32 and 21% of GFR variance in T1D-M and T1D-T, respectively, supporting the substantial robustness of the models. Piecewise linear regression analyses substantially improved predictability of the models with 83% of AER variance and 66% of GFR variance explained by classical DN glomerular lesions alone. These studies demonstrate that DN structural-functional relationship models are robust, and if appropriate models are used, glomerular lesions alone explain a major proportion of AER and GFR variance in T1D patients. © The Author 2014. Published by Oxford University Press on behalf of ERA-EDTA. All rights reserved.

  8. An application of robust ridge regression model in the presence of outliers to real data problem

    NASA Astrophysics Data System (ADS)

    Shariff, N. S. Md.; Ferdaos, N. A.

    2017-09-01

    Multicollinearity and outliers often lead to inconsistent and unreliable parameter estimates in regression analysis. The well-known procedure that is robust to the multicollinearity problem is the ridge regression method. This method, however, is believed to be affected by the presence of outliers. The combination of GM-estimation and the ridge parameter, which is robust towards both problems, is of interest in this study. As such, both techniques are employed to investigate the relationship between stock market prices and macroeconomic variables in Malaysia, as the data set is suspected to involve both multicollinearity and outlier problems. There are four macroeconomic factors selected for this study, which are the Consumer Price Index (CPI), Gross Domestic Product (GDP), Base Lending Rate (BLR) and Money Supply (M1). The results demonstrate that the proposed procedure is able to produce reliable results in the presence of multicollinearity and outliers in the real data.

  9. A diagnostic analysis of the VVP single-doppler retrieval technique

    NASA Technical Reports Server (NTRS)

    Boccippio, Dennis J.

    1995-01-01

    A diagnostic analysis of the VVP (volume velocity processing) retrieval method is presented, with emphasis on understanding the technique as a linear, multivariate regression. Similarities and differences to the velocity-azimuth display and extended velocity-azimuth display retrieval techniques are discussed, using this framework. Conventional regression diagnostics are then employed to quantitatively determine situations in which the VVP technique is likely to fail. An algorithm for preparation and analysis of a robust VVP retrieval is developed and applied to synthetic and actual datasets with high temporal and spatial resolution. A fundamental (but quantifiable) limitation to some forms of VVP analysis is inadequate sampling dispersion in the n space of the multivariate regression, manifest as a collinearity between the basis functions of some fitted parameters. Such collinearity may be present either in the definition of these basis functions or in their realization in a given sampling configuration. This nonorthogonality may cause numerical instability, variance inflation (decrease in robustness), and increased sensitivity to bias from neglected wind components. It is shown that these effects prevent the application of VVP to small azimuthal sectors of data. The behavior of the VVP regression is further diagnosed over a wide range of sampling constraints, and reasonable sector limits are established.
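
    The collinearity diagnostics invoked here (variance inflation, conditioning of the realized basis functions) are easy to compute for any candidate sampling geometry. The sketch below evaluates an illustrative sinusoidal basis realized over a narrow 30-degree azimuthal sector, where near-collinearity with the constant term shows up as a large condition number and inflated VIFs; it is a generic regression diagnostic, not the VVP algorithm itself.

```python
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Basis functions realized over a narrow azimuthal sector become collinear.
az = np.deg2rad(np.linspace(0, 30, 50))          # 30-degree sector only
X = np.column_stack([np.ones_like(az), np.sin(az), np.cos(az)])

print("condition number:", np.linalg.cond(X))    # large: ill-conditioned basis
for j in range(X.shape[1]):
    # VIF of column j: how well the other basis functions predict it.
    print("VIF of column", j, "=", variance_inflation_factor(X, j))
```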

  10. Graphical Evaluation of the Ridge-Type Robust Regression Estimators in Mixture Experiments

    PubMed Central

    Erkoc, Ali; Emiroglu, Esra

    2014-01-01

    In mixture experiments, estimation of the parameters is generally based on ordinary least squares (OLS). However, in the presence of multicollinearity and outliers, OLS can result in very poor estimates. In this case, effects due to the combined outlier-multicollinearity problem can be reduced to a certain extent by using alternative approaches. One of these approaches is to use biased-robust regression techniques for the estimation of parameters. In this paper, we evaluate various ridge-type robust estimators in cases where there are multicollinearity and outliers during the analysis of mixture experiments. Also, for the selection of the biasing parameter, we use fraction of design space plots to evaluate the effect of the ridge-type robust estimators with respect to the scaled mean squared error of prediction. The suggested graphical approach is illustrated on the Hald cement data set. PMID:25202738

  11. Graphical evaluation of the ridge-type robust regression estimators in mixture experiments.

    PubMed

    Erkoc, Ali; Emiroglu, Esra; Akay, Kadri Ulas

    2014-01-01

    In mixture experiments, estimation of the parameters is generally based on ordinary least squares (OLS). However, in the presence of multicollinearity and outliers, OLS can result in very poor estimates. In this case, effects due to the combined outlier-multicollinearity problem can be reduced to a certain extent by using alternative approaches. One of these approaches is to use biased-robust regression techniques for the estimation of parameters. In this paper, we evaluate various ridge-type robust estimators in cases where there are multicollinearity and outliers during the analysis of mixture experiments. Also, for the selection of the biasing parameter, we use fraction of design space plots to evaluate the effect of the ridge-type robust estimators with respect to the scaled mean squared error of prediction. The suggested graphical approach is illustrated on the Hald cement data set.

  12. Robust logistic regression to narrow down the winner's curse for rare and recessive susceptibility variants.

    PubMed

    Kesselmeier, Miriam; Lorenzo Bermejo, Justo

    2017-11-01

    Logistic regression is the most common technique used for genetic case-control association studies. A disadvantage of standard maximum likelihood estimators of the genotype relative risk (GRR) is their strong dependence on outlier subjects, for example, patients diagnosed at unusually young age. Robust methods are available to constrain outlier influence, but they are scarcely used in genetic studies. This article provides a non-intimidating introduction to robust logistic regression, and investigates its benefits and limitations in genetic association studies. We applied the bounded Huber and extended the R package 'robustbase' with the re-descending Hampel functions to down-weight outlier influence. Computer simulations were carried out to assess the type I error rate, mean squared error (MSE) and statistical power according to major characteristics of the genetic study and investigated markers. Simulations were complemented with the analysis of real data. Both standard and robust estimation controlled type I error rates. Standard logistic regression showed the highest power but standard GRR estimates also showed the largest bias and MSE, in particular for associated rare and recessive variants. For illustration, a recessive variant with a true GRR=6.32 and a minor allele frequency=0.05 investigated in a 1000 case/1000 control study by standard logistic regression resulted in power=0.60 and MSE=16.5. The corresponding figures for Huber-based estimation were power=0.51 and MSE=0.53. Overall, Hampel- and Huber-based GRR estimates did not differ much. Robust logistic regression may represent a valuable alternative to standard maximum likelihood estimation when the focus lies on risk prediction rather than identification of susceptibility variants. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  13. A subagging regression method for estimating the qualitative and quantitative state of groundwater

    NASA Astrophysics Data System (ADS)

    Jeong, J.; Park, E.; Choi, J.; Han, W. S.; Yun, S. T.

    2016-12-01

    A subagging regression (SBR) method for the analysis of groundwater data pertaining to the estimation of trends and the associated uncertainty is proposed. The SBR method is validated against synthetic data competitively with other conventional robust and non-robust methods. From the results, it is verified that the estimation accuracies of the SBR method are consistent and superior to those of the other methods, and the uncertainties are reasonably estimated, whereas the other methods have no uncertainty analysis option. For further validation, real quantitative and qualitative data are employed and analyzed comparatively with Gaussian process regression (GPR). For all cases, the trend and the associated uncertainties are reasonably estimated by SBR, whereas GPR has limitations in representing the variability of non-Gaussian skewed data. From the implementations, it is determined that the SBR method has the potential to be further developed as an effective tool for anomaly detection or outlier identification in groundwater state data.

  14. Development of Optimal Stressor Scenarios for New Operational Energy Systems

    DTIC Science & Technology

    2017-12-01

    Analyzing the previous model using a design of experiments (DOE) and regression analysis provides critical information about the associated operational...from experimentation. The resulting system requirements can be used to revisit the design requirements and develop a more robust system. This process...stressor scenarios for acceptance testing.

  15. [Regression on order statistics and its application in estimating nondetects for food exposure assessment].

    PubMed

    Yu, Xiaojin; Liu, Pei; Min, Jie; Chen, Qiguang

    2009-01-01

    To explore the application of regression on order statistics (ROS) in estimating nondetects for food exposure assessment, regression on order statistics was applied to a cadmium residue data set from global food contaminant monitoring; the mean residue was estimated using SAS programming and compared with the results from substitution methods. The results show that the ROS method clearly outperforms substitution methods, being robust and convenient for subsequent analysis. Regression on order statistics is worth adopting, but more effort should be devoted to the details of applying this method.
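
    A minimal sketch of ROS for left-censored data with a single detection limit: assign plotting positions to the full ordered sample, fit a lognormal quantile line through the detected values, and impute the nondetects from their positions before computing the mean. This is a simplified version of the full Helsel-style procedure, with illustrative data.

```python
import numpy as np
from scipy import stats

def simple_ros_mean(detects, n_censored):
    """ROS mean for left-censored data with a single detection limit (sketch)."""
    n = len(detects) + n_censored
    # Plotting positions for the full ordered sample (Blom-type formula).
    pp = (np.arange(1, n + 1) - 0.375) / (n + 0.25)
    q = stats.norm.ppf(pp)
    # Nondetects occupy the lowest ranks; detects fill the remaining ones.
    q_cens, q_det = q[:n_censored], q[n_censored:]
    slope, intercept, *_ = stats.linregress(q_det, np.log(np.sort(detects)))
    imputed = np.exp(intercept + slope * q_cens)   # modeled nondetect values
    return np.mean(np.concatenate([imputed, detects]))

# Example: 12 measured cadmium-like residues, 8 results below the detection limit.
detects = np.array([0.11, 0.12, 0.15, 0.18, 0.2, 0.22,
                    0.3, 0.33, 0.4, 0.5, 0.7, 1.1])
print(simple_ros_mean(detects, n_censored=8))
```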

  16. An Analytical Investigation of the Robustness and Power of ANCOVA with the Presence of Heterogeneous Regression Slopes.

    ERIC Educational Resources Information Center

    Hollingsworth, Holly H.

    This study shows that the test statistic for Analysis of Covariance (ANCOVA) has a noncentral F-distribution with noncentrality parameter equal to zero if and only if the regression planes are homogeneous and/or the vector of overall covariate means is the null vector. The effect of heterogeneous regression slope parameters is to either increase…

  17. Robust inference under the beta regression model with application to health care studies.

    PubMed

    Ghosh, Abhik

    2017-01-01

    Data on rates, percentages, or proportions arise frequently in many different applied disciplines like medical biology, health care, psychology, and several others. In this paper, we develop a robust inference procedure for the beta regression model, which is used to describe such response variables taking values in (0, 1) through some related explanatory variables. In relation to the beta regression model, the issue of robustness has been largely ignored in the literature so far. The existing maximum likelihood-based inference has a serious lack of robustness against outliers in the data and generates drastically different (erroneous) inferences in the presence of data contamination. Here, we develop the robust minimum density power divergence estimator and a class of robust Wald-type tests for the beta regression model, along with several applications. We derive their asymptotic properties and describe their robustness theoretically through influence function analyses. Finite sample performances of the proposed estimators and tests are examined through suitable simulation studies and real data applications in the context of health care and psychology. Although we primarily focus on beta regression models with a fixed dispersion parameter, some indications are also provided for extension to variable dispersion beta regression models, with an application.

  18. The comparison between several robust ridge regression estimators in the presence of multicollinearity and multiple outliers

    NASA Astrophysics Data System (ADS)

    Zahari, Siti Meriam; Ramli, Norazan Mohamed; Moktar, Balkiah; Zainol, Mohammad Said

    2014-09-01

    In the presence of multicollinearity and multiple outliers, statistical inference of the linear regression model using ordinary least squares (OLS) estimators is severely affected and produces misleading results. To overcome this, many approaches have been investigated. These include robust methods, which are reported to be less sensitive to the presence of outliers. In addition, the ridge regression technique has been employed to tackle the multicollinearity problem. In order to mitigate both problems, a combination of ridge regression and robust methods is discussed in this study. The superiority of this approach is examined for the simultaneous presence of multicollinearity and multiple outliers in multiple linear regression. This study looks at the performance of several well-known robust estimators (M, MM and RIDGE) and the robust ridge regression estimators Weighted Ridge M-estimator (WRM), Weighted Ridge MM (WRMM) and Ridge MM (RMM) in such a situation. Results of the study showed that in the presence of simultaneous multicollinearity and multiple outliers (in both the x- and y-directions), RMM and RIDGE are more or less similar in terms of superiority over the other estimators, regardless of the number of observations, the level of collinearity and the percentage of outliers used. However, when outliers occurred in only a single direction (the y-direction), the WRMM estimator was the most superior among the robust ridge regression estimators, producing the least variance. In conclusion, robust ridge regression is the best alternative to robust and conventional least squares estimators when dealing with the simultaneous presence of multicollinearity and outliers.

  19. Detection of outliers in the response and explanatory variables of the simple circular regression model

    NASA Astrophysics Data System (ADS)

    Mahmood, Ehab A.; Rana, Sohel; Hussin, Abdul Ghapor; Midi, Habshah

    2016-06-01

    The circular regression model may contain one or more data points which appear peculiar or inconsistent with the main part of the model. This may occur due to recording errors, sudden short events, sampling under abnormal conditions, etc. The existence of these "outliers" in the data set causes serious problems for research results and conclusions. Therefore, we should identify them before applying statistical analysis. In this article, we propose a statistic to identify outliers in both the response and explanatory variables of the simple circular regression model. Our proposed statistic is the robust circular distance RCDxy, and it is justified by three robust measures: the proportion of detected outliers and the masking and swamping rates.

  20. Goodness-of-fit tests and model diagnostics for negative binomial regression of RNA sequencing data.

    PubMed

    Mi, Gu; Di, Yanming; Schafer, Daniel W

    2015-01-01

    This work is about assessing model adequacy for negative binomial (NB) regression, particularly (1) assessing the adequacy of the NB assumption, and (2) assessing the appropriateness of models for NB dispersion parameters. Tools for the first are appropriate for NB regression generally; those for the second are primarily intended for RNA sequencing (RNA-Seq) data analysis. The typically small number of biological samples and large number of genes in RNA-Seq analysis motivate us to address the trade-offs between robustness and statistical power using NB regression models. One widely-used power-saving strategy, for example, is to assume some commonalities of NB dispersion parameters across genes via simple models relating them to mean expression rates, and many such models have been proposed. As RNA-Seq analysis is becoming ever more popular, it is appropriate to make more thorough investigations into power and robustness of the resulting methods, and into practical tools for model assessment. In this article, we propose simulation-based statistical tests and diagnostic graphics to address model adequacy. We provide simulated and real data examples to illustrate that our proposed methods are effective for detecting the misspecification of the NB mean-variance relationship as well as judging the adequacy of fit of several NB dispersion models.
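
    One concrete form of the simulation-based tests described: fit an NB2 regression, simulate replicate response vectors from the fitted model, refit each, and locate the observed log-likelihood within the simulated reference distribution. The sketch uses statsmodels' NegativeBinomial and numpy's NB parameterization; the replicate counts and data are illustrative.

```python
import numpy as np
import statsmodels.api as sm

def nb_gof_pvalue(X, y, n_sim=200, seed=0):
    """Parametric-bootstrap goodness-of-fit check for NB2 regression (sketch)."""
    rng = np.random.default_rng(seed)
    res = sm.NegativeBinomial(y, X).fit(disp=0)
    alpha = res.params[-1]                       # NB2 dispersion parameter
    mu = np.exp(X @ res.params[:-1])
    k_nb = 1.0 / alpha                           # numpy's NB "n" parameter
    obs_llf = res.llf
    sims = np.empty(n_sim)
    for b in range(n_sim):
        y_b = rng.negative_binomial(k_nb, k_nb / (k_nb + mu))
        sims[b] = sm.NegativeBinomial(y_b, X).fit(disp=0).llf
    # Small value: the observed fit is unusually poor under the NB assumption.
    return np.mean(sims <= obs_llf)

rng = np.random.default_rng(1)
X = sm.add_constant(rng.normal(size=(100, 1)))
mu = np.exp(0.5 + 0.8 * X[:, 1])
y = rng.negative_binomial(2.0, 2.0 / (2.0 + mu))
print(nb_gof_pvalue(X, y, n_sim=50))
```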

  1. A Robust Shape Reconstruction Method for Facial Feature Point Detection.

    PubMed

    Tan, Shuqiu; Chen, Dongyi; Guo, Chenggang; Huang, Zhiqi

    2017-01-01

    Facial feature point detection has seen great research advances in recent years. Numerous methods have been developed and applied in practical face analysis systems. However, it is still a quite challenging task because of the large variability in expressions and gestures and the existence of occlusions in real-world photographs. In this paper, we present a robust sparse reconstruction method for face alignment problems. Instead of a direct regression between the feature space and the shape space, the concept of shape increment reconstruction is introduced. Moreover, a set of coupled overcomplete dictionaries, termed the shape increment dictionary and the local appearance dictionary, are learned in a regressive manner to select robust features and fit shape increments. Additionally, to make the learned model more generalizable, we select the best-matched parameter set through extensive validation tests. Experimental results on three public datasets demonstrate that the proposed method achieves better robustness than the state-of-the-art methods.

  2. Least Principal Components Analysis (LPCA): An Alternative to Regression Analysis.

    ERIC Educational Resources Information Center

    Olson, Jeffery E.

    Often, all of the variables in a model are latent, random, or subject to measurement error, or there is not an obvious dependent variable. When any of these conditions exist, an appropriate method for estimating the linear relationships among the variables is Least Principal Components Analysis. Least Principal Components are robust, consistent,…

  3. Robust Head-Pose Estimation Based on Partially-Latent Mixture of Linear Regressions.

    PubMed

    Drouard, Vincent; Horaud, Radu; Deleforge, Antoine; Ba, Sileye; Evangelidis, Georgios

    2017-03-01

    Head-pose estimation has many applications, such as social event analysis, human-robot and human-computer interaction, driving assistance, and so forth. Head-pose estimation is challenging, because it must cope with changing illumination conditions, variabilities in face orientation and in appearance, partial occlusions of facial landmarks, as well as bounding-box-to-face alignment errors. We propose to use a mixture of linear regressions with partially-latent output. This regression method learns to map high-dimensional feature vectors (extracted from bounding boxes of faces) onto the joint space of head-pose angles and bounding-box shifts, such that they are robustly predicted in the presence of unobservable phenomena. We describe in detail the mapping method that combines the merits of unsupervised manifold learning techniques and of mixtures of regressions. We validate our method with three publicly available data sets and we thoroughly benchmark four variants of the proposed algorithm with several state-of-the-art head-pose estimation methods.

  4. Robust support vector regression networks for function approximation with outliers.

    PubMed

    Chuang, Chen-Chia; Su, Shun-Feng; Jeng, Jin-Tsong; Hsiao, Chih-Ching

    2002-01-01

    Support vector regression (SVR) employs the support vector machine (SVM) to tackle problems of function approximation and regression estimation. SVR has been shown to have good robustness against noise. When the parameters used in SVR are improperly selected, overfitting phenomena may still occur. However, the selection of the various parameters is not straightforward. Besides, in SVR, outliers may also be taken as support vectors. Such an inclusion of outliers in the support vectors may lead to serious overfitting. In this paper, a novel regression approach, termed the robust support vector regression (RSVR) network, is proposed to enhance the robust capability of SVR. In the approach, traditional robust learning approaches are employed to improve the learning performance for any selected parameters. From the simulation results, our RSVR can always improve the performance of the learned systems for all cases. Moreover, even when training lasted for a long period, the testing errors did not go up; in other words, the overfitting phenomenon is indeed suppressed.

  5. Quantile regression in the presence of monotone missingness with sensitivity analysis

    PubMed Central

    Liu, Minzhao; Daniels, Michael J.; Perri, Michael G.

    2016-01-01

    In this paper, we develop methods for longitudinal quantile regression when there is monotone missingness. In particular, we propose pattern mixture models with a constraint that provides a straightforward interpretation of the marginal quantile regression parameters. Our approach allows sensitivity analysis which is an essential component in inference for incomplete data. To facilitate computation of the likelihood, we propose a novel way to obtain analytic forms for the required integrals. We conduct simulations to examine the robustness of our approach to modeling assumptions and compare its performance to competing approaches. The model is applied to data from a recent clinical trial on weight management. PMID:26041008
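
    The basic quantile-regression building block (without the pattern-mixture and missing-data machinery described above) is available in statsmodels; a minimal sketch on simulated heavy-tailed data:

    ```python
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    x = rng.uniform(0, 10, 200)
    y = 1.0 + 0.5 * x + rng.standard_t(df=3, size=200)   # heavy-tailed noise

    X = sm.add_constant(x)
    for q in (0.25, 0.5, 0.75):
        fit = sm.QuantReg(y, X).fit(q=q)                 # one fit per quantile
        print(q, fit.params)
    ```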

  6. London Measure of Unplanned Pregnancy: guidance for its use as an outcome measure

    PubMed Central

    Hall, Jennifer A; Barrett, Geraldine; Copas, Andrew; Stephenson, Judith

    2017-01-01

    Background The London Measure of Unplanned Pregnancy (LMUP) is a psychometrically validated measure of the degree of intention of a current or recent pregnancy. The LMUP is increasingly being used worldwide and can be used to evaluate family planning or preconception care programs. However, beyond recommending the use of the full LMUP scale, there is no published guidance on how to use the LMUP as an outcome measure. Ordinal logistic regression has been recommended informally, but studies published to date have all used binary logistic regression and dichotomized the scale at different cut points. There is thus a need for evidence-based guidance to provide a standardized methodology for multivariate analysis and to enable comparison of results. This paper makes recommendations for the regression method for analysis of the LMUP as an outcome measure. Materials and methods Data collected from 4,244 pregnant women in Malawi were used to compare five regression methods: linear, logistic with two cut points, and ordinal logistic with either the full or grouped LMUP score. The recommendations were then tested on the original UK LMUP data. Results Differences in the findings across the regression models were small and unimportant. Logistic regression resulted in the largest loss of information, and assumptions were violated for the linear and ordinal logistic regressions. Consequently, robust standard errors were used for linear regression, and a partial proportional odds ordinal logistic regression model was attempted; the latter could be fitted only for the grouped LMUP score. Conclusion We recommend the linear regression model with robust standard errors to make full use of the LMUP score when analyzed as an outcome measure. Ordinal logistic regression could be considered, but a partial proportional odds model with the grouped LMUP score may be required. Logistic regression is the least-favored option because of the loss of information; if it is used, the cut point for un/planned pregnancy should be between nine and ten. These recommendations will standardize the analysis of LMUP data and enhance comparability of results across studies. PMID:28435343
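
    The recommended analysis, linear regression with heteroscedasticity-robust standard errors, is a one-liner in statsmodels; the data below are a hypothetical stand-in, not the study's data:

    ```python
    import numpy as np
    import statsmodels.api as sm

    # Hypothetical stand-in for LMUP scores (0-12) and a binary exposure.
    rng = np.random.default_rng(0)
    exposure = rng.integers(0, 2, 500)
    score = np.clip(6 + 2 * exposure + rng.normal(0, 3, 500), 0, 12)

    X = sm.add_constant(exposure)
    fit = sm.OLS(score, X).fit(cov_type="HC3")   # heteroscedasticity-robust SEs
    print(fit.summary())
    ```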

  7. A simple linear regression method for quantitative trait loci linkage analysis with censored observations.

    PubMed

    Anderson, Carl A; McRae, Allan F; Visscher, Peter M

    2006-07-01

    Standard quantitative trait loci (QTL) mapping techniques commonly assume that the trait is both fully observed and normally distributed. When considering survival or age-at-onset traits these assumptions are often incorrect. Methods have been developed to map QTL for survival traits; however, they are both computationally intensive and not available in standard genome analysis software packages. We propose a grouped linear regression method for the analysis of continuous survival data. Using simulation we compare this method to both the Cox and Weibull proportional hazards models and a standard linear regression method that ignores censoring. The grouped linear regression method is of equivalent power to both the Cox and Weibull proportional hazards methods and is significantly better than the standard linear regression method when censored observations are present. The method is also robust to the proportion of censored individuals and the underlying distribution of the trait. On the basis of linear regression methodology, the grouped linear regression model is computationally simple and fast and can be implemented readily in freely available statistical software.

  8. Detecting outliers when fitting data with nonlinear regression – a new method based on robust nonlinear regression and the false discovery rate

    PubMed Central

    Motulsky, Harvey J; Brown, Ronald E

    2006-01-01

    Background Nonlinear regression, like linear regression, assumes that the scatter of data around the ideal curve follows a Gaussian or normal distribution. This assumption leads to the familiar goal of regression: to minimize the sum of the squares of the vertical or Y-value distances between the points and the curve. Outliers can dominate the sum-of-the-squares calculation and lead to misleading results. However, we know of no practical method for routinely identifying outliers when fitting curves with nonlinear regression. Results We describe a new method for identifying outliers when fitting data with nonlinear regression. We first fit the data using a robust form of nonlinear regression, based on the assumption that scatter follows a Lorentzian distribution. We devised a new adaptive method that gradually becomes more robust as the method proceeds. To define outliers, we adapted the false discovery rate approach to handling multiple comparisons. We then remove the outliers and analyze the data using ordinary least-squares regression. Because the method combines robust regression and outlier removal, we call it the ROUT method. When analyzing simulated data, where all scatter is Gaussian, our method detects (falsely) one or more outliers in only about 1–3% of experiments. When analyzing data contaminated with one or several outliers, the ROUT method performs well at outlier identification, with an average False Discovery Rate less than 1%. Conclusion Our method, which combines a new method of robust nonlinear regression with a new method of outlier identification, identifies outliers from nonlinear curve fits with reasonable power and few false positives. PMID:16526949
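
    A hedged sketch of a ROUT-style workflow: a robust fit under a Lorentzian (Cauchy) loss, FDR-based outlier flagging, then ordinary least squares on the retained points. The exact ROUT test statistic differs, and the helper names here are illustrative:

    ```python
    import numpy as np
    from scipy import stats
    from scipy.optimize import least_squares
    from statsmodels.stats.multitest import multipletests

    def rout_like_fit(t, y, model, p0, q=0.01):
        """Robust (Cauchy-loss) fit, flag outliers by FDR, then refit with OLS."""
        robust = least_squares(lambda p: model(t, p) - y, p0, loss="cauchy")
        r = model(t, robust.x) - y
        scale = np.median(np.abs(r)) / 0.6745 + 1e-12     # MAD-based scale
        pvals = 2 * stats.norm.sf(np.abs(r) / scale)
        outlier = multipletests(pvals, alpha=q, method="fdr_bh")[0]
        keep = ~outlier
        ols = least_squares(lambda p: model(t[keep], p) - y[keep], robust.x)
        return ols.x, outlier

    # Example model: exponential decay y = a * exp(-k * t)
    decay = lambda t, p: p[0] * np.exp(-p[1] * t)
    ```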

  9. Application of least median of squared orthogonal distance (LMD) and LMD-based reweighted least squares (RLS) methods on the stock-recruitment relationship

    NASA Astrophysics Data System (ADS)

    Wang, Yan-Jun; Liu, Qun

    1999-03-01

    Analysis of stock-recruitment (SR) data is most often done by fitting various SR relationship curves to the data. Fish population dynamics data often have stochastic variations and measurement errors, which usually result in a biased regression analysis. This paper presents a robust regression method, least median of squared orthogonal distance (LMD), which is insensitive to abnormal values in the dependent and independent variables in a regression analysis. Outliers that have significantly different variance from the rest of the data can be identified in a residual analysis. Then, the least squares (LS) method is applied to the SR data with the identified outliers down-weighted. The application of the LMD and LMD-based Reweighted Least Squares (RLS) methods to simulated and real fisheries SR data is explored.

  10. Robust geographically weighted regression of modeling the Air Polluter Standard Index (APSI)

    NASA Astrophysics Data System (ADS)

    Warsito, Budi; Yasin, Hasbi; Ispriyanti, Dwi; Hoyyi, Abdul

    2018-05-01

    The Geographically Weighted Regression (GWR) model has been widely applied in many practical fields for exploring the spatial heterogeneity of a regression model. However, this method is inherently not robust to outliers. Outliers commonly exist in data sets and may lead to a distorted estimate of the underlying regression model. One solution for handling outliers in a regression model is to use robust estimators, yielding what is called Robust Geographically Weighted Regression (RGWR). This research aims to aid the government in the policy-making process related to air pollution mitigation by developing a standard index model for air polluters (Air Polluter Standard Index - APSI) based on the RGWR approach. We consider seven variables that are directly related to the air pollution level: traffic velocity, population density, the business center aspect, air humidity, wind velocity, air temperature, and the area of urban forest. The best model is determined by the smallest AIC value. There are significant differences between ordinary regression and RGWR in this case, but basic GWR using the Gaussian kernel is the best model for the APSI because it has the smallest AIC.
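
    A minimal sketch of the basic (non-robust) GWR fit, one weighted least-squares solve per location with a Gaussian kernel; a robust variant would replace the weighted least-squares step with an M-estimator. All names and the ridge term are illustrative:

    ```python
    import numpy as np

    def gwr_coefficients(X, y, coords, bandwidth):
        """Basic GWR: local weighted least squares at each observation site."""
        n = len(y)
        Xb = np.hstack([np.ones((n, 1)), X])              # intercept + predictors
        betas = np.empty((n, Xb.shape[1]))
        for i in range(n):
            d = np.linalg.norm(coords - coords[i], axis=1)
            w = np.exp(-0.5 * (d / bandwidth) ** 2)       # Gaussian kernel weights
            Xw = Xb * w[:, None]
            betas[i] = np.linalg.solve(Xb.T @ Xw + 1e-10 * np.eye(Xb.shape[1]),
                                       Xw.T @ y)
        return betas
    ```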

  11. Robust, Adaptive Functional Regression in Functional Mixed Model Framework.

    PubMed

    Zhu, Hongxiao; Brown, Philip J; Morris, Jeffrey S

    2011-09-01

    Functional data are increasingly encountered in scientific studies, and their high dimensionality and complexity lead to many analytical challenges. Various methods for functional data analysis have been developed, including functional response regression methods that involve regression of a functional response on univariate/multivariate predictors with nonparametrically represented functional coefficients. In existing methods, however, the functional regression can be sensitive to outlying curves and outlying regions of curves, so is not robust. In this paper, we introduce a new Bayesian method, robust functional mixed models (R-FMM), for performing robust functional regression within the general functional mixed model framework, which includes multiple continuous or categorical predictors and random effect functions accommodating potential between-function correlation induced by the experimental design. The underlying model involves a hierarchical scale mixture model for the fixed effects, random effect and residual error functions. These modeling assumptions across curves result in robust nonparametric estimators of the fixed and random effect functions which down-weight outlying curves and regions of curves, and produce statistics that can be used to flag global and local outliers. These assumptions also lead to distributions across wavelet coefficients that have outstanding sparsity and adaptive shrinkage properties, with great flexibility for the data to determine the sparsity and the heaviness of the tails. Together with the down-weighting of outliers, these within-curve properties lead to fixed and random effect function estimates that appear in our simulations to be remarkably adaptive in their ability to remove spurious features yet retain true features of the functions. We have developed general code to implement this fully Bayesian method that is automatic, requiring the user to only provide the functional data and design matrices. It is efficient enough to handle large data sets, and yields posterior samples of all model parameters that can be used to perform desired Bayesian estimation and inference. Although we present details for a specific implementation of the R-FMM using specific distributional choices in the hierarchical model, 1D functions, and wavelet transforms, the method can be applied more generally using other heavy-tailed distributions, higher dimensional functions (e.g. images), and using other invertible transformations as alternatives to wavelets.

  12. Robust, Adaptive Functional Regression in Functional Mixed Model Framework

    PubMed Central

    Zhu, Hongxiao; Brown, Philip J.; Morris, Jeffrey S.

    2012-01-01

    Functional data are increasingly encountered in scientific studies, and their high dimensionality and complexity lead to many analytical challenges. Various methods for functional data analysis have been developed, including functional response regression methods that involve regression of a functional response on univariate/multivariate predictors with nonparametrically represented functional coefficients. In existing methods, however, the functional regression can be sensitive to outlying curves and outlying regions of curves, so is not robust. In this paper, we introduce a new Bayesian method, robust functional mixed models (R-FMM), for performing robust functional regression within the general functional mixed model framework, which includes multiple continuous or categorical predictors and random effect functions accommodating potential between-function correlation induced by the experimental design. The underlying model involves a hierarchical scale mixture model for the fixed effects, random effect and residual error functions. These modeling assumptions across curves result in robust nonparametric estimators of the fixed and random effect functions which down-weight outlying curves and regions of curves, and produce statistics that can be used to flag global and local outliers. These assumptions also lead to distributions across wavelet coefficients that have outstanding sparsity and adaptive shrinkage properties, with great flexibility for the data to determine the sparsity and the heaviness of the tails. Together with the down-weighting of outliers, these within-curve properties lead to fixed and random effect function estimates that appear in our simulations to be remarkably adaptive in their ability to remove spurious features yet retain true features of the functions. We have developed general code to implement this fully Bayesian method that is automatic, requiring the user to only provide the functional data and design matrices. It is efficient enough to handle large data sets, and yields posterior samples of all model parameters that can be used to perform desired Bayesian estimation and inference. Although we present details for a specific implementation of the R-FMM using specific distributional choices in the hierarchical model, 1D functions, and wavelet transforms, the method can be applied more generally using other heavy-tailed distributions, higher dimensional functions (e.g. images), and using other invertible transformations as alternatives to wavelets. PMID:22308015

  13. Enhancement of partial robust M-regression (PRM) performance using Bisquare weight function

    NASA Astrophysics Data System (ADS)

    Mohamad, Mazni; Ramli, Norazan Mohamed; Ghani@Mamat, Nor Azura Md; Ahmad, Sanizah

    2014-09-01

    Partial Least Squares (PLS) regression is a popular regression technique for handling multicollinearity in low- and high-dimensional data; it fits a linear relationship between sets of explanatory and response variables. Several robust PLS methods have been proposed to accommodate the classical PLS algorithms, which are easily affected by the presence of outliers. The most recent is partial robust M-regression (PRM). Unfortunately, the monotone weighting function used in the PRM algorithm fails to assign appropriate weights to large outliers according to their severity. Thus, in this paper, a modified partial robust M-regression is introduced to enhance the performance of the original PRM. A re-descending weight function, the Bisquare weight function, is recommended to replace the Fair function in the PRM. A simulation study is conducted to assess the performance of the modified PRM, and its efficiency is tested on both contaminated and uncontaminated simulated data under various percentages of outliers, sample sizes, and numbers of predictors.
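
    The contrast between the monotone Fair weights and the re-descending Bisquare weights is easy to see in code (a sketch with conventional tuning constants; the scale is estimated by the MAD):

    ```python
    import numpy as np

    def bisquare_weights(residuals, c=4.685):
        """Tukey bisquare: re-descending, gives zero weight beyond c * scale."""
        r = np.asarray(residuals, dtype=float)
        s = np.median(np.abs(r)) / 0.6745 + 1e-12         # MAD scale estimate
        u = r / (c * s)
        w = (1.0 - u ** 2) ** 2
        w[np.abs(u) >= 1.0] = 0.0
        return w

    def fair_weights(residuals, c=1.4):
        """Fair function: monotone, so gross outliers keep nonzero weight."""
        r = np.asarray(residuals, dtype=float)
        s = np.median(np.abs(r)) / 0.6745 + 1e-12
        return 1.0 / (1.0 + np.abs(r / (c * s)))
    ```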

  14. Workers' compensation costs among construction workers: a robust regression analysis.

    PubMed

    Friedman, Lee S; Forst, Linda S

    2009-11-01

    Workers' compensation data are an important source for evaluating costs associated with construction injuries. We describe the characteristics of injured construction workers filing claims in Illinois between 2000 and 2005 and the factors associated with compensation costs using a robust regression model. In the final multivariable model, the cumulative percent temporary and permanent disability (measures of injury severity) explained 38.7% of the variance in cost. Attorney costs explained only 0.3% of the variance of the dependent variable. The model used in this study clearly indicated that percent disability was the most important determinant of cost, although the method and uniformity of percent impairment allocation could be better elucidated. There is a need to integrate analytical methods that are suitable for skewed data when analyzing claim costs.

  15. Rank-preserving regression: a more robust rank regression model against outliers.

    PubMed

    Chen, Tian; Kowalski, Jeanne; Chen, Rui; Wu, Pan; Zhang, Hui; Feng, Changyong; Tu, Xin M

    2016-08-30

    Mean-based semi-parametric regression models such as the popular generalized estimating equations are widely used to improve robustness of inference over parametric models. Unfortunately, such models are quite sensitive to outlying observations. The Wilcoxon-score-based rank regression (RR) provides more robust estimates than generalized estimating equations against outliers. However, the RR and its extensions do not sufficiently address missing data arising in longitudinal studies. In this paper, we propose a new approach to address outliers under a different framework based on functional response models. This functional-response-model-based alternative not only addresses limitations of the RR and its extensions for longitudinal data but, with its rank-preserving property, even provides more robust estimates than these alternatives. The proposed approach is illustrated with both real and simulated data. Copyright © 2016 John Wiley & Sons, Ltd.

  16. Assessing the Liquidity of Firms: Robust Neural Network Regression as an Alternative to the Current Ratio

    NASA Astrophysics Data System (ADS)

    de Andrés, Javier; Landajo, Manuel; Lorca, Pedro; Labra, Jose; Ordóñez, Patricia

    Artificial neural networks have proven to be useful tools for solving financial analysis problems such as financial distress prediction and audit risk assessment. In this paper we focus on the performance of robust (least absolute deviation-based) neural networks in measuring the liquidity of firms. The problem of learning the bivariate relationship between the components (namely, current liabilities and current assets) of the so-called current ratio is analyzed, and the predictive performance of several modelling paradigms (namely, linear and log-linear regressions, classical ratios, and neural networks) is compared. An empirical analysis is conducted on a representative database from the Spanish economy. Results indicate that classical ratio models are largely inadequate as a realistic description of the studied relationship, especially when used for predictive purposes. In a number of cases, especially when the analyzed firms are microenterprises, the linear specification is improved upon by the flexible non-linear structures provided by neural networks.

  17. Estimating the Standard Error of Robust Regression Estimates.

    DTIC Science & Technology

    1987-03-01

    error is O(n^{4/5}). In another Monte Carlo study, McKean and Schrader (1984) found that the tests resulting from studentizing… with d = O(n^{4/5})… Sheather, S. J. and McKean, J. W. (1987). A comparison of testing and… Wiley, New York. Welsch, R. E. (1980). Regression Sensitivity Analysis and Bounded-Influence Estimation, in Evaluation of Econometric Models, eds. J…

  18. Quantitative Structure Retention Relationships of Polychlorinated Dibenzodioxins and Dibenzofurans

    DTIC Science & Technology

    1991-08-01

    be a projection onto the X-Y plane. The algorithm for this calculation can be found in Stouch and Jurs (22), but was further refined by Rohrbaugh and… through-space distances. WPSA2 (c): weighted positive charged surface area. MOMH2 (c): second major moment of inertia with hydrogens attached. CSTR 3 (d): sum… of the models. The robust regression analysis method calculates a regression model using a least median squares algorithm, which is not as susceptible…

  19. Efficient Robust Regression via Two-Stage Generalized Empirical Likelihood

    PubMed Central

    Bondell, Howard D.; Stefanski, Leonard A.

    2013-01-01

    Large- and finite-sample efficiency and resistance to outliers are the key goals of robust statistics. Although often not simultaneously attainable, we develop and study a linear regression estimator that comes close. Efficiency obtains from the estimator’s close connection to generalized empirical likelihood, and its favorable robustness properties are obtained by constraining the associated sum of (weighted) squared residuals. We prove maximum attainable finite-sample replacement breakdown point, and full asymptotic efficiency for normal errors. Simulation evidence shows that compared to existing robust regression estimators, the new estimator has relatively high efficiency for small sample sizes, and comparable outlier resistance. The estimator is further illustrated and compared to existing methods via application to a real data set with purported outliers. PMID:23976805

  20. Relationships between Adolescent Sexual Outcomes and Exposure to Sex in Media: Robustness to Propensity-Based Analysis

    ERIC Educational Resources Information Center

    Collins, Rebecca L.; Martino, Steven C.; Elliott, Marc N.; Miu, Angela

    2011-01-01

    Adolescent sexual health is a substantial problem in the United States, and two recent studies have linked adolescent sexual behavior and/or outcomes to youths' exposure to sex in the media. Both studies had longitudinal survey designs and used covariate-adjusted regression analysis. Steinberg and Monahan (2011) reanalyzed data from one of these…

  1. Reader reaction to "a robust method for estimating optimal treatment regimes" by Zhang et al. (2012).

    PubMed

    Taylor, Jeremy M G; Cheng, Wenting; Foster, Jared C

    2015-03-01

    A recent article (Zhang et al., 2012, Biometrics 68, 1010-1018) compares regression-based and inverse-probability-based methods of estimating an optimal treatment regime and shows, for a small number of covariates, that inverse probability weighted methods are more robust to model misspecification than regression methods. We demonstrate that using models that fit the data better reduces the concern about non-robustness for the regression methods. We extend the simulation study of Zhang et al. (2012, Biometrics 68, 1010-1018), also considering the situation of a larger number of covariates, and show that incorporating random forests into both regression and inverse probability weighted methods improves their properties. © 2014, The International Biometric Society.

  2. Multi-Target Regression via Robust Low-Rank Learning.

    PubMed

    Zhen, Xiantong; Yu, Mengyang; He, Xiaofei; Li, Shuo

    2018-02-01

    Multi-target regression has recently regained great popularity due to its capability of simultaneously learning multiple relevant regression tasks and its wide applications in data mining, computer vision and medical image analysis, while great challenges arise from jointly handling inter-target correlations and input-output relationships. In this paper, we propose Multi-layer Multi-target Regression (MMR) which enables simultaneously modeling intrinsic inter-target correlations and nonlinear input-output relationships in a general framework via robust low-rank learning. Specifically, the MMR can explicitly encode inter-target correlations in a structure matrix by matrix elastic nets (MEN); the MMR can work in conjunction with the kernel trick to effectively disentangle highly complex nonlinear input-output relationships; the MMR can be efficiently solved by a new alternating optimization algorithm with guaranteed convergence. The MMR leverages the strength of kernel methods for nonlinear feature learning and the structural advantage of multi-layer learning architectures for inter-target correlation modeling. More importantly, it offers a new multi-layer learning paradigm for multi-target regression which is endowed with high generality, flexibility and expressive ability. Extensive experimental evaluation on 18 diverse real-world datasets demonstrates that our MMR can achieve consistently high performance and outperforms representative state-of-the-art algorithms, which shows its great effectiveness and generality for multivariate prediction.

  3. Predicting the rate of change in timber value for forest stands infested with gypsy moth

    Treesearch

    David A. Gansner; Owen W. Herrick

    1982-01-01

    Presents a method for estimating the potential impact of gypsy moth attacks on forest-stand value. Robust regression analysis is used to develop an equation for predicting the rate of change in timber value from easy-to-measure key characteristics of stand condition.

  4. Practical aspects of estimating energy components in rodents

    PubMed Central

    van Klinken, Jan B.; van den Berg, Sjoerd A. A.; van Dijk, Ko Willems

    2013-01-01

    Recently there has been an increasing interest in exploiting computational and statistical techniques for the purpose of component analysis of indirect calorimetry data. Using these methods it becomes possible to dissect daily energy expenditure into its components and to assess the dynamic response of the resting metabolic rate (RMR) to nutritional and pharmacological manipulations. To perform robust component analysis, however, is not straightforward and typically requires the tuning of parameters and the preprocessing of data. Moreover the degree of accuracy that can be attained by these methods depends on the configuration of the system, which must be properly taken into account when setting up experimental studies. Here, we review the methods of Kalman filtering, linear, and penalized spline regression, and minimal energy expenditure estimation in the context of component analysis and discuss their results on high resolution datasets from mice and rats. In addition, we investigate the effect of the sample time, the accuracy of the activity sensor, and the washout time of the chamber on the estimation accuracy. We found that on the high resolution data there was a strong correlation between the results of Kalman filtering and penalized spline (P-spline) regression, except for the activity respiratory quotient (RQ). For low resolution data the basal metabolic rate (BMR) and resting RQ could still be estimated accurately with P-spline regression, having a strong correlation with the high resolution estimate (R2 > 0.997; sample time of 9 min). In contrast, the thermic effect of food (TEF) and activity related energy expenditure (AEE) were more sensitive to a reduction in the sample rate (R2 > 0.97). In conclusion, for component analysis on data generated by single channel systems with continuous data acquisition both Kalman filtering and P-spline regression can be used, while for low resolution data from multichannel systems P-spline regression gives more robust results. PMID:23641217

  5. Neural network uncertainty assessment using Bayesian statistics: a remote sensing application

    NASA Technical Reports Server (NTRS)

    Aires, F.; Prigent, C.; Rossow, W. B.

    2004-01-01

    Neural network (NN) techniques have proved successful for many regression problems, in particular for remote sensing; however, uncertainty estimates are rarely provided. In this article, a Bayesian technique to evaluate uncertainties of the NN parameters (i.e., synaptic weights) is first presented. In contrast to more traditional approaches based on point estimation of the NN weights, we assess uncertainties on such estimates to monitor the robustness of the NN model. These theoretical developments are illustrated by applying them to the problem of retrieving surface skin temperature, microwave surface emissivities, and integrated water vapor content from a combined analysis of satellite microwave and infrared observations over land. The weight uncertainty estimates are then used to compute analytically the uncertainties in the network outputs (i.e., error bars and correlation structure of these errors). Such quantities are very important for evaluating any application of an NN model. The uncertainties on the NN Jacobians are then considered in the third part of this article. Used for regression fitting, NN models can be used effectively to represent highly nonlinear, multivariate functions. In this situation, most emphasis is put on estimating the output errors, but almost no attention has been given to errors associated with the internal structure of the regression model. The complex structure of dependency inside the NN is the essence of the model, and assessing its quality, coherency, and physical character makes all the difference between a blackbox model with small output errors and a reliable, robust, and physically coherent model. Such dependency structures are described to the first order by the NN Jacobians: they indicate the sensitivity of one output with respect to the inputs of the model for given input data. We use a Monte Carlo integration procedure to estimate the robustness of the NN Jacobians. A regularization strategy based on principal component analysis is proposed to suppress the multicollinearities in order to make these Jacobians robust and physically meaningful.

  6. Notes on power of normality tests of error terms in regression models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Střelec, Luboš

    2015-03-10

    Normality is one of the basic assumptions in applying statistical procedures. For example, in linear regression most of the inferential procedures are based on the assumption of normality, i.e. the disturbance vector is assumed to be normally distributed. Failure to assess non-normality of the error terms may lead to incorrect results from the usual statistical inference techniques such as the t-test or F-test. Thus, error terms should be normally distributed in order to allow us to make exact inferences. As a consequence, normally distributed stochastic errors are necessary for inferences that are not misleading, which explains the necessity and importance of robust tests of normality. The aim of this contribution is therefore to discuss normality testing of error terms in regression models. We introduce the general RT class of robust tests for normality, and present and discuss the trade-off between power and robustness of selected classical and robust normality tests of error terms in regression models.
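
    A quick way to screen regression error terms for non-normality (standard tests from scipy, not the RT class introduced in this record) is to test the fitted residuals directly:

    ```python
    import numpy as np
    import statsmodels.api as sm
    from scipy import stats

    rng = np.random.default_rng(2)
    x = rng.normal(size=300)
    y = 2.0 + 1.5 * x + rng.standard_t(df=4, size=300)   # non-normal errors

    resid = sm.OLS(y, sm.add_constant(x)).fit().resid
    print("Shapiro-Wilk:", stats.shapiro(resid))
    print("Jarque-Bera:", stats.jarque_bera(resid))
    print("Anderson-Darling:", stats.anderson(resid, dist="norm").statistic)
    ```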

  7. Project risk management in the construction of high-rise buildings

    NASA Astrophysics Data System (ADS)

    Titarenko, Boris; Hasnaoui, Amir; Titarenko, Roman; Buzuk, Liliya

    2018-03-01

    This paper presents project risk management methods that allow risks in the construction of high-rise buildings to be better identified and managed throughout the life cycle of the project. One of the project risk management processes is a quantitative analysis of risks, which usually includes assessing the potential impact of project risks and their probabilities. This paper reviews the most popular methods of risk probability assessment and indicates the advantages of the robust approach over traditional methods. Within the framework of the project risk management model, the robust approach of P. Huber is applied and extended to the tasks of regression analysis of project data. The suggested algorithms for estimating the parameters of statistical models yield reliable estimates. The theoretical problems of developing robust models built on the methodology of minimax estimates are reviewed, and an algorithm for the situation of asymmetric "contamination" is developed.
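
    Huber's M-estimator itself is readily available; a minimal sketch with synthetic "contaminated" project data (illustrative numbers only):

    ```python
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    x = rng.uniform(0, 1, 60)
    y = 10 + 4 * x + rng.normal(0, 0.5, 60)
    y[:5] += 8                                   # a few contaminated records

    X = sm.add_constant(x)
    huber = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()
    print(huber.params)                          # estimates resist the outliers
    ```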

  8. Efficient robust doubly adaptive regularized regression with applications.

    PubMed

    Karunamuni, Rohana J; Kong, Linglong; Tu, Wei

    2018-01-01

    We consider the problem of estimation and variable selection for general linear regression models. Regularized regression procedures have been widely used for variable selection, but most existing methods perform poorly in the presence of outliers. We construct a new penalized procedure that simultaneously attains full efficiency and maximum robustness. Furthermore, the proposed procedure satisfies the oracle properties. The new procedure is designed to achieve sparse and robust solutions by imposing adaptive weights on both the decision loss and the penalty function. The proposed method of estimation and variable selection attains full efficiency when the model is correct and, at the same time, achieves maximum robustness when outliers are present. We examine the robustness properties using the finite-sample breakdown point and an influence function. We show that the proposed estimator attains the maximum breakdown point. Furthermore, there is no loss in efficiency when there are no outliers or the error distribution is normal. For practical implementation of the proposed method, we present a computational algorithm. We examine the finite-sample and robustness properties using Monte Carlo studies. Two datasets are also analyzed.

  9. A Comprehensive review of group level model performance in the presence of heteroscedasticity: Can a single model control Type I errors in the presence of outliers?

    PubMed Central

    Mumford, Jeanette A.

    2017-01-01

    Even after thorough preprocessing and a careful time series analysis of functional magnetic resonance imaging (fMRI) data, artifact and other issues can lead to violations of the assumption that the variance is constant across subjects in the group level model. This is especially concerning when modeling a continuous covariate at the group level, as the slope is easily biased by outliers. Various models have been proposed to deal with outliers including models that use the first level variance or that use the group level residual magnitude to differentially weight subjects. The most typically used robust regression, implementing a robust estimator of the regression slope, has been previously studied in the context of fMRI studies and was found to perform well in some scenarios, but a loss of Type I error control can occur for some outlier settings. A second type of robust regression using a heteroscedastic autocorrelation consistent (HAC) estimator, which produces robust slope and variance estimates has been shown to perform well, with better Type I error control, but with large sample sizes (500–1000 subjects). The Type I error control with smaller sample sizes has not been studied in this model and has not been compared to other modeling approaches that handle outliers such as FSL’s Flame 1 and FSL’s outlier de-weighting. Focusing on group level inference with a continuous covariate over a range of sample sizes and degree of heteroscedasticity, which can be driven either by the within- or between-subject variability, both styles of robust regression are compared to ordinary least squares (OLS), FSL’s Flame 1, Flame 1 with outlier de-weighting algorithm and Kendall’s Tau. Additionally, subject omission using the Cook’s Distance measure with OLS and nonparametric inference with the OLS statistic are studied. Pros and cons of these models as well as general strategies for detecting outliers in data and taking precaution to avoid inflated Type I error rates are discussed. PMID:28030782

  10. New machine-learning algorithms for prediction of Parkinson's disease

    NASA Astrophysics Data System (ADS)

    Mandal, Indrajit; Sairam, N.

    2014-03-01

    This article presents an enhanced prediction accuracy of diagnosis of Parkinson's disease (PD) to prevent the delay and misdiagnosis of patients using the proposed robust inference system. New machine-learning methods are proposed and performance comparisons are based on specificity, sensitivity, accuracy and other measurable parameters. The robust methods of treating Parkinson's disease (PD) includes sparse multinomial logistic regression, rotation forest ensemble with support vector machines and principal components analysis, artificial neural networks, boosting methods. A new ensemble method comprising of the Bayesian network optimised by Tabu search algorithm as classifier and Haar wavelets as projection filter is used for relevant feature selection and ranking. The highest accuracy obtained by linear logistic regression and sparse multinomial logistic regression is 100% and sensitivity, specificity of 0.983 and 0.996, respectively. All the experiments are conducted over 95% and 99% confidence levels and establish the results with corrected t-tests. This work shows a high degree of advancement in software reliability and quality of the computer-aided diagnosis system and experimentally shows best results with supportive statistical inference.

  11. Multivariate Analysis of Seismic Field Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Alam, M. Kathleen

    1999-06-01

    This report includes the details of the model building procedure and prediction of seismic field data. Principal Components Regression, a multivariate analysis technique, was used to model seismic data collected as two pieces of equipment were cycled on and off. Models built that included only the two pieces of equipment of interest had trouble predicting data containing signals not included in the model. Evidence for poor predictions came from the prediction curves as well as spectral F-ratio plots. Once the extraneous signals were included in the model, predictions improved dramatically. While Principal Components Regression performed well for the present data sets, the present data analysis suggests further work will be needed to develop more robust modeling methods as the data become more complex.
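
    Principal Components Regression is conveniently expressed as a pipeline (a generic sketch with random stand-in data rather than the seismic records; the number of components is an illustrative choice):

    ```python
    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(4)
    X = rng.normal(size=(120, 30))               # stand-in for spectral channels
    y = X[:, :3].sum(axis=1) + rng.normal(0, 0.1, 120)

    # PCR: regress the response on the leading principal component scores.
    pcr = make_pipeline(StandardScaler(), PCA(n_components=5), LinearRegression())
    pcr.fit(X, y)
    print(pcr.score(X, y))
    ```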

  12. Modeling vertebrate diversity in Oregon using satellite imagery

    NASA Astrophysics Data System (ADS)

    Cablk, Mary Elizabeth

    Vertebrate diversity was modeled for the state of Oregon using a parametric approach to regression tree analysis. This exploratory data analysis effectively modeled the non-linear relationships between vertebrate richness and phenology, terrain, and climate. Phenology was derived from time-series NOAA-AVHRR satellite imagery for the year 1992 using two methods: principal component analysis and derivation of EROS Data Center greenness metrics. These two measures of spatial and temporal vegetation condition incorporated the critical temporal element in this analysis. The first three principal components were shown to contain spatial and temporal information about the landscape and discriminated phenologically distinct regions in Oregon. Principal components 2 and 3, six greenness metrics, elevation, slope, aspect, annual precipitation, and annual seasonal temperature difference were investigated as correlates of amphibians, birds, all vertebrates, reptiles, and mammals. Variation explained by the regression tree for each taxon was: amphibians (91%), birds (67%), all vertebrates (66%), reptiles (57%), and mammals (55%). Spatial statistics were used to quantify the pattern of each taxon and assess the validity of the predictions from the regression tree models. Regression tree analysis was relatively robust against spatial autocorrelation in the response data, and graphical results indicated the models fit the data well.
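
    The regression tree building block used in such an analysis is standard; a generic sketch with synthetic predictors standing in for the greenness, terrain, and climate variables (not the Oregon data):

    ```python
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(9)
    X = rng.normal(size=(500, 4))                 # e.g. greenness, elevation, ...
    y = (np.where(X[:, 0] > 0, 3 + X[:, 1], 1 + 0.5 * X[:, 2])
         + rng.normal(0, 0.3, 500))

    tree = DecisionTreeRegressor(max_depth=4, min_samples_leaf=20).fit(X, y)
    print(tree.score(X, y))                       # R^2, i.e. variation explained
    ```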

  13. Cox regression analysis with missing covariates via nonparametric multiple imputation.

    PubMed

    Hsu, Chiu-Hsieh; Yu, Mandi

    2018-01-01

    We consider the situation of estimating Cox regression in which some covariates are subject to missingness, and there exists additional information (including observed event time, censoring indicator and fully observed covariates) that may be predictive of the missing covariates. We propose to use two working regression models: one for predicting the missing covariates and the other for predicting the missing probabilities. For each missing covariate observation, these two working models are used to define a nearest-neighbor imputing set. This set is then used to non-parametrically impute covariate values for the missing observation. Upon the completion of imputation, Cox regression is performed on the multiply imputed datasets to estimate the regression coefficients. In a simulation study, we compare the nonparametric multiple imputation approach with the augmented inverse probability weighted (AIPW) method, which directly incorporates the two working models into estimation of Cox regression, and the predictive mean matching imputation (PMM) method. We show that all approaches can reduce bias due to a non-ignorable missingness mechanism. The proposed nonparametric imputation method is robust to mis-specification of either one of the two working models and to mis-specification of their link functions. In contrast, the PMM method is sensitive to misspecification of the covariates included in imputation, and the AIPW method is sensitive to the selection probability. We apply the approaches to a breast cancer dataset from the Surveillance, Epidemiology and End Results (SEER) Program.

  14. Robust Regression for Slope Estimation in Curriculum-Based Measurement Progress Monitoring

    ERIC Educational Resources Information Center

    Mercer, Sterett H.; Lyons, Alina F.; Johnston, Lauren E.; Millhoff, Courtney L.

    2015-01-01

    Although ordinary least-squares (OLS) regression has been identified as a preferred method to calculate rates of improvement for individual students during curriculum-based measurement (CBM) progress monitoring, OLS slope estimates are sensitive to the presence of extreme values. Robust estimators have been developed that are less biased by…

  15. Kendall-Theil Robust Line (KTRLine--version 1.0)-A Visual Basic Program for Calculating and Graphing Robust Nonparametric Estimates of Linear-Regression Coefficients Between Two Continuous Variables

    USGS Publications Warehouse

    Granato, Gregory E.

    2006-01-01

    The Kendall-Theil Robust Line software (KTRLine-version 1.0) is a Visual Basic program that may be used with the Microsoft Windows operating system to calculate parameters for robust, nonparametric estimates of linear-regression coefficients between two continuous variables. The KTRLine software was developed by the U.S. Geological Survey, in cooperation with the Federal Highway Administration, for use in stochastic data modeling with local, regional, and national hydrologic data sets to develop planning-level estimates of potential effects of highway runoff on the quality of receiving waters. The Kendall-Theil robust line was selected because this robust nonparametric method is resistant to the effects of outliers and nonnormality in residuals that commonly characterize hydrologic data sets. The slope of the line is calculated as the median of all possible pairwise slopes between points. The intercept is calculated so that the line will run through the median of input data. A single-line model or a multisegment model may be specified. The program was developed to provide regression equations with an error component for stochastic data generation because nonparametric multisegment regression tools are not available with the software that is commonly used to develop regression models. The Kendall-Theil robust line is a median line and, therefore, may underestimate total mass, volume, or loads unless the error component or a bias correction factor is incorporated into the estimate. Regression statistics such as the median error, the median absolute deviation, the prediction error sum of squares, the root mean square error, the confidence interval for the slope, and the bias correction factor for median estimates are calculated by use of nonparametric methods. These statistics, however, may be used to formulate estimates of mass, volume, or total loads. The program is used to read a two- or three-column tab-delimited input file with variable names in the first row and data in subsequent rows. The user may choose the columns that contain the independent (X) and dependent (Y) variable. A third column, if present, may contain metadata such as the sample-collection location and date. The program screens the input files and plots the data. The KTRLine software is a graphical tool that facilitates development of regression models by use of graphs of the regression line with data, the regression residuals (with X or Y), and percentile plots of the cumulative frequency of the X variable, Y variable, and the regression residuals. The user may individually transform the independent and dependent variables to reduce heteroscedasticity and to linearize data. The program plots the data and the regression line. The program also prints model specifications and regression statistics to the screen. The user may save and print the regression results. The program can accept data sets that contain up to about 15,000 XY data points, but because the program must sort the array of all pairwise slopes, the program may be perceptibly slow with data sets that contain more than about 1,000 points.
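
    The core Kendall-Theil (Theil-Sen) estimator, the median of all pairwise slopes with a median-based intercept, is also available in scipy; a minimal sketch on data with one gross outlier:

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    x = np.arange(30, dtype=float)
    y = 0.8 * x + 2 + rng.normal(0, 1, 30)
    y[3] += 25                                    # one gross outlier

    slope, intercept, lo, hi = stats.theilslopes(y, x, alpha=0.95)
    print(slope, intercept, (lo, hi))             # median of pairwise slopes
    ```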

  16. Interrupted time series regression for the evaluation of public health interventions: a tutorial.

    PubMed

    Bernal, James Lopez; Cummins, Steven; Gasparrini, Antonio

    2017-02-01

    Interrupted time series (ITS) analysis is a valuable study design for evaluating the effectiveness of population-level health interventions that have been implemented at a clearly defined point in time. It is increasingly being used to evaluate the effectiveness of interventions ranging from clinical therapy to national public health legislation. Whereas the design shares many properties of regression-based approaches in other epidemiological studies, there are a range of unique features of time series data that require additional methodological considerations. In this tutorial we use a worked example to demonstrate a robust approach to ITS analysis using segmented regression. We begin by describing the design and considering when ITS is an appropriate design choice. We then discuss the essential, yet often omitted, step of proposing the impact model a priori. Subsequently, we demonstrate the approach to statistical analysis including the main segmented regression model. Finally we describe the main methodological issues associated with ITS analysis: over-dispersion of time series data, autocorrelation, adjusting for seasonal trends and controlling for time-varying confounders, and we also outline some of the more complex design adaptations that can be used to strengthen the basic ITS design.
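
    A minimal sketch of the main segmented regression model for ITS, with level-change and slope-change terms and autocorrelation-robust (HAC) standard errors; the series below is simulated purely for illustration:

    ```python
    import numpy as np
    import statsmodels.api as sm

    # Hypothetical monthly outcome with an intervention at month 36.
    n, t0 = 72, 36
    t = np.arange(n)
    post = (t >= t0).astype(float)
    rng = np.random.default_rng(6)
    y = 50 + 0.2 * t - 8 * post - 0.3 * post * (t - t0) + rng.normal(0, 2, n)

    X = sm.add_constant(np.column_stack([t, post, post * (t - t0)]))
    fit = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 3})
    print(fit.params)   # baseline level, pre-trend, level change, slope change
    ```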

  17. Interrupted time series regression for the evaluation of public health interventions: a tutorial

    PubMed Central

    Bernal, James Lopez; Cummins, Steven; Gasparrini, Antonio

    2017-01-01

    Abstract Interrupted time series (ITS) analysis is a valuable study design for evaluating the effectiveness of population-level health interventions that have been implemented at a clearly defined point in time. It is increasingly being used to evaluate the effectiveness of interventions ranging from clinical therapy to national public health legislation. Whereas the design shares many properties of regression-based approaches in other epidemiological studies, there are a range of unique features of time series data that require additional methodological considerations. In this tutorial we use a worked example to demonstrate a robust approach to ITS analysis using segmented regression. We begin by describing the design and considering when ITS is an appropriate design choice. We then discuss the essential, yet often omitted, step of proposing the impact model a priori. Subsequently, we demonstrate the approach to statistical analysis including the main segmented regression model. Finally we describe the main methodological issues associated with ITS analysis: over-dispersion of time series data, autocorrelation, adjusting for seasonal trends and controlling for time-varying confounders, and we also outline some of the more complex design adaptations that can be used to strengthen the basic ITS design. PMID:27283160

  18. Low-Level Stratus Prediction Using Binary Statistical Regression: A Progress Report Using Moffett Field Data.

    DTIC Science & Technology

    1983-12-01

    analysis; such work is not reported here. It seems possible that a robust principal component analysis may be informative (see Gnanadesikan (1977)… Statistics in Atmospheric Sciences, American Meteorological Soc., Boston, Mass. (1979), pp. 46-48. Gnanadesikan, R., Methods for Statistical Data…

  19. An Alternative Flight Software Trigger Paradigm: Applying Multivariate Logistic Regression to Sense Trigger Conditions Using Inaccurate or Scarce Information

    NASA Technical Reports Server (NTRS)

    Smith, Kelly M.; Gay, Robert S.; Stachowiak, Susan J.

    2013-01-01

    In late 2014, NASA will fly the Orion capsule on a Delta IV-Heavy rocket for the Exploration Flight Test-1 (EFT-1) mission. For EFT-1, the Orion capsule will be flying with a new GPS receiver and new navigation software. Given the experimental nature of the flight, the flight software must be robust to the loss of GPS measurements. Once the high-speed entry is complete, the drogue parachutes must be deployed within the proper conditions to stabilize the vehicle prior to deploying the main parachutes. When GPS is available in nominal operations, the vehicle will deploy the drogue parachutes based on an altitude trigger. However, when GPS is unavailable, the navigated altitude errors become excessively large, driving the need for a backup barometric altimeter to improve altitude knowledge. In order to increase overall robustness, the vehicle also has an alternate method of triggering the parachute deployment sequence based on planet-relative velocity if both the GPS and the barometric altimeter fail. However, this backup trigger results in large altitude errors relative to the targeted altitude. Motivated by this challenge, this paper demonstrates how logistic regression may be employed to semi-automatically generate robust triggers based on statistical analysis. Logistic regression is used as a ground processor pre-flight to develop a statistical classifier. The classifier would then be implemented in flight software and executed in real-time. This technique offers improved performance even in the face of highly inaccurate measurements. Although the logistic regression-based trigger approach will not be implemented within EFT-1 flight software, the methodology can be carried forward for future missions and vehicles.
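
    In outline, such a trigger is a logistic classifier trained offline on simulated entry data and evaluated in real time; everything below (features, numbers, threshold) is a hypothetical illustration, not the EFT-1 design:

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Hypothetical training set from Monte Carlo entry simulations:
    # features = noisy sensed quantities, label = "inside deploy box" or not.
    rng = np.random.default_rng(7)
    velocity = rng.normal(150.0, 20.0, 5000)          # planet-relative, m/s
    drag = rng.normal(4.0, 1.0, 5000)                 # sensed accel, m/s^2
    altitude = 9000 - 30 * velocity + rng.normal(0, 500, 5000)
    label = (altitude < 4500).astype(int)             # truth from the simulation

    clf = LogisticRegression().fit(np.column_stack([velocity, drag]), label)
    # In flight, fire when the predicted probability crosses a threshold:
    fire = clf.predict_proba([[155.0, 4.2]])[0, 1] > 0.95
    print(fire)
    ```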

  20. An Alternative Flight Software Paradigm: Applying Multivariate Logistic Regression to Sense Trigger Conditions using Inaccurate or Scarce Information

    NASA Technical Reports Server (NTRS)

    Smith, Kelly; Gay, Robert; Stachowiak, Susan

    2013-01-01

    In late 2014, NASA will fly the Orion capsule on a Delta IV-Heavy rocket for the Exploration Flight Test-1 (EFT-1) mission. For EFT-1, the Orion capsule will be flying with a new GPS receiver and new navigation software. Given the experimental nature of the flight, the flight software must be robust to the loss of GPS measurements. Once the high-speed entry is complete, the drogue parachutes must be deployed within the proper conditions to stabilize the vehicle prior to deploying the main parachutes. When GPS is available in nominal operations, the vehicle will deploy the drogue parachutes based on an altitude trigger. However, when GPS is unavailable, the navigated altitude errors become excessively large, driving the need for a backup barometric altimeter. In order to increase overall robustness, the vehicle also has an alternate method of triggering the drogue parachute deployment based on planet-relative velocity if both the GPS and the barometric altimeter fail. However, this velocity-based trigger results in large altitude errors relative to the targeted altitude. Motivated by this challenge, this paper demonstrates how logistic regression may be employed to automatically generate robust triggers based on statistical analysis. Logistic regression is used as a ground processor pre-flight to develop a classifier. The classifier would then be implemented in flight software and executed in real-time. This technique offers excellent performance even in the face of highly inaccurate measurements. Although the logistic regression-based trigger approach will not be implemented within EFT-1 flight software, the methodology can be carried forward for future missions and vehicles.

  1. An Alternative Flight Software Trigger Paradigm: Applying Multivariate Logistic Regression to Sense Trigger Conditions using Inaccurate or Scarce Information

    NASA Technical Reports Server (NTRS)

    Smith, Kelly M.; Gay, Robert S.; Stachowiak, Susan J.

    2013-01-01

    In late 2014, NASA will fly the Orion capsule on a Delta IV-Heavy rocket for the Exploration Flight Test-1 (EFT-1) mission. For EFT-1, the Orion capsule will be flying with a new GPS receiver and new navigation software. Given the experimental nature of the flight, the flight software must be robust to the loss of GPS measurements. Once the high-speed entry is complete, the drogue parachutes must be deployed within the proper conditions to stabilize the vehicle prior to deploying the main parachutes. When GPS is available in nominal operations, the vehicle will deploy the drogue parachutes based on an altitude trigger. However, when GPS is unavailable, the navigated altitude errors become excessively large, driving the need for a backup barometric altimeter. In order to increase overall robustness, the vehicle also has an alternate method of triggering the drogue parachute deployment based on planet-relative velocity if both the GPS and the barometric altimeter fail. However, this velocity-based trigger results in large altitude errors relative to the targeted altitude. Motivated by this challenge, this paper demonstrates how logistic regression may be employed to automatically generate robust triggers based on statistical analysis. Logistic regression is used as a ground processor pre-flight to develop a classifier. The classifier would then be implemented in flight software and executed in real-time. This technique offers excellent performance even in the face of highly inaccurate measurements. Although the logistic regression-based trigger approach will not be implemented within EFT-1 flight software, the methodology can be carried forward for future missions and vehicles.

  2. Robust functional regression model for marginal mean and subject-specific inferences.

    PubMed

    Cao, Chunzheng; Shi, Jian Qing; Lee, Youngjo

    2017-01-01

    We introduce flexible robust functional regression models, using various heavy-tailed processes, including a Student t-process. We propose efficient algorithms in estimating parameters for the marginal mean inferences and in predicting conditional means as well as interpolation and extrapolation for the subject-specific inferences. We develop bootstrap prediction intervals (PIs) for conditional mean curves. Numerical studies show that the proposed model provides a robust approach against data contamination or distribution misspecification, and the proposed PIs maintain the nominal confidence levels. A real data application is presented as an illustrative example.

  3. Ensemble habitat mapping of invasive plant species

    USGS Publications Warehouse

    Stohlgren, T.J.; Ma, P.; Kumar, S.; Rocca, M.; Morisette, J.T.; Jarnevich, C.S.; Benson, N.

    2010-01-01

    Ensemble species distribution models combine the strengths of several species environmental matching models, while minimizing the weakness of any one model. Ensemble models may be particularly useful in risk analysis of recently arrived, harmful invasive species because species may not yet have spread to all suitable habitats, leaving species-environment relationships difficult to determine. We tested five individual models (logistic regression, boosted regression trees, random forest, multivariate adaptive regression splines (MARS), and maximum entropy model or Maxent) and ensemble modeling for selected nonnative plant species in Yellowstone and Grand Teton National Parks, Wyoming; Sequoia and Kings Canyon National Parks, California, and areas of interior Alaska. The models are based on field data provided by the park staffs, combined with topographic, climatic, and vegetation predictors derived from satellite data. For the four invasive plant species tested, ensemble models were the only models that ranked in the top three models for both field validation and test data. Ensemble models may be more robust than individual species-environment matching models for risk analysis. © 2010 Society for Risk Analysis.

  4. Forecasting urban water demand: A meta-regression analysis.

    PubMed

    Sebri, Maamar

    2016-12-01

    Water managers and planners require accurate water demand forecasts over the short-, medium- and long-term for many purposes. These range from assessing water supply needs over spatial and temporal patterns to optimizing future investments and planning future allocations across competing sectors. This study surveys the empirical literature on urban water demand forecasting using a meta-analytical approach. Specifically, using more than 600 estimates, a meta-regression analysis is conducted to identify explanations of cross-study variation in the accuracy of urban water demand forecasting. Our study finds that accuracy depends significantly on study characteristics, including demand periodicity, modeling method, forecasting horizon, model specification and sample size. The meta-regression results remain robust to the different estimators employed as well as to a series of sensitivity checks performed. The importance of these findings lies in the conclusions and implications drawn out for regulators and policymakers and for academics alike. Copyright © 2016. Published by Elsevier Ltd.

  5. A comparison of methods for the analysis of binomial clustered outcomes in behavioral research.

    PubMed

    Ferrari, Alberto; Comelli, Mario

    2016-12-01

    In behavioral research, data consisting of a per-subject proportion of "successes" and "failures" over a finite number of trials often arise. Such clustered binary data are usually non-normally distributed, which can distort inference if the usual general linear model is applied and the sample size is small. A number of more advanced methods are available, but they are often technically challenging, and a comparative assessment of their performance in behavioral setups has not been performed. We studied the performance of some methods applicable to the analysis of proportions, namely linear regression, Poisson regression, beta-binomial regression and Generalized Linear Mixed Models (GLMMs). We report on a simulation study evaluating the power and Type I error rate of these models in hypothetical scenarios met by behavioral researchers; in addition, we describe results from the application of these methods to data from real experiments. Our results show that, while GLMMs are powerful instruments for the analysis of clustered binary outcomes, beta-binomial regression can outperform them in a range of scenarios. Linear regression gave results consistent with the nominal level of significance, but was overall less powerful. Poisson regression, instead, mostly led to anticonservative inference. GLMMs and beta-binomial regression are generally more powerful than linear regression; yet linear regression is robust to model misspecification in some conditions, whereas Poisson regression suffers heavily from violations of the assumptions when used to model proportion data. We conclude by providing directions to behavioral scientists dealing with clustered binary data and small sample sizes. Copyright © 2016 Elsevier B.V. All rights reserved.
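
    To make the beta-binomial option concrete, here is a minimal maximum-likelihood beta-binomial regression for per-subject success counts, written directly against scipy. This is a sketch of the model family the study evaluates, not the authors' code; the simulated data and all parameter values are invented.

    ```python
    import numpy as np
    from scipy.optimize import minimize
    from scipy.special import expit
    from scipy.stats import betabinom

    rng = np.random.default_rng(2)

    # Simulated per-subject "successes out of n trials" with overdispersion.
    n_subj, n_trials = 60, 20
    x = rng.normal(size=n_subj)                 # one covariate
    mu = expit(-0.5 + 0.8 * x)                  # subject mean success probability
    phi = 10.0                                  # precision (lower = more overdispersion)
    p = rng.beta(mu * phi, (1.0 - mu) * phi)    # cluster-level probabilities
    k = rng.binomial(n_trials, p)               # observed success counts

    X = np.column_stack([np.ones(n_subj), x])

    def negloglik(theta):
        # Beta-binomial regression: logit link for the mean, log link for phi.
        beta, log_phi = theta[:2], theta[2]
        m = expit(X @ beta)
        ph = np.exp(log_phi)
        return -betabinom.logpmf(k, n_trials, m * ph, (1.0 - m) * ph).sum()

    res = minimize(negloglik, x0=np.zeros(3), method="Nelder-Mead")
    print("intercept, slope, log(phi):", res.x.round(3))
    ```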

  6. Robust nonlinear system identification: Bayesian mixture of experts using the t-distribution

    NASA Astrophysics Data System (ADS)

    Baldacchino, Tara; Worden, Keith; Rowson, Jennifer

    2017-02-01

    A novel variational Bayesian mixture of experts model for robust regression of bifurcating and piece-wise continuous processes is introduced. The mixture of experts model is a powerful model which probabilistically splits the input space, allowing different models to operate in the separate regions. However, current methods have no fail-safe against outliers. In this paper, a robust mixture of experts model is proposed which consists of Student-t mixture models at the gates and Student-t distributed experts, trained via Bayesian inference. The Student-t distribution has heavier tails than the Gaussian distribution, and so it is more robust to outliers, noise and non-normality in the data. Using both simulated data and real data obtained from the Z24 bridge, this robust mixture of experts model performs better than its Gaussian counterpart when outliers are present. In particular, it provides robustness to outliers in two forms: unbiased estimates of the regression model parameters, and robustness to overfitting/overly complex models.

  7. Determinants of the lethality of climate-related disasters in the Caribbean Community (CARICOM): a cross-country analysis

    PubMed Central

    Andrewin, Aisha N.; Rodriguez-Llanes, Jose M.; Guha-Sapir, Debarati

    2015-01-01

    Floods and storms are climate-related hazards posing high mortality risk to Caribbean Community (CARICOM) nations. However, risk factors for their lethality remain untested. We conducted an ecological study investigating risk factors for flood and storm lethality in CARICOM nations for the period 1980–2012. Lethality (deaths versus no deaths per disaster event) was the outcome. We examined biophysical and social vulnerability proxies and a decadal effect as predictors. We developed our regression model via multivariate analysis, using a generalized logistic regression model with a quasi-binomial distribution, removal of multicollinear variables, and backward elimination. Robustness was checked through subset analysis. We found significant positive associations between lethality, the percentage of total land dedicated to agriculture (odds ratio [OR] 1.032; 95% CI: 1.013–1.053) and the percentage urban population (OR 1.029, 95% CI 1.003–1.057). Deaths were more likely in the 2000–2012 period versus 1980–1989 (OR 3.708, 95% CI 1.615–8.737). Robustness checks revealed similar coefficients and directions of association. Population health in CARICOM nations is being increasingly impacted by climate-related disasters connected to increasing urbanization and land use patterns. Our findings support the evidence base for setting sustainable development goals (SDGs). PMID:26153115

  8. Model Robust Calibration: Method and Application to Electronically-Scanned Pressure Transducers

    NASA Technical Reports Server (NTRS)

    Walker, Eric L.; Starnes, B. Alden; Birch, Jeffery B.; Mays, James E.

    2010-01-01

    This article presents the application of a recently developed statistical regression method to the controlled instrument calibration problem. The statistical method of Model Robust Regression (MRR), developed by Mays, Birch, and Starnes, is shown to improve instrument calibration by reducing the reliance of the calibration on a predetermined parametric (e.g. polynomial, exponential, logarithmic) model. This is accomplished by allowing fits from the predetermined parametric model to be augmented by a certain portion of a fit to the residuals from the initial regression using a nonparametric (locally parametric) regression technique. The method is demonstrated for the absolute scale calibration of silicon-based pressure transducers.
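
    A toy illustration of the parametric-plus-nonparametric blend that MRR performs, assuming lowess as the nonparametric smoother and a fixed mixing weight lam (the actual method selects this portion data-dependently); all data below are synthetic.

    ```python
    import numpy as np
    from statsmodels.nonparametric.smoothers_lowess import lowess

    rng = np.random.default_rng(3)

    # Synthetic calibration data with mild misspecification: the true response
    # departs from the assumed quadratic form by a small sinusoidal term.
    x = np.linspace(0.0, 1.0, 200)
    y = (1.0 + 2.0 * x + 0.5 * x**2 + 0.1 * np.sin(6.0 * x)
         + rng.normal(0.0, 0.02, x.size))

    # Step 1: the predetermined parametric fit (a quadratic polynomial here).
    y_param = np.polyval(np.polyfit(x, y, deg=2), x)

    # Step 2: nonparametric (lowess) fit to the residuals of the parametric fit.
    resid_fit = lowess(y - y_param, x, frac=0.3, return_sorted=False)

    # Step 3: MRR-style blend; lam in [0, 1] is the "portion" of the residual
    # fit that augments the parametric model (fixed here for simplicity).
    lam = 0.7
    y_mrr = y_param + lam * resid_fit
    print("RMSE, parametric only:", np.sqrt(np.mean((y - y_param) ** 2)).round(4))
    print("RMSE, MRR blend:      ", np.sqrt(np.mean((y - y_mrr) ** 2)).round(4))
    ```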

  9. Robust Face Recognition via Multi-Scale Patch-Based Matrix Regression.

    PubMed

    Gao, Guangwei; Yang, Jian; Jing, Xiaoyuan; Huang, Pu; Hua, Juliang; Yue, Dong

    2016-01-01

    In many real-world applications such as smart card solutions, law enforcement, surveillance and access control, the limited training sample size is the most fundamental problem. By making use of the low-rank structural information of the reconstructed error image, the so-called nuclear norm-based matrix regression has been demonstrated to be effective for robust face recognition with continuous occlusions. However, the recognition performance of nuclear norm-based matrix regression degrades greatly in the face of the small sample size problem. An alternative solution to tackle this problem is performing matrix regression on each patch and then integrating the outputs from all patches. However, it is difficult to set an optimal patch size across different databases. To fully utilize the complementary information from different patch scales for the final decision, we propose a multi-scale patch-based matrix regression scheme based on which the ensemble of multi-scale outputs can be achieved optimally. Extensive experiments on benchmark face databases validate the effectiveness and robustness of our method, which outperforms several state-of-the-art patch-based face recognition algorithms.

  10. Combining synthetic controls and interrupted time series analysis to improve causal inference in program evaluation.

    PubMed

    Linden, Ariel

    2018-04-01

    Interrupted time series analysis (ITSA) is an evaluation methodology in which a single treatment unit's outcome is studied over time and the intervention is expected to "interrupt" the level and/or trend of the outcome. The internal validity is strengthened considerably when the treated unit is contrasted with a comparable control group. In this paper, we introduce a robust evaluation framework that combines the synthetic controls method (SYNTH) to generate a comparable control group and ITSA regression to assess covariate balance and estimate treatment effects. We evaluate the effect of California's Proposition 99 for reducing cigarette sales, by comparing California to other states not exposed to smoking reduction initiatives. SYNTH is used to reweight nontreated units to make them comparable to the treated unit. These weights are then used in ITSA regression models to assess covariate balance and estimate treatment effects. Covariate balance was achieved for all but one covariate. While California experienced a significant decrease in the annual trend of cigarette sales after Proposition 99, there was no statistically significant treatment effect when compared to synthetic controls. The advantage of using this framework over regression alone is that it ensures that a comparable control group is generated. Additionally, it offers a common set of statistical measures familiar to investigators, the capability for assessing covariate balance, and enhancement of the evaluation with a comprehensive set of postestimation measures. Therefore, this robust framework should be considered as a primary approach for evaluating treatment effects in multiple group time series analysis. © 2018 John Wiley & Sons, Ltd.

  11. Functional Data Analysis Applied to Modeling of Severe Acute Mucositis and Dysphagia Resulting From Head and Neck Radiation Therapy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dean, Jamie A., E-mail: jamie.dean@icr.ac.uk; Wong, Kee H.; Gay, Hiram

    Purpose: Current normal tissue complication probability modeling using logistic regression suffers from bias and high uncertainty in the presence of highly correlated radiation therapy (RT) dose data. This hinders robust estimates of dose-response associations and, hence, optimal normal tissue–sparing strategies from being elucidated. Using functional data analysis (FDA) to reduce the dimensionality of the dose data could overcome this limitation. Methods and Materials: FDA was applied to modeling of severe acute mucositis and dysphagia resulting from head and neck RT. Functional partial least squares regression (FPLS) and functional principal component analysis were used for dimensionality reduction of the dose-volume histogram data. The reduced dose data were input into functional logistic regression models (functional partial least squares–logistic regression [FPLS-LR] and functional principal component–logistic regression [FPC-LR]) along with clinical data. This approach was compared with penalized logistic regression (PLR) in terms of predictive performance and the significance of treatment covariate–response associations, assessed using bootstrapping. Results: The area under the receiver operating characteristic curve for the PLR, FPC-LR, and FPLS-LR models was 0.65, 0.69, and 0.67, respectively, for mucositis (internal validation) and 0.81, 0.83, and 0.83, respectively, for dysphagia (external validation). The calibration slopes/intercepts for the PLR, FPC-LR, and FPLS-LR models were 1.6/−0.67, 0.45/0.47, and 0.40/0.49, respectively, for mucositis (internal validation) and 2.5/−0.96, 0.79/−0.04, and 0.79/0.00, respectively, for dysphagia (external validation). The bootstrapped odds ratios indicated significant associations between RT dose and severe toxicity in the mucositis and dysphagia FDA models. Cisplatin was significantly associated with severe dysphagia in the FDA models. None of the covariates was significantly associated with severe toxicity in the PLR models. Dose levels greater than approximately 1.0 Gy/fraction were most strongly associated with severe acute mucositis and dysphagia in the FDA models. Conclusions: FPLS and functional principal component analysis marginally improved predictive performance compared with PLR and provided robust dose-response associations. FDA is recommended for use in normal tissue complication probability modeling.

  12. Functional Data Analysis Applied to Modeling of Severe Acute Mucositis and Dysphagia Resulting From Head and Neck Radiation Therapy.

    PubMed

    Dean, Jamie A; Wong, Kee H; Gay, Hiram; Welsh, Liam C; Jones, Ann-Britt; Schick, Ulrike; Oh, Jung Hun; Apte, Aditya; Newbold, Kate L; Bhide, Shreerang A; Harrington, Kevin J; Deasy, Joseph O; Nutting, Christopher M; Gulliford, Sarah L

    2016-11-15

    Current normal tissue complication probability modeling using logistic regression suffers from bias and high uncertainty in the presence of highly correlated radiation therapy (RT) dose data. This hinders robust estimates of dose-response associations and, hence, optimal normal tissue-sparing strategies from being elucidated. Using functional data analysis (FDA) to reduce the dimensionality of the dose data could overcome this limitation. FDA was applied to modeling of severe acute mucositis and dysphagia resulting from head and neck RT. Functional partial least squares regression (FPLS) and functional principal component analysis were used for dimensionality reduction of the dose-volume histogram data. The reduced dose data were input into functional logistic regression models (functional partial least squares-logistic regression [FPLS-LR] and functional principal component-logistic regression [FPC-LR]) along with clinical data. This approach was compared with penalized logistic regression (PLR) in terms of predictive performance and the significance of treatment covariate-response associations, assessed using bootstrapping. The area under the receiver operating characteristic curve for the PLR, FPC-LR, and FPLS-LR models was 0.65, 0.69, and 0.67, respectively, for mucositis (internal validation) and 0.81, 0.83, and 0.83, respectively, for dysphagia (external validation). The calibration slopes/intercepts for the PLR, FPC-LR, and FPLS-LR models were 1.6/-0.67, 0.45/0.47, and 0.40/0.49, respectively, for mucositis (internal validation) and 2.5/-0.96, 0.79/-0.04, and 0.79/0.00, respectively, for dysphagia (external validation). The bootstrapped odds ratios indicated significant associations between RT dose and severe toxicity in the mucositis and dysphagia FDA models. Cisplatin was significantly associated with severe dysphagia in the FDA models. None of the covariates was significantly associated with severe toxicity in the PLR models. Dose levels greater than approximately 1.0 Gy/fraction were most strongly associated with severe acute mucositis and dysphagia in the FDA models. FPLS and functional principal component analysis marginally improved predictive performance compared with PLR and provided robust dose-response associations. FDA is recommended for use in normal tissue complication probability modeling. Copyright © 2016 The Author(s). Published by Elsevier Inc. All rights reserved.

  13. Rapid quantification of casein in skim milk using Fourier transform infrared spectroscopy, enzymatic perturbation, and multiway partial least squares regression: Monitoring chymosin at work.

    PubMed

    Baum, A; Hansen, P W; Nørgaard, L; Sørensen, John; Mikkelsen, J D

    2016-08-01

    In this study, we introduce enzymatic perturbation combined with Fourier transform infrared (FTIR) spectroscopy as a concept for quantifying casein in subcritical heated skim milk using chemometric multiway analysis. Chymosin is a protease that specifically cleaves caseins. As a result of hydrolysis, all casein proteins clot to form a creamy precipitate, and whey proteins remain in the supernatant. We monitored the cheese-clotting reaction in real time using FTIR and analyzed the resulting evolution profiles to establish calibration models using parallel factor analysis and multiway partial least squares regression. Because we observed casein-specific kinetic changes, the retrieved models were independent of the chemical background matrix and were therefore robust against possible covariance effects. We tested the robustness of the models by spiking the milk solutions with whey, calcium, and cream. This method can be used at different stages in the dairy production chain to ensure the quality of the delivered milk. In particular, the cheese-making industry can benefit from such methods to optimize production control. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  14. Robust Eye Center Localization through Face Alignment and Invariant Isocentric Patterns

    PubMed Central

    Teng, Dongdong; Chen, Dihu; Tan, Hongzhou

    2015-01-01

    The localization of eye centers is a very useful cue for numerous applications like face recognition, facial expression recognition, and the early screening of neurological pathologies. Several methods relying on available light for accurate eye-center localization have been exploited. However, despite the considerable improvements that eye-center localization systems have undergone in recent years, only a few of these developments deal with the challenges posed by profile (non-frontal) faces. In this paper, we first use the explicit shape regression method to obtain the rough location of the eye centers. Because this method extracts global information from the human face, it is robust against any changes in the eye region. We exploit this robustness and utilize it as a constraint. To locate the eye centers accurately, we employ isophote curvature features, the accuracy of which has been demonstrated in a previous study. By applying these features, we obtain a series of eye-center locations which are candidates for the actual position of the eye center. Among these locations, the estimated locations which minimize the reconstruction error between the two methods mentioned above are taken as the closest approximation of the eye-center locations. Therefore, we combine explicit shape regression and isophote curvature feature analysis to achieve robustness and accuracy, respectively. In practical experiments, we use the BioID and FERET datasets to test our approach to obtaining an accurate eye-center location while retaining robustness against changes in scale and pose. In addition, we apply our method to non-frontal faces to test its robustness and accuracy, which are essential in gaze estimation but have seldom been mentioned in previous works. Through extensive experimentation, we show that the proposed method can achieve a significant improvement in accuracy and robustness over state-of-the-art techniques, with our method ranking second in terms of accuracy. In our implementation on a PC with a Xeon 2.5 GHz CPU, the frame rate of the eye-tracking process can reach 38 Hz. PMID:26426929

  15. Robust biological parametric mapping: an improved technique for multimodal brain image analysis

    NASA Astrophysics Data System (ADS)

    Yang, Xue; Beason-Held, Lori; Resnick, Susan M.; Landman, Bennett A.

    2011-03-01

    Mapping the quantitative relationship between structure and function in the human brain is an important and challenging problem. Numerous volumetric, surface, region of interest and voxelwise image processing techniques have been developed to statistically assess potential correlations between imaging and non-imaging metrics. Recently, biological parametric mapping has extended the widely popular statistical parametric approach to enable application of the general linear model to multiple image modalities (both for regressors and regressands) along with scalar valued observations. This approach offers great promise for direct, voxelwise assessment of structural and functional relationships with multiple imaging modalities. However, as presented, the biological parametric mapping approach is not robust to outliers and may lead to invalid inferences (e.g., artifactual low p-values) due to slight mis-registration or variation in anatomy between subjects. To enable widespread application of this approach, we introduce robust regression and robust inference in the neuroimaging context of application of the general linear model. Through simulation and empirical studies, we demonstrate that our robust approach reduces sensitivity to outliers without substantial degradation in power. The robust approach and associated software package provides a reliable way to quantitatively assess voxelwise correlations between structural and functional neuroimaging modalities.

  16. A Bayesian goodness of fit test and semiparametric generalization of logistic regression with measurement data.

    PubMed

    Schörgendorfer, Angela; Branscum, Adam J; Hanson, Timothy E

    2013-06-01

    Logistic regression is a popular tool for risk analysis in medical and population health science. With continuous response data, it is common to create a dichotomous outcome for logistic regression analysis by specifying a threshold for positivity. Fitting a linear regression to the nondichotomized response variable assuming a logistic sampling model for the data has been empirically shown to yield more efficient estimates of odds ratios than ordinary logistic regression of the dichotomized endpoint. We illustrate that risk inference is not robust to departures from the parametric logistic distribution. Moreover, the model assumption of proportional odds is generally not satisfied when the condition of a logistic distribution for the data is violated, leading to biased inference from a parametric logistic analysis. We develop novel Bayesian semiparametric methodology for testing goodness of fit of parametric logistic regression with continuous measurement data. The testing procedures hold for any cutoff threshold and our approach simultaneously provides the ability to perform semiparametric risk estimation. Bayes factors are calculated using the Savage-Dickey ratio for testing the null hypothesis of logistic regression versus a semiparametric generalization. We propose a fully Bayesian and a computationally efficient empirical Bayesian approach to testing, and we present methods for semiparametric estimation of risks, relative risks, and odds ratios when parametric logistic regression fails. Theoretical results establish the consistency of the empirical Bayes test. Results from simulated data show that the proposed approach provides accurate inference irrespective of whether parametric assumptions hold or not. Evaluation of risk factors for obesity shows that different inferences are derived from an analysis of a real data set when deviations from a logistic distribution are permissible in a flexible semiparametric framework. © 2013, The International Biometric Society.

  17. Modeling Longitudinal Data Containing Non-Normal Within Subject Errors

    NASA Technical Reports Server (NTRS)

    Feiveson, Alan; Glenn, Nancy L.

    2013-01-01

    The mission of the National Aeronautics and Space Administration’s (NASA) human research program is to advance safe human spaceflight. This involves conducting experiments, collecting data, and analyzing data. The data are longitudinal and result from a relatively small number of subjects, typically 10–20. A longitudinal study refers to an investigation where participant outcomes and possibly treatments are collected at multiple follow-up times. Standard statistical designs such as mean regression with random effects and mixed-effects regression are inadequate for such data because the population is typically not approximately normally distributed. Hence, more advanced data analysis methods are necessary. This research focuses on four such methods for longitudinal data analysis: the recently proposed linear quantile mixed models (lqmm) of Geraci and Bottai (2013), quantile regression, multilevel mixed-effects linear regression, and robust regression. This research also provides computational algorithms for longitudinal data that scientists can directly use for human spaceflight and other longitudinal data applications, then presents statistical evidence that verifies which method is best for specific situations. This advances the study of longitudinal data in a broad range of applications, including applications in the sciences, technology, engineering and mathematics fields.
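
    Of the four methods compared, quantile regression is the most direct to demonstrate. A minimal statsmodels sketch on simulated skewed longitudinal-style data (all values invented) contrasts a median fit with OLS:

    ```python
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)

    # Simulated data for a small subject pool: 10 subjects, 6 follow-up times
    # each, with right-skewed (non-normal) within-subject errors.
    t = np.tile(np.arange(6), 10).astype(float)
    y = 2.0 + 0.5 * t + rng.gamma(1.0, 1.0, t.size)

    X = sm.add_constant(t)

    # Median (q = 0.5) regression is not pulled toward the long right tail
    # the way a mean-regression (OLS) fit is.
    median_fit = sm.QuantReg(y, X).fit(q=0.5)
    ols_fit = sm.OLS(y, X).fit()
    print("median-regression slope:", round(median_fit.params[1], 3))
    print("OLS slope:              ", round(ols_fit.params[1], 3))
    ```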

  18. Revisiting the southern pine growth decline: Where are we 10 years later?

    Treesearch

    Gary L. Gadbury; Michael S. Williams; Hans T. Schreuder

    2004-01-01

    This paper evaluates changes in growth of pine stands in the state of Georgia, U.S.A., using USDA Forest Service Forest Inventory and Analysis (FIA) data. In particular, data representing an additional 10-year growth cycle have been added to previously published results from two earlier growth cycles. A robust regression procedure is combined with a bootstrap technique...

  19. Improving near-infrared prediction model robustness with support vector machine regression: a pharmaceutical tablet assay example.

    PubMed

    Igne, Benoît; Drennen, James K; Anderson, Carl A

    2014-01-01

    Changes in raw materials and process wear and tear can have significant effects on the prediction error of near-infrared calibration models. When the variability that is present during routine manufacturing is not included in the calibration, test, and validation sets, the long-term performance and robustness of the model will be limited. Nonlinearity is a major source of interference. In near-infrared spectroscopy, nonlinearity can arise from light path-length differences that can come from differences in particle size or density. The usefulness of support vector machine (SVM) regression to handle nonlinearity and improve the robustness of calibration models in scenarios where the calibration set did not include all the variability present in the test set was evaluated. Compared to partial least squares (PLS) regression, SVM regression was less affected by physical (particle size) and chemical (moisture) differences. The linearity of the SVM-predicted values was also improved. Nevertheless, although visualization and interpretation tools have been developed to enhance the usability of SVM-based methods, work is yet to be done to provide chemometricians in the pharmaceutical industry with a regression method that can supplement PLS-based methods.
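
    A schematic comparison in the spirit of this study, with simulated "spectra" whose test-set latent variability is absent from training. sklearn's SVR and PLSRegression stand in for the chemometric implementations; every data-generating detail below is invented.

    ```python
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.metrics import mean_squared_error
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVR

    rng = np.random.default_rng(5)

    # Toy spectra: the response is nonlinear in a latent path-length factor,
    # and the test set shifts that factor beyond the calibration range
    # (standing in for particle size / density differences).
    def make_spectra(n, shift):
        latent = rng.uniform(0.5, 1.5, n) + shift
        X = np.outer(latent, np.sin(np.linspace(0.0, 3.0, 300)))
        X += rng.normal(0.0, 0.01, X.shape)
        return X, 10.0 * latent**1.5

    X_train, y_train = make_spectra(200, shift=0.0)
    X_test, y_test = make_spectra(100, shift=0.3)  # variability absent from training

    pls = PLSRegression(n_components=3).fit(X_train, y_train)
    svm = make_pipeline(StandardScaler(), SVR(C=10.0, epsilon=0.1)).fit(X_train, y_train)

    for name, model in [("PLS", pls), ("SVM", svm)]:
        rmse = mean_squared_error(y_test, np.ravel(model.predict(X_test))) ** 0.5
        print(name, "test RMSE:", round(rmse, 3))
    ```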

  20. Regression estimators for generic health-related quality of life and quality-adjusted life years.

    PubMed

    Basu, Anirban; Manca, Andrea

    2012-01-01

    To develop regression models for outcomes with truncated supports, such as health-related quality of life (HRQoL) data, and account for features typical of such data, such as a skewed distribution, spikes at 1 or 0, and heteroskedasticity. Regression estimators based on features of the Beta distribution. First, both a single-equation and a 2-part model are presented, along with estimation algorithms based on maximum-likelihood, quasi-likelihood, and Bayesian Markov-chain Monte Carlo methods. A novel Bayesian quasi-likelihood estimator is proposed. Second, a simulation exercise is presented to assess the performance of the proposed estimators against ordinary least squares (OLS) regression for a variety of HRQoL distributions that are encountered in practice. Finally, the performance of the proposed estimators is assessed by using them to quantify the treatment effect on QALYs in the EVALUATE hysterectomy trial. Overall model fit is studied using several goodness-of-fit tests such as Pearson's correlation test, link and reset tests, and a modified Hosmer-Lemeshow test. The simulation results indicate that the proposed methods are more robust in estimating covariate effects than OLS, especially when the effects are large or the HRQoL distribution has a large spike at 1. Quasi-likelihood techniques are more robust than maximum likelihood estimators. When applied to the EVALUATE trial, all but the maximum likelihood estimators produce unbiased estimates of the treatment effect. One- and two-part Beta regression models provide flexible approaches to regress outcomes with truncated supports, such as HRQoL, on covariates, after accounting for many idiosyncratic features of the outcome distribution. This work will provide applied researchers with a practical set of tools to model outcomes in cost-effectiveness analysis.
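
    A minimal two-part sketch of the kind of estimator described: a logistic model for the spike at 1 combined with a Beta regression for the interior values, assuming a recent statsmodels (>= 0.13) for BetaModel. The simulated data and all coefficients are illustrative only, not the estimators of the study.

    ```python
    import numpy as np
    import statsmodels.api as sm
    from statsmodels.othermod.betareg import BetaModel  # assumes statsmodels >= 0.13

    rng = np.random.default_rng(6)

    # Simulated HRQoL-like utilities on (0, 1] with a spike at 1 ("full health").
    n = 400
    x = rng.normal(size=n)
    X = sm.add_constant(x)
    p_spike = 1.0 / (1.0 + np.exp(-(-1.0 + 0.5 * x)))
    spike = rng.binomial(1, p_spike).astype(bool)
    mu = 1.0 / (1.0 + np.exp(-(0.2 + 0.4 * x)))
    y = np.where(spike, 1.0, rng.beta(mu * 10.0, (1.0 - mu) * 10.0))

    # Two-part model: logistic regression for P(y == 1), Beta regression for
    # the interior values y | y < 1.
    part1 = sm.Logit(spike.astype(float), X).fit(disp=0)
    part2 = BetaModel(y[~spike], X[~spike]).fit()

    # Combine: E[y | x] = P(y = 1 | x) + (1 - P(y = 1 | x)) * E[y | y < 1, x].
    p1 = part1.predict(X)
    print("mean predicted HRQoL:", float(np.mean(p1 + (1.0 - p1) * part2.predict(X))))
    ```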

  1. Systematic genomic identification of colorectal cancer genes delineating advanced from early clinical stage and metastasis

    PubMed Central

    2013-01-01

    Background Colorectal cancer is the third leading cause of cancer deaths in the United States. The initial assessment of colorectal cancer involves clinical staging that takes into account the extent of primary tumor invasion, determining the number of lymph nodes with metastatic cancer and the identification of metastatic sites in other organs. Advanced clinical stage indicates metastatic cancer, either in regional lymph nodes or in distant organs. While the genomic and genetic basis of colorectal cancer has been elucidated to some degree, less is known about the identity of specific cancer genes that are associated with advanced clinical stage and metastasis. Methods We compiled multiple genomic data types (mutations, copy number alterations, gene expression and methylation status) as well as clinical meta-data from The Cancer Genome Atlas (TCGA). We used an elastic-net regularized regression method on the combined genomic data to identify genetic aberrations and their associated cancer genes that are indicators of clinical stage. We ranked candidate genes by their regression coefficient and level of support from multiple assay modalities. Results A fit of the elastic-net regularized regression to 197 samples and integrated analysis of four genomic platforms identified the set of top gene predictors of advanced clinical stage, including: WRN, SYK, DDX5 and ADRA2C. These genetic features were identified robustly in bootstrap resampling analysis. Conclusions We conducted an analysis integrating multiple genomic features including mutations, copy number alterations, gene expression and methylation. This integrated approach in which one considers all of these genomic features performs better than any individual genomic assay. We identified multiple genes that robustly delineate advanced clinical stage, suggesting their possible role in colorectal cancer metastatic progression. PMID:24308539
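
    A small sketch of the elastic-net-plus-bootstrap pipeline outlined above, using sklearn's penalized logistic regression on a simulated stand-in for the integrated genomic matrix. The dimensions mimic the 197-sample setting, but all features, labels, and hyperparameters are invented.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import StandardScaler
    from sklearn.utils import resample

    rng = np.random.default_rng(7)

    # Toy integrated matrix: 197 samples x 1000 stacked features (mutation,
    # copy number, expression, methylation columns in the real setting);
    # labels are advanced vs early clinical stage.
    n, p = 197, 1000
    X = StandardScaler().fit_transform(rng.normal(size=(n, p)))
    informative = [10, 42, 99]
    logits = X[:, informative] @ np.array([1.2, -1.0, 0.8])
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))

    def elastic_net_coefs(Xb, yb):
        clf = LogisticRegression(penalty="elasticnet", solver="saga",
                                 l1_ratio=0.5, C=0.1, max_iter=5000)
        return clf.fit(Xb, yb).coef_[0]

    # Bootstrap resampling: rank features by how often they retain a nonzero
    # elastic-net coefficient across resamples.
    hits = np.zeros(p)
    for _ in range(50):
        Xb, yb = resample(X, y)
        hits += elastic_net_coefs(Xb, yb) != 0
    print("most stable features:", np.argsort(hits)[-5:][::-1])
    ```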

  2. An event-based approach to understanding decadal fluctuations in the Atlantic meridional overturning circulation

    NASA Astrophysics Data System (ADS)

    Allison, Lesley; Hawkins, Ed; Woollings, Tim

    2015-01-01

    Many previous studies have shown that unforced climate model simulations exhibit decadal-scale fluctuations in the Atlantic meridional overturning circulation (AMOC), and that this variability can have impacts on surface climate fields. However, the robustness of these surface fingerprints across different models is less clear. Furthermore, with the potential for coupled feedbacks that may amplify or damp the response, it is not known whether the associated climate signals are linearly related to the strength of the AMOC changes, or if the fluctuation events exhibit nonlinear behaviour with respect to their strength or polarity. To explore these questions, we introduce an objective and flexible method for identifying the largest natural AMOC fluctuation events in multicentennial/multimillennial simulations of a variety of coupled climate models. The characteristics of the events are explored, including their magnitude, meridional coherence and spatial structure, as well as links with ocean heat transport and the horizontal circulation. The surface fingerprints in ocean temperature and salinity are examined, and compared with the results of linear regression analysis. It is found that the regressions generally provide a good indication of the surface changes associated with the largest AMOC events. However, there are some exceptions, including a nonlinear change in the atmospheric pressure signal, particularly at high latitudes, in HadCM3. Some asymmetries are also found between the changes associated with positive and negative AMOC events in the same model. Composite analysis suggests that there are signals that are robust across the largest AMOC events in each model, which provides reassurance that the surface changes associated with one particular event will be similar to those expected from regression analysis. However, large differences are found between the AMOC fingerprints in different models, which may hinder the prediction and attribution of such events in reality.

  3. Analysis of Binary Adherence Data in the Setting of Polypharmacy: A Comparison of Different Approaches

    PubMed Central

    Esserman, Denise A.; Moore, Charity G.; Roth, Mary T.

    2009-01-01

    Older community-dwelling adults often take multiple medications for numerous chronic diseases. Non-adherence to these medications can have a large public health impact. Therefore, the measurement and modeling of medication adherence in the setting of polypharmacy is an important area of research. We apply a variety of different modeling techniques (standard linear regression; weighted linear regression; adjusted linear regression; naïve logistic regression; beta-binomial (BB) regression; generalized estimating equations (GEE)) to binary medication adherence data from a study in a North Carolina-based population of older adults, where each medication an individual was taking was classified as adherent or non-adherent. In addition, through simulation we compare these different methods based on Type I error rates, bias, power, empirical 95% coverage, and goodness of fit. We find that estimation and inference using GEE are robust to a wide variety of scenarios, and we recommend using this approach in the setting of polypharmacy when adherence is dichotomously measured for multiple medications per person. PMID:20414358
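
    A minimal GEE fit of the recommended kind, on simulated polypharmacy-style adherence data with an exchangeable working correlation (statsmodels). All variable names and effect sizes are invented.

    ```python
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(8)

    # Simulated data: several medications per subject, each coded adherent (1)
    # or not (0), correlated within subject through a shared random propensity.
    rows = []
    for subj in range(200):
        u = rng.normal(0.0, 1.0)              # subject-level propensity
        age = rng.uniform(65.0, 90.0)
        for _ in range(rng.integers(2, 9)):   # 2-8 medications per subject
            p = 1.0 / (1.0 + np.exp(-(0.5 + 0.02 * (age - 75.0) + u)))
            rows.append({"subject": subj, "age": age,
                         "adherent": rng.binomial(1, p)})
    df = pd.DataFrame(rows)

    # GEE with an exchangeable working correlation addresses the within-subject
    # clustering of the binary adherence indicators directly.
    model = smf.gee("adherent ~ age", groups="subject", data=df,
                    family=sm.families.Binomial(),
                    cov_struct=sm.cov_struct.Exchangeable())
    print(model.fit().summary().tables[1])
    ```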

  4. Comparing lagged linear correlation, lagged regression, Granger causality, and vector autoregression for uncovering associations in EHR data.

    PubMed

    Levine, Matthew E; Albers, David J; Hripcsak, George

    2016-01-01

    Time series analysis methods have been shown to reveal clinical and biological associations in data collected in the electronic health record. We wish to develop reliable high-throughput methods for identifying adverse drug effects that are easy to implement and produce readily interpretable results. To move toward this goal, we used univariate and multivariate lagged regression models to investigate associations between twenty pairs of drug orders and laboratory measurements. Multivariate lagged regression models exhibited higher sensitivity and specificity than univariate lagged regression in the 20 examples, and incorporating autoregressive terms for labs and drugs produced more robust signals in cases of known associations among the 20 example pairings. Moreover, including inpatient admission terms in the model attenuated the signals for some cases of unlikely associations, demonstrating how multivariate lagged regression models' explicit handling of context-based variables can provide a simple way to probe for health-care processes that confound analyses of EHR data.
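
    The multivariate lagged regression with an autoregressive lab term can be sketched in a few lines of pandas/statsmodels; the simulated drug-lab pair below is purely illustrative.

    ```python
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(9)

    # Toy daily series standing in for one EHR drug-lab pair: the lab value
    # responds to drug exposure two days earlier and is autocorrelated.
    n = 365
    drug = rng.binomial(1, 0.2, n).astype(float)
    lab = np.zeros(n)
    for t in range(2, n):
        lab[t] = 0.6 * lab[t - 1] - 0.8 * drug[t - 2] + rng.normal(0.0, 0.3)

    df = pd.DataFrame({"lab": lab, "drug": drug})
    for k in (1, 2, 3):
        df[f"drug_lag{k}"] = df["drug"].shift(k)
    df["lab_lag1"] = df["lab"].shift(1)   # autoregressive term for the lab
    df = df.dropna()

    # Multivariate lagged regression: lab on lagged drug orders plus its own
    # lag; the autoregressive term absorbs drift that would otherwise confound.
    X = sm.add_constant(df[["lab_lag1", "drug_lag1", "drug_lag2", "drug_lag3"]])
    print(sm.OLS(df["lab"], X).fit().params.round(3))
    ```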

  5. Association of Protein Translation and Extracellular Matrix Gene Sets with Breast Cancer Metastasis: Findings Uncovered on Analysis of Multiple Publicly Available Datasets Using Individual Patient Data Approach.

    PubMed

    Chowdhury, Nilotpal; Sapru, Shantanu

    2015-01-01

    Microarray analysis has revolutionized the role of genomic prognostication in breast cancer. However, most studies are single series studies, and suffer from methodological problems. We sought to use a meta-analytic approach in combining multiple publicly available datasets, while correcting for batch effects, to reach a more robust oncogenomic analysis. The aim of the present study was to find gene sets associated with distant metastasis free survival (DMFS) in systemically untreated, node-negative breast cancer patients, from publicly available genomic microarray datasets. Four microarray series (having 742 patients) were selected after a systematic search and combined. Cox regression for each gene was done for the combined dataset (univariate, as well as multivariate - adjusted for expression of Cell cycle related genes) and for the 4 major molecular subtypes. The centre and microarray batch effects were adjusted by including them as random effects variables. The Cox regression coefficients for each analysis were then ranked and subjected to a Gene Set Enrichment Analysis (GSEA). Gene sets representing protein translation were independently negatively associated with metastasis in the Luminal A and Luminal B subtypes, but positively associated with metastasis in Basal tumors. Proteinaceous extracellular matrix (ECM) gene set expression was positively associated with metastasis, after adjustment for expression of cell cycle related genes on the combined dataset. Finally, the positive association of the proliferation-related genes with metastases was confirmed. To the best of our knowledge, the results depicting mixed prognostic significance of protein translation in breast cancer subtypes are being reported for the first time. We attribute this to our study combining multiple series and performing a more robust meta-analytic Cox regression modeling on the combined dataset, thus discovering 'hidden' associations. This methodology seems to yield new and interesting results and may be used as a tool to guide new research.

  6. Association of Protein Translation and Extracellular Matrix Gene Sets with Breast Cancer Metastasis: Findings Uncovered on Analysis of Multiple Publicly Available Datasets Using Individual Patient Data Approach

    PubMed Central

    Chowdhury, Nilotpal; Sapru, Shantanu

    2015-01-01

    Introduction Microarray analysis has revolutionized the role of genomic prognostication in breast cancer. However, most studies are single series studies, and suffer from methodological problems. We sought to use a meta-analytic approach in combining multiple publicly available datasets, while correcting for batch effects, to reach a more robust oncogenomic analysis. Aim The aim of the present study was to find gene sets associated with distant metastasis free survival (DMFS) in systemically untreated, node-negative breast cancer patients, from publicly available genomic microarray datasets. Methods Four microarray series (having 742 patients) were selected after a systematic search and combined. Cox regression for each gene was done for the combined dataset (univariate, as well as multivariate – adjusted for expression of Cell cycle related genes) and for the 4 major molecular subtypes. The centre and microarray batch effects were adjusted by including them as random effects variables. The Cox regression coefficients for each analysis were then ranked and subjected to a Gene Set Enrichment Analysis (GSEA). Results Gene sets representing protein translation were independently negatively associated with metastasis in the Luminal A and Luminal B subtypes, but positively associated with metastasis in Basal tumors. Proteinaceous extracellular matrix (ECM) gene set expression was positively associated with metastasis, after adjustment for expression of cell cycle related genes on the combined dataset. Finally, the positive association of the proliferation-related genes with metastases was confirmed. Conclusion To the best of our knowledge, the results depicting mixed prognostic significance of protein translation in breast cancer subtypes are being reported for the first time. We attribute this to our study combining multiple series and performing a more robust meta-analytic Cox regression modeling on the combined dataset, thus discovering 'hidden' associations. This methodology seems to yield new and interesting results and may be used as a tool to guide new research. PMID:26080057

  7. Source apportionment of soil heavy metals using robust absolute principal component scores-robust geographically weighted regression (RAPCS-RGWR) receptor model.

    PubMed

    Qu, Mingkai; Wang, Yan; Huang, Biao; Zhao, Yongcun

    2018-06-01

    The traditional source apportionment models, such as absolute principal component scores-multiple linear regression (APCS-MLR), are usually susceptible to outliers, which may be widely present in regional geochemical datasets. Furthermore, the models are merely built on variable space instead of geographical space and thus cannot effectively capture the local spatial characteristics of each source's contributions. To overcome these limitations, a new receptor model, robust absolute principal component scores-robust geographically weighted regression (RAPCS-RGWR), was proposed based on the traditional APCS-MLR model. Then, the new method was applied to the source apportionment of soil metal elements in a region of Wuhan City, China as a case study. Evaluations revealed that: (i) the RAPCS-RGWR model had better performance than the APCS-MLR model in the identification of the major sources of soil metal elements, and (ii) source contributions estimated by the RAPCS-RGWR model were closer to the true soil metal concentrations than those estimated by the APCS-MLR model. It is shown that the proposed RAPCS-RGWR model is a more effective source apportionment method than APCS-MLR (i.e., a non-robust and global model) in dealing with regional geochemical datasets. Copyright © 2018 Elsevier B.V. All rights reserved.

  8. TLE uncertainty estimation using robust weighted differencing

    NASA Astrophysics Data System (ADS)

    Geul, Jacco; Mooij, Erwin; Noomen, Ron

    2017-05-01

    Accurate knowledge of satellite orbit errors is essential for many types of analyses. Unfortunately, for two-line elements (TLEs) this is not available. This paper presents a weighted differencing method using robust least-squares regression for estimating many important error characteristics. The method is applied to both classic and enhanced TLEs, compared to previous implementations, and validated using Global Positioning System (GPS) solutions for the GOCE satellite in Low-Earth Orbit (LEO), prior to its re-entry. The method is found to be more accurate than previous TLE differencing efforts in estimating initial uncertainty, as well as error growth. The method also proves more reliable and requires no data filtering (such as outlier removal). Sensitivity analysis shows a strong relationship between argument of latitude and covariance (standard deviations and correlations), which the method is able to approximate. Overall, the method proves accurate, computationally fast, and robust, and is applicable to any object in the satellite catalogue (SATCAT).

  9. Multiple Imputation of a Randomly Censored Covariate Improves Logistic Regression Analysis.

    PubMed

    Atem, Folefac D; Qian, Jing; Maye, Jacqueline E; Johnson, Keith A; Betensky, Rebecca A

    2016-01-01

    Randomly censored covariates arise frequently in epidemiologic studies. The most commonly used methods, including complete case and single imputation or substitution, suffer from inefficiency and bias. They make strong parametric assumptions or they consider limit of detection censoring only. We employ multiple imputation, in conjunction with semi-parametric modeling of the censored covariate, to overcome these shortcomings and to facilitate robust estimation. We develop a multiple imputation approach for randomly censored covariates within the framework of a logistic regression model. We use the non-parametric estimate of the covariate distribution or the semiparametric Cox model estimate in the presence of additional covariates in the model. We evaluate this procedure in simulations, and compare its operating characteristics to those from the complete case analysis and a survival regression approach. We apply the procedures to an Alzheimer's study of the association between amyloid positivity and maternal age of onset of dementia. Multiple imputation achieves lower standard errors and higher power than the complete case approach under heavy and moderate censoring and is comparable under light censoring. The survival regression approach achieves the highest power among all procedures, but does not produce interpretable estimates of association. Multiple imputation offers a favorable alternative to complete case analysis and ad hoc substitution methods in the presence of randomly censored covariates within the framework of logistic regression.
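
    A simplified sketch of multiple imputation for a right-censored covariate in logistic regression: draw imputations from the Kaplan-Meier tail (using lifelines for the KM fit), refit per imputation, and pool with Rubin's rules. The data generation and all constants are invented, and the tail mass beyond the last event is handled crudely.

    ```python
    import numpy as np
    import statsmodels.api as sm
    from lifelines import KaplanMeierFitter

    rng = np.random.default_rng(10)

    # Simulated data: covariate X is right-censored for some subjects; the
    # binary outcome depends on the true X.
    n = 500
    x_true = rng.gamma(4.0, 2.0, n)
    cens = rng.uniform(2.0, 15.0, n)
    observed = x_true <= cens
    x_obs = np.minimum(x_true, cens)
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.3 * x_true - 2.5))))

    # Nonparametric (Kaplan-Meier) estimate of the covariate distribution.
    kmf = KaplanMeierFitter().fit(x_obs, event_observed=observed)
    sf = kmf.survival_function_["KM_estimate"]
    support, surv = sf.index.values, sf.values
    jumps = -np.diff(surv, prepend=1.0)          # point mass at each value

    def draw_tail(c):
        # Sample from the KM-estimated law of X given X > c; if no mass lies
        # beyond c, keep the censored value itself.
        mask = support > c
        if not mask.any() or jumps[mask].sum() <= 0.0:
            return c
        return rng.choice(support[mask], p=jumps[mask] / jumps[mask].sum())

    # Multiple imputation: impute censored X, fit the logistic model on each
    # completed data set, and pool with Rubin's rules.
    M = 20
    betas, variances = [], []
    for _ in range(M):
        x_imp = np.where(observed, x_obs,
                         np.array([draw_tail(c) for c in x_obs]))
        fit = sm.Logit(y, sm.add_constant(x_imp)).fit(disp=0)
        betas.append(fit.params[1])
        variances.append(fit.bse[1] ** 2)

    beta_bar = np.mean(betas)
    total_var = np.mean(variances) + (1.0 + 1.0 / M) * np.var(betas, ddof=1)
    print(f"pooled slope = {beta_bar:.3f}, SE = {np.sqrt(total_var):.3f}")
    ```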

  10. Can cover data be used as a surrogate for seedling counts in regeneration stocking evaluations in northern hardwood forests?

    Treesearch

    Todd E. Ristau; Susan L. Stout

    2014-01-01

    Assessment of regeneration can be time-consuming and costly. Often, foresters look for ways to minimize the cost of doing inventories. One potential method to reduce time required on a plot is use of percent cover data rather than seedling count data to determine stocking. Robust linear regression analysis was used in this report to predict seedling count data from...

  11. Modeling time-to-event (survival) data using classification tree analysis.

    PubMed

    Linden, Ariel; Yarnold, Paul R

    2017-12-01

    Time to the occurrence of an event is often studied in health research. Survival analysis differs from other designs in that follow-up times for individuals who do not experience the event by the end of the study (called censored) are accounted for in the analysis. Cox regression is the standard method for analysing censored data, but the assumptions required of these models are easily violated. In this paper, we introduce classification tree analysis (CTA) as a flexible alternative for modelling censored data. Classification tree analysis is a "decision-tree"-like classification model that provides parsimonious, transparent (ie, easy to visually display and interpret) decision rules that maximize predictive accuracy, derives exact P values via permutation tests, and evaluates model cross-generalizability. Using empirical data, we identify all statistically valid, reproducible, longitudinally consistent, and cross-generalizable CTA survival models and then compare their predictive accuracy to estimates derived via Cox regression and an unadjusted naïve model. Model performance is assessed using integrated Brier scores and a comparison between estimated survival curves. The Cox regression model best predicts average incidence of the outcome over time, whereas CTA survival models best predict either relatively high, or low, incidence of the outcome over time. Classification tree analysis survival models offer many advantages over Cox regression, such as explicit maximization of predictive accuracy, parsimony, statistical robustness, and transparency. Therefore, researchers interested in accurate prognoses and clear decision rules should consider developing models using the CTA-survival framework. © 2017 John Wiley & Sons, Ltd.

  12. Evaluation of the Bitterness of Traditional Chinese Medicines using an E-Tongue Coupled with a Robust Partial Least Squares Regression Method.

    PubMed

    Lin, Zhaozhou; Zhang, Qiao; Liu, Ruixin; Gao, Xiaojie; Zhang, Lu; Kang, Bingya; Shi, Junhan; Wu, Zidan; Gui, Xinjing; Li, Xuelin

    2016-01-25

    To accurately, safely, and efficiently evaluate the bitterness of Traditional Chinese Medicines (TCMs), a robust predictor was developed using the robust partial least squares (RPLS) regression method, based on data obtained from an electronic tongue (e-tongue) system. The data quality was verified by the Grubbs' test. Moreover, potential outliers were detected based on both the standardized residual and the score distance calculated for each sample. The performance of RPLS on the dataset before and after outlier detection was compared to other state-of-the-art methods including multivariate linear regression, least squares support vector machine, and the plain partial least squares regression. Both R² and the root-mean-square error of cross-validation (RMSECV) were recorded for each model. With four latent variables, a robust RMSECV value of 0.3916, with bitterness values ranging from 0.63 to 4.78, was obtained for the RPLS model that was constructed based on the dataset including outliers. Meanwhile, the RMSECV calculated using the models constructed by the other methods was larger than that of the RPLS model. After six outliers were excluded, the performance of all benchmark methods markedly improved, but the difference between the RPLS models constructed before and after outlier exclusion was negligible. In conclusion, the bitterness of TCM decoctions can be accurately evaluated with the RPLS model constructed using e-tongue data.
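
    The RPLS algorithm itself is not reproduced here, but a crude residual-trimming stand-in conveys the idea: fit PLS, flag samples with extreme robust-z residuals, and refit without them (sklearn; all data invented).

    ```python
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    rng = np.random.default_rng(11)

    # Toy e-tongue-style data: 40 samples x 7 sensor channels, with one
    # grossly contaminated sample injected.
    n, p = 40, 7
    X = rng.normal(size=(n, p))
    bitterness = 2.0 + X @ rng.normal(0.0, 0.5, p) + rng.normal(0.0, 0.1, n)
    X[3] += 8.0
    bitterness[3] += 5.0

    def trimmed_pls(X, y, n_components=3, z_cut=3.5):
        # Fit, flag samples with extreme robust (MAD-based) residual z-scores,
        # then refit without them.
        pls = PLSRegression(n_components=n_components).fit(X, y)
        resid = y - pls.predict(X).ravel()
        med = np.median(resid)
        z = 0.6745 * (resid - med) / np.median(np.abs(resid - med))
        keep = np.abs(z) < z_cut
        return PLSRegression(n_components=n_components).fit(X[keep], y[keep]), ~keep

    model, flagged = trimmed_pls(X, bitterness)
    print("flagged outliers:", np.flatnonzero(flagged))
    ```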

  13. SU-F-J-64: Comparison of Dosimetric Robustness Between Proton Therapy and IMRT Plans Following Tumor Regression for Locally Advanced Non-Small Cell Lung Cancer (NSCLC)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Teng, C; Ainsley, C; Teo, B

    Purpose: In light of tumor regression and normal tissue changes, dose distributions can deviate undesirably from what was planned. As a consequence, replanning is sometimes necessary during treatment to ensure continued tumor coverage or to avoid overdosing organs at risk (OARs). Proton plans are generally thought to be less robust than photon plans because of the proton beam’s higher sensitivity to changes in tissue composition, suggesting also a higher likely replanning rate due to tumor regression. The purpose of this study is to compare dosimetric deviations between forward-calculated double scattering (DS) proton plans and IMRT plans upon tumor regression, and to assess their impact on clinical replanning decisions. Methods: Ten consecutive locally advanced NSCLC patients whose tumors shrank > 50% in volume and who received four or more CT scans during radiotherapy were analyzed. All the patients received proton radiotherapy (6660 cGy, 180 cGy/fx). Dosimetric robustness during therapy was characterized by changes in the planning objective metrics as well as by point-by-point root-mean-squared differences for the entire PTV, ITV, and OARs (heart, cord, esophagus, brachial plexus and lungs) DVHs. Results: Sixty-four pairs of DVHs were reviewed by three clinicians, who requested a replanning rate of 16.7% and 18.6% for DS and IMRT plans, respectively, with a high agreement between providers. Robustness of clinical indicators was found to depend on the beam orientation and the dose level on the DVH curve. Proton dose increased most in OARs distal to the PTV along the beam path, but these changes were primarily in the mid to low dose levels. In contrast, the variation in IMRT plans occurred primarily in the high dose region. Conclusion: Robustness of clinical indicators depends on where on the DVH curves comparisons are made. Similar replanning rates were observed for DS and IMRT plans upon large tumor regression.

  14. Automatic coronary artery segmentation based on multi-domains remapping and quantile regression in angiographies.

    PubMed

    Li, Zhixun; Zhang, Yingtao; Gong, Huiling; Li, Weimin; Tang, Xianglong

    2016-12-01

    Coronary artery disease has become one of the most dangerous diseases threatening human life, and coronary artery segmentation is the basis of computer-aided diagnosis and analysis. Existing segmentation methods have difficulty handling the complex vascular texture that results from the projective nature of conventional coronary angiography. Given the large amount of data and the complexity of vascular shapes, manual annotation has become increasingly unrealistic; a fully automatic segmentation method is necessary in clinical practice. In this work, we study a method based on reliable boundaries, via multi-domain remapping, and robust discrepancy correction, via distance balance and quantile regression, for automatic coronary artery segmentation of angiography images. The proposed method not only segments overlapping vascular structures robustly but also achieves good performance in low-contrast regions. The effectiveness of our approach is demonstrated on a variety of coronary blood vessels in comparison with existing methods. The overall segmentation performances si, fnvf, fvpf and tpvf were 95.135%, 3.733%, 6.113%, and 96.268%, respectively. Copyright © 2016 Elsevier Ltd. All rights reserved.

  15. Comparison of robustness to outliers between robust poisson models and log-binomial models when estimating relative risks for common binary outcomes: a simulation study.

    PubMed

    Chen, Wansu; Shi, Jiaxiao; Qian, Lei; Azen, Stanley P

    2014-06-26

    To estimate relative risks or risk ratios for common binary outcomes, the most popular model-based methods are the robust (also known as modified) Poisson and the log-binomial regression. Of the two methods, it is believed that the log-binomial regression yields more efficient estimators because it is maximum likelihood based, while the robust Poisson model may be less affected by outliers. Evidence to support the robustness of robust Poisson models in comparison with log-binomial models is very limited. In this study a simulation was conducted to evaluate the performance of the two methods in several scenarios where outliers existed. The findings indicate that for data coming from a population where the relationship between the outcome and the covariate was in a simple form (e.g. log-linear), the two models yielded comparable biases and mean square errors. However, if the true relationship contained a higher order term, the robust Poisson models consistently outperformed the log-binomial models even when the level of contamination is low. The robust Poisson models are more robust (or less sensitive) to outliers compared to the log-binomial models when estimating relative risks or risk ratios for common binary outcomes. Users should be aware of the limitations when choosing appropriate models to estimate relative risks or risk ratios.
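
    Both estimators are easy to express with statsmodels: the robust ("modified") Poisson is a Poisson working model with a sandwich covariance, and the log-binomial is a binomial GLM with a log link. The simulated data below, with a true RR of 1.5, are illustrative only.

    ```python
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(12)

    # Common binary outcome with one exposure; baseline risk 0.3, true RR 1.5.
    n = 2000
    exposed = rng.binomial(1, 0.5, n)
    y = rng.binomial(1, 0.3 * 1.5 ** exposed)
    X = sm.add_constant(exposed)

    # Robust ("modified") Poisson: Poisson working model on binary data with
    # a sandwich (HC0) covariance correcting the misspecified variance.
    robust_poisson = sm.GLM(y, X, family=sm.families.Poisson()).fit(cov_type="HC0")

    # Log-binomial: the maximum-likelihood counterpart; it can fail to
    # converge when fitted probabilities approach 1.
    log_binomial = sm.GLM(
        y, X, family=sm.families.Binomial(link=sm.families.links.Log())).fit()

    for name, fit in [("robust Poisson", robust_poisson),
                      ("log-binomial", log_binomial)]:
        print(name, "RR estimate:", round(float(np.exp(fit.params[1])), 3))
    ```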

  16. Robust scoring functions for protein-ligand interactions with quantum chemical charge models.

    PubMed

    Wang, Jui-Chih; Lin, Jung-Hsin; Chen, Chung-Ming; Perryman, Alex L; Olson, Arthur J

    2011-10-24

    Ordinary least-squares (OLS) regression has been used widely for constructing the scoring functions for protein-ligand interactions. However, OLS is very sensitive to the existence of outliers, and models constructed using it are easily affected by the outliers or even by the choice of the data set. On the other hand, the determination of atomic charges is regarded as of central importance, because the electrostatic interaction is known to be a key contributing factor for biomolecular association. In the development of the AutoDock4 scoring function, only OLS was conducted, and the simple Gasteiger method was adopted. It is therefore of considerable interest to see whether more rigorous charge models could improve the statistical performance of the AutoDock4 scoring function. In this study, we have employed two well-established quantum chemical approaches, namely the restrained electrostatic potential (RESP) and the Austin Model 1 bond charge correction (AM1-BCC) methods, to obtain atomic partial charges, and we have compared how different charge models affect the performance of AutoDock4 scoring functions. In combination with robust regression analysis and outlier exclusion, our new protein-ligand free energy regression model, with AM1-BCC charges for ligands and Amber99SB charges for proteins, achieves the lowest root-mean-squared error of 1.637 kcal/mol for the training set of 147 complexes and 2.176 kcal/mol for the external test set of 1427 complexes. The assessment of binding pose prediction with the 100 external decoy sets indicates a very high success rate of 87% with the criterion of a predicted root-mean-squared deviation of less than 2 Å. The success rates and statistical performance of our robust scoring functions are only weakly class-dependent (hydrophobic, hydrophilic, or mixed).
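
    A generic illustration of the robust-regression-with-outlier-downweighting ingredient (not the AutoDock4 scoring function itself): Huber M-estimation via statsmodels RLM on simulated binding-energy-like data, with the fitted weights exposing candidate outliers. All names and values are invented.

    ```python
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(13)

    # Simulated regression of binding free energy on three interaction terms,
    # with a handful of gross outliers of the kind that dominate an OLS fit.
    n = 150
    X = rng.normal(size=(n, 3))
    dG = X @ np.array([-1.5, -0.8, 0.6]) + rng.normal(0.0, 0.5, n)
    dG[:5] += 10.0                         # contaminated affinities

    X1 = sm.add_constant(X)
    ols = sm.OLS(dG, X1).fit()
    huber = sm.RLM(dG, X1, M=sm.robust.norms.HuberT()).fit()   # M-estimation

    print("OLS coefficients:  ", ols.params[1:].round(2))
    print("Huber coefficients:", huber.params[1:].round(2))
    # Observations with the smallest fitted weights are the outlier candidates
    # for exclusion before the final regression.
    print("smallest Huber weights:", np.sort(huber.weights)[:5].round(2))
    ```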

  17. Empirical Likelihood in Nonignorable Covariate-Missing Data Problems.

    PubMed

    Xie, Yanmei; Zhang, Biao

    2017-04-20

    Missing covariate data occur often in regression analysis, a problem that arises frequently in the health and social sciences as well as in survey sampling. We study methods for the analysis of a nonignorable covariate-missing data problem in an assumed conditional mean function when some covariates are completely observed but other covariates are missing for some subjects. We adopt the semiparametric perspective of Bartlett et al. (Improving upon the efficiency of complete case analysis when covariates are MNAR. Biostatistics 2014;15:719-30) on regression analyses with nonignorable missing covariates, in which they have introduced the use of two working models, the working probability model of missingness and the working conditional score model. In this paper, we study an empirical likelihood approach to nonignorable covariate-missing data problems with the objective of effectively utilizing the two working models in the analysis of covariate-missing data. We propose a unified approach to constructing a system of unbiased estimating equations, where there are more equations than unknown parameters of interest. One useful feature of these unbiased estimating equations is that they naturally incorporate the incomplete data into the data analysis, making it possible to seek efficient estimation of the parameter of interest even when the working regression function is not specified to be the optimal regression function. We apply the general methodology of empirical likelihood to optimally combine these unbiased estimating equations. We propose three maximum empirical likelihood estimators of the underlying regression parameters and compare their efficiencies with other existing competitors. We present a simulation study to compare the finite-sample performance of various methods with respect to bias, efficiency, and robustness to model misspecification. The proposed empirical likelihood method is also illustrated by an analysis of a data set from the US National Health and Nutrition Examination Survey (NHANES).

  18. Defining surfaces for skewed, highly variable data

    USGS Publications Warehouse

    Helsel, D.R.; Ryker, S.J.

    2002-01-01

    Skewness of environmental data is often caused by more than simply a handful of outliers in an otherwise normal distribution. Statistical procedures for such datasets must be sufficiently robust to deal with distributions that are strongly non-normal, containing both a large proportion of outliers and a skewed main body of data. In the field of water quality, skewness is commonly associated with large variation over short distances. Spatial analysis of such data generally requires either considerable effort at modeling or the use of robust procedures not strongly affected by skewness and local variability. Using a skewed dataset of 675 nitrate measurements in ground water, commonly used methods for defining a surface (least-squares regression and kriging) are compared to a more robust method (loess). Three choices are critical in defining a surface: (i) is the surface to be a central mean or median surface? (ii) is either a well-fitting transformation or a robust and scale-independent measure of center used? (iii) does local spatial autocorrelation assist in or detract from addressing objectives? Published in 2002 by John Wiley & Sons, Ltd.
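
    The contrast between a global least-squares fit and a robust local fit is easy to demonstrate. Here is a one-dimensional analogue on synthetic, skewed data (not the nitrate dataset), using statsmodels' lowess, whose robustifying iterations downweight outliers:

    ```python
    # Hedged sketch: least squares vs. a robust local (lowess) fit on
    # skewed, outlier-prone data; all data are simulated.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    x = np.sort(rng.uniform(0, 10, 300))
    y = np.log1p(x) + rng.lognormal(mean=-1.5, sigma=1.0, size=300)  # skewed noise

    # Global least squares: the mean fit is dragged up by the skewed tail.
    ols = sm.OLS(y, sm.add_constant(x)).fit()

    # Robust local regression: it=3 robustifying reweighting iterations.
    smooth = sm.nonparametric.lowess(y, x, frac=0.3, it=3)

    print("OLS slope:", ols.params[1])
    print("first lowess fitted values:", smooth[:3, 1])
    ```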

  19. Multivariate Linear Regression and CART Regression Analysis of TBM Performance at Abu Hamour Phase-I Tunnel

    NASA Astrophysics Data System (ADS)

    Jakubowski, J.; Stypulkowski, J. B.; Bernardeau, F. G.

    2017-12-01

    The first phase of the Abu Hamour drainage and storm tunnel was completed in early 2017. The 9.5 km long, 3.7 m diameter tunnel was excavated with two Earth Pressure Balance (EPB) Tunnel Boring Machines from Herrenknecht. TBM operation processes were monitored and recorded by a Data Acquisition and Evaluation System. The authors coupled the collected TBM drive data with available information on rock mass properties; the data were cleansed, augmented with secondary variables, and aggregated by week and by shift. Correlations and descriptive statistics charts were examined. Multivariate Linear Regression and CART regression tree models linking TBM penetration rate (PR), penetration per revolution (PPR) and field penetration index (FPI) with TBM operational and geotechnical characteristics were built for the conditions of the weak/soft rock of Doha. Both regression methods are interpretable, and the data were screened with different computational approaches, allowing enriched insight. The primary goals of the analysis were to investigate empirical relations between multiple explanatory and response variables, to search for the best subsets of explanatory variables, and to evaluate the strength of linear and non-linear relations. For each of the penetration indices, a predictive model coupling both regression methods was built and validated. The resultant models appeared to be stronger than the constituent ones and indicated an opportunity for more accurate and robust TBM performance predictions.
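
    The two model families used in the study are standard and easy to sketch. The following stand-in (hypothetical features, not the Abu Hamour data) cross-validates a multivariate linear regression against a CART regression tree for a penetration-rate-like response:

    ```python
    # Hedged sketch: MLR vs. CART regression tree on simulated stand-in data.
    # Feature names are illustrative only (e.g. thrust, RPM, torque, strength).
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(3)
    n = 500
    X = rng.normal(size=(n, 4))
    pr = (2.0 + 0.8 * X[:, 0] - 0.5 * X[:, 3]          # linear effects
          + 0.3 * np.maximum(X[:, 1], 0)               # a non-linear effect
          + rng.normal(scale=0.4, size=n))

    for name, model in [("MLR", LinearRegression()),
                        ("CART", DecisionTreeRegressor(max_depth=4,
                                                       min_samples_leaf=20))]:
        r2 = cross_val_score(model, X, pr, cv=5, scoring="r2")
        print(f"{name}: cross-validated R2 = {r2.mean():.2f} +/- {r2.std():.2f}")
    ```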

  20. A New SEYHAN's Approach in Case of Heterogeneity of Regression Slopes in ANCOVA.

    PubMed

    Ankarali, Handan; Cangur, Sengul; Ankarali, Seyit

    2018-06-01

    In this study, a new approach named SEYHAN is suggested for cases where the linearity and homogeneity-of-regression-slopes assumptions of conventional ANCOVA are not met, so that conventional ANCOVA can still be used instead of robust or nonlinear ANCOVA. The proposed SEYHAN's approach involves transforming the continuous covariate into a categorical structure when the relationship between the covariate and the dependent variable is nonlinear and the regression slopes are not homogeneous. A simulated data set was used to illustrate SEYHAN's approach. In this approach, we performed conventional ANCOVA in each subgroup constituted according to knot values, and analysis of variance with a two-factor model after the MARS method was used for categorization of the covariate. The first model is simpler than the second model, which includes an interaction term. Since the model with the interaction effect includes more subjects, the power of the test also increases and any existing significant difference is revealed more clearly. With the help of this approach, violations of linearity and homogeneity of regression slopes are no longer a problem for data analysis with the conventional linear ANCOVA model. It can be used quickly and efficiently in the presence of one or more covariates.
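
    A rough sketch of the idea on simulated data follows; here the knots are fixed by hand rather than estimated with MARS, and all names are hypothetical:

    ```python
    # Hedged sketch of a SEYHAN-style workflow: categorize a nonlinearly
    # related covariate at knot values, then fit a two-factor ANOVA with the
    # treatment-by-category interaction. Knots are hand-picked here, not MARS.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    rng = np.random.default_rng(4)
    n = 300
    group = rng.integers(0, 2, n)               # treatment factor
    cov = rng.uniform(0, 10, n)                 # continuous covariate
    y = np.sin(cov) + 0.5 * group * (cov > 5) + rng.normal(scale=0.3, size=n)

    df = pd.DataFrame({"y": y, "group": group})
    df["bin"] = pd.cut(cov, bins=[0, 3, 5, 7, 10]).astype(str)  # knots at 3, 5, 7

    model = smf.ols("y ~ C(group) * C(bin)", df).fit()  # two-factor model
    print(anova_lm(model, typ=2))
    ```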

  1. Design, innovation, and rural creative places: Are the arts the cherry on top, or the secret sauce?

    PubMed

    Wojan, Timothy R; Nichols, Bonnie

    2018-01-01

    Creative class theory explains the positive relationship between the arts and commercial innovation as the mutual attraction of artists and other creative workers by an unobserved creative milieu. This study explores alternative theories for rural settings, by analyzing establishment-level survey data combined with data on the local arts scene. The study identifies the local contextual factors associated with a strong design orientation, and estimates the impact that a strong design orientation has on the local economy. Data on innovation and design come from a nationally representative sample of establishments in tradable industries. Latent class analysis allows identifying unobserved subpopulations comprised of establishments with different design and innovation orientations. Logistic regression allows estimating the association between an establishment's design orientation and local contextual factors. A quantile instrumental variable regression allows assessing the robustness of the logistic regression results with respect to endogeneity. An estimate of design orientation at the local level derived from the survey is used to examine variation in economic performance during the period of recovery from the Great Recession (2010-2014). Three distinct innovation (substantive, nominal, and non-innovators) and design orientations (design-integrated, "design last finish," and no systematic approach to design) are identified. Innovation- and design-intensive establishments were identified in both rural and urban areas. Rural design-integrated establishments tended to locate in counties with more highly educated workforces and containing at least one performing arts organization. A quantile instrumental variable regression confirmed that the logistic regression result is robust to endogeneity concerns. Finally, rural areas characterized by design-integrated establishments experienced faster growth in wages relative to rural areas characterized by establishments using no systematic approach to design.

  2. Support vector machine regression (SVR/LS-SVM)--an alternative to neural networks (ANN) for analytical chemistry? Comparison of nonlinear methods on near infrared (NIR) spectroscopy data.

    PubMed

    Balabin, Roman M; Lomakina, Ekaterina I

    2011-04-21

    In this study, we make a general comparison of the accuracy and robustness of five multivariate calibration models: partial least squares (PLS) regression or projection to latent structures, polynomial partial least squares (Poly-PLS) regression, artificial neural networks (ANNs), and two novel techniques based on support vector machines (SVMs) for multivariate data analysis: support vector regression (SVR) and least-squares support vector machines (LS-SVMs). The comparison is based on fourteen (14) different datasets: seven sets of gasoline data (density, benzene content, and fractional composition/boiling points), two sets of ethanol gasoline fuel data (density and ethanol content), one set of diesel fuel data (total sulfur content), three sets of petroleum (crude oil) macromolecules data (weight percentages of asphaltenes, resins, and paraffins), and one set of petroleum resins data (resins content). Vibrational (near-infrared, NIR) spectroscopic data are used to predict the properties and quality coefficients of gasoline, biofuel/biodiesel, diesel fuel, and other samples of interest. The four systems presented here range greatly in composition, properties, strength of intermolecular interactions (e.g., van der Waals forces, H-bonds), colloid structure, and phase behavior. Due to the high diversity of chemical systems studied, general conclusions about SVM regression methods can be made. We try to answer the following question: to what extent can SVM-based techniques replace ANN-based approaches in real-world (industrial/scientific) applications? The results show that both SVR and LS-SVM methods are comparable to ANNs in accuracy. Due to the much higher robustness of the former, the SVM-based approaches are recommended for practical (industrial) application. This has been shown to be especially true for complicated, highly nonlinear objects.
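
    The basic comparison is simple to reproduce. A minimal stand-in on synthetic "spectra" (the study used real NIR data) with scikit-learn:

    ```python
    # Hedged sketch: SVR vs. a small neural network on synthetic spectra-like
    # inputs; hyperparameters are illustrative, not tuned as in the study.
    import numpy as np
    from sklearn.svm import SVR
    from sklearn.neural_network import MLPRegressor
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(5)
    X = rng.normal(size=(400, 50))                    # hypothetical absorbances
    y = X[:, :5].sum(axis=1) ** 2 + rng.normal(scale=0.5, size=400)  # nonlinear

    svr = make_pipeline(StandardScaler(), SVR(C=10.0, epsilon=0.1))
    ann = make_pipeline(StandardScaler(),
                        MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000,
                                     random_state=0))

    for name, model in [("SVR", svr), ("ANN", ann)]:
        r2 = cross_val_score(model, X, y, cv=5, scoring="r2")
        print(f"{name}: mean R2 = {r2.mean():.2f}")
    ```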

  3. Design, innovation, and rural creative places: Are the arts the cherry on top, or the secret sauce?

    PubMed Central

    Nichols, Bonnie

    2018-01-01

    Objective: Creative class theory explains the positive relationship between the arts and commercial innovation as the mutual attraction of artists and other creative workers by an unobserved creative milieu. This study explores alternative theories for rural settings, by analyzing establishment-level survey data combined with data on the local arts scene. The study identifies the local contextual factors associated with a strong design orientation, and estimates the impact that a strong design orientation has on the local economy. Method: Data on innovation and design come from a nationally representative sample of establishments in tradable industries. Latent class analysis allows identifying unobserved subpopulations comprised of establishments with different design and innovation orientations. Logistic regression allows estimating the association between an establishment’s design orientation and local contextual factors. A quantile instrumental variable regression allows assessing the robustness of the logistic regression results with respect to endogeneity. An estimate of design orientation at the local level derived from the survey is used to examine variation in economic performance during the period of recovery from the Great Recession (2010–2014). Results: Three distinct innovation (substantive, nominal, and non-innovators) and design orientations (design-integrated, “design last finish,” and no systematic approach to design) are identified. Innovation- and design-intensive establishments were identified in both rural and urban areas. Rural design-integrated establishments tended to locate in counties with more highly educated workforces and containing at least one performing arts organization. A quantile instrumental variable regression confirmed that the logistic regression result is robust to endogeneity concerns. Finally, rural areas characterized by design-integrated establishments experienced faster growth in wages relative to rural areas characterized by establishments using no systematic approach to design. PMID:29489884

  4. Robust inference in the negative binomial regression model with an application to falls data.

    PubMed

    Aeberhard, William H; Cantoni, Eva; Heritier, Stephane

    2014-12-01

    A popular way to model overdispersed count data, such as the number of falls reported during intervention studies, is by means of the negative binomial (NB) distribution. Classical estimating methods are well-known to be sensitive to model misspecifications, which take the form of patients falling much more than expected in such intervention studies where the NB regression model is used. We extend in this article two approaches for building robust M-estimators of the regression parameters in the class of generalized linear models to the NB distribution. The first approach achieves robustness in the response by applying a bounded function on the Pearson residuals arising in the maximum likelihood estimating equations, while the second approach achieves robustness by bounding the unscaled deviance components. For both approaches, we explore different choices for the bounding functions. Through a unified notation, we show how close these approaches may actually be as long as the bounding functions are chosen and tuned appropriately, and provide the asymptotic distributions of the resulting estimators. Moreover, we introduce a robust weighted maximum likelihood estimator for the overdispersion parameter, specific to the NB distribution. Simulations under various settings show that redescending bounding functions yield estimates with smaller biases under contamination while keeping high efficiency at the assumed model, for both approaches. We present an application to a recent randomized controlled trial measuring the effectiveness of an exercise program at reducing the number of falls among people suffering from Parkinson's disease to illustrate the diagnostic use of such robust procedures and their need for reliable inference. © 2014, The International Biometric Society.
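
    To make the first approach concrete, here is a heavily simplified sketch: an iteratively reweighted fit of NB regression coefficients in which a Huber psi function bounds the influence of large Pearson residuals. It omits the Fisher-consistency correction term and the robust overdispersion estimator discussed in the abstract, and all data are simulated:

    ```python
    # Hedged, simplified sketch of bounded-Pearson-residual M-estimation for
    # NB regression (log link). Not the authors' estimator: the consistency
    # correction and robust overdispersion estimation are omitted.
    import numpy as np

    def huber_weight(r, c=1.345):
        # w(r) = psi_c(r) / r for the Huber psi function
        a = np.maximum(np.abs(r), 1e-12)
        return np.where(a <= c, 1.0, c / a)

    def robust_nb(X, y, alpha=0.5, n_iter=100):
        beta = np.zeros(X.shape[1])
        for _ in range(n_iter):
            mu = np.exp(X @ beta)
            var = mu + alpha * mu ** 2            # NB2 variance function
            r = (y - mu) / np.sqrt(var)           # Pearson residuals
            w = huber_weight(r)                   # downweight large residuals
            U = X.T @ (w * (y - mu) * mu / var)   # bounded quasi-score
            H = X.T @ ((w * mu ** 2 / var)[:, None] * X)
            beta = beta + np.linalg.solve(H, U)   # Fisher-scoring step
        return beta

    rng = np.random.default_rng(6)
    n = 1000
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    mu = np.exp(0.5 + 0.8 * X[:, 1])
    y = rng.negative_binomial(2.0, 2.0 / (2.0 + mu)).astype(float)  # alpha = 0.5
    y[:20] += 40                                  # patients falling far more
    print("robust beta:", robust_nb(X, y))        # truth is (0.5, 0.8)
    ```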

  5. Evaluation of the Bitterness of Traditional Chinese Medicines using an E-Tongue Coupled with a Robust Partial Least Squares Regression Method

    PubMed Central

    Lin, Zhaozhou; Zhang, Qiao; Liu, Ruixin; Gao, Xiaojie; Zhang, Lu; Kang, Bingya; Shi, Junhan; Wu, Zidan; Gui, Xinjing; Li, Xuelin

    2016-01-01

    To accurately, safely, and efficiently evaluate the bitterness of Traditional Chinese Medicines (TCMs), a robust predictor was developed using a robust partial least squares (RPLS) regression method based on data obtained from an electronic tongue (e-tongue) system. The data quality was verified by Grubbs' test. Moreover, potential outliers were detected based on both the standardized residual and score distance calculated for each sample. The performance of RPLS on the dataset before and after outlier detection was compared to other state-of-the-art methods including multivariate linear regression, least squares support vector machine, and plain partial least squares regression. Both R2 and the root-mean-square error (RMSE) of cross-validation (CV) were recorded for each model. With four latent variables, a robust RMSECV value of 0.3916, with bitterness values ranging from 0.63 to 4.78, was obtained for the RPLS model that was constructed based on the dataset including outliers. Meanwhile, the RMSECV, which was calculated using the models constructed by other methods, was larger than that of the RPLS model. After six outliers were excluded, the performance of all benchmark methods markedly improved, but the difference between the RPLS models constructed before and after outlier exclusion was negligible. In conclusion, the bitterness of TCM decoctions can be accurately evaluated with the RPLS model constructed using e-tongue data. PMID:26821026
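
    The workflow (fit, flag outliers by standardized residual and score distance, refit) can be sketched with an ordinary PLS model standing in for the authors' RPLS implementation; the data and thresholds below are hypothetical:

    ```python
    # Hedged sketch of the outlier-screening workflow around a PLS model.
    # This uses plain PLS from scikit-learn, not the paper's RPLS estimator.
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    rng = np.random.default_rng(7)
    X = rng.normal(size=(60, 7))                  # hypothetical sensor channels
    y = X @ rng.normal(size=7) + rng.normal(scale=0.2, size=60)
    y[:3] += 5.0                                  # outlying bitterness scores

    pls = PLSRegression(n_components=4).fit(X, y)
    resid = y - pls.predict(X).ravel()
    std_resid = resid / resid.std()               # standardized residuals

    scores = pls.x_scores_
    score_dist = np.sqrt(((scores / scores.std(axis=0)) ** 2).sum(axis=1))

    keep = (np.abs(std_resid) < 2.5) & (score_dist < np.quantile(score_dist, 0.975))
    pls_clean = PLSRegression(n_components=4).fit(X[keep], y[keep])
    print("flagged as potential outliers:", np.where(~keep)[0])
    ```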

  6. MicroCT angiography detects vascular formation and regression in skin wound healing

    PubMed Central

    Urao, Norifumi; Okonkwo, Uzoagu A.; Fang, Milie M.; Zhuang, Zhen W.; Koh, Timothy J.; DiPietro, Luisa A.

    2016-01-01

    Properly regulated angiogenesis and arteriogenesis are essential for effective wound healing. Tissue injury induces robust new vessel formation and subsequent vessel maturation, which involves vessel regression and remodeling. Although formation of functional vasculature is essential for healing, alterations in vascular structure over the time course of skin wound healing are not well understood. Here, using high-resolution ex vivo X-ray micro-computed tomography (microCT), we describe the vascular network during healing of skin excisional wounds with highly detailed three-dimensional (3D) reconstructed images and associated quantitative analysis. We found that relative vessel volume, surface area and branching number are significantly decreased in wounds from day 7 to days 14 and 21. Segmentation and skeletonization analysis of selected branches from high-resolution images as small as 2.5 μm voxel size show that branching orders are decreased in the wound vessels during healing. In histological analysis, we found that the contrast agent fills mainly arterioles, but not small capillaries or large veins. In summary, high-resolution microCT revealed dynamic alterations of vessel structures during wound healing. This technique may be useful as a key tool in the study of the formation and regression of wound vessels. PMID:27009591

  7. Guidance for the utility of linear models in meta-analysis of genetic association studies of binary phenotypes.

    PubMed

    Cook, James P; Mahajan, Anubha; Morris, Andrew P

    2017-02-01

    Linear mixed models are increasingly used for the analysis of genome-wide association studies (GWAS) of binary phenotypes because they can efficiently and robustly account for population stratification and relatedness through inclusion of random effects for a genetic relationship matrix. However, the utility of linear (mixed) models in the context of meta-analysis of GWAS of binary phenotypes has not been previously explored. In this investigation, we present simulations to compare the performance of linear and logistic regression models under alternative weighting schemes in a fixed-effects meta-analysis framework, considering designs that incorporate variable case-control imbalance, confounding factors and population stratification. Our results demonstrate that linear models can be used for meta-analysis of GWAS of binary phenotypes, without loss of power, even in the presence of extreme case-control imbalance, provided that one of the following schemes is used: (i) effective sample size weighting of Z-scores or (ii) inverse-variance weighting of allelic effect sizes after conversion onto the log-odds scale. Our conclusions thus provide essential recommendations for the development of robust protocols for meta-analysis of binary phenotypes with linear models.
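
    Scheme (i) amounts to a short formula. A worked toy example follows (the numbers are illustrative, not from the paper), with the effective sample size correcting for case-control imbalance:

    ```python
    # Hedged worked example: effective-sample-size-weighted meta-analysis of
    # per-study Z-scores, as in scheme (i). All numbers are made up.
    import numpy as np

    z = np.array([2.1, 1.4, 2.8])            # per-study association Z-scores
    cases = np.array([900, 400, 2500])
    controls = np.array([1100, 3600, 2500])

    n_eff = 4.0 / (1.0 / cases + 1.0 / controls)  # effective sample size
    w = np.sqrt(n_eff)                            # study weights

    z_meta = (w * z).sum() / np.sqrt((w ** 2).sum())
    print("meta-analysis Z:", z_meta)
    ```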

  8. The relationship between air pollution, fossil fuel energy consumption, and water resources in the panel of selected Asia-Pacific countries.

    PubMed

    Rafindadi, Abdulkadir Abdulrashid; Yusof, Zarinah; Zaman, Khalid; Kyophilavong, Phouphet; Akhmat, Ghulam

    2014-10-01

    The objective of the study is to examine the relationship between air pollution, fossil fuel energy consumption, water resources, and natural resource rents in the panel of selected Asia-Pacific countries, over the period 1975-2012. The study includes a number of variables in the model for robust analysis. The results of cross-sectional analysis show that there is a significant relationship between air pollution, energy consumption, and water productivity in the individual countries of Asia-Pacific. However, the results of each country vary according to the time-invariant shocks. For this purpose, the study employed the panel least squares technique, which includes panel least squares regression, panel fixed-effects regression, and panel two-stage least squares regression. In general, all the panel tests indicate that there is a significant and positive relationship between air pollution, energy consumption, and water resources in the region. The fossil fuel energy consumption has a major dominating impact on the changes in the air pollution in the region.

  9. High-level language ability in healthy individuals and its relationship with verbal working memory.

    PubMed

    Antonsson, Malin; Longoni, Francesca; Einald, Christina; Hallberg, Lina; Kurt, Gabriella; Larsson, Kajsa; Nilsson, Tina; Hartelius, Lena

    2016-01-01

    The aims of the study were to investigate healthy subjects' performance on a clinical test of high-level language (HLL) and how it is related to demographic characteristics and verbal working memory (VWM). One hundred healthy subjects (20-79 years old) were assessed with the Swedish BeSS test (Laakso, Brunnegård, Hartelius, & Ahlsén, 2000) and two digit span tasks. Relationships between the demographic variables, VWM and BeSS were investigated both with bivariate correlations and multiple regression analysis. The results provide norms for BeSS. The correlations and multiple regression analysis show that demographic variables had limited influence on test performance. Measures of VWM were moderately related to the total BeSS score and weakly to moderately correlated with five of the seven subtests. To conclude, education had an influence on the test as a whole, but measures of VWM stood out as the most robust predictor of HLL.

  10. Vector autoregressive models: A Gini approach

    NASA Astrophysics Data System (ADS)

    Mussard, Stéphane; Ndiaye, Oumar Hamady

    2018-02-01

    In this paper, it is proven that the usual VAR models may be performed in the Gini sense, that is, on an ℓ1 metric space. The Gini regression is robust to outliers. As a consequence, when data are contaminated by extreme values, we show that semi-parametric VAR-Gini regressions may be used to obtain robust estimators. Inference about the estimators is made with the ℓ1 norm. Also, impulse response functions and Gini decompositions for forecast errors are introduced. Finally, Granger causality tests are properly derived based on U-statistics.

  11. Integrated Low-Rank-Based Discriminative Feature Learning for Recognition.

    PubMed

    Zhou, Pan; Lin, Zhouchen; Zhang, Chao

    2016-05-01

    Feature learning plays a central role in pattern recognition. In recent years, many representation-based feature learning methods have been proposed and have achieved great success in many applications. However, these methods perform feature learning and subsequent classification in two separate steps, which may not be optimal for recognition tasks. In this paper, we present a supervised low-rank-based approach for learning discriminative features. By integrating latent low-rank representation (LatLRR) with a ridge regression-based classifier, our approach combines feature learning with classification, so that the regulated classification error is minimized. In this way, the extracted features are more discriminative for the recognition tasks. Our approach benefits from a recent discovery on the closed-form solutions to noiseless LatLRR. When there is noise, a robust Principal Component Analysis (PCA)-based denoising step can be added as preprocessing. When the scale of a problem is large, we utilize a fast randomized algorithm to speed up the computation of robust PCA. Extensive experimental results demonstrate the effectiveness and robustness of our method.

  12. Structured Kernel Subspace Learning for Autonomous Robot Navigation.

    PubMed

    Kim, Eunwoo; Choi, Sungjoon; Oh, Songhwai

    2018-02-14

    This paper considers two important problems for autonomous robot navigation in a dynamic environment, where the goal is to predict pedestrian motion and control a robot with the prediction for safe navigation. While there are several methods for predicting the motion of a pedestrian and controlling a robot to avoid incoming pedestrians, it is still difficult to safely navigate in a dynamic environment due to challenges, such as the varying quality and complexity of training data with unwanted noises. This paper addresses these challenges simultaneously by proposing a robust kernel subspace learning algorithm based on the recent advances in nuclear-norm and ℓ1-norm minimization. We model the motion of a pedestrian and the robot controller using Gaussian processes. The proposed method efficiently approximates a kernel matrix used in Gaussian process regression by learning a low-rank structured matrix (with symmetric positive semi-definiteness) to find an orthogonal basis, which eliminates the effects of erroneous and inconsistent data. Based on structured kernel subspace learning, we propose a robust motion model and motion controller for safe navigation in dynamic environments. We evaluate the proposed robust kernel learning in various tasks, including regression, motion prediction, and motion control problems, and demonstrate that the proposed learning-based systems are robust against outliers and outperform existing regression and navigation methods.

  13. Determination of boiling point of petrochemicals by gas chromatography-mass spectrometry and multivariate regression analysis of structural activity relationship.

    PubMed

    Fakayode, Sayo O; Mitchell, Breanna S; Pollard, David A

    2014-08-01

    Accurate understanding of analyte boiling points (BP) is of critical importance in gas chromatographic (GC) separation and crude oil refinery operation in petrochemical industries. This study reported the first combined use of GC separation and partial-least-squares (PLS1) multivariate regression analysis (MRA) of petrochemical structure-activity relationships (SAR) for accurate BP determination of two commercially available (D3710 and MA VHP) calibration gas mix samples. The results of the BP determination using PLS1 multivariate regression were further compared with the results of the traditional simulated distillation method of BP determination. The developed PLS1 regression was able to correctly predict analyte BPs in the D3710 and MA VHP calibration gas mix samples, with root-mean-square-percent-relative-errors (RMS%RE) of 6.4% and 10.8%, respectively. In contrast, the overall RMS%RE values of 32.9% and 40.4% obtained for BP determination in D3710 and MA VHP using the traditional simulated distillation method were approximately four times larger than the corresponding RMS%RE of BP prediction using MRA, demonstrating the better predictive ability of MRA. The reported method is rapid, robust, and promising, and can potentially be used routinely for fast analysis, pattern recognition, and analyte BP determination in petrochemical industries. Copyright © 2014 Elsevier B.V. All rights reserved.

  14. Bounded-Influence Inference in Regression.

    DTIC Science & Technology

    1984-02-01

    …be viewed as a generalization of the classical F-test. By means of the influence function, their robustness properties are investigated, and optimally robust tests that maximize the asymptotic power within each class, under the side condition of a bounded influence function, are constructed. Finally, an…

  15. Doubly Robust Additive Hazards Models to Estimate Effects of a Continuous Exposure on Survival.

    PubMed

    Wang, Yan; Lee, Mihye; Liu, Pengfei; Shi, Liuhua; Yu, Zhi; Abu Awad, Yara; Zanobetti, Antonella; Schwartz, Joel D

    2017-11-01

    The effect of an exposure on survival can be biased when the regression model is misspecified. Hazard difference is easier to use in risk assessment than hazard ratio and has a clearer interpretation in the assessment of effect modifications. We proposed two doubly robust additive hazards models to estimate the causal hazard difference of a continuous exposure on survival. The first model is an inverse probability-weighted additive hazards regression. The second model is an extension of the doubly robust estimator for binary exposures by categorizing the continuous exposure. We compared these with the marginal structural model and outcome regression with correct and incorrect model specifications using simulations. We applied doubly robust additive hazard models to the estimation of hazard difference of long-term exposure to PM2.5 (particulate matter with an aerodynamic diameter less than or equal to 2.5 microns) on survival using a large cohort of 13 million older adults residing in seven states of the Southeastern United States. We showed that the proposed approaches are doubly robust. We found that each 1-μg/m³ increase in annual PM2.5 exposure was associated with a causal hazard difference in mortality of 8.0 × 10⁻⁵ (95% confidence interval 7.4 × 10⁻⁵, 8.7 × 10⁻⁵), which was modified by age, medical history, socioeconomic status, and urbanicity. The overall hazard difference translates to approximately 5.5 (5.1, 6.0) thousand deaths per year in the study population. The proposed approaches improve the robustness of the additive hazards model and produce a novel additive causal estimate of PM2.5 on survival and several additive effect modifications, including social inequality.

  16. Random Bits Forest: a Strong Classifier/Regressor for Big Data

    NASA Astrophysics Data System (ADS)

    Wang, Yi; Li, Yi; Pu, Weilin; Wen, Kathryn; Shugart, Yin Yao; Xiong, Momiao; Jin, Li

    2016-07-01

    Efficiency, memory consumption, and robustness are common problems with many popular methods for data analysis. As a solution, we present Random Bits Forest (RBF), a classification and regression algorithm that integrates neural networks (for depth), boosting (for width), and random forests (for prediction accuracy). Through a gradient boosting scheme, it first generates and selects ~10,000 small, 3-layer random neural networks. These networks are then fed into a modified random forest algorithm to obtain predictions. Testing with datasets from the UCI (University of California, Irvine) Machine Learning Repository shows that RBF outperforms other popular methods in both accuracy and robustness, especially with large datasets (N > 1000). The algorithm also performed well in testing with an independent data set, a real psoriasis genome-wide association study (GWAS).

  17. A probabilistic approach to aircraft design emphasizing stability and control uncertainties

    NASA Astrophysics Data System (ADS)

    Delaurentis, Daniel Andrew

    In order to address identified deficiencies in current approaches to aerospace systems design, a new method has been developed. This new method for design is based on the premise that design is a decision-making activity, and that deterministic analysis and synthesis can lead to poor or misguided decision making. This is due to a lack of disciplinary knowledge of sufficient fidelity about the product, to the presence of uncertainty at multiple levels of the aircraft design hierarchy, and to a failure to focus on overall affordability metrics as measures of goodness. Design solutions are desired which are robust to uncertainty and are based on the maximum knowledge possible. The new method represents advances in the two following general areas. 1. Design models and uncertainty. The research performed completes a transition from a deterministic design representation to a probabilistic one through a modeling of design uncertainty at multiple levels of the aircraft design hierarchy, including: (1) Consistent, traceable uncertainty classification and representation; (2) Concise mathematical statement of the Probabilistic Robust Design problem; (3) Variants of the Cumulative Distribution Functions (CDFs) as decision functions for Robust Design; (4) Probabilistic Sensitivities which identify the most influential sources of variability. 2. Multidisciplinary analysis and design. Imbedded in the probabilistic methodology is a new approach for multidisciplinary design analysis and optimization (MDA/O), employing disciplinary analysis approximations formed through statistical experimentation and regression. These approximation models are a function of design variables common to the system level as well as other disciplines. For aircraft, it is proposed that synthesis/sizing is the proper avenue for integrating multiple disciplines. Research hypotheses are translated into a structured method, which is subsequently tested for validity. Specifically, the implementation involves the study of the relaxed static stability technology for a supersonic commercial transport aircraft. The probabilistic robust design method is exercised, resulting in a series of robust design solutions based on different interpretations of "robustness". Insightful results are obtained, and the ability of the method to expose trends in the design space is noted as a key advantage.

  18. Skeletal Correlates for Body Mass Estimation in Modern and Fossil Flying Birds

    PubMed Central

    Field, Daniel J.; Lynner, Colton; Brown, Christian; Darroch, Simon A. F.

    2013-01-01

    Scaling relationships between skeletal dimensions and body mass in extant birds are often used to estimate body mass in fossil crown-group birds, as well as in stem-group avialans. However, useful statistical measurements for constraining the precision and accuracy of fossil mass estimates are rarely provided, which prevents the quantification of robust upper and lower bound body mass estimates for fossils. Here, we generate thirteen body mass correlations and associated measures of statistical robustness using a sample of 863 extant flying birds. By providing robust body mass regressions with upper- and lower-bound prediction intervals for individual skeletal elements, we address the longstanding problem of body mass estimation for highly fragmentary fossil birds. We demonstrate that the most precise proxy for estimating body mass in the overall dataset, measured both as the coefficient of determination of an ordinary least squares regression and as percent prediction error, is the maximum diameter of the coracoid’s humeral articulation facet (the glenoid). We further demonstrate that this result is consistent among the majority of investigated avian orders (10 out of 18). As a result, we suggest that, in the majority of cases, this proxy may provide the most accurate estimates of body mass for volant fossil birds. Additionally, by presenting statistical measurements of body mass prediction error for thirteen different body mass regressions, this study provides a much-needed quantitative framework for the accurate estimation of body mass and associated ecological correlates in fossil birds. The application of these regressions will enhance the precision and robustness of many mass-based inferences in future paleornithological studies. PMID:24312392
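
    The kind of regression reported here, with explicit prediction intervals for a new specimen, is straightforward to reproduce. A sketch on simulated stand-in data follows (the glenoid measurements and allometric coefficients below are hypothetical):

    ```python
    # Hedged sketch: log-log OLS of body mass on a skeletal dimension, with a
    # prediction interval for a new fossil specimen. Data are simulated.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(8)
    glenoid = rng.uniform(3, 25, 200)             # mm, hypothetical
    mass = 10 ** (1.1 + 2.3 * np.log10(glenoid)
                  + rng.normal(scale=0.08, size=200))  # g, hypothetical

    X = sm.add_constant(np.log10(glenoid))
    fit = sm.OLS(np.log10(mass), X).fit()
    print("R2:", fit.rsquared)

    # Upper- and lower-bound mass estimates for a fossil with a 12 mm glenoid.
    pred = fit.get_prediction([1.0, np.log10(12.0)])
    lo, hi = pred.conf_int(obs=True, alpha=0.05)[0]    # 95% prediction interval
    print("mass estimate (g): %.0f [%.0f, %.0f]"
          % (10 ** pred.predicted_mean[0], 10 ** lo, 10 ** hi))
    ```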

  19. A multimedia retrieval framework based on semi-supervised ranking and relevance feedback.

    PubMed

    Yang, Yi; Nie, Feiping; Xu, Dong; Luo, Jiebo; Zhuang, Yueting; Pan, Yunhe

    2012-04-01

    We present a new framework for multimedia content analysis and retrieval which consists of two independent algorithms. First, we propose a new semi-supervised algorithm called ranking with Local Regression and Global Alignment (LRGA) to learn a robust Laplacian matrix for data ranking. In LRGA, for each data point, a local linear regression model is used to predict the ranking scores of its neighboring points. A unified objective function is then proposed to globally align the local models from all the data points so that an optimal ranking score can be assigned to each data point. Second, we propose a semi-supervised long-term Relevance Feedback (RF) algorithm to refine the multimedia data representation. The proposed long-term RF algorithm utilizes both the multimedia data distribution in the multimedia feature space and the historical RF information provided by users. A trace ratio optimization problem is then formulated and solved by an efficient algorithm. The algorithms have been applied to several content-based multimedia retrieval applications, including cross-media retrieval, image retrieval, and 3D motion/pose data retrieval. Comprehensive experiments on four data sets have demonstrated the framework's advantages in precision, robustness, scalability, and computational efficiency.

  20. [Health and political regimes: presidential or parliamentary government for Colombia?].

    PubMed

    Idrovo, Alvaro J

    2007-01-01

    Changing the presidential regime for a parliamentary one is currently being discussed in Colombia. This preliminary study explores the potential effects on health of both presidential and parliamentary regimes by using world-wide data. An ecological study was undertaken using countries from which comparable information concerning life-expectancy at birth, political regime, economic development, inequality in income, social capital (as measured by generalised trust or the Corruption Perceptions Index), political rights, civil freedom and cultural diversity could be obtained. Life-expectancy at birth and macro-determinants were compared between both political regimes. The correlations between these macro-determinants were estimated, and the relationship between political regime and life-expectancy at birth was estimated using robust regression. Crude analysis revealed that parliamentary countries have greater life-expectancy at birth than countries having a presidential regime. Significant correlations between all macro-determinants were observed. No differential effects on life-expectancy at birth were observed between the two political regimes in the multiple robust regressions. There is no evidence that presidential or parliamentary regimes provide greater levels of health for the population. It is suggested that public health policies be focused on other macro-determinants having more known effects on health, such as income inequality.
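
    A minimal sketch of a robust regression of this type, with simulated country-level stand-ins for the real macro-determinants:

    ```python
    # Hedged sketch: robust (M-estimation) regression of life expectancy on a
    # regime indicator plus macro-determinants; all data are simulated.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(9)
    n = 120
    parliamentary = rng.binomial(1, 0.5, n)       # political regime indicator
    gdp = rng.normal(size=n)                      # standardized macro-covariates
    inequality = rng.normal(size=n)
    life_exp = (70 + 2.0 * gdp - 1.5 * inequality
                + rng.standard_t(df=3, size=n))   # heavy-tailed errors

    X = sm.add_constant(np.column_stack([parliamentary, gdp, inequality]))
    robust_fit = sm.RLM(life_exp, X, M=sm.robust.norms.HuberT()).fit()
    print(robust_fit.params)  # regime coefficient, adjusted for the covariates
    ```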

  1. Testing Gene-Gene Interactions in the Case-Parents Design

    PubMed Central

    Yu, Zhaoxia

    2011-01-01

    The case-parents design has been widely used to detect genetic associations as it can prevent spurious association that could occur in population-based designs. When examining the effect of an individual genetic locus on a disease, logistic regressions developed by conditioning on parental genotypes provide complete protection from spurious association caused by population stratification. However, when testing gene-gene interactions, it is unknown whether conditional logistic regressions are still robust. Here we evaluate the robustness and efficiency of several gene-gene interaction tests that are derived from conditional logistic regressions. We found that in the presence of SNP genotype correlation due to population stratification or linkage disequilibrium, tests with incorrectly specified main-genetic-effect models can lead to inflated type I error rates. We also found that a test with fully flexible main genetic effects always maintains correct test size and its robustness can be achieved with negligible sacrifice of its power. When testing gene-gene interactions is the focus, the test allowing fully flexible main effects is recommended to be used. PMID:21778736

  2. Approximate median regression for complex survey data with skewed response.

    PubMed

    Fraser, Raphael André; Lipsitz, Stuart R; Sinha, Debajyoti; Fitzmaurice, Garrett M; Pan, Yi

    2016-12-01

    The ready availability of public-use data from various large national complex surveys has immense potential for the assessment of population characteristics using regression models. Complex surveys can be used to identify risk factors for important diseases such as cancer. Existing statistical methods based on estimating equations and/or utilizing resampling methods are often not valid with survey data due to complex survey design features, that is, stratification, multistage sampling, and weighting. In this article, we accommodate these design features in the analysis of highly skewed response variables arising from large complex surveys. Specifically, we propose a double-transform-both-sides (DTBS)-based estimating equations approach to estimate the median regression parameters of the highly skewed response; the DTBS approach applies the same Box-Cox type transformation twice to both the outcome and regression function. The usual sandwich variance estimate can be used in our approach, whereas a resampling approach would be needed for a pseudo-likelihood based on minimizing absolute deviations (MAD). Furthermore, the approach is relatively robust to the true underlying distribution, and has a much smaller mean square error than a MAD approach. The method is motivated by an analysis of laboratory data on urinary iodine (UI) concentration from the National Health and Nutrition Examination Survey. © 2016, The International Biometric Society.

  3. Non-ignorable missingness in logistic regression.

    PubMed

    Wang, Joanna J J; Bartlett, Mark; Ryan, Louise

    2017-08-30

    Nonresponse and missing data are common in observational studies. Ignoring or inadequately handling missing data may lead to biased parameter estimation, incorrect standard errors and, as a consequence, incorrect statistical inference and conclusions. We present a strategy for modelling non-ignorable missingness where the probability of nonresponse depends on the outcome. Using a simple case of logistic regression, we quantify the bias in regression estimates and show the observed likelihood is non-identifiable under a non-ignorable missing data mechanism. We then adopt a selection model factorisation of the joint distribution as the basis for a sensitivity analysis to study changes in estimated parameters and the robustness of study conclusions against different assumptions. A Bayesian framework for model estimation is used as it provides a flexible approach for incorporating different missing data assumptions and conducting sensitivity analysis. Using simulated data, we explore the performance of the Bayesian selection model in correcting for bias in a logistic regression. We then implement our strategy using survey data from the 45 and Up Study to investigate factors associated with worsening health from the baseline to follow-up survey. Our findings have practical implications for the use of the 45 and Up Study data to answer important research questions relating to health and quality-of-life. Copyright © 2017 John Wiley & Sons, Ltd.

  4. Approximate Median Regression for Complex Survey Data with Skewed Response

    PubMed Central

    Fraser, Raphael André; Lipsitz, Stuart R.; Sinha, Debajyoti; Fitzmaurice, Garrett M.; Pan, Yi

    2016-01-01

    Summary: The ready availability of public-use data from various large national complex surveys has immense potential for the assessment of population characteristics using regression models. Complex surveys can be used to identify risk factors for important diseases such as cancer. Existing statistical methods based on estimating equations and/or utilizing resampling methods are often not valid with survey data due to complex survey design features, that is, stratification, multistage sampling, and weighting. In this paper, we accommodate these design features in the analysis of highly skewed response variables arising from large complex surveys. Specifically, we propose a double-transform-both-sides (DTBS)-based estimating equations approach to estimate the median regression parameters of the highly skewed response; the DTBS approach applies the same Box-Cox type transformation twice to both the outcome and regression function. The usual sandwich variance estimate can be used in our approach, whereas a resampling approach would be needed for a pseudo-likelihood based on minimizing absolute deviations (MAD). Furthermore, the approach is relatively robust to the true underlying distribution, and has a much smaller mean square error than a MAD approach. The method is motivated by an analysis of laboratory data on urinary iodine (UI) concentration from the National Health and Nutrition Examination Survey. PMID:27062562

  5. Dose-Dependent Effects of Statins for Patients with Aneurysmal Subarachnoid Hemorrhage: Meta-Regression Analysis.

    PubMed

    To, Minh-Son; Prakash, Shivesh; Poonnoose, Santosh I; Bihari, Shailesh

    2018-05-01

    The study uses meta-regression analysis to quantify the dose-dependent effects of statin pharmacotherapy on vasospasm, delayed ischemic neurologic deficits (DIND), and mortality in aneurysmal subarachnoid hemorrhage. Prospective and retrospective observational studies and randomized controlled trials (RCTs) were retrieved by a systematic database search. Summary estimates were expressed as absolute risk (AR) for a given statin dose or control (placebo). Meta-regression using inverse variance weighting and robust variance estimation was performed to assess the effect of statin dose on transformed AR in a random effects model. Dose-dependence of the predicted AR with 95% confidence interval (CI) was recovered by using Miller's Freeman-Tukey inverse. The database search and study selection criteria yielded 18 studies (2594 patients) for analysis. These included 12 RCTs, 4 retrospective observational studies, and 2 prospective observational studies. Twelve studies investigated simvastatin, whereas the remaining studies investigated atorvastatin, pravastatin, or pitavastatin, with simvastatin-equivalent doses ranging from 20 to 80 mg. Meta-regression revealed dose-dependent reductions in the Freeman-Tukey-transformed AR of vasospasm (slope coefficient -0.00404, 95% CI -0.00720 to -0.00087; P = 0.0321), DIND (slope coefficient -0.00316, 95% CI -0.00586 to -0.00047; P = 0.0392), and mortality (slope coefficient -0.00345, 95% CI -0.00623 to -0.00067; P = 0.0352). The present meta-regression provides weak evidence for dose-dependent reductions in vasospasm, DIND, and mortality associated with acute statin use after aneurysmal subarachnoid hemorrhage. However, the analysis was limited by substantial heterogeneity among individual studies. Higher-dose strategies are a potential consideration for future RCTs. Copyright © 2018 Elsevier Inc. All rights reserved.

  6. Robust and efficient estimation with weighted composite quantile regression

    NASA Astrophysics Data System (ADS)

    Jiang, Xuejun; Li, Jingzhi; Xia, Tian; Yan, Wanfeng

    2016-09-01

    In this paper we introduce a weighted composite quantile regression (CQR) estimation approach and study its application in nonlinear models such as exponential models and ARCH-type models. The weighted CQR is augmented by using a data-driven weighting scheme. With the error distribution unspecified, the proposed estimators share robustness from quantile regression and achieve nearly the same efficiency as the oracle maximum likelihood estimator (MLE) for a variety of error distributions including the normal, mixed-normal, Student's t, Cauchy distributions, etc. We also suggest an algorithm for the fast implementation of the proposed methodology. Simulations are carried out to compare the performance of different estimators, and the proposed approach is used to analyze the daily S&P 500 Composite index, which verifies the effectiveness and efficiency of our theoretical results.
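
    For intuition, a crude approximation to the composite idea can be sketched by combining slope estimates across several quantile regressions (the paper's estimator instead solves a joint weighted CQR problem with data-driven weights):

    ```python
    # Hedged sketch: equal-weight combination of quantile-regression slopes on
    # heavy-tailed simulated data; not the paper's weighted CQR estimator.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(10)
    n = 500
    x = rng.normal(size=n)
    y = 1.0 + 2.0 * x + rng.standard_cauchy(n)    # heavy-tailed errors
    df = pd.DataFrame({"x": x, "y": y})

    taus = np.arange(0.1, 1.0, 0.1)               # composite of 9 quantiles
    slopes = [smf.quantreg("y ~ x", df).fit(q=t).params["x"] for t in taus]
    print("composite (equal-weight) slope:", np.mean(slopes))  # truth is 2.0
    ```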

  7. Quantifying the causal effects of 20mph zones on road casualties in London via doubly robust estimation.

    PubMed

    Li, Haojie; Graham, Daniel J

    2016-08-01

    This paper estimates the causal effect of 20mph zones on road casualties in London. Potential confounders in the key relationship of interest are included within outcome regression and propensity score models, and the models are then combined to form a doubly robust estimator. A total of 234 treated zones and 2844 potential control zones are included in the data sample. The propensity score model is used to select a viable control group which has common support in the covariate distributions. We compare the doubly robust estimates with those obtained using three other methods: inverse probability weighting, regression adjustment, and propensity score matching. The results indicate that 20mph zones have had a significant causal impact on road casualty reduction in both absolute and proportional terms. Copyright © 2016 Elsevier Ltd. All rights reserved.
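
    The estimator's structure can be sketched for a generic binary treatment. Below is a schematic AIPW (doubly robust) estimate on simulated data, not the London casualty data:

    ```python
    # Hedged sketch: augmented inverse-probability-weighted (doubly robust)
    # ATE estimate combining an outcome regression with a propensity model.
    import numpy as np
    from sklearn.linear_model import LinearRegression, LogisticRegression

    rng = np.random.default_rng(11)
    n = 3000
    X = rng.normal(size=(n, 3))                   # area-level confounders
    ps_true = 1 / (1 + np.exp(-(X[:, 0] - 0.5 * X[:, 1])))
    A = rng.binomial(1, ps_true)                  # treated (zone) indicator
    Y = 5 - 1.0 * A + X @ np.array([0.8, -0.4, 0.2]) + rng.normal(size=n)

    ps = LogisticRegression().fit(X, A).predict_proba(X)[:, 1]
    m1 = LinearRegression().fit(X[A == 1], Y[A == 1]).predict(X)
    m0 = LinearRegression().fit(X[A == 0], Y[A == 0]).predict(X)

    aipw = (m1 - m0
            + A * (Y - m1) / ps
            - (1 - A) * (Y - m0) / (1 - ps)).mean()
    print("doubly robust ATE estimate:", aipw)    # truth is -1.0
    ```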

  8. Development of a Bayesian model to estimate health care outcomes in the severely wounded

    PubMed Central

    Stojadinovic, Alexander; Eberhardt, John; Brown, Trevor S; Hawksworth, Jason S; Gage, Frederick; Tadaki, Douglas K; Forsberg, Jonathan A; Davis, Thomas A; Potter, Benjamin K; Dunne, James R; Elster, E A

    2010-01-01

    Background: Graphical probabilistic models have the ability to provide insights as to how clinical factors are conditionally related. These models can be used to help us understand factors influencing health care outcomes and resource utilization, and to estimate morbidity and clinical outcomes in trauma patient populations. Study design: Thirty-two combat casualties with severe extremity injuries enrolled in a prospective observational study were analyzed using a step-wise machine-learned Bayesian belief network (BBN) and step-wise logistic regression (LR). Models were evaluated using 10-fold cross-validation to calculate the area under the curve (AUC) of receiver operating characteristic (ROC) curves. Results: Our BBN showed important associations between various factors in our data set that could not be developed using standard regression methods. Cross-validated ROC curve analysis showed that our BBN model was a robust representation of our data domain and that LR models trained on these findings were also robust: hospital-acquired infection (AUC: LR, 0.81; BBN, 0.79), intensive care unit length of stay (AUC: LR, 0.97; BBN, 0.81), and wound healing (AUC: LR, 0.91; BBN, 0.72) showed strong AUCs. Conclusions: A BBN model can effectively represent clinical outcomes and biomarkers in patients hospitalized after severe wounding, as confirmed by 10-fold cross-validation and further confirmed through logistic regression modeling. The method warrants further development and independent validation in other, more diverse patient populations. PMID:21197361

  9. Application of Regression-Discontinuity Analysis in Pharmaceutical Health Services Research

    PubMed Central

    Zuckerman, Ilene H; Lee, Euni; Wutoh, Anthony K; Xue, Zhenyi; Stuart, Bruce

    2006-01-01

    Objective: To demonstrate how a relatively underused design, regression-discontinuity (RD), can provide robust estimates of intervention effects when stronger designs are impossible to implement. Data Sources/Study Setting: Administrative claims from a Mid-Atlantic state Medicaid program were used to evaluate the effectiveness of an educational drug utilization review intervention. Study Design: Quasi-experimental design. Data Collection/Extraction Methods: A drug utilization review study was conducted to evaluate a letter intervention to physicians treating Medicaid children with potentially excessive use of short-acting β2-agonist inhalers (SAB). The outcome measure is change in seasonally-adjusted SAB use 5 months pre- and postintervention. To determine if the intervention reduced monthly SAB utilization, results from an RD analysis are compared to findings from a pretest–posttest design using repeated-measures ANOVA. Principal Findings: Both analyses indicated that the intervention significantly reduced SAB use among the high users. Average monthly SAB use declined by 0.9 canisters per month (p<.001) according to the repeated-measures ANOVA and by 0.2 canisters per month (p<.001) from the RD analysis. Conclusions: Regression-discontinuity design is a useful quasi-experimental methodology that has significant advantages in internal validity compared to other pre–post designs when assessing interventions in which subjects' assignment is based on cutoff scores for a critical variable. PMID:16584464
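
    The core of an RD analysis fits in a few lines: a local linear fit on either side of the assignment cutoff, with the treatment coefficient estimating the jump at the cutoff. A toy sketch on simulated data follows (the study's real outcome was monthly inhaler canister counts):

    ```python
    # Hedged sketch: sharp regression discontinuity via local linear
    # regression within a bandwidth around the cutoff; simulated data.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(12)
    n = 1000
    score = rng.uniform(-2, 2, n)                 # assignment variable
    treated = (score >= 0).astype(float)          # intervention above cutoff
    y = 3 + 0.8 * score - 0.9 * treated + rng.normal(scale=0.5, size=n)

    h = 0.75                                      # bandwidth around the cutoff
    win = np.abs(score) <= h
    X = sm.add_constant(np.column_stack([treated[win], score[win],
                                         treated[win] * score[win]]))
    fit = sm.OLS(y[win], X).fit()
    print("estimated discontinuity:", fit.params[1])  # truth is -0.9
    ```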

  10. MicroCT angiography detects vascular formation and regression in skin wound healing.

    PubMed

    Urao, Norifumi; Okonkwo, Uzoagu A; Fang, Milie M; Zhuang, Zhen W; Koh, Timothy J; DiPietro, Luisa A

    2016-07-01

    Properly regulated angiogenesis and arteriogenesis are essential for effective wound healing. Tissue injury induces robust new vessel formation and subsequent vessel maturation, which involves vessel regression and remodeling. Although formation of functional vasculature is essential for healing, alterations in vascular structure over the time course of skin wound healing are not well understood. Here, using high-resolution ex vivo X-ray micro-computed tomography (microCT), we describe the vascular network during healing of skin excisional wounds with highly detailed three-dimensional (3D) reconstructed images and associated quantitative analysis. We found that relative vessel volume, surface area and branching number are significantly decreased in wounds from day 7 to days 14 and 21. Segmentation and skeletonization analysis of selected branches from high-resolution images as small as 2.5 μm voxel size show that branching orders are decreased in the wound vessels during healing. In histological analysis, we found that the contrast agent fills mainly arterioles, but not small capillaries or large veins. In summary, high-resolution microCT revealed dynamic alterations of vessel structures during wound healing. This technique may be useful as a key tool in the study of the formation and regression of wound vessels. Copyright © 2016 Elsevier Inc. All rights reserved.

  11. Parametric Method Performance for Dynamic 3'-Deoxy-3'-18F-Fluorothymidine PET/CT in Epidermal Growth Factor Receptor-Mutated Non-Small Cell Lung Carcinoma Patients Before and During Therapy.

    PubMed

    Kramer, Gerbrand Maria; Frings, Virginie; Heijtel, Dennis; Smit, E F; Hoekstra, Otto S; Boellaard, Ronald

    2017-06-01

    The objective of this study was to validate several parametric methods for quantification of 3'-deoxy-3'-18F-fluorothymidine (18F-FLT) PET in advanced-stage non-small cell lung carcinoma (NSCLC) patients with an activating epidermal growth factor receptor mutation who were treated with gefitinib or erlotinib. Furthermore, we evaluated the impact of noise on the accuracy and precision of the parametric analyses of dynamic 18F-FLT PET/CT to assess the robustness of these methods. Methods: Ten NSCLC patients underwent dynamic 18F-FLT PET/CT at baseline and 7 and 28 d after the start of treatment. Parametric images were generated using plasma-input Logan graphic analysis and 2 basis-function-based methods: a 2-tissue-compartment basis function model (BFM) and spectral analysis (SA). Whole-tumor-averaged parametric pharmacokinetic parameters were compared with those obtained by nonlinear regression of the tumor time-activity curve using a reversible 2-tissue-compartment model with blood volume fraction. In addition, 2 statistically equivalent datasets were generated by countwise splitting the original list-mode data, each containing 50% of the total counts. Both new datasets were reconstructed, and parametric pharmacokinetic parameters were compared between the 2 replicates and the original data. Results: After the settings of each parametric method were optimized, distribution volumes (VT) obtained with Logan graphic analysis, BFM, and SA all correlated well with those derived using nonlinear regression at baseline and during therapy (R2 ≥ 0.94; intraclass correlation coefficient > 0.97). SA-based VT images were most robust to increased noise at the voxel level (repeatability coefficient, 16% vs. >26%). Yet BFM generated the most accurate K1 values (R2 = 0.94; intraclass correlation coefficient, 0.96). Parametric K1 data showed larger variability in general; however, no differences in robustness were found between methods (repeatability coefficient, 80%-84%). Conclusion: Both BFM and SA can generate quantitatively accurate parametric 18F-FLT VT images in NSCLC patients before and during therapy. SA was more robust to noise, yet BFM provided more accurate parametric K1 data. We therefore recommend BFM as the preferred parametric method for analysis of dynamic 18F-FLT PET/CT studies; however, SA can also be used. © 2017 by the Society of Nuclear Medicine and Molecular Imaging.
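
    Of the methods above, plasma-input Logan graphic analysis is the simplest to illustrate: for a reversible tracer, after some time t* the plot of the integrated tissue curve over the instantaneous tissue activity against the similarly normalized integrated plasma curve becomes linear, with slope equal to VT. A toy sketch with simulated one-tissue kinetics (the rate constants are hypothetical):

    ```python
    # Hedged sketch: Logan graphical analysis on simulated 1-tissue kinetics,
    # where the true distribution volume is VT = K1/k2.
    import numpy as np

    t = np.linspace(0.1, 60, 240)                 # minutes
    Cp = 10 * np.exp(-0.15 * t) + 0.5 * np.exp(-0.01 * t)  # plasma input
    K1, k2 = 0.3, 0.1                             # hypothetical rate constants
    dt = t[1] - t[0]
    Ct = K1 * dt * np.array([np.sum(Cp[:i + 1] * np.exp(-k2 * (t[i] - t[:i + 1])))
                             for i in range(len(t))])      # tissue curve

    int_Ct = np.cumsum(Ct) * dt
    int_Cp = np.cumsum(Cp) * dt
    late = t > 20                                 # linear portion, t* = 20 min
    slope, intercept = np.polyfit(int_Cp[late] / Ct[late],
                                  int_Ct[late] / Ct[late], 1)
    print("Logan VT: %.2f (truth K1/k2 = %.2f)" % (slope, K1 / k2))
    ```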

  12. REGRESSION MODELS OF RESIDENTIAL EXPOSURE TO CHLORPYRIFOS AND DIAZINON

    EPA Science Inventory

    This study examines the ability of regression models to predict residential exposures to chlorpyrifos and diazinon, based on the information from the NHEXAS-AZ database. The robust method was used to generate "fill-in" values for samples that are below the detection l...

  13. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Neeway, James J.; Rieke, Peter C.; Parruzot, Benjamin P.

    In far-from-equilibrium conditions, the dissolution of borosilicate glasses used to immobilize nuclear waste is known to be a function of both temperature and pH. The aim of this paper is to study effects of these variables on three model waste glasses (SON68, ISG, AFCI). To do this, experiments were conducted at temperatures of 23, 40, 70, and 90 °C and pH(RT) values of 9, 10, 11, and 12 with the single-pass flow-through (SPFT) test method. The results from these tests were then used to parameterize a kinetic rate model based on transition state theory. Both the absolute dissolution rates and the rate model parameters are compared with previous results. Discrepancies in the absolute dissolution rates as compared to those obtained using other test methods are discussed. Rate model parameters for the three glasses studied here are nearly equivalent within error and in relative agreement with previous studies. The results were analyzed with a linear multivariate regression (LMR) and a nonlinear multivariate regression performed with the use of the Glass Corrosion Modeling Tool (GCMT), which is capable of providing a robust uncertainty analysis. This robust analysis highlights the high degree of correlation of various parameters in the kinetic rate model. As more data are obtained on borosilicate glasses with varying compositions, the effect of glass composition on the rate parameter values could possibly be obtained. This would allow for the possibility of predicting the forward dissolution rate of glass based solely on composition.

  14. Robust estimation approach for blind denoising.

    PubMed

    Rabie, Tamer

    2005-11-01

    This work develops a new robust statistical framework for blind image denoising. Robust statistics addresses the problem of estimation when the idealized assumptions about a system are occasionally violated. The contaminating noise in an image is considered as a violation of the assumption of spatial coherence of the image intensities and is treated as an outlier random variable. A denoised image is estimated by fitting a spatially coherent stationary image model to the available noisy data using a robust estimator-based regression method within an optimal-size adaptive window. The robust formulation aims at eliminating the noise outliers while preserving the edge structures in the restored image. Several examples demonstrating the effectiveness of this robust denoising technique are reported and a comparison with other standard denoising filters is presented.
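    As a loose one-dimensional illustration of the central idea, robust regression inside a local window (a sketch under assumed data, not the paper's adaptive-window estimator), a Huber M-estimator can fit the locally coherent signal while down-weighting outlier samples:

        import numpy as np
        from sklearn.linear_model import HuberRegressor

        def robust_denoise_1d(y, half_window=5):
            """Denoise a 1D signal by fitting a local linear model with a
            Huber M-estimator in a sliding window; outlier samples are
            down-weighted instead of dragging the local fit."""
            n = len(y)
            out = np.empty(n)
            for i in range(n):
                lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
                x = np.arange(lo, hi, dtype=float).reshape(-1, 1)
                model = HuberRegressor().fit(x, y[lo:hi])
                out[i] = model.predict([[float(i)]])[0]
            return out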

  15. A robust functional-data-analysis method for data recovery in multichannel sensor systems.

    PubMed

    Sun, Jian; Liao, Haitao; Upadhyaya, Belle R

    2014-08-01

    Multichannel sensor systems are widely used in condition monitoring for effective failure prevention of critical equipment or processes. However, loss of sensor readings due to malfunctions of sensors and/or communication has long been a hurdle to reliable operations of such integrated systems. Moreover, asynchronous data sampling and/or limited data transmission are usually seen in multiple sensor channels. To reliably perform fault diagnosis and prognosis in such operating environments, a data recovery method based on functional principal component analysis (FPCA) can be utilized. However, traditional FPCA methods are not robust to outliers and their capabilities are limited in recovering signals with strongly skewed distributions (i.e., lack of symmetry). This paper provides a robust data-recovery method based on functional data analysis to enhance the reliability of multichannel sensor systems. The method not only considers the possibly skewed distribution of each channel of signal trajectories, but is also capable of recovering missing data for both individual and correlated sensor channels with asynchronous data that may be sparse as well. In particular, grand median functions, rather than classical grand mean functions, are utilized for robust smoothing of sensor signals. Furthermore, the relationship between the functional scores of two correlated signals is modeled using multivariate functional regression to enhance the overall data-recovery capability. An experimental flow-control loop that mimics the operation of coolant-flow loop in a multimodular integral pressurized water reactor is used to demonstrate the effectiveness and adaptability of the proposed data-recovery method. The computational results illustrate that the proposed method is robust to outliers and more capable than the existing FPCA-based method in terms of the accuracy in recovering strongly skewed signals. In addition, turbofan engine data are also analyzed to verify the capability of the proposed method in recovering non-skewed signals.

  16. Predicting the aquatic toxicity mode of action using logistic regression and linear discriminant analysis.

    PubMed

    Ren, Y Y; Zhou, L C; Yang, L; Liu, P Y; Zhao, B W; Liu, H X

    2016-09-01

    The paper highlights the use of the logistic regression (LR) method in the construction of acceptable, statistically significant, robust and predictive models for the classification of chemicals according to their aquatic toxic modes of action. The essentials of a reliable model were all considered carefully. The model predictors were selected by stepwise forward linear discriminant analysis (LDA) from a combined pool of experimental data and chemical structure-based descriptors calculated by the CODESSA and DRAGON software packages. Model predictive ability was validated both internally and externally. The applicability domain was checked by the leverage approach to verify prediction reliability. The obtained models are simple and easy to interpret. In general, LR performs much better than LDA and seems to be more attractive for the prediction of the more toxic compounds, i.e. compounds that exhibit excess toxicity versus non-polar narcotic compounds and more reactive compounds versus less reactive compounds. In addition, model fit and regression diagnostics were assessed through the influence plot, which reflects the hat-values, studentized residuals, and Cook's distance of each sample. Overdispersion was also checked for the LR model. The relationships between the descriptors and the aquatic toxic behaviour of compounds are also discussed.
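    A minimal sketch of the kind of LR-versus-LDA comparison described above, run on synthetic data (the features and labels here are placeholders, not the CODESSA/DRAGON descriptors used in the paper):

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score

        # Synthetic two-class problem standing in for toxic mode-of-action labels
        X, y = make_classification(n_samples=300, n_features=8, random_state=0)

        for name, clf in [("LR", LogisticRegression(max_iter=1000)),
                          ("LDA", LinearDiscriminantAnalysis())]:
            acc = cross_val_score(clf, X, y, cv=5).mean()
            print(f"{name}: cross-validated accuracy = {acc:.3f}")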

  17. Quantification of endocrine disruptors and pesticides in water by gas chromatography-tandem mass spectrometry. Method validation using weighted linear regression schemes.

    PubMed

    Mansilha, C; Melo, A; Rebelo, H; Ferreira, I M P L V O; Pinho, O; Domingues, V; Pinho, C; Gameiro, P

    2010-10-22

    A multi-residue methodology based on a solid phase extraction followed by gas chromatography-tandem mass spectrometry was developed for trace analysis of 32 compounds in water matrices, including estrogens and several pesticides from different chemical families, some of them with endocrine disrupting properties. Matrix standard calibration solutions were prepared by adding known amounts of the analytes to a residue-free sample to compensate matrix-induced chromatographic response enhancement observed for certain pesticides. Validation was done mainly according to the International Conference on Harmonisation recommendations, as well as some European and American validation guidelines with specifications for pesticides analysis and/or GC-MS methodology. As the assumption of homoscedasticity was not met for analytical data, weighted least squares linear regression procedure was applied as a simple and effective way to counteract the greater influence of the greater concentrations on the fitted regression line, improving accuracy at the lower end of the calibration curve. The method was considered validated for 31 compounds after consistent evaluation of the key analytical parameters: specificity, linearity, limit of detection and quantification, range, precision, accuracy, extraction efficiency, stability and robustness. Copyright © 2010 Elsevier B.V. All rights reserved.
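    A small sketch of the weighting idea with hypothetical calibration data: when residual variance grows with concentration, weighting each point by 1/x^2 keeps the high-concentration standards from dominating the fit, improving accuracy at the low end of the curve:

        import numpy as np
        import statsmodels.api as sm

        # Hypothetical calibration data: concentration (x) vs. detector response (y)
        x = np.array([1.0, 5.0, 10.0, 50.0, 100.0, 500.0])
        y = np.array([2.1, 9.8, 21.0, 98.5, 203.0, 991.0])

        X = sm.add_constant(x)
        ols = sm.OLS(y, X).fit()
        # Weight by 1/x^2 so low-concentration points are not swamped by the
        # larger absolute residuals at high concentration (heteroscedasticity)
        wls = sm.WLS(y, X, weights=1.0 / x**2).fit()
        print("OLS:", ols.params, " WLS:", wls.params)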

  18. Implementing informative priors for heterogeneity in meta-analysis using meta-regression and pseudo data.

    PubMed

    Rhodes, Kirsty M; Turner, Rebecca M; White, Ian R; Jackson, Dan; Spiegelhalter, David J; Higgins, Julian P T

    2016-12-20

    Many meta-analyses combine results from only a small number of studies, a situation in which the between-study variance is imprecisely estimated when standard methods are applied. Bayesian meta-analysis allows incorporation of external evidence on heterogeneity, providing the potential for more robust inference on the effect size of interest. We present a method for performing Bayesian meta-analysis using data augmentation, in which we represent an informative conjugate prior for between-study variance by pseudo data and use meta-regression for estimation. To assist in this, we derive predictive inverse-gamma distributions for the between-study variance expected in future meta-analyses. These may serve as priors for heterogeneity in new meta-analyses. In a simulation study, we compare approximate Bayesian methods using meta-regression and pseudo data against fully Bayesian approaches based on importance sampling techniques and Markov chain Monte Carlo (MCMC). We compare the frequentist properties of these Bayesian methods with those of the commonly used frequentist DerSimonian and Laird procedure. The method is implemented in standard statistical software and provides a less complex alternative to standard MCMC approaches. An importance sampling approach produces almost identical results to standard MCMC approaches, and results obtained through meta-regression and pseudo data are very similar. On average, data augmentation provides closer results to MCMC, if implemented using restricted maximum likelihood estimation rather than DerSimonian and Laird or maximum likelihood estimation. The methods are applied to real datasets, and an extension to network meta-analysis is described. The proposed method facilitates Bayesian meta-analysis in a way that is accessible to applied researchers. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
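    For context, the between-study variance that the informative priors target is conventionally estimated by the DerSimonian and Laird method of moments; a minimal sketch with hypothetical study effects and variances:

        import numpy as np

        def dersimonian_laird(effects, variances):
            """Method-of-moments (DerSimonian-Laird) estimate of the
            between-study variance tau^2."""
            w = 1.0 / np.asarray(variances)
            y = np.asarray(effects)
            y_bar = np.sum(w * y) / np.sum(w)
            q = np.sum(w * (y - y_bar) ** 2)          # Cochran's Q
            df = len(y) - 1
            c = np.sum(w) - np.sum(w**2) / np.sum(w)
            return max(0.0, (q - df) / c)             # truncated at zero

        print(dersimonian_laird([0.20, 0.55, 0.10], [0.04, 0.09, 0.05]))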

  19. Standard and Robust Methods in Regression Imputation

    ERIC Educational Resources Information Center

    Moraveji, Behjat; Jafarian, Koorosh

    2014-01-01

    The aim of this paper is to provide an introduction of new imputation algorithms for estimating missing values from official statistics in larger data sets of data pre-processing, or outliers. The goal is to propose a new algorithm called IRMI (iterative robust model-based imputation). This algorithm is able to deal with all challenges like…

  20. Revisiting the relationship between managed care and hospital consolidation.

    PubMed

    Town, Robert J; Wholey, Douglas; Feldman, Roger; Burns, Lawton R

    2007-02-01

    This paper analyzes whether the rise in managed care during the 1990s caused the increase in hospital concentration. We assemble data from the American Hospital Association, InterStudy and government censuses from 1990 to 2000. We employ linear regression analyses on long differenced data to estimate the impact of managed care penetration on hospital consolidation. Instrumental variable analogs of these regressions are also analyzed to control for potential endogeneity. All data are from secondary sources merged at the level of the Health Care Services Area. In 1990, the mean population-weighted hospital Herfindahl-Hirschman index (HHI) in a Health Services Area was .19. By 2000, the HHI had risen to .26. Most of this increase in hospital concentration is due to hospital consolidation. Over the same time frame HMO penetration increased threefold. However, our regression analysis strongly implies that the rise of managed care did not cause the hospital consolidation wave. This finding is robust to a number of different specifications.
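    For readers unfamiliar with the concentration measure used here, the HHI is simply the sum of squared market shares; a quick illustration with hypothetical shares (not the AHA data):

        import numpy as np

        def hhi(shares):
            """Herfindahl-Hirschman index from market shares, e.g. hospital
            shares within a Health Care Services Area; shares are normalized
            so they sum to 1."""
            s = np.asarray(shares, dtype=float)
            s = s / s.sum()
            return float(np.sum(s**2))

        print(hhi([0.4, 0.3, 0.2, 0.1]))  # 0.30 for this hypothetical market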

  1. Revisiting the Relationship between Managed Care and Hospital Consolidation

    PubMed Central

    Town, Robert J; Wholey, Douglas; Feldman, Roger; Burns, Lawton R

    2007-01-01

    Objective This paper analyzes whether the rise in managed care during the 1990s caused the increase in hospital concentration. Data Sources We assemble data from the American Hospital Association, InterStudy and government censuses from 1990 to 2000. Study Design We employ linear regression analyses on long differenced data to estimate the impact of managed care penetration on hospital consolidation. Instrumental variable analogs of these regressions are also analyzed to control for potential endogeneity. Data Collection All data are from secondary sources merged at the level of the Health Care Services Area. Principal Findings In 1990, the mean population-weighted hospital Herfindahl–Hirschman index (HHI) in a Health Services Area was .19. By 2000, the HHI had risen to .26. Most of this increase in hospital concentration is due to hospital consolidation. Over the same time frame HMO penetration increased threefold. However, our regression analysis strongly implies that the rise of managed care did not cause the hospital consolidation wave. This finding is robust to a number of different specifications. PMID:17355590

  2. Cascade Optimization Strategy with Neural Network and Regression Approximations Demonstrated on a Preliminary Aircraft Engine Design

    NASA Technical Reports Server (NTRS)

    Hopkins, Dale A.; Patnaik, Surya N.

    2000-01-01

    A preliminary aircraft engine design methodology is being developed that utilizes a cascade optimization strategy together with neural network and regression approximation methods. The cascade strategy employs different optimization algorithms in a specified sequence. The neural network and regression methods are used to approximate solutions obtained from the NASA Engine Performance Program (NEPP), which implements engine thermodynamic cycle and performance analysis models. The new methodology is proving to be more robust and computationally efficient than the conventional optimization approach of using a single optimization algorithm with direct reanalysis. The methodology has been demonstrated on a preliminary design problem for a novel subsonic turbofan engine concept that incorporates a wave rotor as a cycle-topping device. Computations of maximum thrust were obtained for a specific design point in the engine mission profile. The results (depicted in the figure) show a significant improvement in the maximum thrust obtained using the new methodology in comparison to benchmark solutions obtained using NEPP in a manual design mode.

  3. [Quantitative structure-gas chromatographic retention relationship of polycyclic aromatic sulfur heterocycles using molecular electronegativity-distance vector].

    PubMed

    Li, Zhenghua; Cheng, Fansheng; Xia, Zhining

    2011-01-01

    The chemical structures of 114 polycyclic aromatic sulfur heterocycles (PASHs) have been studied by the molecular electronegativity-distance vector (MEDV). The linear relationships between gas chromatographic retention index and the MEDV have been established by a multiple linear regression (MLR) model. Variable selection by stepwise multiple regression (SMR), with the predictive ability of the optimized model appraised by leave-one-out cross-validation, showed that the optimized model, with a correlation coefficient (R) of 0.9947 and a cross-validated correlation coefficient (Rcv) of 0.9940, possessed the best statistical quality. Furthermore, when the 114 PASH compounds were divided into calibration and test sets in a 2:1 ratio, the statistical analysis showed that the models possess almost equal statistical quality, very similar regression coefficients and good robustness. The quantitative structure-retention relationship (QSRR) model established may provide a convenient and powerful method for predicting the gas chromatographic retention of PASHs.

  4. Genotype-phenotype association study via new multi-task learning model

    PubMed Central

    Huo, Zhouyuan; Shen, Dinggang

    2018-01-01

    Research on the associations between genetic variations and imaging phenotypes is advancing with high-throughput genotyping and brain imaging techniques. Regression analysis of single nucleotide polymorphisms (SNPs) and imaging measures as quantitative traits (QTs) has been proposed to identify the quantitative trait loci (QTL) via multi-task learning models. Recent studies consider the interlinked structures within SNPs and imaging QTs through group lasso, e.g. the ℓ2,1-norm, leading to better predictive results and insights into SNPs. However, group sparsity is not enough for representing the correlation between multiple tasks, and ℓ2,1-norm regularization is not robust either. In this paper, we propose a new multi-task learning model to analyze the associations between SNPs and QTs. We suppose that low-rank structure is also beneficial to uncover the correlation between genetic variations and imaging phenotypes. Finally, we conduct regression analysis of SNPs and QTs. Experimental results show that our model is more accurate in prediction than the compared methods and presents new insights into SNPs. PMID:29218896

  5. A canonical correlation neural network for multicollinearity and functional data.

    PubMed

    Gou, Zhenkun; Fyfe, Colin

    2004-03-01

    We review a recent neural implementation of Canonical Correlation Analysis and show, using ideas suggested by Ridge Regression, how to make the algorithm robust. The network is shown to operate on data sets which exhibit multicollinearity. We develop a second model which performs well not only on multicollinear data but also on general data sets. This model allows us to vary a single parameter so that the network can perform anything from Partial Least Squares regression (at one extreme) to Canonical Correlation Analysis (at the other), and every intermediate operation between the two. On multicollinear data the parameter setting is shown to be important, but on more general data no particular parameter setting is required. Finally, we develop a second penalty term which acts on such data as a smoother, in that the resulting weight vectors are much smoother and more interpretable than the weights without the robustification term. We illustrate our algorithms on both artificial and real data.

  6. Comparing machine learning and logistic regression methods for predicting hypertension using a combination of gene expression and next-generation sequencing data.

    PubMed

    Held, Elizabeth; Cape, Joshua; Tintle, Nathan

    2016-01-01

    Machine learning methods continue to show promise in the analysis of data from genetic association studies because of the high number of variables relative to the number of observations. However, few best practices exist for the application of these methods. We extend a recently proposed supervised machine learning approach for predicting disease risk by genotypes to be able to incorporate gene expression data and rare variants. We then apply 2 different versions of the approach (radial and linear support vector machines) to simulated data from Genetic Analysis Workshop 19 and compare performance to logistic regression. Method performance was not radically different across the 3 methods, although the linear support vector machine tended to show small gains in predictive ability relative to a radial support vector machine and logistic regression. Importantly, as the number of genes in the models was increased, even when those genes contained causal rare variants, model predictive ability showed a statistically significant decrease in performance for both the radial support vector machine and logistic regression. The linear support vector machine showed more robust performance to the inclusion of additional genes. Further work is needed to evaluate machine learning approaches on larger samples and to evaluate the relative improvement in model prediction from the incorporation of gene expression data.

  7. Robust regression on noisy data for fusion scaling laws

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Verdoolaege, Geert, E-mail: geert.verdoolaege@ugent.be; Laboratoire de Physique des Plasmas de l'ERM - Laboratorium voor Plasmafysica van de KMS

    2014-11-15

    We introduce the method of geodesic least squares (GLS) regression for estimating fusion scaling laws. Based on straightforward principles, the method is easily implemented, yet it clearly outperforms established regression techniques, particularly in cases of significant uncertainty on both the response and predictor variables. We apply GLS for estimating the scaling of the L-H power threshold, resulting in estimates for ITER that are somewhat higher than predicted earlier.

  8. Adjusting for overdispersion in piecewise exponential regression models to estimate excess mortality rate in population-based research.

    PubMed

    Luque-Fernandez, Miguel Angel; Belot, Aurélien; Quaresma, Manuela; Maringe, Camille; Coleman, Michel P; Rachet, Bernard

    2016-10-01

    In population-based cancer research, piecewise exponential regression models are used to derive adjusted estimates of excess mortality due to cancer using the Poisson generalized linear modelling framework. However, the assumption that the conditional mean and variance of the rate parameter given the set of covariates xi are equal is strong and may fail to account for overdispersion, i.e. variability of the rate parameter such that the variance exceeds the mean. Using an empirical example, we aimed to describe simple methods to test and correct for overdispersion. We used a regression-based score test for overdispersion under the relative survival framework and proposed different approaches to correct for overdispersion, including quasi-likelihood, robust standard error estimation, negative binomial regression and flexible piecewise modelling. All piecewise exponential regression models showed the presence of significant inherent overdispersion (p-value < 0.001). However, the flexible piecewise exponential model showed the smallest overdispersion parameter (3.2, versus 21.3 for the non-flexible piecewise exponential models). We showed that there were no major differences between methods. However, using flexible piecewise regression modelling, with either quasi-likelihood or robust standard errors, was the best approach as it deals with both overdispersion due to model misspecification and true or inherent overdispersion.
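    A rough sketch of checking for overdispersion in a Poisson-type model, on synthetic counts rather than the registry data: under equidispersion the Pearson chi-square divided by the residual degrees of freedom should be near 1, and a negative binomial model is one of the corrections the authors discuss:

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(0)
        x = rng.normal(size=500)
        mu = np.exp(0.5 + 0.8 * x)
        # Overdispersed counts with mean mu (negative binomial, shape n=2)
        y = rng.negative_binomial(n=2, p=2.0 / (2.0 + mu))

        X = sm.add_constant(x)
        pois = sm.GLM(y, X, family=sm.families.Poisson()).fit()
        # A dispersion statistic well above 1 flags overdispersion
        print("dispersion =", pois.pearson_chi2 / pois.df_resid)

        # One remedy among those discussed: a negative binomial model
        nb = sm.GLM(y, X, family=sm.families.NegativeBinomial()).fit()
        print(nb.params)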

  9. Adjustment of geochemical background by robust multivariate statistics

    USGS Publications Warehouse

    Zhou, D.

    1985-01-01

    Conventional analyses of exploration geochemical data assume that the background is a constant or slowly changing value, equivalent to a plane or a smoothly curved surface. However, it is better to regard the geochemical background as a rugged surface, varying with changes in geology and environment. This rugged surface can be estimated from observed geological, geochemical and environmental properties by using multivariate statistics. A method of background adjustment was developed and applied to groundwater and stream sediment reconnaissance data collected from the Hot Springs Quadrangle, South Dakota, as part of the National Uranium Resource Evaluation (NURE) program. Source-rock lithology appears to be a dominant factor controlling the chemical composition of groundwater or stream sediments. The most efficacious adjustment procedure is to regress uranium concentration on selected geochemical and environmental variables for each lithologic unit, and then to delineate anomalies by a common threshold set as a multiple of the standard deviation of the combined residuals. Robust versions of regression and RQ-mode principal components analysis techniques were used rather than ordinary techniques to guard against distortion caused by outliers. Anomalies delineated by this background adjustment procedure correspond with uranium prospects much better than do anomalies delineated by conventional procedures. The procedure should be applicable to geochemical exploration at different scales for other metals. © 1985.
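    A compact sketch of the anomaly-delineation step as described (hypothetical inputs; the robust principal components stage is omitted): regress the element on covariates with a robust M-estimator, then flag residuals beyond a multiple of a robust scale estimate:

        import numpy as np
        import statsmodels.api as sm

        def flag_anomalies(y_uranium, X_env, k=2.5):
            """Regress element concentration on environmental covariates with
            a robust M-estimator (Huber), then flag samples whose residual
            exceeds k robust standard deviations above the background."""
            X = sm.add_constant(X_env)
            fit = sm.RLM(y_uranium, X, M=sm.robust.norms.HuberT()).fit()
            resid = fit.resid
            scale = sm.robust.scale.mad(resid)   # robust spread of residuals
            return resid > k * scale             # boolean anomaly mask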

  10. Robust Bayesian linear regression with application to an analysis of the CODATA values for the Planck constant

    NASA Astrophysics Data System (ADS)

    Wübbeler, Gerd; Bodnar, Olha; Elster, Clemens

    2018-02-01

    Weighted least-squares estimation is commonly applied in metrology to fit models to measurements that are accompanied with quoted uncertainties. The weights are chosen in dependence on the quoted uncertainties. However, when data and model are inconsistent in view of the quoted uncertainties, this procedure does not yield adequate results. When it can be assumed that all uncertainties ought to be rescaled by a common factor, weighted least-squares estimation may still be used, provided that a simple correction of the uncertainty obtained for the estimated model is applied. We show that these uncertainties and credible intervals are robust, as they do not rely on the assumption of a Gaussian distribution of the data. Hence, common software for weighted least-squares estimation may still safely be employed in such a case, followed by a simple modification of the uncertainties obtained by that software. We also provide means of checking the assumptions of such an approach. The Bayesian regression procedure is applied to analyze the CODATA values for the Planck constant published over the past decades in terms of three different models: a constant model, a straight line model and a spline model. Our results indicate that the CODATA values may not have yet stabilized.
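    A minimal sketch of the correction described above for a straight-line model: fit by weighted least squares, then inflate the parameter uncertainties by the square root of the reduced chi-square when the data are inconsistent with the quoted uncertainties (all arrays hypothetical):

        import numpy as np

        def wls_with_rescaled_uncertainty(x, y, u):
            """Weighted least-squares straight-line fit with a common-factor
            rescaling of parameter uncertainties when chi2/dof exceeds 1."""
            w = 1.0 / u**2
            X = np.column_stack([np.ones_like(x), x])
            W = np.diag(w)
            cov = np.linalg.inv(X.T @ W @ X)
            beta = cov @ X.T @ W @ y
            chi2 = np.sum(w * (y - X @ beta) ** 2)
            dof = len(y) - 2
            scale = max(1.0, chi2 / dof)      # rescale only when inconsistent
            return beta, np.sqrt(np.diag(cov) * scale)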

  11. An empirical comparison of methods for analyzing correlated data from a discrete choice survey to elicit patient preference for colorectal cancer screening

    PubMed Central

    2012-01-01

    Background A discrete choice experiment (DCE) is a preference survey which asks participants to make a choice among product portfolios comparing the key product characteristics by performing several choice tasks. Analyzing DCE data needs to account for within-participant correlation because choices from the same participant are likely to be similar. In this study, we empirically compared some commonly-used statistical methods for analyzing DCE data while accounting for within-participant correlation based on a survey of patient preference for colorectal cancer (CRC) screening tests conducted in Hamilton, Ontario, Canada in 2002. Methods A two-stage DCE design was used to investigate the impact of six attributes on participants' preferences for CRC screening test and willingness to undertake the test. We compared six models for clustered binary outcomes (logistic and probit regressions using cluster-robust standard error (SE), random-effects and generalized estimating equation approaches) and three models for clustered nominal outcomes (multinomial logistic and probit regressions with cluster-robust SE and random-effects multinomial logistic model). We also fitted a bivariate probit model with cluster-robust SE treating the choices from two stages as two correlated binary outcomes. The rank of relative importance between attributes and the estimates of β coefficients within attributes were used to assess model robustness. Results In total, 468 participants, each completing 10 choices, were analyzed. Similar results were reported for the rank of relative importance and β coefficients across models for stage-one data on evaluating participants' preferences for the test. The six attributes ranked from high to low as follows: cost, specificity, process, sensitivity, preparation and pain. However, the results differed across models for stage-two data on evaluating participants' willingness to undertake the tests. Little within-patient correlation (ICC ≈ 0) was found in stage-one data, but substantial within-patient correlation existed (ICC = 0.659) in stage-two data. Conclusions When a small clustering effect was present in the DCE data, results remained robust across statistical models. However, results varied when a larger clustering effect was present. Therefore, it is important to assess the robustness of the estimates via sensitivity analysis using different models for analyzing clustered data from DCE studies. PMID:22348526
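    One of the simpler approaches compared above, logistic regression with cluster-robust standard errors, can be sketched on synthetic long-format choice data (all variable names hypothetical):

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        # Hypothetical DCE data: ten binary choices per participant, with a
        # shared participant effect that induces within-person correlation
        rng = np.random.default_rng(1)
        n, tasks = 100, 10
        df = pd.DataFrame({
            "pid": np.repeat(np.arange(n), tasks),
            "cost": rng.normal(size=n * tasks),
            "sens": rng.normal(size=n * tasks),
        })
        u = np.repeat(rng.normal(scale=0.5, size=n), tasks)
        df["choice"] = (rng.logistic(size=n * tasks)
                        < -0.8 * df["cost"] + 0.5 * df["sens"] + u).astype(int)

        # Standard errors clustered on the participant id
        fit = smf.logit("choice ~ cost + sens", data=df).fit(
            cov_type="cluster", cov_kwds={"groups": df["pid"]}, disp=0)
        print(fit.summary())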

  12. Bayesian nonparametric regression with varying residual density

    PubMed Central

    Pati, Debdeep; Dunson, David B.

    2013-01-01

    We consider the problem of robust Bayesian inference on the mean regression function allowing the residual density to change flexibly with predictors. The proposed class of models is based on a Gaussian process prior for the mean regression function and mixtures of Gaussians for the collection of residual densities indexed by predictors. Initially considering the homoscedastic case, we propose priors for the residual density based on probit stick-breaking (PSB) scale mixtures and symmetrized PSB (sPSB) location-scale mixtures. Both priors restrict the residual density to be symmetric about zero, with the sPSB prior more flexible in allowing multimodal densities. We provide sufficient conditions to ensure strong posterior consistency in estimating the regression function under the sPSB prior, generalizing existing theory focused on parametric residual distributions. The PSB and sPSB priors are generalized to allow residual densities to change nonparametrically with predictors through incorporating Gaussian processes in the stick-breaking components. This leads to a robust Bayesian regression procedure that automatically down-weights outliers and influential observations in a locally-adaptive manner. Posterior computation relies on an efficient data augmentation exact block Gibbs sampler. The methods are illustrated using simulated and real data applications. PMID:24465053

  13. Robust ridge regression estimators for nonlinear models with applications to high throughput screening assay data.

    PubMed

    Lim, Changwon

    2015-03-30

    Nonlinear regression is often used to evaluate the toxicity of a chemical or a drug by fitting data from a dose-response study. Toxicologists and pharmacologists may draw a conclusion about whether a chemical is toxic by testing the significance of the estimated parameters. However, sometimes the null hypothesis cannot be rejected even though the fit is quite good. One possible reason for such cases is that the estimated standard errors of the parameter estimates are extremely large. In this paper, we propose robust ridge regression estimation procedures for nonlinear models to solve this problem. The asymptotic properties of the proposed estimators are investigated; in particular, their mean squared errors are derived. The performances of the proposed estimators are compared with several standard estimators using simulation studies. The proposed methodology is also illustrated using high throughput screening assay data obtained from the National Toxicology Program. Copyright © 2014 John Wiley & Sons, Ltd.

  14. Quantitative Structure-Activity Relationship of Insecticidal Activity of Benzyl Ether Diamidine Derivatives

    NASA Astrophysics Data System (ADS)

    Zhai, Mengting; Chen, Yan; Li, Jing; Zhou, Jun

    2017-12-01

    The molecular electronegativity distance vector (MEDV-13) was used to describe the molecular structure of benzyl ether diamidine derivatives in this paper. Based on MEDV-13, a three-parameter (M3, M15, M47) QSAR model of insecticidal activity (pIC50) for 60 benzyl ether diamidine derivatives was constructed by leaps-and-bounds regression (LBR). The traditional correlation coefficient (R) and the cross-validation correlation coefficient (RCV) were 0.975 and 0.971, respectively. The robustness of the regression model was validated by the jackknife method; the correlation coefficients R were between 0.971 and 0.983. Meanwhile, the independent variables in the model were tested and showed no autocorrelation. The regression results indicate that the model has good robustness and predictive capability. The research would provide theoretical guidance for the development of a new generation of anti-African-trypanosomiasis drugs with high efficiency and low toxicity.

  15. A comprehensive evaluation of various sensitivity analysis methods: A case study with a hydrological model

    DOE PAGES

    Gan, Yanjun; Duan, Qingyun; Gong, Wei; ...

    2014-01-01

    Sensitivity analysis (SA) is a commonly used approach for identifying important parameters that dominate model behaviors. We use a newly developed software package, a Problem Solving environment for Uncertainty Analysis and Design Exploration (PSUADE), to evaluate the effectiveness and efficiency of ten widely used SA methods, including seven qualitative and three quantitative ones. All SA methods are tested using a variety of sampling techniques to screen out the most sensitive (i.e., important) parameters from the insensitive ones. The Sacramento Soil Moisture Accounting (SAC-SMA) model, which has thirteen tunable parameters, is used for illustration. The South Branch Potomac River basin near Springfield, West Virginia in the U.S. is chosen as the study area. The key findings from this study are: (1) For qualitative SA methods, Correlation Analysis (CA), Regression Analysis (RA), and Gaussian Process (GP) screening methods are shown to be not effective in this example. Morris One-At-a-Time (MOAT) screening is the most efficient, needing only 280 samples to identify the most important parameters, but it is the least robust method. Multivariate Adaptive Regression Splines (MARS), Delta Test (DT) and Sum-Of-Trees (SOT) screening methods need about 400–600 samples for the same purpose. Monte Carlo (MC), Orthogonal Array (OA) and Orthogonal Array based Latin Hypercube (OALH) are appropriate sampling techniques for them; (2) For quantitative SA methods, at least 2777 samples are needed for Fourier Amplitude Sensitivity Test (FAST) to identity parameter main effect. McKay method needs about 360 samples to evaluate the main effect, more than 1000 samples to assess the two-way interaction effect. OALH and LPτ (LPTAU) sampling techniques are more appropriate for McKay method. For the Sobol' method, the minimum samples needed are 1050 to compute the first-order and total sensitivity indices correctly. These comparisons show that qualitative SA methods are more efficient but less accurate and robust than quantitative ones.

  16. Mapping Quantitative Traits in Unselected Families: Algorithms and Examples

    PubMed Central

    Dupuis, Josée; Shi, Jianxin; Manning, Alisa K.; Benjamin, Emelia J.; Meigs, James B.; Cupples, L. Adrienne; Siegmund, David

    2009-01-01

    Linkage analysis has been widely used to identify from family data genetic variants influencing quantitative traits. Common approaches have both strengths and limitations. Likelihood ratio tests typically computed in variance component analysis can accommodate large families but are highly sensitive to departure from normality assumptions. Regression-based approaches are more robust but their use has primarily been restricted to nuclear families. In this paper, we develop methods for mapping quantitative traits in moderately large pedigrees. Our methods are based on the score statistic which in contrast to the likelihood ratio statistic, can use nonparametric estimators of variability to achieve robustness of the false positive rate against departures from the hypothesized phenotypic model. Because the score statistic is easier to calculate than the likelihood ratio statistic, our basic mapping methods utilize relatively simple computer code that performs statistical analysis on output from any program that computes estimates of identity-by-descent. This simplicity also permits development and evaluation of methods to deal with multivariate and ordinal phenotypes, and with gene-gene and gene-environment interaction. We demonstrate our methods on simulated data and on fasting insulin, a quantitative trait measured in the Framingham Heart Study. PMID:19278016

  17. Variation in reaction norms: Statistical considerations and biological interpretation.

    PubMed

    Morrissey, Michael B; Liefting, Maartje

    2016-09-01

    Analysis of reaction norms, the functions by which the phenotype produced by a given genotype depends on the environment, is critical to studying many aspects of phenotypic evolution. Different techniques are available for quantifying different aspects of reaction norm variation. We examine what biological inferences can be drawn from some of the more readily applicable analyses for studying reaction norms. We adopt a strongly biologically motivated view, but draw on statistical theory to highlight strengths and drawbacks of different techniques. In particular, consideration of some formal statistical theory leads to revision of some recently, and forcefully, advocated opinions on reaction norm analysis. We clarify what simple analysis of the slope between mean phenotype in two environments can tell us about reaction norms, explore the conditions under which polynomial regression can provide robust inferences about reaction norm shape, and explore how different existing approaches may be used to draw inferences about variation in reaction norm shape. We show how mixed model-based approaches can provide more robust inferences than more commonly used multistep statistical approaches, and derive new metrics of the relative importance of variation in reaction norm intercepts, slopes, and curvatures. © 2016 The Author(s). Evolution © 2016 The Society for the Study of Evolution.
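    As a toy example of the polynomial-regression approach to reaction norm shape discussed above, with hypothetical phenotype measurements across an environmental gradient, the fitted coefficients separate the intercept (elevation), slope, and curvature of the norm:

        import numpy as np

        # Hypothetical reaction norm: one genotype's mean phenotype
        # measured at five points along an environmental gradient
        env = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
        phen = np.array([3.1, 3.9, 4.2, 4.0, 3.2])

        # Quadratic fit: coefficients returned in increasing order, so they
        # correspond to intercept (elevation), slope, and curvature
        coefs = np.polynomial.polynomial.polyfit(env, phen, deg=2)
        print("intercept, slope, curvature:", coefs)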

  18. Rapid and simultaneous analysis of five alkaloids in four parts of Coptidis Rhizoma by near-infrared spectroscopy

    NASA Astrophysics Data System (ADS)

    Jintao, Xue; Yufei, Liu; Liming, Ye; Chunyan, Li; Quanwei, Yang; Weiying, Wang; Yun, Jing; Minxiang, Zhang; Peng, Li

    2018-01-01

    Near-Infrared Spectroscopy (NIRS) was used for the first time to develop a method for rapid and simultaneous determination of 5 active alkaloids (berberine, coptisine, palmatine, epiberberine and jatrorrhizine) in 4 parts (rhizome, fibrous root, stem and leaf) of Coptidis Rhizoma. A total of 100 samples from 4 main places of origin were collected and studied. With HPLC analysis values as the calibration reference, the quantitative analysis of the 5 marker components was performed by two different modeling methods: partial least-squares (PLS) regression as linear regression and artificial neural networks (ANN) as non-linear regression. The results indicated that the 2 types of models established were robust, accurate and repeatable for the five active alkaloids; the ANN models were more suitable for the determination of berberine, coptisine and palmatine, while the PLS models were more suitable for the analysis of epiberberine and jatrorrhizine. The performance of the optimal models was as follows: the correlation coefficient (R) for berberine, coptisine, palmatine, epiberberine and jatrorrhizine was 0.9958, 0.9956, 0.9959, 0.9963 and 0.9923, respectively; the root mean square error of prediction (RMSEP) was 0.5093, 0.0578, 0.0443, 0.0563 and 0.0090, respectively. Furthermore, for the comprehensive exploitation and utilization of the plant resource of Coptidis Rhizoma, the established NIR models were used to analyze the content of the 5 active alkaloids in the 4 parts of Coptidis Rhizoma and across the 4 main places of origin. This work demonstrated that NIRS may be a promising method for routine off-line fast analysis or on-line quality assessment of traditional Chinese medicine (TCM).
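    A minimal sketch of the linear (PLS) branch of the calibration described above, with simulated spectra standing in for the NIR data; R and RMSEP are the performance metrics the paper reports:

        import numpy as np
        from sklearn.cross_decomposition import PLSRegression
        from sklearn.model_selection import cross_val_predict

        # Simulated stand-in data: 100 "spectra" x 500 "wavelengths", with the
        # analyte content tied to one spectral region plus noise
        rng = np.random.default_rng(2)
        X = rng.normal(size=(100, 500))
        y = X[:, 40:60].mean(axis=1) + 0.05 * rng.normal(size=100)

        pls = PLSRegression(n_components=5)
        y_cv = cross_val_predict(pls, X, y, cv=10).ravel()
        r = np.corrcoef(y, y_cv)[0, 1]
        rmsep = np.sqrt(np.mean((y - y_cv) ** 2))
        print(f"R = {r:.4f}, RMSEP = {rmsep:.4f}")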

  19. The Effect of Education on Old Age Cognitive Abilities: Evidence from a Regression Discontinuity Design*

    PubMed Central

    Banks, James; Mazzonna, Fabrizio

    2011-01-01

    In this paper we exploit the 1947 change to the minimum school-leaving age in England from 14 to 15, to evaluate the causal effect of a year of education on cognitive abilities at older ages. We use a regression discontinuity design analysis and find a large and significant effect of the reform on males’ memory and executive functioning at older ages, using simple cognitive tests from the English Longitudinal Survey on Ageing (ELSA) as our outcome measures. This result is particularly remarkable since the reform had a powerful and immediate effect on about half the population of 14-year-olds. We investigate and discuss the potential channels by which this reform may have had its effects, as well as carrying out a full set of sensitivity analyses and robustness checks. PMID:22611283
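    A bare-bones sketch of the sharp regression discontinuity estimator used in this literature (not the authors' specification): local linear fits on both sides of the cutoff, with the treatment effect read off as the jump at the threshold; inputs are hypothetical:

        import numpy as np
        import statsmodels.api as sm

        def rdd_estimate(running, outcome, cutoff, bandwidth):
            """Sharp RDD sketch: within a bandwidth of the cutoff, fit a
            linear model with separate slopes on each side; the coefficient
            on the treatment indicator is the jump at the threshold."""
            mask = np.abs(running - cutoff) <= bandwidth
            x, y = running[mask] - cutoff, outcome[mask]
            treated = (x >= 0).astype(float)
            X = sm.add_constant(np.column_stack([treated, x, treated * x]))
            fit = sm.OLS(y, X).fit()
            return fit.params[1]   # estimated discontinuity at the cutoff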

  20. Use of partial least squares regression for the multivariate calibration of hazardous air pollutants in open-path FT-IR spectrometry

    NASA Astrophysics Data System (ADS)

    Hart, Brian K.; Griffiths, Peter R.

    1998-06-01

    Partial least squares (PLS) regression has been evaluated as a robust calibration technique for over 100 hazardous air pollutants (HAPs) measured by open path Fourier transform infrared (OP/FT-IR) spectrometry. PLS has the advantage over the current recommended calibration method of classical least squares (CLS), in that it can look at the whole useable spectrum (700-1300 cm-1, 2000-2150 cm-1, and 2400-3000 cm-1), and detect several analytes simultaneously. Up to one hundred HAPs synthetically added to OP/FT-IR backgrounds have been simultaneously calibrated and detected using PLS. PLS also has the advantage in requiring less preprocessing of spectra than that which is required in CLS calibration schemes, allowing PLS to provide user independent real-time analysis of OP/FT-IR spectra.

  1. A consensus least squares support vector regression (LS-SVR) for analysis of near-infrared spectra of plant samples.

    PubMed

    Li, Yankun; Shao, Xueguang; Cai, Wensheng

    2007-04-15

    Consensus modeling, which combines the results of multiple independent models to produce a single prediction, avoids the instability of a single model. Based on the principle of consensus modeling, a consensus least squares support vector regression (LS-SVR) method for calibrating near-infrared (NIR) spectra was proposed. In the proposed approach, NIR spectra of plant samples were first preprocessed using the discrete wavelet transform (DWT) to filter the spectral background and noise; then, the consensus LS-SVR technique was used for building the calibration model. With optimization of the parameters involved in the modeling, a satisfactory model was achieved for predicting the content of reducing sugar in plant samples. The predicted results show that the consensus LS-SVR model is more robust and reliable than the conventional partial least squares (PLS) and LS-SVR methods.

  2. In search of a corrected prescription drug elasticity estimate: a meta-regression approach.

    PubMed

    Gemmill, Marin C; Costa-Font, Joan; McGuire, Alistair

    2007-06-01

    An understanding of the relationship between cost sharing and drug consumption depends on consistent and unbiased price elasticity estimates. However, there is wide heterogeneity among studies, which constrains the applicability of elasticity estimates for empirical purposes and policy simulation. This paper attempts to provide a corrected measure of the drug price elasticity by employing meta-regression analysis (MRA). The results indicate that the elasticity estimates are significantly different from zero, and the corrected elasticity is -0.209 when the results are made robust to heteroskedasticity and clustering of observations. Elasticity values are higher when the study was published in an economic journal, when the study employed a greater number of observations, and when the study used aggregate data. Elasticity estimates are lower when the institutional setting was a tax-based health insurance system.

  3. Development of a Highly Automated and Multiplexed Targeted Proteome Pipeline and Assay for 112 Rat Brain Synaptic Proteins

    PubMed Central

    Colangelo, Christopher M.; Ivosev, Gordana; Chung, Lisa; Abbott, Thomas; Shifman, Mark; Sakaue, Fumika; Cox, David; Kitchen, Rob R.; Burton, Lyle; Tate, Stephen A; Gulcicek, Erol; Bonner, Ron; Rinehart, Jesse; Nairn, Angus C.; Williams, Kenneth R.

    2015-01-01

    We present a comprehensive workflow for large scale (>1000 transitions/run) label-free LC-MRM proteome assays. Innovations include automated MRM transition selection, intelligent retention time scheduling (xMRM) that improves Signal/Noise by >2-fold, and automatic peak modeling. Improvements to data analysis include a novel Q/C metric, Normalized Group Area Ratio (NGAR), MLR normalization, weighted regression analysis, and data dissemination through the Yale Protein Expression Database. As a proof of principle we developed a robust 90 minute LC-MRM assay for Mouse/Rat Post-Synaptic Density (PSD) fractions which resulted in the routine quantification of 337 peptides from 112 proteins based on 15 observations per protein. Parallel analyses with stable isotope dilution peptide standards (SIS), demonstrate very high correlation in retention time (1.0) and protein fold change (0.94) between the label-free and SIS analyses. Overall, our first method achieved a technical CV of 11.4% with >97.5% of the 1697 transitions being quantified without user intervention, resulting in a highly efficient, robust, and single injection LC-MRM assay. PMID:25476245

  4. Does the effect of gender modify the relationship between deprivation and mortality?

    PubMed

    Salcedo, Natalia; Saez, Marc; Bragulat, Basili; Saurina, Carme

    2012-07-30

    In this study we propose improvements to the method of elaborating deprivation indexes. First, in the selection of the variables, we incorporated a wider range of both objective and subjective measures. Second, in the statistical methodology, we used a distance indicator instead of the standard aggregating method, principal component analysis. Third, we propose another methodological improvement, which consists in the use of a more robust statistical method to assess the relationship between deprivation and health responses in ecological regressions. We conducted an ecological small-area analysis based on the residents of the Metropolitan region of Barcelona in the period 1994-2007. Standardized mortality rates, stratified by sex, were studied for four mortality causes: tumor of the bronchus, lung and trachea, diabetes mellitus type II, breast cancer, and prostate cancer. Socioeconomic conditions were summarized using a deprivation index. Sixteen socio-demographic variables available in the Spanish Census of Population and Housing were included. The deprivation index was constructed by aggregating the above-mentioned variables using the distance indicator DP2. For the estimation of the ecological regression we used hierarchical Bayesian models with some improvements. At greater deprivation, there is an increased risk of dying from diabetes for both sexes and of dying from lung cancer for men. On the other hand, at greater deprivation, there is a decreased risk of dying from breast cancer and lung cancer for women. We did not find a clear relationship in the case of prostate cancer (presenting an increased risk but only in the second quintile of deprivation). We believe our results were obtained using a more robust methodology. First, we have built a better index that allows us to directly collect the variability of contextual variables without having to use arbitrary weights. Second, we have solved two major problems that are present in spatial ecological regressions, i.e. those that use spatial data and, consequently, perform a spatial adjustment in order to obtain consistent estimators.

  5. Is the maturity of hospitals' quality improvement systems associated with measures of quality and patient safety?

    PubMed Central

    2011-01-01

    Background Previous research addressed the development of a classification scheme for quality improvement systems in European hospitals. In this study we explore associations between the 'maturity' of the hospitals' quality improvement system and clinical outcomes. Methods The maturity classification scheme was developed based on survey results from 389 hospitals in eight European countries. We matched the hospitals from the Spanish sample (113 hospitals) with those hospitals participating in a nation-wide, voluntary hospital performance initiative. We then compared sample distributions and explored associations between the 'maturity' of the hospitals' quality improvement system and a range of composite outcomes measures, such as adjusted hospital-wide mortality, -readmission, -complication and -length of stay indices. Statistical analysis includes bivariate correlations for parametrically and non-parametrically distributed data, multiple robust regression models and bootstrapping techniques to obtain confidence-intervals for the correlation and regression estimates. Results Overall, 43 hospitals were included. Compared to the original sample of 113, this sample was characterized by a higher representation of university hospitals. Maturity of the quality improvement system was similar, although the matched sample showed less variability. Analysis of associations between the quality improvement system and hospital-wide outcomes suggests significant correlations for the indicator adjusted hospital complications, borderline significance for adjusted hospital readmissions and non-significance for the adjusted hospital mortality and length of stay indicators. These results are confirmed by the bootstrap estimates of the robust regression model after adjusting for hospital characteristics. Conclusions We assessed associations between hospitals' quality improvement systems and clinical outcomes. From this data it seems that having a more developed quality improvement system is associated with lower rates of adjusted hospital complications. A number of methodological and logistic hurdles remain to link hospital quality improvement systems to outcomes. Further research should aim at identifying the latent dimensions of quality improvement systems that predict quality and safety outcomes. Such research would add pertinent knowledge regarding the implementation of organizational strategies related with quality of care outcomes. PMID:22185479

  6. Optimisation in the Design of Environmental Sensor Networks with Robustness Consideration

    PubMed Central

    Budi, Setia; de Souza, Paulo; Timms, Greg; Malhotra, Vishv; Turner, Paul

    2015-01-01

    This work proposes the design of Environmental Sensor Networks (ESN) through balancing robustness and redundancy. An Evolutionary Algorithm (EA) is employed to find the optimal placement of sensor nodes in the Region of Interest (RoI). Data quality issues are introduced to simulate their impact on the performance of the ESN. Spatial Regression Test (SRT) is also utilised to promote robustness in data quality of the designed ESN. The proposed method provides high network representativeness (fit for purpose) with minimum sensor redundancy (cost), and ensures robustness by enabling the network to continue to achieve its objectives when some sensors fail. PMID:26633392

  7. Fully Bayesian inference for structural MRI: application to segmentation and statistical analysis of T2-hypointensities.

    PubMed

    Schmidt, Paul; Schmid, Volker J; Gaser, Christian; Buck, Dorothea; Bührlen, Susanne; Förschler, Annette; Mühlau, Mark

    2013-01-01

    Aiming at iron-related T2-hypointensity, which is related to normal aging and neurodegenerative processes, we here present two practicable approaches, based on Bayesian inference, for preprocessing and statistical analysis of a complex set of structural MRI data. In particular, Markov Chain Monte Carlo methods were used to simulate posterior distributions. First, we rendered a segmentation algorithm that uses outlier detection based on model checking techniques within a Bayesian mixture model. Second, we rendered an analytical tool comprising a Bayesian regression model with smoothness priors (in the form of Gaussian Markov random fields) mitigating the necessity to smooth data prior to statistical analysis. For validation, we used simulated data and MRI data of 27 healthy controls (age: [Formula: see text]; range, [Formula: see text]). We first observed robust segmentation of both simulated T2-hypointensities and gray-matter regions known to be T2-hypointense. Second, simulated data and images of segmented T2-hypointensity were analyzed. We found not only robust identification of simulated effects but also a biologically plausible age-related increase of T2-hypointensity primarily within the dentate nucleus but also within the globus pallidus, substantia nigra, and red nucleus. Our results indicate that fully Bayesian inference can successfully be applied for preprocessing and statistical analysis of structural MRI data.

  8. Steep Delay Discounting and Addictive Behavior: A Meta-Analysis of Continuous Associations

    PubMed Central

    Amlung, Michael; Vedelago, Lana; Acker, John; Balodis, Iris; MacKillop, James

    2016-01-01

    Aims To synthesize continuous associations between delayed reward discounting (DRD) and both addiction severity and quantity-frequency (QF); to examine moderators of these relationships; and to investigate publication bias. Methods Meta-analysis of published studies examining continuous associations between DRD and addictive behaviors. Published, peer-reviewed studies on addictive behaviors (alcohol, tobacco, cannabis, stimulants, opiates, and gambling) were identified via PubMed, MEDLINE, and PsycInfo. Studies were restricted to DRD measures of monetary gains. Random effects meta-analysis was conducted using Pearson's r as the effect size. Publication bias was evaluated using fail-safe N, Begg-Mazumdar and Egger's tests, meta-regression of publication year and effect size, and imputation of missing studies. Results The primary meta-analysis revealed a small magnitude effect size that was highly significant (r = 0.14, p < 10⁻¹⁴). Significantly larger effect sizes were observed for studies examining severity compared with QF (p = 0.01), but not between the type of addictive behavior (p = 0.30) or DRD assessment (p = 0.90). Indices of publication bias suggested a modest impact of unpublished findings. Conclusions Delayed reward discounting is robustly associated with continuous measures of addiction severity and quantity-frequency. This relation is generally robust across type of addictive behavior and delayed reward discounting assessment modality. PMID:27450931

  9. How Robust Is Linear Regression with Dummy Variables?

    ERIC Educational Resources Information Center

    Blankmeyer, Eric

    2006-01-01

    Researchers in education and the social sciences make extensive use of linear regression models in which the dependent variable is continuous-valued while the explanatory variables are a combination of continuous-valued regressors and dummy variables. The dummies partition the sample into groups, some of which may contain only a few observations.…
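    A small sketch of the model class in question, with hypothetical data: the categorical factor is expanded into dummy variables automatically, and coefficients for sparsely populated groups are exactly where non-robustness can bite:

        import pandas as pd
        import statsmodels.formula.api as smf

        # Hypothetical data: continuous outcome, one continuous regressor,
        # and a group factor expanded into dummy variables by the formula
        df = pd.DataFrame({
            "score": [70, 75, 62, 88, 90, 64, 72, 81],
            "hours": [10, 12, 8, 15, 16, 9, 11, 14],
            "group": ["A", "A", "B", "C", "C", "B", "A", "C"],
        })
        fit = smf.ols("score ~ hours + C(group)", data=df).fit()
        print(fit.params)  # dummy coefficients for small groups are fragile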

  10. A General Bayesian Network Approach to Analyzing Online Game Item Values and Its Influence on Consumer Satisfaction and Purchase Intention

    NASA Astrophysics Data System (ADS)

    Lee, Kun Chang; Park, Bong-Won

    Many online game users purchase game items with which to play free-to-play games. Because there is no established framework for categorizing the values of game items, this study proposes four types of online game item values based on an analysis of the literature on online game characteristics. It then investigates how online game users derive satisfaction and purchase intention from the proposed four types of online game item values. Though regression analysis has been used frequently to answer this kind of research question, we propose a new approach, a General Bayesian Network (GBN), which can be performed in an understandable way without sacrificing predictive accuracy. Conventional techniques, such as regression analysis, do not provide significant explanation for this kind of problem because they are fixed to a linear structure and are limited in explaining why customers are likely to purchase game items and whether they are satisfied with their purchases. In contrast, the proposed GBN provides a flexible underlying structure based on questionnaire survey data and offers robust decision support on this kind of research question by identifying its causal relationships. To illustrate the validity of GBN in solving the research question in this study, 327 valid questionnaires were analyzed using GBN with what-if and goal-seeking approaches. The experimental results were promising and meaningful in comparison with regression analysis results.

  11. Label-noise resistant logistic regression for functional data classification with an application to Alzheimer's disease study.

    PubMed

    Lee, Seokho; Shin, Hyejin; Lee, Sang Han

    2016-12-01

    Alzheimer's disease (AD) is usually diagnosed by clinicians through cognitive and functional performance tests, with a potential risk of misdiagnosis. Since the progression of AD is known to cause structural changes in the corpus callosum (CC), CC thickness can be used as a functional covariate in the AD classification problem. However, misclassified class labels negatively impact classification performance. Motivated by AD-CC association studies, we propose a logistic regression for functional data classification that is robust to misdiagnosis or label noise. Specifically, our model is constructed by adding individual intercepts to the functional logistic regression model. This approach makes it possible to flag observations that are possibly mislabeled and also leads to a robust and efficient classifier. An effective estimation procedure based on the MM algorithm provides simple closed-form update formulas. We test our method on synthetic datasets to demonstrate its superiority over an existing method, and apply it to differentiating patients with AD from healthy controls based on CC thickness profiles from MRI. © 2016, The International Biometric Society.

  12. Evaluation of long-term survival: use of diagnostics and robust estimators with Cox's proportional hazards model.

    PubMed

    Valsecchi, M G; Silvestri, D; Sasieni, P

    1996-12-30

    We consider methodological problems in evaluating long-term survival in clinical trials. In particular, we examine the use of several methods that extend the basic Cox regression analysis. With long-term observation, the proportional hazards (PH) assumption may easily be violated, and a few long-term survivors may have a large effect on parameter estimates. We consider both model selection and robust estimation in a data set of 474 ovarian cancer patients enrolled in a clinical trial and followed for between 7 and 12 years after randomization. Two diagnostic plots for assessing goodness of fit are introduced. One shows the variation in time of parameter estimates and is an alternative to PH checking based on time-dependent covariates. The other takes advantage of the martingale residual process in time to represent the lack of fit with a metric of the 'observed minus expected' number of events. Robust estimation is carried out by maximizing a weighted partial likelihood which downweights the contribution of influential observations. This type of complementary analysis of the long-term results of clinical studies is useful in assessing the soundness of conclusions on treatment effect. In the example analysed here, the difference in survival between treatments was mostly confined to those individuals who survived at least two years beyond randomization.

  13. The contextual effects of social capital on health: a cross-national instrumental variable analysis.

    PubMed

    Kim, Daniel; Baum, Christopher F; Ganz, Michael L; Subramanian, S V; Kawachi, Ichiro

    2011-12-01

    Past research on the associations between area-level/contextual social capital and health has produced conflicting evidence. However, interpreting this rapidly growing literature is difficult because estimates using conventional regression are prone to major sources of bias including residual confounding and reverse causation. Instrumental variable (IV) analysis can reduce such bias. Using data on up to 167,344 adults in 64 nations in the European and World Values Surveys and applying IV and ordinary least squares (OLS) regression, we estimated the contextual effects of country-level social trust on individual self-rated health. We further explored whether these associations varied by gender and individual levels of trust. Using OLS regression, we found higher average country-level trust to be associated with better self-rated health in both women and men. Instrumental variable analysis yielded qualitatively similar results, although the estimates were more than double in size in both sexes when country population density and corruption were used as instruments. The estimated health effects of raising the percentage of a country's population that trusts others by 10 percentage points were at least as large as the estimated health effects of an individual developing trust in others. These findings were robust to alternative model specifications and instruments. Conventional regression and to a lesser extent IV analysis suggested that these associations are more salient in women and in women reporting social trust. In a large cross-national study, our findings, including those using instrumental variables, support the presence of beneficial effects of higher country-level trust on self-rated health. Previous findings for contextual social capital using traditional regression may have underestimated the true associations. Given the close linkages between self-rated health and all-cause mortality, the public health gains from raising social capital within and across countries may be large. Copyright © 2011 Elsevier Ltd. All rights reserved.

  14. The contextual effects of social capital on health: a cross-national instrumental variable analysis

    PubMed Central

    Kim, Daniel; Baum, Christopher F; Ganz, Michael; Subramanian, S V; Kawachi, Ichiro

    2011-01-01

    Past observational studies of the associations of area-level/contextual social capital with health have revealed conflicting findings. However, interpreting this rapidly growing literature is difficult because estimates using conventional regression are prone to major sources of bias including residual confounding and reverse causation. Instrumental variable (IV) analysis can reduce such bias. Using data on up to 167 344 adults in 64 nations in the European and World Values Surveys and applying IV and ordinary least squares (OLS) regression, we estimated the contextual effects of country-level social trust on individual self-rated health. We further explored whether these associations varied by gender and individual levels of trust. Using OLS regression, we found higher average country-level trust to be associated with better self-rated health in both women and men. Instrumental variable analysis yielded qualitatively similar results, although the estimates were more than double in size in women and men using country population density and corruption as instruments. The estimated health effects of raising the percentage of a country's population that trusts others by 10 percentage points were at least as large as the estimated health effects of an individual developing trust in others. These findings were robust to alternative model specifications and instruments. Conventional regression and to a lesser extent IV analysis suggested that these associations are more salient in women and in women reporting social trust. In a large cross-national study, our findings, including those using instrumental variables, support the presence of beneficial effects of higher country-level trust on self-rated health. Past findings for contextual social capital using traditional regression may have underestimated the true associations. Given the close linkages between self-rated health and all-cause mortality, the public health gains from raising social capital within countries may be large. PMID:22078106

  15. The Influential Effect of Blending, Bump, Changing Period, and Eclipsing Cepheids on the Leavitt Law

    NASA Astrophysics Data System (ADS)

    García-Varela, A.; Muñoz, J. R.; Sabogal, B. E.; Vargas Domínguez, S.; Martínez, J.

    2016-06-01

    The investigation of the nonlinearity of the Leavitt law (LL) began more than seven decades ago, when some studies in this field found that the LL has a break at about 10 days. The goal of this work is to investigate a possible statistical cause of this nonlinearity. By applying linear regressions to OGLE-II and OGLE-IV data, we find that to obtain the LL by linear regression, robust techniques that deal with influential points and/or outliers are needed instead of the ordinary least-squares regression traditionally used. In particular, using M- and MM-regressions we firmly establish the linearity of the LL in the Large Magellanic Cloud, without rejecting or excluding Cepheid data from the analysis. This implies that light curves of Cepheids suggesting blending, bumps, eclipses, or period changes do not affect the LL for this galaxy. For the Small Magellanic Cloud, when Cepheids of this kind are included, it is not possible to find an adequate model, probably because of the geometry of the galaxy; in that case, these stars may exert an influence.
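
    A minimal sketch of the robust-regression idea applied above, assuming statsmodels: an M-regression with a Huber norm fitted to synthetic period-luminosity data containing a few blend-like outliers (statsmodels offers M-estimation but not MM-estimation, so only the former is shown; the slopes and scatter are invented, not OGLE values).

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(0)
        log_p = rng.uniform(0.3, 1.7, 500)                 # log10(period/days)
        mag = 17.0 - 2.8 * log_p + rng.normal(0, 0.15, 500)
        mag[:10] += 1.5                                    # a few blended outliers

        X = sm.add_constant(log_p)
        ols = sm.OLS(mag, X).fit()                         # pulled by the outliers
        rlm = sm.RLM(mag, X, M=sm.robust.norms.HuberT()).fit()
        print("OLS slope:", ols.params[1], "robust slope:", rlm.params[1])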

  16. Robust Joint Graph Sparse Coding for Unsupervised Spectral Feature Selection.

    PubMed

    Zhu, Xiaofeng; Li, Xuelong; Zhang, Shichao; Ju, Chunhua; Wu, Xindong

    2017-06-01

    In this paper, we propose a new unsupervised spectral feature selection model by embedding a graph regularizer into the framework of joint sparse regression for preserving the local structures of data. To do this, we first extract the bases of the training data by previous dictionary learning methods and then map the original data into the basis space to generate their new representations, by proposing a novel joint graph sparse coding (JGSC) model. In JGSC, we first formulate the objective function by simultaneously taking subspace learning and joint sparse regression into account, then design a new optimization solution to solve the resulting objective function, and further prove the convergence of the proposed solution. Furthermore, we extend JGSC to a robust JGSC (RJGSC) by replacing the least squares loss function with a robust loss function, achieving the same goals while avoiding the impact of outliers. Finally, experimental results on real data sets showed that both JGSC and RJGSC outperformed the state-of-the-art algorithms in terms of k-nearest neighbor classification performance.

  17. Quantifying the statistical importance of utilizing regression over classic energy intensity calculations for tracking efficiency improvements in industry

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nimbalkar, Sachin U.; Wenning, Thomas J.; Guo, Wei

    In the United States, manufacturing facilities accounted for about 32% of total domestic energy consumption in 2014. Robust energy tracking methodologies are critical to understanding energy performance in manufacturing facilities. Due to its simplicity and intuitiveness, the classic energy intensity method (i.e., the ratio of total energy use to total production) is the most widely adopted. However, the classic energy intensity method does not take into account the variation of other relevant parameters (e.g., product type, feedstock type, weather). Furthermore, the energy intensity method assumes that a facility's base energy consumption (energy use at zero production) is zero, which rarely holds true. Therefore, it is commonly recommended to utilize regression models rather than the energy intensity approach for tracking improvements at the facility level. Unfortunately, many energy managers have difficulty understanding why regression models are statistically better than the classic energy intensity method. While anecdotes and qualitative information may convince some, many have major reservations about the accuracy of regression models and whether it is worth the time and effort to gather data and build quality regression models. This paper explains why regression models are theoretically and quantitatively more accurate for tracking energy performance improvements. Based on the analysis of data from 114 manufacturing plants over 12 years, it presents quantitative results on the importance of utilizing regression models over the energy intensity methodology. It also documents scenarios where regression models do not offer a significant advantage over the energy intensity method.
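
    A minimal sketch of the contrast the paper draws, on synthetic plant data and assuming numpy and statsmodels: the classic intensity ratio folds the base load into a single per-unit figure, while a regression recovers base load and marginal use separately. Variable names and numbers are illustrative.

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(1)
        production = rng.uniform(50, 150, 36)                     # monthly output
        energy = 400 + 2.5 * production + rng.normal(0, 20, 36)   # base load of 400

        intensity = energy.sum() / production.sum()   # classic ratio method
        fit = sm.OLS(energy, sm.add_constant(production)).fit()

        print("intensity ratio:", intensity)          # mixes base and marginal use
        print("base load:", fit.params[0])            # ~400, invisible to the ratio
        print("marginal use:", fit.params[1])         # ~2.5 per unit produced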

  18. Adaptive and robust statistical methods for processing near-field scanning microwave microscopy images.

    PubMed

    Coakley, K J; Imtiaz, A; Wallis, T M; Weber, J C; Berweger, S; Kabos, P

    2015-03-01

    Near-field scanning microwave microscopy offers great potential to facilitate characterization, development and modeling of materials. By acquiring microwave images at multiple frequencies and amplitudes (along with other modalities) one can study material and device physics at different lateral and depth scales. Images are typically noisy and contaminated by artifacts that can vary from scan line to scan line, as well as by planar-like trends due to sample tilt errors. Here, we level images based on an estimate of a smooth 2-d trend determined with a robust implementation of a local regression method. In this robust approach, features and outliers which are not due to the trend are automatically downweighted. We denoise images with the Adaptive Weights Smoothing method, which smooths out additive noise while preserving edge-like features in images. We demonstrate the feasibility of our methods on topography images and microwave |S11| images. For one challenging test case, we demonstrate that our method outperforms alternative methods from the scanning probe microscopy data analysis software package Gwyddion. Our methods should be useful for massive image data sets where manual selection of landmarks or image subsets by a user is impractical. Published by Elsevier B.V.
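
    A minimal sketch of robust local-regression levelling in the spirit of the method above, reduced to a single scan line for brevity and assuming statsmodels; its lowess routine downweights outliers over `it` robustifying iterations. The data are synthetic.

        import numpy as np
        from statsmodels.nonparametric.smoothers_lowess import lowess

        rng = np.random.default_rng(2)
        x = np.linspace(0, 1, 400)
        tilt = 0.8 * x                              # planar-like trend (sample tilt)
        features = 0.05 * np.sin(40 * x)            # genuine surface structure
        line = tilt + features + rng.normal(0, 0.01, 400)
        line[200:205] += 0.5                        # an artifact spike (outlier)

        # Robust local regression: reweighting passes downweight the spike
        trend = lowess(line, x, frac=0.5, it=3, return_sorted=False)
        levelled = line - trend                     # tilt removed, features kept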

  19. Characterizing individual differences in functional connectivity using dual-regression and seed-based approaches.

    PubMed

    Smith, David V; Utevsky, Amanda V; Bland, Amy R; Clement, Nathan; Clithero, John A; Harsch, Anne E W; McKell Carter, R; Huettel, Scott A

    2014-07-15

    A central challenge for neuroscience lies in relating inter-individual variability to the functional properties of specific brain regions. Yet, considerable variability exists in the connectivity patterns between different brain areas, potentially producing reliable group differences. Using sex differences as a motivating example, we examined two separate resting-state datasets comprising a total of 188 human participants. Both datasets were decomposed into resting-state networks (RSNs) using a probabilistic spatial independent component analysis (ICA). We estimated voxel-wise functional connectivity with these networks using a dual-regression analysis, which characterizes the participant-level spatiotemporal dynamics of each network while controlling for (via multiple regression) the influence of other networks and sources of variability. We found that males and females exhibit distinct patterns of connectivity with multiple RSNs, including both visual and auditory networks and the right frontal-parietal network. These results replicated across both datasets and were not explained by differences in head motion, data quality, brain volume, cortisol levels, or testosterone levels. Importantly, we also demonstrate that dual-regression functional connectivity is better at detecting inter-individual variability than traditional seed-based functional connectivity approaches. Our findings characterize robust, yet frequently ignored, neural differences between males and females, pointing to the necessity of controlling for sex in neuroscience studies of individual differences. Moreover, our results highlight the importance of employing network-based models to study variability in functional connectivity. Copyright © 2014 Elsevier Inc. All rights reserved.
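
    A minimal sketch of the two dual-regression stages on synthetic matrices, assuming numpy: group network maps are regressed onto a subject's data to obtain network time courses, which are then regressed back onto the data to obtain subject-level spatial maps. A real pipeline would add normalization and confound regressors.

        import numpy as np

        def dual_regression(data, group_maps):
            """data: (voxels, time); group_maps: (voxels, networks)."""
            # Stage 1: spatial regression -> network time courses (networks, time)
            tcs, *_ = np.linalg.lstsq(group_maps, data, rcond=None)
            # Stage 2: temporal regression -> subject maps (networks, voxels)
            maps, *_ = np.linalg.lstsq(tcs.T, data.T, rcond=None)
            return tcs, maps

        rng = np.random.default_rng(3)
        data = rng.normal(size=(5000, 200))         # toy subject dataset
        group_maps = rng.normal(size=(5000, 10))    # toy group ICA maps
        tcs, subject_maps = dual_regression(data, group_maps)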

  20. Quality of semen: a 6-year single experience study on 5680 patients.

    PubMed

    Cozzolino, Mauro; Coccia, Maria E; Picone, Rita

    2018-02-08

    The aim of our study was to evaluate semen quality in a large sample of the general healthy population living in Italy, in order to identify variables that could influence several spermiogram parameters. We conducted a cross-sectional study from February 2010 to March 2015, collecting semen samples from the general population. Semen analysis was performed according to the WHO guidelines. The collected data were entered into a database and processed using the software Stata 12. The Mann-Whitney test was used to assess the relationship of dichotomous variables with the spermiogram parameters, and the Kruskal-Wallis test for variables with more than two categories. We also used robust regression and Spearman correlation to analyze the relationship between age and the parameters. We collected 5680 semen samples. The mean age of our patients was 41.4 years. The Mann-Whitney test showed that citizenship (codified as "Italian/Foreign") influences some parameters (pH, vitality, number of spermatozoa, sperm concentration), with worse results for the Italian group. The Kruskal-Wallis test showed that the single nationality influences pH, volume, sperm motility A-B-C-D, vitality, morphology, number of spermatozoa, and sperm concentration. Robust regression showed a relationship between age and several parameters: volume (p = 0.04, R² = 0.0007, β = -0.06); sperm motility A (p < 0.01, R² = 0.0051, β = 0.02); sperm motility B (p < 0.01, R² = 0.02, β = -0.35); sperm motility C (p < 0.01, R² = 0.01, β = 0.12); sperm motility D (p < 0.01, R² = 0.006, β = 0.2); vitality (p < 0.01, R² = 0.01, β = -0.32); sperm concentration (p = 0.01, R² = 0.001, β = 0.19). Our patients' spermiogram results were somewhat better than the standard guideline reference values. Our study showed that country of origin could be a factor influencing several spermiogram parameters in the healthy population, and robust regression confirmed a close relationship between age and these parameters.

  1. Measurement Consistency from Magnetic Resonance Images

    PubMed Central

    Chung, Dongjun; Chung, Moo K.; Durtschi, Reid B.; Lindell, R. Gentry; Vorperian, Houri K.

    2010-01-01

    Rationale and Objectives In quantifying medical images, length-based measurements are still obtained manually. Due to possible human error, a measurement protocol is required to guarantee the consistency of measurements. In this paper, we review various statistical techniques that can be used in determining measurement consistency. The focus is on detecting a possible measurement bias and determining the robustness of the procedures to outliers. Materials and Methods We review correlation analysis, linear regression, the Bland-Altman method, the paired t-test, and analysis of variance (ANOVA). These techniques were applied to measurements, obtained by two raters, of head and neck structures from magnetic resonance images (MRI). Results The correlation analysis and the linear regression were shown to be insufficient for detecting measurement inconsistency. They are also very sensitive to outliers. The widely used Bland-Altman method is a visualization technique, so it lacks numerical quantification. The paired t-test tends to be sensitive to small measurement bias. On the other hand, ANOVA performs well even under small measurement bias. Conclusion In almost all cases, using only one method is insufficient, and it is recommended to use several methods simultaneously. In general, ANOVA performs the best. PMID:18790405
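
    A minimal sketch of the Bland-Altman construction reviewed above, assuming numpy and matplotlib; two synthetic raters measure the same structures, one with a small additive bias.

        import numpy as np
        import matplotlib.pyplot as plt

        rng = np.random.default_rng(4)
        truth = rng.uniform(10, 30, 50)
        rater1 = truth + rng.normal(0, 0.5, 50)
        rater2 = truth + 0.3 + rng.normal(0, 0.5, 50)   # small systematic bias

        mean = (rater1 + rater2) / 2
        diff = rater1 - rater2
        bias, sd = diff.mean(), diff.std(ddof=1)

        plt.scatter(mean, diff, s=12)
        for y in (bias, bias + 1.96 * sd, bias - 1.96 * sd):
            plt.axhline(y, linestyle="--")           # bias and limits of agreement
        plt.xlabel("mean of raters")
        plt.ylabel("difference (rater1 - rater2)")
        plt.show()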

  2. Linear and nonlinear models for predicting fish bioconcentration factors for pesticides.

    PubMed

    Yuan, Jintao; Xie, Chun; Zhang, Ting; Sun, Jinfang; Yuan, Xuejie; Yu, Shuling; Zhang, Yingbiao; Cao, Yunyuan; Yu, Xingchen; Yang, Xuan; Yao, Wu

    2016-08-01

    This work is devoted to the application of multiple linear regression (MLR), multilayer perceptron neural networks (MLP NN) and projection pursuit regression (PPR) to quantitative structure-property relationship analysis of bioconcentration factors (BCFs) of pesticides tested on Bluegill (Lepomis macrochirus). Molecular descriptors of a total of 107 pesticides were calculated with the DRAGON software and selected by the inverse enhanced replacement method. Based on the selected DRAGON descriptors, a linear model was built by MLR, and nonlinear models were developed using MLP NN and PPR. The robustness of the obtained models was assessed by cross-validation and by external validation using a test set. Outliers were also examined and deleted to improve predictive power. Comparative results revealed that PPR achieved the most accurate predictions. This study offers useful models and information for BCF prediction, risk assessment, and pesticide formulation. Copyright © 2016 Elsevier Ltd. All rights reserved.

  3. Predictors of effects of lifestyle intervention on diabetes mellitus type 2 patients.

    PubMed

    Jacobsen, Ramune; Vadstrup, Eva; Røder, Michael; Frølich, Anne

    2012-01-01

    The main aim of the study was to identify predictors of the effects of lifestyle intervention on type 2 diabetes mellitus patients by means of multivariate analysis. Data were used from a previously published randomised clinical trial, which compared the effects of a rehabilitation programme, including standardised education and physical training sessions in the municipality's health care centre, with individual counseling of the same duration in the diabetes outpatient clinic. Data from 143 diabetes patients were analysed. The merged lifestyle intervention resulted in statistically significant improvements in patients' systolic blood pressure, waist circumference, exercise capacity, glycaemic control, and some aspects of general health-related quality of life. The linear multivariate regression models explained 45% to 80% of the variance in these improvements. In accordance with the logic of the regression-to-the-mean phenomenon, the baseline outcomes were the only statistically significant and robust predictors in all regression models. These results are important from a clinical point of view, as they indicate that patients with worse general and disease-specific health have both a more urgent need for lifestyle intervention and better outcomes following it.

  4. Publication bias in obesity treatment trials?

    PubMed

    Allison, D B; Faith, M S; Gorman, B S

    1996-10-01

    The present investigation examined the extent of publication bias (namely, the tendency to publish significant findings and file away non-significant findings) within the obesity treatment literature. Quantitative literature synthesis of four published meta-analyses from the obesity treatment literature. Interventions in these studies included pharmacological, educational, child, and couples treatments. To assess publication bias, several regression procedures (for example, weighted least-squares, random-effects multi-level modeling, and robust regression methods) were used to regress effect sizes onto their standard errors, or proxies thereof, within each of the four meta-analyses. A significant positive beta weight in these analyses signified publication bias. There was evidence for publication bias within two of the four published meta-analyses, such that reviews of published studies were likely to overestimate clinical efficacy. The lack of evidence for publication bias within the two other meta-analyses might have been due to insufficient statistical power rather than the absence of selection bias. As in other disciplines, publication bias appears to exist in the obesity treatment literature. Suggestions are offered for managing publication bias once identified or reducing its likelihood in the first place.
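
    A minimal sketch of the regression approach described above, assuming numpy and statsmodels: effect sizes are regressed on their standard errors with precision weights, and a significantly positive slope signals publication bias. The effect sizes here are synthetic, with small-study bias deliberately built in.

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(5)
        se = rng.uniform(0.05, 0.4, 30)                      # study standard errors
        effect = 0.2 + 1.5 * se + rng.normal(0, 0.05, 30)    # bias built into the data

        fit = sm.WLS(effect, sm.add_constant(se), weights=1.0 / se**2).fit()
        print("slope:", fit.params[1], "p-value:", fit.pvalues[1])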

  5. Psychosocial variables and time to injury onset: a hurdle regression analysis model.

    PubMed

    Sibold, Jeremy; Zizzi, Samuel

    2012-01-01

    Psychological variables have been shown to be related to athletic injury and time missed from participation in sport. We are unaware of any empirical examination of the influence of psychological variables on time to onset of injury. To examine the influence of orthopaedic and psychosocial variables on time to injury in college athletes. One hundred seventy-seven (men = 116, women = 61; age = 19.45 ± 1.39 years) National Collegiate Athletic Association Division II athletes. Hurdle regression analysis (HRA) was used to determine the influence of predictor variables on days to first injury. Worry (z = 2.98, P = .003), concentration disruption (z = -3.95, P < .001), and negative life-event stress (z = 5.02, P < .001) were robust predictors of days to injury. Orthopaedic risk score was not a predictor (z = 1.28, P = .20). These findings support previous research on the stress-injury relationship, and our group is the first to use HRA in athletic injury data. These data support the addition of psychological screening as part of preseason health examinations for collegiate athletes.
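
    A minimal sketch of a two-part hurdle analysis in the spirit of HRA, assuming statsmodels: a logistic model for whether any event occurs, plus a count model for positive outcomes. For brevity the positive part uses an ordinary Poisson fit rather than a zero-truncated likelihood, and the data are synthetic, not the study's.

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(6)
        stress = rng.normal(size=300)                        # synthetic predictor
        p_any = 1 / (1 + np.exp(-(-0.5 + 0.8 * stress)))     # hurdle probability
        any_event = rng.binomial(1, p_any)
        counts = any_event * (1 + rng.poisson(np.exp(0.5 + 0.3 * stress)))

        X = sm.add_constant(stress)
        hurdle = sm.Logit((counts > 0).astype(int), X).fit(disp=False)
        pos = counts > 0
        count_part = sm.Poisson(counts[pos], X[pos]).fit(disp=False)
        print(hurdle.params, count_part.params)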

  6. Catchments as non-linear filters: evaluating data-driven approaches for spatio-temporal predictions in ungauged basins

    NASA Astrophysics Data System (ADS)

    Bellugi, D. G.; Tennant, C.; Larsen, L.

    2016-12-01

    Catchment and climate heterogeneity complicate prediction of runoff across time and space, and resulting parameter uncertainty can lead to large accumulated errors in hydrologic models, particularly in ungauged basins. Recently, data-driven modeling approaches have been shown to avoid the accumulated uncertainty associated with many physically-based models, providing an appealing alternative for hydrologic prediction. However, the effectiveness of different methods in hydrologically and geomorphically distinct catchments, and the robustness of these methods to changing climate and changing hydrologic processes remain to be tested. Here, we evaluate the use of machine learning techniques to predict daily runoff across time and space using only essential climatic forcing (e.g. precipitation, temperature, and potential evapotranspiration) time series as model input. Model training and testing was done using a high quality dataset of daily runoff and climate forcing data for 25+ years for 600+ minimally-disturbed catchments (drainage area range 5-25,000 km2, median size 336 km2) that cover a wide range of climatic and physical characteristics. Preliminary results using Support Vector Regression (SVR) suggest that in some catchments this nonlinear-based regression technique can accurately predict daily runoff, while the same approach fails in other catchments, indicating that the representation of climate inputs and/or catchment filter characteristics in the model structure need further refinement to increase performance. We bolster this analysis by using Sparse Identification of Nonlinear Dynamics (a sparse symbolic regression technique) to uncover the governing equations that describe runoff processes in catchments where SVR performed well and for ones where it performed poorly, thereby enabling inference about governing processes. This provides a robust means of examining how catchment complexity influences runoff prediction skill, and represents a contribution towards the integration of data-driven inference and physically-based models.
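
    A minimal sketch of the SVR step described above, assuming scikit-learn; the climate forcing inputs and runoff series are synthetic stand-ins for the catchment dataset.

        import numpy as np
        from sklearn.model_selection import train_test_split
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVR

        rng = np.random.default_rng(7)
        X = rng.uniform(size=(2000, 3))           # precipitation, temperature, PET
        runoff = 5 * X[:, 0] ** 2 - 2 * X[:, 1] + rng.normal(0, 0.1, 2000)

        X_tr, X_te, y_tr, y_te = train_test_split(X, runoff, random_state=0)
        model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.05))
        model.fit(X_tr, y_tr)
        print("test R^2:", model.score(X_te, y_te))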

  7. Determinants of efficiency in reducing child mortality in developing countries. The role of inequality and government effectiveness.

    PubMed

    Ortega, Bienvenido; Sanjuán, Jesús; Casquero, Antonio

    2017-12-01

    The main aim of this article was to analyze the relationship of income inequality and government effectiveness with differences in efficiency in the use of health inputs to improve the under-five survival rate (U5SR) in developing countries. Robust Data Envelopment Analysis (DEA) and regression analysis were conducted using data for 47 developing countries for the periods 2000-2004, 2005-2009, and 2010-2012. The estimations show that countries with a more equal income distribution and better government effectiveness (i.e. a more competent bureaucracy and good quality public service delivery) may need fewer health inputs to achieve a specific level of the U5SR than other countries with higher inequality and worse government effectiveness.

  8. Assessment of parametric uncertainty for groundwater reactive transport modeling

    USGS Publications Warehouse

    Shi, Xiaoqing; Ye, Ming; Curtis, Gary P.; Miller, Geoffery L.; Meyer, Philip D.; Kohler, Matthias; Yabusaki, Steve; Wu, Jichun

    2014-01-01

    The validity of using Gaussian assumptions for model residuals in uncertainty quantification of a groundwater reactive transport model was evaluated in this study. Least squares regression methods explicitly assume Gaussian residuals, and the assumption leads to Gaussian likelihood functions, model parameters, and model predictions. While the Bayesian methods do not explicitly require the Gaussian assumption, Gaussian residuals are widely used. This paper shows that the residuals of the reactive transport model are non-Gaussian, heteroscedastic, and correlated in time; characterizing them requires using a generalized likelihood function such as the formal generalized likelihood function developed by Schoups and Vrugt (2010). For the surface complexation model considered in this study for simulating uranium reactive transport in groundwater, parametric uncertainty is quantified using the least squares regression methods and Bayesian methods with both Gaussian and formal generalized likelihood functions. While the least squares methods and Bayesian methods with Gaussian likelihood function produce similar Gaussian parameter distributions, the parameter distributions of Bayesian uncertainty quantification using the formal generalized likelihood function are non-Gaussian. In addition, predictive performance of formal generalized likelihood function is superior to that of least squares regression and Bayesian methods with Gaussian likelihood function. The Bayesian uncertainty quantification is conducted using the differential evolution adaptive metropolis (DREAM(zs)) algorithm; as a Markov chain Monte Carlo (MCMC) method, it is a robust tool for quantifying uncertainty in groundwater reactive transport models. For the surface complexation model, the regression-based local sensitivity analysis and Morris- and DREAM(ZS)-based global sensitivity analysis yield almost identical ranking of parameter importance. The uncertainty analysis may help select appropriate likelihood functions, improve model calibration, and reduce predictive uncertainty in other groundwater reactive transport and environmental modeling.

  9. A Weighted Least Squares Approach To Robustify Least Squares Estimates.

    ERIC Educational Resources Information Center

    Lin, Chowhong; Davenport, Ernest C., Jr.

    This study developed a robust linear regression technique based on the idea of weighted least squares. In this technique, a subsample of the full data of interest is drawn, based on a measure of distance, and an initial set of regression coefficients is calculated. The rest of the data points are then taken into the subsample, one after another,…
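
    Since this abstract is truncated, the sketch below shows only the generic idea of robustifying least squares through observation weights (iteratively reweighted least squares with Huber-type weights and numpy), not the authors' distance-based subsampling scheme.

        import numpy as np

        rng = np.random.default_rng(8)
        x = rng.uniform(0, 10, 100)
        y = 1.0 + 2.0 * x + rng.normal(0, 1, 100)
        y[:5] += 30                                    # gross outliers
        X = np.column_stack([np.ones_like(x), x])

        beta = np.linalg.lstsq(X, y, rcond=None)[0]    # OLS starting values
        for _ in range(20):
            resid = y - X @ beta
            scale = np.median(np.abs(resid)) / 0.6745  # robust scale via MAD
            u = np.abs(resid) / (1.345 * scale)
            w = np.where(u <= 1.0, 1.0, 1.0 / u)       # Huber-type weights
            sw = np.sqrt(w)
            beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        print(beta)                                    # close to (1, 2)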

  10. The capitalized value of rainwater tanks in the property market of Perth, Australia

    NASA Astrophysics Data System (ADS)

    Zhang, Fan; Polyakov, Maksym; Fogarty, James; Pannell, David J.

    2015-03-01

    In response to frequent water shortages, governments in Australia have encouraged home owners to install rainwater tanks, often by provision of partial funding for their installation. A simple investment analysis suggests that the net private benefits of rainwater tanks are negative, potentially providing justification for funding support for tank installation if it results in sufficiently large public benefits. However, using a hedonic price analysis we estimate that there is a premium of up to AU$18,000 built into the sale prices of houses with tanks installed. The premium is likely to be greater than the costs of installation, even allowing for the cost of time that home owners must devote to research, purchase and installation. The premium is likely to reflect non-financial as well as financial benefits from installation. The robustness of our estimated premium is investigated using both bounded regression analysis and simulation methods, and the result is found to be highly robust. The policy implication is that governments should not rely on payments to encourage installation of rainwater tanks, but instead should use information provision as their main mechanism for promoting uptake. Several explanations for the observation that many home owners are apparently leaving benefits on the table are canvassed, but no fully satisfactory explanation is identified.

  11. Solar cycle in current reanalyses: (non)linear attribution study

    NASA Astrophysics Data System (ADS)

    Kuchar, A.; Sacha, P.; Miksovsky, J.; Pisoft, P.

    2014-12-01

    This study focusses on the variability of temperature, ozone and circulation characteristics in the stratosphere and lower mesosphere with regard to the influence of the 11-year solar cycle. It is based on attribution analysis using multiple nonlinear techniques (Support Vector Regression, Neural Networks) besides the traditional linear approach. The analysis was applied to several current reanalysis datasets for the 1979-2013 period, including MERRA, ERA-Interim and JRA-55, with the aim of comparing how this type of data resolves especially the double-peaked solar response in temperature and ozone variables and the consequent changes induced by these anomalies. Equatorial temperature signals in the lower and upper stratosphere were found to be sufficiently robust and in qualitative agreement with previous observational studies. The analysis also pointed to the solar signal in the ozone datasets (i.e. MERRA and ERA-Interim) not being consistent with the observed double-peaked ozone anomaly extracted from satellite measurements. The results obtained by linear regression were confirmed by the nonlinear approach across all datasets, suggesting that linear regression is a relevant tool to sufficiently resolve the solar signal in the middle atmosphere. Furthermore, the seasonal dependence of the solar response was also discussed, mainly as a source of dynamical causalities in the wave propagation characteristics in the zonal wind and the induced meridional circulation in the winter hemispheres. The hypothetical mechanism of a weaker Brewer-Dobson circulation was reviewed, together with a discussion of polar vortex stability.

  12. Robust design optimization method for centrifugal impellers under surface roughness uncertainties due to blade fouling

    NASA Astrophysics Data System (ADS)

    Ju, Yaping; Zhang, Chuhua

    2016-03-01

    Blade fouling has been proved to be a great threat to compressor performance in the operating stage. Current research on fouling-induced performance degradation of centrifugal compressors is based mainly on simplified roughness models that do not take into account realistic factors such as the spatial non-uniformity and randomness of the fouling-induced surface roughness. Moreover, little attention has been paid to the robust design optimization of centrifugal compressor impellers with consideration of blade fouling. In this paper, a multi-objective robust design optimization method is developed for centrifugal impellers under surface roughness uncertainties due to blade fouling. A three-dimensional surface roughness map is proposed to describe the non-uniformity and randomness of realistic fouling accumulations on blades. To lower the computational cost of robust design optimization, the support vector regression (SVR) metamodel is combined with the Monte Carlo simulation (MCS) method to conduct the uncertainty analysis of fouled impeller performance. The analysis shows that the critical fouled region associated with impeller performance degradation lies at the leading edge of the blade tip. The SVR metamodel proved to be an efficient and accurate means of detecting impeller performance variations caused by roughness uncertainties. After design optimization, the robust optimal design is found to be more efficient and less sensitive to fouling uncertainties while maintaining good impeller performance in the clean condition. This research proposes a systematic design optimization method for centrifugal compressors with consideration of blade fouling, providing practical guidance for the design of advanced centrifugal compressors.

  13. Incremental online learning in high dimensions.

    PubMed

    Vijayakumar, Sethu; D'Souza, Aaron; Schaal, Stefan

    2005-12-01

    Locally weighted projection regression (LWPR) is a new algorithm for incremental nonlinear function approximation in high-dimensional spaces with redundant and irrelevant input dimensions. At its core, it employs nonparametric regression with locally linear models. In order to stay computationally efficient and numerically robust, each local model performs the regression analysis with a small number of univariate regressions in selected directions in input space in the spirit of partial least squares regression. We discuss when and how local learning techniques can successfully work in high-dimensional spaces and review the various techniques for local dimensionality reduction before finally deriving the LWPR algorithm. The properties of LWPR are that it (1) learns rapidly with second-order learning methods based on incremental training, (2) uses statistically sound stochastic leave-one-out cross validation for learning without the need to memorize training data, (3) adjusts its weighting kernels based on only local information in order to minimize the danger of negative interference of incremental learning, (4) has a computational complexity that is linear in the number of inputs, and (5) can deal with a large number of possibly redundant inputs, as shown in various empirical evaluations with up to 90-dimensional data sets. For a probabilistic interpretation, predictive variance and confidence intervals are derived. To our knowledge, LWPR is the first truly incremental spatially localized learning method that can successfully and efficiently operate in very high-dimensional spaces.

  14. The regression discontinuity design showed to be a valid alternative to a randomized controlled trial for estimating treatment effects.

    PubMed

    Maas, Iris L; Nolte, Sandra; Walter, Otto B; Berger, Thomas; Hautzinger, Martin; Hohagen, Fritz; Lutz, Wolfgang; Meyer, Björn; Schröder, Johanna; Späth, Christina; Klein, Jan Philipp; Moritz, Steffen; Rose, Matthias

    2017-02-01

    To compare treatment effect estimates obtained from a regression discontinuity (RD) design with results from an actual randomized controlled trial (RCT). Data from an RCT (EVIDENT), which studied the effect of an Internet intervention on depressive symptoms measured with the Patient Health Questionnaire (PHQ-9), were used to perform an RD analysis, in which treatment allocation was determined by a cutoff value at baseline (PHQ-9 = 10). A linear regression model was fitted to the data, selecting participants above the cutoff who had received the intervention (n = 317) and control participants below the cutoff (n = 187). The outcome was the PHQ-9 sum score 12 weeks after baseline. Robustness of the effect estimate was studied; the estimate was compared with the RCT treatment effect. The final regression model showed a regression coefficient of -2.29 [95% confidence interval (CI): -3.72 to -0.85] compared with a treatment effect found in the RCT of -1.57 (95% CI: -2.07 to -1.07). Although the estimates obtained from the two designs are not equal, their confidence intervals overlap, suggesting that an RD design can be a valid alternative to RCTs. This finding is particularly important for situations where an RCT may not be feasible or ethical, as is often the case in clinical research settings. Copyright © 2016 Elsevier Inc. All rights reserved.
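
    A minimal sketch of the regression-discontinuity estimate described above, assuming numpy and statsmodels: treatment is assigned by a baseline cutoff, and the effect is the jump in the outcome regression at that cutoff (a common slope on both sides is assumed for simplicity). The PHQ-9-like data are simulated with a true effect of -2, not the trial's data.

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(9)
        baseline = rng.uniform(4, 16, 500)                 # PHQ-9-like scores
        treated = (baseline >= 10).astype(float)           # cutoff rule
        outcome = 2 + 0.7 * baseline - 2.0 * treated + rng.normal(0, 2, 500)

        X = sm.add_constant(np.column_stack([baseline - 10, treated]))
        fit = sm.OLS(outcome, X).fit()
        print("RD effect estimate:", fit.params[2])        # jump at the cutoff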

  15. Steep delay discounting and addictive behavior: a meta-analysis of continuous associations.

    PubMed

    Amlung, Michael; Vedelago, Lana; Acker, John; Balodis, Iris; MacKillop, James

    2017-01-01

    To synthesize continuous associations between delayed reward discounting (DRD) and both addiction severity and quantity-frequency (QF); to examine moderators of these relationships; and to investigate publication bias. Meta-analysis of published studies examining continuous associations between DRD and addictive behaviors. Published, peer-reviewed studies on addictive behaviors (alcohol, tobacco, cannabis, stimulants, opiates and gambling) were identified via PubMed, MEDLINE and PsycInfo. Studies were restricted to DRD measures of monetary gains. Random-effects meta-analysis was conducted using Pearson's r as the effect size. Publication bias was evaluated using fail-safe N, Begg-Mazumdar and Egger's tests, meta-regression of publication year and effect size and imputation of missing studies. The primary meta-analysis revealed a small magnitude effect size that was highly significant (r = 0.14, P < 10⁻¹⁴). Significantly larger effect sizes were observed for studies examining severity compared with QF (P = 0.01), but not between the type of addictive behavior (P = 0.30) or DRD assessment (P = 0.90). Indices of publication bias suggested a modest impact of unpublished findings. Delayed reward discounting is associated robustly with continuous measures of addiction severity and quantity-frequency. This relation is generally robust across type of addictive behavior and delayed reward discounting assessment modality. © 2016 Society for the Study of Addiction.

  16. Development of a chromatographic method with multi-criteria decision making design for simultaneous determination of nifedipine and atenolol in content uniformity testing.

    PubMed

    Ahmed, Sameh; Alqurshi, Abdulmalik; Mohamed, Abdel-Maaboud Ismail

    2018-07-01

    A new robust and reliable high-performance liquid chromatography (HPLC) method with a multi-criteria decision making (MCDM) approach was developed to allow simultaneous quantification of atenolol (ATN) and nifedipine (NFD) in content uniformity testing. Felodipine (FLD) was used as an internal standard (I.S.) in this study. A novel marriage between a new interactive response optimizer and an HPLC method was suggested for multiple response optimization of target responses. An interactive response optimizer was used as a decision and prediction tool for the optimal settings of target responses, according to specified criteria, based on Derringer's desirability. Four independent variables were considered in this study: acetonitrile percentage, buffer pH, buffer concentration, and column temperature. Eight responses were optimized: the retention times of ATN, NFD, and FLD; the resolutions between ATN/NFD and NFD/FLD; and the plate numbers for ATN, NFD, and FLD. Multiple regression analysis was applied in order to screen the influences of the most significant variables in the regression models. The experimental design was set to give minimum retention times, maximum resolution and maximum plate numbers. The interactive response optimizer allowed prediction of optimum conditions according to these criteria, with a good composite desirability value of 0.98156. The developed method was validated according to the International Conference on Harmonization (ICH) guidelines with the aid of the experimental design. The developed MCDM-HPLC method showed superior robustness and resolution in a short analysis time, allowing successful simultaneous content uniformity testing of ATN and NFD in marketed capsules. The current work presents an interactive response optimizer as an efficient platform to optimize, predict responses, and validate HPLC methodology with a tolerable design space for assays in quality control laboratories. Copyright © 2018 Elsevier B.V. All rights reserved.

  17. Outcomes of an intervention to improve hospital antibiotic prescribing: interrupted time series with segmented regression analysis.

    PubMed

    Ansari, Faranak; Gray, Kirsteen; Nathwani, Dilip; Phillips, Gabby; Ogston, Simon; Ramsay, Craig; Davey, Peter

    2003-11-01

    To evaluate an intervention to reduce inappropriate use of key antibiotics with interrupted time series analysis. The intervention is a policy for appropriate use of Alert Antibiotics (carbapenems, glycopeptides, amphotericin, ciprofloxacin, linezolid, piperacillin-tazobactam and third-generation cephalosporins) implemented through concurrent, patient-specific feedback by clinical pharmacists. Statistical significance and effect size were calculated by segmented regression analysis of interrupted time series of drug use and cost for 2 years before and after the intervention started. Use of Alert Antibiotics increased before the intervention started but decreased steadily for 2 years thereafter. The changes in slope of the time series were 0.27 defined daily doses/100 bed-days per month (95% CI 0.19-0.34) and £1908 per month (95% CI £1238-£2578). The cost of development, dissemination and implementation of the intervention (£20,133) was well below the most conservative estimate of the reduction in cost (£133,296), which is the lower 95% CI of effect size assuming that cost would not have continued to increase without the intervention. However, if use had continued to increase, the difference between predicted and actual cost of Alert Antibiotics was £572,448 (95% CI £435,696-£709,176) over the 24 months after the intervention started. Segmented regression analysis of pharmacy stock data is a simple, practical and robust method for measuring the impact of interventions to change prescribing. The Alert Antibiotic Monitoring intervention was associated with significant decreases in total use and cost in the 2 years after the programme was implemented. In our hospital, the value of the data far exceeded the cost of processing and analysis.
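
    A minimal sketch of the segmented-regression design described above, assuming numpy and statsmodels: level and slope terms before the intervention plus change-in-level and change-in-slope terms after it. The monthly series is simulated to loosely mirror the reported slope change of 0.27 DDD/100 bed-days per month; it is not the study's data.

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(10)
        t = np.arange(48)                                  # months 0..47
        post = (t >= 24).astype(float)                     # intervention at month 24
        use = 10 + 0.3 * t - 0.27 * post * (t - 24) + rng.normal(0, 0.5, 48)

        X = sm.add_constant(np.column_stack([t, post, post * (t - 24)]))
        fit = sm.OLS(use, X).fit()
        print("change in slope:", fit.params[3])           # ~ -0.27 per month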

  18. Comparative Efficacy of Tongxinluo Capsule and Beta-Blockers in Treating Angina Pectoris: Meta-Analysis of Randomized Controlled Trials.

    PubMed

    Jia, Yongliang; Leung, Siu-wai

    2015-11-01

    There have been no systematic reviews, let alone meta-analyses, of randomized controlled trials (RCTs) comparing tongxinluo capsule (TXL) and beta-blockers in treating angina pectoris. This study aimed to evaluate the efficacy of TXL and beta-blockers in treating angina pectoris by a meta-analysis of eligible RCTs. RCTs comparing TXL with beta-blockers (including metoprolol) in treating angina pectoris were searched and retrieved from databases including PubMed, Chinese National Knowledge Infrastructure, and WanFang Data. Eligible RCTs were selected according to prespecified criteria. Meta-analysis was performed on the odds ratios (OR) of symptomatic and electrocardiographic (ECG) improvements after treatment. Subgroup analysis, sensitivity analysis, meta-regression, and publication bias analysis were conducted to evaluate the robustness of the results. Seventy-three RCTs published between 2000 and 2014, with 7424 participants, were eligible. Overall ORs comparing TXL with beta-blockers were 3.40 (95% confidence interval [CI], 2.97-3.89; p<0.0001) for symptomatic improvement and 2.63 (95% CI, 2.29-3.02; p<0.0001) for ECG improvement. Subgroup analysis and sensitivity analysis found no statistically significant dependence of overall ORs on specific study characteristics except efficacy criteria. Meta-regression found no significant moderators except sample size for data on symptomatic improvement. Publication biases were statistically significant. TXL appears to be more effective than beta-blockers in treating angina pectoris, on the basis of the eligible RCTs. Further RCTs are warranted to reduce publication bias and verify efficacy.

  19. Cerebral autoregulation in the preterm newborn using near-infrared spectroscopy: a comparison of time-domain and frequency-domain analyses

    NASA Astrophysics Data System (ADS)

    Eriksen, Vibeke R.; Hahn, Gitte H.; Greisen, Gorm

    2015-03-01

    The aim was to compare two conventional methods used to describe cerebral autoregulation (CA): frequency-domain analysis and time-domain analysis. We measured cerebral oxygenation (as a surrogate for cerebral blood flow) and mean arterial blood pressure (MAP) in 60 preterm infants. In the frequency domain, outcome variables were coherence and gain, whereas the cerebral oximetry index (COx) and the regression coefficient were the outcome variables in the time domain. Correlation between coherence and COx was poor. The disagreement between the two methods was due to the MAP and cerebral oxygenation signals being in counterphase in three cases. High gain and high coherence may arise spuriously when cerebral oxygenation decreases as MAP increases; hence, time-domain analysis appears to be a more robust (and simpler) method to describe CA.
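
    A minimal sketch of the time-domain index (COx) discussed above, assuming numpy: a moving correlation between mean arterial pressure and cerebral oxygenation, computed here on synthetic, partly pressure-passive signals with an illustrative window length.

        import numpy as np

        rng = np.random.default_rng(11)
        n, win = 3000, 300
        map_sig = 40 + np.cumsum(rng.normal(0, 2, n)) * 0.01    # slow MAP drift
        nirs = 0.3 * map_sig + rng.normal(0, 0.5, n)            # pressure-passive NIRS

        cox = np.array([np.corrcoef(map_sig[i:i + win], nirs[i:i + win])[0, 1]
                        for i in range(0, n - win, win)])
        print("mean COx:", cox.mean())    # values near 1 suggest impaired CA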

  20. Argon-oxygen atmospheric pressure plasma treatment on carbon fiber reinforced polymer for improved bonding

    NASA Astrophysics Data System (ADS)

    Chartosias, Marios

    Acceptance of Carbon Fiber Reinforced Polymer (CFRP) structures requires a robust surface preparation method with improved process controls capable of ensuring high bond quality. Surface preparation in a production clean room environment prior to applying adhesive for bonding would minimize the risk of contamination and reduce cost. Plasma treatment is a robust surface preparation process capable of being applied in a production clean room environment, with process parameters that are easily controlled and documented. Repeatable and consistent processing is enabled through the development of a process parameter window utilizing techniques such as Design of Experiments (DOE) tailored to specific adhesive and substrate bonding applications. Insight from the respective plasma treatment Original Equipment Manufacturers (OEMs), together with screening tests, distinguished critical process factors from non-factors and set the associated factor levels prior to execution of the DOE. Results from mode I Double Cantilever Beam (DCB) testing per the ASTM D 5528 [1] standard and DOE statistical analysis software are used to produce a regression model and determine appropriate optimum settings for each factor.

  1. New flux based dose-response relationships for ozone for European forest tree species.

    PubMed

    Büker, P; Feng, Z; Uddling, J; Briolat, A; Alonso, R; Braun, S; Elvira, S; Gerosa, G; Karlsson, P E; Le Thiec, D; Marzuoli, R; Mills, G; Oksanen, E; Wieser, G; Wilkinson, M; Emberson, L D

    2015-11-01

    To derive O3 dose-response relationships (DRR) for five European forest tree species and for broadleaf deciduous and needleleaf tree plant functional types (PFTs), phytotoxic O3 doses (PODy) were related to biomass reductions. PODy was calculated using a stomatal flux model with a range of cut-off thresholds (y) indicative of varying detoxification capacities. Linear regression analysis showed that the DRR for PFTs and individual tree species differed in their robustness. A simplified parameterisation of the flux model was tested and showed that, for most non-Mediterranean tree species, this simplified model led to similarly robust DRR as a species- and climate-region-specific parameterisation. Experimentally induced soil water stress was not found to substantially reduce PODy, mainly due to the short duration of soil water stress periods. This study validates the stomatal O3 flux concept and represents a step forward in predicting O3 damage to forests in a spatially and temporally varying climate. Crown Copyright © 2015. Published by Elsevier Ltd. All rights reserved.

  2. A robust nonparametric framework for reconstruction of stochastic differential equation models

    NASA Astrophysics Data System (ADS)

    Rajabzadeh, Yalda; Rezaie, Amir Hossein; Amindavar, Hamidreza

    2016-05-01

    In this paper, we employ a nonparametric framework to robustly estimate the functional forms of drift and diffusion terms from discrete stationary time series. The proposed method significantly improves the accuracy of parameter estimation. In this framework, the drift and diffusion coefficients are modeled through orthogonal Legendre polynomials. We employ the least squares regression approach, along with the Euler-Maruyama approximation method, to learn the coefficients of the stochastic model. Next, a numerical discrete construction of the mean squared prediction error (MSPE) is established to select the order of the Legendre polynomials in the drift and diffusion terms. We show numerically that the new method is robust against variation in sample size and sampling rate. The performance of our method in comparison with the kernel-based regression (KBR) method is demonstrated through simulation and real data. In the case of real data, we test our method on discriminating healthy electroencephalogram (EEG) signals from epileptic ones. We also demonstrate the efficiency of the method through prediction on financial data. In both simulation and real data, our algorithm outperforms the KBR method.
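
    A minimal sketch of the regression step described above, assuming numpy: Euler-Maruyama drift increments from a simulated Ornstein-Uhlenbeck path are projected onto Legendre polynomials of the rescaled state by least squares; the diffusion term could be handled analogously with squared increments. The process parameters are illustrative.

        import numpy as np
        from numpy.polynomial import legendre

        rng = np.random.default_rng(12)
        dt, n = 0.01, 20000
        x = np.empty(n)
        x[0] = 0.0
        for i in range(n - 1):            # Ornstein-Uhlenbeck: dX = -X dt + 0.5 dW
            x[i + 1] = x[i] - x[i] * dt + 0.5 * np.sqrt(dt) * rng.normal()

        s = x / np.max(np.abs(x))                    # rescale state into [-1, 1]
        drift_obs = np.diff(x) / dt                  # Euler-Maruyama increments
        B = legendre.legvander(s[:-1], 3)            # Legendre basis, degree 3
        coef = np.linalg.lstsq(B, drift_obs, rcond=None)[0]
        print(coef)                                  # dominated by the degree-1 term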

  3. Image interpolation via regularized local linear regression.

    PubMed

    Liu, Xianming; Zhao, Debin; Xiong, Ruiqin; Ma, Siwei; Gao, Wen; Sun, Huifang

    2011-12-01

    The linear regression model is a very attractive tool for designing effective image interpolation schemes. Some regression-based image interpolation algorithms have been proposed in the literature, in which the objective functions are optimized by ordinary least squares (OLS). However, it is shown that interpolation with OLS may have some undesirable properties from a robustness point of view: even small amounts of outliers can dramatically affect the estimates. To address these issues, in this paper we propose a novel image interpolation algorithm based on regularized local linear regression (RLLR). Starting with the linear regression model, replacing the OLS error norm with the moving least squares (MLS) error norm leads to a robust estimator of local image structure. To keep the solution stable and avoid overfitting, we incorporate the ℓ2-norm as the estimator complexity penalty. Moreover, motivated by recent progress on manifold-based semi-supervised learning, we explicitly consider the intrinsic manifold structure by making use of both measured and unmeasured data points. Specifically, our framework incorporates the geometric structure of the marginal probability distribution induced by unmeasured samples as an additional local smoothness preserving constraint. The optimal model parameters can be obtained with a closed-form solution by solving a convex optimization problem. Experimental results on benchmark test images demonstrate that the proposed method achieves very competitive performance with the state-of-the-art interpolation algorithms, especially in image edge structure preservation. © 2011 IEEE

  4. Small-Sample Adjustments for Tests of Moderators and Model Fit in Robust Variance Estimation in Meta-Regression

    ERIC Educational Resources Information Center

    Tipton, Elizabeth; Pustejovsky, James E.

    2015-01-01

    Randomized experiments are commonly used to evaluate the effectiveness of educational interventions. The goal of the present investigation is to develop small-sample corrections for multiple contrast hypothesis tests (i.e., F-tests) such as the omnibus test of meta-regression fit or a test for equality of three or more levels of a categorical…

  5. Comparison of different functional EIT approaches to quantify tidal ventilation distribution.

    PubMed

    Zhao, Zhanqi; Yun, Po-Jen; Kuo, Yen-Liang; Fu, Feng; Dai, Meng; Frerichs, Inez; Möller, Knut

    2018-01-30

    The aim of the study was to examine the pros and cons of different types of functional EIT (fEIT) for quantifying tidal ventilation distribution in a clinical setting. fEIT images were calculated with (1) the standard deviation of the pixel time curve, (2) regression coefficients of global and local impedance time curves, or (3) mean tidal variations. To characterize the temporal heterogeneity of tidal ventilation distribution, another fEIT image of pixel inspiration times is also proposed. fEIT-regression is very robust to signals with different phase information. When the respiratory signal must be distinguished from the heart-beat-related signal, or during high-frequency oscillatory ventilation, fEIT-regression is superior to the other types. fEIT-tidal variation is the most stable image type with regard to baseline shift. We recommend using this type of fEIT image for preliminary evaluation of the acquired EIT data. However, all of these fEITs can be misleading in their assessment of ventilation distribution in the presence of temporal heterogeneity. The analysis software provided by the currently available commercial EIT equipment only offers either the fEIT of standard deviation or that of tidal variation. Considering the pros and cons of each fEIT type, we recommend embedding more types into the analysis software to allow physicians to deal with more complex clinical applications using on-line EIT measurements.

  6. Use of Thematic Mapper for water quality assessment

    NASA Technical Reports Server (NTRS)

    Horn, E. M.; Morrissey, L. A.

    1984-01-01

    The evaluation of simulated TM data obtained on an ER-2 aircraft at twenty-five predesignated sample sites for mapping water quality factors such as conductivity, pH, suspended solids, turbidity, temperature, and depth is discussed. Using multiple regression on the seven TM bands, an equation is developed for suspended solids. TM bands 1, 2, 3, 4, and 6 are used with the logarithm of conductivity in a multiple regression. The regression equations are assessed for a high coefficient of determination (R-squared) and statistical significance. Confidence intervals about the mean regression point are calculated to assess the robustness of the regressions used for mapping conductivity, turbidity, and suspended solids, and cross-validation is conducted by regressing random subsamples of sites and comparing the resulting range of R-squared values.
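
    The subsample robustness check described above can be sketched in a few lines. Assuming synthetic stand-ins for the band values and for log-conductivity (the real site data are not reproduced here), the spread of R-squared over random subsamples indicates how fragile the full-model fit is:

    ```python
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import r2_score

    rng = np.random.default_rng(2)
    n_sites = 25
    bands = rng.normal(size=(n_sites, 5))          # stand-ins for TM bands 1-4 and 6
    log_cond = bands @ np.array([0.8, 0.3, -0.5, 0.2, 0.4]) + rng.normal(0, 0.3, n_sites)

    full = LinearRegression().fit(bands, log_cond)
    print("full-model R^2:", round(r2_score(log_cond, full.predict(bands)), 3))

    # robustness check: spread of R^2 over random subsamples of sites
    r2s = []
    for _ in range(200):
        idx = rng.choice(n_sites, size=18, replace=False)
        m = LinearRegression().fit(bands[idx], log_cond[idx])
        r2s.append(r2_score(log_cond[idx], m.predict(bands[idx])))
    print("subsample R^2 range:", round(min(r2s), 3), "to", round(max(r2s), 3))
    ```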

  7. Development and optimization of SPE-HPLC-UV/ELSD for simultaneous determination of nine bioactive components in Shenqi Fuzheng Injection based on Quality by Design principles.

    PubMed

    Wang, Lu; Qu, Haibin

    2016-03-01

    A method combining solid phase extraction, high performance liquid chromatography, and ultraviolet/evaporative light scattering detection (SPE-HPLC-UV/ELSD) was developed according to Quality by Design (QbD) principles and used to assay nine bioactive compounds within a botanical drug, Shenqi Fuzheng Injection. Risk assessment and a Plackett-Burman design were utilized to evaluate the impact of 11 factors on the resolutions and signal-to-noise ratios of chromatographic peaks. Multiple regression and Pareto ranking analysis indicated that the sorbent mass, sample volume, flow rate, column temperature, evaporator temperature, and gas flow rate were statistically significant (p < 0.05) in this procedure. Furthermore, a Box-Behnken design combined with response surface analysis was employed to study the relationships between the quality of SPE-HPLC-UV/ELSD analysis and the four significant factors, i.e., flow rate, column temperature, evaporator temperature, and gas flow rate. An analytical design space for SPE-HPLC-UV/ELSD was then constructed from calculated Monte Carlo probabilities. In the presented approach, the operating parameters of sample preparation, chromatographic separation, and compound detection were investigated simultaneously. Eight elements of method validation, i.e., system-suitability tests, method robustness/ruggedness, sensitivity, precision, repeatability, linearity, accuracy, and stability, were accomplished at a selected working point. These results revealed that the QbD principles were suitable for the development of analytical procedures for samples in complex matrices. Meanwhile, the analytical quality and method robustness were validated by the analytical design space. The presented strategy provides a tutorial on the development of a robust QbD-compliant quantitative method for samples in complex matrices.

  8. Robust estimation for partially linear models with large-dimensional covariates

    PubMed Central

    Zhu, LiPing; Li, RunZe; Cui, HengJian

    2014-01-01

    We are concerned with robust estimation procedures for the parameters in partially linear models with large-dimensional covariates. To enhance interpretability, we suggest implementing a nonconcave regularization method in the robust estimation procedure to select important covariates from the linear component. We establish consistency for both the linear and the nonlinear components when the covariate dimension diverges at the rate of o(n), where n is the sample size. We show that the robust estimate of the linear component performs asymptotically as well as its oracle counterpart, which assumes that the baseline function and the unimportant covariates are known a priori. With a consistent estimator of the linear component, we estimate the nonparametric component by a robust local linear regression. It is proved that the robust estimate of the nonlinear component performs asymptotically as well as if the linear component were known in advance. Comprehensive simulation studies are carried out and an application is presented to examine the finite-sample performance of the proposed procedures. PMID:24955087

  9. Robust estimation for partially linear models with large-dimensional covariates.

    PubMed

    Zhu, LiPing; Li, RunZe; Cui, HengJian

    2013-10-01

    We are concerned with robust estimation procedures for the parameters in partially linear models with large-dimensional covariates. To enhance interpretability, we suggest implementing a nonconcave regularization method in the robust estimation procedure to select important covariates from the linear component. We establish consistency for both the linear and the nonlinear components when the covariate dimension diverges at the rate of o(n), where n is the sample size. We show that the robust estimate of the linear component performs asymptotically as well as its oracle counterpart, which assumes that the baseline function and the unimportant covariates are known a priori. With a consistent estimator of the linear component, we estimate the nonparametric component by a robust local linear regression. It is proved that the robust estimate of the nonlinear component performs asymptotically as well as if the linear component were known in advance. Comprehensive simulation studies are carried out and an application is presented to examine the finite-sample performance of the proposed procedures.
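
    The nonparametric stage described in these two records, a local linear regression made robust to outliers, can be sketched as a kernel-weighted fit with Huber-type iteratively reweighted least squares. This is a hedged illustration of the general technique, not the authors' estimator; the bandwidth, tuning constant, and demo data are assumptions.

    ```python
    import numpy as np

    def robust_local_linear(x, y, x0, h=0.3, c=1.345, iters=10):
        """Local linear estimate at x0: Gaussian kernel times Huber IRLS weights."""
        k = np.exp(-0.5 * ((x - x0) / h) ** 2)        # kernel localization weights
        X = np.column_stack([np.ones_like(x), x - x0])
        w = k.copy()
        beta = np.zeros(2)
        for _ in range(iters):
            A = X.T @ (w[:, None] * X)
            beta = np.linalg.solve(A, X.T @ (w * y))
            r = y - X @ beta
            s = np.median(np.abs(r)) / 0.6745 + 1e-12 # robust scale via MAD
            u = np.abs(r) / (c * s)
            w = k * np.where(u <= 1.0, 1.0, 1.0 / u)  # Huber downweighting
        return beta[0]

    rng = np.random.default_rng(4)
    x = np.sort(rng.uniform(0, 1, 200))
    y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, 200)
    y[::25] += 5.0                                    # gross outliers
    print(robust_local_linear(x, y, 0.5))             # near the true value 0
    ```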

  10. PREDICTION OF MALIGNANT BREAST LESIONS FROM MRI FEATURES: A COMPARISON OF ARTIFICIAL NEURAL NETWORK AND LOGISTIC REGRESSION TECHNIQUES

    PubMed Central

    McLaren, Christine E.; Chen, Wen-Pin; Nie, Ke; Su, Min-Ying

    2009-01-01

    Rationale and Objectives Dynamic contrast-enhanced MRI (DCE-MRI) is a clinical imaging modality for the detection and diagnosis of breast lesions. Analytical methods were compared for diagnostic feature selection and lesion classification performance in differentiating between malignant and benign lesions in patients. Materials and Methods The study included 43 malignant and 28 benign histologically proven lesions. Eight morphological parameters, ten gray level co-occurrence matrix (GLCM) texture features, and fourteen Laws’ texture features were obtained using automated lesion segmentation and quantitative feature extraction. Artificial neural network (ANN) and logistic regression analysis were compared for selection of the best predictors of malignant lesions among the normalized features. Results Using ANN, the final four selected features were compactness, energy, homogeneity, and Law_LS, with an area under the receiver operating characteristic curve (AUC) of 0.82 and an accuracy of 0.76. The diagnostic performance of these four features, computed using logistic regression, yielded AUC = 0.80 (95% CI, 0.688 to 0.905), similar to that of the ANN. The analysis also shows that the odds of a malignant lesion decreased by 48% (95% CI, 25% to 92%) for every increase of 1 SD in the Law_LS feature, adjusted for differences in compactness, energy, and homogeneity. Using logistic regression with z-score transformation, a model comprising compactness, NRL entropy, and gray level sum average was selected; it had the highest overall accuracy (0.75) among all models, with AUC = 0.77 (95% CI, 0.660 to 0.880). When logistic modeling of transformations using the Box-Cox method was performed, the most parsimonious model, with predictors compactness and Law_LS, had an AUC of 0.79 (95% CI, 0.672 to 0.898). Conclusion The diagnostic performance of the models selected by ANN and logistic regression was similar. The analytic methods were found to be roughly equivalent in terms of predictive ability when a small number of variables were chosen. The robust ANN methodology utilizes a sophisticated nonlinear model, while logistic regression analysis provides insightful information that enhances interpretation of the model features. PMID:19409817

  11. A secure distributed logistic regression protocol for the detection of rare adverse drug events

    PubMed Central

    El Emam, Khaled; Samet, Saeed; Arbuckle, Luk; Tamblyn, Robyn; Earle, Craig; Kantarcioglu, Murat

    2013-01-01

    Background There is limited capacity to assess the comparative risks of medications after they enter the market. For rare adverse events, the pooling of data from multiple sources is necessary to have the power and sufficient population heterogeneity to detect differences in safety and effectiveness in genetic, ethnic and clinically defined subpopulations. However, combining datasets from different data custodians or jurisdictions to perform an analysis on the pooled data creates significant privacy concerns that would need to be addressed. Existing protocols for addressing these concerns can result in reduced analysis accuracy and can allow sensitive information to leak. Objective To develop a secure distributed multi-party computation protocol for logistic regression that provides strong privacy guarantees. Methods We developed a secure distributed logistic regression protocol using a single analysis center with multiple sites providing data. A theoretical security analysis demonstrates that the protocol is robust to plausible collusion attacks and does not allow the parties to gain new information from the data that are exchanged among them. The computational performance and accuracy of the protocol were evaluated on simulated datasets. Results The computational performance scales linearly as the dataset sizes increase. The addition of sites results in an exponential growth in computation time. However, for up to five sites, the time is still short and would not affect practical applications. The model parameters are the same as the results on pooled raw data analyzed in SAS, demonstrating high model accuracy. Conclusion The proposed protocol and prototype system would allow the development of logistic regression models in a secure manner without requiring the sharing of personal health information. This can alleviate one of the key barriers to the establishment of large-scale post-marketing surveillance programs. We extended the secure protocol to account for correlations among patients within sites through generalized estimating equations, and to accommodate other link functions by extending it to generalized linear models. PMID:22871397

  12. A secure distributed logistic regression protocol for the detection of rare adverse drug events.

    PubMed

    El Emam, Khaled; Samet, Saeed; Arbuckle, Luk; Tamblyn, Robyn; Earle, Craig; Kantarcioglu, Murat

    2013-05-01

    There is limited capacity to assess the comparative risks of medications after they enter the market. For rare adverse events, the pooling of data from multiple sources is necessary to have the power and sufficient population heterogeneity to detect differences in safety and effectiveness in genetic, ethnic and clinically defined subpopulations. However, combining datasets from different data custodians or jurisdictions to perform an analysis on the pooled data creates significant privacy concerns that would need to be addressed. Existing protocols for addressing these concerns can result in reduced analysis accuracy and can allow sensitive information to leak. To develop a secure distributed multi-party computation protocol for logistic regression that provides strong privacy guarantees. We developed a secure distributed logistic regression protocol using a single analysis center with multiple sites providing data. A theoretical security analysis demonstrates that the protocol is robust to plausible collusion attacks and does not allow the parties to gain new information from the data that are exchanged among them. The computational performance and accuracy of the protocol were evaluated on simulated datasets. The computational performance scales linearly as the dataset sizes increase. The addition of sites results in an exponential growth in computation time. However, for up to five sites, the time is still short and would not affect practical applications. The model parameters are the same as the results on pooled raw data analyzed in SAS, demonstrating high model accuracy. The proposed protocol and prototype system would allow the development of logistic regression models in a secure manner without requiring the sharing of personal health information. This can alleviate one of the key barriers to the establishment of large-scale post-marketing surveillance programs. We extended the secure protocol to account for correlations among patients within sites through generalized estimating equations, and to accommodate other link functions by extending it to generalized linear models.

  13. Inequality and adolescent violence: an exploration of community, family, and individual factors.

    PubMed Central

    Bruce, Marino A.

    2004-01-01

    PURPOSE: The study seeks to examine whether the relationships among community, family, individual factors, and violent behavior are parallel across race- and gender-specific segments of the adolescent population. METHODS: Data from the National Longitudinal Study of Adolescent Health are analyzed to highlight the complex relationships between inequality, community, family, individual behavior, and violence. RESULTS: The results from robust regression analysis provide evidence that social environmental factors can influence adolescent violence in race- and gender-specific ways. CONCLUSIONS: Findings from this study establish the plausibility of multidimensional models that specify a complex relationship between inequality and adolescent violence. PMID:15101669

  14. Working covariance model selection for generalized estimating equations.

    PubMed

    Carey, Vincent J; Wang, You-Gan

    2011-11-20

    We investigate methods for data-based selection of working covariance models in the analysis of correlated data with generalized estimating equations. We study two selection criteria: Gaussian pseudolikelihood and a geodesic distance based on discrepancy between model-sensitive and model-robust regression parameter covariance estimators. The Gaussian pseudolikelihood is found in simulation to be reasonably sensitive for several response distributions and noncanonical mean-variance relations for longitudinal data. Application is also made to a clinical dataset. Assessment of adequacy of both correlation and variance models for longitudinal data should be routine in applications, and we describe open-source software supporting this practice. Copyright © 2011 John Wiley & Sons, Ltd.
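
    For the Gaussian pseudolikelihood criterion, a rough sketch is possible with the GEE implementation in statsmodels: fit the same mean model under two working covariances and compare the pseudolikelihood of each under its own working structure. The simulated data, the manual exchangeable-block computation, and the simplifications below are assumptions, not the authors' code.

    ```python
    import numpy as np
    import statsmodels.api as sm

    # simulated clustered data: a shared cluster effect induces exchangeable correlation
    rng = np.random.default_rng(3)
    n_groups, n_per = 100, 4
    groups = np.repeat(np.arange(n_groups), n_per)
    x = rng.normal(size=n_groups * n_per)
    y = 1.0 + 2.0 * x + rng.normal(0, 0.7, n_groups)[groups] + rng.normal(0, 1.0, len(x))
    X = sm.add_constant(x)

    def gaussian_pseudologlik(res, y, groups, rho):
        """Gaussian pseudolikelihood of a fitted GEE, exchangeable working covariance."""
        r = np.asarray(y - res.fittedvalues)
        ll = 0.0
        for gi in np.unique(groups):
            ri = r[groups == gi]
            V = res.scale * ((1.0 - rho) * np.eye(len(ri)) + rho)  # exchangeable block
            _, logdet = np.linalg.slogdet(V)
            ll -= 0.5 * (logdet + ri @ np.linalg.solve(V, ri))
        return ll

    for name, cs in [("independence", sm.cov_struct.Independence()),
                     ("exchangeable", sm.cov_struct.Exchangeable())]:
        res = sm.GEE(y, X, groups=groups, family=sm.families.Gaussian(),
                     cov_struct=cs).fit()
        rho = float(np.ravel(res.cov_struct.dep_params)[0]) if name == "exchangeable" else 0.0
        print(name, round(gaussian_pseudologlik(res, y, groups, rho), 1))
    ```

    The working structure with the higher pseudolikelihood is preferred; on data like these the exchangeable model should win.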

  15. Extreme Sparse Multinomial Logistic Regression: A Fast and Robust Framework for Hyperspectral Image Classification

    NASA Astrophysics Data System (ADS)

    Cao, Faxian; Yang, Zhijing; Ren, Jinchang; Ling, Wing-Kuen; Zhao, Huimin; Marshall, Stephen

    2017-12-01

    Although sparse multinomial logistic regression (SMLR) provides a useful tool for sparse classification, it deals inefficiently with high-dimensional features and relies on manually set initial regressor values. This has significantly constrained its application to hyperspectral image (HSI) classification. To tackle these two drawbacks, an extreme sparse multinomial logistic regression (ESMLR) is proposed for effective classification of HSI. First, the HSI dataset is projected to a new feature space with randomly generated weight and bias. Second, an optimization model is established by the Lagrange multiplier method and the dual principle to automatically determine a good initial regressor for SMLR by minimizing the training error and the regressor value. Furthermore, extended multi-attribute profiles (EMAPs) are utilized to extract both spectral and spatial features. A combinational linear multiple features learning (MFL) method is proposed to further enhance the features extracted by ESMLR and EMAPs. Finally, logistic regression via variable splitting and augmented Lagrangian (LORSAL) is adopted in the proposed framework to reduce the computational time. Experiments conducted on two well-known HSI datasets, the Indian Pines dataset and the Pavia University dataset, demonstrate the fast and robust performance of the proposed ESMLR framework.
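
    Two of the ingredients combined above, a random feature projection and an l1-penalized (sparse) multinomial logistic regression, can be sketched with scikit-learn. This is a hedged illustration on synthetic data, not the ESMLR optimization itself; the projection size and regularization strength are assumptions.

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=600, n_features=100, n_informative=12,
                               n_classes=4, n_clusters_per_class=1, random_state=0)
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

    # random projection with generated weight and bias (ELM-style feature mapping)
    rng = np.random.default_rng(0)
    W = rng.normal(size=(X.shape[1], 200))
    b = rng.normal(size=200)
    phi = lambda Z: np.tanh(Z @ W + b)

    # sparse (l1-penalized) multinomial logistic regression on projected features
    clf = LogisticRegression(penalty="l1", solver="saga", C=0.5, max_iter=5000)
    clf.fit(phi(Xtr), ytr)
    print("test accuracy:", round(clf.score(phi(Xte), yte), 3))
    print("nonzero coefficients:", int(np.count_nonzero(clf.coef_)))
    ```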

  16. Robust discovery of genetic associations incorporating gene-environment interaction and independence.

    PubMed

    Tchetgen Tchetgen, Eric

    2011-03-01

    This article considers the detection and evaluation of genetic effects incorporating gene-environment interaction and independence. Whereas ordinary logistic regression cannot exploit the assumption of gene-environment independence, the proposed approach makes explicit use of the independence assumption to improve estimation efficiency. This method, which uses both cases and controls, fits a constrained retrospective regression in which the genetic variant plays the role of the response variable, and the disease indicator and the environmental exposure are the independent variables. The regression model constrains the association of the environmental exposure with the genetic variant among the controls to be null, thus explicitly encoding the gene-environment independence assumption, which yields substantial gain in accuracy in the evaluation of genetic effects. The proposed retrospective regression approach has several advantages. It is easy to implement with standard software, and it readily accounts for multiple environmental exposures of a polytomous or of a continuous nature, while easily incorporating extraneous covariates. Unlike the profile likelihood approach of Chatterjee and Carroll (Biometrika. 2005;92:399-418), the proposed method does not require a model for the association of a polytomous or continuous exposure with the disease outcome, and, therefore, it is agnostic to the functional form of such a model and completely robust to its possible misspecification.

  17. Assessing Principal Component Regression Prediction of Neurochemicals Detected with Fast-Scan Cyclic Voltammetry

    PubMed Central

    2011-01-01

    Principal component regression is a multivariate data analysis approach routinely used to predict neurochemical concentrations from in vivo fast-scan cyclic voltammetry measurements. This mathematical procedure can rapidly be employed with present day computer programming languages. Here, we evaluate several methods that can be used to evaluate and improve multivariate concentration determination. The cyclic voltammetric representation of the calculated regression vector is shown to be a valuable tool in determining whether the calculated multivariate model is chemically appropriate. The use of Cook’s distance successfully identified outliers contained within in vivo fast-scan cyclic voltammetry training sets. This work also presents the first direct interpretation of a residual color plot and demonstrated the effect of peak shifts on predicted dopamine concentrations. Finally, separate analyses of smaller increments of a single continuous measurement could not be concatenated without substantial error in the predicted neurochemical concentrations due to electrode drift. Taken together, these tools allow for the construction of more robust multivariate calibration models and provide the first approach to assess the predictive ability of a procedure that is inherently impossible to validate because of the lack of in vivo standards. PMID:21966586

  18. Comparison of Total Solar Irradiance with NASA/NSO Spectromagnetograph Data in Solar Cycles 22 and 23

    NASA Technical Reports Server (NTRS)

    Jones, Harrison P.; Branston, Detrick D.; Jones, Patricia B.; Popescu, Miruna D.

    2002-01-01

    An earlier study compared NASA/NSO Spectromagnetograph (SPM) data with spacecraft measurements of total solar irradiance (TSI) variations over a 1.5 year period in the declining phase of solar cycle 22. This paper extends the analysis to an eight-year period which also spans the rising and early maximum phases of cycle 23. The conclusions of the earlier work appear to be robust: three factors (sunspots, strong unipolar regions, and strong mixed polarity regions) describe most of the variation in the SPM record, but only the first two are associated with TSI. Additionally, the residuals of a linear multiple regression of TSI against SPM observations over the entire eight-year period show an unexplained, increasing, linear time variation with a rate of about 0.05 W m^-2 per year. Separate regressions for the periods before and after 1996 January 01 show no unexplained trends but differ substantially in regression parameters. This behavior may reflect a solar source of TSI variations beyond sunspots and faculae but more plausibly results from uncompensated non-solar effects in one or both of the TSI and SPM data sets.

  19. Assessing principal component regression prediction of neurochemicals detected with fast-scan cyclic voltammetry.

    PubMed

    Keithley, Richard B; Wightman, R Mark

    2011-06-07

    Principal component regression is a multivariate data analysis approach routinely used to predict neurochemical concentrations from in vivo fast-scan cyclic voltammetry measurements. This mathematical procedure can rapidly be employed with present day computer programming languages. Here, we evaluate several methods that can be used to evaluate and improve multivariate concentration determination. The cyclic voltammetric representation of the calculated regression vector is shown to be a valuable tool in determining whether the calculated multivariate model is chemically appropriate. The use of Cook's distance successfully identified outliers contained within in vivo fast-scan cyclic voltammetry training sets. This work also presents the first direct interpretation of a residual color plot and demonstrated the effect of peak shifts on predicted dopamine concentrations. Finally, separate analyses of smaller increments of a single continuous measurement could not be concatenated without substantial error in the predicted neurochemical concentrations due to electrode drift. Taken together, these tools allow for the construction of more robust multivariate calibration models and provide the first approach to assess the predictive ability of a procedure that is inherently impossible to validate because of the lack of in vivo standards.
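
    The two steps highlighted in this record, principal component regression and Cook's distance screening of the training set, can be sketched as follows. The voltammogram matrix here is synthetic and the 4/n flagging threshold is one common rule of thumb; neither is taken from the paper.

    ```python
    import numpy as np
    import statsmodels.api as sm
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(5)
    V = rng.normal(size=(120, 50))                   # rows: training voltammograms
    conc = V[:, :3] @ np.array([1.0, -0.5, 0.8]) + rng.normal(0, 0.1, 120)
    V[5] += 8.0                                      # corrupt one training scan

    scores = PCA(n_components=4).fit_transform(V)    # principal component regression
    ols = sm.OLS(conc, sm.add_constant(scores)).fit()

    cooks = ols.get_influence().cooks_distance[0]    # Cook's distance per observation
    print("suspected outliers:", np.where(cooks > 4.0 / len(conc))[0])
    ```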

  20. Data-driven discovery of partial differential equations.

    PubMed

    Rudy, Samuel H; Brunton, Steven L; Proctor, Joshua L; Kutz, J Nathan

    2017-04-01

    We propose a sparse regression method capable of discovering the governing partial differential equation(s) of a given system by time series measurements in the spatial domain. The regression framework relies on sparsity-promoting techniques to select the nonlinear and partial derivative terms of the governing equations that most accurately represent the data, bypassing a combinatorially large search through all possible candidate models. The method balances model complexity and regression accuracy by selecting a parsimonious model via Pareto analysis. Time series measurements can be made in an Eulerian framework, where the sensors are fixed spatially, or in a Lagrangian framework, where the sensors move with the dynamics. The method is computationally efficient, robust, and demonstrated to work on a variety of canonical problems spanning a number of scientific domains including Navier-Stokes, the quantum harmonic oscillator, and the diffusion equation. Moreover, the method is capable of disambiguating between potentially nonunique dynamical terms by using multiple time series taken with different initial data. Thus, for a traveling wave, the method can distinguish between a linear wave equation and the Korteweg-de Vries equation, for instance. The method provides a promising new technique for discovering governing equations and physical laws in parameterized spatiotemporal systems, where first-principles derivations are intractable.
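
    The sparsity-promoting selection step can be illustrated with sequentially thresholded least squares, one common sparse-regression routine for this task (the paper's actual algorithm and candidate library are richer). The library below is a synthetic stand-in rather than derivatives computed from PDE data.

    ```python
    import numpy as np

    def stlsq(Theta, dudt, lam=0.05, iters=10):
        """Sequentially thresholded least squares: sparse xi with Theta @ xi ~ dudt."""
        xi = np.linalg.lstsq(Theta, dudt, rcond=None)[0]
        for _ in range(iters):
            small = np.abs(xi) < lam
            xi[small] = 0.0
            keep = ~small
            if keep.any():
                xi[keep] = np.linalg.lstsq(Theta[:, keep], dudt, rcond=None)[0]
        return xi

    # toy candidate library; the true dynamics use only u_xx and u*u_x
    rng = np.random.default_rng(6)
    names = ["1", "u", "u_x", "u_xx", "u*u_x"]
    Theta = rng.normal(size=(500, 5))
    dudt = Theta @ np.array([0.0, 0.0, 0.0, 0.1, -1.0]) + rng.normal(0, 0.01, 500)
    for name, c in zip(names, stlsq(Theta, dudt)):
        print(name, round(c, 3))
    ```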

  1. Effect of soy isoflavone supplementation on plasma lipoprotein(a) concentrations: A meta-analysis.

    PubMed

    Simental-Mendía, Luis E; Gotto, Antonio M; Atkin, Stephen L; Banach, Maciej; Pirro, Matteo; Sahebkar, Amirhossein

    Soy supplementation has been shown to reduce total and low-density lipoprotein cholesterol, while increasing high-density lipoprotein cholesterol. However, contradictory effects of soy isoflavone supplementation on lipoprotein(a) [Lp(a)] have been reported, suggesting the need for a meta-analysis. The aim of the study was to investigate the impact of supplementation with soy isoflavones on plasma Lp(a) levels through a systematic review and meta-analysis of eligible randomized placebo-controlled trials. The search included the PubMed-Medline, Scopus, ISI Web of Knowledge, and Google Scholar databases (up to March 26, 2017), and the quality of studies was evaluated according to Cochrane criteria. Quantitative data synthesis was performed using a random-effects model, with standardized mean difference and 95% confidence interval as summary statistics. Meta-regression and leave-one-out sensitivity analysis were performed to assess the modifiers of treatment response. Ten eligible studies comprising 11 treatment arms with 973 subjects were selected for the meta-analysis. Meta-analysis did not suggest any significant alteration of plasma Lp(a) levels after supplementation with soy isoflavones (standardized mean difference: 0.08, 95% confidence interval: -0.05, 0.20, P = .228). The effect size was robust in the leave-one-out sensitivity analysis. In meta-regression analysis, neither dose nor duration of supplementation with soy isoflavones was significantly associated with the effect size. This meta-analysis of the 10 available randomized placebo-controlled trials revealed no significant effect of soy isoflavone treatment on plasma Lp(a) concentrations. Copyright © 2017 National Lipid Association. Published by Elsevier Inc. All rights reserved.
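
    A random-effects pooling of standardized mean differences, as used here, follows the DerSimonian-Laird recipe sketched below. The effect sizes and standard errors in the demo are illustrative placeholders, not the values from the ten included trials.

    ```python
    import numpy as np

    def random_effects_meta(smd, se):
        """DerSimonian-Laird random-effects pooling of standardized mean differences."""
        smd, se = np.asarray(smd), np.asarray(se)
        w = 1.0 / se**2                                  # fixed-effect weights
        mu_fe = np.sum(w * smd) / np.sum(w)
        Q = np.sum(w * (smd - mu_fe) ** 2)               # Cochran's Q heterogeneity
        tau2 = max(0.0, (Q - (len(smd) - 1)) /
                   (np.sum(w) - np.sum(w**2) / np.sum(w)))
        ws = 1.0 / (se**2 + tau2)                        # random-effects weights
        mu = np.sum(ws * smd) / np.sum(ws)
        half = 1.96 * np.sqrt(1.0 / np.sum(ws))
        return mu, (mu - half, mu + half), tau2

    # illustrative per-trial effect sizes and standard errors (not the real trials)
    smd = [0.11, -0.02, 0.25, 0.05, 0.18, -0.07, 0.09, 0.02, 0.14, 0.04]
    se = [0.12, 0.15, 0.20, 0.10, 0.18, 0.14, 0.11, 0.16, 0.13, 0.12]
    mu, ci, tau2 = random_effects_meta(smd, se)
    print("pooled SMD:", round(mu, 3), "95% CI:", np.round(ci, 3), "tau^2:", round(tau2, 4))
    ```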

  2. 4D-LQTA-QSAR and docking study on potent Gram-negative specific LpxC inhibitors: a comparison to CoMFA modeling.

    PubMed

    Ghasemi, Jahan B; Safavi-Sohi, Reihaneh; Barbosa, Euzébio G

    2012-02-01

    A quasi 4D-QSAR has been carried out on a series of potent Gram-negative LpxC inhibitors. This approach makes use of the molecular dynamics (MD) trajectories and topology information retrieved from the GROMACS package. The methodology is based on the generation of a conformational ensemble profile (CEP) for each compound instead of a single conformation, followed by the calculation of intermolecular interaction energies at each grid point, considering probes and all aligned conformations resulting from the MD simulations. These interaction energies are the independent variables employed in the QSAR analysis. The proposed methodology was compared to the comparative molecular field analysis (CoMFA) formalism; it jointly explores the main features of CoMFA and 4D-QSAR models. Stepwise multiple linear regression was used to select the most informative variables. After variable selection, multiple linear regression (MLR) and partial least squares (PLS) methods were used to build the regression models. Leave-N-out cross-validation (LNO) and Y-randomization were performed to confirm the robustness of the model, in addition to analysis of an independent test set. The best models provided the following statistics: [Formula in text] (PLS) and [Formula in text] (MLR). A docking study was performed to investigate the major interactions in the protein-ligand complex with the CDOCKER algorithm. Visualization of the descriptors of the best model helps to interpret the model from a chemical point of view, supporting the applicability of this new approach in rational drug design.
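
    Y-randomization, mentioned above as a robustness check, refits the model on shuffled responses and verifies that the fit collapses. A minimal sketch with a PLS model on synthetic descriptors (the matrix sizes and component count are assumptions) looks like this:

    ```python
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.metrics import r2_score

    rng = np.random.default_rng(7)
    X = rng.normal(size=(60, 30))                        # descriptor matrix
    y = X[:, :5] @ rng.normal(size=5) + rng.normal(0, 0.5, 60)

    pls = PLSRegression(n_components=3).fit(X, y)
    real_r2 = r2_score(y, pls.predict(X).ravel())

    # Y-randomization: refitting on shuffled responses should destroy the fit
    null_r2 = []
    for _ in range(100):
        y_perm = rng.permutation(y)
        m = PLSRegression(n_components=3).fit(X, y_perm)
        null_r2.append(r2_score(y_perm, m.predict(X).ravel()))
    print("real R^2:", round(real_r2, 3), " max shuffled R^2:", round(max(null_r2), 3))
    ```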

  3. Copula Regression Analysis of Simultaneously Recorded Frontal Eye Field and Inferotemporal Spiking Activity during Object-Based Working Memory

    PubMed Central

    Hu, Meng; Clark, Kelsey L.; Gong, Xiajing; Noudoost, Behrad; Li, Mingyao; Moore, Tirin

    2015-01-01

    Inferotemporal (IT) neurons are known to exhibit persistent, stimulus-selective activity during the delay period of object-based working memory tasks. Frontal eye field (FEF) neurons show robust, spatially selective delay period activity during memory-guided saccade tasks. We present a copula regression paradigm to examine neural interaction of these two types of signals between areas IT and FEF of the monkey during a working memory task. This paradigm is based on copula models that can account for both marginal distribution over spiking activity of individual neurons within each area and joint distribution over ensemble activity of neurons between areas. Considering the popular GLMs as marginal models, we developed a general and flexible likelihood framework that uses the copula to integrate separate GLMs into a joint regression analysis. Such joint analysis essentially leads to a multivariate analog of the marginal GLM theory and hence efficient model estimation. In addition, we show that Granger causality between spike trains can be readily assessed via the likelihood ratio statistic. The performance of this method is validated by extensive simulations, and compared favorably to the widely used GLMs. When applied to spiking activity of simultaneously recorded FEF and IT neurons during working memory task, we observed significant Granger causality influence from FEF to IT, but not in the opposite direction, suggesting the role of the FEF in the selection and retention of visual information during working memory. The copula model has the potential to provide unique neurophysiological insights about network properties of the brain. PMID:26063909
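
    The likelihood-ratio assessment of Granger causality between spike trains can be sketched with ordinary Poisson GLMs: compare a model for one neuron's counts with and without the other neuron's lagged activity. This strips out the paper's copula layer, which jointly models the marginals, and the simulated spike counts are purely illustrative.

    ```python
    import numpy as np
    import statsmodels.api as sm
    from scipy import stats

    # simulate spike counts in which FEF activity drives IT with a one-bin lag
    rng = np.random.default_rng(8)
    T = 2000
    fef = rng.poisson(2.0, T).astype(float)
    fef_hist = np.concatenate(([0.0], fef[:-1]))
    it = rng.poisson(np.exp(0.2 + 0.15 * fef_hist)).astype(float)

    it_now, it_hist = it[1:], it[:-1]
    fef_lag = fef[:-1]

    # reduced model: IT's own history only; full model adds FEF history
    X_red = sm.add_constant(it_hist)
    X_full = sm.add_constant(np.column_stack([it_hist, fef_lag]))
    ll_red = sm.GLM(it_now, X_red, family=sm.families.Poisson()).fit().llf
    ll_full = sm.GLM(it_now, X_full, family=sm.families.Poisson()).fit().llf

    lr = 2.0 * (ll_full - ll_red)                 # likelihood ratio statistic
    print("LR =", round(lr, 1), " p =", stats.chi2.sf(lr, df=1))
    ```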

  4. Geodesic least squares regression for scaling studies in magnetic confinement fusion

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Verdoolaege, Geert

    In regression analyses for deriving scaling laws that occur in various scientific disciplines, usually standard regression methods have been applied, of which ordinary least squares (OLS) is the most popular. However, concerns have been raised with respect to several assumptions underlying OLS in its application to scaling laws. We here discuss a new regression method that is robust in the presence of significant uncertainty on both the data and the regression model. The method, which we call geodesic least squares regression (GLS), is based on minimization of the Rao geodesic distance on a probabilistic manifold. We demonstrate the superiority of the method using synthetic data and we present an application to the scaling law for the power threshold for the transition to the high confinement regime in magnetic confinement fusion devices.

  5. Comparison of naïve Bayes and logistic regression for computer-aided diagnosis of breast masses using ultrasound imaging

    NASA Astrophysics Data System (ADS)

    Cary, Theodore W.; Cwanger, Alyssa; Venkatesh, Santosh S.; Conant, Emily F.; Sehgal, Chandra M.

    2012-03-01

    This study compares the performance of two proven but very different machine learners, Naïve Bayes and logistic regression, for differentiating malignant and benign breast masses using ultrasound imaging. Ultrasound images of 266 masses were analyzed quantitatively for shape, echogenicity, margin characteristics, and texture features. These features along with patient age, race, and mammographic BI-RADS category were used to train Naïve Bayes and logistic regression classifiers to diagnose lesions as malignant or benign. ROC analysis was performed using all of the features and using only a subset that maximized information gain. Performance was determined by the area under the ROC curve, Az, obtained from leave-one-out cross validation. Naïve Bayes showed significant variation (Az 0.733 +/- 0.035 to 0.840 +/- 0.029, P < 0.002) with the choice of features, but the performance of logistic regression was relatively unchanged under feature selection (Az 0.839 +/- 0.029 to 0.859 +/- 0.028, P = 0.605). Out of 34 features, a subset of 6 gave the highest information gain: brightness difference, margin sharpness, depth-to-width, mammographic BI-RADS, age, and race. The probabilities of malignancy determined by Naïve Bayes and logistic regression after feature selection showed significant correlation (R2 = 0.87, P < 0.0001). The diagnostic performance of Naïve Bayes and logistic regression can be comparable, but logistic regression is more robust. Since probability of malignancy cannot be measured directly, high correlation between the probabilities derived from two basic but dissimilar models increases confidence in the predictive power of machine learning models for characterizing solid breast masses on ultrasound.
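
    The comparison itself, two classifiers scored by leave-one-out AUC (Az), is easy to reproduce in outline. The sketch below uses a bundled scikit-learn dataset as a stand-in for the ultrasound features, which are not public:

    ```python
    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import LeaveOneOut, cross_val_predict
    from sklearn.naive_bayes import GaussianNB
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = load_breast_cancer(return_X_y=True)   # stand-in for the ultrasound features
    models = [("naive Bayes", GaussianNB()),
              ("logistic regression",
               make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000)))]
    for name, model in models:
        # leave-one-out probabilities, then area under the ROC curve (Az)
        p = cross_val_predict(model, X, y, cv=LeaveOneOut(), method="predict_proba")[:, 1]
        print(name, "LOO Az =", round(roc_auc_score(y, p), 3))
    ```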

  6. Quotation accuracy in medical journal articles-a systematic review and meta-analysis.

    PubMed

    Jergas, Hannah; Baethge, Christopher

    2015-01-01

    Background. Quotations and references are an indispensable element of scientific communication. They should support what authors claim or provide important background information for readers. Studies indicate, however, that quotations not serving their purpose-quotation errors-may be prevalent. Methods. We carried out a systematic review, meta-analysis and meta-regression of quotation errors, taking account of differences between studies in error ascertainment. Results. Out of 559 studies screened we included 28 in the main analysis, and estimated major, minor, and total quotation error rates of 11.9% (95% CI [8.4, 16.6]), 11.5% [8.3, 15.7], and 25.4% [19.5, 32.4]. While heterogeneity was substantial, even the lowest estimate of total quotation errors was considerable (6.7%). Indirect references accounted for less than one sixth of all quotation problems. The findings remained robust in a number of sensitivity and subgroup analyses (including risk of bias analysis) and in meta-regression. There was no indication of publication bias. Conclusions. Readers of medical journal articles should be aware of the fact that quotation errors are common. Measures against quotation errors include spot checks by editors and reviewers, correct placement of citations in the text, and declarations by authors that they have checked cited material. Future research should elucidate if and to what degree quotation errors are detrimental to scientific progress.

  7. Quotation accuracy in medical journal articles—a systematic review and meta-analysis

    PubMed Central

    Jergas, Hannah

    2015-01-01

    Background. Quotations and references are an indispensable element of scientific communication. They should support what authors claim or provide important background information for readers. Studies indicate, however, that quotations not serving their purpose—quotation errors—may be prevalent. Methods. We carried out a systematic review, meta-analysis and meta-regression of quotation errors, taking account of differences between studies in error ascertainment. Results. Out of 559 studies screened we included 28 in the main analysis, and estimated major, minor, and total quotation error rates of 11.9% (95% CI [8.4, 16.6]), 11.5% [8.3, 15.7], and 25.4% [19.5, 32.4]. While heterogeneity was substantial, even the lowest estimate of total quotation errors was considerable (6.7%). Indirect references accounted for less than one sixth of all quotation problems. The findings remained robust in a number of sensitivity and subgroup analyses (including risk of bias analysis) and in meta-regression. There was no indication of publication bias. Conclusions. Readers of medical journal articles should be aware of the fact that quotation errors are common. Measures against quotation errors include spot checks by editors and reviewers, correct placement of citations in the text, and declarations by authors that they have checked cited material. Future research should elucidate if and to what degree quotation errors are detrimental to scientific progress. PMID:26528420

  8. Fourier Transform Infrared Spectroscopy and Multivariate Analysis for Online Monitoring of Dibutyl Phosphate Degradation Product in Tributyl Phosphate/n-Dodecane/Nitric Acid Solvent

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tatiana G. Levitskaia; James M. Peterson; Emily L. Campbell

    2013-12-01

    In liquid–liquid extraction separation processes, accumulation of organic solvent degradation products is detrimental to the process robustness, and frequent solvent analysis is warranted. Our research explores the feasibility of online monitoring of the organic solvents relevant to used nuclear fuel reprocessing. This paper describes the first phase of developing a system for monitoring the tributyl phosphate (TBP)/n-dodecane solvent commonly used to separate used nuclear fuel. In this investigation, the effect of extraction of nitric acid from aqueous solutions of variable concentrations on the quantification of TBP and its major degradation product dibutylphosphoric acid (HDBP) was assessed. Fourier transform infrared (FTIR) spectroscopy was used to discriminate between HDBP and TBP in the nitric acid-containing TBP/n-dodecane solvent. Multivariate analysis of the spectral data facilitated the development of regression models for HDBP and TBP quantification in real time, enabling online implementation of the monitoring system. The predictive regression models were validated using TBP/n-dodecane solvent samples subjected to high-dose external gamma irradiation. The predictive models were translated to flow conditions using a hollow fiber FTIR probe installed in a centrifugal contactor extraction apparatus, demonstrating the applicability of the FTIR technique coupled with multivariate analysis for the online monitoring of the organic solvent degradation products.

  9. Fourier Transform Infrared Spectroscopy and Multivariate Analysis for Online Monitoring of Dibutyl Phosphate Degradation Product in Tributyl Phosphate /n-Dodecane/Nitric Acid Solvent

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Levitskaia, Tatiana G.; Peterson, James M.; Campbell, Emily L.

    2013-11-05

    In liquid-liquid extraction separation processes, accumulation of organic solvent degradation products is detrimental to the process robustness, and frequent solvent analysis is warranted. Our research explores the feasibility of online monitoring of the organic solvents relevant to used nuclear fuel reprocessing. This paper describes the first phase of developing a system for monitoring the tributyl phosphate (TBP)/n-dodecane solvent commonly used to separate used nuclear fuel. In this investigation, the effect of extraction of nitric acid from aqueous solutions of variable concentrations on the quantification of TBP and its major degradation product dibutyl phosphoric acid (HDBP) was assessed. Fourier transform infrared (FTIR) spectroscopy was used to discriminate between HDBP and TBP in the nitric acid-containing TBP/n-dodecane solvent. Multivariate analysis of the spectral data facilitated the development of regression models for HDBP and TBP quantification in real time, enabling online implementation of the monitoring system. The predictive regression models were validated using TBP/n-dodecane solvent samples subjected to high-dose external gamma irradiation. The predictive models were translated to flow conditions using a hollow fiber FTIR probe installed in a centrifugal contactor extraction apparatus, demonstrating the applicability of the FTIR technique coupled with multivariate analysis for the online monitoring of the organic solvent degradation products.
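
    A partial least squares calibration of the kind described in these two records can be outlined as follows. The spectra here are synthetic Gaussian bands at assumed positions, not measured FTIR data, and the cross-validated HDBP error simply mirrors the RMSECV idea:

    ```python
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import cross_val_predict

    rng = np.random.default_rng(9)
    wn = np.linspace(900, 1300, 400)                   # wavenumber axis, cm^-1
    band = lambda center, width: np.exp(-0.5 * ((wn - center) / width) ** 2)

    n = 60
    c_tbp = rng.uniform(0.8, 1.2, n)                   # solvent component
    c_hdbp = rng.uniform(0.0, 0.1, n)                  # degradation product
    spectra = (np.outer(c_tbp, band(1025, 15)) + np.outer(c_hdbp, band(1230, 20))
               + rng.normal(0, 0.002, (n, wn.size)))   # synthetic mixture spectra

    pls = PLSRegression(n_components=3)
    pred = cross_val_predict(pls, spectra, c_hdbp, cv=10).ravel()
    print("HDBP RMSECV:", round(mean_squared_error(c_hdbp, pred) ** 0.5, 4))
    ```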

  10. A structured analysis of uncertainty surrounding modeled impacts of groundwater-extraction rules

    NASA Astrophysics Data System (ADS)

    Guillaume, Joseph H. A.; Qureshi, M. Ejaz; Jakeman, Anthony J.

    2012-08-01

    Integrating economic and groundwater models for groundwater-management can help improve understanding of trade-offs involved between conflicting socioeconomic and biophysical objectives. However, there is significant uncertainty in most strategic decision-making situations, including in the models constructed to represent them. If not addressed, this uncertainty may be used to challenge the legitimacy of the models and decisions made using them. In this context, a preliminary uncertainty analysis was conducted of a dynamic coupled economic-groundwater model aimed at assessing groundwater extraction rules. The analysis demonstrates how a variety of uncertainties in such a model can be addressed. A number of methods are used including propagation of scenarios and bounds on parameters, multiple models, block bootstrap time-series sampling and robust linear regression for model calibration. These methods are described within the context of a theoretical uncertainty management framework, using a set of fundamental uncertainty management tasks and an uncertainty typology.
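
    Two of the listed uncertainty methods, block bootstrap time-series sampling and robust linear regression, combine naturally: resample contiguous blocks to respect autocorrelation, refit a robust estimator, and read off an interval. A minimal sketch with synthetic series (the block length and data are assumptions, not the study's model) follows:

    ```python
    import numpy as np
    from sklearn.linear_model import HuberRegressor

    rng = np.random.default_rng(10)
    T, block = 240, 12
    x = np.cumsum(rng.normal(0, 1, T))                        # driver series
    noise = np.convolve(rng.normal(0, 1, T), np.ones(6) / 6, mode="same")
    y = 0.6 * x + noise                                       # autocorrelated errors

    def block_bootstrap_slope_ci(x, y, block, n_boot=500):
        """Moving-block bootstrap interval for a robust regression slope."""
        starts = np.arange(len(x) - block + 1)
        slopes = []
        for _ in range(n_boot):
            picks = rng.choice(starts, len(x) // block)
            idx = np.concatenate([np.arange(s, s + block) for s in picks])
            slopes.append(HuberRegressor().fit(x[idx, None], y[idx]).coef_[0])
        return np.percentile(slopes, [2.5, 97.5])

    print("95% slope interval:", block_bootstrap_slope_ci(x, y, block))
    ```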

  11. imDEV: a graphical user interface to R multivariate analysis tools in Microsoft Excel.

    PubMed

    Grapov, Dmitry; Newman, John W

    2012-09-01

    Interactive modules for Data Exploration and Visualization (imDEV) is a Microsoft Excel spreadsheet-embedded application providing an integrated environment for the analysis of omics data through a user-friendly interface. Individual modules enable interactive and dynamic analyses of large datasets by interfacing R's multivariate statistics and highly customizable visualizations with the spreadsheet environment, aiding robust inference and generating information-rich data visualizations. This tool provides access to multiple comparisons with false discovery correction, hierarchical clustering, principal and independent component analyses, partial least squares regression and discriminant analysis, through an intuitive interface for creating high-quality two- and three-dimensional visualizations including scatter plot matrices, distribution plots, dendrograms, heat maps, biplots, trellis biplots and correlation networks. Freely available for download at http://sourceforge.net/projects/imdev/. Implemented in R and VBA and supported by Microsoft Excel (2003, 2007 and 2010).

  12. Investigating the Important Correlates of Maternal Education and Childhood Malaria Infections

    PubMed Central

    Njau, Joseph D.; Stephenson, Rob; Menon, Manoj P.; Kachur, S. Patrick; McFarland, Deborah A.

    2014-01-01

    The relationship between maternal education and child health has intrigued researchers for decades. This study explored the interaction between maternal education and childhood malaria infection. Cross-sectional survey data from three African countries were used. Descriptive analysis and multivariate logistic regression models were completed in line with identified correlates. Marginal effects and Oaxaca decomposition analysis on maternal education and childhood malaria infection were also estimated. Children with mothers whose education level was beyond primary school were 4.7% less likely to be malaria-positive (P < 0.001). The Oaxaca decomposition analysis exhibited an 8% gap in childhood malaria infection for educated and uneducated mothers. Over 60% of the gap was explained by differences in household wealth (26%), household place of domicile (21%), malaria transmission intensities (14%), and media exposure (12%). All other correlates accounted for only 27%. The full adjusted model showed a robust and significant relationship between maternal education and childhood malaria infection. PMID:25002302

  13. ℓ(p)-Norm multikernel learning approach for stock market price forecasting.

    PubMed

    Shao, Xigao; Wu, Kun; Liao, Bifeng

    2012-01-01

    Linear multiple kernel learning model has been used for predicting financial time series. However, ℓ(1)-norm multiple support vector regression is rarely observed to outperform trivial baselines in practical applications. To allow for robust kernel mixtures that generalize well, we adopt ℓ(p)-norm multiple kernel support vector regression (1 ≤ p < ∞) as a stock price prediction model. The optimization problem is decomposed into smaller subproblems, and the interleaved optimization strategy is employed to solve the regression model. The model is evaluated on forecasting the daily stock closing prices of Shanghai Stock Index in China. Experimental results show that our proposed model performs better than ℓ(1)-norm multiple support vector regression model.

  14. Unemployment and prostate cancer mortality in the OECD, 1990–2009

    PubMed Central

    Maruthappu, Mahiben; Watkins, Johnathan; Taylor, Abigail; Williams, Callum; Ali, Raghib; Zeltner, Thomas; Atun, Rifat

    2015-01-01

    The global economic downturn has been associated with increased unemployment in many countries. Insights into the impact of unemployment on specific health conditions remain limited. We determined the association between unemployment and prostate cancer mortality in members of the Organisation for Economic Co-operation and Development (OECD). We used multivariate regression analysis to assess the association between changes in unemployment and prostate cancer mortality in OECD member states between 1990 and 2009. Country-specific differences in healthcare infrastructure, population structure, and population size were controlled for and lag analyses conducted. Several robustness checks were also performed. Time trend analyses were used to predict the number of excess deaths from prostate cancer following the 2008 global recession. Between 1990 and 2009, a 1% rise in unemployment was associated with an increase in prostate cancer mortality. Lag analysis showed a continued increase in mortality years after unemployment rises. The association between unemployment and prostate cancer mortality remained significant in robustness checks with 46 controls. Eight of the 21 OECD countries for which a time trend analysis was conducted, exhibited an estimated excess of prostate cancer deaths in at least one of 2008, 2009, or 2010, based on 2000–2007 trends. Rises in unemployment are associated with significant increases in prostate cancer mortality. Initiatives that bolster employment may help to minimise prostate cancer mortality during times of economic hardship. PMID:26045715

  15. Direct determination of glucose, lactate and triglycerides in blood serum by a tunable quantum cascade laser-based mid-IR sensor

    NASA Astrophysics Data System (ADS)

    Brandstetter, M.; Volgger, L.; Genner, A.; Jungbauer, C.; Lendl, B.

    2013-02-01

    This work reports on a compact sensor for fast and reagent-free point-of-care determination of glucose, lactate and triglycerides in blood serum based on a tunable (1030-1230 cm^-1) external-cavity quantum cascade laser (EC-QCL). For simple and robust operation a single-beam set-up was designed, and only thermoelectric cooling was used for the employed laser and detector. Full computer control of the analysis, including liquid handling and data analysis, facilitated routine measurements. A long optical pathlength (>100 μm) is a prerequisite for robust measurements in clinical practice. Hence, the optimum optical pathlength for transmission measurements in aqueous solution was considered in theory and experiment. The experimentally determined pathlength maximizing the signal-to-noise ratio (SNR) was around 140 μm for the QCL blood sensor and around 50 μm for a standard FT-IR spectrometer employing a liquid nitrogen cooled mercury cadmium telluride (MCT) detector. A single absorption spectrum was used to calculate the analyte concentrations simultaneously by means of a partial least squares (PLS) regression analysis. Glucose was determined in blood serum with a prediction error (RMSEP) of 6.9 mg/dl and triglycerides with an error of cross-validation (RMSECV) of 17.5 mg/dl in a set of 42 different patients. In spiked serum samples the lactate concentration could be determined with an RMSECV of 8.9 mg/dl.

  16. Unemployment and prostate cancer mortality in the OECD, 1990-2009.

    PubMed

    Maruthappu, Mahiben; Watkins, Johnathan; Taylor, Abigail; Williams, Callum; Ali, Raghib; Zeltner, Thomas; Atun, Rifat

    2015-01-01

    The global economic downturn has been associated with increased unemployment in many countries. Insights into the impact of unemployment on specific health conditions remain limited. We determined the association between unemployment and prostate cancer mortality in members of the Organisation for Economic Co-operation and Development (OECD). We used multivariate regression analysis to assess the association between changes in unemployment and prostate cancer mortality in OECD member states between 1990 and 2009. Country-specific differences in healthcare infrastructure, population structure, and population size were controlled for and lag analyses conducted. Several robustness checks were also performed. Time trend analyses were used to predict the number of excess deaths from prostate cancer following the 2008 global recession. Between 1990 and 2009, a 1% rise in unemployment was associated with an increase in prostate cancer mortality. Lag analysis showed a continued increase in mortality years after unemployment rises. The association between unemployment and prostate cancer mortality remained significant in robustness checks with 46 controls. Eight of the 21 OECD countries for which a time trend analysis was conducted, exhibited an estimated excess of prostate cancer deaths in at least one of 2008, 2009, or 2010, based on 2000-2007 trends. Rises in unemployment are associated with significant increases in prostate cancer mortality. Initiatives that bolster employment may help to minimise prostate cancer mortality during times of economic hardship.

  17. Accurate palm vein recognition based on wavelet scattering and spectral regression kernel discriminant analysis

    NASA Astrophysics Data System (ADS)

    Elnasir, Selma; Shamsuddin, Siti Mariyam; Farokhi, Sajad

    2015-01-01

    Palm vein recognition (PVR) is a promising new biometric that has been applied successfully as a method of access control by many organizations, and it has even further potential in the field of forensics. The palm vein pattern has highly discriminative features that are difficult to forge because of its subcutaneous position in the palm. Despite considerable progress, a few practical issues remain, and providing accurate palm vein readings is still an open problem in biometrics. We propose a robust and more accurate PVR method based on the combination of wavelet scattering (WS) with spectral regression kernel discriminant analysis (SRKDA). As the dimension of the WS-generated features is quite large, SRKDA is required to reduce the extracted features and enhance the discrimination. The results, based on two public databases (the PolyU Hyperspectral Palmprint database and the PolyU Multispectral Palmprint database), show the high performance of the proposed scheme in comparison with state-of-the-art methods. The proposed approach scored a 99.44% identification rate and a 99.90% verification rate [equal error rate (EER) = 0.1%] for the hyperspectral database, and a 99.97% identification rate and a 99.98% verification rate (EER = 0.019%) for the multispectral database.

  18. Using permutation tests to enhance causal inference in interrupted time series analysis.

    PubMed

    Linden, Ariel

    2018-06-01

    Interrupted time series analysis (ITSA) is an evaluation methodology in which a single treatment unit's outcome is studied serially over time and the intervention is expected to "interrupt" the level and/or trend of that outcome. The internal validity is strengthened considerably when the treated unit is contrasted with a comparable control group. In this paper, we introduce a robustness check based on permutation tests to further improve causal inference. We evaluate the effect of California's Proposition 99 for reducing cigarette sales by iteratively casting each nontreated state into the role of "treated," creating a comparable control group using the ITSAMATCH package in Stata, and then evaluating treatment effects using ITSA regression. If statistically significant "treatment effects" are estimated for pseudotreated states, then any significant changes in the outcome of the actual treatment unit (California) cannot be attributed to the intervention. We perform these analyses setting the cutpoint significance level to P > .40 for identifying balanced matches (the highest threshold possible for which controls could still be found for California) and use the difference in differences of trends as the treatment effect estimator. Only California attained a statistically significant treatment effect, strengthening confidence in the conclusion that Proposition 99 reduced cigarette sales. The proposed permutation testing framework provides an additional robustness check to either support or refute a treatment effect identified for the true treated unit in ITSA. Given its value and ease of implementation, this framework should be considered a standard robustness test in all multiple group interrupted time series analyses. © 2018 John Wiley & Sons, Ltd.
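
    The placebo-permutation logic can be sketched without the matching step: estimate the trend-change effect for the treated unit, repeat for every control unit cast as pseudo-treated, and rank the real effect among the placebos. Everything below (the segmented OLS specification and the simulated units) is a simplified stand-in for the ITSAMATCH workflow:

    ```python
    import numpy as np
    import statsmodels.api as sm

    def trend_change(y, t0):
        """Slope change after the interruption, from a segmented OLS fit."""
        t = np.arange(len(y), dtype=float)
        post = (t >= t0).astype(float)
        X = sm.add_constant(np.column_stack([t, post, post * (t - t0)]))
        return sm.OLS(y, X).fit().params[3]

    rng = np.random.default_rng(11)
    T, t0 = 36, 18
    controls = [rng.normal(0, 1, T) + 0.1 * np.arange(T) for _ in range(20)]
    treated = rng.normal(0, 1, T) + 0.1 * np.arange(T)
    treated[t0:] -= 0.5 * np.arange(T - t0)       # genuine trend break

    obs = trend_change(treated, t0)
    placebo = [trend_change(c, t0) for c in controls]
    p = (1 + sum(abs(e) >= abs(obs) for e in placebo)) / (1 + len(placebo))
    print("treated slope change:", round(obs, 2), " permutation p:", round(p, 3))
    ```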

  19. Restricted spatial regression in practice: Geostatistical models, confounding, and robustness under model misspecification

    USGS Publications Warehouse

    Hanks, Ephraim M.; Schliep, Erin M.; Hooten, Mevin B.; Hoeting, Jennifer A.

    2015-01-01

    In spatial generalized linear mixed models (SGLMMs), covariates that are spatially smooth are often collinear with spatially smooth random effects. This phenomenon is known as spatial confounding and has been studied primarily in the case where the spatial support of the process being studied is discrete (e.g., areal spatial data). In this case, the most common approach suggested is restricted spatial regression (RSR) in which the spatial random effects are constrained to be orthogonal to the fixed effects. We consider spatial confounding and RSR in the geostatistical (continuous spatial support) setting. We show that RSR provides computational benefits relative to the confounded SGLMM, but that Bayesian credible intervals under RSR can be inappropriately narrow under model misspecification. We propose a posterior predictive approach to alleviating this potential problem and discuss the appropriateness of RSR in a variety of situations. We illustrate RSR and SGLMM approaches through simulation studies and an analysis of malaria frequencies in The Gambia, Africa.

  20. Robust Regression through Robust Covariances.

    DTIC Science & Technology

    1985-01-01

    ...we apply (2.3). But first let us examine the influence function (see Hampel (1974)). In order to simplify the formulas we will first consider the case... remember that the influence function is an asymptotic tool and that therefore the population values of our estimators appear in the formula. V(GR) is... the parameters (a, V) based on the data Z1, ..., Zn. Now we can apply the standard formulas to get the influence function (see Huber (1981)).

  1. Robust kernel representation with statistical local features for face recognition.

    PubMed

    Yang, Meng; Zhang, Lei; Shiu, Simon Chi-Keung; Zhang, David

    2013-06-01

    Factors such as misalignment, pose variation, and occlusion make robust face recognition a difficult problem. It is known that statistical features such as local binary pattern are effective for local feature extraction, whereas the recently proposed sparse or collaborative representation-based classification has shown interesting results in robust face recognition. In this paper, we propose a novel robust kernel representation model with statistical local features (SLF) for robust face recognition. Initially, multipartition max pooling is used to enhance the invariance of SLF to image registration error. Then, a kernel-based representation model is proposed to fully exploit the discrimination information embedded in the SLF, and robust regression is adopted to effectively handle the occlusion in face images. Extensive experiments are conducted on benchmark face databases, including extended Yale B, AR (A. Martinez and R. Benavente), multiple pose, illumination, and expression (multi-PIE), facial recognition technology (FERET), face recognition grand challenge (FRGC), and labeled faces in the wild (LFW), which have different variations of lighting, expression, pose, and occlusions, demonstrating the promising performance of the proposed method.

  2. Statistical variation in progressive scrambling

    NASA Astrophysics Data System (ADS)

    Clark, Robert D.; Fox, Peter C.

    2004-07-01

    The two methods most often used to evaluate the robustness and predictivity of partial least squares (PLS) models are cross-validation and response randomization. Both methods may be overly optimistic for data sets that contain redundant observations, however. The kinds of perturbation analysis widely used for evaluating model stability in the context of ordinary least squares regression are only applicable when the descriptors are independent of each other and errors are independent and normally distributed; neither assumption holds for QSAR in general and for PLS in particular. Progressive scrambling is a novel, non-parametric approach to perturbing models in the response space in a way that does not disturb the underlying covariance structure of the data. Here, we introduce adjustments for two of the characteristic values produced by a progressive scrambling analysis - the deprecated predictivity (Q_s*²) and standard error of prediction (SDEP_s*) - that correct for the effect of introduced perturbation. We also explore the statistical behavior of the adjusted values (Q_0*² and SDEP_0*) and the sensitivity to perturbation (dq²/dr_yy'²). It is shown that the three statistics are all robust for stable PLS models, in terms of the stochastic component of their determination and of their variation due to sampling effects involved in training set selection.

  3. Development and Validation of RP-HPLC Method for the Estimation of Ivabradine Hydrochloride in Tablets

    PubMed Central

    Seerapu, Sunitha; Srinivasan, B. P.

    2010-01-01

    A simple, sensitive, precise and robust reverse-phase high-performance liquid chromatographic method for analysis of ivabradine hydrochloride in pharmaceutical formulations was developed and validated as per ICH guidelines. The separation was performed on an SS Wakosil C18AR, 250×4.6 mm, 5 μm column with methanol:25 mM phosphate buffer (60:40 v/v), adjusted to pH 6.5 with orthophosphoric acid added dropwise, as mobile phase. A well-defined chromatographic peak of ivabradine hydrochloride was exhibited with a retention time of 6.55±0.05 min and a tailing factor of 1.14 at a flow rate of 0.8 ml/min and ambient temperature, when monitored at 285 nm. The linear regression analysis data for the calibration plots showed a good linear relationship with R=0.9998 in the concentration range of 30-210 μg/ml. The method was validated for precision, recovery and robustness. Intra- and inter-day precision (% relative standard deviation) were always less than 2%. The method showed mean recoveries of 99.00% and 98.55% for Ivabrad and Inapure tablets, respectively. The proposed method has been successfully applied to the commercial tablets without any interference of excipients. PMID:21695008

  4. Bayesian Regression with Network Prior: Optimal Bayesian Filtering Perspective

    PubMed Central

    Qian, Xiaoning; Dougherty, Edward R.

    2017-01-01

    The recently introduced intrinsically Bayesian robust filter (IBRF) provides fully optimal filtering relative to a prior distribution over an uncertainty class of joint random process models, whereas formerly the theory was limited to model-constrained Bayesian robust filters, for which optimization was limited to the filters that are optimal for models in the uncertainty class. This paper extends the IBRF theory to the situation where there are both a prior on the uncertainty class and sample data. The result is optimal Bayesian filtering (OBF), where optimality is relative to the posterior distribution derived from the prior and the data. The IBRF theories for effective characteristics and canonical expansions extend to the OBF setting. A salient focus of the present work is to demonstrate the advantages of Bayesian regression within the OBF setting over the classical Bayesian approach in the context of linear Gaussian models. PMID:28824268

  5. Can the provision of a home help service for the elderly population reduce the incidence of fall-related injuries? A quasi-experimental study of the community-level effects on hospital admissions in Swedish municipalities.

    PubMed

    Bonander, Carl; Gustavsson, Johanna; Nilson, Finn

    2016-12-01

    Fall-related injuries are a global public health problem, especially in elderly populations. The effect of an intervention aimed at reducing the risk of falls in the homes of community-dwelling elderly persons was evaluated. The intervention mainly involves the performance of complicated tasks and hazards assessment by a trained assessor, and has been adopted gradually over the last decade by 191 of 290 Swedish municipalities. A quasi-experimental design was used where intention-to-treat effect estimates were derived using panel regression analysis and a regression discontinuity (RD) design. The outcome measure was the incidence of fall-related hospitalisations in the treatment population, the age of which varied by municipality (≥65 years, ≥67 years, ≥70 years or ≥75 years). We found no statistically significant reductions in injury incidence in the panel regression (IRR 1.01 (95% CI 0.98 to 1.05)) or RD (IRR 1.00 (95% CI 0.97 to 1.03)) analyses. The results are robust to several different model specifications, including segmented panel regression analysis with linear trend change and community fixed effects parameters. It is unclear whether the absence of an effect is due to a low efficacy of the services provided, or a result of low adherence. Additional studies of the effects on other quality-of-life measures are recommended before conclusions are drawn regarding the cost-effectiveness of the provision of home help service programmes. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/.

  6. A general procedure to generate models for urban environmental-noise pollution using feature selection and machine learning methods.

    PubMed

    Torija, Antonio J; Ruiz, Diego P

    2015-02-01

    The prediction of environmental noise in urban environments requires the solution of a complex and non-linear problem, since there are complex relationships among the multitude of variables involved in the characterization and modelling of environmental noise and environmental-noise magnitudes. Moreover, the inclusion of the great spatial heterogeneity characteristic of urban environments seems to be essential in order to achieve an accurate environmental-noise prediction in cities. This problem is addressed in this paper, where a procedure based on feature-selection techniques and machine-learning regression methods is proposed and applied to this environmental problem. Three machine-learning regression methods, which are considered very robust in solving non-linear problems, are used to estimate the energy-equivalent sound-pressure level descriptor (LAeq). These three methods are: (i) multilayer perceptron (MLP), (ii) sequential minimal optimisation (SMO), and (iii) Gaussian processes for regression (GPR). In addition, because of the high number of input variables involved in environmental-noise modelling and estimation in urban environments, which make LAeq prediction models quite complex and costly in terms of time and resources for application to real situations, three different techniques are used to approach feature selection or data reduction. The feature-selection techniques used are (i) correlation-based feature-subset selection (CFS) and (ii) wrapper for feature-subset selection (WFS); the data-reduction technique is principal-component analysis (PCA). The subsequent analysis leads to a proposal of different schemes, depending on the needs regarding data collection and accuracy. The use of WFS as the feature-selection technique with the implementation of SMO or GPR as the regression algorithm provides the best LAeq estimation (R2 = 0.94 and mean absolute error (MAE) = 1.14-1.16 dB(A)). Copyright © 2014 Elsevier B.V. All rights reserved.
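
    As a hedged illustration of the best-performing scheme (wrapper feature selection feeding a kernel regressor), the sketch below uses scikit-learn with synthetic stand-ins for the acoustic variables; SVR stands in for the SMO-trained support vector machine of the paper:

```python
# Wrapper feature selection + kernel regression pipeline, evaluated by
# cross-validated R2 (synthetic data, not the study's acoustic variables).
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

X, y = make_regression(n_samples=300, n_features=15, n_informative=5,
                       noise=5.0, random_state=0)
model = make_pipeline(
    StandardScaler(),
    SequentialFeatureSelector(SVR(kernel="rbf"), n_features_to_select=5, cv=3),
    SVR(kernel="rbf", C=10.0),
)
scores = cross_val_score(model, X, y, cv=3, scoring="r2")
print("cross-validated R2: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```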

  7. Quantification and regionalization of groundwater recharge in South-Central Kansas: Integrating field characterization, statistical analysis, and GIS

    USGS Publications Warehouse

    Sophocleous, M.

    2000-01-01

    A practical methodology for recharge characterization was developed based on several years of field-oriented research at 10 sites in the Great Bend Prairie of south-central Kansas. This methodology combines the soil-water budget on a storm-by-storm year-round basis with the resulting watertable rises. The estimated 1985-1992 average annual recharge was less than 50 mm/year with a range from 15 mm/year (during the 1988 drought) to 178 mm/year (during the 1993 flood year). Most of this recharge occurs during the spring months. To regionalize these site-specific estimates, an additional methodology based on multiple (forward) regression analysis combined with classification and GIS overlay analyses was developed and implemented. The multiple regression analysis showed that the most influential variables were, in order of decreasing importance, total annual precipitation, average maximum springtime soil-profile water storage, average shallowest springtime depth to watertable, and average springtime precipitation rate. Therefore, four GIS (ARC/INFO) data "layers" or coverages were constructed for the study region based on these four variables, and each such coverage was classified into the same number of data classes to avoid biasing the results. The normalized regression coefficients were employed to weigh the class rankings of each recharge-affecting variable. This approach resulted in recharge zonations that agreed well with the site recharge estimates. During the "Great Flood of 1993," when rainfall totals exceeded normal levels by ~200% in the northern portion of the study region, the developed regionalization methodology was tested against such extreme conditions, and proved to be both practical, based on readily available or easily measurable data, and robust. It was concluded that the combination of multiple regression and GIS overlay analyses is a powerful and practical approach to regionalizing small samples of recharge estimates.

  8. The Use of Alternative Regression Methods in Social Sciences and the Comparison of Least Squares and M Estimation Methods in Terms of the Determination of Coefficient

    ERIC Educational Resources Information Center

    Coskuntuncel, Orkun

    2013-01-01

    The purpose of this study is two-fold: the first aim is to show the effect of outliers on the widely used least squares regression estimator in social sciences. The second aim is to compare the classical method of least squares with the robust M-estimator using the coefficient of determination (R[superscript 2]). For this purpose,…
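
    The comparison the abstract sets up can be sketched as follows (synthetic data; statsmodels' Huber M-estimator stands in for the study's M-estimation, and a simple pseudo-R2 is computed from fitted values):

```python
# Least squares vs. a Huber M-estimator on data with one gross outlier.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(0, 0.5, 50)
y[0] += 25                       # a single gross outlier
X = sm.add_constant(x)

ols = sm.OLS(y, X).fit()
rlm = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()

def pseudo_r2(fit):              # coefficient of determination from fitted values
    resid = y - fit.fittedvalues
    return 1 - resid.var() / y.var()

print("OLS   slope %.3f  R2 %.3f" % (ols.params[1], pseudo_r2(ols)))
print("Huber slope %.3f  R2 %.3f" % (rlm.params[1], pseudo_r2(rlm)))
```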

  9. An iteratively reweighted least-squares approach to adaptive robust adjustment of parameters in linear regression models with autoregressive and t-distributed deviations

    NASA Astrophysics Data System (ADS)

    Kargoll, Boris; Omidalizarandi, Mohammad; Loth, Ina; Paffenholz, Jens-André; Alkhatib, Hamza

    2018-03-01

    In this paper, we investigate a linear regression time series model of possibly outlier-afflicted observations and autocorrelated random deviations. This colored noise is represented by a covariance-stationary autoregressive (AR) process, in which the independent error components follow a scaled (Student's) t-distribution. This error model allows for the stochastic modeling of multiple outliers and for an adaptive robust maximum likelihood (ML) estimation of the unknown regression and AR coefficients, the scale parameter, and the degree of freedom of the t-distribution. This approach is meant to be an extension of known estimators, which tend to focus only on the regression model, or on the AR error model, or on normally distributed errors. For the purpose of ML estimation, we derive an expectation conditional maximization either (ECME) algorithm, which leads to an easy-to-implement version of iteratively reweighted least squares. The estimation performance of the algorithm is evaluated via Monte Carlo simulations for a Fourier as well as a spline model in connection with AR colored noise models of different orders and with three different sampling distributions generating the white noise components. We apply the algorithm to a vibration dataset recorded by a high-accuracy, single-axis accelerometer, focusing on the evaluation of the estimated AR colored noise model.
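
    A much-simplified sketch of the t-distribution-based reweighting at the core of such an IRLS scheme is shown below (regression part only; the paper's AR noise model and the estimation of the degree of freedom are omitted, and nu is held fixed):

```python
# IRLS with Student-t weights: observations with large residuals get
# down-weighted, making the fit robust to outliers.
import numpy as np

def irls_t(X, y, nu=4.0, n_iter=50):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        r = y - X @ beta
        s2 = np.mean(r**2)
        w = (nu + 1) / (nu + r**2 / s2)        # t-likelihood weights
        Xw = X * w[:, None]
        beta = np.linalg.solve(X.T @ Xw, Xw.T @ y)  # weighted normal equations
    return beta

rng = np.random.default_rng(3)
t = np.linspace(0, 1, 200)
X = np.column_stack([np.ones_like(t), np.sin(2*np.pi*t), np.cos(2*np.pi*t)])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.standard_t(3, 200) * 0.2
print(irls_t(X, y).round(3))   # close to [1, 2, -1] despite heavy tails
```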

  10. An alternative empirical likelihood method in missing response problems and causal inference.

    PubMed

    Ren, Kaili; Drummond, Christopher A; Brewster, Pamela S; Haller, Steven T; Tian, Jiang; Cooper, Christopher J; Zhang, Biao

    2016-11-30

    Missing responses are common problems in medical, social, and economic studies. When responses are missing at random, a complete case data analysis may result in biases. A popular debiasing method is inverse probability weighting proposed by Horvitz and Thompson. To improve efficiency, Robins et al. proposed an augmented inverse probability weighting method. The augmented inverse probability weighting estimator has a double-robustness property and achieves the semiparametric efficiency lower bound when the regression model and propensity score model are both correctly specified. In this paper, we introduce an empirical likelihood-based estimator as an alternative to Qin and Zhang (2007). Our proposed estimator is also doubly robust and locally efficient. Simulation results show that the proposed estimator has better performance when the propensity score is correctly modeled. Moreover, the proposed method can be applied in the estimation of average treatment effect in observational causal inferences. Finally, we apply our method to an observational study of smoking, using data from the Cardiovascular Outcomes in Renal Atherosclerotic Lesions clinical trial. Copyright © 2016 John Wiley & Sons, Ltd.
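
    For reference, the augmented inverse probability weighting (AIPW) estimator that the proposed method is compared against can be sketched as follows (synthetic data; the empirical likelihood refinement itself is not implemented here):

```python
# Doubly robust AIPW estimate of an average treatment effect: combine a
# fitted propensity score with outcome regressions for each arm.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(4)
n = 2000
x = rng.normal(size=(n, 2))
e = 1 / (1 + np.exp(-0.5 * x[:, 0]))             # true propensity
a = rng.binomial(1, e)                           # treatment indicator
y = 1.0 * a + x @ np.array([1.0, -0.5]) + rng.normal(size=n)

ps = LogisticRegression().fit(x, a).predict_proba(x)[:, 1]
m1 = LinearRegression().fit(x[a == 1], y[a == 1]).predict(x)
m0 = LinearRegression().fit(x[a == 0], y[a == 0]).predict(x)

ate = np.mean(a * (y - m1) / ps + m1) - np.mean((1 - a) * (y - m0) / (1 - ps) + m0)
print("AIPW ATE estimate:", round(ate, 3))       # true effect is 1.0
```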

  11. Calibration sets selection strategy for the construction of robust PLS models for prediction of biodiesel/diesel blends physico-chemical properties using NIR spectroscopy

    NASA Astrophysics Data System (ADS)

    Palou, Anna; Miró, Aira; Blanco, Marcelo; Larraz, Rafael; Gómez, José Francisco; Martínez, Teresa; González, Josep Maria; Alcalà, Manel

    2017-06-01

    Even though the feasibility of using near infrared (NIR) spectroscopy combined with partial least squares (PLS) regression for prediction of physico-chemical properties of biodiesel/diesel blends has been widely demonstrated, including the whole variability of diesel samples from diverse production origins in the calibration sets still remains an important challenge when constructing the models. This work presents a useful strategy for the systematic selection of calibration sets of samples of biodiesel/diesel blends from diverse origins, based on a binary code, principal components analysis (PCA) and the Kennard-Stone algorithm. Results show that using this methodology the models can keep their robustness over time. PLS calculations have been done using a specialized chemometric software as well as the software of the NIR instrument installed in plant, and both produced RMSEP values below the reproducibility of the reference methods. The models have been proven in on-line simultaneous determination of seven properties: density, cetane index, fatty acid methyl esters (FAME) content, cloud point, boiling point at 95% of recovery, flash point and sulphur.
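
    The Kennard-Stone selection step mentioned above is simple to express in code. A minimal sketch (placeholder spectra, Euclidean distances) that greedily picks samples maximizing the minimum distance to those already selected:

```python
# Kennard-Stone calibration set selection: start from the two most
# distant samples, then repeatedly add the sample farthest from the set.
import numpy as np

def kennard_stone(X, k):
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    sel = list(np.unravel_index(np.argmax(d), d.shape))  # two most distant samples
    while len(sel) < k:
        mind = d[:, sel].min(axis=1)   # distance of each sample to the set
        mind[sel] = -1                 # exclude already-chosen samples
        sel.append(int(np.argmax(mind)))
    return sel

rng = np.random.default_rng(5)
spectra = rng.normal(size=(100, 50))   # placeholder NIR spectra
print(kennard_stone(spectra, 10))      # indices of calibration samples
```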

  12. Statistical methods for change-point detection in surface temperature records

    NASA Astrophysics Data System (ADS)

    Pintar, A. L.; Possolo, A.; Zhang, N. F.

    2013-09-01

    We describe several statistical methods to detect possible change-points in a time series of values of surface temperature measured at a meteorological station, and to assess the statistical significance of such changes, taking into account the natural variability of the measured values, and the autocorrelations between them. These methods serve to determine whether the record may suffer from biases unrelated to the climate signal, hence whether there may be a need for adjustments as considered by M. J. Menne and C. N. Williams (2009) "Homogenization of Temperature Series via Pairwise Comparisons", Journal of Climate 22 (7), 1700-1717. We also review methods to characterize patterns of seasonality (seasonal decomposition using monthly medians or robust local regression), and explain the role they play in the imputation of missing values, and in enabling robust decompositions of the measured values into a seasonal component, a possible climate signal, and a station-specific remainder. The methods for change-point detection that we describe include statistical process control, wavelet multi-resolution analysis, adaptive weights smoothing, and a Bayesian procedure, all of which are applicable to single station records.

  13. Benign-malignant mass classification in mammogram using edge weighted local texture features

    NASA Astrophysics Data System (ADS)

    Rabidas, Rinku; Midya, Abhishek; Sadhu, Anup; Chakraborty, Jayasree

    2016-03-01

    This paper introduces novel Discriminative Robust Local Binary Pattern (DRLBP) and Discriminative Robust Local Ternary Pattern (DRLTP) for the classification of mammographic masses as benign or malignant. Masses are common, yet challenging, signs of breast cancer in mammography, and their diagnosis is a difficult task. Since DRLBP and DRLTP overcome the drawbacks of Local Binary Pattern (LBP) and Local Ternary Pattern (LTP) by discriminating a brighter object against the dark background and vice-versa, in addition to preserving the edge information along with the texture information, several edge-preserving texture features are extracted, in this study, from DRLBP and DRLTP. Finally, a Fisher Linear Discriminant Analysis method is incorporated with discriminating features, selected by a stepwise logistic regression method, for the classification of benign and malignant masses. The performance characteristics of DRLBP and DRLTP features are evaluated using a ten-fold cross-validation technique with 58 masses from the mini-MIAS database, and the best result is observed with DRLBP, having an area under the receiver operating characteristic curve of 0.982.

  14. Metabolic profiling of body fluids and multivariate data analysis.

    PubMed

    Trezzi, Jean-Pierre; Jäger, Christian; Galozzi, Sara; Barkovits, Katalin; Marcus, Katrin; Mollenhauer, Brit; Hiller, Karsten

    2017-01-01

    Metabolome analyses of body fluids are challenging due to pre-analytical variations, such as pre-processing delay and temperature, and constant dynamic changes of biochemical processes within the samples. Therefore, proper sample handling starting from the time of collection up to the analysis is crucial to obtain high quality samples and reproducible results. A metabolomics analysis is divided into 4 main steps: 1) sample collection, 2) metabolite extraction, 3) data acquisition and 4) data analysis. Here, we describe a protocol for gas chromatography coupled to mass spectrometry (GC-MS) based metabolic analysis for biological matrices, especially body fluids. This protocol can be applied to blood serum/plasma, saliva and cerebrospinal fluid (CSF) samples of humans and other vertebrates. It covers sample collection, sample pre-processing, metabolite extraction, GC-MS measurement and guidelines for the subsequent data analysis. Advantages of this protocol include:
    • Robust and reproducible metabolomics results, taking into account pre-analytical variations that may occur during the sampling process
    • Small sample volume required
    • Rapid and cost-effective processing of biological samples
    • Logistic regression based determination of biomarker signatures for in-depth data analysis.

  15. ℓp-Norm Multikernel Learning Approach for Stock Market Price Forecasting

    PubMed Central

    Shao, Xigao; Wu, Kun; Liao, Bifeng

    2012-01-01

    Linear multiple kernel learning model has been used for predicting financial time series. However, ℓ1-norm multiple support vector regression is rarely observed to outperform trivial baselines in practical applications. To allow for robust kernel mixtures that generalize well, we adopt ℓp-norm multiple kernel support vector regression (1 ≤ p < ∞) as a stock price prediction model. The optimization problem is decomposed into smaller subproblems, and the interleaved optimization strategy is employed to solve the regression model. The model is evaluated on forecasting the daily stock closing prices of Shanghai Stock Index in China. Experimental results show that our proposed model performs better than the ℓ1-norm multiple support vector regression model. PMID:23365561

  16. Performance of the modified Poisson regression approach for estimating relative risks from clustered prospective data.

    PubMed

    Yelland, Lisa N; Salter, Amy B; Ryan, Philip

    2011-10-15

    Modified Poisson regression, which combines a log Poisson regression model with robust variance estimation, is a useful alternative to log binomial regression for estimating relative risks. Previous studies have shown both analytically and by simulation that modified Poisson regression is appropriate for independent prospective data. This method is often applied to clustered prospective data, despite a lack of evidence to support its use in this setting. The purpose of this article is to evaluate the performance of the modified Poisson regression approach for estimating relative risks from clustered prospective data, by using generalized estimating equations to account for clustering. A simulation study is conducted to compare log binomial regression and modified Poisson regression for analyzing clustered data from intervention and observational studies. Both methods generally perform well in terms of bias, type I error, and coverage. Unlike log binomial regression, modified Poisson regression is not prone to convergence problems. The methods are contrasted by using example data sets from 2 large studies. The results presented in this article support the use of modified Poisson regression as an alternative to log binomial regression for analyzing clustered prospective data when clustering is taken into account by using generalized estimating equations.
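
    A minimal sketch of the modified Poisson approach for clustered data, using statsmodels' GEE with an exchangeable working correlation (synthetic data; exponentiated coefficients estimate relative risks):

```python
# Modified Poisson regression for clustered binary outcomes: log-link
# Poisson GEE with robust (sandwich) standard errors; exp(coef) is the RR.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n_clusters, m = 100, 10
df = pd.DataFrame({
    "cluster": np.repeat(np.arange(n_clusters), m),
    "x": rng.binomial(1, 0.5, n_clusters * m),
})
u = np.repeat(rng.normal(0, 0.2, n_clusters), m)   # shared cluster effect
p = 1 / (1 + np.exp(-(-1.0 + 0.5 * df.x + u)))
df["y"] = rng.binomial(1, p)

fit = smf.gee("y ~ x", groups="cluster", data=df,
              family=sm.families.Poisson(),
              cov_struct=sm.cov_struct.Exchangeable()).fit()
print("RR estimate:", np.exp(fit.params["x"]).round(3))
```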

  17. Evaluation of laser cutting process with auxiliary gas pressure by soft computing approach

    NASA Astrophysics Data System (ADS)

    Lazov, Lyubomir; Nikolić, Vlastimir; Jovic, Srdjan; Milovančević, Miloš; Deneva, Heristina; Teirumenieka, Erika; Arsic, Nebojsa

    2018-06-01

    Evaluation of the optimal laser cutting parameters is very important for high cut quality. This is a highly nonlinear process with many parameters, which is the main challenge in the optimization. Data mining methodology is one of the most versatile approaches that can be used for laser cutting process optimization. A support vector regression (SVR) procedure was implemented since it is a versatile and robust technique for very nonlinear data regression. The goal in this study was to determine the optimal laser cutting parameters to ensure robust conditions for minimization of average surface roughness. Three cutting parameters, the cutting speed, the laser power, and the assist gas pressure, were used in the investigation. A TruLaser 1030 technological system was used as the laser source. Nitrogen was used as the assist gas in the laser cutting process. As the data mining method, the support vector regression procedure was used. Data mining prediction accuracy was very high according to the coefficient of determination (R2) and root mean square error (RMSE): R2 = 0.9975 and RMSE = 0.0337. Therefore the data mining approach could be used effectively for determination of the optimal conditions of the laser cutting process.

  18. Visual tracking using objectness-bounding box regression and correlation filters

    NASA Astrophysics Data System (ADS)

    Mbelwa, Jimmy T.; Zhao, Qingjie; Lu, Yao; Wang, Fasheng; Mbise, Mercy

    2018-03-01

    Visual tracking is a fundamental problem in computer vision with extensive application domains in surveillance and intelligent systems. Recently, correlation filter-based tracking methods have achieved great success in terms of robustness, accuracy, and speed. However, such methods have difficulty dealing with fast motion (FM), motion blur (MB), illumination variation (IV), and drifting caused by occlusion (OCC). To solve this problem, a tracking method that integrates an objectness-bounding box regression (O-BBR) model and a scheme based on kernelized correlation filter (KCF) is proposed. The scheme based on KCF is used to improve the tracking performance under FM and MB. For handling the drift problem caused by OCC and IV, we propose objectness proposals trained in bounding box regression as prior knowledge to provide candidates and background suppression. Finally, the KCF scheme as a base tracker and O-BBR are fused to obtain the state of the target object. Extensive experimental comparisons of the developed tracking method with other state-of-the-art trackers are performed on some of the challenging video sequences. Experimental comparison results show that our proposed tracking method outperforms other state-of-the-art tracking methods in terms of effectiveness, accuracy, and robustness.

  19. A comparative study of multivariable robustness analysis methods as applied to integrated flight and propulsion control

    NASA Technical Reports Server (NTRS)

    Schierman, John D.; Lovell, T. A.; Schmidt, David K.

    1993-01-01

    Three multivariable robustness analysis methods are compared and contrasted. The focus of the analysis is on system stability and performance robustness to uncertainty in the coupling dynamics between two interacting subsystems. Of particular interest is interacting airframe and engine subsystems, and an example airframe/engine vehicle configuration is utilized in the demonstration of these approaches. The singular value (SV) and structured singular value (SSV) analysis methods are compared to a method especially well suited for analysis of robustness to uncertainties in subsystem interactions. This approach is referred to here as the interacting subsystem (IS) analysis method. This method has been used previously to analyze airframe/engine systems, emphasizing the study of stability robustness. However, performance robustness is also investigated here, and a new measure of allowable uncertainty for acceptable performance robustness is introduced. The IS methodology does not require plant uncertainty models to measure the robustness of the system, and is shown to yield valuable information regarding the effects of subsystem interactions. In contrast, the SV and SSV methods allow for the evaluation of the robustness of the system to particular models of uncertainty, and do not directly indicate how the airframe (engine) subsystem interacts with the engine (airframe) subsystem.

  20. Individual memory change after anterior temporal lobectomy: a base rate analysis using regression-based outcome methodology.

    PubMed

    Martin, R C; Sawrie, S M; Roth, D L; Gilliam, F G; Faught, E; Morawetz, R B; Kuzniecky, R

    1998-10-01

    To characterize patterns of base rate change on measures of verbal and visual memory after anterior temporal lobectomy (ATL) using a newly developed regression-based outcome methodology that accounts for effects of practice and regression towards the mean, and to comment on the predictive utility of baseline memory measures on postoperative memory outcome. Memory change was operationalized using regression-based change norms in a group of left (n = 53) and right (n = 48) ATL patients. All patients were administered tests of episodic verbal (prose recall, list learning) and visual (figure reproduction) memory, and semantic memory before and after ATL. ATL patients displayed a wide range of memory outcome across verbal and visual memory domains. Significant performance declines were noted for 25-50% of left ATL patients on verbal semantic and episodic memory tasks, while one-third of right ATL patients displayed significant declines in immediate and delayed episodic prose recall. Significant performance improvement was noted in an additional one-third of right ATL patients on delayed prose recall. Base rate change was similar between the two ATL groups across immediate and delayed visual memory. Approximately one-fourth of all patients displayed clinically meaningful losses on the visual memory task following surgery. Robust relationships between preoperative memory measures and nonstandardized change scores were attenuated or reversed using standardized memory outcome techniques. Our results demonstrated substantial group variability in memory outcome for ATL patients. These results extend previous research by incorporating known effects of practice and regression to the mean when addressing meaningful neuropsychological change following epilepsy surgery. Our findings also suggest that future neuropsychological outcome studies should take steps towards controlling for regression-to-the-mean before drawing predictive conclusions.
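
    A regression-based change norm of this kind can be sketched in a few lines (illustrative numbers only): retest scores are predicted from baseline using controls, and a patient's observed change is standardized by the regression's standard error of estimate:

```python
# Regression-based change norm: the control regression captures practice
# effects and regression to the mean; a patient's change is a z-score.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
base_c = rng.normal(100, 15, 80)                     # controls, baseline
retest_c = 5 + 0.9 * base_c + rng.normal(0, 6, 80)   # practice effect + noise

fit = sm.OLS(retest_c, sm.add_constant(base_c)).fit()
see = np.sqrt(fit.mse_resid)                         # standard error of estimate

patient_base, patient_retest = 110.0, 95.0           # hypothetical patient
predicted = fit.params[0] + fit.params[1] * patient_base
z = (patient_retest - predicted) / see
print("standardized change z = %.2f" % z)            # z < -1.64: reliable decline
```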

  1. Semisupervised Clustering by Iterative Partition and Regression with Neuroscience Applications

    PubMed Central

    Qian, Guoqi; Wu, Yuehua; Ferrari, Davide; Qiao, Puxue; Hollande, Frédéric

    2016-01-01

    Regression clustering is a statistical learning and data mining method that mixes unsupervised and supervised learning and is found in a wide range of applications including artificial intelligence and neuroscience. It performs unsupervised learning when it clusters the data according to their respective unobserved regression hyperplanes. The method also performs supervised learning when it fits regression hyperplanes to the corresponding data clusters. Applying regression clustering in practice requires means of determining the underlying number of clusters in the data, finding the cluster label of each data point, and estimating the regression coefficients of the model. In this paper, we review the estimation and selection issues in regression clustering with regard to the least squares and robust statistical methods. We also provide a model selection based technique to determine the number of regression clusters underlying the data. We further develop a computing procedure for regression clustering estimation and selection. Finally, simulation studies are presented for assessing the procedure, together with analyzing a real data set on RGB cell marking in neuroscience to illustrate and interpret the method. PMID:27212939
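
    The core iterative partition-and-regression loop can be sketched as follows (a basic k-plane variant on synthetic data; the paper's model selection and robust refinements are not included):

```python
# Regression clustering: assign each point to the hyperplane that fits it
# best, then refit each hyperplane on its own cluster, and repeat.
import numpy as np

def regression_cluster(X, y, k=2, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    labels = rng.integers(k, size=len(y))
    betas = np.zeros((k, X.shape[1]))
    for _ in range(n_iter):
        for j in range(k):                        # refit each cluster
            idx = labels == j
            if idx.sum() >= X.shape[1]:
                betas[j] = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
        resid = (y[:, None] - X @ betas.T) ** 2   # n x k squared residuals
        labels = resid.argmin(axis=1)             # reassign points
    return labels, betas

rng = np.random.default_rng(8)
x = rng.uniform(-1, 1, 300)
X = np.column_stack([np.ones_like(x), x])
true = rng.integers(2, size=300)
y = np.where(true == 0, 1 + 2 * x, -1 - 2 * x) + rng.normal(0, 0.1, 300)
labels, betas = regression_cluster(X, y)
print(betas.round(2))   # approximately [1, 2] and [-1, -2]
```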

  2. Geographically weighted regression and multicollinearity: dispelling the myth

    NASA Astrophysics Data System (ADS)

    Fotheringham, A. Stewart; Oshan, Taylor M.

    2016-10-01

    Geographically weighted regression (GWR) extends the familiar regression framework by estimating a set of parameters for any number of locations within a study area, rather than producing a single parameter estimate for each relationship specified in the model. Recent literature has suggested that GWR is highly susceptible to the effects of multicollinearity between explanatory variables and has proposed a series of local measures of multicollinearity as an indicator of potential problems. In this paper, we employ a controlled simulation to demonstrate that GWR is in fact very robust to the effects of multicollinearity. Consequently, the contention that GWR is highly susceptible to multicollinearity issues needs rethinking.
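
    The GWR estimator itself is compact: a weighted least squares fit at each location, with kernel weights that decay with distance. A minimal sketch with a fixed Gaussian bandwidth (normally the bandwidth would be calibrated, e.g. by cross-validation):

```python
# Minimal GWR: refit weighted least squares at every location to obtain
# a surface of local coefficients.
import numpy as np

def gwr(coords, X, y, bandwidth):
    betas = np.empty((len(y), X.shape[1]))
    for i, c in enumerate(coords):
        d = np.linalg.norm(coords - c, axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)   # Gaussian kernel weights
        Xw = X * w[:, None]
        betas[i] = np.linalg.solve(X.T @ Xw, Xw.T @ y)
    return betas

rng = np.random.default_rng(9)
n = 400
coords = rng.uniform(0, 1, (n, 2))
x1 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1])
y = 1 + (1 + 2 * coords[:, 0]) * x1 + rng.normal(0, 0.3, n)  # slope varies in space
betas = gwr(coords, X, y, bandwidth=0.15)
print(betas[:, 1].min().round(2), betas[:, 1].max().round(2))  # range of local slopes
```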

  3. Melamine detection by mid- and near-infrared (MIR/NIR) spectroscopy: a quick and sensitive method for dairy products analysis including liquid milk, infant formula, and milk powder.

    PubMed

    Balabin, Roman M; Smirnov, Sergey V

    2011-07-15

    Melamine (2,4,6-triamino-1,3,5-triazine) is a nitrogen-rich chemical implicated in the pet and human food recalls and in the global food safety scares involving milk products. Due to the serious health concerns associated with melamine consumption and the extensive scope of affected products, rapid and sensitive methods to detect melamine's presence are essential. We propose the use of spectroscopy data, produced in particular by near-infrared (near-IR/NIR) and mid-infrared (mid-IR/MIR) spectroscopies, for melamine detection in complex dairy matrixes. None of the IR-based methods for melamine detection reported to date has unambiguously shown its wide applicability to different dairy products as well as a limit of detection (LOD) below 1 ppm on an independent sample set. It was found that infrared spectroscopy is an effective tool to detect melamine in dairy products, such as infant formula, milk powder, or liquid milk. A LOD below 1 ppm (0.76±0.11 ppm) can be reached if a correct spectrum preprocessing (pretreatment) technique and a correct multivariate (MDA) algorithm - partial least squares regression (PLS), polynomial PLS (Poly-PLS), artificial neural network (ANN), support vector regression (SVR), or least squares support vector machine (LS-SVM) - are used for spectrum analysis. The relationship between the MIR/NIR spectrum of milk products and melamine content is nonlinear. Thus, nonlinear regression methods are needed to correctly predict the triazine-derivative content of milk products. It can be concluded that mid- and near-infrared spectroscopy can be regarded as a quick, sensitive, robust, and low-cost method for liquid milk, infant formula, and milk powder analysis. Copyright © 2011 Elsevier B.V. All rights reserved.

  4. A non-linear regression method for CT brain perfusion analysis

    NASA Astrophysics Data System (ADS)

    Bennink, E.; Oosterbroek, J.; Viergever, M. A.; Velthuis, B. K.; de Jong, H. W. A. M.

    2015-03-01

    CT perfusion (CTP) imaging allows for rapid diagnosis of ischemic stroke. Generation of perfusion maps from CTP data usually involves deconvolution algorithms providing estimates for the impulse response function in the tissue. We propose the use of a fast non-linear regression (NLR) method that we postulate has similar performance to the current academic state-of-the-art method (bSVD), but that has some important advantages, including the estimation of vascular permeability, improved robustness to tracer-delay, and very few tuning parameters, that are all important in stroke assessment. The aim of this study is to evaluate the fast NLR method against bSVD and a commercial clinical state-of-the-art method. The three methods were tested against a published digital perfusion phantom earlier used to illustrate the superiority of bSVD. In addition, the NLR and clinical methods were also tested against bSVD on 20 clinical scans. Pearson correlation coefficients were calculated for each of the tested methods. All three methods showed high correlation coefficients (>0.9) with the ground truth in the phantom. With respect to the clinical scans, the NLR perfusion maps showed higher correlation with bSVD than the perfusion maps from the clinical method. Furthermore, the perfusion maps showed that the fast NLR estimates are robust to tracer-delay. In conclusion, the proposed fast NLR method provides a simple and flexible way of estimating perfusion parameters from CT perfusion scans, with high correlation coefficients. This suggests that it could be a better alternative to the current clinical and academic state-of-the-art methods.

  5. QSRR modeling for the chromatographic retention behavior of some β-lactam antibiotics using forward and firefly variable selection algorithms coupled with multiple linear regression.

    PubMed

    Fouad, Marwa A; Tolba, Enas H; El-Shal, Manal A; El Kerdawy, Ahmed M

    2018-05-11

    The continuous emergence of new β-lactam antibiotics provokes the need for developing suitable analytical methods that accelerate and facilitate their analysis. A face-centered central composite experimental design was adopted using different levels of phosphate buffer pH, acetonitrile percentage at zero time and after 15 min in a gradient program to obtain the optimum chromatographic conditions for the elution of 31 β-lactam antibiotics. Retention factors were used as the target property to build two QSRR models utilizing the conventional forward selection and the advanced nature-inspired firefly algorithm for descriptor selection, coupled with multiple linear regression. The obtained models showed high performance in both internal and external validation indicating their robustness and predictive ability. Williams-Hotelling test and Student's t-test showed that there is no statistically significant difference between the models' results. Y-randomization validation showed that the obtained models are due to significant correlation between the selected molecular descriptors and the analytes' chromatographic retention. These results indicate that the generated FS-MLR and FFA-MLR models show comparable quality on both the training and validation levels. They also gave comparable information about the molecular features that influence the retention behavior of β-lactams under the current chromatographic conditions. We can conclude that in some cases a simple conventional feature selection algorithm can be used to generate robust and predictive models comparable to those generated using advanced ones. Copyright © 2018 Elsevier B.V. All rights reserved.

  6. Item-level psychometrics and predictors of performance for Spanish/English bilingual speakers on an object and action naming battery.

    PubMed

    Edmonds, Lisa A; Donovan, Neila J

    2012-04-01

    There is a pressing need for psychometrically sound naming materials for Spanish/English bilingual adults. To address this need, in this study the authors examined the psychometric properties of An Object and Action Naming Battery (An O&A Battery; Druks & Masterson, 2000) in bilingual speakers. Ninety-one Spanish/English bilinguals named O&A Battery items in English and Spanish. Responses underwent a Rasch analysis. Using correlation and regression analyses, the authors evaluated the effect of psycholinguistic (e.g., imageability) and participant (e.g., proficiency ratings) variables on accuracy. Rasch analysis determined unidimensionality across English and Spanish nouns and verbs and robust item-level psychometric properties, evidence for content validity. Few items did not fit the model, there were no ceiling or floor effects after uninformative and misfit items were removed, and items reflected a range of difficulty. Reliability coefficients were high, and the number of statistically different ability levels provided indices of sensitivity. Regression analyses revealed significant correlations between psycholinguistic variables and accuracy, providing preliminary construct validity. The participant variables that contributed most to accuracy were proficiency ratings and time of language use. Results suggest adequate content and construct validity of O&A items retained in the analysis for Spanish/English bilingual adults and support future efforts to evaluate naming in older bilinguals and persons with bilingual aphasia.

  7. Risk prediction for myocardial infarction via generalized functional regression models.

    PubMed

    Ieva, Francesca; Paganoni, Anna M

    2016-08-01

    In this paper, we propose a generalized functional linear regression model for a binary outcome indicating the presence/absence of a cardiac disease with multivariate functional data among the relevant predictors. In particular, the motivating aim is the analysis of electrocardiographic traces of patients whose pre-hospital electrocardiogram (ECG) has been sent to the 118 Dispatch Center of Milan (the Italian toll-free number for emergencies) by life support personnel of the basic rescue units. The statistical analysis starts with a preprocessing of ECGs treated as multivariate functional data. The signals are reconstructed from noisy observations. The biological variability is then removed by a nonlinear registration procedure based on landmarks. Thus, in order to perform a data-driven dimensional reduction, a multivariate functional principal component analysis is carried out on the variance-covariance matrix of the reconstructed and registered ECGs and their first derivatives. We use the scores of the Principal Components decomposition as covariates in a generalized linear model to predict the presence of the disease in a new patient. Hence, a new semi-automatic diagnostic procedure is proposed to estimate the risk of infarction (in the case of interest, the probability of being affected by Left Bundle Branch Block). The performance of this classification method is evaluated and compared with other methods proposed in literature. Finally, the robustness of the procedure is checked via leave-j-out techniques. © The Author(s) 2013.

  8. Application of Fourier transform near-infrared spectroscopy to optimization of green tea steaming process conditions.

    PubMed

    Ono, Daiki; Bamba, Takeshi; Oku, Yuichi; Yonetani, Tsutomu; Fukusaki, Eiichiro

    2011-09-01

    In this study, we constructed prediction models by metabolic fingerprinting of fresh green tea leaves using Fourier transform near-infrared (FT-NIR) spectroscopy and partial least squares (PLS) regression analysis to objectively optimize the steaming process conditions in green tea manufacture. The steaming process is the most important step for manufacturing high quality green tea products. However, the parameter setting of the steamer is currently determined subjectively by the manufacturer. Therefore, a simple and robust system that can be used to objectively set the steaming process parameters is necessary. We focused on FT-NIR spectroscopy because of its simple operation, quick measurement, and low running costs. After removal of noise in the spectral data by principal component analysis (PCA), PLS regression analysis was performed using spectral information as independent variables, and the steaming parameters set by experienced manufacturers as dependent variables. The prediction models were successfully constructed with satisfactory accuracy. Moreover, the results of the demonstration experiment suggested that the green tea steaming process parameters could be predicted on a larger manufacturing scale. This technique will contribute to improvement of the quality and productivity of green tea because it can objectively optimize the complicated green tea steaming process and will be suitable for practical use in green tea manufacture. Copyright © 2011 The Society for Biotechnology, Japan. Published by Elsevier B.V. All rights reserved.

  9. The income elasticity of Willingness-To-Pay (WTP) revisited: A meta-analysis of studies for restoring Good Ecological Status (GES) of water bodies under the Water Framework Directive (WFD).

    PubMed

    Tyllianakis, Emmanouil; Skuras, Dimitris

    2016-11-01

    The income elasticity of Willingness-To-Pay (WTP) is ambiguous and results from meta-analyses are disparate. This may be because the environmental good or service to be valued is very broadly defined or because the income measured in individual studies suffers from extensive non-reporting or misreporting. The present study carries out a meta-analysis of WTP to restore Good Ecological Status (GES) under the Water Framework Directive (WFD). This environmental service is narrowly defined and its aims and objectives are commonly understood among the members of the scientific community. Besides income reported by the individual studies, wealth and income indicators collected by Eurostat for the geographic entities covered by the individual studies are used. Meta-regression analyses show that income is statistically significant, explains a substantial proportion of WTP variability and its elasticity is considerable in magnitude, ranging from 0.6 to almost 1.7. Results are robust to variations in the sample of the individual studies participating in the meta-analysis, the econometric approach and the functional form of the meta-regression. The choice of wealth or income measure is less important than whether this measure is Purchasing Power Parity (PPP) adjusted among the individual studies. Copyright © 2016 Elsevier Ltd. All rights reserved.

  10. Stability indicating high performance thin-layer chromatographic method for simultaneous estimation of pantoprazole sodium and itopride hydrochloride in combined dosage form

    PubMed Central

    Bageshwar, Deepak; Khanvilkar, Vineeta; Kadam, Vilasrao

    2011-01-01

    A specific, precise and stability indicating high-performance thin-layer chromatographic method for simultaneous estimation of pantoprazole sodium and itopride hydrochloride in pharmaceutical formulations was developed and validated. The method employed TLC aluminium plates precoated with silica gel 60F254 as the stationary phase. The solvent system consisted of methanol:water:ammonium acetate; 4.0:1.0:0.5 (v/v/v). This system was found to give compact and dense spots for both itopride hydrochloride (Rf value of 0.55±0.02) and pantoprazole sodium (Rf value of 0.85±0.04). Densitometric analysis of both drugs was carried out in the reflectance–absorbance mode at 289 nm. The linear regression analysis data for the calibration plots showed a good linear relationship with R2=0.9988±0.0012 in the concentration range of 100–400 ng for pantoprazole sodium. Also, the linear regression analysis data for the calibration plots showed a good linear relationship with R2=0.9990±0.0008 in the concentration range of 200–1200 ng for itopride hydrochloride. The method was validated for specificity, precision, robustness and recovery. Statistical analysis proves that the method is repeatable and selective for the estimation of both the said drugs. As the method could effectively separate the drug from its degradation products, it can be employed as a stability indicating method. PMID:29403710

  12. Confounding adjustment in comparative effectiveness research conducted within distributed research networks.

    PubMed

    Toh, Sengwee; Gagne, Joshua J; Rassen, Jeremy A; Fireman, Bruce H; Kulldorff, Martin; Brown, Jeffrey S

    2013-08-01

    A distributed research network (DRN) of electronic health care databases, in which data reside behind the firewall of each data partner, can support a wide range of comparative effectiveness research (CER) activities. An essential component of a fully functional DRN is the capability to perform robust statistical analyses to produce valid, actionable evidence without compromising patient privacy, data security, or proprietary interests. We describe the strengths and limitations of different confounding adjustment approaches that can be considered in observational CER studies conducted within DRNs, and the theoretical and practical issues to consider when selecting among them in various study settings. Several methods can be used to adjust for multiple confounders simultaneously, either as individual covariates or as confounder summary scores (e.g., propensity scores and disease risk scores), including: (1) centralized analysis of patient-level data, (2) case-centered logistic regression of risk set data, (3) stratified or matched analysis of aggregated data, (4) distributed regression analysis, and (5) meta-analysis of site-specific effect estimates. These methods require different granularities of information be shared across sites and afford investigators different levels of analytic flexibility. DRNs are growing in use and sharing of highly detailed patient-level information is not always feasible in DRNs. Methods that incorporate confounder summary scores allow investigators to adjust for a large number of confounding factors without the need to transfer potentially identifiable information in DRNs. They have the potential to let investigators perform many analyses traditionally conducted through a centralized dataset with detailed patient-level information.

  13. Low-level lead exposure and the IQ of children. A meta-analysis of modern studies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Needleman, H.L.; Gatsonis, C.A.

    1990-02-02

    We identified 24 modern studies of childhood exposures to lead in relation to IQ. From this population, 12 that employed multiple regression analysis with IQ as the dependent variable and lead as the main effect and that controlled for nonlead covariates were selected for a quantitative, integrated review or meta-analysis. The studies were grouped according to type of tissue analyzed for lead. There were 7 blood and 5 tooth lead studies. Within each group, we obtained joint P values by two different methods and average effect sizes as measured by the partial correlation coefficients. We also investigated the sensitivity of the results to any single study. The sample sizes ranged from 75 to 724. The sign of the regression coefficient for lead was negative in 11 of 12 studies. The negative partial r's for lead ranged from -.27 to -.003. The power to find an effect was limited, below 0.6 in 7 of 12 studies. The joint P values for the blood lead studies were less than .0001 for both methods of analysis (95% confidence interval for group partial r, -.15 ± .05), while for the tooth lead studies they were .0005 and .004, respectively (95% confidence interval for group partial r, -.08 ± .05). The hypothesis that lead impairs children's IQ at low dose is strongly supported by this quantitative review. The effect is robust to the impact of any single study.
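
    The two pooling steps described (averaging partial correlations and combining P values) can be sketched as follows; the numbers below are illustrative, not the study's data, and Stouffer's method stands in for whichever combination rules the authors used:

```python
# Pool partial correlations via Fisher's z transform, and combine
# one-sided P values across studies with Stouffer's method.
import numpy as np
from scipy import stats

partial_r = np.array([-0.15, -0.10, -0.08, -0.20, -0.05])  # illustrative values
n = np.array([300, 150, 500, 220, 400])                    # study sample sizes

z = np.arctanh(partial_r)                      # Fisher z transform
pooled_r = np.tanh(np.average(z, weights=n - 3))
print("pooled partial r: %.3f" % pooled_r)

p_one_sided = np.array([0.01, 0.08, 0.03, 0.002, 0.20])
z_scores = stats.norm.isf(p_one_sided)         # P values -> z scores
stouffer_p = stats.norm.sf(z_scores.sum() / np.sqrt(len(z_scores)))
print("combined P: %.5f" % stouffer_p)
```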

  14. Hadley circulation extent and strength in a wide range of simulated climates

    NASA Astrophysics Data System (ADS)

    D'Agostino, Roberta; Adam, Ori; Lionello, Piero; Schneider, Tapio

    2017-04-01

    Understanding the Hadley circulation (HC) dynamics is crucial because its changes affect the seasonal migration of the ITCZ, the extent of subtropical arid regions and the strength of the monsoons. Despite decades of study, the factors controlling its strength and extent have remained unclear. Here we analyse how HC strength and extent change over a wide range of climate conditions from the Last Glacial Maximum to future projections. The large climate change between paleoclimate simulations and future scenarios offers the chance to analyse robust HC changes and their link to large-scale factors. The HC shrinks and strengthens in the coldest simulation relative to the warmest. A progressive poleward shift of its edges is evident as the climate warms (at a rate of 0.35° lat./K in each hemisphere). The HC extent and strength both depend on the isentropic slope, which in turn is related to the meridional temperature gradient, subtropical static stability and tropopause height. In a multiple robust regression analysis using these as predictors, we find that the tropical tropopause height does not add relevant information to the model beyond surface temperature. Therefore, primarily the static stability and, secondarily, the meridional temperature contrast together account for almost the total HC variance. However, the regressions leave some of the variance of the northern HC edge and the southern HC strength unexplained. The effectiveness of this analysis is limited by the correlation among the predictors and their relationship with mean temperature. In fact, for all simulations, the tropical temperature explains the variations of the HC well, except for its southern hemisphere intensity. Hence, it can be used as the sole predictor to diagnose the HC response to greenhouse-induced global warming. How to account for the evolution of the southern HC strength remains unclear, because of the large inter-model spread in this quantity.

  15. Data-driven discovery of partial differential equations

    PubMed Central

    Rudy, Samuel H.; Brunton, Steven L.; Proctor, Joshua L.; Kutz, J. Nathan

    2017-01-01

    We propose a sparse regression method capable of discovering the governing partial differential equation(s) of a given system by time series measurements in the spatial domain. The regression framework relies on sparsity-promoting techniques to select the nonlinear and partial derivative terms of the governing equations that most accurately represent the data, bypassing a combinatorially large search through all possible candidate models. The method balances model complexity and regression accuracy by selecting a parsimonious model via Pareto analysis. Time series measurements can be made in an Eulerian framework, where the sensors are fixed spatially, or in a Lagrangian framework, where the sensors move with the dynamics. The method is computationally efficient, robust, and demonstrated to work on a variety of canonical problems spanning a number of scientific domains including Navier-Stokes, the quantum harmonic oscillator, and the diffusion equation. Moreover, the method is capable of disambiguating between potentially nonunique dynamical terms by using multiple time series taken with different initial data. Thus, for a traveling wave, the method can distinguish between a linear wave equation and the Korteweg–de Vries equation, for instance. The method provides a promising new technique for discovering governing equations and physical laws in parameterized spatiotemporal systems, where first-principles derivations are intractable. PMID:28508044
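
    The core sparse-regression step in such methods is often sequentially thresholded least squares. A minimal sketch on a toy candidate library (this is the generic STLSQ idea, not necessarily the authors' exact algorithm):

```python
# Sequentially thresholded least squares: regress the time derivative on
# a library of candidate terms and repeatedly zero out small coefficients.
import numpy as np

def stlsq(Theta, dudt, threshold=0.1, n_iter=10):
    xi = np.linalg.lstsq(Theta, dudt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        big = ~small
        if big.any():   # refit on the surviving terms only
            xi[big] = np.linalg.lstsq(Theta[:, big], dudt, rcond=None)[0]
    return xi

# toy system du/dt = -2u + 0.5u^2, candidate library [1, u, u^2, u^3]
rng = np.random.default_rng(10)
u = rng.uniform(-1, 1, 500)
dudt = -2 * u + 0.5 * u**2 + rng.normal(0, 1e-3, 500)
Theta = np.column_stack([np.ones_like(u), u, u**2, u**3])
print(stlsq(Theta, dudt).round(3))   # approximately [0, -2, 0.5, 0]
```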

  16. Missing heritability in the tails of quantitative traits? A simulation study on the impact of slightly altered true genetic models.

    PubMed

    Pütter, Carolin; Pechlivanis, Sonali; Nöthen, Markus M; Jöckel, Karl-Heinz; Wichmann, Heinz-Erich; Scherag, André

    2011-01-01

    Genome-wide association studies have identified robust associations between single nucleotide polymorphisms and complex traits. As the proportion of phenotypic variance explained is still limited for most of the traits, larger and larger meta-analyses are being conducted to detect additional associations. Here we investigate the impact of the study design and the underlying assumption about the true genetic effect in a bimodal mixture situation on the power to detect associations. We performed simulations of quantitative phenotypes analysed by standard linear regression and dichotomized case-control data sets from the extremes of the quantitative trait analysed by standard logistic regression. Using linear regression, markers with an effect in the extremes of the traits were almost undetectable, whereas analysing extremes by case-control design had superior power even for much smaller sample sizes. Two real data examples are provided to support our theoretical findings and to explore our mixture and parameter assumption. Our findings support the idea to re-analyse the available meta-analysis data sets to detect new loci in the extremes. Moreover, our investigation offers an explanation for discrepant findings when analysing quantitative traits in the general population and in the extremes. Copyright © 2011 S. Karger AG, Basel.

  17. Moving beyond regression techniques in cardiovascular risk prediction: applying machine learning to address analytic challenges

    PubMed Central

    Goldstein, Benjamin A.; Navar, Ann Marie; Carter, Rickey E.

    2017-01-01

    Abstract Risk prediction plays an important role in clinical cardiology research. Traditionally, most risk models have been based on regression models. While useful and robust, these statistical methods are limited to using a small number of predictors which operate in the same way on everyone, and uniformly throughout their range. The purpose of this review is to illustrate the use of machine-learning methods for the development of risk prediction models. Typically presented as black box approaches, most machine-learning methods are aimed at solving particular challenges that arise in data analysis and that are not well addressed by typical regression approaches. To illustrate these challenges, as well as how different methods can address them, we consider the problem of predicting mortality after diagnosis of acute myocardial infarction. We use data derived from our institution's electronic health record and abstract data on 13 regularly measured laboratory markers. We walk through different challenges that arise in modelling these data and then introduce different machine-learning approaches. Finally, we discuss general issues in the application of machine-learning methods including tuning parameters, loss functions, variable importance, and missing data. Overall, this review serves as an introduction to the diffuse field of machine learning for those working on risk modelling. PMID:27436868
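
    As a hedged sketch with synthetic data (not the authors' EHR cohort), the snippet below fits one commonly used machine-learning approach, a random forest, to 13 simulated laboratory markers with a nonlinear, interacting risk structure, and reports cross-validated discrimination and variable importances.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(10)
    labs = rng.normal(size=(1000, 13))        # 13 routine laboratory markers
    # nonlinear, interacting risk -- the kind of structure trees can capture
    p = 1 / (1 + np.exp(-(labs[:, 0] * labs[:, 1] + (labs[:, 2] > 1))))
    died = rng.random(1000) < p               # simulated mortality outcome

    rf = RandomForestClassifier(n_estimators=200, random_state=0)
    print(cross_val_score(rf, labs, died, cv=5, scoring="roc_auc").mean())
    print(rf.fit(labs, died).feature_importances_[:3])  # variable importance
    ```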

  18. Evaluation of the confusion matrix method in the validation of an automated system for measuring feeding behaviour of cattle.

    PubMed

    Ruuska, Salla; Hämäläinen, Wilhelmiina; Kajava, Sari; Mughal, Mikaela; Matilainen, Pekka; Mononen, Jaakko

    2018-03-01

    The aim of the present study was to evaluate empirically the confusion matrix method in device validation. We compared the confusion matrix method to linear regression and error indices in the validation of a device measuring the feeding behaviour of dairy cattle. In addition, we studied how to extract additional information on classification errors with confusion probabilities. The data consisted of 12 h behaviour measurements from five dairy cows; feeding and other behaviour were detected simultaneously with a device and from video recordings. The resulting 216 000 pairs of classifications were used to construct confusion matrices and calculate performance measures. In addition, hourly durations of each behaviour were calculated and the accuracy of measurements was evaluated with linear regression and error indices. All three validation methods agreed when the behaviour was detected very accurately or inaccurately. Otherwise, in the intermediate cases, the confusion matrix method and error indices produced relatively concordant results, but the linear regression method often disagreed with them. Our study supports the use of confusion matrix analysis in validation since it is robust to any data distribution and type of relationship, it provides a stringent evaluation of validity, and it offers extra information on the type and sources of errors. Copyright © 2018 Elsevier B.V. All rights reserved.
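
    The confusion-matrix bookkeeping described above can be sketched with a handful of hypothetical device-versus-video classification pairs (the real study used 216 000 pairs):

    ```python
    import numpy as np
    from sklearn.metrics import confusion_matrix

    # device vs. video-reference classifications, one pair per time step
    reference = np.array(["feeding", "other", "feeding", "other", "feeding"])
    device    = np.array(["feeding", "feeding", "feeding", "other", "other"])

    cm = confusion_matrix(reference, device, labels=["feeding", "other"])
    tp, fn, fp, tn = cm[0, 0], cm[0, 1], cm[1, 0], cm[1, 1]
    sensitivity = tp / (tp + fn)   # feeding correctly detected
    specificity = tn / (tn + fp)   # other behaviour correctly detected
    precision = tp / (tp + fp)
    print(cm, sensitivity, specificity, precision)
    ```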

  19. Robust learning for optimal treatment decision with NP-dimensionality

    PubMed Central

    Shi, Chengchun; Song, Rui; Lu, Wenbin

    2016-01-01

    In order to identify important variables involved in making optimal treatment decisions, Lu, Zhang and Zeng (2013) proposed a penalized least-squares regression framework for a fixed number of predictors, which is robust against misspecification of the conditional mean model. Two problems arise: (i) in a world of explosively big data, effective methods are needed to handle ultra-high-dimensional data sets, for example, where the dimension of the predictors is of non-polynomial (NP) order in the sample size; (ii) both the propensity score and conditional mean models need to be estimated from data under NP dimensionality. In this paper, we propose a robust procedure for estimating the optimal treatment regime under NP dimensionality. In both steps, penalized regressions are employed with a non-concave penalty function, where the conditional mean model of the response given the predictors may be misspecified. The asymptotic properties, such as weak oracle properties, selection consistency and oracle distributions, of the proposed estimators are investigated. In addition, we study the limiting distribution of the estimated value function for the obtained optimal treatment regime. The empirical performance of the proposed estimation method is evaluated by simulations and an application to a depression dataset from the STAR*D study. PMID:28781717
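
    The two penalized steps can be mimicked on synthetic p >> n data; since common Python libraries do not ship the non-concave (SCAD-type) penalty used in the paper, the convex lasso stands in below, and everything shown is an illustrative assumption rather than the authors' procedure.

    ```python
    import numpy as np
    from sklearn.linear_model import LassoCV, LogisticRegressionCV

    rng = np.random.default_rng(11)
    n, p = 200, 1000                          # p >> n, "NP dimensionality"
    X = rng.normal(size=(n, p))
    A = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))          # treatment
    y = X[:, 1] + A * (1.0 + X[:, 2]) + rng.normal(size=n)   # outcome

    # Step 1: penalized propensity model; Step 2: penalized outcome model
    # with treatment-covariate interactions (the treatment-contrast part).
    ps = LogisticRegressionCV(penalty="l1", solver="liblinear", cv=3).fit(X, A)
    design = np.hstack([X, A[:, None] * X, A[:, None]])
    cm = LassoCV(cv=3).fit(design, y)
    print((ps.coef_ != 0).sum(), (cm.coef_ != 0).sum())  # selected variables
    ```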

  20. The Dissolution Behavior of Borosilicate Glasses in Far-From Equilibrium Conditions

    DOE PAGES

    Neeway, James J.; Rieke, Peter C.; Parruzot, Benjamin P.; ...

    2018-02-10

    An area of agreement in the waste glass corrosion community is that, at far-from-equilibrium conditions, the dissolution of borosilicate glasses used to immobilize nuclear waste is known to be a function of both temperature and pH. The aim of this work is to study the effects of temperature and pH on the dissolution rate of three model nuclear waste glasses (SON68, ISG, AFCI). The dissolution rate data are then used to parameterize a kinetic rate model based on Transition State Theory that has been developed to model glass corrosion behavior in dilute conditions. To do this, experiments were conducted at temperatures of 23, 40, 70, and 90 °C and pH(22 °C) values of 9, 10, 11, and 12 with the single-pass flow-through (SPFT) test method. Both the absolute dissolution rates and the rate model parameters are compared with previous results. Rate model parameters for the three glasses studied here are nearly equivalent within error and in relative agreement with previous studies though quantifiable differences exist. The glass dissolution rates were analyzed with a linear multivariate regression (LMR) and a nonlinear multivariate regression performed with the use of the Glass Corrosion Modeling Tool (GCMT), with which a robust uncertainty analysis is performed. This robust analysis highlights the high degree of correlation of various parameters in the kinetic rate model. As more data are obtained on borosilicate glasses with varying compositions, a mathematical description of the effect of glass composition on the rate parameter values should be possible. This would allow for the possibility of calculating the forward dissolution rate of glass based solely on composition. In addition, the method of determination of parameter uncertainty and correlation provides a framework for other rate models that describe the dissolution rates of other amorphous and crystalline materials in a wide range of chemical conditions. As a result, the higher level of uncertainty analysis would provide a basis for comparison of different rate models and allow for a better means of quantifiably comparing the various models.

  1. The dissolution behavior of borosilicate glasses in far-from equilibrium conditions

    NASA Astrophysics Data System (ADS)

    Neeway, James J.; Rieke, Peter C.; Parruzot, Benjamin P.; Ryan, Joseph V.; Asmussen, R. Matthew

    2018-04-01

    An area of agreement in the waste glass corrosion community is that, at far-from-equilibrium conditions, the dissolution of borosilicate glasses used to immobilize nuclear waste is known to be a function of both temperature and pH. The aim of this work is to study the effects of temperature and pH on the dissolution rate of three model nuclear waste glasses (SON68, ISG, AFCI). The dissolution rate data are then used to parameterize a kinetic rate model based on Transition State Theory that has been developed to model glass corrosion behavior in dilute conditions. To do this, experiments were conducted at temperatures of 23, 40, 70, and 90 °C and pH (22 °C) values of 9, 10, 11, and 12 with the single-pass flow-through (SPFT) test method. Both the absolute dissolution rates and the rate model parameters are compared with previous results. Rate model parameters for the three glasses studied here are nearly equivalent within error and in relative agreement with previous studies though quantifiable differences exist. The glass dissolution rates were analyzed with a linear multivariate regression (LMR) and a nonlinear multivariate regression performed with the use of the Glass Corrosion Modeling Tool (GCMT), with which a robust uncertainty analysis is performed. This robust analysis highlights the high degree of correlation of various parameters in the kinetic rate model. As more data are obtained on borosilicate glasses with varying compositions, a mathematical description of the effect of glass composition on the rate parameter values should be possible. This would allow for the possibility of calculating the forward dissolution rate of glass based solely on composition. In addition, the method of determination of parameter uncertainty and correlation provides a framework for other rate models that describe the dissolution rates of other amorphous and crystalline materials in a wide range of chemical conditions. The higher level of uncertainty analysis would provide a basis for comparison of different rate models and allow for a better means of quantifiably comparing the various models.

  2. The Dissolution Behavior of Borosilicate Glasses in Far-From Equilibrium Conditions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Neeway, James J.; Rieke, Peter C.; Parruzot, Benjamin P.

    An area of agreement in the waste glass corrosion community is that, at far-from-equilibrium conditions, the dissolution of borosilicate glasses used to immobilize nuclear waste is known to be a function of both temperature and pH. The aim of this work is to study the effects of temperature and pH on the dissolution rate of three model nuclear waste glasses (SON68, ISG, AFCI). The dissolution rate data are then used to parameterize a kinetic rate model based on Transition State Theory that has been developed to model glass corrosion behavior in dilute conditions. To do this, experiments were conducted at temperatures of 23, 40, 70, and 90 °C and pH(22 °C) values of 9, 10, 11, and 12 with the single-pass flow-through (SPFT) test method. Both the absolute dissolution rates and the rate model parameters are compared with previous results. Rate model parameters for the three glasses studied here are nearly equivalent within error and in relative agreement with previous studies though quantifiable differences exist. The glass dissolution rates were analyzed with a linear multivariate regression (LMR) and a nonlinear multivariate regression performed with the use of the Glass Corrosion Modeling Tool (GCMT), with which a robust uncertainty analysis is performed. This robust analysis highlights the high degree of correlation of various parameters in the kinetic rate model. As more data are obtained on borosilicate glasses with varying compositions, a mathematical description of the effect of glass composition on the rate parameter values should be possible. This would allow for the possibility of calculating the forward dissolution rate of glass based solely on composition. In addition, the method of determination of parameter uncertainty and correlation provides a framework for other rate models that describe the dissolution rates of other amorphous and crystalline materials in a wide range of chemical conditions. As a result, the higher level of uncertainty analysis would provide a basis for comparison of different rate models and allow for a better means of quantifiably comparing the various models.

  3. Robust statistical methods for impulse noise suppressing of spread spectrum induced polarization data, with application to a mine site, Gansu province, China

    NASA Astrophysics Data System (ADS)

    Liu, Weiqiang; Chen, Rujun; Cai, Hongzhu; Luo, Weibin

    2016-12-01

    In this paper, we investigated the robust processing of noisy spread spectrum induced polarization (SSIP) data. SSIP is a new frequency domain induced polarization method that transmits a pseudo-random m-sequence as the source current, where the m-sequence is a broadband signal. The potential information at multiple frequencies can be obtained through measurement. Removing the noise is a crucial problem in SSIP data processing. If the ordinary mean stack and digital filters cannot reduce the impulse noise effectively, its impact will remain in the complex resistivity spectrum and affect the interpretation of profile anomalies. We applied a robust statistical method to SSIP data processing. Robust least-squares regression is used to fit and remove the linear trend from the original data before stacking. A robust M estimate is used to stack the data of all periods. A robust smoothing filter is used to suppress the residual noise in the data after stacking. For the robust statistical scheme, the most appropriate influence function and iterative algorithm are chosen by tests on simulated data to suppress the influence of outliers. We tested the benefits of the robust SSIP data processing using examples of SSIP data recorded at a test site beside a mine in Gansu province, China.
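
    One reading of the robust M-estimate stacking step, sketched with standard tuning constants and synthetic data: an iteratively reweighted Huber location estimate resists the impulsive outliers that drag the ordinary mean stack.

    ```python
    import numpy as np

    def huber_stack(x, k=1.345, n_iter=20):
        """Huber M-estimate of location via iteratively reweighted averaging."""
        mu = np.median(x)
        for _ in range(n_iter):
            mad = np.median(np.abs(x - mu))
            s = 1.4826 * mad if mad > 0 else 1.0   # robust scale (MAD)
            r = (x - mu) / s
            w = np.ones_like(r)                    # Huber weights
            big = np.abs(r) > k
            w[big] = k / np.abs(r[big])
            mu = np.sum(w * x) / np.sum(w)
        return mu

    periods = np.random.default_rng(2).normal(size=100)  # per-period samples
    periods[:5] += 50.0                                   # impulsive noise
    print(np.mean(periods), huber_stack(periods))  # mean dragged; M-estimate not
    ```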

  4. Collateral missing value imputation: a new robust missing value estimation algorithm for microarray data.

    PubMed

    Sehgal, Muhammad Shoaib B; Gondal, Iqbal; Dooley, Laurence S

    2005-05-15

    Microarray data are used in a range of application areas in biology, although they often contain considerable numbers of missing values. These missing values can significantly affect subsequent statistical analysis and machine learning algorithms, so there is a strong motivation to estimate them as accurately as possible before using these algorithms. While many imputation algorithms have been proposed, more robust techniques need to be developed so that further analysis of biological data can be accurately undertaken. In this paper, an innovative missing value imputation algorithm called collateral missing value estimation (CMVE) is presented, which uses multiple covariance-based imputation matrices for the final prediction of missing values. The matrices are computed and optimized using least square regression and linear programming methods. The new CMVE algorithm has been compared with existing estimation techniques including Bayesian principal component analysis imputation (BPCA), least square impute (LSImpute) and K-nearest neighbour (KNN). All these methods were rigorously tested to estimate missing values in three separate non-time series (ovarian cancer based) and one time series (yeast sporulation) dataset. Each method was quantitatively analyzed using the normalized root mean square (NRMS) error measure, covering a wide range of randomly introduced missing value probabilities from 0.01 to 0.2. Experiments were also undertaken on the yeast dataset, which comprised 1.7% actual missing values, to test the hypothesis that CMVE performed better not only for randomly occurring but also for a real distribution of missing values. The results confirmed that CMVE consistently demonstrated superior and robust estimation of missing values compared with the other methods for both types of data, at the same order of computational complexity. A concise theoretical framework has also been formulated to validate the improved performance of the CMVE algorithm. The CMVE software is available upon request from the authors.
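
    CMVE itself is not available in standard libraries, but the evaluation pattern described can be sketched with one of the paper's comparison methods (KNN) on randomly masked entries, scored with an NRMS-style error; the normalization choice below is an assumption.

    ```python
    import numpy as np
    from sklearn.impute import KNNImputer

    rng = np.random.default_rng(3)
    X_true = rng.normal(size=(200, 20))      # stand-in for an expression matrix
    X = X_true.copy()
    mask = rng.random(X.shape) < 0.05        # 5% missing, within the 0.01-0.2 range
    X[mask] = np.nan

    X_hat = KNNImputer(n_neighbors=10).fit_transform(X)

    # NRMS-style error on the imputed entries (normalization is one choice)
    nrms = (np.sqrt(np.mean((X_hat[mask] - X_true[mask]) ** 2))
            / np.std(X_true[mask]))
    print(nrms)
    ```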

  5. Measuring multi-joint stiffness during single movements: numerical validation of a novel time-frequency approach.

    PubMed

    Piovesan, Davide; Pierobon, Alberto; DiZio, Paul; Lackner, James R

    2012-01-01

    This study presents and validates a Time-Frequency technique for measuring 2-dimensional multijoint arm stiffness throughout a single planar movement as well as during static posture. It is proposed as an alternative to current regression-based methods, which require numerous repetitions to obtain average stiffness on a small segment of the hand trajectory. The method is based on the analysis of the reassigned spectrogram of the arm's response to impulsive perturbations and can estimate arm stiffness on a trial-by-trial basis. Analytic and empirical methods are first derived and tested through modal analysis on synthetic data. The technique's accuracy and robustness are assessed by modeling the estimation of stiffness time profiles changing at different rates and affected by different noise levels. Our method obtains results comparable with two well-known regressive techniques. We also test how the technique can identify the viscoelastic component of non-linear and higher-than-second-order systems with a non-parametric approach. The technique proposed here is highly robust to noise and can be used easily for both postural and movement tasks. Estimations of stiffness profiles are possible with only one perturbation, making our method a useful tool for estimating limb stiffness during motor learning and adaptation tasks, and for understanding the modulation of stiffness in individuals with neurodegenerative diseases.

  6. Investigation of the UK37' vs. SST relationship for Atlantic Ocean suspended particulate alkenones: An alternative regression model and discussion of possible sampling bias

    NASA Astrophysics Data System (ADS)

    Gould, Jessica; Kienast, Markus; Dowd, Michael

    2017-05-01

    Alkenone unsaturation, expressed as the UK37' index, is closely related to the growth temperature of prymnesiophytes, thus providing a reliable proxy to infer past sea surface temperatures (SSTs). Here we address two lingering uncertainties related to this SST proxy. First, calibration models developed for core-top sediments and those developed for surface suspended particulate organic material (SPOM) show systematic offsets, raising concerns regarding the transfer of the primary signal into the sedimentary record. Second, questions remain regarding changes in the slope of the UK37' vs. growth temperature relationship at the temperature extremes. Based on (re)analysis of 31 new and 394 previously published SPOM UK37' data from the Atlantic Ocean, a new regression model to relate UK37' to SST is introduced: the Richards curve (Richards, 1959). This non-linear regression model provides a robust calibration of the UK37' vs. SST relationship for Atlantic SPOM samples and uniquely accounts both for the fact that the UK37' index is a proportion, and so must lie between 0 and 1, and for the observed reduction in slope at the warm and cold ends of the temperature range. As with prior fits of SPOM UK37' vs. SST, the Richards model is offset from traditional regression models of sedimentary UK37' vs. SST. We posit that (some of) this offset can be attributed to the seasonally and depth-biased sampling of SPOM material.
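
    Fitting a Richards-type curve to synthetic UK37'-SST pairs can be sketched with scipy; the parameterization below (bounded to (0, 1), with shape parameter nu) is an assumption, as the authors' exact functional form may differ.

    ```python
    import numpy as np
    from scipy.optimize import curve_fit

    def richards(T, B, M, nu):
        """Richards growth curve constrained to (0, 1): a flexible sigmoid
        whose slope flattens at both temperature extremes."""
        return 1.0 / (1.0 + np.exp(-B * (T - M))) ** (1.0 / nu)

    # synthetic UK37'-like data (illustrative parameter values)
    rng = np.random.default_rng(4)
    sst = rng.uniform(0, 30, 150)
    uk37 = richards(sst, 0.15, 14.0, 1.2) + rng.normal(scale=0.03, size=sst.size)

    popt, pcov = curve_fit(richards, sst, uk37, p0=[0.1, 15.0, 1.0], maxfev=10000)
    print(popt)  # fitted (B, M, nu)
    ```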

  7. Exploring the links between macro-level contextual factors and their influence on nursing workforce composition.

    PubMed

    Squires, Allison; Beltrán-Sánchez, Hiram

    2011-11-01

    Research that links macro-level socioeconomic development variables to health care human resources workforce composition is scarce at best. The purpose of this study was to explore the links between nonnursing factors and nursing workforce composition through a secondary, descriptive analysis of year 2000, publicly available national nursing human resources data from Mexico. Building on previous research, the authors conducted multiple robust regression analysis by federal typing of nursing human resources from 31 Mexican states against macro-level socioeconomic development variables. Average education in a state was a significant predictor of all types of formally educated nurses in Mexico. Other results suggest that macro-level indicators have a different association with each type of nurse. Context may play a greater role in determining nursing workforce composition than previously thought. Further studies may help to explain differences both within and between countries.

  8. Exploring the Links Between Macro-Level Contextual Factors and Their Influence on Nursing Workforce Composition

    PubMed Central

    Squires, Allison; Beltrán-Sánchez, Hiram

    2012-01-01

    Research that links macro-level socioeconomic development variables to healthcare human resources workforce composition is scarce at best. The purpose of this study was to explore the links between non-nursing factors and nursing workforce composition through a secondary, descriptive analysis of year 2000, publicly available national nursing human resources data from Mexico. Building on previous research, the authors conducted multiple robust regression analysis by federal typing of nursing human resources from 31 Mexican states against macro-level socioeconomic development variables. Average education in a state was a significant predictor of all types of formally educated nurses in Mexico. Other results suggest that macro-level indicators have a different association with each type of nurse. Context may play a greater role in determining nursing workforce composition than previously thought. Further studies may help to explain differences both within and between countries. PMID:22513839

  9. BMI and diabetes risk in Singaporean Chinese.

    PubMed

    Odegaard, Andrew O; Koh, Woon-Puay; Vazquez, Gabrielle; Arakawa, Kazuko; Lee, Hin-Peng; Yu, Mimi C; Pereira, Mark A

    2009-06-01

    Increased BMI is a robust risk factor for type 2 diabetes. Paradoxically, South Asians have relatively low BMIs despite their high prevalence of type 2 diabetes. We examined the association between BMI and incident type 2 diabetes because detailed prospective cohort data on this topic in Asians are scarce. This study was a prospective analysis of 37,091 men and women aged 45-74 years in the Singapore Chinese Health Study, using Cox regression analysis. Risk of incident type 2 diabetes significantly increased beginning with BMIs 18.5-23.0 kg/m(2) (relative risk 2.47 [95% CI 1.75-3.48]) and continued in a monotonic fashion across the spectrum of BMI. Results were stronger for younger than for older adults. BMIs considered lean and normal in Singaporean Chinese are strongly associated with increased risk of incident type 2 diabetes. This association weakened with advanced age but remained significant.

  10. Advanced spectrophotometric chemometric methods for resolving the binary mixture of doxylamine succinate and pyridoxine hydrochloride.

    PubMed

    Katsarov, Plamen; Gergov, Georgi; Alin, Aylin; Pilicheva, Bissera; Al-Degs, Yahya; Simeonov, Vasil; Kassarova, Margarita

    2018-03-01

    The prediction power of partial least squares (PLS) and multivariate curve resolution-alternating least squares (MCR-ALS) methods has been studied for the simultaneous quantitative analysis of the binary drug combination of doxylamine succinate and pyridoxine hydrochloride. Analysis of first-order UV overlapped spectra was performed using different PLS models: classical PLS1 and PLS2, as well as partial robust M-regression (PRM). These linear models were compared to MCR-ALS with equality and correlation constraints (MCR-ALS-CC). All techniques operated within the full spectral region and extracted maximum information for the drugs analysed. The developed chemometric methods were validated on external sample sets and were applied to the analyses of pharmaceutical formulations. The obtained statistical parameters were satisfactory for the calibration and validation sets. All developed methods can be successfully applied for the simultaneous spectrophotometric determination of doxylamine and pyridoxine, both in laboratory-prepared mixtures and in commercial dosage forms.
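
    For orientation, a plain PLS1-style model can be cross-validated on synthetic spectra with scikit-learn; PRM and MCR-ALS-CC lack standard Python implementations, so ordinary PLS stands in and all data below are illustrative.

    ```python
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(5)
    spectra = rng.normal(size=(60, 300))   # stand-in for first-order UV spectra
    # one analyte whose signal lives in the first few wavelength channels
    conc = spectra[:, :5].sum(axis=1) + rng.normal(scale=0.1, size=60)

    pls = PLSRegression(n_components=3)
    scores = cross_val_score(pls, spectra, conc, cv=5, scoring="r2")
    print(scores.mean())   # cross-validated R^2 of the PLS1-style model
    ```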

  11. Analysis of ethnic disparities in workers' compensation claims using data linkage.

    PubMed

    Friedman, Lee S; Ruestow, Peter; Forst, Linda

    2012-10-01

    The overall goal of this research project was to assess ethnic disparities in monetary compensation among construction workers injured on the job, through the linkage of medical records and workers' compensation data. Probabilistic linkage of medical records with workers' compensation claim data was used. In the final multivariable robust regression model, compensation was $5824 higher (P = 0.030; 95% confidence interval: $551 to $11,097) for white non-Hispanic workers than for workers of other ethnic groups when controlling for injury severity, affected body region, type of injury, average weekly wage, weeks of temporary total disability, percent permanent partial disability, death, or attorney use. The analysis indicates that white non-Hispanic construction workers are awarded higher monetary settlements despite the observation that, for specific injuries, the mean temporary total disability and permanent partial disability were equivalent to or lower than those of Hispanic and black construction workers.

  12. Trace element analysis of rough diamond by LA-ICP-MS: a case of source discrimination?

    PubMed

    Dalpé, Claude; Hudon, Pierre; Ballantyne, David J; Williams, Darrell; Marcotte, Denis

    2010-11-01

    Current profiling of rough diamond sources is performed using different physical and/or morphological techniques that require strong knowledge and experience in the field. More recently, chemical impurities have been used to discriminate diamond sources, and with the advance of laser ablation-inductively coupled plasma-mass spectrometry (LA-ICP-MS), empirical profiling of rough diamonds has become possible to some extent. In this study, we present a LA-ICP-MS methodology that we developed for analyzing ultra-trace element impurities in rough diamonds for origin determination ("profiling"). Diamonds from two sources were analyzed by LA-ICP-MS and were statistically classified by accepted methods. For the two diamond populations analyzed in this study, binomial logistic regression produced a better overall correct classification than linear discriminant analysis. The results suggest that an anticipated matrix-matched reference material would improve the robustness of our methodology for forensic applications. © 2010 American Academy of Forensic Sciences.

  13. Curcumin downregulates human tumor necrosis factor-α levels: A systematic review and meta-analysis of randomized controlled trials.

    PubMed

    Sahebkar, Amirhossein; Cicero, Arrigo F G; Simental-Mendía, Luis E; Aggarwal, Bharat B; Gupta, Subash C

    2016-05-01

    Tumor necrosis factor-α (TNF-α) is a key inflammatory mediator and its reduction is a therapeutic target in several inflammatory diseases. Curcumin, a bioactive polyphenol from turmeric, has been shown in several preclinical studies to block TNF-α effectively. However, clinical evidence has not been fully conclusive. The aim of the present meta-analysis was to evaluate the efficacy of curcumin supplementation on circulating levels of TNF-α in randomized controlled trials (RCTs). The search included the PubMed-Medline, Scopus, Web of Science and Google Scholar databases up to September 21, 2015, to identify RCTs investigating the impact of curcumin on circulating TNF-α concentration. Quantitative data synthesis was performed using a random-effects model, with weighted mean difference (WMD) and 95% confidence interval (CI) as summary statistics. Meta-regression and leave-one-out sensitivity analyses were performed to assess the modifiers of treatment response. Eight RCTs comprising nine treatment arms were finally selected for the meta-analysis. There was a significant reduction of circulating TNF-α concentrations following curcumin supplementation (WMD: -4.69 pg/mL, 95% CI: -7.10, -2.28, p<0.001). This effect size was robust in sensitivity analysis. Meta-regression did not suggest any significant association between the circulating TNF-α-lowering effects of curcumin and either the dose or the duration of treatment (slope: 0.197; 95% CI: -1.73, 2.12; p=0.841). This meta-analysis of RCTs suggested a significant effect of curcumin in lowering circulating TNF-α concentration. Copyright © 2016 Elsevier Ltd. All rights reserved.

  14. SLC6A3 polymorphism and response to methylphenidate in children with ADHD: A systematic review and meta-analysis.

    PubMed

    Soleimani, Robabeh; Salehi, Zivar; Soltanipour, Soheil; Hasandokht, Tolou; Jalali, Mir Mohammad

    2018-04-01

    Methylphenidate (MPH) is the most commonly used treatment for attention-deficit hyperactivity disorder (ADHD) in children. However, the response to MPH is not similar in all patients. This meta-analysis investigated the potential role of SLC6A3 polymorphisms in the response to MPH in children with ADHD. Clinical trials or naturalistic studies were selected from electronic databases. A meta-analysis was conducted using a random-effects model. Cohen's d effect size and 95% confidence intervals (CIs) were determined. Sensitivity analysis and meta-regression were performed. Q-statistic and Egger's tests were conducted to evaluate heterogeneity and publication bias, respectively. The Grading of Recommendations Assessment, Development and Evaluation (GRADE) system was used to assess the quality of evidence. Sixteen studies with follow-up periods of 1-28 weeks were eligible. The mean treatment acceptability of MPH was 97.2%. In contrast to clinical trials, the meta-analysis of naturalistic studies indicated that children who were not 10/10-repeat carriers had a better response to MPH (Cohen's d: -0.09 and 0.44, respectively). The 9/9 repeat polymorphism had no effect on the response rate (Cohen's d: -0.43). In the meta-regression, a significant association was observed between baseline severity of ADHD, MPH dosage, and combined type of ADHD in some genetic models. Sensitivity analysis indicated the robustness of our findings. No publication bias was observed in our meta-analysis. The GRADE evaluations revealed very low levels of confidence for each outcome of response to MPH. The results of clinical trials and naturalistic studies regarding the effect sizes of different SLC6A3 polymorphisms were contradictory. Therefore, further research is recommended. © 2017 Wiley Periodicals, Inc.

  15. Brain grey matter volume alterations in late-life depression

    PubMed Central

    Du, Mingying; Liu, Jia; Chen, Ziqi; Huang, Xiaoqi; Li, Jing; Kuang, Weihong; Yang, Yanchun; Zhang, Wei; Zhou, Dong; Bi, Feng; Kendrick, Keith Maurice; Gong, Qiyong

    2014-01-01

    Background Voxel-based morphometry (VBM) studies have demonstrated that grey matter abnormalities are involved in the pathophysiology of late-life depression (LLD), but the findings are inconsistent and have not been quantitatively reviewed. The aim of the present study was to conduct a meta-analysis that integrated the reported VBM studies, to determine consistent grey matter alterations in individuals with LLD. Methods A systematic search was conducted to identify VBM studies that compared patients with LLD and healthy controls. We performed a meta-analysis using the effect size signed differential mapping method to quantitatively estimate regional grey matter abnormalities in patients with LLD. Results We included 9 studies with 11 data sets comprising 292 patients with LLD and 278 healthy controls in our meta-analysis. The pooled and subgroup meta-analyses showed robust grey matter reductions in the right lentiform nucleus extending into the parahippocampus, the hippocampus and the amygdala, the bilateral medial frontal gyrus and the right subcallosal gyrus as well as a grey matter increase in the right lingual gyrus. Meta-regression analyses showed that mean age and the percentage of female patients with LLD were not significantly related to grey matter changes. Limitations The analysis techniques, patient characteristics and clinical variables of the studies included were heterogeneous, and most participants were medicated. Conclusion The present meta-analysis is, to our knowledge, the first to overcome previous inconsistencies in the VBM studies of LLD and provide robust evidence for grey matter alterations within fronto–striatal–limbic networks, thereby implicating them in the pathophysiology of LLD. The mean age and the percentage of female patients with LLD did not appear to have a measurable impact on grey matter changes, although we cannot rule out the contributory effects of medication. PMID:24949867

  16. Real-time measurement system for evaluation of the carotid intima-media thickness with a robust edge operator.

    PubMed

    Faita, Francesco; Gemignani, Vincenzo; Bianchini, Elisabetta; Giannarelli, Chiara; Ghiadoni, Lorenzo; Demi, Marcello

    2008-09-01

    The purpose of this report is to describe an automatic real-time system for evaluation of the carotid intima-media thickness (CIMT) characterized by 3 main features: minimal interobserver and intraobserver variability, real-time capabilities, and great robustness against noise. One hundred fifty carotid B-mode ultrasound images were used to validate the system. Two skilled operators were involved in the analysis. Agreement with the gold standard, defined as the mean of 2 manual measurements of a skilled operator, and the interobserver and intraobserver variability were quantitatively evaluated by regression analysis and Bland-Altman statistics. The automatic measure of the CIMT showed a mean bias +/- SD of 0.001 +/- 0.035 mm relative to the manual measurement. The intraobserver variability, evaluated with Bland-Altman plots, showed a bias that was not significantly different from 0, whereas the SD of the differences was greater in the manual analysis (0.038 mm) than in the automatic analysis (0.006 mm). For interobserver variability, the automatic measurement had a bias that was not significantly different from 0, with a satisfactory SD of the differences (0.01 mm), whereas in the manual measurement, a small bias was present (0.012 mm), and the SD of the differences was noticeably greater (0.044 mm). The CIMT has been accepted as a noninvasive marker of early vascular alteration. At present, the manual approach is largely used to estimate CIMT values. However, that method is highly operator dependent and time-consuming. For these reasons, we developed a new system for CIMT measurement that combines precision with real-time analysis, thus providing considerable advantages in clinical practice.

  17. Brain grey matter volume alterations in late-life depression.

    PubMed

    Du, Mingying; Liu, Jia; Chen, Ziqi; Huang, Xiaoqi; Li, Jing; Kuang, Weihong; Yang, Yanchun; Zhang, Wei; Zhou, Dong; Bi, Feng; Kendrick, Keith M; Gong, Qiyong

    2014-11-01

    Voxel-based morphometry (VBM) studies have demonstrated that grey matter abnormalities are involved in the pathophysiology of late-life depression (LLD), but the findings are inconsistent and have not been quantitatively reviewed. The aim of the present study was to conduct a meta-analysis that integrated the reported VBM studies, to determine consistent grey matter alterations in individuals with LLD. A systematic search was conducted to identify VBM studies that compared patients with LLD and healthy controls. We performed a meta-analysis using the effect size signed differential mapping method to quantitatively estimate regional grey matter abnormalities in patients with LLD. We included 9 studies with 11 data sets comprising 292 patients with LLD and 278 healthy controls in our meta-analysis. The pooled and subgroup meta-analyses showed robust grey matter reductions in the right lentiform nucleus extending into the parahippocampus, the hippocampus and the amygdala, the bilateral medial frontal gyrus and the right subcallosal gyrus as well as a grey matter increase in the right lingual gyrus. Meta-regression analyses showed that mean age and the percentage of female patients with LLD were not significantly related to grey matter changes. The analysis techniques, patient characteristics and clinical variables of the studies included were heterogeneous, and most participants were medicated. The present meta-analysis is, to our knowledge, the first to overcome previous inconsistencies in the VBM studies of LLD and provide robust evidence for grey matter alterations within fronto-striatal-limbic networks, thereby implicating them in the pathophysiology of LLD. The mean age and the percentage of female patients with LLD did not appear to have a measurable impact on grey matter changes, although we cannot rule out the contributory effects of medication.

  18. Predictive Validity of National Basketball Association Draft Combine on Future Performance.

    PubMed

    Teramoto, Masaru; Cross, Chad L; Rieger, Randall H; Maak, Travis G; Willick, Stuart E

    2018-02-01

    Teramoto, M, Cross, CL, Rieger, RH, Maak, TG, and Willick, SE. Predictive validity of national basketball association draft combine on future performance. J Strength Cond Res 32(2): 396-408, 2018-The National Basketball Association (NBA) Draft Combine is an annual event where prospective players are evaluated in terms of their athletic abilities and basketball skills. Data collected at the Combine should help NBA teams select the right players for the upcoming NBA draft; however, the Combine's value for predicting future performance of players has not been examined. This study investigated the predictive validity of the NBA Draft Combine on future performance of basketball players. We performed a principal component analysis (PCA) on the 2010-2015 Combine data to reduce correlated variables (N = 234), a correlation analysis on the Combine data and future on-court performance to examine relationships (maximum pairwise N = 217), and a robust principal component regression (PCR) analysis to predict first-year and 3-year on-court performance from the Combine measures (N = 148 and 127, respectively). Three components were identified within the Combine data through PCA (= Combine subscales): length-size, power-quickness, and upper-body strength. As per the correlation analysis, the individual Combine items for anthropometrics, including height without shoes, standing reach, weight, wingspan, and hand length, as well as the Combine subscale of length-size, had positive, medium-to-large-sized correlations (r = 0.313-0.545) with defensive performance quantified by Defensive Box Plus/Minus. The robust PCR analysis showed that the Combine subscale of length-size was the predictor most significantly associated with future on-court performance (p ≤ 0.05), including Win Shares, Box Plus/Minus, and Value Over Replacement Player, followed by upper-body strength. In conclusion, the NBA Draft Combine has value for predicting the future performance of players.
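
    The PCA-then-regression pipeline can be sketched as follows; the use of a Huber loss for the regression step is one reading of "robust principal component regression", and the data, component count and loss below are assumptions rather than the authors' exact procedure.

    ```python
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import HuberRegressor
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(6)
    combine = rng.normal(size=(148, 8))   # stand-in for correlated Combine items
    win_shares = combine[:, 0] * 0.5 + rng.normal(scale=1.0, size=148)

    # PCA reduces the correlated items, then a robust (Huber) regression is
    # fit on the retained component scores.
    model = make_pipeline(StandardScaler(), PCA(n_components=3), HuberRegressor())
    model.fit(combine, win_shares)
    print(model.score(combine, win_shares))  # R^2 on the training sample
    ```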

  19. Robustness Analysis and Optimally Robust Control Design via Sum-of-Squares

    NASA Technical Reports Server (NTRS)

    Dorobantu, Andrei; Crespo, Luis G.; Seiler, Peter J.

    2012-01-01

    A control analysis and design framework is proposed for systems subject to parametric uncertainty. The underlying strategies are based on sum-of-squares (SOS) polynomial analysis and nonlinear optimization to design an optimally robust controller. The approach determines a maximum uncertainty range for which the closed-loop system satisfies a set of stability and performance requirements. These requirements, defined as inequality constraints on several metrics, are restricted to polynomial functions of the uncertainty. To quantify robustness, SOS analysis is used to prove that the closed-loop system complies with the requirements for a given uncertainty range. The maximum uncertainty range, calculated by assessing a sequence of increasingly larger ranges, serves as a robustness metric for the closed-loop system. To optimize the control design, nonlinear optimization is used to enlarge the maximum uncertainty range by tuning the controller gains. Hence, the resulting controller is optimally robust to parametric uncertainty. This approach balances the robustness margins corresponding to each requirement in order to maximize the aggregate system robustness. The proposed framework is applied to a simple linear short-period aircraft model with uncertain aerodynamic coefficients.

  20. imDEV: a graphical user interface to R multivariate analysis tools in Microsoft Excel

    PubMed Central

    Grapov, Dmitry; Newman, John W.

    2012-01-01

    Summary: Interactive modules for Data Exploration and Visualization (imDEV) is a Microsoft Excel spreadsheet embedded application providing an integrated environment for the analysis of omics data through a user-friendly interface. Individual modules enable interactive and dynamic analyses of large data sets by interfacing R's multivariate statistics and highly customizable visualizations with the spreadsheet environment, aiding robust inferences and generating information-rich data visualizations. This tool provides access to multiple comparisons with false discovery correction, hierarchical clustering, principal and independent component analyses, partial least squares regression and discriminant analysis, through an intuitive interface for creating high-quality two- and three-dimensional visualizations including scatter plot matrices, distribution plots, dendrograms, heat maps, biplots, trellis biplots and correlation networks. Availability and implementation: Freely available for download at http://sourceforge.net/projects/imdev/. Implemented in R and VBA and supported by Microsoft Excel (2003, 2007 and 2010). Contact: John.Newman@ars.usda.gov Supplementary Information: Installation instructions, tutorials and users manual are available at http://sourceforge.net/projects/imdev/. PMID:22815358

  1. Beer fermentation: monitoring of process parameters by FT-NIR and multivariate data analysis.

    PubMed

    Grassi, Silvia; Amigo, José Manuel; Lyndgaard, Christian Bøge; Foschino, Roberto; Casiraghi, Ernestina

    2014-07-15

    This work investigates the capability of Fourier-Transform near infrared (FT-NIR) spectroscopy to monitor and assess process parameters in beer fermentation under different operating conditions. For this purpose, the fermentation of wort with two different yeast strains and at different temperatures was monitored for nine days by FT-NIR. To correlate the collected spectra with °Brix, pH and biomass, different multivariate data methodologies were applied. Principal component analysis (PCA), partial least squares (PLS) and locally weighted regression (LWR) were used to assess the relationship between FT-NIR spectra and the abovementioned process parameters that define the beer fermentation. The accuracy and robustness of the obtained results clearly show the suitability of FT-NIR spectroscopy, combined with multivariate data analysis, as a quality control tool in the beer fermentation process. FT-NIR spectroscopy combined with LWR proves to be a suitable quantitative method for implementation in beer production. Copyright © 2014 Elsevier Ltd. All rights reserved.

  2. Developing appropriate methods for cost-effectiveness analysis of cluster randomized trials.

    PubMed

    Gomes, Manuel; Ng, Edmond S-W; Grieve, Richard; Nixon, Richard; Carpenter, James; Thompson, Simon G

    2012-01-01

    Cost-effectiveness analyses (CEAs) may use data from cluster randomized trials (CRTs), where the unit of randomization is the cluster, not the individual. However, most studies use analytical methods that ignore clustering. This article compares alternative statistical methods for accommodating clustering in CEAs of CRTs. Our simulation study compared the performance of statistical methods for CEAs of CRTs with 2 treatment arms. The study considered a method that ignored clustering--seemingly unrelated regression (SUR) without a robust standard error (SE)--and 4 methods that recognized clustering--SUR and generalized estimating equations (GEEs), both with robust SE, a "2-stage" nonparametric bootstrap (TSB) with shrinkage correction, and a multilevel model (MLM). The base case assumed CRTs with moderate numbers of balanced clusters (20 per arm) and normally distributed costs. Other scenarios included CRTs with few clusters, imbalanced cluster sizes, and skewed costs. Performance was reported as bias, root mean squared error (rMSE), and confidence interval (CI) coverage for estimating incremental net benefits (INBs). We also compared the methods in a case study. Each method reported low levels of bias. Without the robust SE, SUR gave poor CI coverage (base case: 0.89 v. nominal level: 0.95). The MLM and TSB performed well in each scenario (CI coverage, 0.92-0.95). With few clusters, the GEE and SUR (with robust SE) had coverage below 0.90. In the case study, the mean INBs were similar across all methods, but ignoring clustering underestimated statistical uncertainty and the value of further research. MLMs and the TSB are appropriate analytical methods for CEAs of CRTs with the characteristics described. SUR and GEE are not recommended for studies with few clusters.
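
    The contrast between ignoring and recognizing clustering can be sketched with cluster-robust standard errors (a stand-in for the SUR-with-robust-SE approach; the data-generating numbers below are illustrative):

    ```python
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(7)
    n_clusters, m = 20, 25                        # 20 clusters per arm, base case
    cluster = np.repeat(np.arange(2 * n_clusters), m)
    arm = (cluster >= n_clusters).astype(float)   # treatment indicator
    u = rng.normal(scale=0.5, size=2 * n_clusters)[cluster]  # cluster effect
    cost = 100 + 10 * arm + u + rng.normal(size=cluster.size)

    X = sm.add_constant(arm)
    naive = sm.OLS(cost, X).fit()                 # ignores clustering
    robust = sm.OLS(cost, X).fit(cov_type="cluster",
                                 cov_kwds={"groups": cluster})
    print(naive.bse[1], robust.bse[1])            # robust SE is typically larger
    ```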

  3. Using ventricular modeling to robustly probe significant deep gray matter pathologies: Application to cerebral palsy.

    PubMed

    Pagnozzi, Alex M; Shen, Kaikai; Doecke, James D; Boyd, Roslyn N; Bradley, Andrew P; Rose, Stephen; Dowson, Nicholas

    2016-11-01

    Understanding the relationships between the structure and function of the brain largely relies on the qualitative assessment of Magnetic Resonance Images (MRIs) by expert clinicians. Automated analysis systems can support these assessments by providing quantitative measures of brain injury. However, the assessment of deep gray matter structures, which are critical to motor and executive function, remains difficult as a result of large anatomical injuries commonly observed in children with Cerebral Palsy (CP). Hence, this article proposes a robust surrogate marker of the extent of deep gray matter injury, based on the impingement of locally enlarged ventricles on the surrounding anatomy. Local enlargement was computed using a statistical shape model of the lateral ventricles constructed from 44 healthy subjects. Measures of injury on 95 age-matched CP patients were used to train a regression model to predict six clinical measures of function. The robustness of identifying ventricular enlargement was demonstrated by an area under the curve of 0.91 when tested against a dichotomised expert clinical assessment. The measures also showed strong and significant relationships for multiple clinical scores, including motor function (r2 = 0.62, P < 0.005), executive function (r2 = 0.55, P < 0.005) and communication (r2 = 0.50, P < 0.005), especially compared with using volumes obtained from standard anatomical segmentation approaches. The lack of reliance on accurate anatomical segmentations and the resulting robustness to large anatomical variations is a key feature of the proposed automated approach. This, coupled with its strong correlation with clinically meaningful scores, signifies its potential utility for clinicians repeatedly assessing the MRIs of children diagnosed with CP. Hum Brain Mapp 37:3795-3809, 2016. © 2016 Wiley Periodicals, Inc.

  4. Developing Appropriate Methods for Cost-Effectiveness Analysis of Cluster Randomized Trials

    PubMed Central

    Gomes, Manuel; Ng, Edmond S.-W.; Nixon, Richard; Carpenter, James; Thompson, Simon G.

    2012-01-01

    Aim. Cost-effectiveness analyses (CEAs) may use data from cluster randomized trials (CRTs), where the unit of randomization is the cluster, not the individual. However, most studies use analytical methods that ignore clustering. This article compares alternative statistical methods for accommodating clustering in CEAs of CRTs. Methods. Our simulation study compared the performance of statistical methods for CEAs of CRTs with 2 treatment arms. The study considered a method that ignored clustering—seemingly unrelated regression (SUR) without a robust standard error (SE)—and 4 methods that recognized clustering—SUR and generalized estimating equations (GEEs), both with robust SE, a “2-stage” nonparametric bootstrap (TSB) with shrinkage correction, and a multilevel model (MLM). The base case assumed CRTs with moderate numbers of balanced clusters (20 per arm) and normally distributed costs. Other scenarios included CRTs with few clusters, imbalanced cluster sizes, and skewed costs. Performance was reported as bias, root mean squared error (rMSE), and confidence interval (CI) coverage for estimating incremental net benefits (INBs). We also compared the methods in a case study. Results. Each method reported low levels of bias. Without the robust SE, SUR gave poor CI coverage (base case: 0.89 v. nominal level: 0.95). The MLM and TSB performed well in each scenario (CI coverage, 0.92–0.95). With few clusters, the GEE and SUR (with robust SE) had coverage below 0.90. In the case study, the mean INBs were similar across all methods, but ignoring clustering underestimated statistical uncertainty and the value of further research. Conclusions. MLMs and the TSB are appropriate analytical methods for CEAs of CRTs with the characteristics described. SUR and GEE are not recommended for studies with few clusters. PMID:22016450

  5. Auto Regressive Moving Average (ARMA) Modeling Method for Gyro Random Noise Using a Robust Kalman Filter

    PubMed Central

    Huang, Lei

    2015-01-01

    To solve the problem that conventional ARMA modeling methods for gyro random noise require a large number of samples and converge slowly, an ARMA modeling method using robust Kalman filtering is developed. The ARMA model parameters are employed as state arguments. Unknown, time-varying estimators of the observation noise are used to obtain its estimated mean and variance. Using robust Kalman filtering, the ARMA model parameters are estimated accurately. The developed ARMA modeling method has the advantages of rapid convergence and high accuracy. Thus, the required sample size is reduced. It can be applied in modeling applications for gyro random noise in which a fast and accurate ARMA modeling method is required. PMID:26437409

  6. Likert scales, levels of measurement and the "laws" of statistics.

    PubMed

    Norman, Geoff

    2010-12-01

    Reviewers of research reports frequently criticize the choice of statistical methods. While some of these criticisms are well-founded, frequently the use of various parametric methods such as analysis of variance, regression and correlation is faulted because: (a) the sample size is too small, (b) the data may not be normally distributed, or (c) the data are from Likert scales, which are ordinal, so parametric statistics cannot be used. In this paper, I dissect these arguments, and show that many studies, dating back to the 1930s, consistently show that parametric statistics are robust with respect to violations of these assumptions. Hence, challenges like those above are unfounded, and parametric methods can be utilized without concern for "getting the wrong answer".

  7. QSAR study of curcumine derivatives as HIV-1 integrase inhibitors.

    PubMed

    Gupta, Pawan; Sharma, Anju; Garg, Prabha; Roy, Nilanjan

    2013-03-01

    A QSAR study was performed on curcumine derivatives as HIV-1 integrase inhibitors using multiple linear regression. A statistically significant model was developed, with a squared correlation coefficient (r(2)) of 0.891 and a cross-validated r(2) (r(2)cv) of 0.825. The developed model revealed that electronic character, shape, size, geometry, substitution information and hydrophilicity were important properties for determining the inhibitory activity of these molecules. The model was also tested successfully for external validation (r(2)pred = 0.849), as well as with Tropsha's test for model predictability. Furthermore, a domain analysis was carried out to evaluate the prediction reliability for external set molecules. The model was statistically robust and had good predictive power, and it can be successfully utilized for the screening of new molecules.

  8. Recent developments in measurement and evaluation of FAC damage in power plants

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Garud, Y.S.; Besuner, P.; Cohn, M.J.

    1999-11-01

    This paper describes some recent developments in the measurement and evaluation of flow-accelerated corrosion (FAC) damage in power plants. The evaluation focuses on data checking and smoothing to account for gross errors, noise, and uncertainty in the wall thickness measurements from ultrasonic or pulsed eddy-current data. Also, the evaluation method utilizes advanced regression analysis for the spatial and temporal evolution of the wall loss, providing statistically robust predictions of wear rates and associated uncertainty. Results of the application of these new tools are presented for several components in actual service. More importantly, the practical implications of using these advances are discussed in relation to the likely impact on the scope and effectiveness of FAC-related inspection programs.

  9. The weighted priors approach for combining expert opinions in logistic regression experiments

    DOE PAGES

    Quinlan, Kevin R.; Anderson-Cook, Christine M.; Myers, Kary L.

    2017-04-24

    When modeling the reliability of a system or component, it is not uncommon for more than one expert to provide very different prior estimates of the expected reliability as a function of an explanatory variable such as age or temperature. Our goal in this paper is to incorporate all information from the experts when choosing a design about which units to test. Bayesian design of experiments has been shown to be very successful for generalized linear models, including logistic regression models. We use this approach to develop methodology for the case where there are several potentially non-overlapping priors under consideration. While multiple priors have been used for analysis in the past, they have never been used in a design context. The Weighted Priors method performs well for a broad range of true underlying model parameter choices and is more robust when compared to other reasonable design choices. Finally, we illustrate the method through multiple scenarios and a motivating example. Additional figures for this article are available in the online supplementary information.

  10. Prediction of Vancomycin Dose for Recommended Trough Concentrations in Pediatric Patients With Cystic Fibrosis.

    PubMed

    Amin, Raid W; Guttmann, Rodney P; Harris, Quianna R; Thomas, Janesha W

    2018-05-01

    Vancomycin is a key antibiotic used in the treatment of multiple conditions including infections associated with cystic fibrosis and methicillin-resistant Staphylococcus aureus. The present study sought to develop a model, based on empirical evidence of the optimal vancomycin dose as judged by clinical observations, that could accelerate the achievement of the desired trough level in children with cystic fibrosis. Transformations of dose and trough were used to arrive at regression models with excellent fit for dose based on weight or age for a target trough. Results of this study indicate that the 2 proposed regression models are robust to changes in age or weight, suggesting that the daily dose on a per-kilogram basis is determined primarily by the desired trough level. The results show that to obtain a vancomycin trough level of 20 μg/mL, a dose of 80 mg/kg/day is needed. This analysis should improve the efficiency of vancomycin usage by reducing the number of titration steps, resulting in improved patient outcome and experience. © 2018, The American College of Clinical Pharmacology.

  11. Statistical analysis of water-quality data containing multiple detection limits: S-language software for regression on order statistics

    USGS Publications Warehouse

    Lee, L.; Helsel, D.

    2005-01-01

    Trace contaminants in water, including metals and organics, often are measured at sufficiently low concentrations to be reported only as values below the instrument detection limit. Interpretation of these "less thans" is complicated when multiple detection limits occur. Statistical methods for multiply censored, or multiple-detection limit, datasets have been developed for medical and industrial statistics, and can be employed to estimate summary statistics or model the distributions of trace-level environmental data. We describe S-language-based software tools that perform robust linear regression on order statistics (ROS). The ROS method has been evaluated as one of the most reliable procedures for developing summary statistics of multiply censored data. It is applicable to any dataset that has 0 to 80% of its values censored. These tools are a part of a software library, or add-on package, for the R environment for statistical computing. This library can be used to generate ROS models and associated summary statistics, plot modeled distributions, and predict exceedance probabilities of water-quality standards. © 2005 Elsevier Ltd. All rights reserved.
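
    A simplified, single-detection-limit sketch of ROS (the cited tools additionally handle multiple limits via the Helsel-Cohn probability adjustment, which is omitted here): regress the logs of the uncensored values on normal scores, then impute the censored values from the fitted line.

    ```python
    import numpy as np
    from scipy import stats

    def simple_ros(values, censored):
        """Simplified ROS for one detection limit: regress the logs of
        uncensored values on normal scores and impute censored values."""
        n = len(values)
        order = np.argsort(values)
        pp = (np.arange(1, n + 1) - 0.375) / (n + 0.25)  # plotting positions
        z = stats.norm.ppf(pp)
        zs = np.empty(n)
        zs[order] = z                                    # normal score per obs
        slope, intercept, *_ = stats.linregress(zs[~censored],
                                                np.log(values[~censored]))
        out = values.astype(float).copy()
        out[censored] = np.exp(intercept + slope * zs[censored])
        return out

    vals = np.array([0.5, 0.5, 0.7, 1.2, 2.0, 3.5, 5.1])  # 0.5 = detection limit
    cens = np.array([True, True, False, False, False, False, False])
    filled = simple_ros(vals, cens)
    print(filled.mean(), filled.std())                    # summary statistics
    ```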

  12. The weighted priors approach for combining expert opinions in logistic regression experiments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Quinlan, Kevin R.; Anderson-Cook, Christine M.; Myers, Kary L.

    When modeling the reliability of a system or component, it is not uncommon for more than one expert to provide very different prior estimates of the expected reliability as a function of an explanatory variable such as age or temperature. Our goal in this paper is to incorporate all information from the experts when choosing a design about which units to test. Bayesian design of experiments has been shown to be very successful for generalized linear models, including logistic regression models. We use this approach to develop methodology for the case where there are several potentially non-overlapping priors under consideration. While multiple priors have been used for analysis in the past, they have never been used in a design context. The Weighted Priors method performs well for a broad range of true underlying model parameter choices and is more robust when compared to other reasonable design choices. Finally, we illustrate the method through multiple scenarios and a motivating example. Additional figures for this article are available in the online supplementary information.

  13. Epidemiological characteristics of reported sporadic and outbreak cases of E. coli O157 in people from Alberta, Canada (2000-2002): methodological challenges of comparing clustered to unclustered data.

    PubMed

    Pearl, D L; Louie, M; Chui, L; Doré, K; Grimsrud, K M; Martin, S W; Michel, P; Svenson, L W; McEwen, S A

    2008-04-01

    Using multivariable models, we compared whether there were significant differences between reported outbreak and sporadic cases in terms of their sex, age, and mode and site of disease transmission. We also determined the potential role of administrative, temporal, and spatial factors within these models. We compared a variety of approaches to account for clustering of cases in outbreaks including weighted logistic regression, random effects models, general estimating equations, robust variance estimates, and the random selection of one case from each outbreak. Age and mode of transmission were the only epidemiologically and statistically significant covariates in our final models using the above approaches. Weighting observations in a logistic regression model by the inverse of their outbreak size appeared to be a relatively robust and valid means for modelling these data. Some analytical techniques, designed to account for clustering, had difficulty converging or producing realistic measures of association.
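    The weighting scheme favoured by the authors, with each case down-weighted by the inverse of its outbreak size, is easy to reproduce. Below is a minimal sketch using statsmodels; the case data and covariate names (age, foodborne, outbreak_size) are invented for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical case data: 1 = outbreak case, 0 = sporadic case.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "outbreak_case": rng.integers(0, 2, n),
    "age": rng.uniform(1, 80, n),
    "foodborne": rng.integers(0, 2, n),
    "outbreak_size": rng.integers(1, 20, n),   # 1 for sporadic cases
})

# Each observation is weighted by the inverse of its outbreak size,
# so a 10-case outbreak contributes as much as one sporadic case.
weights = 1.0 / df["outbreak_size"]

X = sm.add_constant(df[["age", "foodborne"]])
model = sm.GLM(df["outbreak_case"], X,
               family=sm.families.Binomial(),
               freq_weights=weights)   # implements the inverse-size weighting
print(model.fit().summary())
```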

  14. Recovering Galaxy Properties Using Gaussian Process SED Fitting

    NASA Astrophysics Data System (ADS)

    Iyer, Kartheik; Awan, Humna

    2018-01-01

    Information about physical quantities like the stellar mass, star formation rates, and ages for distant galaxies is contained in their spectral energy distributions (SEDs), obtained through photometric surveys like SDSS, CANDELS, LSST etc. However, noise in the photometric observations is often a problem, and using naive machine learning methods to estimate physical quantities can result in overfitting the noise, or converging on solutions that lie outside the physical regime of parameter space. We use Gaussian Process regression trained on a sample of SEDs corresponding to galaxies from a Semi-Analytic model (Somerville+15a) to estimate their stellar masses, and compare its performance to a variety of different methods, including simple linear regression, Random Forests, and k-Nearest Neighbours. We find that the Gaussian Process method is robust to noise and predicts not only stellar masses but also their uncertainties. The method is also robust in the cases where the distribution of the training data is not identical to the target data, which can be extremely useful when generalized to more subtle galaxy properties.
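    As a rough illustration of this kind of pipeline, the sketch below trains scikit-learn's GaussianProcessRegressor on synthetic "photometry" and reads off both predictions and uncertainties; the five-band inputs and the mapping to log stellar mass are invented stand-ins, not the authors' training set.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(42)

# Synthetic stand-in for galaxy photometry: 5 noisy broadband fluxes
# per object, with log stellar mass as the target (purely illustrative).
n_train, n_bands = 200, 5
X = rng.normal(size=(n_train, n_bands))
true_logmass = 10 + 0.8 * X[:, 0] - 0.3 * X[:, 1]
y = true_logmass + rng.normal(scale=0.1, size=n_train)

# RBF kernel for a smooth SED-to-mass mapping, plus a white-noise term
# so the fit does not chase photometric noise.
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.01)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

# The GP returns both a prediction and its uncertainty.
X_new = rng.normal(size=(3, n_bands))
mass_pred, mass_std = gpr.predict(X_new, return_std=True)
for m, s in zip(mass_pred, mass_std):
    print(f"log M* = {m:.2f} +/- {s:.2f}")
```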

  15. Evaluation of Ares-I Control System Robustness to Uncertain Aerodynamics and Flex Dynamics

    NASA Technical Reports Server (NTRS)

    Jang, Jiann-Woei; VanTassel, Chris; Bedrossian, Nazareth; Hall, Charles; Spanos, Pol

    2008-01-01

    This paper discusses the application of robust control theory to evaluate robustness of the Ares-I control systems. Three techniques for estimating upper and lower bounds of uncertain parameters which yield stable closed-loop response are used here: (1) Monte Carlo analysis, (2) mu analysis, and (3) characteristic frequency response analysis. All three methods are used to evaluate stability envelopes of the Ares-I control systems with uncertain aerodynamics and flex dynamics. The results show that characteristic frequency response analysis is the most effective of these methods for assessing robustness.

  16. A robust multifactor dimensionality reduction method for detecting gene-gene interactions with application to the genetic analysis of bladder cancer susceptibility

    PubMed Central

    Gui, Jiang; Andrew, Angeline S.; Andrews, Peter; Nelson, Heather M.; Kelsey, Karl T.; Karagas, Margaret R.; Moore, Jason H.

    2010-01-01

    A central goal of human genetics is to identify and characterize susceptibility genes for common complex human diseases. An important challenge in this endeavor is the modeling of gene-gene interaction or epistasis that can result in non-additivity of genetic effects. The multifactor dimensionality reduction (MDR) method was developed as a machine learning alternative to parametric logistic regression for detecting interactions in the absence of significant marginal effects. The goal of MDR is to reduce the dimensionality inherent in modeling combinations of polymorphisms using a computational approach called constructive induction. Here, we propose a Robust Multifactor Dimensionality Reduction (RMDR) method that performs constructive induction using a Fisher’s Exact Test rather than a predetermined threshold. The advantage of this approach is that only those genotype combinations that are determined to be statistically significant are considered in the MDR analysis. We use two simulation studies to demonstrate that this approach will increase the success rate of MDR when there are only a few genotype combinations that are significantly associated with case-control status. We show that there is no loss of success rate when this is not the case. We then apply the RMDR method to the detection of gene-gene interactions in genotype data from a population-based study of bladder cancer in New Hampshire. PMID:21091664

  17. Evaluating Machine Learning Regression Algorithms for Operational Retrieval of Biophysical Parameters: Opportunities for Sentinel

    NASA Astrophysics Data System (ADS)

    Verrelst, Jochem; Rivera, J. P.; Alonso, L.; Guanter, L.; Moreno, J.

    2012-04-01

    ESA’s upcoming satellites Sentinel-2 (S2) and Sentinel-3 (S3) aim to ensure continuity for Landsat 5/7, SPOT-5, SPOT-Vegetation and Envisat MERIS observations by providing superspectral images of high spatial and temporal resolution. S2 and S3 will deliver near real-time operational products with a high accuracy for land monitoring. This unprecedented data availability leads to an urgent need for developing robust and accurate retrieval methods. Machine learning regression algorithms could be powerful candidates for the estimation of biophysical parameters from satellite reflectance measurements because of their ability to perform adaptive, nonlinear data fitting. By using data from the ESA-led field campaign SPARC (Barrax, Spain), it was recently found [1] that Gaussian processes regression (GPR) outperformed competitive machine learning algorithms such as neural networks, support vector regression, and kernel ridge regression in terms of both accuracy and computational speed. For various Sentinel configurations (S2-10m, S2-20m, S2-60m and S3-300m) three important biophysical parameters were estimated: leaf chlorophyll content (Chl), leaf area index (LAI) and fractional vegetation cover (FVC). GPR was the only method that reached the 10% precision required by end users in the estimation of Chl. In view of implementing the regressor into operational monitoring applications, here the portability of locally trained GPR models to other images was evaluated. The associated confidence maps proved to be a good indicator for evaluating the robustness of the trained models. Consistent retrievals were obtained across the different images, particularly over agricultural sites. To make the method suitable for operational use, however, the poorer confidences over bare soil areas suggest that the training dataset should be expanded with inputs from various land cover types.

  18. Comparison of two-concentration with multi-concentration linear regressions: Retrospective data analysis of multiple regulated LC-MS bioanalytical projects.

    PubMed

    Musuku, Adrien; Tan, Aimin; Awaiye, Kayode; Trabelsi, Fethi

    2013-09-01

    Linear calibration is usually performed using eight to ten calibration concentration levels in regulated LC-MS bioanalysis because a minimum of six are specified in regulatory guidelines. However, we have previously reported that two-concentration linear calibration is as reliable as or even better than using multiple concentrations. The purpose of this research is to compare two-concentration with multiple-concentration linear calibration through retrospective data analysis of multiple bioanalytical projects that were conducted in an independent regulated bioanalytical laboratory. A total of 12 bioanalytical projects were randomly selected: two validations and two studies for each of the three most commonly used types of sample extraction methods (protein precipitation, liquid-liquid extraction, solid-phase extraction). When the existing data were retrospectively linearly regressed using only the lowest and the highest concentration levels, no extra batch failure/QC rejection was observed and the differences in accuracy and precision between the original multi-concentration regression and the new two-concentration linear regression are negligible. Specifically, the differences in overall mean apparent bias (square root of mean individual bias squares) are within the ranges of -0.3% to 0.7% and 0.1-0.7% for the validations and studies, respectively. The differences in mean QC concentrations are within the ranges of -0.6% to 1.8% and -0.8% to 2.5% for the validations and studies, respectively. The differences in %CV are within the ranges of -0.7% to 0.9% and -0.3% to 0.6% for the validations and studies, respectively. The average differences in study sample concentrations are within the range of -0.8% to 2.3%. With two-concentration linear regression, an average of 13% of the time and cost could have been saved for each batch, together with a 53% saving in the lead-in work for each project (the preparation of working standard solutions, spiking, and aliquoting). Furthermore, examples are given of how to evaluate the linearity over the entire concentration range when only two concentration levels are used for linear regression. To conclude, two-concentration linear regression is accurate and robust enough for routine use in regulated LC-MS bioanalysis and it significantly saves time and cost as well. Copyright © 2013 Elsevier B.V. All rights reserved.
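    The comparison at the heart of this study can be mimicked in a few lines. The sketch below, with invented calibration data, back-calculates a QC sample from both an eight-level least-squares line and the line through only the lowest and highest calibrators.

```python
import numpy as np

# Hypothetical 8-level calibration curve (peak-area ratio vs. concentration).
conc = np.array([1, 2, 5, 10, 50, 100, 500, 1000.0])   # ng/mL
resp = 0.012 * conc + 0.003 + np.random.default_rng(1).normal(0, 0.002, 8)

# Multi-concentration calibration: ordinary least squares on all 8 levels.
slope_m, intercept_m = np.polyfit(conc, resp, 1)

# Two-concentration calibration: the line through lowest and highest levels.
slope_2 = (resp[-1] - resp[0]) / (conc[-1] - conc[0])
intercept_2 = resp[0] - slope_2 * conc[0]

# Back-calculate a 250 ng/mL QC sample with both lines and compare the bias.
qc_resp = 0.012 * 250 + 0.003
for name, (a, b) in {"8-point": (slope_m, intercept_m),
                     "2-point": (slope_2, intercept_2)}.items():
    back = (qc_resp - b) / a
    print(f"{name}: back-calculated QC = {back:.1f} ng/mL "
          f"(bias {100 * (back - 250) / 250:+.2f}%)")
```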

  19. Hybrid Support Vector Regression and Autoregressive Integrated Moving Average Models Improved by Particle Swarm Optimization for Property Crime Rates Forecasting with Economic Indicators

    PubMed Central

    Alwee, Razana; Hj Shamsuddin, Siti Mariyam; Sallehuddin, Roselina

    2013-01-01

    Crime forecasting is an important area in the field of criminology. Linear models, such as regression and econometric models, are commonly applied in crime forecasting. However, in real crime data, it is common that the data consist of both linear and nonlinear components. A single model may not be sufficient to identify all the characteristics of the data. The purpose of this study is to introduce a hybrid model that combines support vector regression (SVR) and autoregressive integrated moving average (ARIMA) to be applied in crime rate forecasting. SVR is very robust with small training data and high-dimensional problems. Meanwhile, ARIMA has the ability to model several types of time series. However, the accuracy of the SVR model depends on the values of its parameters, while ARIMA is not robust when applied to small data sets. Therefore, to overcome this problem, particle swarm optimization is used to estimate the parameters of the SVR and ARIMA models. The proposed hybrid model is used to forecast the property crime rates of the United States based on economic indicators. The experimental results show that the proposed hybrid model is able to produce more accurate forecasting results as compared to the individual models. PMID:23766729
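    A minimal sketch of the hybrid idea follows: ARIMA captures the linear structure, an SVR trained on the ARIMA residuals captures the nonlinear remainder, and the two forecasts are summed. The series is synthetic, and a small grid search stands in for the particle swarm optimization used in the paper.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(7)

# Hypothetical annual crime-rate series with a nonlinear component.
t = np.arange(60, dtype=float)
series = 500 - 2 * t + 15 * np.sin(t / 4) + rng.normal(0, 5, 60)
train, test = series[:50], series[50:]

# Step 1: ARIMA models the linear structure of the series.
arima = ARIMA(train, order=(1, 1, 1)).fit()
linear_fit = arima.predict(start=1, end=len(train) - 1)
residuals = train[1:] - linear_fit

# Step 2: SVR models the nonlinear structure left in the residuals,
# using the two previous residuals as features. (The paper tunes the
# SVR parameters with particle swarm optimization; a small grid search
# stands in for it here.)
X = np.column_stack([residuals[:-2], residuals[1:-1]])
y = residuals[2:]
svr = GridSearchCV(SVR(kernel="rbf"),
                   {"C": [1, 10, 100], "epsilon": [0.01, 0.1, 1.0]},
                   cv=3).fit(X, y)

# Hybrid forecast = ARIMA forecast + SVR residual correction. For brevity
# the last in-sample residual features are reused; a real forecast would
# iterate one step at a time.
arima_fc = arima.forecast(steps=len(test))
hybrid_fc = arima_fc + svr.predict(X[-len(test):])
print("ARIMA-only MAE :", np.mean(np.abs(test - arima_fc)).round(2))
print("Hybrid MAE     :", np.mean(np.abs(test - hybrid_fc)).round(2))
```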

  20. Hybrid support vector regression and autoregressive integrated moving average models improved by particle swarm optimization for property crime rates forecasting with economic indicators.

    PubMed

    Alwee, Razana; Shamsuddin, Siti Mariyam Hj; Sallehuddin, Roselina

    2013-01-01

    Crime forecasting is an important area in the field of criminology. Linear models, such as regression and econometric models, are commonly applied in crime forecasting. However, in real crime data, it is common that the data consist of both linear and nonlinear components. A single model may not be sufficient to identify all the characteristics of the data. The purpose of this study is to introduce a hybrid model that combines support vector regression (SVR) and autoregressive integrated moving average (ARIMA) to be applied in crime rate forecasting. SVR is very robust with small training data and high-dimensional problems. Meanwhile, ARIMA has the ability to model several types of time series. However, the accuracy of the SVR model depends on the values of its parameters, while ARIMA is not robust when applied to small data sets. Therefore, to overcome this problem, particle swarm optimization is used to estimate the parameters of the SVR and ARIMA models. The proposed hybrid model is used to forecast the property crime rates of the United States based on economic indicators. The experimental results show that the proposed hybrid model is able to produce more accurate forecasting results as compared to the individual models.

  1. Nonparametric methods for doubly robust estimation of continuous treatment effects.

    PubMed

    Kennedy, Edward H; Ma, Zongming; McHugh, Matthew D; Small, Dylan S

    2017-09-01

    Continuous treatments (e.g., doses) arise often in practice, but many available causal effect estimators are limited by either requiring parametric models for the effect curve, or by not allowing doubly robust covariate adjustment. We develop a novel kernel smoothing approach that requires only mild smoothness assumptions on the effect curve, and still allows for misspecification of either the treatment density or outcome regression. We derive asymptotic properties and give a procedure for data-driven bandwidth selection. The methods are illustrated via simulation and in a study of the effect of nurse staffing on hospital readmissions penalties.

  2. Conditionally Unbiased Bounded Influence Robust Regression with Applications to Generalized Linear Models.

    DTIC Science & Technology

    1987-03-01

    Presents general results and definitions from robust statistics (see Hampel et al., 1986), deriving the influence functions of M-estimators and of the conditionally unbiased bounded-influence estimator for generalized linear models. [The equations in the scanned record are too garbled to reconstruct.]

  3. A robust nonlinear filter for image restoration.

    PubMed

    Koivunen, V

    1995-01-01

    A class of nonlinear regression filters based on robust estimation theory is introduced. The goal of the filtering is to recover a high-quality image from degraded observations. Models for desired image structures and contaminating processes are employed, but deviations from strict assumptions are allowed since the assumptions on signal and noise are typically only approximately true. The robustness of filters is usually addressed only in a distributional sense, i.e., the actual error distribution deviates from the nominal one. In this paper, the robustness is considered in a broad sense since the outliers may also be due to inappropriate signal model, or there may be more than one statistical population present in the processing window, causing biased estimates. Two filtering algorithms minimizing a least trimmed squares criterion are provided. The design of the filters is simple since no scale parameters or context-dependent threshold values are required. Experimental results using both real and simulated data are presented. The filters effectively attenuate both impulsive and nonimpulsive noise while recovering the signal structure and preserving interesting details.
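    A one-dimensional toy version of a least trimmed squares location filter conveys the idea: for each window, output the mean of the h most concordant samples, so impulses fall outside the trimmed subset without any tuned threshold. This is an illustrative sketch, not the authors' filter.

```python
import numpy as np

def lts_filter(signal, window=7, h=5):
    """1-D least trimmed squares (location) filter.

    For each window, find the h contiguous sorted samples with the
    smallest sum of squared deviations from their own mean, and output
    that mean. Outliers (impulses) fall outside the best h-subset.
    """
    half = window // 2
    padded = np.pad(signal, half, mode="edge")
    out = np.empty(len(signal), dtype=float)
    for i in range(len(signal)):
        w = np.sort(padded[i:i + window])
        # The LTS location estimate uses a contiguous subset of the
        # sorted window, so only window - h + 1 candidates exist.
        best_ss, best_mean = np.inf, 0.0
        for j in range(window - h + 1):
            sub = w[j:j + h]
            m = sub.mean()
            ss = np.sum((sub - m) ** 2)
            if ss < best_ss:
                best_ss, best_mean = ss, m
        out[i] = best_mean
    return out

# Step edge plus impulsive noise: the filter removes the impulses
# while keeping the edge sharp.
x = np.r_[np.zeros(20), np.ones(20)]
x[[5, 12, 30]] = 8.0
print(np.round(lts_filter(x), 2))
```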

  4. Use of fish embryo toxicity tests for the prediction of acute fish toxicity to chemicals.

    PubMed

    Belanger, Scott E; Rawlings, Jane M; Carr, Gregory J

    2013-08-01

    The fish embryo test (FET) is a potential animal alternative for the acute fish toxicity (AFT) test. A comprehensive validation program assessed 20 different chemicals to understand intra- and interlaboratory variability for the FET. The FET had sufficient reproducibility across a range of potencies and modes of action. In the present study, the suitability of the FET as an alternative model is reviewed by relating FET and AFT. In total, 985 FET studies and 1531 AFT studies were summarized. The authors performed FET-AFT regressions to understand potential relationships based on physical-chemical properties, species choices, duration of exposure, chemical classes, chemical functional uses, and modes of action. The FET-AFT relationships are very robust (slopes near 1.0, intercepts near 0) across 9 orders of magnitude in potency. A recommendation for the predictive regression relationship is based on 96-h FET and AFT data: log FET median lethal concentration (LC50) = (0.989 × log fish LC50) - 0.195; n = 72 chemicals, r = 0.95, p < 0.001, LC50 in mg/L. A similar, not statistically different regression was developed for the entire data set (n = 144 chemicals, unreliable studies deleted). The FET-AFT regressions were robust for major chemical classes with suitably large data sets. Furthermore, regressions were similar to those for large groups of functional chemical categories such as pesticides, surfactants, and industrial organics. Pharmaceutical regressions (n = 8 studies only) were directionally correct. The FET-AFT relationships were not quantitatively different from acute fish-acute fish toxicity relationships with the following species: fathead minnow, rainbow trout, bluegill sunfish, Japanese medaka, and zebrafish. The FET is scientifically supportable as a rational animal alternative model for ecotoxicological testing of acute toxicity of chemicals to fish. Copyright © 2013 SETAC.

  5. SMOS salinity retrieval by using Support Vector Regression (SVR)

    NASA Astrophysics Data System (ADS)

    Katagis, Thomas; Fernández-Prieto, Diego; Marconcini, Mattia; Sabia, Roberto; Martinez, Justino

    2013-04-01

    The Soil Moisture and Ocean Salinity (SMOS) mission was launched in November 2009 within the framework of the European Space Agency (ESA) Living Planet programme. Over the oceans, it aims at providing Sea Surface Salinity (SSS) maps with spatial and temporal coverage adequate for large scale oceanography. A comprehensive inversion scheme has been defined and implemented in the operational retrieval chain to allow proper SSS estimates in a single satellite overpass (L2 product) from the multi-angular brightness temperatures (TBs) measured by SMOS. This SMOS operational L2 salinity processor minimizes the difference between the measured and modeled TBs, including additional constraints on Sea Surface Temperature (SST) and wind speed auxiliary fields. In particular, by adopting a maximum-likelihood Bayesian approach, the inversion scheme retrieves salinity under an iterative convergence loop. However, although the implemented iterative technique is well established and robust, it is still prone to limitations; for instance, the presence of local minima in the cost function cannot be excluded. Moreover, previous studies have demonstrated that the background and observational terms of the cost function are not properly balanced, which is likely to introduce errors in the retrieval procedure. To overcome such potential drawbacks, this study proposes a novel approach to SSS estimation based on the ɛ-insensitive Support Vector Regression (SVR), where both SMOS L1 measurements and auxiliary parameters are used as input. The SVR technique has already proved capable of high generalization and robustness in a variety of different applications, with limited complexity in handling the learning phase. Notably, instead of minimizing the observed training error, it attempts to minimize the generalization error bound so as to achieve generalized performance. For this purpose, the original input domain is mapped into a higher dimensionality space (where the function underlying the data is supposed to have increased flatness) and linear regression is performed. The SVR training is performed using suitable in situ SSS data (i.e., ARGO buoy data) collected in a representative region of the ocean. So far, in situ data coming from a match-up ARGO database in November 2010 over the South Pacific constitute the preliminary benchmark of the study. Ongoing activities aim at extending this spatial and temporal frame to assess the robustness of the method. The in situ data have been collocated with SMOS TB measurements and additional parameters (e.g., SST and wind speed) in the learning phase of the SVR under various training/testing configurations. Afterwards, the SSS regression has been performed from the SMOS TBs or emissivities. Estimated SVR salinity fields are in general well correlated with ARGO data. The impact of the various input features has been analyzed after applying rigorous data filtering/flagging, and misfit (SSS_SVR - SSS_ARGO) statistics have been computed. To assess the effectiveness of the proposed method, final results will be compared to those obtained using the official SMOS SSS retrieval algorithm.

  6. Method validation for control determination of mercury in fresh fish and shrimp samples by solid sampling thermal decomposition/amalgamation atomic absorption spectrometry.

    PubMed

    Torres, Daiane Placido; Martins-Teixeira, Maristela Braga; Cadore, Solange; Queiroz, Helena Müller

    2015-01-01

    A method for the determination of total mercury in fresh fish and shrimp samples by solid sampling thermal decomposition/amalgamation atomic absorption spectrometry (TDA AAS) has been validated following international foodstuff protocols in order to fulfill the Brazilian National Residue Control Plan. The experimental parameters have been previously studied and optimized according to specific legislation on validation and inorganic contaminants in foodstuff. Linearity, sensitivity, specificity, detection and quantification limits, precision (repeatability and within-laboratory reproducibility), robustness as well as accuracy of the method have been evaluated. Linearity of response was satisfactory for the two concentration ranges available on the TDA AAS equipment, between approximately 25.0 and 200.0 μg kg⁻¹ (square regression) and 250.0 and 2000.0 μg kg⁻¹ (linear regression) of mercury. The residues for both ranges were homoscedastic and independent, with normal distribution. Correlation coefficients obtained for these ranges were higher than 0.995. Limits of quantification (LOQ) and of detection of the method (LDM), based on signal standard deviation (SD) for a low-in-mercury sample, were 3.0 and 1.0 μg kg⁻¹, respectively. Repeatability of the method was better than 4%. Within-laboratory reproducibility achieved a relative SD better than 6%. Robustness of the current method was evaluated and pointed to sample mass as a significant factor. Accuracy (assessed as the analyte recovery) was calculated on the basis of the repeatability, and ranged from 89% to 99%. The obtained results showed the suitability of the present method for direct mercury measurement in fresh fish and shrimp samples and the importance of monitoring the analysis conditions for food control purposes. Additionally, the competence of this method was recognized by accreditation under the standard ISO/IEC 17025.

  7. Development of variable pathlength UV-vis spectroscopy combined with partial-least-squares regression for wastewater chemical oxygen demand (COD) monitoring.

    PubMed

    Chen, Baisheng; Wu, Huanan; Li, Sam Fong Yau

    2014-03-01

    To overcome the challenging task of selecting an appropriate pathlength for wastewater chemical oxygen demand (COD) monitoring with high accuracy by UV-vis spectroscopy in the wastewater treatment process, a variable pathlength approach combined with partial-least-squares regression (PLSR) was developed in this study. Two new strategies were proposed to extract relevant information from UV-vis spectral data from variable pathlength measurements. The first strategy was data fusion with two data fusion levels: low-level data fusion (LLDF) and mid-level data fusion (MLDF). Predictive accuracy was found to improve, indicated by the lower root-mean-square errors of prediction (RMSEP) compared with those obtained for single pathlength measurements. Both fusion levels were found to deliver very robust PLSR models with residual predictive deviations (RPD) greater than 3 (i.e. 3.22 and 3.29, respectively). The second strategy involved calculating the slopes of absorbance against pathlength at each wavelength to generate slope-derived spectra. Without the requirement to select the optimal pathlength, the predictive accuracy (RMSEP) was improved by 20-43% as compared to single pathlength spectroscopy. Compared to the nine-factor models from the fusion strategy, the PLSR model from slope-derived spectroscopy was found to be more parsimonious, with only five factors, and more robust, with a residual predictive deviation (RPD) of 3.72. It also offered excellent correlation of predicted and measured COD values with R² of 0.936. In sum, variable pathlength spectroscopy with the two proposed data analysis strategies proved to be successful in enhancing prediction performance of COD in wastewater and showed high potential to be applied in on-line water quality monitoring. Copyright © 2013 Elsevier B.V. All rights reserved.
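    The slope-derived-spectra strategy is straightforward to sketch. Assuming Beer-Lambert behaviour, the toy example below computes the per-wavelength slope of absorbance against pathlength and feeds the slope spectra to a PLS regression for COD; all data are synthetic.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(3)

# Synthetic stand-in: 40 wastewater samples, absorbance measured at
# 100 wavelengths for 5 pathlengths (Beer-Lambert: A = eps * c * L).
n_samples, n_wl = 40, 100
paths = np.array([1.0, 2.0, 5.0, 10.0, 20.0])          # mm
cod = rng.uniform(50, 500, n_samples)                   # mg/L (target)
eps = np.exp(-np.linspace(0, 3, n_wl))                  # fake spectral shape
A = cod[:, None, None] * eps[None, :, None] * paths[None, None, :] * 1e-4
A += rng.normal(0, 5e-4, A.shape)                       # measurement noise

# Slope-derived spectrum: per wavelength, the slope of absorbance
# against pathlength replaces the choice of a single "best" pathlength.
slopes = np.polyfit(paths, A.reshape(-1, len(paths)).T, 1)[0]
slopes = slopes.reshape(n_samples, n_wl)

# PLS regression of COD on the slope-derived spectra.
pls = PLSRegression(n_components=5)
pred = cross_val_predict(pls, slopes, cod, cv=5).ravel()
rmsep = np.sqrt(np.mean((pred - cod) ** 2))
print(f"cross-validated RMSEP: {rmsep:.1f} mg/L")
```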

  8. A quick on-line state of health estimation method for Li-ion battery with incremental capacity curves processed by Gaussian filter

    NASA Astrophysics Data System (ADS)

    Li, Yi; Abdel-Monem, Mohamed; Gopalakrishnan, Rahul; Berecibar, Maitane; Nanini-Maury, Elise; Omar, Noshin; van den Bossche, Peter; Van Mierlo, Joeri

    2018-01-01

    This paper proposes an advanced state of health (SoH) estimation method for high energy NMC lithium-ion batteries based on incremental capacity (IC) analysis. IC curves are used due to their ability to detect and quantify battery degradation mechanisms. A simple and robust smoothing method based on a Gaussian filter is proposed to reduce the noise on IC curves, so that the signatures associated with battery ageing can be accurately identified. A linear regression relationship is found between the battery capacity and the positions of features of interest (FOIs) on IC curves. Results show that the SoH estimation function developed from one single battery cell is able to evaluate the SoH of other batteries cycled under different cycling depths with less than 2.5% maximum error, which proves the robustness of the proposed method for SoH estimation. With this technique, partial charging voltage curves can be used for SoH estimation and the testing time can therefore be greatly reduced. This method shows great potential to be applied in reality, as it only requires static charging curves and can be easily implemented in a battery management system (BMS).
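    A hedged sketch of the processing chain: smooth the incremental-capacity curve with a Gaussian filter, extract a feature-of-interest peak, and regress capacity on it. The charge curves below are synthetic, and the peak-height feature is one plausible choice of FOI, not necessarily the paper's.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.stats import linregress

def ic_curve(voltage, capacity, sigma=3):
    """Incremental capacity dQ/dV, smoothed with a Gaussian filter so
    that feature-of-interest (FOI) peaks can be located reliably."""
    dqdv = np.gradient(capacity, voltage)
    return gaussian_filter1d(dqdv, sigma)

rng = np.random.default_rng(5)
v = np.linspace(3.0, 4.2, 500)

# Hypothetical ageing set: the IC peak near 3.7 V shrinks as the cell fades.
capacities, foi_heights = [], []
for fade in np.linspace(1.0, 0.8, 8):                 # 100% -> 80% SoH
    q = fade * (2.5 / (1 + np.exp(-(v - 3.7) * 20)))  # noisy charge curve
    q += rng.normal(0, 0.01, v.size)
    ic = ic_curve(v, q)
    capacities.append(fade * 2.5)                     # "measured" capacity, Ah
    foi_heights.append(ic.max())                      # FOI: main peak height

# Linear regression between capacity and the FOI, as the abstract reports.
fit = linregress(foi_heights, capacities)
print(f"capacity = {fit.slope:.3f} * peak + {fit.intercept:.3f}  "
      f"(r = {fit.rvalue:.3f})")
```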

  9. Source Apportionment and Risk Assessment of Emerging Contaminants: An Approach of Pharmaco-Signature in Water Systems

    PubMed Central

    Jiang, Jheng Jie; Lee, Chon Lin; Fang, Meng Der; Boyd, Kenneth G.; Gibb, Stuart W.

    2015-01-01

    This paper presents a methodology based on multivariate data analysis for characterizing potential source contributions of emerging contaminants (ECs) detected in 26 river water samples across multi-scape regions during dry and wet seasons. Based on this methodology, we unveil an approach toward potential source contributions of ECs, a concept we refer to as the “Pharmaco-signature.” Exploratory analysis of data points has been carried out by unsupervised pattern recognition (hierarchical cluster analysis, HCA) and receptor model (principal component analysis-multiple linear regression, PCA-MLR) in an attempt to demonstrate significant source contributions of ECs in different land-use zone. Robust cluster solutions grouped the database according to different EC profiles. PCA-MLR identified that 58.9% of the mean summed ECs were contributed by domestic impact, 9.7% by antibiotics application, and 31.4% by drug abuse. Diclofenac, ibuprofen, codeine, ampicillin, tetracycline, and erythromycin-H2O have significant pollution risk quotients (RQ>1), indicating potentially high risk to aquatic organisms in Taiwan. PMID:25874375

  10. Anesthesia Technique and Outcomes of Mechanical Thrombectomy in Patients With Acute Ischemic Stroke.

    PubMed

    Bekelis, Kimon; Missios, Symeon; MacKenzie, Todd A; Tjoumakaris, Stavropoula; Jabbour, Pascal

    2017-02-01

    The impact of anesthesia technique on the outcomes of mechanical thrombectomy for acute ischemic stroke remains an issue of debate. We investigated the association of general anesthesia with outcomes in patients undergoing mechanical thrombectomy for ischemic stroke. We performed a cohort study involving patients undergoing mechanical thrombectomy for ischemic stroke from 2009 to 2013, who were registered in the New York Statewide Planning and Research Cooperative System database. An instrumental variable (hospital rate of general anesthesia) analysis was used to simulate the effects of randomization and investigate the association of anesthesia technique with case-fatality and length of stay. Among 1174 patients, 441 (37.6%) underwent general anesthesia and 733 (62.4%) underwent conscious sedation. Using an instrumental variable analysis, we identified that general anesthesia was associated with a 6.4% increased case-fatality (95% confidence interval, 1.9%-11.0%) and 8.4 days longer length of stay (95% confidence interval, 2.9-14.0) in comparison to conscious sedation. This corresponded to 15 patients needing to be treated with conscious sedation to prevent 1 death. Our results were robust in sensitivity analysis with mixed effects regression and propensity score-adjusted regression models. Using a comprehensive all-payer cohort of acute ischemic stroke patients undergoing mechanical thrombectomy in New York State, we identified an association of general anesthesia with increased case-fatality and length of stay. These considerations should be taken into account when standardizing acute stroke care. © 2017 American Heart Association, Inc.
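    The instrumental-variable logic can be sketched as manual two-stage least squares, with the hospital GA rate instrumenting the anesthesia choice. The data below are simulated and the naive second-stage standard errors are understated; a dedicated 2SLS routine would be used in practice.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
n = 1174

# Hypothetical data: the hospital's GA rate shifts the anesthesia choice
# but affects the outcome only through that choice; an unmeasured
# severity confounder affects both treatment and outcome.
hosp_ga_rate = rng.uniform(0.1, 0.8, n)          # instrument
confounder = rng.normal(size=n)                  # unmeasured severity
general = (rng.uniform(size=n) < hosp_ga_rate + 0.1 * confounder).astype(float)
death = 0.1 + 0.064 * general + 0.05 * confounder + rng.normal(0, 0.2, n)

# Stage 1: regress treatment on the instrument.
stage1 = sm.OLS(general, sm.add_constant(hosp_ga_rate)).fit()
general_hat = stage1.fittedvalues

# Stage 2: regress outcome on the *predicted* treatment. The slope
# estimates the causal effect of general anesthesia; naive stage-2
# standard errors are too small and should come from a 2SLS routine.
stage2 = sm.OLS(death, sm.add_constant(general_hat)).fit()
print(stage2.params)   # second entry ~ 0.064, the simulated GA effect
```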

  11. Impact of silica diagenesis on the porosity of fine-grained strata: An analysis of Cenozoic mudstones from the North Sea

    NASA Astrophysics Data System (ADS)

    Wrona, Thilo; Taylor, Kevin G.; Jackson, Christopher A.-L.; Huuse, Mads; Najorka, Jens; Pan, Indranil

    2017-04-01

    Silica diagenesis has the potential to drastically change the physical and fluid flow properties of its host strata and therefore plays a key role in the development of sedimentary basins. The specific processes involved in silica diagenesis are, however, still poorly explained by existing models. This knowledge gap is addressed by investigating the effect of silica diagenesis on the porosity of Cenozoic mudstones of the North Viking Graben, northern North Sea through a multiple linear regression analysis. First, we identify and quantify the mineralogy of these rocks by scanning electron microscopy and X-ray diffraction, respectively. Mineral contents and host rock porosity data inferred from wireline data of two exploration wells are then analyzed by multiple linear regressions. This robust statistical analysis reveals that biogenic opal-A is a significant control and authigenic opal-CT is a minor influence on the porosity of these rocks. These results suggest that the initial porosity of siliceous mudstones increases with biogenic opal-A production during deposition and that the porosity reduction during opal-A/CT transformation results from opal-A dissolution. These findings advance our understanding of compaction, dewatering, and lithification of siliceous sediments and rocks. Moreover, this study provides a recipe for the derivation of the key controls (e.g., composition) on a rock property (e.g., porosity) that can be applied to a variety of problems in rock physics.

  12. Assessing the sensitivity and robustness of prediction models for apple firmness using spectral scattering technique

    USDA-ARS?s Scientific Manuscript database

    Spectral scattering is useful for nondestructive sensing of fruit firmness. Prediction models, however, are typically built using multivariate statistical methods such as partial least squares regression (PLSR), whose performance generally depends on the characteristics of the data. The aim of this ...

  13. Steep discounting of delayed monetary and food rewards in obesity: a meta-analysis.

    PubMed

    Amlung, M; Petker, T; Jackson, J; Balodis, I; MacKillop, J

    2016-08-01

    An increasing number of studies have investigated delay discounting (DD) in relation to obesity, but with mixed findings. This meta-analysis synthesized the literature on the relationship between monetary and food DD and obesity, with three objectives: (1) to characterize the relationship between DD and obesity in both case-control comparisons and continuous designs; (2) to examine potential moderators, including case-control v. continuous design, money v. food rewards, sample sex distribution, and sample age (<18 v. ≥18 years); and (3) to evaluate publication bias. From 134 candidate articles, 39 independent investigations yielded 29 case-control and 30 continuous comparisons (total n = 10 278). Random-effects meta-analysis was conducted using Cohen's d as the effect size. Publication bias was evaluated using fail-safe N, Begg-Mazumdar and Egger tests, meta-regression of publication year and effect size, and imputation of missing studies. The primary analysis revealed a medium effect size across studies that was highly statistically significant (d = 0.43, p < 10⁻¹⁴). None of the moderators examined yielded statistically significant differences, although notably larger effect sizes were found for studies with case-control designs, food rewards and child/adolescent samples. Limited evidence of publication bias was present, although the Begg-Mazumdar test and meta-regression suggested a slightly diminishing effect size over time. Steep DD of food and money appears to be a robust feature of obesity that is relatively consistent across the DD assessment methodologies and study designs examined. These findings are discussed in the context of research on DD in drug addiction, the neural bases of DD in obesity, and potential clinical applications.
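    For readers unfamiliar with random-effects pooling, here is a compact DerSimonian-Laird implementation; the six effect sizes are invented, not taken from the meta-analysis.

```python
import numpy as np

def dersimonian_laird(d, v):
    """Random-effects pooled effect size (DerSimonian-Laird).

    d : per-study Cohen's d values
    v : per-study sampling variances of d
    """
    d, v = np.asarray(d, float), np.asarray(v, float)
    w = 1.0 / v                                   # fixed-effect weights
    d_fixed = np.sum(w * d) / np.sum(w)
    Q = np.sum(w * (d - d_fixed) ** 2)            # heterogeneity statistic
    k = len(d)
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (Q - (k - 1)) / c)            # between-study variance
    w_re = 1.0 / (v + tau2)                       # random-effects weights
    d_re = np.sum(w_re * d) / np.sum(w_re)
    se = np.sqrt(1.0 / np.sum(w_re))
    return d_re, se, tau2

# Invented effect sizes from 6 hypothetical discounting studies.
d = [0.61, 0.35, 0.52, 0.28, 0.49, 0.40]
v = [0.04, 0.02, 0.05, 0.03, 0.06, 0.02]
d_re, se, tau2 = dersimonian_laird(d, v)
print(f"pooled d = {d_re:.2f} (SE {se:.2f}), tau^2 = {tau2:.3f}")
```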

  14. "A Bayesian sensitivity analysis to evaluate the impact of unmeasured confounding with external data: a real world comparative effectiveness study in osteoporosis".

    PubMed

    Zhang, Xiang; Faries, Douglas E; Boytsov, Natalie; Stamey, James D; Seaman, John W

    2016-09-01

    Observational studies are frequently used to assess the effectiveness of medical interventions in routine clinical practice. However, the use of observational data for comparative effectiveness is challenged by selection bias and the potential of unmeasured confounding. This is especially problematic for analyses using a health care administrative database, in which key clinical measures are often not available. This paper provides an approach to conducting a sensitivity analysis to investigate the impact of unmeasured confounding in observational studies. In a real world osteoporosis comparative effectiveness study, the bone mineral density (BMD) score, an important predictor of fracture risk and a factor in the selection of osteoporosis treatments, is unavailable in the database, and lack of baseline BMD could potentially lead to significant selection bias. We implemented Bayesian twin-regression models, which simultaneously model both the observed outcome and the unobserved unmeasured confounder, using information from external sources. A sensitivity analysis was also conducted to assess the robustness of our conclusions to changes in such external data. The use of Bayesian modeling in this study suggests that the lack of baseline BMD did have a strong impact on the analysis, reversing the direction of the estimated effect (odds ratio of fracture incidence at 24 months: 0.40 vs. 1.36, with/without adjusting for unmeasured baseline BMD). The Bayesian twin-regression models provide a flexible sensitivity analysis tool to quantitatively assess the impact of unmeasured confounding in observational studies. Copyright © 2016 John Wiley & Sons, Ltd.

  15. Robust Mosaicking of Stereo Digital Elevation Models from the Ames Stereo Pipeline

    NASA Technical Reports Server (NTRS)

    Kim, Tae Min; Moratto, Zachary M.; Nefian, Ara Victor

    2010-01-01

    A robust estimation method is proposed to combine multiple observations and create consistent, accurate, dense Digital Elevation Models (DEMs) from lunar orbital imagery. The NASA Ames Intelligent Robotics Group (IRG) aims to produce higher-quality terrain reconstructions of the Moon from Apollo Metric Camera (AMC) data than is currently possible. In particular, IRG makes use of a stereo vision process, the Ames Stereo Pipeline (ASP), to automatically generate DEMs from consecutive AMC image pairs. However, the DEMs currently produced by the ASP often contain errors and inconsistencies due to image noise, shadows, etc. The proposed method addresses this problem by making use of multiple observations and by considering their goodness of fit to improve both the accuracy and robustness of the estimate. The stepwise regression method is applied to estimate the relaxed weight of each observation.

  16. Robust regression and posterior predictive simulation increase power to detect early bursts of trait evolution.

    PubMed

    Slater, Graham J; Pennell, Matthew W

    2014-05-01

    A central prediction of much theory on adaptive radiations is that traits should evolve rapidly during the early stages of a clade's history and subsequently slow down in rate as niches become saturated--a so-called "Early Burst." Although a common pattern in the fossil record, evidence for early bursts of trait evolution in phylogenetic comparative data has been equivocal at best. We show here that this may not necessarily be due to the absence of this pattern in nature. Rather, commonly used methods to infer its presence perform poorly when the strength of the burst--the rate at which phenotypic evolution declines--is small, and when some morphological convergence is present within the clade. We present two modifications to existing comparative methods that allow greater power to detect early bursts in simulated datasets. First, we develop posterior predictive simulation approaches and show that they outperform maximum likelihood approaches at identifying early bursts of moderate strength. Second, we use a robust regression procedure that allows for the identification and down-weighting of convergent taxa, leading to moderate increases in method performance. We demonstrate the utility and power of these approaches by investigating the evolution of body size in cetaceans. Model fitting using maximum likelihood is equivocal with regard to the mode of cetacean body size evolution. However, posterior predictive simulation combined with a robust node height test returns low support for Brownian motion or rate shift models, but not for the early burst model. While the jury is still out on whether early bursts are actually common in nature, our approach will hopefully facilitate more robust testing of this hypothesis. We advocate the adoption of similar posterior predictive approaches to improve the fit and to assess the adequacy of macroevolutionary models in general.

  17. Influence plots for LASSO

    DOE PAGES

    Jang, Dae -Heung; Anderson-Cook, Christine Michaela

    2016-11-22

    With many predictors in regression, fitting the full model can induce multicollinearity problems. Least Absolute Shrinkage and Selection Operation (LASSO) is useful when the effects of many explanatory variables are sparse in a high-dimensional dataset. Influential points can have a disproportionate impact on the estimated values of model parameters. Here, this paper describes a new influence plot that can be used to increase understanding of the contributions of individual observations and the robustness of results. This can serve as a complement to other regression diagnostics techniques in the LASSO regression setting. Using this influence plot, we can find influential points and their impact on shrinkage of model parameters and model selection. Lastly, we provide two examples to illustrate the methods.
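    A leave-one-out refit is one simple way to surface the influential points such a plot displays. The sketch below drops each observation in turn, refits the Lasso, and ranks observations by how far the coefficient vector moves; this is a rough counterpart to, not a reproduction of, the proposed influence plot.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.datasets import make_regression

# Sparse ground truth plus one gross outlier to make a point.
X, y = make_regression(n_samples=60, n_features=20, n_informative=3,
                       noise=5.0, random_state=0)
y[0] += 200.0                                     # planted influential point

full = Lasso(alpha=5.0).fit(X, y)

# Leave-one-out influence: how far do the Lasso coefficients (and thus
# the selected support) move when observation i is dropped?
influence = np.empty(len(y))
for i in range(len(y)):
    mask = np.arange(len(y)) != i
    loo = Lasso(alpha=5.0).fit(X[mask], y[mask])
    influence[i] = np.linalg.norm(loo.coef_ - full.coef_)

print("most influential observations:", np.argsort(influence)[-3:][::-1])
```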

  18. Influence plots for LASSO

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jang, Dae -Heung; Anderson-Cook, Christine Michaela

    With many predictors in regression, fitting the full model can induce multicollinearity problems. Least Absolute Shrinkage and Selection Operation (LASSO) is useful when the effects of many explanatory variables are sparse in a high-dimensional dataset. Influential points can have a disproportionate impact on the estimated values of model parameters. Here, this paper describes a new influence plot that can be used to increase understanding of the contributions of individual observations and the robustness of results. This can serve as a complement to other regression diagnostics techniques in the LASSO regression setting. Using this influence plot, we can find influential points and their impact on shrinkage of model parameters and model selection. Lastly, we provide two examples to illustrate the methods.

  19. Risk stratification personalised model for prediction of life-threatening ventricular tachyarrhythmias in patients with chronic heart failure.

    PubMed

    Frolov, Alexander Vladimirovich; Vaikhanskaya, Tatjana Gennadjevna; Melnikova, Olga Petrovna; Vorobiev, Anatoly Pavlovich; Guel, Ludmila Michajlovna

    2017-01-01

    The development of prognostic factors for life-threatening ventricular tachyarrhythmias (VTA) and sudden cardiac death (SCD) continues to maintain its priority and relevance in cardiology. The development of a method of personalised prognosis based on multifactorial analysis of the risk factors associated with life-threatening heart rhythm disturbances is considered a key research and clinical task. To design a prognostic mathematical model to define personalised risk for life-threatening VTA in patients with chronic heart failure (CHF). The study included 240 patients with CHF (mean age of 50.5 ± 12.1 years; left ventricular ejection fraction 32.8 ± 10.9%; follow-up period 36.8 ± 5.7 months). The participants received basic therapy for heart failure. The electrocardiogram (ECG) markers of myocardial electrical instability were assessed, including microvolt T-wave alternans, heart rate turbulence, heart rate deceleration, and QT dispersion. Additionally, echocardiography and Holter monitoring (HM) were performed. The cardiovascular events considered as primary endpoints included SCD and paroxysmal ventricular tachycardia/ventricular fibrillation (VT/VF), based on HM-ECG data and on data obtained from implantable device interrogation (CRT-D, ICD), as well as appropriate shocks. During the follow-up period, 66 (27.5%) subjects with CHF showed adverse arrhythmic events, including nine SCD events and 57 VTAs. Data from a stepwise discriminant analysis of cumulative ECG markers of myocardial electrical instability were used to build a mathematical model for preliminary VTA risk stratification. Uni- and multivariate Cox logistic regression analyses were performed to define an individualised risk stratification model of SCD/VTA. A binary logistic regression model demonstrated a high prognostic significance of the discriminant function, with a classification sensitivity of 80.8% and specificity of 99.1% (F = 31.2; χ² = 143.2; p < 0.0001). The method of personalised risk stratification using Cox logistic regression allows correct classification of more than 93.9% of CHF cases. The robust prognostic significance of logistic regression for defining VTA risk supports including this method in the algorithm of subsequent control and selection of the optimal treatment modality for patients with CHF.
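    As an illustration of the final modelling step, the sketch below fits a Cox proportional hazards model with the lifelines package on a simulated CHF-like cohort and derives per-patient relative risks for stratification; the markers, coefficients, and follow-up scheme are all invented.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(2024)
n = 240

# Hypothetical CHF cohort: follow-up months, arrhythmic event indicator,
# and two markers standing in for the ECG/echo covariates.
twa = rng.normal(40, 15, n)                     # microvolt T-wave alternans
lvef = rng.normal(33, 11, n)                    # ejection fraction, %
risk = 0.03 * twa - 0.05 * lvef
time = rng.exponential(36 / np.exp(risk - risk.mean()))
event = (time < 36).astype(int)                 # event observed in follow-up
df = pd.DataFrame({"time": np.minimum(time, 36.0), "event": event,
                   "twa": twa, "lvef": lvef})

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")
cph.print_summary()                             # hazard ratios per marker

# Per-patient relative risk, usable for stratification into risk groups.
df["risk_score"] = cph.predict_partial_hazard(df)
print(df.nlargest(5, "risk_score")[["twa", "lvef", "risk_score"]])
```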

  20. Government, politics and health policy: A quantitative analysis of 30 European countries.

    PubMed

    Mackenbach, Johan P; McKee, Martin

    2015-10-01

    Public health policies are often dependent on political decision-making, but little is known of the impact of different forms of government on countries' health policies. In this exploratory study we examined the association between a wide range of process and outcome indicators of health policy and four groups of political factors (levels of democracy, e.g. voice and accountability; political representation, e.g. voter turnout; distribution of power, e.g. constraints on the executive; and quality of government, e.g. absence of corruption) in contemporary Europe. Data on 15 aspects of government and 18 indicators of health policy as well as on potential confounders were extracted from harmonized international data sources, covering 30 European countries and the years 1990-2010. In a first step, multivariate regression analysis was used to relate cumulative measures of government to indicators of health policy, and in a second step panel regression with country fixed effects was used to relate changes in selected measures of government to changes in indicators of health policy. In multivariate regression analyses, measures of quality of democracy and quality of government had many positive associations with process and outcome indicators of health policy, while measures of distribution of power and political representation had few and inconsistent associations. Associations for quality of democracy were robust against more extensive control for confounding variables, including tests in panel regressions with country fixed effects, but associations for quality of government were not. In this period in Europe, the predominant political influence on health policy has been the rise of levels of democracy in countries in the Central & Eastern part of the region. In contrast to other areas of public policy, health policy does not appear to be strongly influenced by institutional features of democracy determining the distribution of power, nor by aspects of political representation. The effect of quality of government on health policy warrants more study. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  1. Disconcordance in Statistical Models of Bisphenol A and Chronic Disease Outcomes in NHANES 2003-08

    PubMed Central

    Casey, Martin F.; Neidell, Matthew

    2013-01-01

    Background: Bisphenol A (BPA), a high production chemical commonly found in plastics, has drawn great attention from researchers due to the substance’s potential toxicity. Using data from three National Health and Nutrition Examination Survey (NHANES) cycles, we explored the consistency and robustness of BPA’s reported effects on coronary heart disease and diabetes. Methods and Findings: We report the use of three different statistical models in the analysis of BPA: (1) logistic regression, (2) log-linear regression, and (3) dose-response logistic regression. In each variation, confounders were added in six blocks to account for demographics, urinary creatinine, source of BPA exposure, healthy behaviours, and phthalate exposure. Results were sensitive to the variations in functional form of our statistical models, but no single model yielded consistent results across NHANES cycles. Reported ORs were also found to be sensitive to inclusion/exclusion criteria. Further, observed effects, which were most pronounced in NHANES 2003-04, could not be explained away by confounding. Conclusions: Limitations in the NHANES data and a poor understanding of the mode of action of BPA have made it difficult to develop informative statistical models. Given the sensitivity of effect estimates to functional form, researchers should report results using multiple specifications with different assumptions about BPA measurement, thus allowing for the identification of potential discrepancies in the data. PMID:24223205

  2. Moving beyond regression techniques in cardiovascular risk prediction: applying machine learning to address analytic challenges.

    PubMed

    Goldstein, Benjamin A; Navar, Ann Marie; Carter, Rickey E

    2017-06-14

    Risk prediction plays an important role in clinical cardiology research. Traditionally, most risk models have been based on regression models. While useful and robust, these statistical methods are limited to using a small number of predictors which operate in the same way on everyone, and uniformly throughout their range. The purpose of this review is to illustrate the use of machine-learning methods for development of risk prediction models. Typically presented as black box approaches, most machine-learning methods are aimed at solving particular challenges that arise in data analysis that are not well addressed by typical regression approaches. To illustrate these challenges, as well as how different methods can address them, we consider predicting mortality after diagnosis of acute myocardial infarction. We use data derived from our institution's electronic health record and abstract data on 13 regularly measured laboratory markers. We walk through different challenges that arise in modelling these data and then introduce different machine-learning approaches. Finally, we discuss general issues in the application of machine-learning methods including tuning parameters, loss functions, variable importance, and missing data. Overall, this review serves as an introduction for those working on risk modelling to approach the diffuse field of machine learning. © The Author 2016. Published by Oxford University Press on behalf of the European Society of Cardiology.

  3. Genetic Programming Transforms in Linear Regression Situations

    NASA Astrophysics Data System (ADS)

    Castillo, Flor; Kordon, Arthur; Villa, Carlos

    The chapter summarizes the use of Genetic Programming (GP) in Multiple Linear Regression (MLR) to address multicollinearity and Lack of Fit (LOF). The basis of the proposed method is applying appropriate input transforms (model respecification) that deal with these issues while preserving the information content of the original variables. The transforms are selected from symbolic regression models with optimal trade-off between accuracy of prediction and expressional complexity, generated by multiobjective Pareto-front GP. The chapter includes a comparative study of the GP-generated transforms with Ridge Regression, a variant of ordinary Multiple Linear Regression, which has been a useful and commonly employed approach for reducing multicollinearity. The advantages of GP-generated model respecification are clearly defined and demonstrated. Some recommendations for transforms selection are given as well. The application benefits of the proposed approach are illustrated with a real industrial application in one of the broadest empirical modeling areas in manufacturing - robust inferential sensors. The chapter contributes to increasing the awareness of the potential of GP in statistical model building by MLR.

  4. Testing in Microbiome-Profiling Studies with MiRKAT, the Microbiome Regression-Based Kernel Association Test

    PubMed Central

    Zhao, Ni; Chen, Jun; Carroll, Ian M.; Ringel-Kulka, Tamar; Epstein, Michael P.; Zhou, Hua; Zhou, Jin J.; Ringel, Yehuda; Li, Hongzhe; Wu, Michael C.

    2015-01-01

    High-throughput sequencing technology has enabled population-based studies of the role of the human microbiome in disease etiology and exposure response. Distance-based analysis is a popular strategy for evaluating the overall association between microbiome diversity and outcome, wherein the phylogenetic distance between individuals’ microbiome profiles is computed and tested for association via permutation. Despite their practical popularity, distance-based approaches suffer from important challenges, especially in selecting the best distance and extending the methods to alternative outcomes, such as survival outcomes. We propose the microbiome regression-based kernel association test (MiRKAT), which directly regresses the outcome on the microbiome profiles via the semi-parametric kernel machine regression framework. MiRKAT allows for easy covariate adjustment and extension to alternative outcomes while non-parametrically modeling the microbiome through a kernel that incorporates phylogenetic distance. It uses a variance-component score statistic to test for the association with analytical p value calculation. The model also allows simultaneous examination of multiple distances, alleviating the problem of choosing the best distance. Our simulations demonstrated that MiRKAT provides correctly controlled type I error and adequate power in detecting overall association. “Optimal” MiRKAT, which considers multiple candidate distances, is robust in that it suffers from little power loss in comparison to when the best distance is used and can achieve tremendous power gain in comparison to when a poor distance is chosen. Finally, we applied MiRKAT to real microbiome datasets to show that microbial communities are associated with smoking and with fecal protease levels after confounders are controlled for. PMID:25957468

  5. Comparison of univariate and multivariate calibration for the determination of micronutrients in pellets of plant materials by laser induced breakdown spectrometry

    NASA Astrophysics Data System (ADS)

    Braga, Jez Willian Batista; Trevizan, Lilian Cristina; Nunes, Lidiane Cristina; Rufini, Iolanda Aparecida; Santos, Dário, Jr.; Krug, Francisco José

    2010-01-01

    The application of laser induced breakdown spectrometry (LIBS) aiming at the direct analysis of plant materials is a great challenge that still needs efforts for its development and validation. Accordingly, a series of experimental approaches has been carried out in order to show that LIBS can be used as an alternative to wet-acid-digestion-based methods for the analysis of agricultural and environmental samples. The large amount of information provided by LIBS spectra for these complex samples increases the difficulty of selecting the most appropriate wavelengths for each analyte. Some applications have suggested that improvements in both accuracy and precision can be achieved by the application of multivariate calibration to LIBS data when compared to the univariate regression developed with line emission intensities. In the present work, the performance of univariate and multivariate calibration, based on partial least squares regression (PLSR), was compared for the analysis of pellets of plant materials made from an appropriate mixture of cryogenically ground samples with cellulose as the binding agent. The development of a specific PLSR model for each analyte and the selection of spectral regions containing only lines of the analyte of interest were the best conditions for the analysis. In this particular application, the models showed similar performance, but PLSR seemed to be more robust due to a lower occurrence of outliers in comparison to the univariate method. The data suggest that efforts dealing with sample presentation and fitness of standards for LIBS analysis must be made in order to fulfill the boundary conditions for matrix-independent development and validation.

  6. Substituting values for censored data from Texas, USA, reservoirs inflated and obscured trends in analyses commonly used for water quality target development.

    PubMed

    Grantz, Erin; Haggard, Brian; Scott, J Thad

    2018-06-12

    We calculated four median datasets for three parameters (chlorophyll a, Chl a; total phosphorus, TP; and transparency) using multiple approaches to handling censored observations, including substituting fractions of the quantification limit (QL; dataset 1 = 1QL, dataset 2 = 0.5QL) and statistical methods for censored datasets (datasets 3-4), for approximately 100 Texas, USA, reservoirs. Trend analyses of differences between dataset 1 and dataset 3 medians indicated that percent difference increased linearly above thresholds in percent censored data (%Cen). This relationship was extrapolated to estimate medians for site-parameter combinations with %Cen > 80%, which were combined with dataset 3 as dataset 4. Changepoint analysis of Chl a-TP and transparency-TP relationships indicated threshold differences of up to 50% between datasets. Recursive analysis identified secondary thresholds in dataset 4. Threshold differences show that information introduced via substitution, or missing due to limitations of the statistical methods, biased values, underestimated error, and inflated the strength of TP thresholds identified in datasets 1-3. Analysis of covariance identified differences in the linear regression models relating transparency to TP between datasets 1, 2, and the more statistically robust datasets 3-4. The study findings identify high-risk scenarios for biased analytical outcomes when using substitution. These include a high probability of median overestimation when %Cen > 50-60% for a single QL, or when %Cen is as low as 16% for multiple QLs. Changepoint analysis was uniquely vulnerable to substitution effects when using medians from sites with %Cen > 50%. Linear regression analysis was less sensitive to substitution and missing-data effects, but differences in model parameters for transparency cannot be discounted and could be magnified by log-transformation of the variables.
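
    The substitution problem described above is easy to demonstrate. The sketch below, which assumes lognormal data and a single QL (neither taken from the study), contrasts medians computed under 1QL and 0.5QL substitution with a censored maximum-likelihood median of the kind the statistical methods in datasets 3-4 rely on.

```python
# Sketch: medians from substitution (1QL, 0.5QL) vs. a censored-data MLE,
# assuming lognormal data and a single quantification limit (assumptions
# made for illustration, not taken from the study).
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(7)
true = rng.lognormal(mean=1.0, sigma=0.8, size=200)
QL = 3.0
censored = true < QL                       # left-censored below the QL

med_1QL = np.median(np.where(censored, QL, true))
med_halfQL = np.median(np.where(censored, 0.5 * QL, true))

def negloglik(params):
    mu, sigma = params
    if sigma <= 0:
        return np.inf
    ll = stats.norm.logpdf(np.log(true[~censored]), mu, sigma).sum()
    ll += censored.sum() * stats.norm.logcdf((np.log(QL) - mu) / sigma)
    return -ll

mu_hat, _ = optimize.minimize(negloglik, x0=[0.0, 1.0],
                              method="Nelder-Mead").x
print(med_1QL, med_halfQL, np.exp(mu_hat))  # lognormal MLE median = exp(mu)
```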

  7. Exposure-response analysis of alectinib in crizotinib-resistant ALK-positive non-small cell lung cancer.

    PubMed

    Morcos, Peter N; Nueesch, Eveline; Jaminion, Felix; Guerini, Elena; Hsu, Joy C; Bordogna, Walter; Balas, Bogdana; Mercier, Francois

    2018-05-10

    Alectinib is a selective and potent anaplastic lymphoma kinase (ALK) inhibitor that is active in the central nervous system (CNS). Alectinib demonstrated robust efficacy in a pooled analysis of two single-arm, open-label phase II studies (NP28673, NCT01801111; NP28761, NCT01871805) in crizotinib-resistant ALK-positive non-small-cell lung cancer (NSCLC): median overall survival (OS) 29.1 months (95% confidence interval [CI]: 21.3-39.0) for alectinib 600 mg twice daily (BID). We investigated exposure-response relationships from the final pooled phase II OS and safety data to assess alectinib dose selection. A semi-parametric Cox proportional hazards model analyzed relationships between individual median observed steady-state trough concentrations (Ctrough,ss) for the combined exposure of alectinib and its major metabolite (M4), baseline covariates (demographics and disease characteristics), and OS. Univariate logistic regression was used to analyze relationships between Ctrough,ss and the incidence of adverse events (AEs: serious and grade ≥ 3). Overall, 92% of patients (n = 207/225) had Ctrough,ss data and were included in the analysis. No statistically significant relationship was found between Ctrough,ss and OS following alectinib treatment. The only baseline covariates with a statistically significant influence on OS were baseline tumor size and prior crizotinib treatment duration: larger baseline tumor size and shorter prior crizotinib treatment were both associated with shorter OS. Logistic regression confirmed no significant relationship between Ctrough,ss and AEs. Alectinib 600 mg BID provides systemic exposures at the plateau of response for OS while maintaining a well-tolerated safety profile. This analysis confirms alectinib 600 mg BID as the recommended global dose for patients with crizotinib-resistant ALK-positive NSCLC.
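
    The survival part of such an analysis maps naturally onto the lifelines library. The sketch below fits a Cox proportional hazards model of OS on exposure and the two influential baseline covariates; all data and column names are synthetic stand-ins, not study data.

```python
# Synthetic exposure-response survival sketch with lifelines; column names
# and effect sizes are hypothetical, not study data.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 80
tumor = rng.normal(60, 25, n)
df = pd.DataFrame({
    "ctrough_ss": rng.normal(500, 120, n),         # alectinib + M4 exposure
    "tumor_size": tumor,
    "prior_crizo": rng.exponential(8, n),          # months on crizotinib
    # survival tied to tumor size but not exposure, echoing the finding above
    "os_months": rng.exponential(30 * np.exp(-0.01 * (tumor - 60))),
    "event": (rng.random(n) < 0.7).astype(int),    # 1 = death observed
})

cph = CoxPHFitter()
cph.fit(df, duration_col="os_months", event_col="event")
cph.print_summary()        # hazard ratios per covariate
```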

  8. Rank estimation and the multivariate analysis of in vivo fast-scan cyclic voltammetric data

    PubMed Central

    Keithley, Richard B.; Carelli, Regina M.; Wightman, R. Mark

    2010-01-01

    Principal component regression has been used in the past to separate current contributions from different neuromodulators measured with in vivo fast-scan cyclic voltammetry. Traditionally, a percent cumulative variance approach has been used to determine the rank of the training-set voltammetric matrix during model development; however, this approach suffers from several disadvantages, including the use of arbitrary percentages and the requirement of extreme precision in training sets. Here we propose that Malinowski's F-test, a method based on a statistical analysis of the variance contained within the training set, can be used to improve factor selection for the analysis of in vivo fast-scan cyclic voltammetric data. These two methods of rank estimation were compared at all steps in the calibration protocol, including the number of principal components retained, overall noise levels, model validation as determined using a residual analysis procedure, and predicted concentration information. By analyzing 119 training sets from two different laboratories amassed over several years, we were able to gain insight into the heterogeneity of in vivo fast-scan cyclic voltammetric data and study how differences in factor selection propagate throughout the entire principal component regression analysis procedure. Visualizing cyclic voltammetric representations of the data contained in the retained and discarded principal components showed that using Malinowski's F-test for rank estimation of in vivo training sets allowed noise to be removed more accurately. Malinowski's F-test also improved the robustness of our criterion for judging multivariate model validity, even though signal-to-noise ratios of the data varied. In addition, pH change carried the majority of the noise in in vivo training sets, while dopamine prediction was more sensitive to noise. PMID:20527815
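
    For readers who want the flavor of the rank-estimation step, here is a compact sketch of Malinowski's F-test following one common formulation (reduced eigenvalues tested against the pool of smaller, error-level ones); it is a sketch of the idea, not the authors' exact implementation.

```python
# Hedged sketch of Malinowski's F-test for pseudorank estimation; follows
# one common formulation and may differ in detail from the paper's version.
import numpy as np
from scipy import stats

def malinowski_rank(X, alpha=0.05):
    """Largest n whose reduced eigenvalue is significantly bigger than the
    pool of smaller (error) reduced eigenvalues."""
    r, c = X.shape
    s = min(r, c)
    ev = np.linalg.svd(X, compute_uv=False) ** 2          # eigenvalues of X'X
    denom = np.array([(r - j + 1) * (c - j + 1) for j in range(1, s + 1)])
    rev = ev / denom                                       # reduced eigenvalues
    for n in range(1, s):
        F = rev[n - 1] / rev[n:].mean()
        if stats.f.sf(F, 1, s - n) > alpha:
            return n - 1       # component n is indistinguishable from error
    return s

rng = np.random.default_rng(0)
A = rng.normal(size=(60, 3)) @ rng.normal(size=(3, 40))   # true rank 3
print(malinowski_rank(A + rng.normal(0, 0.05, (60, 40))))  # expect ~3
```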

  9. Effect of sour tea (Hibiscus sabdariffa L.) on arterial hypertension: a systematic review and meta-analysis of randomized controlled trials.

    PubMed

    Serban, Corina; Sahebkar, Amirhossein; Ursoniu, Sorin; Andrica, Florina; Banach, Maciej

    2015-06-01

    Hibiscus sabdariffa L. is a tropical wild plant rich in organic acids, polyphenols, anthocyanins, polysaccharides, and volatile constituents that are beneficial for the cardiovascular system. Hibiscus sabdariffa beverages are commonly consumed to treat arterial hypertension, yet the evidence from randomized controlled trials (RCTs) has not been fully conclusive. Therefore, we aimed to assess the potential antihypertensive effects of H. sabdariffa through a systematic review of the literature and a meta-analysis of available RCTs. The search included PUBMED, Cochrane Library, Scopus, and EMBASE (up to July 2014) to identify RCTs investigating the efficacy of H. sabdariffa supplementation on SBP and DBP values. Two independent reviewers extracted data on study characteristics, methods, and outcomes. Quantitative data synthesis and meta-regression were performed using a fixed-effect model, and sensitivity analysis using the leave-one-out method. Five RCTs (comprising seven treatment arms) were selected for the meta-analysis. In total, 390 participants were randomized, of whom 225 were allocated to the H. sabdariffa supplementation group and 165 to the control group in the selected studies. Fixed-effect meta-regression indicated a significant effect of H. sabdariffa supplementation in lowering both SBP (weighted mean difference -7.58 mmHg, 95% confidence interval -9.69 to -5.46, P < 0.00001) and DBP (weighted mean difference -3.53 mmHg, 95% confidence interval -5.16 to -1.89, P < 0.0001). These effects were inversely associated with baseline BP values and were robust in sensitivity analyses. This meta-analysis of RCTs showed a significant effect of H. sabdariffa in lowering both SBP and DBP. Further well-designed trials are necessary to validate these results.
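
    The pooled estimate quoted above is an inverse-variance weighted mean difference. The following worked sketch shows the arithmetic on made-up per-trial mean differences and standard errors.

```python
# Worked sketch of a fixed-effect (inverse-variance) pooled weighted mean
# difference; the per-trial numbers here are invented for illustration.
import numpy as np

md = np.array([-9.0, -6.5, -8.2, -5.1, -7.9])   # per-arm SBP differences (mmHg)
se = np.array([1.8, 2.1, 1.5, 2.4, 1.9])        # their standard errors

w = 1.0 / se ** 2                         # inverse-variance weights
wmd = np.sum(w * md) / np.sum(w)          # pooled effect
se_wmd = np.sqrt(1.0 / np.sum(w))
lo, hi = wmd - 1.96 * se_wmd, wmd + 1.96 * se_wmd
print(f"WMD = {wmd:.2f} mmHg, 95% CI {lo:.2f} to {hi:.2f}")
```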

  10. Estimation of genetic parameters and breeding values across challenged environments to select for robust pigs.

    PubMed

    Herrero-Medrano, J M; Mathur, P K; ten Napel, J; Rashidi, H; Alexandri, P; Knol, E F; Mulder, H A

    2015-04-01

    Robustness is an important issue in the pig production industry. Since pigs from international breeding organizations have to withstand a variety of environmental challenges, selection of pigs with the inherent ability to sustain their productivity in diverse environments may be an economically feasible approach in the livestock industry. The objective of this study was to estimate genetic parameters and breeding values across different levels of environmental challenge load. The challenge load (CL) was estimated as the reduction in reproductive performance during different weeks of a year using 925,711 farrowing records from farms distributed worldwide. A wide range of levels of challenge, from favorable to unfavorable environments, was observed among farms with high CL values being associated with confirmed situations of unfavorable environment. Genetic parameters and breeding values were estimated in high- and low-challenge environments using a bivariate analysis, as well as across increasing levels of challenge with a random regression model using Legendre polynomials. Although heritability estimates of number of pigs born alive were slightly higher in environments with extreme CL than in those with intermediate levels of CL, the heritabilities of number of piglet losses increased progressively as CL increased. Genetic correlations among environments with different levels of CL suggest that selection in environments with extremes of low or high CL would result in low response to selection. Therefore, selection programs of breeding organizations that are commonly conducted under favorable environments could have low response to selection in commercial farms that have unfavorable environmental conditions. Sows that had experienced high levels of challenge at least once during their productive life were ranked according to their EBV. The selection of pigs using EBV ignoring environmental challenges or on the basis of records from only favorable environments resulted in a sharp decline in productivity as the level of challenge increased. In contrast, selection using the random regression approach resulted in limited change in productivity with increasing levels of challenge. Hence, we demonstrate that the use of a quantitative measure of environmental CL and a random regression approach can be comprehensively combined for genetic selection of pigs with enhanced ability to maintain high productivity in harsh environments.
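
    The random-regression idea is that each animal's breeding value is a smooth function of the challenge load, expressed in a Legendre polynomial basis. The sketch below evaluates two illustrative EBV curves, one flat (robust) and one declining (environmentally sensitive); the coefficients and the CL scale are invented for illustration.

```python
# Illustrative random-regression evaluation: EBVs as Legendre-polynomial
# functions of challenge load (CL); coefficients and CL scale are invented.
import numpy as np
from numpy.polynomial import legendre

def ebv_at(coeffs, cl, cl_min=0.0, cl_max=10.0):
    """Evaluate an EBV curve at challenge load `cl`, with CL standardized
    to [-1, 1] as is usual for a Legendre basis."""
    x = 2.0 * (np.asarray(cl) - cl_min) / (cl_max - cl_min) - 1.0
    return legendre.legval(x, coeffs)

robust = [12.0, -0.3, 0.1]        # nearly flat across environments
sensitive = [13.5, -2.8, -0.9]    # best in favorable, collapses in harsh
for cl in (1.0, 5.0, 9.0):
    print(f"CL={cl}: robust {ebv_at(robust, cl):.2f}, "
          f"sensitive {ebv_at(sensitive, cl):.2f}")
```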

  11. The Evaluation on the Cadmium Net Concentration for Soil Ecosystems.

    PubMed

    Yao, Yu; Wang, Pei-Fang; Wang, Chao; Hou, Jun; Miao, Ling-Zhan

    2017-03-12

    Yixing, known as the "City of Ceramics", is facing a new dilemma: a raw material crisis. Cadmium (Cd) exists in extremely high concentrations in soil due to the considerable input of industrial wastewater into the soil ecosystem. The in situ technique of diffusive gradients in thin film (DGT), the ex situ static equilibrium approach (HAc, EDTA and CaCl2), and the dissolved concentration in soil solution, as well as microwave digestion, were applied to predict the Cd bioavailability of soil, aiming to provide a robust and accurate method for Cd bioavailability evaluation in Yixing. Moreover, the typical local cash crops-paddy and zizania aquatica-were selected for Cd accumulation, aiming to select the ideal plants with tolerance to the soil Cd contamination. The results indicated that the biomasses of the two applied plants were sufficiently sensitive to reflect the stark regional differences of different sampling sites. The zizania aquatica could effectively reduce the total Cd concentration, as indicated by the high accumulation coefficients. However, the fact that the zizania aquatica has extremely high transfer coefficients, and its stem, as the edible part, might accumulate large amounts of Cd, led to the conclusion that zizania aquatica was not an ideal cash crop in Yixing. Furthermore, the labile Cd concentrations which were obtained by the DGT technique and dissolved in the soil solution showed a significant correlation with the Cd concentrations of the biota accumulation. However, the ex situ methods and the microwave digestion-obtained Cd concentrations showed a poor correlation with the accumulated Cd concentration in plant tissue. Correspondingly, the multiple linear regression models were built for fundamental analysis of the performance of different methods available for Cd bioavailability evaluation. The correlation coefficients of DGT obtained by the improved multiple linear regression model have not significantly improved compared to the coefficients obtained by the simple linear regression model. The results revealed that DGT was a robust measurement, which could obtain the labile Cd concentrations independent of the physicochemical features' variation in the soil ecosystem. Consequently, these findings provide stronger evidence that DGT is an effective and ideal tool for labile Cd evaluation in Yixing.

  12. The Evaluation on the Cadmium Net Concentration for Soil Ecosystems

    PubMed Central

    Yao, Yu; Wang, Pei-Fang; Wang, Chao; Hou, Jun; Miao, Ling-Zhan

    2017-01-01

    Yixing, known as the “City of Ceramics”, is facing a new dilemma: a raw material crisis. Cadmium (Cd) is present at extremely high concentrations in soil due to the considerable input of industrial wastewater into the soil ecosystem. The in situ technique of diffusive gradients in thin films (DGT), ex situ static equilibrium approaches (HAc, EDTA and CaCl2), the dissolved concentration in soil solution, and microwave digestion were applied to predict the Cd bioavailability of soil, with the aim of providing a robust and accurate method for Cd bioavailability evaluation in Yixing. Moreover, the typical local cash crops, paddy and Zizania aquatica, were examined for Cd accumulation, with the aim of selecting plants tolerant of soil Cd contamination. The results indicated that the biomasses of the two plants were sufficiently sensitive to reflect the stark differences among sampling sites. Zizania aquatica effectively reduced the total Cd concentration, as indicated by its high accumulation coefficients. However, because Zizania aquatica has extremely high transfer coefficients and its stem, the edible part, can accumulate large amounts of Cd, it is not an ideal cash crop for Yixing. Furthermore, the labile Cd concentrations obtained by the DGT technique, and those dissolved in the soil solution, showed a significant correlation with the Cd concentrations accumulated by the biota. However, Cd concentrations obtained by the ex situ methods and by microwave digestion correlated poorly with the accumulated Cd concentration in plant tissue. Correspondingly, multiple linear regression models were built to analyze the performance of the different methods available for Cd bioavailability evaluation. The correlation coefficients for DGT from the multiple linear regression model did not improve significantly over those from the simple linear regression model. The results revealed that DGT is a robust measurement that yields labile Cd concentrations independent of variation in the physicochemical features of the soil ecosystem. Consequently, these findings provide strong evidence that DGT is an effective and ideal tool for labile Cd evaluation in Yixing. PMID:28287500
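
    The regression comparison in these two records reduces to fitting plant-tissue Cd against the DGT measurement alone and then against the DGT measurement plus soil covariates, and comparing R2. A toy version, with invented data and variable names:

```python
# Sketch: simple vs. multiple linear regression of plant-tissue Cd on a
# DGT-measured labile Cd concentration and soil covariates; all data and
# names are illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
n = 50
dgt = rng.uniform(0.1, 2.0, n)                 # labile Cd by DGT (mg/kg)
ph = rng.uniform(5.0, 8.0, n)
om = rng.uniform(1.0, 6.0, n)                  # organic matter (%)
plant_cd = 0.8 * dgt + 0.02 * rng.normal(size=n)   # DGT dominates uptake

X_multi = np.column_stack([dgt, ph, om])
simple = LinearRegression().fit(dgt[:, None], plant_cd)
multi = LinearRegression().fit(X_multi, plant_cd)
print("simple R^2:  ", round(simple.score(dgt[:, None], plant_cd), 3))
print("multiple R^2:", round(multi.score(X_multi, plant_cd), 3))
```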

  13. Toward customer-centric organizational science: A common language effect size indicator for multiple linear regressions and regressions with higher-order terms.

    PubMed

    Krasikova, Dina V; Le, Huy; Bachura, Eric

    2018-06-01

    To address a long-standing concern regarding the gap between organizational science and practice, scholars have called for more intuitive and meaningful ways of communicating research results to users of academic research. In this article, we develop a common language effect size index (CLβ) that can help translate research results to practice. We demonstrate how CLβ can be computed and used to interpret the effects of continuous and categorical predictors in multiple linear regression models. We also elaborate on how the proposed CLβ index is computed and used to interpret interactions and nonlinear effects in regression models. In addition, we test the robustness of the proposed index to violations of normality and provide means for computing standard errors and constructing confidence intervals around its estimates. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  14. The Importance of Specific Workplace Environment Characteristics for Maximum Health and Performance: Healthcare Workers' Perspective.

    PubMed

    Sagha Zadeh, Rana; Shepley, Mardelle M; Owora, Arthur Hamie; Dannenbaum, Martha C; Waggener, Laurie T; Chung, Susan Sung Eun

    2018-05-01

    To examine the importance that healthcare employees assign to specific workplace environment characteristics for maximum health and performance, and how these perceptions relate to the nature of their work. A cross-sectional mixed-method study was conducted, using content analysis and robust regression models to examine the relationship between workplace environment characteristics and their perceived importance in promoting health and performance. Our findings suggest that perceptions of the key environment characteristics that safeguard health and performance in healthcare workplaces may vary by employee sex, setting, and the nature of the healthcare work involved. Theme and model descriptions of the influence of these factors on participant perceptions are provided. Employee feedback on workplace characteristics that affect health and performance could be instrumental in setting priorities for workplace design.

  15. Quantitative structure-activity relationship study of P2X7 receptor inhibitors using combination of principal component analysis and artificial intelligence methods.

    PubMed

    Ahmadi, Mehdi; Shahlaei, Mohsen

    2015-01-01

    P2X7 antagonist activity for a set of 49 molecules, derivatives of purine that act as P2X7 receptor antagonists, was modeled with the aid of chemometric and artificial intelligence techniques. The activity of these compounds was estimated by means of a combination of principal component analysis (PCA), a well-known data reduction method; the genetic algorithm (GA), a variable selection technique; and an artificial neural network (ANN), a non-linear modeling method. First, linear regression combined with PCA (principal component regression) was used to model the structure-activity relationships, and afterwards a combination of PCA and an ANN was employed to accurately predict the biological activity of the P2X7 antagonists. PCA preserves as much as possible of the information contained in the original data set. The seven PCs most relevant to the studied activity were selected as inputs to the ANN by an efficient variable selection method, the GA. The best computational neural network model was a fully connected, feed-forward model with a 7-7-1 architecture. The developed ANN model was fully evaluated by different validation techniques, including internal and external validation and the chemical applicability domain. All validations showed that the constructed quantitative structure-activity relationship model is robust and satisfactory.

  16. Quantitative structure–activity relationship study of P2X7 receptor inhibitors using combination of principal component analysis and artificial intelligence methods

    PubMed Central

    Ahmadi, Mehdi; Shahlaei, Mohsen

    2015-01-01

    P2X7 antagonist activity for a set of 49 molecules, derivatives of purine that act as P2X7 receptor antagonists, was modeled with the aid of chemometric and artificial intelligence techniques. The activity of these compounds was estimated by means of a combination of principal component analysis (PCA), a well-known data reduction method; the genetic algorithm (GA), a variable selection technique; and an artificial neural network (ANN), a non-linear modeling method. First, linear regression combined with PCA (principal component regression) was used to model the structure–activity relationships, and afterwards a combination of PCA and an ANN was employed to accurately predict the biological activity of the P2X7 antagonists. PCA preserves as much as possible of the information contained in the original data set. The seven PCs most relevant to the studied activity were selected as inputs to the ANN by an efficient variable selection method, the GA. The best computational neural network model was a fully connected, feed-forward model with a 7-7-1 architecture. The developed ANN model was fully evaluated by different validation techniques, including internal and external validation and the chemical applicability domain. All validations showed that the constructed quantitative structure–activity relationship model is robust and satisfactory. PMID:26600858
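
    A hedged sketch of the two pipelines compared in these records, using scikit-learn: principal component regression versus PCA feeding a small feed-forward network. The GA-based selection of components is replaced here by simply keeping the first seven, so this shows the pipeline shape rather than the authors' exact procedure.

```python
# Sketch: PCR vs. PCA+ANN on synthetic descriptor data; GA-based PC
# selection is replaced by keeping the first 7 components (a simplification).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(5)
X = rng.normal(size=(49, 120))        # 49 molecules x 120 descriptors
y = rng.normal(size=49)               # placeholder activity values

pcr = make_pipeline(PCA(n_components=7), LinearRegression())
pca_ann = make_pipeline(PCA(n_components=7),
                        MLPRegressor(hidden_layer_sizes=(7,), max_iter=5000))
for name, model in [("PCR", pcr), ("PCA+ANN", pca_ann)]:
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(name, round(float(r2.mean()), 3))
```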

  17. The effects of competition on premiums: using United Healthcare's 2015 entry into Affordable Care Act's marketplaces as an instrumental variable.

    PubMed

    Agirdas, Cagdas; Krebs, Robert J; Yano, Masato

    2018-01-08

    One goal of the Affordable Care Act is to increase insurance coverage by improving competition and lowering premiums. To facilitate this goal, the federal government enacted online marketplaces in the 395 rating areas spanning the 34 states that chose not to establish their own state-run marketplaces. Multivariate regression studies analyzing the effects of competition on premiums typically suffer from endogeneity, due to simultaneity and omitted-variable biases. However, United Healthcare's decision to enter these marketplaces in 2015 provides the researcher with an opportunity to address this endogeneity problem. Exploiting the variation caused by United Healthcare's entry decision as an instrument for competition, we study the impact of competition on premiums during the first 2 years of these marketplaces. Combining panel data from five different sources and controlling for 12 variables, we find that one more insurer in a rating area leads to a 6.97% reduction in the second-lowest-priced silver plan premium, which is larger than the estimated effects in the existing literature. Furthermore, we run a threshold analysis and find that competition's effects on premiums become statistically insignificant if there are four or more insurers in a rating area. These findings are robust to alternative measures of premiums, inclusion of a non-linear term in the regression models, and a county-level analysis.
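
    The identification strategy translates directly into two-stage least squares. The sketch below uses the linearmodels package with entry as the instrument for the number of insurers; the data-generating numbers are invented, and the log-premium specification is an assumption made for illustration.

```python
# 2SLS sketch: instrument the (endogenous) number of insurers with an
# entry indicator; synthetic data, hypothetical column names.
import numpy as np
import pandas as pd
from linearmodels.iv import IV2SLS

rng = np.random.default_rng(11)
n = 395
entry = rng.integers(0, 2, n)                      # instrument: UHC entry
insurers = 2 + entry + rng.poisson(1.5, n)         # endogenous competition
log_premium = 5.8 - 0.07 * insurers + rng.normal(0, 0.05, n)
df = pd.DataFrame({"log_premium": log_premium, "insurers": insurers,
                   "entry": entry, "const": 1.0})

res = IV2SLS(df["log_premium"], df[["const"]],
             df["insurers"], df["entry"]).fit()
print(res.params)   # coefficient ~ proportional premium change per insurer
```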

  18. Measuring Multi-Joint Stiffness during Single Movements: Numerical Validation of a Novel Time-Frequency Approach

    PubMed Central

    Piovesan, Davide; Pierobon, Alberto; DiZio, Paul; Lackner, James R.

    2012-01-01

    This study presents and validates a time-frequency technique for measuring two-dimensional multijoint arm stiffness throughout a single planar movement as well as during static posture. It is proposed as an alternative to current regression-based methods, which require numerous repetitions to obtain average stiffness over a small segment of the hand trajectory. The method is based on the analysis of the reassigned spectrogram of the arm's response to impulsive perturbations and can estimate arm stiffness on a trial-by-trial basis. Analytic and empirical methods are first derived and tested through modal analysis on synthetic data. The technique's accuracy and robustness are assessed by modeling the estimation of stiffness time profiles changing at different rates and affected by different noise levels. Our method obtains results comparable with two well-known regression techniques. We also test how the technique can identify the viscoelastic component of nonlinear and higher-than-second-order systems with a non-parametric approach. The proposed technique is highly resistant to noise and can be used easily for both postural and movement tasks. Estimation of stiffness profiles is possible with only one perturbation, making our method a useful tool for estimating limb stiffness during motor learning and adaptation tasks, and for understanding the modulation of stiffness in individuals with neurodegenerative diseases. PMID:22448233
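
    A rough sketch of the underlying idea: track the dominant frequency of the perturbation response over time in a spectrogram and convert it to stiffness via k = m(2*pi*f)^2. This simplifies the paper's reassigned-spectrogram modal analysis considerably; the mass, signal, and frequency range are illustrative.

```python
# Illustrative time-frequency stiffness estimate: follow the spectrogram
# ridge of an impulse response and map frequency to stiffness. A strong
# simplification of the paper's method; all numbers are invented.
import numpy as np
from scipy.signal import spectrogram

fs = 1000.0
t = np.arange(0, 2.0, 1 / fs)
f_inst = 20.0 + 20.0 * t                         # frequency rises as k rises
x = np.exp(-0.5 * t) * np.sin(2 * np.pi * np.cumsum(f_inst) / fs)

f, tt, Sxx = spectrogram(x, fs=fs, nperseg=256, noverlap=224)
ridge = f[np.argmax(Sxx, axis=0)]                # dominant frequency per bin

m = 2.0                                          # assumed effective mass (kg)
k_est = m * (2 * np.pi * ridge) ** 2             # stiffness time profile (N/m)
print(np.round(k_est[:5]))
```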

  19. Long-Term Vegetation Trends Detected In Northern Canada Using Landsat Image Stacks

    NASA Astrophysics Data System (ADS)

    Fraser, R.; Olthof, I.; Carrière, M.; Deschamps, A.; Pouliot, D.

    2011-12-01

    Evidence of recent productivity increases in arctic vegetation comes from a variety of sources. At local scales, long-term plot measurements in North America are beginning to record increases in vascular plant cover and biomass. At landscape scales, expansion and densification of shrubs has been observed using repeat oblique photographs. Finally, continental-scale increases in vegetation "greenness" have been documented based on analysis of coarse resolution (≥ 1 km) NOAA-AVHRR satellite imagery. In this study we investigated intermediate, regional-level changes occurring in tundra vegetation since 1984 using the Landsat TM and ETM+ satellite image archive. Four study areas averaging 13,619 km2 were located over widely distributed national parks in northern Canada (Ivvavik, Sirmilik, Torngat Mountains, and Wapusk). Time-series image stacks of 16-41 growing-season Landsat scenes from overlapping WRS-2 frames were acquired spanning periods of 17-25 years. Each pixel's unique temporal database of clear-sky values was then analyzed for trends in four indices (NDVI, Tasseled Cap Brightness, Greenness and Wetness) using robust linear regression. The trends were further related to changes in the fractional cover of functional vegetation types using regression tree models trained with plot data and high resolution (≤ 10 m) satellite imagery. We found all four study areas to have a larger proportion of significant (p<0.05) positive greenness trends (range 6.1-25.5%) by comparison to negative trends (range 0.3-4.1%). For the three study areas where regression tree models could be derived, consistent trends of increasing shrub or vascular fractional cover and decreasing bare cover were predicted. The Landsat-based observations were associated with warming trends in each park over the analysis periods. Many of the major changes observed could be corroborated using published studies or field observations.
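
    Per-pixel robust trend fitting of this kind can be sketched in a few lines. The abstract does not name the exact robust estimator, so the Theil-Sen slope with a Mann-Kendall-style significance test stands in below; array sizes and the NDVI trend are synthetic.

```python
# Sketch: robust per-pixel trend analysis of an NDVI time stack. Theil-Sen
# is a stand-in for the unnamed robust regression; data are synthetic.
import numpy as np
from scipy.stats import kendalltau, theilslopes

rng = np.random.default_rng(42)
years = np.arange(1984, 2009)
stack = (0.3 + 0.002 * (years - 1984)[:, None, None]      # slow greening
         + rng.normal(0, 0.02, (years.size, 50, 50)))      # clear-sky noise

slopes = np.empty((50, 50))
signif = np.empty((50, 50), dtype=bool)
for i in range(50):
    for j in range(50):
        series = stack[:, i, j]
        slopes[i, j] = theilslopes(series, years)[0]
        signif[i, j] = kendalltau(years, series).pvalue < 0.05

pos = np.mean(signif & (slopes > 0)) * 100
print(f"{pos:.1f}% of pixels show a significant positive greenness trend")
```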

  20. Simultaneous Quantification of Syringic Acid and Kaempferol in Extracts of Bergenia Species Using Validated High-Performance Thin-Layer Chromatographic-Densitometric Method.

    PubMed

    Srivastava, Nishi; Srivastava, Amit; Srivastava, Sharad; Rawat, Ajay Kumar Singh; Khan, Abdul Rahman

    2016-03-01

    A rapid, sensitive, selective and robust quantitative densitometric high-performance thin-layer chromatographic (HPTLC) method was developed and validated for the separation and quantification of syringic acid (SYA) and kaempferol (KML) in the hydrolyzed extracts of Bergenia ciliata and Bergenia stracheyi. The separation was performed on silica gel 60F254 HPTLC plates using toluene:ethyl acetate:formic acid (5:4:1, v/v/v) as the mobile phase. The quantification of SYA and KML was carried out using a densitometric reflection/absorption mode at 290 nm. Dense spots of SYA and KML appeared on the developed plate at retention factor values of 0.61 ± 0.02 and 0.70 ± 0.01, respectively. A precise and accurate quantification was performed using linear regression analysis, plotting peak area vs. concentration over 100-600 ng/band (correlation coefficient r = 0.997, determination coefficient R2 = 0.996) for SYA and 100-600 ng/band (r = 0.995, R2 = 0.991) for KML. The developed method was validated in terms of accuracy, recovery, and inter- and intraday precision as per International Conference on Harmonisation guidelines. The limits of detection were determined as 91.63 ng (SYA) and 142.26 ng (KML), and the limits of quantification as 277.67 ng (SYA) and 431.09 ng (KML). The statistical data analysis showed that the method is reproducible and selective for the estimation of SYA and KML in extracts of B. ciliata and B. stracheyi. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
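
    The calibration arithmetic behind such a method is a straight-line fit plus the usual ICH-style LOD/LOQ relations (3.3s/slope and 10s/slope, consistent with the roughly 3:1 LOQ:LOD ratios reported above). A worked sketch with invented peak areas:

```python
# Linear calibration of peak area vs. amount per band, with LOD/LOQ from
# the residual standard deviation; the readings below are invented.
import numpy as np

amount = np.array([100, 200, 300, 400, 500, 600])      # ng/band
area = np.array([1210, 2375, 3640, 4810, 6020, 7150])  # peak areas (a.u.)

slope, intercept = np.polyfit(amount, area, 1)
pred = slope * amount + intercept
resid_sd = np.sqrt(np.sum((area - pred) ** 2) / (len(area) - 2))
r = np.corrcoef(amount, area)[0, 1]

lod = 3.3 * resid_sd / slope
loq = 10.0 * resid_sd / slope
print(f"r = {r:.4f}, LOD = {lod:.1f} ng, LOQ = {loq:.1f} ng")
```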

  1. Hip fracture in the elderly: a re-analysis of the EPIDOS study with causal Bayesian networks.

    PubMed

    Caillet, Pascal; Klemm, Sarah; Ducher, Michel; Aussem, Alexandre; Schott, Anne-Marie

    2015-01-01

    Hip fractures commonly result in permanent disability, institutionalization, or death in the elderly. Existing hip-fracture prediction tools are underused in clinical practice, partly due to their lack of intuitive interpretation. By use of a graphical layer, Bayesian network models could increase the attractiveness of fracture prediction tools. Our aim was to study the potential contribution of a causal Bayesian network in this clinical setting. A logistic regression was performed as a standard control approach to check the robustness of the causal Bayesian network approach. EPIDOS is a multicenter study, conducted in an ambulatory care setting in five French cities between 1992 and 1996 and updated in 2010. The study included 7598 women aged 75 years or older, in whom fractures were assessed quarterly for 4 years. A causal Bayesian network and a logistic regression were applied to the EPIDOS data to describe the major variables involved in hip-fracture occurrence. Both models yielded similar association estimates and predictive performance. They identified gait speed and bone mineral density as the variables most involved in the fracture process. The causal Bayesian network showed that gait speed and bone mineral density were directly connected to fracture and appear to mediate the influence of all the other variables included in our model. The logistic regression approach detected multiple interactions involving psychotropic drug use, age, and bone mineral density. Both approaches retrieved similar variables as predictors of hip fractures. However, the Bayesian network highlighted the whole web of relations among the variables involved in the analysis, suggesting a possible mechanism leading to hip fracture. According to the latter results, interventions focusing concomitantly on gait speed and bone mineral density may be necessary for optimal prevention of hip fracture in elderly people.

  2. Robustness Analysis of Integrated LPV-FDI Filters and LTI-FTC System for a Transport Aircraft

    NASA Technical Reports Server (NTRS)

    Khong, Thuan H.; Shin, Jong-Yeob

    2007-01-01

    This paper proposes a framework for robustness analysis of a nonlinear dynamic system that can be represented by a polynomial linear parameter-varying (PLPV) system with constant bounded uncertainty. The proposed framework contains three key tools: 1) a function substitution method, which can convert a nonlinear system in polynomial form into a PLPV system; 2) a matrix-based linear fractional transformation (LFT) modeling approach, which can convert a PLPV system into an LFT system whose delta block includes the key uncertainty and scheduling parameters; and 3) mu-analysis, a well-known robustness analysis tool for linear systems. The framework is applied to evaluating the performance of the LPV fault detection and isolation (FDI) filters of the closed-loop system of a transport aircraft in the presence of unmodeled actuator dynamics and sensor gain uncertainty. The robustness analysis results are compared with nonlinear time simulations.

  3. Panel regressions to estimate low-flow response to rainfall variability in ungaged basins

    USGS Publications Warehouse

    Bassiouni, Maoya; Vogel, Richard M.; Archfield, Stacey A.

    2016-01-01

    Multicollinearity and omitted-variable bias are major limitations to developing multiple linear regression models to estimate streamflow characteristics in ungaged areas and varying rainfall conditions. Panel regression is used to overcome limitations of traditional regression methods, and obtain reliable model coefficients, in particular to understand the elasticity of streamflow to rainfall. Using annual rainfall and selected basin characteristics at 86 gaged streams in the Hawaiian Islands, regional regression models for three stream classes were developed to estimate the annual low-flow duration discharges. Three panel-regression structures (random effects, fixed effects, and pooled) were compared to traditional regression methods, in which space is substituted for time. Results indicated that panel regression generally was able to reproduce the temporal behavior of streamflow and reduce the standard errors of model coefficients compared to traditional regression, even for models in which the unobserved heterogeneity between streams is significant and the variance inflation factor for rainfall is much greater than 10. This is because both spatial and temporal variability were better characterized in panel regression. In a case study, regional rainfall elasticities estimated from panel regressions were applied to ungaged basins on Maui, using available rainfall projections to estimate plausible changes in surface-water availability and usable stream habitat for native species. The presented panel-regression framework is shown to offer benefits over existing traditional hydrologic regression methods for developing robust regional relations to investigate streamflow response in a changing climate.

  4. Panel regressions to estimate low-flow response to rainfall variability in ungaged basins

    NASA Astrophysics Data System (ADS)

    Bassiouni, Maoya; Vogel, Richard M.; Archfield, Stacey A.

    2016-12-01

    Multicollinearity and omitted-variable bias are major limitations to developing multiple linear regression models to estimate streamflow characteristics in ungaged areas and varying rainfall conditions. Panel regression is used to overcome limitations of traditional regression methods, and obtain reliable model coefficients, in particular to understand the elasticity of streamflow to rainfall. Using annual rainfall and selected basin characteristics at 86 gaged streams in the Hawaiian Islands, regional regression models for three stream classes were developed to estimate the annual low-flow duration discharges. Three panel-regression structures (random effects, fixed effects, and pooled) were compared to traditional regression methods, in which space is substituted for time. Results indicated that panel regression generally was able to reproduce the temporal behavior of streamflow and reduce the standard errors of model coefficients compared to traditional regression, even for models in which the unobserved heterogeneity between streams is significant and the variance inflation factor for rainfall is much greater than 10. This is because both spatial and temporal variability were better characterized in panel regression. In a case study, regional rainfall elasticities estimated from panel regressions were applied to ungaged basins on Maui, using available rainfall projections to estimate plausible changes in surface-water availability and usable stream habitat for native species. The presented panel-regression framework is shown to offer benefits over existing traditional hydrologic regression methods for developing robust regional relations to investigate streamflow response in a changing climate.
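
    The three panel structures compared in these records are directly available in the linearmodels package. The sketch below fits pooled, random-effects, and fixed-effects regressions of a log low-flow on log rainfall for a toy basin-year panel; all names and numbers are illustrative.

```python
# Sketch: pooled, random-effects, and fixed-effects panel regressions of
# log low-flow on log rainfall; synthetic basin-year panel.
import numpy as np
import pandas as pd
from linearmodels.panel import PanelOLS, PooledOLS, RandomEffects

rng = np.random.default_rng(2)
basins, years = 20, 15
idx = pd.MultiIndex.from_product(
    [range(basins), range(years)], names=["basin", "year"])
rain = pd.Series(rng.lognormal(7, 0.3, basins * years), index=idx)
basin_effect = np.repeat(rng.normal(0, 0.5, basins), years)
log_q = 0.8 * np.log(rain) + basin_effect + rng.normal(0, 0.1, len(rain))

df = pd.DataFrame({"log_q": log_q, "log_rain": np.log(rain), "const": 1.0})
for name, cls in [("pooled", PooledOLS), ("random", RandomEffects)]:
    res = cls(df["log_q"], df[["const", "log_rain"]]).fit()
    print(name, round(res.params["log_rain"], 3))    # rainfall elasticity
fe = PanelOLS(df["log_q"], df[["log_rain"]], entity_effects=True).fit()
print("fixed", round(fe.params["log_rain"], 3))
```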

  5. 3D statistical shape models incorporating 3D random forest regression voting for robust CT liver segmentation

    NASA Astrophysics Data System (ADS)

    Norajitra, Tobias; Meinzer, Hans-Peter; Maier-Hein, Klaus H.

    2015-03-01

    During image segmentation, 3D Statistical Shape Models (SSM) usually conduct a limited search for target landmarks within one-dimensional search profiles perpendicular to the model surface. In addition, landmark appearance is modeled only locally based on linear profiles and weak learners, altogether leading to segmentation errors from landmark ambiguities and limited search coverage. We present a new method for 3D SSM segmentation based on 3D Random Forest Regression Voting. For each surface landmark, a Random Regression Forest is trained that learns a 3D spatial displacement function between the according reference landmark and a set of surrounding sample points, based on an infinite set of non-local randomized 3D Haar-like features. Landmark search is then conducted omni-directionally within 3D search spaces, where voxelwise forest predictions on landmark position contribute to a common voting map which reflects the overall position estimate. Segmentation experiments were conducted on a set of 45 CT volumes of the human liver, of which 40 images were randomly chosen for training and 5 for testing. Without parameter optimization, using a simple candidate selection and a single resolution approach, excellent results were achieved, while faster convergence and better concavity segmentation were observed, altogether underlining the potential of our approach in terms of increased robustness from distinct landmark detection and from better search coverage.
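
    The voting mechanism can be sketched compactly: a random forest learns displacements from sample points to the landmark, and each test point casts a vote at its predicted landmark position, with the voting-map peak taken as the estimate. The toy below is 2D with invented stand-in features rather than the paper's 3D Haar-like features.

```python
# Toy 2D regression-voting landmark localization; features are invented
# stand-ins for the paper's randomized 3D Haar-like features.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(9)
landmark = np.array([32.0, 48.0])

def features(points):
    # stand-in for local appearance features derived from the image
    return np.column_stack([np.sin(points / 7.0), points / 64.0])

train_pts = rng.uniform(0, 64, (500, 2))
rf = RandomForestRegressor(n_estimators=50).fit(
    features(train_pts), landmark - train_pts)       # learn displacements

test_pts = rng.uniform(0, 64, (200, 2))
votes = test_pts + rf.predict(features(test_pts))    # each point votes
vote_map = np.zeros((64, 64))
for v in np.clip(np.round(votes).astype(int), 0, 63):
    vote_map[v[0], v[1]] += 1
print("estimated landmark:",
      np.unravel_index(vote_map.argmax(), vote_map.shape))
```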

  6. Classification of mislabelled microarrays using robust sparse logistic regression.

    PubMed

    Bootkrajang, Jakramate; Kabán, Ata

    2013-04-01

    Previous studies have reported that labelling errors are not uncommon in microarray datasets. In such cases, the training set may become misleading, and the ability of classifiers to make reliable inferences from the data is compromised. Yet, few methods are currently available in the bioinformatics literature to deal with this problem. The few existing methods focus on data cleansing alone, without reference to classification, and their performance crucially depends on some tuning parameters. In this article, we develop a new method to detect mislabelled arrays simultaneously with learning a sparse logistic regression classifier. Our method may be seen as a label-noise-robust extension of the well-known and successful Bayesian logistic regression classifier. To account for possible mislabelling, we formulate a label-flipping process as part of the classifier. The regularization parameter is automatically set using Bayesian regularization, which not only saves the computation time that cross-validation would take, but also eliminates any unwanted effects of label noise when setting the regularization parameter. Extensive experiments with both synthetic data and real microarray datasets demonstrate that our approach is able to counter the bad effects of labelling errors in terms of predictive performance, is effective at identifying marker genes, and simultaneously detects mislabelled arrays with high accuracy. The code is available from http://cs.bham.ac.uk/~jxb008. Supplementary data are available at Bioinformatics online.
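
    The label-flipping idea can be written down in a few lines: the observed label is the true label passed through a flip process with rates g01 and g10, and everything is fit by maximum likelihood. The sketch below is a non-Bayesian, non-sparse simplification of the paper's classifier, for illustration only.

```python
# Sketch of label-noise-aware logistic regression: observed labels pass
# through a flipping process with rates g01 (0->1) and g10 (1->0); all
# parameters fit by maximum likelihood. A simplification of the paper's
# Bayesian sparse classifier.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 5))
w_true = np.array([2.0, -1.5, 0.0, 0.0, 1.0])
y_clean = (rng.random(200) < expit(X @ w_true)).astype(float)
flip = rng.random(200) < 0.1                    # corrupt 10% of labels
y = np.where(flip, 1 - y_clean, y_clean)

def negloglik(params):
    w = params[:-2]
    g01, g10 = expit(params[-2:])               # keep flip rates in (0, 1)
    s = expit(X @ w)                            # P(true label = 1 | x)
    p1 = np.clip(s * (1 - g10) + (1 - s) * g01, 1e-9, 1 - 1e-9)
    return -(y * np.log(p1) + (1 - y) * np.log(1 - p1)).sum()

x0 = np.concatenate([np.zeros(5), [-3.0, -3.0]])
res = minimize(negloglik, x0, method="L-BFGS-B")
print("weights:", np.round(res.x[:-2], 2),
      "flip rates:", np.round(expit(res.x[-2:]), 3))
```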

  7. Diagnostics and Robust Estimation When Transforming the Regression Model and the Response.

    DTIC Science & Technology

    1986-10-01

    Bounded-influence estimators place a bound on the influence function of each observation. Bounded-influence regression estimators have been proposed by ... Hampel et al. (1986, chapter 4) for further details. First we note that the influence function of theta-hat satisfying (14) is IF(y_i, theta-hat) = B^(-1) psi(y_i, theta-hat), where B = N^(-1) sum_{i=1}^{N} grad_theta psi(y_i, theta-hat). This definition of the influence function is conditional on x_1, ..., x_N but coincides with ...

  8. Intrinsic Raman spectroscopy for quantitative biological spectroscopy Part II

    PubMed Central

    Bechtel, Kate L.; Shih, Wei-Chuan; Feld, Michael S.

    2009-01-01

    We demonstrate the effectiveness of intrinsic Raman spectroscopy (IRS) at reducing errors caused by absorption and scattering. Physical tissue models, solutions of varying absorption and scattering coefficients with known concentrations of Raman scatterers, are studied. We show significant improvement in prediction error by implementing IRS to predict concentrations of Raman scatterers using both ordinary least squares regression (OLS) and partial least squares regression (PLS). In particular, we show that IRS provides a robust calibration model that does not increase in error when applied to samples with optical properties outside the range of calibration. PMID:18711512

  9. Schooling, Literacy and Individual Earnings. International Adult Literacy Survey.

    ERIC Educational Resources Information Center

    Osberg, Lars

    This paper uses direct measures of literacy skill levels provided by the International Adult Literacy Survey to estimate the return to literacy skills. Using a very simple human capital earnings equation and standard ordinary least squares regression, it tested estimates of the return to literacy skills for their robustness to alternative scalings…

  10. Effects of Diverse Forms of Family Structure on Female and Male Homicide

    ERIC Educational Resources Information Center

    Schwartz, Jennifer

    2006-01-01

    Utilizing 2000 data on 1,618 counties and seemingly unrelated regression, I assess whether family structure effects on homicide vary across family structure measures and gender. There is evidence of robust, multidimensional family structure effects across constructs reflecting the presence of two-parent families: mother/father absence, shortages…

  11. Accurate motion parameter estimation for colonoscopy tracking using a regression method

    NASA Astrophysics Data System (ADS)

    Liu, Jianfei; Subramanian, Kalpathi R.; Yoo, Terry S.

    2010-03-01

    Co-located optical and virtual colonoscopy images have the potential to provide important clinical information during routine colonoscopy procedures. In our earlier work, we presented an optical-flow-based algorithm to compute egomotion from live colonoscopy video, permitting navigation and visualization of the corresponding patient anatomy. In the original algorithm, motion parameters were estimated using the traditional least sum of squares (LS) procedure, which can be unstable when optical flow vectors carry large errors. In the improved algorithm, we use the least median of squares (LMS) method, a robust regression method, for motion parameter estimation. Using the LMS method, we iteratively analyze and converge toward the main distribution of the flow vectors while disregarding outliers. We show through three experiments the improvement in tracking results obtained using the LMS method in comparison to the LS estimator. The first experiment demonstrates better spatial accuracy in positioning the virtual camera in the sigmoid colon. The second and third experiments demonstrate the robustness of this estimator, resulting in longer tracked sequences: from 300 to 1310 frames in the ascending colon, and from 410 to 1316 frames in the transverse colon.
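
    Least median of squares is simple to sketch via random sampling: fit many minimal subsets and keep the line whose squared residuals have the smallest median, which tolerates close to half the points being outliers. A toy 1D version:

```python
# Least-median-of-squares line fit by random minimal-subset sampling;
# a generic sketch of the estimator, not the paper's implementation.
import numpy as np

def lms_line(x, y, n_trials=500, seed=0):
    rng = np.random.default_rng(seed)
    best, best_med = None, np.inf
    for _ in range(n_trials):
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        med = np.median((y - (slope * x + intercept)) ** 2)
        if med < best_med:
            best, best_med = (slope, intercept), med
    return best

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 100)
y = 2.0 * x + 1.0 + rng.normal(0, 0.2, 100)
y[:30] += rng.normal(15, 5, 30)                 # 30% gross outliers
print(lms_line(x, y))                           # close to (2.0, 1.0)
```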

  12. Different techniques of multispectral data analysis for vegetation fraction retrieval

    NASA Astrophysics Data System (ADS)

    Kancheva, Rumiana; Georgiev, Georgi

    2012-07-01

    Vegetation monitoring is one of the most important applications of remote sensing technologies. For farmlands, the assessment of crop condition constitutes the basis for monitoring growth, development, and yield processes. Plant condition is defined by a set of biometric variables, such as density, height, biomass amount, and leaf area index. The canopy cover fraction is closely related to these variables and is indicative of the state of the growth process. At the same time, it is a defining factor of the spectral signatures of the soil-vegetation system. That is why spectral mixture decomposition is a primary objective in remotely sensed data processing and interpretation, specifically in agricultural applications. The actual usefulness of the applied methods depends on their prediction reliability. The goal of this paper is to present and compare different techniques for quantitative endmember extraction from the reflectance of soil-crop patterns. These techniques include: linear spectral unmixing, two-dimensional spectra analysis, spectral ratio analysis (vegetation indices), spectral derivative analysis (red edge position), and colorimetric analysis (tristimulus values sum, chromaticity coordinates, and dominant wavelength). The objective is to reveal their potential, accuracy, and robustness for plant fraction estimation from multispectral data. Regression relationships have been established between crop canopy cover and the various spectral estimators.
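
    Of the techniques listed, linear spectral unmixing is the most directly codable: solve for non-negative endmember fractions that reproduce the measured pixel spectrum. A sketch with invented four-band soil and vegetation endmembers:

```python
# Linear spectral unmixing sketch: non-negative least squares recovers the
# soil and vegetation fractions of a mixed pixel; numbers are illustrative.
import numpy as np
from scipy.optimize import nnls

# Endmember spectra (columns): soil and green vegetation over 4 bands.
E = np.array([[0.18, 0.05],
              [0.22, 0.08],
              [0.26, 0.45],    # NIR: vegetation bright, soil moderate
              [0.30, 0.40]])
pixel = 0.35 * E[:, 0] + 0.65 * E[:, 1]          # 65% vegetation cover

fractions, _ = nnls(E, pixel)
fractions /= fractions.sum()                     # normalize to sum to 1
print("soil, vegetation fractions:", np.round(fractions, 2))
```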

  13. Signaling mechanisms underlying the robustness and tunability of the plant immune network

    PubMed Central

    Kim, Yungil; Tsuda, Kenichi; Igarashi, Daisuke; Hillmer, Rachel A.; Sakakibara, Hitoshi; Myers, Chad L.; Katagiri, Fumiaki

    2014-01-01

    How does robust and tunable behavior emerge in a complex biological network? We sought to understand this for the signaling network controlling pattern-triggered immunity (PTI) in Arabidopsis. A dynamic network model containing four major signaling sectors, the jasmonate, ethylene, PAD4, and salicylate sectors, which together explain up to 80% of the PTI level, was built using data for dynamic sector activities and PTI levels under exhaustive combinatorial sector perturbations. Our regularized multiple regression model had a high level of predictive power and captured known and unexpected signal flows in the network. The sole inhibitory sector in the model, the ethylene sector, was central to the network robustness via its inhibition of the jasmonate sector. The model's multiple input sites linked specific signal input patterns varying in strength and timing to different network response patterns, indicating a mechanism enabling tunability. PMID:24439900

  14. Modern CACSD using the Robust-Control Toolbox

    NASA Technical Reports Server (NTRS)

    Chiang, Richard Y.; Safonov, Michael G.

    1989-01-01

    The Robust-Control Toolbox is a collection of 40 M-files which extend the capability of PC/PRO-MATLAB to do modern multivariable robust control system design. Included are robust analysis tools like singular values and structured singular values, robust synthesis tools like continuous/discrete H2/H-infinity synthesis and Linear Quadratic Gaussian Loop Transfer Recovery methods, and a variety of robust model reduction tools such as Hankel approximation, balanced truncation and balanced stochastic truncation. The capabilities of the toolbox are described and illustrated with examples to show how easily they can be used in practice. Examples include structured singular value analysis, H-infinity loop-shaping and large space structure model reduction.

  15. Transthoracic echocardiography and mortality in sepsis: analysis of the MIMIC-III database.

    PubMed

    Feng, Mengling; McSparron, Jakob I; Kien, Dang Trung; Stone, David J; Roberts, David H; Schwartzstein, Richard M; Vieillard-Baron, Antoine; Celi, Leo Anthony

    2018-06-01

    While the use of transthoracic echocardiography (TTE) in the ICU is rapidly expanding, the contribution of TTE to altering patient outcomes among ICU patients with sepsis has not been examined. This study was designed to examine the association of TTE with 28-day mortality specifically in that population. The MIMIC-III database was employed to identify patients with sepsis who had and had not received TTE. The statistical approaches utilized included multivariate regression, propensity score analysis, doubly robust estimation, the gradient boosted model, and an inverse probability-weighting model to ensure the robustness of our findings. Significant benefit in terms of 28-day mortality was observed among the TTE patients compared to the control (no TTE) group (odds ratio = 0.78, 95% CI 0.68-0.90, p < 0.001). The amount of fluid administered (2.5 vs. 2.1 L on day 1, p < 0.001), use of dobutamine (2% vs. 1%, p = 0.007), and the maximum dose of norepinephrine (1.4 vs. 1 mg/min, p = 0.001) were significantly higher for the TTE patients. Importantly, the TTE patients were weaned off vasopressors more quickly than those in the no TTE group (vasopressor-free days on day 28 of 21 vs. 19, p = 0.004). In a general population of critically ill patients with sepsis, use of TTE is associated with an improvement in 28-day mortality.
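
    One of the listed approaches, inverse-probability weighting, can be sketched briefly: model the probability of receiving TTE from confounders, weight each patient by the inverse of the probability of the treatment actually received, and compare weighted mortality. Synthetic data and a single confounder below; the study's actual models adjust for far more.

```python
# Inverse-probability-weighting sketch for a binary treatment (TTE) with
# one synthetic confounder; illustrative only, not the study's models.
import numpy as np
from scipy.special import expit
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(8)
n = 2000
severity = rng.normal(size=n)                        # confounder
tte = (rng.random(n) < expit(0.8 * severity)).astype(int)
death = (rng.random(n) < expit(0.9 * severity - 0.3 * tte - 1.0)).astype(int)

ps = LogisticRegression().fit(severity[:, None], tte).predict_proba(
    severity[:, None])[:, 1]                         # propensity scores
w = np.where(tte == 1, 1 / ps, 1 / (1 - ps))         # IP weights

m1 = np.average(death[tte == 1], weights=w[tte == 1])
m0 = np.average(death[tte == 0], weights=w[tte == 0])
print(f"weighted mortality: TTE {m1:.3f} vs no TTE {m0:.3f}")
```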

  16. A novel strategy of integrated microarray analysis identifies CENPA, CDK1 and CDC20 as a cluster of diagnostic biomarkers in lung adenocarcinoma.

    PubMed

    Liu, Wan-Ting; Wang, Yang; Zhang, Jing; Ye, Fei; Huang, Xiao-Hui; Li, Bin; He, Qing-Yu

    2018-07-01

    Lung adenocarcinoma (LAC) is the most lethal cancer and the leading cause of cancer-related death worldwide. The identification of meaningful clusters of co-expressed genes or representative biomarkers may help improve the accuracy of LAC diagnosis. Public databases, such as the Gene Expression Omnibus (GEO), provide rich resources of valuable information for clinics; however, the integration of multiple microarray datasets from various platforms and institutes remains a challenge. To determine potential indicators of LAC, we performed genome-wide relative significance (GWRS), genome-wide global significance (GWGS) and support vector machine (SVM) analyses progressively to identify robust gene biomarker signatures from 5 different microarray datasets that included 330 samples. The top 200 genes with robust signatures were selected for integrative analysis according to "guilt-by-association" methods, including protein-protein interaction (PPI) analysis and gene co-expression analysis. Of these 200 genes, only 10 showed both an intensive PPI network and high gene co-expression correlation (r > 0.8). IPA analysis of this regulatory network suggested that the cell cycle process is a crucial determinant of LAC. CENPA, as well as the two linked hub genes CDK1 and CDC20, were determined to be potential indicators of LAC. Immunohistochemical staining showed that CENPA, CDK1 and CDC20 were highly expressed in LAC cancer tissue with co-expression patterns. A Cox regression model indicated that LAC patients with CENPA+/CDK1+ and CENPA+/CDC20+ were high-risk groups in terms of overall survival. In conclusion, our integrated microarray analysis demonstrated that CENPA, CDK1 and CDC20 might serve as a novel cluster of prognostic biomarkers for LAC, and the cooperative unit of the three genes provides a technically simple approach for identification of LAC patients. Copyright © 2018 Elsevier B.V. All rights reserved.

  17. Computer models and the evidence of anthropogenic climate change: An epistemology of variety-of-evidence inferences and robustness analysis.

    PubMed

    Vezér, Martin A

    2016-04-01

    To study climate change, scientists employ computer models, which approximate target systems with various levels of skill. Given the imperfection of climate models, how do scientists use simulations to generate knowledge about the causes of observed climate change? Addressing a similar question in the context of biological modelling, Levins (1966) proposed an account grounded in robustness analysis. Recent philosophical discussions dispute the confirmatory power of robustness, raising the question of how the results of computer modelling studies contribute to the body of evidence supporting hypotheses about climate change. Expanding on Staley's (2004) distinction between evidential strength and security, and Lloyd's (2015) argument connecting variety-of-evidence inferences and robustness analysis, I address this question with respect to recent challenges to the epistemology of robustness analysis. Applying this epistemology to case studies of climate change, I argue that, despite imperfections in climate models, and epistemic constraints on variety-of-evidence reasoning and robustness analysis, this framework accounts for the strength and security of evidence supporting climatological inferences, including the finding that global warming is occurring and that its primary causes are anthropogenic. Copyright © 2016 Elsevier Ltd. All rights reserved.

  18. A Robust State Estimation Framework Considering Measurement Correlations and Imperfect Synchronization

    DOE PAGES

    Zhao, Junbo; Wang, Shaobu; Mili, Lamine; ...

    2018-01-08

    Here, this paper develops a robust power system state estimation framework that considers measurement correlations and imperfect synchronization. In the framework, correlations of SCADA and phasor measurement unit (PMU) measurements are calculated separately through an unscented transformation and a vector autoregression (VAR) model. In particular, PMU measurements during the waiting period between two SCADA measurement scans are buffered to develop the VAR model, with parameters robustly estimated using a projection-statistics approach. The latter takes into account the temporal and spatial correlations of PMU measurements and provides redundant measurements to suppress bad data and mitigate imperfect synchronization. In cases where the SCADA and PMU measurements are not time synchronized, either the forecasted PMU measurements or the prior SCADA measurements from the last estimation run are leveraged to restore system observability. Then, a robust generalized maximum-likelihood (GM) estimator is extended to integrate measurement error correlations and to handle outliers in the SCADA and PMU measurements. Simulation results stemming from a comprehensive comparison with other alternatives under various conditions demonstrate the benefits of the proposed framework.

  19. A Robust State Estimation Framework Considering Measurement Correlations and Imperfect Synchronization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhao, Junbo; Wang, Shaobu; Mili, Lamine

    Here, this paper develops a robust power system state estimation framework that considers measurement correlations and imperfect synchronization. In the framework, correlations of SCADA and phasor measurement unit (PMU) measurements are calculated separately through an unscented transformation and a vector autoregression (VAR) model. In particular, PMU measurements during the waiting period between two SCADA measurement scans are buffered to develop the VAR model, with parameters robustly estimated using a projection-statistics approach. The latter takes into account the temporal and spatial correlations of PMU measurements and provides redundant measurements to suppress bad data and mitigate imperfect synchronization. In cases where the SCADA and PMU measurements are not time synchronized, either the forecasted PMU measurements or the prior SCADA measurements from the last estimation run are leveraged to restore system observability. Then, a robust generalized maximum-likelihood (GM) estimator is extended to integrate measurement error correlations and to handle outliers in the SCADA and PMU measurements. Simulation results stemming from a comprehensive comparison with other alternatives under various conditions demonstrate the benefits of the proposed framework.
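
    The GM-estimation idea in these records builds on Huber-type iteratively reweighted least squares, which down-weights measurements with large standardized residuals. The sketch below applies it to a toy linear measurement model standing in for the nonlinear power-system case.

```python
# Huber-type iteratively reweighted least squares, a minimal stand-in for
# the robust GM-estimation used above; linear toy model, synthetic data.
import numpy as np

def huber_irls(H, z, delta=1.345, iters=50):
    x = np.linalg.lstsq(H, z, rcond=None)[0]        # ordinary LS start
    for _ in range(iters):
        r = z - H @ x
        s = 1.4826 * np.median(np.abs(r)) + 1e-12   # robust residual scale
        u = np.abs(r) / s
        w = np.where(u <= delta, 1.0, delta / u)    # Huber weights
        W = np.diag(w)
        x = np.linalg.solve(H.T @ W @ H, H.T @ W @ z)
    return x

rng = np.random.default_rng(6)
H = rng.normal(size=(60, 3))                        # measurement Jacobian
x_true = np.array([1.0, -2.0, 0.5])
z = H @ x_true + rng.normal(0, 0.05, 60)
z[:5] += 8.0                                        # gross bad data
print(np.round(huber_irls(H, z), 3))                # close to x_true
```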

  20. Machine learning modeling of plant phenology based on coupling satellite and gridded meteorological dataset

    NASA Astrophysics Data System (ADS)

    Czernecki, Bartosz; Nowosad, Jakub; Jabłońska, Katarzyna

    2018-04-01

    Changes in the timing of plant phenological phases are important proxies in contemporary climate research. However, most of the commonly used traditional phenological observations do not provide coherent spatial information. While consistent spatial data can be obtained from airborne sensors and preprocessed gridded meteorological data, few studies robustly benefit from these data sources. Therefore, the main aim of this study is to create and evaluate different statistical models for reconstructing and predicting phenological phases, and for improving the quality of their monitoring, with the use of satellite and meteorological products. A quality-controlled dataset of 13 BBCH plant phenophases in Poland was collected for the period 2007-2014. For each phenophase, statistical models were built using the most commonly applied regression-based machine learning techniques, such as multiple linear regression, the lasso, principal component regression, generalized boosted models, and random forests. Model quality was estimated using k-fold cross-validation. The results showed varying potential for coupling meteorologically derived indices with remote sensing products in phenological modeling; application of both data sources improved model accuracy by 0.6 to 4.6 days in terms of RMSE. A robust prediction of early phenological phases is mostly related to meteorological indices, whereas for autumn phenophases there is a stronger information signal in satellite-derived vegetation metrics. Choosing a specific set of predictors and applying robust preprocessing procedures is more important for the final results than the selection of a particular statistical model. The average RMSE for the best models across all phenophases is 6.3 days, while individual RMSEs vary seasonally from 3.5 to 10 days. The models give a reliable proxy for ground observations, with RMSE below 5 days for early-spring and late-spring phenophases; for other phenophases, RMSEs are higher, rising to 9-10 days in the case of the earliest spring phenophases.
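
    The model-comparison step maps onto a few lines of scikit-learn: k-fold cross-validated RMSE for several regression learners predicting the day of year of a phenophase. The predictors and data below are synthetic stand-ins for the meteorological and satellite inputs.

```python
# k-fold CV comparison of regression learners for a phenophase day-of-year;
# synthetic predictors standing in for meteorological and satellite inputs.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(10)
n = 300
gdd = rng.normal(900, 150, n)            # growing degree days
ndvi = rng.normal(0.6, 0.1, n)           # satellite greenness metric
doy = 120 - 0.03 * (gdd - 900) - 40 * (ndvi - 0.6) + rng.normal(0, 4, n)
X = np.column_stack([gdd, ndvi])

models = {"OLS": LinearRegression(), "lasso": Lasso(alpha=0.1),
          "GBM": GradientBoostingRegressor(), "RF": RandomForestRegressor()}
for name, m in models.items():
    mse = -cross_val_score(m, X, doy, cv=5, scoring="neg_mean_squared_error")
    print(name, "RMSE:", round(float(np.sqrt(mse.mean())), 2), "days")
```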

  1. Bayesian Inference and Application of Robust Growth Curve Models Using Student's "t" Distribution

    ERIC Educational Resources Information Center

    Zhang, Zhiyong; Lai, Keke; Lu, Zhenqiu; Tong, Xin

    2013-01-01

    Despite the widespread popularity of growth curve analysis, few studies have investigated robust growth curve models. In this article, the "t" distribution is applied to model heavy-tailed data and contaminated normal data with outliers for growth curve analysis. The derived robust growth curve models are estimated through Bayesian…

  2. An Example-Based Brain MRI Simulation Framework.

    PubMed

    He, Qing; Roy, Snehashis; Jog, Amod; Pham, Dzung L

    2015-02-21

    The simulation of magnetic resonance (MR) images plays an important role in the validation of image analysis algorithms, such as image segmentation, due to the lack of sufficient ground truth in real MR images. Previous work on MRI simulation has focused on explicitly modeling the MR image formation process. However, because of the overwhelming complexity of MR acquisition, these simulations must involve simplifications and approximations that can result in visually unrealistic simulated images. In this work, we describe an example-based simulation framework, which uses an "atlas" consisting of an MR image and its anatomical models derived from a hard segmentation. The relationships between the MR image intensities and its anatomical models are learned using a patch-based regression that implicitly models the physics of MR image formation. Given the anatomical models of a new brain, a new MR image can be simulated using the learned regression. This approach has been extended to also simulate intensity inhomogeneity artifacts based on the statistical model of training data. Results show that the example-based MRI simulation method is capable of simulating different image contrasts and is robust to different choices of atlas. The simulated images resemble real MR images more closely than simulations produced by a physics-based model.

  3. Standardization and validation of the body weight adjustment regression equations in Olympic weightlifting.

    PubMed

    Kauhanen, Heikki; Komi, Paavo V; Häkkinen, Keijo

    2002-02-01

    The problems in comparing the performances of Olympic weightlifters arise from the fact that the relationship between body weight and weightlifting results is not linear. In the present study, this relationship was examined by applying a nonparametric curve fitting technique, robust locally weighted regression (LOWESS), to relatively large data sets of weightlifting results from top international competitions. Power function formulas were derived from the fitted LOWESS values to represent the relationship between the 2 variables in a way that directly compares the snatch, clean-and-jerk, and total weightlifting results of a given athlete with those of world-class weightlifters (golden standards). A residual analysis of several other parametric models derived from the initial results showed that they all exhibit inconsistencies, yielding either underestimation or overestimation at certain body weights. In addition, the existing handicapping formulas commonly used in normalizing the performances of Olympic weightlifters did not yield satisfactory results when applied to the present data. It was concluded that the devised formulas may provide objective means for the evaluation of the performances of male weightlifters, regardless of their body weights, ages, or performance levels.
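
    The LOWESS-then-power-function step can be sketched as follows; the synthetic data, the smoothing fraction, and the log-log fit used to derive the power function are illustrative assumptions:

    ```python
    # Hedged sketch: LOWESS fit of weightlifting result against body weight,
    # followed by a power-function fit to the smoothed values.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    body_weight = rng.uniform(56, 150, size=300)                  # kg
    total = 35 * body_weight ** 0.48 + rng.normal(0, 12, 300)     # kg lifted (synthetic)

    # Robust locally weighted regression (LOWESS); frac is an assumption
    smoothed = sm.nonparametric.lowess(total, body_weight, frac=0.4)

    # Fit a power function a * bw^b to the LOWESS values via log-log least squares
    bw_fit, y_fit = smoothed[:, 0], smoothed[:, 1]
    b, log_a = np.polyfit(np.log(bw_fit), np.log(y_fit), 1)
    print(f"power-function fit: result ≈ {np.exp(log_a):.1f} * bw^{b:.3f}")
    ```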

  4. Quantitative association analysis between PM2.5 concentration and factors on industry, energy, agriculture, and transportation.

    PubMed

    Zhang, Nan; Huang, Hong; Duan, Xiaoli; Zhao, Jinlong; Su, Boni

    2018-06-21

    Rapid urbanization is causing serious PM2.5 (particulate matter ≤2.5 μm) pollution in China. However, the impacts of human activities (including industrial production, energy production, agriculture, and transportation) on PM2.5 concentrations have not been thoroughly studied. In this study, we obtained a regression formula for PM2.5 concentration based on more than 1 million recorded PM2.5 values and data on meteorology, industrial production, energy production, agriculture, and transportation for 31 provinces of mainland China between January 2013 and May 2017. We used stepwise regression to process 49 factors that influence PM2.5 concentration and obtained the 10 primary influencing factors. Data on PM2.5 concentration and the 10 factors from June to December 2017 were used to verify the robustness of the model. Excluding meteorological factors, natural gas production, industrial boilers, and ore production have the highest association with PM2.5 concentration, while nuclear power generation is the factor most strongly associated with decreasing PM2.5 concentration. Tianjin, Beijing, and Hebei provinces are the most vulnerable to high PM2.5 concentrations caused by industrial production, energy production, agriculture, and transportation (IEAT).
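
    The stepwise winnowing of a large candidate pool down to a handful of influencing factors can be sketched as below; sklearn's forward SequentialFeatureSelector stands in for classical stepwise regression, and all data are simulated:

    ```python
    # Hedged sketch: forward stepwise selection of 10 factors out of 49
    # candidates for a pollutant-concentration regression.
    import numpy as np
    from sklearn.feature_selection import SequentialFeatureSelector
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(2)
    X = rng.normal(size=(1000, 49))                    # 49 candidate factors
    beta = np.zeros(49)
    beta[:10] = rng.uniform(0.5, 2.0, 10)              # 10 truly influential factors
    y = X @ beta + rng.normal(scale=1.0, size=1000)    # PM2.5-like response

    selector = SequentialFeatureSelector(
        LinearRegression(), n_features_to_select=10, direction="forward", cv=5)
    selector.fit(X, y)
    print("selected factor indices:", np.flatnonzero(selector.get_support()))
    ```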

  5. A novel Gaussian process regression model for state-of-health estimation of lithium-ion battery using charging curve

    NASA Astrophysics Data System (ADS)

    Yang, Duo; Zhang, Xu; Pan, Rui; Wang, Yujie; Chen, Zonghai

    2018-04-01

    The state-of-health (SOH) estimation is always a crucial issue for lithium-ion batteries. In order to provide an accurate and reliable SOH estimation, a novel Gaussian process regression (GPR) model based on the charging curve is proposed in this paper. Unlike other studies, where SOH is commonly estimated from cycle life, in this work four specific parameters extracted from charging curves are used as inputs to the GPR model instead of cycle numbers. These parameters can reflect the battery aging phenomenon from different angles. The grey relational analysis method is applied to analyze the relational grade between the selected features and SOH. In addition, some adjustments are made to the proposed GPR model: the covariance function design and the similarity measurement of input variables are modified so as to improve SOH estimation accuracy and to adapt to the case of multidimensional input. Several aging data sets from the NASA data repository are used to demonstrate the estimation performance of the proposed method. Results show that the proposed method has high SOH estimation accuracy. In addition, a battery with a dynamic discharging profile is used to verify the robustness and reliability of the method.
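
    A minimal sketch of a GPR model mapping charging-curve features to SOH with predictive uncertainty; the four features, the kernel, and the data below are placeholders rather than the paper's modified covariance design:

    ```python
    # Hedged sketch: Gaussian process regression from charging-curve
    # features to state-of-health, with predictive standard deviations.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(3)
    X = rng.uniform(0, 1, size=(120, 4))     # 4 features extracted from charging curves
    soh = 1.0 - 0.3 * X.mean(axis=1) + rng.normal(0, 0.01, 120)   # synthetic SOH

    gpr = GaussianProcessRegressor(kernel=RBF(length_scale=0.5) + WhiteKernel(1e-4),
                                   normalize_y=True).fit(X, soh)
    mean, std = gpr.predict(X[:5], return_std=True)   # prediction with uncertainty
    print(np.round(mean, 3), np.round(std, 4))
    ```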

  6. Analysis of Longitudinal Studies With Repeated Outcome Measures: Adjusting for Time-Dependent Confounding Using Conventional Methods.

    PubMed

    Keogh, Ruth H; Daniel, Rhian M; VanderWeele, Tyler J; Vansteelandt, Stijn

    2018-05-01

    Estimation of causal effects of time-varying exposures using longitudinal data is a common problem in epidemiology. When there are time-varying confounders affected by prior exposure, which may include past outcomes, standard regression methods can lead to bias. Methods such as inverse probability weighted estimation of marginal structural models have been developed to address this problem. However, in this paper we show how standard regression methods can be used, even in the presence of time-dependent confounding, to estimate the total effect of an exposure on a subsequent outcome by controlling appropriately for prior exposures, outcomes, and time-varying covariates. We refer to the resulting estimation approach as sequential conditional mean models (SCMMs), which can be fitted using generalized estimating equations. We outline this approach and describe how including propensity score adjustment is advantageous. We compare the causal effects being estimated using SCMMs and marginal structural models, and we compare the two approaches using simulations. SCMMs enable more precise inferences, with greater robustness against model misspecification via propensity score adjustment, and easily accommodate continuous exposures and interactions. A new test for direct effects of past exposures on a subsequent outcome is described.
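
    A minimal sketch of an SCMM-style regression fitted with generalized estimating equations, adjusting the outcome at each time point for current exposure, prior exposure, and the prior outcome; the variable names and simulated data layout are assumptions:

    ```python
    # Hedged sketch: sequential conditional mean model fitted via GEE.
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(4)
    n, t = 200, 4
    df = pd.DataFrame({
        "id": np.repeat(np.arange(n), t),
        "exposure": rng.binomial(1, 0.5, n * t),
    })
    df["prior_exposure"] = df.groupby("id")["exposure"].shift(fill_value=0)
    df["prior_outcome"] = rng.normal(size=n * t)
    df["outcome"] = (0.5 * df.exposure + 0.3 * df.prior_outcome
                     + rng.normal(size=n * t))

    # Condition on prior exposure and prior outcome, as SCMMs prescribe
    model = smf.gee("outcome ~ exposure + prior_exposure + prior_outcome",
                    groups="id", data=df,
                    cov_struct=sm.cov_struct.Independence())
    print(model.fit().summary())
    ```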

  7. Locally Weighted Score Estimation for Quantile Classification in Binary Regression Models

    PubMed Central

    Rice, John D.; Taylor, Jeremy M. G.

    2016-01-01

    One common use of binary response regression methods is classification based on an arbitrary probability threshold dictated by the particular application. Since this is given to us a priori, it is sensible to incorporate the threshold into our estimation procedure. Specifically, for the linear logistic model, we solve a set of locally weighted score equations, using a kernel-like weight function centered at the threshold. The bandwidth for the weight function is selected by cross-validation of a novel hybrid loss function that combines classification error and a continuous measure of divergence between observed and fitted values; other possible cross-validation functions based on more common binary classification metrics are also examined. This work has much in common with robust estimation, but differs from previous approaches in this area in its focus on prediction, specifically classification into high- and low-risk groups. Simulation results are given showing the reduction in error rates that can be obtained with this method when compared with maximum likelihood estimation, especially under certain forms of model misspecification. Analysis of a melanoma data set is presented to illustrate the use of the method in practice. PMID:28018492

  8. Model-Free Feature Screening for Ultrahigh Dimensional Discriminant Analysis

    PubMed Central

    Cui, Hengjian; Li, Runze

    2014-01-01

    This work is concerned with marginal sure independence feature screening for ultra-high dimensional discriminant analysis. The response variable is categorical in discriminant analysis. This enables us to use the conditional distribution function to construct a new index for feature screening. In this paper, we propose a marginal feature screening procedure based on the empirical conditional distribution function. We establish the sure screening and ranking consistency properties for the proposed procedure without assuming any moment condition on the predictors. The proposed procedure enjoys several appealing merits. First, it is model-free in that its implementation does not require specification of a regression model. Second, it is robust to heavy-tailed distributions of predictors and the presence of potential outliers. Third, it allows the categorical response to have a diverging number of classes in the order of O(n^κ) with some κ ≥ 0. We assess the finite sample property of the proposed procedure by Monte Carlo simulation studies and numerical comparison. We further illustrate the proposed methodology by empirical analyses of two real-life data sets. PMID:26392643

  9. An effective approach to quantitative analysis of ternary amino acids in foxtail millet substrate based on terahertz spectroscopy.

    PubMed

    Lu, Shao Hua; Li, Bao Qiong; Zhai, Hong Lin; Zhang, Xin; Zhang, Zhuo Yong

    2018-04-25

    Terahertz time-domain spectroscopy has been applied in many fields; however, it still encounters drawbacks in the analysis of multicomponent mixtures due to serious spectral overlapping. Here, an effective approach to quantitative analysis is proposed and applied to the determination of ternary amino acids in a foxtail millet substrate. Utilizing three parameters derived from the THz-TDS, images were constructed and Tchebichef image moments were used to extract the information of the target components. The quantitative models were then obtained by stepwise regression. The correlation coefficients of leave-one-out cross-validation (R^2_loo-cv) were greater than 0.9595. For the external test set, the predictive correlation coefficients (R^2_p) were greater than 0.8026 and the root mean square errors of prediction (RMSE_p) were less than 1.2601. Compared with the traditional methods (PLS and N-PLS), our approach is more accurate, robust, and reliable, and can be a potential excellent approach to quantify multiple components with THz-TDS spectroscopy. Copyright © 2017 Elsevier Ltd. All rights reserved.

  10. Digital Games, Design, and Learning: A Systematic Review and Meta-Analysis.

    PubMed

    Clark, Douglas B; Tanner-Smith, Emily E; Killingsworth, Stephen S

    2016-03-01

    In this meta-analysis, we systematically reviewed research on digital games and learning for K-16 students. We synthesized comparisons of game versus nongame conditions (i.e., media comparisons) and comparisons of augmented games versus standard game designs (i.e., value-added comparisons). We used random-effects meta-regression models with robust variance estimates to summarize overall effects and explore potential moderator effects. Results from media comparisons indicated that digital games significantly enhanced student learning relative to nongame conditions (mean effect size ḡ = 0.33, 95% confidence interval [0.19, 0.48], k = 57, n = 209). Results from value-added comparisons indicated significant learning benefits associated with augmented game designs (mean effect size ḡ = 0.34, 95% confidence interval [0.17, 0.51], k = 20, n = 40). Moderator analyses demonstrated that effects varied across various game mechanics characteristics, visual and narrative characteristics, and research quality characteristics. Taken together, the results highlight the affordances of games for learning as well as the key role of design beyond medium.

  11. The analysis of initial Juno magnetometer data using a sparse magnetic field representation

    NASA Astrophysics Data System (ADS)

    Moore, Kimberly M.; Bloxham, Jeremy; Connerney, John E. P.; Jørgensen, John L.; Merayo, José M. G.

    2017-05-01

    The Juno spacecraft, now in polar orbit about Jupiter, passes much closer to Jupiter's surface than any previous spacecraft, presenting a unique opportunity to study the largest and most accessible planetary dynamo in the solar system. Here we present an analysis of magnetometer observations from Juno's first perijove pass (PJ1; to within 1.06 RJ of Jupiter's center). We calculate the residuals between the vector magnetic field observations and those predicted by the VIP4 spherical harmonic model and fit these residuals using an elastic net regression. The resulting model demonstrates how effective Juno's near-surface observations are in improving the spatial resolution of the magnetic field within the immediate vicinity of the orbit track. We identify two features resulting from our analyses: the presence of strong, oppositely signed pairs of flux patches near the equator and weak, possibly reversed-polarity patches of magnetic field over the polar regions. Additional orbits will be required to assess how robust these intriguing features are.
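
    The core residual-fitting step can be sketched with an elastic net as below; the design matrix standing in for a basis evaluated along the orbit track, and the synthetic residuals, are assumptions:

    ```python
    # Hedged sketch: elastic net regression fit to residuals between
    # observed and model-predicted field components.
    import numpy as np
    from sklearn.linear_model import ElasticNetCV

    rng = np.random.default_rng(5)
    G = rng.normal(size=(400, 100))          # basis functions along the orbit track
                                             # (illustrative, not spherical harmonics)
    coef_true = np.zeros(100)
    coef_true[:8] = rng.normal(scale=50, size=8)      # a few active terms
    residuals = G @ coef_true + rng.normal(scale=5, size=400)

    # Cross-validated elastic net picks a sparse, regularized correction
    enet = ElasticNetCV(l1_ratio=[0.3, 0.5, 0.9], cv=5).fit(G, residuals)
    print("nonzero coefficients:", np.count_nonzero(enet.coef_))
    ```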

  12. An Annotation Agnostic Algorithm for Detecting Nascent RNA Transcripts in GRO-Seq.

    PubMed

    Azofeifa, Joseph G; Allen, Mary A; Lladser, Manuel E; Dowell, Robin D

    2017-01-01

    We present a fast and simple algorithm to detect nascent RNA transcription in global nuclear run-on sequencing (GRO-seq). GRO-seq is a relatively new protocol that captures nascent transcripts from actively engaged polymerase, providing a direct read-out on bona fide transcription. Most traditional assays, such as RNA-seq, measure steady state RNA levels which are affected by transcription, post-transcriptional processing, and RNA stability. GRO-seq data, however, presents unique analysis challenges that are only beginning to be addressed. Here, we describe a new algorithm, Fast Read Stitcher (FStitch), that takes advantage of two popular machine-learning techniques, hidden Markov models and logistic regression, to classify which regions of the genome are transcribed. Given a small user-defined training set, our algorithm is accurate, robust to varying read depth, annotation agnostic, and fast. Analysis of GRO-seq data without a priori need for annotation uncovers surprising new insights into several aspects of the transcription process.

  13. Robust scaling laws for energy confinement time, including radiated fraction, in Tokamaks

    NASA Astrophysics Data System (ADS)

    Murari, A.; Peluso, E.; Gaudio, P.; Gelfusa, M.

    2017-12-01

    In recent years, the limitations of scalings in power-law form that are obtained from traditional log regression have become increasingly evident in many fields of research. Given the wide gap in operational space between present-day and next-generation devices, robustness of the obtained models in guaranteeing reasonable extrapolability is a major issue. In this paper, a new technique, called symbolic regression, is reviewed, refined, and applied to the ITPA database for extracting scaling laws of the energy-confinement time at different radiated fraction levels. The main advantage of this new methodology is its ability to determine the most appropriate mathematical form of the scaling laws to model the available databases without the restriction of their having to be power laws. In a completely new development, this technique is combined with the concept of geodesic distance on Gaussian manifolds so as to take into account the error bars in the measurements and provide more reliable models. Robust scaling laws, including radiated fraction as a regressor, have been found; they are not in power-law form and are significantly better than the traditional scalings. These scaling laws, including radiated fractions, extrapolate quite differently to ITER, and therefore they require serious consideration. On the other hand, given the limitations of the existing databases, dedicated experimental investigations will have to be carried out to fully understand the impact of radiated fractions on the confinement in metallic machines and in the next generation of devices.

  14. The Effectiveness of Psychosocial Interventions Delivered by Teachers in Schools: A Systematic Review and Meta-Analysis.

    PubMed

    Franklin, Cynthia; Kim, Johnny S; Beretvas, Tasha S; Zhang, Anao; Guz, Samantha; Park, Sunyoung; Montgomery, Katherine; Chung, Saras; Maynard, Brandy R

    2017-09-01

    The growing mental health needs of students within schools have resulted in teachers increasing their involvement in the delivery of school-based, psychosocial interventions. Current research reports mixed findings concerning the effectiveness of psychosocial interventions delivered by teachers for mental health outcomes. This article presents a systematic review and meta-analysis that examined the effectiveness of school-based psychosocial interventions delivered by teachers on internalizing and externalizing outcomes and the moderating factors that influence treatment effects on these outcomes. Nine electronic databases, major journals, and gray literature (e.g., websites, conference abstracts) were searched and field experts were contacted to locate additional studies. Twenty-four studies that met the study inclusion criteria were coded into internalizing or externalizing outcomes and further analyzed using robust variance estimation in meta-regression. Both publication bias and risk of bias were further assessed. The results showed statistically significant reductions in students' internalizing outcomes (d = .133, 95% CI [.002, .263]) and no statistically significant effect for externalizing outcomes (d = .015, 95% CI [-.037, .066]). Moderator analysis with meta-regression revealed that gender (% male, b = -.017, p < .05), race (% Caucasian, b = .002, p < .05), and the tier of intervention (b = .299, p = .06) affected intervention effectiveness. This study builds on existing literature showing that teacher-delivered Tier 1 interventions are effective, and adds to it by showing that interventions are more effective for internalizing outcomes than for externalizing outcomes. Moderator analysis also revealed that treatments were more effective with female students for internalizing outcomes and more effective with Caucasian students for externalizing outcomes.

  15. A robust real-time surface reconstruction method on point clouds captured from a 3D surface photogrammetry system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, Wenyang; Cheung, Yam; Sawant, Amit

    2016-05-15

    Purpose: To develop a robust and real-time surface reconstruction method on point clouds captured from a 3D surface photogrammetry system. Methods: The authors have developed a robust and fast surface reconstruction method on point clouds acquired by the photogrammetry system, without explicitly solving the partial differential equation required by a typical variational approach. Taking advantage of the overcomplete nature of the acquired point clouds, their method solves and propagates a sparse linear relationship from the point cloud manifold to the surface manifold, assuming both manifolds share similar local geometry. With relatively consistent point cloud acquisitions, the authors propose a sparse regression (SR) model to directly approximate the target point cloud as a sparse linear combination from the training set, assuming that the point correspondences built by the iterative closest point (ICP) algorithm are reasonably accurate and have residual errors following a Gaussian distribution. To accommodate changing noise levels and/or presence of inconsistent occlusions during the acquisition, the authors further propose a modified sparse regression (MSR) model to model the potentially large and sparse error built by ICP with a Laplacian prior. The authors evaluated the proposed method on both clinical point clouds acquired under consistent acquisition conditions and on point clouds with inconsistent occlusions. The authors quantitatively evaluated the reconstruction performance with respect to root-mean-squared-error, by comparing its reconstruction results against that from the variational method. Results: On clinical point clouds, both the SR and MSR models have achieved sub-millimeter reconstruction accuracy and reduced the reconstruction time by two orders of magnitude to a subsecond reconstruction time. On point clouds with inconsistent occlusions, the MSR model has demonstrated its advantage in achieving consistent and robust performance despite the introduced occlusions. Conclusions: The authors have developed a fast and robust surface reconstruction method on point clouds captured from a 3D surface photogrammetry system, with demonstrated sub-millimeter reconstruction accuracy and subsecond reconstruction time. It is suitable for real-time motion tracking in radiotherapy, with clear surface structures for better quantifications.
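
    The SR model's central idea, approximating a new acquisition as a sparse linear combination of training point clouds, can be sketched with a lasso solver (the Laplacian-prior MSR variant and the ICP correspondence step are not reproduced; all data below are synthetic):

    ```python
    # Hedged sketch: sparse linear combination of training point clouds
    # approximating a new target acquisition, via L1-regularized regression.
    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(6)
    D = rng.normal(size=(3000, 40))      # columns: vectorized training point clouds
    w_true = np.zeros(40)
    w_true[[3, 17, 25]] = [0.5, 0.3, 0.2]                    # a few active exemplars
    target = D @ w_true + rng.normal(scale=0.01, size=3000)  # new acquisition

    sr = Lasso(alpha=1e-3, fit_intercept=False).fit(D, target)
    reconstruction = D @ sr.coef_
    print("active training clouds:", np.flatnonzero(sr.coef_))
    print("RMSE:", np.sqrt(np.mean((reconstruction - target) ** 2)))
    ```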

  16. A robust real-time surface reconstruction method on point clouds captured from a 3D surface photogrammetry system.

    PubMed

    Liu, Wenyang; Cheung, Yam; Sawant, Amit; Ruan, Dan

    2016-05-01

    To develop a robust and real-time surface reconstruction method on point clouds captured from a 3D surface photogrammetry system. The authors have developed a robust and fast surface reconstruction method on point clouds acquired by the photogrammetry system, without explicitly solving the partial differential equation required by a typical variational approach. Taking advantage of the overcomplete nature of the acquired point clouds, their method solves and propagates a sparse linear relationship from the point cloud manifold to the surface manifold, assuming both manifolds share similar local geometry. With relatively consistent point cloud acquisitions, the authors propose a sparse regression (SR) model to directly approximate the target point cloud as a sparse linear combination from the training set, assuming that the point correspondences built by the iterative closest point (ICP) algorithm are reasonably accurate and have residual errors following a Gaussian distribution. To accommodate changing noise levels and/or presence of inconsistent occlusions during the acquisition, the authors further propose a modified sparse regression (MSR) model to model the potentially large and sparse error built by ICP with a Laplacian prior. The authors evaluated the proposed method on both clinical point clouds acquired under consistent acquisition conditions and on point clouds with inconsistent occlusions. The authors quantitatively evaluated the reconstruction performance with respect to root-mean-squared-error, by comparing its reconstruction results against that from the variational method. On clinical point clouds, both the SR and MSR models have achieved sub-millimeter reconstruction accuracy and reduced the reconstruction time by two orders of magnitude to a subsecond reconstruction time. On point clouds with inconsistent occlusions, the MSR model has demonstrated its advantage in achieving consistent and robust performance despite the introduced occlusions. The authors have developed a fast and robust surface reconstruction method on point clouds captured from a 3D surface photogrammetry system, with demonstrated sub-millimeter reconstruction accuracy and subsecond reconstruction time. It is suitable for real-time motion tracking in radiotherapy, with clear surface structures for better quantifications.

  17. A robust real-time surface reconstruction method on point clouds captured from a 3D surface photogrammetry system

    PubMed Central

    Liu, Wenyang; Cheung, Yam; Sawant, Amit; Ruan, Dan

    2016-01-01

    Purpose: To develop a robust and real-time surface reconstruction method on point clouds captured from a 3D surface photogrammetry system. Methods: The authors have developed a robust and fast surface reconstruction method on point clouds acquired by the photogrammetry system, without explicitly solving the partial differential equation required by a typical variational approach. Taking advantage of the overcomplete nature of the acquired point clouds, their method solves and propagates a sparse linear relationship from the point cloud manifold to the surface manifold, assuming both manifolds share similar local geometry. With relatively consistent point cloud acquisitions, the authors propose a sparse regression (SR) model to directly approximate the target point cloud as a sparse linear combination from the training set, assuming that the point correspondences built by the iterative closest point (ICP) algorithm are reasonably accurate and have residual errors following a Gaussian distribution. To accommodate changing noise levels and/or presence of inconsistent occlusions during the acquisition, the authors further propose a modified sparse regression (MSR) model to model the potentially large and sparse error built by ICP with a Laplacian prior. The authors evaluated the proposed method on both clinical point clouds acquired under consistent acquisition conditions and on point clouds with inconsistent occlusions. The authors quantitatively evaluated the reconstruction performance with respect to root-mean-squared-error, by comparing its reconstruction results against that from the variational method. Results: On clinical point clouds, both the SR and MSR models have achieved sub-millimeter reconstruction accuracy and reduced the reconstruction time by two orders of magnitude to a subsecond reconstruction time. On point clouds with inconsistent occlusions, the MSR model has demonstrated its advantage in achieving consistent and robust performance despite the introduced occlusions. Conclusions: The authors have developed a fast and robust surface reconstruction method on point clouds captured from a 3D surface photogrammetry system, with demonstrated sub-millimeter reconstruction accuracy and subsecond reconstruction time. It is suitable for real-time motion tracking in radiotherapy, with clear surface structures for better quantifications. PMID:27147347

  18. Improving Your Data Transformations: Applying the Box-Cox Transformation

    ERIC Educational Resources Information Center

    Osborne, Jason W.

    2010-01-01

    Many of us in the social sciences deal with data that do not conform to assumptions of normality and/or homoscedasticity/homogeneity of variance. Some research has shown that parametric tests (e.g., multiple regression, ANOVA) can be robust to modest violations of these assumptions. Yet the reality is that almost all analyses (even nonparametric…
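
    A minimal example of the Box-Cox transformation applied to right-skewed data, with the transformation parameter chosen by maximum likelihood via SciPy:

    ```python
    # Hedged sketch: Box-Cox transformation of positive, right-skewed data.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    skewed = rng.lognormal(mean=0.0, sigma=0.8, size=500)   # positive, skewed data

    transformed, lam = stats.boxcox(skewed)    # lambda estimated by MLE
    print(f"estimated lambda = {lam:.3f}")
    print("skewness before/after:",
          round(stats.skew(skewed), 3), round(stats.skew(transformed), 3))
    ```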

  19. Poster — Thur Eve — 44: Linearization of Compartmental Models for More Robust Estimates of Regional Hemodynamic, Metabolic and Functional Parameters using DCE-CT/PET Imaging

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Blais, AR; Dekaban, M; Lee, T-Y

    2014-08-15

    Quantitative analysis of dynamic positron emission tomography (PET) data usually involves minimizing a cost function with nonlinear regression, wherein the choice of starting parameter values and the presence of local minima affect the bias and variability of the estimated kinetic parameters. These nonlinear methods can also require lengthy computation time, making them unsuitable for use in clinical settings. Kinetic modeling of PET aims to estimate the rate parameter k_3, which is the binding affinity of the tracer to a biological process of interest and is highly susceptible to noise inherent in PET image acquisition. We have developed linearized kinetic models for kinetic analysis of dynamic contrast enhanced computed tomography (DCE-CT)/PET imaging, including a 2-compartment model for DCE-CT and a 3-compartment model for PET. Use of kinetic parameters estimated from DCE-CT can stabilize the kinetic analysis of dynamic PET data, allowing for more robust estimation of k_3. Furthermore, these linearized models are solved with a non-negative least squares algorithm and together they provide other advantages, including: 1) a unique solution with no need to choose starting parameter values; 2) parameter estimates comparable in accuracy to those from nonlinear models; 3) significantly reduced computational time. Our simulated data show that when blood volume and permeability are estimated with DCE-CT, the bias of k_3 estimation with our linearized model is 1.97 ± 38.5% for 1,000 runs with a signal-to-noise ratio of 10. In summary, we have developed a computationally efficient technique for accurate estimation of k_3 from noisy dynamic PET data.
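
    The non-negative least squares step that gives these linearized models a unique solution without starting values can be sketched as follows; the design matrix standing in for the linearized operator is illustrative:

    ```python
    # Hedged sketch: solving a linearized kinetic model with non-negative
    # least squares (unique solution, no starting values required).
    import numpy as np
    from scipy.optimize import nnls

    rng = np.random.default_rng(8)
    A = np.abs(rng.normal(size=(60, 3)))       # linearized basis (illustrative)
    k_true = np.array([0.8, 0.1, 0.05])        # includes a k_3-like rate
    tac = A @ k_true + rng.normal(scale=0.01, size=60)   # noisy time-activity curve

    k_est, resid_norm = nnls(A, tac)           # non-negative least squares solve
    print("estimated rate parameters:", np.round(k_est, 3))
    ```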

  20. Nurse Workforce Characteristics and Infection Risk in VA Community Living Centers: A Longitudinal Analysis

    PubMed Central

    Uchida-Nakakoji, Mayuko; Stone, Patricia W.; Schmitt, Susan K.; Phibbs, Ciaran S.

    2015-01-01

    Objective: To examine effects of workforce characteristics on resident infections in Veterans Affairs (VA) Community Living Centers (CLCs). Data Sources: A six-year panel of monthly, unit-specific data included workforce characteristics (from the VA Decision Support System and Payroll data) and characteristics of residents and outcome measures (from the Minimum Data Set). Study Design: A resident infection composite was the dependent variable. Workforce characteristics of registered nurses (RN), licensed practical nurses (LPN), nurse aides (NA), and contract nurses included: staffing levels, skill mix, and tenure. Descriptive statistics and unit-level fixed effects regressions were conducted. Robustness checks varying workforce and outcome parameters were examined. Principal Findings: Average nursing hours per resident day was 4.59 hours (SD = 1.21). RN tenure averaged 4.7 years (SD = 1.64) and 4.2 years for both LPNs (SD = 1.84) and NAs (SD = 1.72). In multivariate analyses, RN and LPN tenure were associated with decreased infections by 3.8% (IRR = 0.962, p < 0.01) and 2% (IRR = 0.98, p < 0.01), respectively. Robustness checks consistently found RN and LPN tenure to be associated with decreased infections. Conclusions: Increasing RN and LPN tenure is likely to reduce CLC resident infections. Administrators and policymakers need to focus on recruiting and retaining a skilled nursing workforce. PMID:25634087

  1. Structural exploration for the refinement of anticancer matrix metalloproteinase-2 inhibitor designing approaches through robust validated multi-QSARs

    NASA Astrophysics Data System (ADS)

    Adhikari, Nilanjan; Amin, Sk. Abdul; Saha, Achintya; Jha, Tarun

    2018-03-01

    Matrix metalloproteinase-2 (MMP-2) is a promising pharmacological target for designing potential anticancer drugs. MMP-2 plays critical functions in apoptosis by cleaving the DNA repair enzyme poly (ADP-ribose) polymerase (PARP). Moreover, MMP-2 expression triggers the vascular endothelial growth factor (VEGF), which has a positive influence on tumor size, invasion, and angiogenesis. Therefore, there is an urgent need to develop potential MMP-2 inhibitors with minimal toxicity and better pharmacokinetic properties. In this article, robust validated multi-quantitative structure-activity relationship (QSAR) modeling approaches were applied to a dataset of 222 MMP-2 inhibitors to explore the important structural and pharmacophoric requirements for higher MMP-2 inhibition. Different validated regression and classification-based QSARs, pharmacophore mapping, and 3D-QSAR techniques were performed. These results were challenged and subjected to further validation against 24 in-house MMP-2 inhibitors to judge the reliability of the models. All models were individually validated internally as well as externally and were supported and validated by each other. The results were further justified by molecular docking analysis. The modeling techniques adopted here help not only to explore the necessary structural and pharmacophoric requirements but also to establish the overall validation and refinement workflow for designing potential MMP-2 inhibitors.

  2. Improved quality-by-design compliant methodology for method development in reversed-phase liquid chromatography.

    PubMed

    Debrus, Benjamin; Guillarme, Davy; Rudaz, Serge

    2013-10-01

    A complete strategy dedicated to quality-by-design (QbD) compliant method development using design of experiments (DOE), multiple linear regression response modelling, and Monte Carlo simulations for error propagation was evaluated for liquid chromatography (LC). The proposed approach includes four main steps: (i) the initial screening of column chemistry, mobile phase pH, and organic modifier, (ii) the selectivity optimization through changes in gradient time and mobile phase temperature, (iii) the adaptation of column geometry to reach sufficient resolution, and (iv) the robust resolution optimization and identification of the method design space. This procedure was employed to obtain a complex chromatographic separation of 15 widely prescribed antipsychotic basic drugs. To fully automate and expedite the QbD method development procedure, short columns packed with sub-2 μm particles were employed, together with a UHPLC system possessing column and solvent selection valves. Through this example, the possibilities of the proposed QbD method development workflow were exposed and the different steps of the automated strategy were critically discussed. A baseline separation of the mixture of antipsychotic drugs was achieved with an analysis time of less than 15 min, and the robustness of the method was demonstrated simultaneously with the method development phase. Copyright © 2013 Elsevier B.V. All rights reserved.

  3. Multi-scale computational study of the mechanical regulation of cell mitotic rounding in epithelia

    PubMed Central

    Xu, Zhiliang; Zartman, Jeremiah J.; Alber, Mark

    2017-01-01

    Mitotic rounding during cell division is critical for preventing daughter cells from inheriting an abnormal number of chromosomes, a condition that occurs frequently in cancer cells. Cells must significantly expand their apical area and transition from a polygonal to circular apical shape to achieve robust mitotic rounding in epithelial tissues, which is where most cancers initiate. However, how cells mechanically regulate robust mitotic rounding within packed tissues is unknown. Here, we analyze mitotic rounding using a newly developed multi-scale subcellular element computational model that is calibrated using experimental data. Novel biologically relevant features of the model include separate representations of the sub-cellular components including the apical membrane and cytoplasm of the cell at the tissue scale level as well as detailed description of cell properties during mitotic rounding. Regression analysis of predictive model simulation results reveals the relative contributions of osmotic pressure, cell-cell adhesion and cortical stiffness to mitotic rounding. Mitotic area expansion is largely driven by regulation of cytoplasmic pressure. Surprisingly, mitotic shape roundness within physiological ranges is most sensitive to variation in cell-cell adhesivity and stiffness. An understanding of how perturbed mechanical properties impact mitotic rounding has important potential implications on, amongst others, how tumors progressively become more genetically unstable due to increased chromosomal aneuploidy and more aggressive. PMID:28531187

  4. Canine scent detection in the diagnosis of lung cancer: revisiting a puzzling phenomenon.

    PubMed

    Ehmann, R; Boedeker, E; Friedrich, U; Sagert, J; Dippon, J; Friedel, G; Walles, T

    2012-03-01

    Patient prognosis in lung cancer largely depends on early diagnosis. The exhaled breath of patients may represent the ideal specimen for future lung cancer screening. However, the clinical applicability of current diagnostic sensor technologies based on signal pattern analysis remains incalculable due to their inability to identify a clear target. To test for the robust presence of an as-yet-unknown volatile organic compound in the breath of patients with lung cancer, sniffer dogs were employed. Exhalation samples of 220 volunteers (healthy individuals, confirmed lung cancer or chronic obstructive pulmonary disease (COPD)) were presented to sniffer dogs following a rigid scientific protocol. Patient history, drug administration, and clinicopathological data were analysed to identify potential bias or confounders. Lung cancer was identified with an overall sensitivity of 71% and a specificity of 93%. Lung cancer detection was independent of COPD and the presence of tobacco smoke and food odours. Logistic regression identified two drugs as potential confounders. It must be assumed that a robust and specific volatile organic compound (or pattern) is present in the breath of patients with lung cancer. Additional research efforts are required to overcome the current technical limitations of electronic sensor technologies to engineer a clinically applicable screening tool.

  5. Nurse workforce characteristics and infection risk in VA Community Living Centers: a longitudinal analysis.

    PubMed

    Uchida-Nakakoji, Mayuko; Stone, Patricia W; Schmitt, Susan K; Phibbs, Ciaran S

    2015-03-01

    To examine effects of workforce characteristics on resident infections in Veterans Affairs (VA) Community Living Centers (CLCs). A 6-year panel of monthly, unit-specific data included workforce characteristics (from the VA Decision Support System and Payroll data) and characteristics of residents and outcome measures (from the Minimum Data Set). A resident infection composite was the dependent variable. Workforce characteristics of registered nurses (RN), licensed practical nurses (LPN), nurse aides (NA), and contract nurses included: staffing levels, skill mix, and tenure. Descriptive statistics and unit-level fixed effects regressions were conducted. Robustness checks varying workforce and outcome parameters were examined. Average nursing hours per resident day was 4.59 hours (SD=1.21). RN tenure averaged 4.7 years (SD=1.64) and 4.2 years for both LPN (SD=1.84) and NA (SD=1.72). In multivariate analyses RN and LPN tenure were associated with decreased infections by 3.8% (incident rate ratio [IRR]=0.962, P<0.01) and 2% (IRR=0.98, P<0.01) respectively. Robustness checks consistently found RN and LPN tenure to be associated with decreased infections. Increasing RN and LPN tenure are likely to reduce CLC resident infections. Administrators and policymakers need to focus on recruiting and retaining a skilled nursing workforce.

  6. TU-H-CAMPUS-JeP1-04: Deformable Image Registration Performances in Pelvis Patients: Impact of CBCT Image Quality

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fusella, M; Loi, G; Fiandra, C

    Purpose: To investigate the accuracy and robustness, against image noise and artifacts (typical of CBCT images), of a commercial algorithm for deformable image registration (DIR) used to propagate regions of interest (ROIs) in computational phantoms based on real prostate patient images. Methods: The Anaconda DIR algorithm, implemented in RayStation, was tested. Two specific Deformation Vector Fields (DVFs) were applied to the reference data set (CTref) using the ImSimQA software, obtaining two deformed CTs. For each dataset, twenty-four different levels of noise and/or capping artifacts were applied to simulate CBCT images. DIR was performed between CTref and each deformed CT and CBCT. In order to investigate the relationship between image quality parameters and the DIR results (expressed by a logit transform of the Dice index), a bilinear regression was defined. Results: More than 550 DIR-mapped ROIs were analyzed. The statistical analysis showed that deformation strength and artifacts were significant prognostic factors of DIR performance, while noise appeared to play a minor role in the DIR process as implemented in RayStation, as expected from the image similarity metric built into the registration algorithm. Capping artifacts played a determinant role in the accuracy of DIR results. Two optimal values for capping artifacts were found to obtain acceptable DIR results (Dice > 0.75/0.85). Various clinical CBCT acquisition protocols were reported to evaluate the significance of the study. Conclusion: This work illustrates the impact of image quality on DIR performance. Clinical issues like Adaptive Radiation Therapy (ART) and dose accumulation need accurate and robust DIR software. The RayStation DIR algorithm proved robust against noise but sensitive to image artifacts. This result highlights the need for robustness quality assurance against image noise and artifacts in the commissioning of a commercial DIR system, and underlines the importance of adopting optimized protocols for CBCT image acquisition in ART clinical implementation.

  7. Testing for gene-environment interaction under exposure misspecification.

    PubMed

    Sun, Ryan; Carroll, Raymond J; Christiani, David C; Lin, Xihong

    2017-11-09

    Complex interplay between genetic and environmental factors characterizes the etiology of many diseases. Modeling gene-environment (GxE) interactions is often challenged by the unknown functional form of the environment term in the true data-generating mechanism. We study the impact of misspecification of the environmental exposure effect on inference for the GxE interaction term in linear and logistic regression models. We first examine the asymptotic bias of the GxE interaction regression coefficient, allowing for confounders as well as arbitrary misspecification of the exposure and confounder effects. For linear regression, we show that under gene-environment independence and some confounder-dependent conditions, when the environment effect is misspecified, the regression coefficient of the GxE interaction can be unbiased. However, inference on the GxE interaction is still often incorrect. In logistic regression, we show that the regression coefficient is generally biased if the genetic factor is associated with the outcome directly or indirectly. Further, we show that the standard robust sandwich variance estimator for the GxE interaction does not perform well in practical GxE studies, and we provide an alternative testing procedure that has better finite sample properties. © 2017, The International Biometric Society.
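
    A minimal sketch of fitting a GxE interaction with a heteroskedasticity-robust (sandwich) covariance, the standard estimator whose finite-sample behavior the paper examines; the simulated data deliberately misspecify the environment effect, and all names are illustrative:

    ```python
    # Hedged sketch: GxE interaction term with robust sandwich SEs, under
    # a misspecified (linear) model for a quadratic environment effect.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(14)
    n = 500
    df = pd.DataFrame({"g": rng.binomial(2, 0.3, n),      # genotype (0/1/2)
                       "e": rng.normal(size=n)})          # environmental exposure
    df["y"] = 0.2 * df.g + 0.4 * df.e ** 2 + rng.normal(size=n)  # true E effect quadratic

    # Fitted model uses a linear E term: the exposure effect is misspecified
    fit = smf.ols("y ~ g * e", data=df).fit(cov_type="HC3")      # robust sandwich SEs
    print(fit.summary().tables[1])
    ```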

  8. GLOBALLY ADAPTIVE QUANTILE REGRESSION WITH ULTRA-HIGH DIMENSIONAL DATA

    PubMed Central

    Zheng, Qi; Peng, Limin; He, Xuming

    2015-01-01

    Quantile regression has become a valuable tool to analyze heterogeneous covariate-response associations that are often encountered in practice. The development of quantile regression methodology for high dimensional covariates primarily focuses on examination of model sparsity at a single or multiple quantile levels, which are typically prespecified ad hoc by the users. The resulting models may be sensitive to the specific choices of the quantile levels, leading to difficulties in interpretation and erosion of confidence in the results. In this article, we propose a new penalization framework for quantile regression in the high dimensional setting. We employ adaptive L1 penalties, and more importantly, propose a uniform selector of the tuning parameter for a set of quantile levels to avoid some of the potential problems with model selection at individual quantile levels. Our proposed approach achieves consistent shrinkage of regression quantile estimates across a continuous range of quantile levels, enhancing the flexibility and robustness of the existing penalized quantile regression methods. Our theoretical results include the oracle rate of uniform convergence and weak convergence of the parameter estimators. We also use numerical studies to confirm our theoretical findings and illustrate the practical utility of our proposal. PMID:26604424
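
    A sketch of L1-penalized quantile regression across a grid of quantile levels using sklearn's QuantileRegressor; the single fixed penalty used here is an assumption and does not implement the paper's uniform, adaptive tuning-parameter selector:

    ```python
    # Hedged sketch: L1-penalized quantile regression at several quantile
    # levels, with heavy-tailed errors.
    import numpy as np
    from sklearn.linear_model import QuantileRegressor

    rng = np.random.default_rng(9)
    X = rng.normal(size=(300, 20))
    y = X[:, 0] - 0.5 * X[:, 1] + rng.standard_t(df=3, size=300)  # heavy-tailed noise

    for tau in (0.25, 0.5, 0.75):
        # alpha is a fixed L1 penalty here (requires sklearn >= 1.0)
        qr = QuantileRegressor(quantile=tau, alpha=0.01, solver="highs").fit(X, y)
        print(f"tau={tau}: nonzero coefficients = {np.count_nonzero(qr.coef_)}")
    ```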

  9. Association analysis of multiple traits by an approach of combining P values.

    PubMed

    Chen, Lili; Wang, Yong; Zhou, Yajing

    2018-03-01

    Increasing evidence shows that one variant can affect multiple traits, a widespread phenomenon in complex diseases. Joint analysis of multiple traits can increase the statistical power of association analysis and uncover the underlying genetic mechanism. Although there are many statistical methods to analyse multiple traits, most are suitable for detecting common variants associated with multiple traits. However, because of the low minor allele frequency of rare variants, these methods are not optimal for rare variant association analysis. In this paper, we extend an adaptive combination of P values method (termed ADA) for a single trait to test association between multiple traits and rare variants in a given region. For a given region, we use a reverse regression model to test each rare variant for association with multiple traits and obtain the P value of the single-variant test. We then take the weighted combination of these P values as the test statistic. Extensive simulation studies show that our approach is more powerful than several other comparison methods in most cases, and is robust to the inclusion of a high proportion of neutral variants and to differing directions of effects of causal variants.

  10. Statistical primer: how to deal with missing data in scientific research?

    PubMed

    Papageorgiou, Grigorios; Grant, Stuart W; Takkenberg, Johanna J M; Mokhles, Mostafa M

    2018-05-10

    Missing data are a common challenge encountered in research which can compromise the results of statistical inference when not handled appropriately. This paper aims to introduce basic concepts of missing data to a non-statistical audience, to list and compare some of the most popular approaches for handling missing data in practice, and to provide guidelines and recommendations for dealing with and reporting missing data in scientific research. Complete case analysis and single imputation are simple approaches for handling missing data and are popular in practice; however, in most cases they are not guaranteed to provide valid inferences. Multiple imputation is a robust and general alternative which is appropriate for data missing at random, overcoming the disadvantages of the simpler approaches, but it should always be conducted with care. The aforementioned approaches are illustrated and compared in an example application using Cox regression.
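
    A minimal sketch of the multiple imputation idea: several completed datasets are drawn and the analysis-model estimates are pooled. sklearn's IterativeImputer with posterior sampling stands in for a full MICE procedure here, and the pooling shown covers only the point estimates:

    ```python
    # Hedged sketch: multiple imputation with m = 5 completed datasets,
    # pooling the analysis-model coefficients across imputations.
    import numpy as np
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(10)
    X = rng.normal(size=(500, 3))
    y = X @ [1.0, -0.5, 0.2] + rng.normal(scale=0.5, size=500)
    X[rng.random(X.shape) < 0.2] = np.nan            # 20% missing at random

    estimates = []
    for seed in range(5):                            # m = 5 imputations
        Xi = IterativeImputer(sample_posterior=True,
                              random_state=seed).fit_transform(X)
        estimates.append(LinearRegression().fit(Xi, y).coef_)
    print("pooled coefficients:", np.round(np.mean(estimates, axis=0), 3))
    ```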

  11. seawaveQ: an R package providing a model and utilities for analyzing trends in chemical concentrations in streams with a seasonal wave (seawave) and adjustment for streamflow (Q) and other ancillary variables

    USGS Publications Warehouse

    Ryberg, Karen R.; Vecchia, Aldo V.

    2013-01-01

    The seawaveQ R package fits a parametric regression model (seawaveQ) to pesticide concentration data from streamwater samples to assess variability and trends. The model incorporates the strong seasonality and high degree of censoring common in pesticide data and users can incorporate numerous ancillary variables, such as streamflow anomalies. The model is fitted to pesticide data using maximum likelihood methods for censored data and is robust in terms of pesticide, stream location, and degree of censoring of the concentration data. This R package standardizes this methodology for trend analysis, documents the code, and provides help and tutorial information, as well as providing additional utility functions for plotting pesticide and other chemical concentration data.

  12. Correlates of Illicit Drug Use Among Indigenous Peoples in Canada: A Test of Social Support Theory.

    PubMed

    Cao, Liqun; Burton, Velmer S; Liu, Liu

    2018-02-01

    Relying on a national stratified random sample of Indigenous peoples aged 19 years and above in Canada, this study investigates the correlates of illicit drug use among Indigenous peoples, paying special attention to the association between social support measures and illegal drug use. Results from multivariate logistic regression show that measures of social support, such as residential mobility, strength of ties within communities, and lack of timely counseling, are statistically significant correlates of illicit drug use. Those identifying as Christian are significantly less likely to use illegal drugs. This is the first nationwide analysis of the illicit drug usage of Indigenous peoples in Canada. The results are robust because we have controlled for a range of comorbidity variables as well as a series of sociodemographic variables. Policy implications from these findings are discussed.

  13. Robust Variance Estimation with Dependent Effect Sizes: Practical Considerations Including a Software Tutorial in Stata and SPSS

    ERIC Educational Resources Information Center

    Tanner-Smith, Emily E.; Tipton, Elizabeth

    2014-01-01

    Methodologists have recently proposed robust variance estimation as one way to handle dependent effect sizes in meta-analysis. Software macros for robust variance estimation in meta-analysis are currently available for Stata (StataCorp LP, College Station, TX, USA) and SPSS (IBM, Armonk, NY, USA), yet there is little guidance for authors regarding…

  14. Estimating integrated variance in the presence of microstructure noise using linear regression

    NASA Astrophysics Data System (ADS)

    Holý, Vladimír

    2017-07-01

    Using financial high-frequency data for the estimation of integrated variance of asset prices is beneficial, but as the number of observations increases, so-called microstructure noise arises. This noise can significantly bias the realized variance estimator. We propose a method for estimating the integrated variance that is robust to microstructure noise, as well as for testing the presence of the noise. Our method utilizes linear regression in which realized variances estimated from different data subsamples act as the dependent variable while the number of observations acts as the explanatory variable. We compare the proposed estimator with other methods on simulated data for several microstructure noise structures.
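
    A sketch of the regression idea: under i.i.d. microstructure noise the expected realized variance from n subsampled returns is IV + 2nω², so regressing subsampled realized variances on the number of observations recovers the integrated variance as the intercept. The simulation below illustrates only this idea, not the authors' exact estimator:

    ```python
    # Hedged sketch: integrated variance as the intercept of a regression
    # of subsampled realized variances on the number of returns.
    import numpy as np

    rng = np.random.default_rng(11)
    n_total, iv, omega2 = 23400, 1e-4, 1e-8
    efficient = np.cumsum(rng.normal(0, np.sqrt(iv / n_total), n_total))
    prices = efficient + rng.normal(0, np.sqrt(omega2), n_total)   # add noise

    ns, rvs = [], []
    for step in (1, 2, 5, 10, 20, 50):               # coarser and coarser subsamples
        sub = prices[::step]
        rvs.append(np.sum(np.diff(sub) ** 2))        # realized variance
        ns.append(len(sub) - 1)                      # number of returns
    slope, intercept = np.polyfit(ns, rvs, 1)        # E[RV] = IV + 2*n*omega^2
    print(f"IV estimate (intercept): {intercept:.2e}  true IV: {iv:.1e}")
    ```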

  15. DOA Finding with Support Vector Regression Based Forward-Backward Linear Prediction.

    PubMed

    Pan, Jingjing; Wang, Yide; Le Bastard, Cédric; Wang, Tianzhen

    2017-05-27

    Direction-of-arrival (DOA) estimation has drawn considerable attention in array signal processing, particularly with coherent signals and a limited number of snapshots. Forward-backward linear prediction (FBLP) is able to deal directly with coherent signals. Support vector regression (SVR) is robust with small samples. This paper proposes combining the advantages of FBLP and SVR to estimate the DOAs of coherent incoming signals with few snapshots. The performance of the proposed method is validated with numerical simulations in coherent scenarios, in terms of different angle separations, numbers of snapshots, and signal-to-noise ratios (SNRs). Simulation results show the effectiveness of the proposed method.

  16. Quantitative Analysis of the Cervical Texture by Ultrasound and Correlation with Gestational Age.

    PubMed

    Baños, Núria; Perez-Moreno, Alvaro; Migliorelli, Federico; Triginer, Laura; Cobo, Teresa; Bonet-Carne, Elisenda; Gratacos, Eduard; Palacio, Montse

    2017-01-01

    Quantitative texture analysis has been proposed to extract robust features from the ultrasound image to detect subtle changes in the textures of the images. The aim of this study was to evaluate the feasibility of quantitative cervical texture analysis to assess cervical tissue changes throughout pregnancy. This was a cross-sectional study including singleton pregnancies between 20.0 and 41.6 weeks of gestation from women who delivered at term. Cervical length was measured, and a selected region of interest in the cervix was delineated. A model to predict gestational age based on features extracted from cervical images was developed following three steps: data splitting, feature transformation, and regression model computation. Seven hundred images, 30 per gestational week, were included for analysis. There was a strong correlation between the gestational age at which the images were obtained and the estimated gestational age by quantitative analysis of the cervical texture (R = 0.88). This study provides evidence that quantitative analysis of cervical texture can extract features from cervical ultrasound images which correlate with gestational age. Further research is needed to evaluate its applicability as a biomarker of the risk of spontaneous preterm birth, as well as its role in cervical assessment in other clinical situations in which cervical evaluation might be relevant. © 2016 S. Karger AG, Basel.

  17. Robust Image Regression Based on the Extended Matrix Variate Power Exponential Distribution of Dependent Noise.

    PubMed

    Luo, Lei; Yang, Jian; Qian, Jianjun; Tai, Ying; Lu, Gui-Fu

    2017-09-01

    Dealing with partial occlusion or illumination is one of the most challenging problems in image representation and classification. In this problem, the characterization of the representation error plays a crucial role. In most current approaches, the error matrix needs to be stretched into a vector and each element is assumed to be independently corrupted. This ignores the dependence between the elements of the error. In this paper, it is assumed that the error image caused by partial occlusion or illumination changes is a random matrix variate and follows the extended matrix variate power exponential distribution. This distribution has heavy-tailed regions and can be used to describe a matrix pattern of l×m-dimensional observations that are not independent. This paper reveals the essence of the proposed distribution: it actually alleviates the correlations between pixels in an error matrix E and makes E approximately Gaussian. On the basis of this distribution, we derive a Schatten p-norm-based matrix regression model with L_q regularization. The alternating direction method of multipliers is applied to solve this model. To obtain a closed-form solution in each step of the algorithm, two singular value function thresholding operators are introduced. In addition, the extended Schatten p-norm is utilized to characterize the distance between the test samples and classes in the design of the classifier. Extensive experimental results for image reconstruction and classification with structural noise demonstrate that the proposed algorithm is much more robust than some existing regression-based methods.
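
    One building block of such ADMM solvers is a singular value thresholding operator; the sketch below shows the soft-thresholding (nuclear-norm, p = 1) case as an assumption standing in for the paper's two operators:

    ```python
    # Hedged sketch: singular value soft-thresholding, the p = 1 special
    # case of the operators used inside Schatten-norm ADMM solvers.
    import numpy as np

    def svt(M, tau):
        """Shrink the singular values of M by tau (soft thresholding)."""
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

    rng = np.random.default_rng(15)
    E = rng.normal(size=(8, 6))                      # stand-in error matrix
    print("rank before/after:", np.linalg.matrix_rank(E),
          np.linalg.matrix_rank(svt(E, 1.5)))
    ```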

  18. A robust and efficient stepwise regression method for building sparse polynomial chaos expansions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Abraham, Simon, E-mail: Simon.Abraham@ulb.ac.be; Raisee, Mehrdad; Ghorbaniasl, Ghader

    2017-03-01

    Polynomial Chaos (PC) expansions are widely used in various engineering fields for quantifying uncertainties arising from uncertain parameters. The computational cost of classical PC solution schemes is unaffordable, as the number of deterministic simulations to be calculated grows dramatically with the number of stochastic dimensions. This considerably restricts the practical use of PC at the industrial level. A common approach to address such problems is to make use of sparse PC expansions. This paper presents a non-intrusive regression-based method for building sparse PC expansions. The most important PC contributions are detected sequentially through an automatic search procedure. The variable selection criterion is based on efficient tools relevant to probabilistic methods. Two benchmark analytical functions are used to validate the proposed algorithm. The computational efficiency of the method is then illustrated by a more realistic CFD application, consisting of the non-deterministic flow around a transonic airfoil subject to geometrical uncertainties. To assess the performance of the developed methodology, a detailed comparison is made with the well-established LAR-based selection technique. The results show that the developed sparse regression technique is able to identify the most significant PC contributions describing the problem. Moreover, the most important stochastic features are captured at a reduced computational cost compared to the LAR method. The results also demonstrate the superior robustness of the method by repeating the analyses using random experimental designs.
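
    A generic forward stepwise selection loop in the spirit of the method, using cross-validated error as a stand-in for the paper's probabilistic selection criterion; the polynomial basis and data are illustrative, not the authors' PC setup:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(2)
    x = rng.uniform(-1, 1, size=(200, 1))
    y = 1.0 + 2.0 * x[:, 0] ** 3 + 0.05 * rng.normal(size=200)

    # Candidate basis columns (monomials up to degree 8 as stand-ins for PC terms)
    basis = np.hstack([x ** d for d in range(1, 9)])
    selected, remaining = [], list(range(basis.shape[1]))
    best_err = np.inf
    while remaining:
        errs = {}
        for j in remaining:
            cols = basis[:, selected + [j]]
            errs[j] = -cross_val_score(LinearRegression(), cols, y,
                                       scoring="neg_mean_squared_error", cv=5).mean()
        j_best = min(errs, key=errs.get)
        if errs[j_best] >= best_err:      # stop when no remaining term helps
            break
        best_err = errs[j_best]
        selected.append(j_best)
        remaining.remove(j_best)
    print("selected degrees:", [d + 1 for d in selected])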

  19. Estimating monotonic rates from biological data using local linear regression.

    PubMed

    Olito, Colin; White, Craig R; Marshall, Dustin J; Barneche, Diego R

    2017-03-01

    Accessing many fundamental questions in biology begins with empirical estimation of simple monotonic rates of underlying biological processes. Across a variety of disciplines, ranging from physiology to biogeochemistry, these rates are routinely estimated from non-linear and noisy time series data using linear regression and ad hoc manual truncation of non-linearities. Here, we introduce the R package LoLinR, a flexible toolkit to implement local linear regression techniques to objectively and reproducibly estimate monotonic biological rates from non-linear time series data, and demonstrate possible applications using metabolic rate data. LoLinR provides methods to easily and reliably estimate monotonic rates from time series data in a way that is statistically robust, facilitates reproducible research and is applicable to a wide variety of research disciplines in the biological sciences. © 2017. Published by The Company of Biologists Ltd.
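
    LoLinR itself is an R package; the Python sketch below only conveys the underlying idea of fitting a linear regression to every sufficiently long contiguous window of a noisy time series and reporting the slope of the best window (ranked here by R², a simplification of the package's diagnostics):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    t = np.linspace(0, 10, 120)
    y = np.where(t < 6, -0.8 * t, -4.8) + 0.1 * rng.normal(size=t.size)  # rate then plateau

    best = None
    min_len = 30                                  # minimum window length, an assumption
    for i in range(t.size - min_len):
        for j in range(i + min_len, t.size):
            res = stats.linregress(t[i:j], y[i:j])
            if best is None or res.rvalue ** 2 > best[0]:
                best = (res.rvalue ** 2, res.slope, t[i], t[j - 1])

    r2, slope, t0, t1 = best
    print(f"monotonic rate ~ {slope:.2f} over [{t0:.1f}, {t1:.1f}] (R^2={r2:.3f})")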

  20. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jordan, Dirk C.; Deceglie, Michael G.; Kurtz, Sarah R.

    What is the best method to determine long-term PV system performance and degradation rates? Ideally, one universally applicable methodology would be desirable so that a single number could be derived. However, data sets vary in their attributes, and evidence is presented that defining two methodologies may be preferable. Monte Carlo simulations of artificial performance data allowed investigation of different methodologies and their respective confidence intervals. Tradeoffs between different approaches were delineated, elucidating why two separate approaches may need to be included in a standard. Regression approaches tend to be preferable when data sets are less contaminated by seasonality, noise and outliers, although robust regression can significantly improve the accuracy when outliers are present. In the presence of outliers, marked seasonality, or strong soiling events, year-on-year approaches tend to outperform regression approaches.

  1. Evaluation of Penalized and Nonpenalized Methods for Disease Prediction with Large-Scale Genetic Data.

    PubMed

    Won, Sungho; Choi, Hosik; Park, Suyeon; Lee, Juyoung; Park, Changyi; Kwon, Sunghoon

    2015-01-01

    Owing to recent improvements in genotyping technology, large-scale genetic data can be utilized to identify disease susceptibility loci, and these successes have substantially improved our understanding of complex diseases. However, most of the genetic effects for many complex diseases have been found to be very small, which has been a major hurdle in building disease prediction models. Recently, many statistical methods based on penalized regressions have been proposed to tackle the so-called "large P and small N" problem. Penalized regressions, including the least absolute shrinkage and selection operator (LASSO) and ridge regression, limit the space of parameters, and this constraint enables the estimation of effects for a very large number of SNPs. Various extensions have been suggested, and, in this report, we compare their accuracy by applying them to several complex diseases. Our results show that penalized regressions are usually robust and provide better accuracy than the existing methods for at least the diseases under consideration.
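
    A hedged sketch of the penalized-regression idea for the "large P and small N" setting, comparing L1 (LASSO-type) and L2 (ridge-type) logistic regression on simulated SNP dosages; the data, effect sizes and penalty strength are all illustrative:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(4)
    n, p = 200, 2000                                    # far more SNPs than subjects
    X = rng.integers(0, 3, size=(n, p)).astype(float)   # 0/1/2 allele counts
    beta = np.zeros(p)
    beta[:10] = 0.5                                     # a few true risk loci
    logits = X @ beta - (X @ beta).mean()
    y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

    for penalty, solver in [("l1", "liblinear"), ("l2", "lbfgs")]:
        clf = LogisticRegression(penalty=penalty, C=0.1, solver=solver,
                                 max_iter=5000)
        auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
        print(f"{penalty}: CV AUC = {auc:.2f}")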

  2. Robustness analysis of non-ordinary Petri nets for flexible assembly systems

    NASA Astrophysics Data System (ADS)

    Hsieh, Fu-Shiung

    2010-05-01

    Non-ordinary controlled Petri nets (NCPNs) have advantages for modeling flexible assembly systems in which multiple identical resources may be required to perform an operation. However, existing studies on NCPNs are still limited; for example, their robustness properties have not been studied. This motivates us to develop an analysis method for NCPNs. Robustness analysis concerns the ability of a system to maintain operation in the presence of uncertainties, and provides a way to analyse a perturbed system without reanalysis. In our previous research, we analysed the robustness properties of several subclasses of ordinary controlled Petri nets. To study the robustness properties of NCPNs, we augment them with an uncertainty model that specifies an upper bound on the uncertainties for each reachable marking. The resulting PN models are called non-ordinary controlled Petri nets with uncertainties (NCPNU). Based on NCPNU, the problem is to characterise the maximal tolerable uncertainties for each reachable marking. The computational complexity of this characterisation grows exponentially with the size of the net. Instead of considering general NCPNU, we therefore limit our scope to a subclass of PN models for assembly systems, called non-ordinary controlled flexible assembly Petri nets with uncertainties (NCFAPNU), and extend the robustness analysis to this subclass. We identify two types of uncertainties under which the liveness of NCFAPNU can be maintained.

  3. GPS baseline configuration design based on robustness analysis

    NASA Astrophysics Data System (ADS)

    Yetkin, M.; Berber, M.

    2012-11-01

    The robustness analysis results obtained from a Global Positioning System (GPS) network are dramatically influenced by the configuration of the observed baselines. The selection of optimal GPS baselines may allow for a cost-effective survey campaign and a sufficiently robust network. Furthermore, using the approach described in this paper, the required number of sessions, the baselines to be observed, and the significance levels for statistical testing and robustness analysis can be determined even before the GPS campaign starts. In this study, we propose a robustness criterion for the optimal design of geodetic networks, and present a very simple and efficient algorithm based on this criterion for the selection of optimal GPS baselines. We also show the relationship between the number of sessions and the non-centrality parameter. Finally, a numerical example is given to verify the efficacy of the proposed approach.

  4. Gap-metric-based robustness analysis of nonlinear systems with full and partial feedback linearisation

    NASA Astrophysics Data System (ADS)

    Al-Gburi, A.; Freeman, C. T.; French, M. C.

    2018-06-01

    This paper uses gap metric analysis to derive robustness and performance margins for feedback linearising controllers. Distinct from previous robustness analysis, it incorporates the case of output unstructured uncertainties, and is shown to yield general stability conditions which can be applied to both stable and unstable plants. It then expands on existing feedback linearising control schemes by introducing a more general robust feedback linearising control design which classifies the system nonlinearity into stable and unstable components and cancels only the unstable plant nonlinearities. This is done in order to preserve the stabilising action of the inherently stabilising nonlinearities. Robustness and performance margins are derived for this control scheme, and are expressed in terms of bounds on the plant nonlinearities and the accuracy of the cancellation of the unstable plant nonlinearity by the controller. Case studies then confirm reduced conservatism compared with standard methods.

  5. Effects of tibolone on fibrinogen and antithrombin III: A systematic review and meta-analysis of controlled trials.

    PubMed

    Bała, Małgorzata; Sahebkar, Amirhossein; Ursoniu, Sorin; Serban, Maria-Corina; Undas, Anetta; Mikhailidis, Dimitri P; Lip, Gregory Y H; Rysz, Jacek; Banach, Maciej

    2017-10-01

    Tibolone is a synthetic steroid with estrogenic, androgenic and progestogenic activity, but the evidence regarding its effects on fibrinogen and antithrombin III (ATIII) has not been conclusive. We assessed the impact of tibolone on fibrinogen and ATIII through a systematic review and meta-analysis of available randomized controlled trials (RCTs). The search included PUBMED, Web of Science, Scopus, and Google Scholar (up to January 31st, 2016) to identify controlled clinical studies investigating the effects of oral tibolone treatment on fibrinogen and ATIII. Overall, the impact of tibolone on plasma fibrinogen concentrations was reported in 10 trials comprising 11 treatment arms. Meta-analysis did not suggest a significant reduction of fibrinogen levels following treatment with tibolone (WMD: -5.38%, 95% CI: -11.92, +1.16, p=0.107). This result was robust in the sensitivity analysis and was not influenced by omitting any of the included studies from the meta-analysis. When the studies were categorized according to the duration of treatment, there was no effect in the subsets of trials lasting either <12 months (WMD: -7.64%, 95% CI: -16.58, +1.29, p=0.094) or ≥12 months (WMD: -0.62%, 95% CI: -8.40, +7.17, p=0.876). With regard to ATIII, there was no change following treatment with tibolone (WMD: +0.74%, 95% CI: -1.44, +2.93, p=0.505) and this effect was robust in sensitivity analysis. There was no differential effect of tibolone on plasma ATIII concentrations in trials of either <12 months (WMD: +2.26%, 95% CI: -3.14, +7.66, p=0.411) or ≥12 months (WMD: +0.06%, 95% CI: -1.16, +1.28, p=0.926) duration. Consistent with the results of subgroup analysis, meta-regression did not suggest any significant association between the changes in plasma concentrations of fibrinogen (slope: +0.40; 95% CI: -0.39, +1.19; p=0.317) and ATIII (slope: -0.17; 95% CI: -0.54, +0.20; p=0.374) with duration of treatment. In conclusion, meta-analysis did not suggest a significant reduction of fibrinogen and ATIII levels following treatment with tibolone. Copyright © 2017 Elsevier Ltd. All rights reserved.
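
    For readers unfamiliar with the pooling step, a generic DerSimonian-Laird random-effects sketch of the kind used to combine weighted mean differences (WMDs) across trials; the effect sizes and standard errors below are invented, not the study's data:

    import numpy as np

    wmd = np.array([-8.1, -2.3, 4.0, -12.5, -1.0])   # per-trial effects (illustrative)
    se  = np.array([4.0, 3.5, 5.0, 6.0, 2.8])        # their standard errors

    w_fixed = 1 / se**2
    mu_fixed = np.average(wmd, weights=w_fixed)
    q = np.sum(w_fixed * (wmd - mu_fixed)**2)        # Cochran's Q heterogeneity statistic
    c = w_fixed.sum() - (w_fixed**2).sum() / w_fixed.sum()
    tau2 = max(0.0, (q - (len(wmd) - 1)) / c)        # between-study variance

    w = 1 / (se**2 + tau2)                           # random-effects weights
    pooled = np.average(wmd, weights=w)
    se_pooled = np.sqrt(1 / w.sum())
    print(f"WMD = {pooled:.2f} (95% CI {pooled - 1.96*se_pooled:.2f}, "
          f"{pooled + 1.96*se_pooled:.2f})")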

  6. GWAR: robust analysis and meta-analysis of genome-wide association studies.

    PubMed

    Dimou, Niki L; Tsirigos, Konstantinos D; Elofsson, Arne; Bagos, Pantelis G

    2017-05-15

    In the context of genome-wide association studies (GWAS), a variety of statistical techniques is available for conducting the analysis, but in most cases the underlying genetic model is unknown. Under these circumstances, the classical Cochran-Armitage trend test (CATT) is suboptimal. Robust procedures that maximize the power and preserve the nominal type I error rate are preferable. Moreover, performing a meta-analysis using robust procedures is of great interest and has never been addressed in the past. The primary goal of this work is to implement several robust methods for analysis and meta-analysis in the statistical package Stata and subsequently to make the software available to the scientific community. The CATT under a recessive, additive and dominant model of inheritance, as well as robust methods based on the Maximum Efficiency Robust Test statistic, the MAX statistic and the MIN2, were implemented in Stata. Concerning MAX and MIN2, we calculated their asymptotic null distributions relying on numerical integration, resulting in a great gain in computational time without losing accuracy. All the aforementioned approaches were employed in a fixed- or random-effects meta-analysis setting using summary data with weights equal to the reciprocal of the combined cases and controls. Overall, this is the first complete effort to implement procedures for analysis and meta-analysis in GWAS using Stata. A Stata program and a web-server are freely available for academic users at http://www.compgen.org/tools/GWAR. pbagos@compgen.org. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
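
    A minimal sketch of the Cochran-Armitage trend test for a single SNP, the baseline procedure that the robust MAX/MIN2 statistics improve upon when the genetic model is unknown; the genotype counts are illustrative:

    import numpy as np
    from scipy.stats import norm

    cases    = np.array([120, 90, 30])   # aa, Aa, AA counts among cases (made up)
    controls = np.array([160, 70, 10])   # aa, Aa, AA counts among controls
    t = np.array([0.0, 1.0, 2.0])        # additive scores; (0,1,1) dominant, (0,0,1) recessive

    n = cases + controls
    R, N = cases.sum(), n.sum()
    u = np.sum(t * (cases - n * R / N))                       # score statistic
    var_u = (R * (N - R) / N**2) * (np.sum(n * t**2)
                                    - np.sum(n * t)**2 / N)   # its null variance
    z = u / np.sqrt(var_u)
    print(f"CATT Z = {z:.2f}, two-sided p = {2 * norm.sf(abs(z)):.3g}")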

  7. Effect of Metformin on Plasma Fibrinogen Concentrations: A Systematic Review and Meta-Analysis of Randomized Placebo-Controlled Trials.

    PubMed

    Simental-Mendia, Luis E; Pirro, Matteo; Atkin, Stephen L; Banach, Maciej; Mikhailidis, Dimitri P; Sahebkar, Amirhossein

    2018-01-01

    Fibrinogen is a key mediator of thrombosis and it has been implicated in the pathogenesis of atherosclerosis. Because metformin has shown a potential protective effect on different atherothrombotic risk factors, we assessed in this meta-analysis its effect on plasma fibrinogen concentrations. A systematic review and meta-analysis was carried out to identify randomized placebo-controlled trials evaluating the effect of metformin administration on fibrinogen levels. The search included PubMed-Medline, Scopus, ISI Web of Knowledge and Google Scholar databases (up to June 2, 2017), and quality assessment of the studies was performed according to Cochrane criteria. Quantitative data synthesis was conducted using a random-effects model and sensitivity analysis by the leave-one-out method. Meta-regression analysis was performed to assess the modifiers of treatment response. Meta-analysis of data from 9 randomized placebo-controlled clinical trials with 2302 patients comprising 10 treatment arms did not suggest a significant change in plasma fibrinogen concentrations following metformin therapy (WMD: -0.25 g/L, 95% CI: -0.53, 0.04, p = 0.092). The effect size was robust in the leave-one-out sensitivity analysis and remained non-significant after omission of each single study from the meta-analysis. No significant effect of metformin on plasma fibrinogen concentrations was demonstrated in the current meta-analysis. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  8. SU-E-J-212: Identifying Bones From MRI: A Dictionary Learning and Sparse Regression Approach

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ruan, D; Yang, Y; Cao, M

    2014-06-01

    Purpose: To develop an efficient and robust scheme to identify bony anatomy based on MRI-only simulation images. Methods: MRI offers important soft tissue contrast and functional information, yet its lack of correlation to electron density has kept it an auxiliary modality to CT in radiotherapy simulation and adaptation. An effective scheme to identify bony anatomy is an important first step towards an MR-only simulation/treatment paradigm and would satisfy most practical purposes. We utilize a UTE acquisition sequence to achieve visibility of the bone. In contrast to manual + bulk or registration-based approaches to identifying bones, we propose a novel learning-based approach for improved robustness to MR artefacts and environmental changes. Specifically, local information is encoded with an MR image patch, and the corresponding label is extracted (during training) from simulation CT aligned to the UTE. Within each class (bone vs. nonbone), an overcomplete dictionary is learned so that typical patches within the proper class can be represented as a sparse combination of the dictionary entries. For testing, an acquired UTE-MRI is divided into patches using a sliding scheme, where each patch is sparsely regressed against both bone and nonbone dictionaries, and subsequently assigned to the class with the smaller residual. Results: The proposed method has been applied to the pilot site of brain imaging and has shown generally good performance, with a dice similarity coefficient of greater than 0.9 in a cross-validation study using 4 datasets. Importantly, it is robust towards consistent foreign objects (e.g., headset) and the artefacts related to Gibbs ringing and field heterogeneity. Conclusion: A learning perspective has been developed for inferring bone structures based on UTE MRI. The imaging setting is subject to minimal motion effects and the post-processing is efficient. The improved efficiency and robustness enable a first translation to an MR-only routine. The scheme generalizes to multiple tissue classes.
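
    A rough sketch of the residual-based classification rule described above: learn one overcomplete dictionary per class, sparse-code each test patch against both, and assign the class with the smaller reconstruction residual. The synthetic "patches" and all parameter choices are illustrative, not the authors' settings:

    import numpy as np
    from sklearn.decomposition import MiniBatchDictionaryLearning
    from sklearn.linear_model import orthogonal_mp

    rng = np.random.default_rng(5)
    bone    = rng.normal(0, 1, size=(500, 64)) + 2.0   # stand-in "bone" patches
    nonbone = rng.normal(0, 1, size=(500, 64)) - 2.0   # stand-in "nonbone" patches

    # One overcomplete dictionary (128 atoms for 64-dim patches) per class
    dicts = {}
    for name, patches in [("bone", bone), ("nonbone", nonbone)]:
        dl = MiniBatchDictionaryLearning(n_components=128, alpha=1.0, random_state=0)
        dl.fit(patches)
        dicts[name] = dl.components_.T                 # 64 x 128 dictionary

    def classify(patch):
        """Sparse-code against both dictionaries; pick the smaller residual."""
        residuals = {}
        for name, D in dicts.items():
            code = orthogonal_mp(D, patch, n_nonzero_coefs=5)
            residuals[name] = np.linalg.norm(patch - D @ code)
        return min(residuals, key=residuals.get)

    print(classify(bone[0]), classify(nonbone[0]))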

  9. Robust Control Systems.

    DTIC Science & Technology

    1981-12-01

    time control system algorithms that will perform adequately (i.e., at least maintain closed-loop system stability) when uncertain parameters in the...system design models vary significantly. Such a control algorithm is said to have stability robustness, or more simply is said to be "robust". This...cases above, the performance is analyzed using a covariance analysis. The development of all the controllers and the performance analysis algorithms is

  10. Robust Tests for Additive Gene-Environment Interaction in Case-Control Studies Using Gene-Environment Independence.

    PubMed

    Liu, Gang; Mukherjee, Bhramar; Lee, Seunggeun; Lee, Alice W; Wu, Anna H; Bandera, Elisa V; Jensen, Allan; Rossing, Mary Anne; Moysich, Kirsten B; Chang-Claude, Jenny; Doherty, Jennifer A; Gentry-Maharaj, Aleksandra; Kiemeney, Lambertus; Gayther, Simon A; Modugno, Francesmary; Massuger, Leon; Goode, Ellen L; Fridley, Brooke L; Terry, Kathryn L; Cramer, Daniel W; Ramus, Susan J; Anton-Culver, Hoda; Ziogas, Argyrios; Tyrer, Jonathan P; Schildkraut, Joellen M; Kjaer, Susanne K; Webb, Penelope M; Ness, Roberta B; Menon, Usha; Berchuck, Andrew; Pharoah, Paul D; Risch, Harvey; Pearce, Celeste Leigh

    2018-02-01

    There have been recent proposals advocating the use of additive gene-environment interaction instead of the widely used multiplicative scale, as a more relevant public health measure. Using gene-environment independence enhances statistical power for testing multiplicative interaction in case-control studies. However, under departure from this assumption, substantial bias in the estimates and inflated type I error in the corresponding tests can occur. In this paper, we extend the empirical Bayes (EB) approach previously developed for multiplicative interaction, which trades off between bias and efficiency in a data-adaptive way, to the additive scale. An EB estimator of the relative excess risk due to interaction is derived, and the corresponding Wald test is proposed with a general regression setting under a retrospective likelihood framework. We study the impact of gene-environment association on the resultant test with case-control data. Our simulation studies suggest that the EB approach uses the gene-environment independence assumption in a data-adaptive way and provides a gain in power compared with the standard logistic regression analysis and better control of type I error when compared with the analysis assuming gene-environment independence. We illustrate the methods with data from the Ovarian Cancer Association Consortium. © The Author(s) 2017. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  11. On Improved Least Squares Regression and Artificial Neural Network Meta-Models for Simulation via Control Variates

    DTIC Science & Technology

    2016-09-15

    18] under the context of robust parameter design for simulation. Bellucci's technique is used in this research, primarily because the interior-point... [remainder of the indexed excerpt is table-of-contents residue; recoverable section titles: Fundamentals of Radial Basis Neural Network (RBNN); Design of Experiments with Neural Nets; Factorial Design]

  12. Robust Semi-Active Ride Control under Stochastic Excitation

    DTIC Science & Technology

    2014-01-01

    broad classes of time-series models which are of practical importance: the Auto-Regressive (AR) models, the Integrated (I) models, and the Moving...Average (MA) models [12]. Combinations of these models result in autoregressive moving average (ARMA) and autoregressive integrated moving average... [remainder of the indexed excerpt is equation and case-table residue around eq. (20), which expresses four switching cases in compact form using a Heaviside function]

  13. Revisiting Fixed- and Random-Effects Models: Some Considerations for Policy-Relevant Education Research

    ERIC Educational Resources Information Center

    Clarke, Paul; Crawford, Claire; Steele, Fiona; Vignoles, Anna

    2015-01-01

    The use of fixed (FE) and random effects (RE) in two-level hierarchical linear regression is discussed in the context of education research. We compare the robustness of FE models with the modelling flexibility and potential efficiency of RE models. We argue that the two should be seen as complementary approaches. We then compare both…

  14. Estimation of water quality by UV/Vis spectrometry in the framework of treated wastewater reuse.

    PubMed

    Carré, Erwan; Pérot, Jean; Jauzein, Vincent; Lin, Liming; Lopez-Ferber, Miguel

    2017-07-01

    The aim of this study is to investigate the potential of ultraviolet/visible (UV/Vis) spectrometry as a complementary method for routine monitoring of reclaimed water production. Robustness of the models and compliance of their sensitivity with current quality limits are investigated. The following indicators are studied: total suspended solids (TSS), turbidity, chemical oxygen demand (COD) and nitrate. Partial least squares regression (PLSR) is used to find linear correlations between absorbances and indicators of interest. Artificial samples are made by simulating a sludge leak on the wastewater treatment plant and added to the original dataset, which is then divided into calibration and prediction datasets. The models are built on the calibration set, and then tested on the prediction set. The best models are developed with PLSR for COD (R²pred = 0.80), TSS (R²pred = 0.86) and turbidity (R²pred = 0.96), and with a simple linear regression from absorbance at 208 nm (R²pred = 0.95) for nitrate concentration. The input of artificial data significantly enhances the robustness of the models. The sensitivity of the UV/Vis spectrometry monitoring system developed is compatible with the quality requirements of reclaimed water production processes.
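
    A minimal PLSR calibration sketch in the spirit of the study, predicting a water-quality indicator from simulated absorbance spectra; the wavelength grid, noise level and component count are assumptions, not the paper's settings:

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import r2_score

    rng = np.random.default_rng(6)
    wavelengths = np.linspace(200, 750, 276)
    conc = rng.uniform(0, 50, size=150)                      # e.g. COD in mg/L
    # Simulated spectra: one absorbance band scaling with concentration, plus noise
    spectra = np.outer(conc, np.exp(-(wavelengths - 254)**2 / 800))
    spectra += 0.05 * rng.normal(size=spectra.shape)

    X_cal, X_pred, y_cal, y_pred = train_test_split(spectra, conc, random_state=0)
    pls = PLSRegression(n_components=3).fit(X_cal, y_cal)
    print(f"R2 on prediction set = {r2_score(y_pred, pls.predict(X_pred).ravel()):.2f}")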

  15. Robust extraction of baseline signal of atmospheric trace species using local regression

    NASA Astrophysics Data System (ADS)

    Ruckstuhl, A. F.; Henne, S.; Reimann, S.; Steinbacher, M.; Vollmer, M. K.; O'Doherty, S.; Buchmann, B.; Hueglin, C.

    2012-11-01

    The identification of atmospheric trace species measurements that are representative of well-mixed background air masses is required for monitoring atmospheric composition change at background sites. We present a statistical method based on robust local regression that is well suited for the selection of background measurements and the estimation of associated baseline curves. The bootstrap technique is applied to calculate the uncertainty in the resulting baseline curve. The non-parametric nature of the proposed approach makes it a very flexible data filtering method. Application to carbon monoxide (CO) measured from 1996 to 2009 at the high-alpine site Jungfraujoch (Switzerland, 3580 m a.s.l.), and to measurements of 1,1-difluoroethane (HFC-152a) from Jungfraujoch (2000 to 2009) and Mace Head (Ireland, 1995 to 2009), demonstrates the feasibility and usefulness of the proposed approach. The average annual change of CO at Jungfraujoch for the 1996 to 2009 period, as estimated from filtered annual mean CO concentrations, is -2.2 ± 1.1 ppb yr⁻¹. For comparison, the linear trend of unfiltered CO measurements at Jungfraujoch for this time period is -2.9 ± 1.3 ppb yr⁻¹.
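
    A crude stand-in for the robust baseline idea, using LOWESS with robustifying iterations (it > 0 downweights outliers via bisquare weights) on a synthetic CO-like series with pollution spikes; the series and the percentile selection rule are illustrative, not the authors' filter:

    import numpy as np
    from statsmodels.nonparametric.smoothers_lowess import lowess

    rng = np.random.default_rng(7)
    t = np.linspace(1996, 2009, 700)
    co = 140 - 2.2 * (t - 1996) + 3 * rng.normal(size=t.size)   # declining baseline, ppb
    spikes = rng.random(t.size) < 0.1                           # polluted air masses
    co[spikes] += rng.uniform(20, 80, size=spikes.sum())

    smooth = lowess(co, t, frac=0.3, it=5, return_sorted=False)  # robust local regression
    resid = co - smooth
    background = resid < np.percentile(resid, 60)                # crude background selection
    print(f"estimated trend: "
          f"{np.polyfit(t[background], co[background], 1)[0]:.2f} ppb/yr")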

  16. Robust Machine Learning Variable Importance Analyses of Medical Conditions for Health Care Spending.

    PubMed

    Rose, Sherri

    2018-03-11

    To propose nonparametric double robust machine learning in variable importance analyses of medical conditions for health spending. 2011-2012 Truven MarketScan database. I evaluate how much more, on average, commercially insured enrollees with each of 26 of the most prevalent medical conditions cost per year after controlling for demographics and other medical conditions. This is accomplished within the nonparametric targeted learning framework, which incorporates ensemble machine learning. Previous literature studying the impact of medical conditions on health care spending has almost exclusively focused on parametric risk adjustment; thus, I compare my approach to parametric regression. My results demonstrate that multiple sclerosis, congestive heart failure, severe cancers, major depression and bipolar disorders, and chronic hepatitis are the most costly medical conditions on average per individual. These findings differed from those obtained using parametric regression. The literature may be underestimating the spending contributions of several medical conditions, which is a potentially critical oversight. If current methods are not capturing the true incremental effect of medical conditions, undesirable incentives related to care may remain. Further work is needed to directly study these issues in the context of federal formulas. © Health Research and Educational Trust.

  17. Model Checking Techniques for Assessing Functional Form Specifications in Censored Linear Regression Models.

    PubMed

    León, Larry F; Cai, Tianxi

    2012-04-01

    In this paper we develop model checking techniques for assessing functional form specifications of covariates in censored linear regression models. These procedures are based on a censored data analog to taking cumulative sums of "robust" residuals over the space of the covariate under investigation. These cumulative sums are formed by integrating certain Kaplan-Meier estimators and may be viewed as "robust" censored data analogs to the processes considered by Lin, Wei & Ying (2002). The null distributions of these stochastic processes can be approximated by the distributions of certain zero-mean Gaussian processes whose realizations can be generated by computer simulation. Each observed process can then be graphically compared with a few realizations from the Gaussian process. We also develop formal test statistics for numerical comparison. Such comparisons enable one to assess objectively whether an apparent trend seen in a residual plot reflects model misspecification or natural variation. We illustrate the methods with a well-known dataset. In addition, we examine the finite sample performance of the proposed test statistics in simulation experiments. In our simulation experiments, the proposed test statistics have good power for detecting misspecification while at the same time controlling the size of the test.

  18. Complement factor H Y402H variant and risk of age-related macular degeneration in Asians: a systematic review and meta-analysis.

    PubMed

    Kondo, Naoshi; Bessho, Hiroaki; Honda, Shigeru; Negi, Akira

    2011-02-01

    To investigate whether the Y402H variant in the complement factor H gene is associated with age-related macular degeneration (AMD) in Asian populations. Meta-analysis of previous publications. Case-control groups of subjects with AMD and controls from 13 association studies. We performed a meta-analysis of the association between Y402H and AMD in Asian populations using data available from 13 case-control studies involving 3973 subjects. Summary odds ratios (ORs) and 95% confidence intervals (CIs) were estimated using fixed- and random-effects models. The Q-statistic test was used to assess heterogeneity, and Egger's test was used to evaluate publication bias. Sensitivity analysis, cumulative meta-analysis, and meta-regression analysis were also performed. Allele and genotype frequencies of the Y402H variant. The Y402H variant showed a significant summary OR of 1.97 (95% CI, 1.54-2.52; P<0.001; allelic contrast model) per allele. Possession of at least 1 copy of the C allele increased the disease risk by 1.97-fold (95% CI, 1.63-2.39; P<0.001; dominant model) and accounted for 8.8% of the attributable risk of AMD in Asian populations. Sensitivity analysis indicated the robustness of our findings, and evidence of publication bias was not observed in our meta-analysis. Meta-regression analysis indicated no significant effect of baseline study characteristics on the summary effect size. Cumulative meta-analysis revealed that the summary ORs were stable and the 95% CIs narrowed with the accumulation of data over time. Our analysis provides substantial evidence that the Y402H variant is significantly associated with AMD in Asian populations. Our results expand the number of confirmed AMD susceptibility loci for Asian populations, which provides a better understanding of the genetic architecture underlying disease susceptibility and may advance the potential for preclinical prediction in future genetic tests through a combined evaluation of inherited susceptibility with previously established loci. Copyright © 2011 American Academy of Ophthalmology. Published by Elsevier Inc. All rights reserved.

  19. Robust Flutter Margin Analysis that Incorporates Flight Data

    NASA Technical Reports Server (NTRS)

    Lind, Rick; Brenner, Martin J.

    1998-01-01

    An approach for computing worst-case flutter margins has been formulated in a robust stability framework. Uncertainty operators are included with a linear model to describe modeling errors and flight variations. The structured singular value, mu, computes a stability margin that directly accounts for these uncertainties. This approach introduces a new method of computing flutter margins and an associated new parameter for describing these margins. The mu margins are robust margins that indicate worst-case stability estimates with respect to the defined uncertainty. Worst-case flutter margins are computed for the F/A-18 Systems Research Aircraft using uncertainty sets generated by flight data analysis. The robust margins demonstrate that flutter conditions may lie closer to the flight envelope than previously estimated by p-k analysis.

  20. Can Targeted Intervention Mitigate Early Emotional and Behavioral Problems?: Generating Robust Evidence within Randomized Controlled Trials

    PubMed Central

    Doyle, Orla; McGlanaghy, Edel; O’Farrelly, Christine; Tremblay, Richard E.

    2016-01-01

    This study examined the impact of a targeted Irish early intervention program on children’s emotional and behavioral development using multiple methods to test the robustness of the results. Data on 164 Preparing for Life participants who were randomly assigned into an intervention group, involving home visits from pregnancy onwards, or a control group, was used to test the impact of the intervention on Child Behavior Checklist scores at 24-months. Using inverse probability weighting to account for differential attrition, permutation testing to address small sample size, and quantile regression to characterize the distributional impact of the intervention, we found that the few treatment effects were largely concentrated among boys most at risk of developing emotional and behavioral problems. The average treatment effect identified a 13% reduction in the likelihood of falling into the borderline clinical threshold for Total Problems. The interaction and subgroup analysis found that this main effect was driven by boys. The distributional analysis identified a 10-point reduction in the Externalizing Problems score for boys at the 90th percentile. No effects were observed for girls or for the continuous measures of Total, Internalizing, and Externalizing problems. These findings suggest that the impact of this prenatally commencing home visiting program may be limited to boys experiencing the most difficulties. Further adoption of the statistical methods applied here may help to improve the internal validity of randomized controlled trials and contribute to the field of evaluation science more generally. Trial Registration: ISRCTN Registry ISRCTN04631728 PMID:27253184
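
    A small sketch of the distributional step mentioned above: quantile regression of an outcome score on treatment at the 90th percentile via statsmodels. The simulated scores and variable names are placeholders, not the trial's data:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(8)
    n = 164
    df = pd.DataFrame({"treated": rng.integers(0, 2, size=n)})
    # Simulate a treatment effect concentrated in the upper tail of the outcome
    noise = rng.normal(50, 10, size=n) + rng.exponential(8, size=n)
    df["externalizing"] = noise - 10 * df["treated"] * (noise > 60)

    fit = smf.quantreg("externalizing ~ treated", df).fit(q=0.9)
    print(fit.params)   # treatment coefficient at the 90th percentile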

  1. Observed ground-motion variabilities and implication for source properties

    NASA Astrophysics Data System (ADS)

    Cotton, F.; Bora, S. S.; Bindi, D.; Specht, S.; Drouet, S.; Derras, B.; Pina-Valdes, J.

    2016-12-01

    One of the key challenges of seismology is to calibrate and analyse the physical factors that control earthquake and ground-motion variabilities. Within the framework of empirical ground-motion prediction equation (GMPE) development, ground-motion residuals (differences between recorded ground motions and the values predicted by a GMPE) are computed. The exponential growth of seismological near-field records and modern regression algorithms allow these residuals to be decomposed into between-event and within-event components. The between-event term quantifies all residual source effects (e.g. stress drops) that are not accounted for by magnitude, the only source parameter of the model. Between-event residuals provide a new and rather robust way to analyse the physical factors that control earthquake source properties and the associated variabilities. We will first show the correlation between classical stress drops and between-event residuals, and explain why between-event residuals may be a more robust way (compared to classical stress-drop analysis) to analyse earthquake source properties. We will then calibrate between-event variabilities using recent high-quality global accelerometric datasets (NGA-West 2, RESORCE) and datasets from recent earthquake sequences (Aquila, Iquique, Kumamoto). The obtained between-event variabilities will be used to evaluate the variability of earthquake stress drops, and also the variability of source properties that cannot be explained by classical Brune stress-drop variations. We will finally use the between-event residual analysis to discuss regional variations of source properties, differences between aftershocks and mainshocks, and potential magnitude dependencies of source characteristics.

  2. Doubly robust estimation of generalized partial linear models for longitudinal data with dropouts.

    PubMed

    Lin, Huiming; Fu, Bo; Qin, Guoyou; Zhu, Zhongyi

    2017-12-01

    We develop a doubly robust estimation of generalized partial linear models for longitudinal data with dropouts. Our method extends the highly efficient aggregate unbiased estimating function approach proposed in Qu et al. (2010) to a doubly robust one in the sense that under missing at random (MAR), our estimator is consistent when either the linear conditional mean condition is satisfied or a model for the dropout process is correctly specified. We begin with a generalized linear model for the marginal mean, and then move forward to a generalized partial linear model, allowing for nonparametric covariate effect by using the regression spline smoothing approximation. We establish the asymptotic theory for the proposed method and use simulation studies to compare its finite sample performance with that of Qu's method, the complete-case generalized estimating equation (GEE) and the inverse-probability weighted GEE. The proposed method is finally illustrated using data from a longitudinal cohort study. © 2017, The International Biometric Society.
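
    The double-robustness property is easiest to see in the cross-sectional augmented inverse-probability-weighted (AIPW) estimator sketched below: the mean-outcome estimate is consistent if either the outcome regression or the dropout model is correct. This is only a conceptual illustration on simulated MAR data; the paper's longitudinal estimating-function setting is considerably richer:

    import numpy as np
    from sklearn.linear_model import LinearRegression, LogisticRegression

    rng = np.random.default_rng(9)
    n = 2000
    x = rng.normal(size=(n, 1))
    y = 1.0 + 2.0 * x[:, 0] + rng.normal(size=n)
    p_obs = 1 / (1 + np.exp(-(0.5 + x[:, 0])))     # MAR: dropout depends on x only
    r = rng.random(n) < p_obs                      # r = 1 means y is observed

    # Outcome model fit on completers; missingness model fit on everyone
    m_hat = LinearRegression().fit(x[r], y[r]).predict(x)
    pi_hat = LogisticRegression().fit(x, r.astype(int)).predict_proba(x)[:, 1]

    # AIPW: regression prediction plus an inverse-probability-weighted correction
    # (the r multiplier zeroes out the correction for units with missing y)
    aipw = np.mean(m_hat + r * (y - m_hat) / pi_hat)
    print(f"AIPW mean estimate: {aipw:.3f}  (true mean = 1.0)")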

  3. Principal component regression analysis with SPSS.

    PubMed

    Liu, R X; Kuang, J; Gong, Q; Hou, X L

    2003-06-01

    The paper introduces the indices used in multicollinearity diagnosis, the basic principle of principal component regression, and the method for determining the 'best' equation. An example illustrates how to perform principal component regression analysis with SPSS 10.0, including all calculation steps of the principal component regression and the operation of the linear regression, factor analysis, descriptives, compute variable, and bivariate correlations procedures in SPSS 10.0. Principal component regression analysis can be used to overcome the disturbance of multicollinearity; the analysis is thereby simplified, sped up, and made more accurate.
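
    The same workflow translated into a Python sketch (standardize, extract principal components, regress the response on the scores), with deliberately collinear simulated predictors; it mirrors the SPSS procedure conceptually rather than reproducing it:

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(10)
    n = 100
    x1 = rng.normal(size=n)
    x2 = x1 + 0.01 * rng.normal(size=n)      # nearly collinear with x1
    X = np.column_stack([x1, x2, rng.normal(size=n)])
    y = X @ np.array([1.0, 1.0, 0.5]) + rng.normal(size=n)

    # Keeping fewer components than predictors drops the unstable collinear direction
    pcr = make_pipeline(StandardScaler(), PCA(n_components=2), LinearRegression())
    pcr.fit(X, y)
    print(f"R^2 = {pcr.score(X, y):.3f}")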

  4. Comparing the index-flood and multiple-regression methods using L-moments

    NASA Astrophysics Data System (ADS)

    Malekinezhad, H.; Nachtnebel, H. P.; Klik, A.

    In arid and semi-arid regions, the length of records is usually too short to ensure reliable quantile estimates. Comparing index-flood and multiple-regression analyses based on L-moments was the main objective of this study. Factor analysis was applied to determine the main variables influencing flood magnitude. Ward's clustering and L-moments approaches were applied to several sites in the Namak-Lake basin in central Iran to delineate homogeneous regions based on site characteristics. Homogeneity was tested using L-moments-based measures. Several distributions were fitted to the regional flood data, and the index-flood and multiple-regression methods were compared as two regional flood frequency approaches. The results of factor analysis showed that length of main waterway, compactness coefficient, mean annual precipitation, and mean annual temperature were the main variables affecting flood magnitude. The study area was divided into three regions based on Ward's clustering approach. The homogeneity test based on L-moments showed that all three regions were acceptably homogeneous. Five distributions were fitted to the annual peak flood data of the three homogeneous regions. Using the L-moment ratios and the Z-statistic criteria, the generalised extreme value (GEV) distribution was identified as the most robust of the five candidate distributions for all the proposed sub-regions of the study area, and it was concluded that the GEV was the best-fit distribution for all three regions. The relative root mean square error (RRMSE) measure was applied to evaluate the performance of the index-flood and multiple-regression methods in comparison with the curve fitting (plotting position) method. In general, the index-flood method gives more reliable estimates for various flood magnitudes at different recurrence intervals. Therefore, this method should be adopted as the regional flood frequency method for the study area and the Namak-Lake basin in central Iran. To estimate floods of various return periods for gauged catchments in the study area, the mean annual peak flood of a catchment may be multiplied by the corresponding growth factors computed using the GEV distribution.
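
    An index-flood-style sketch under assumed data: fit a GEV to pooled dimensionless annual peaks and scale the resulting growth factors by a site's mean annual flood. scipy fits by maximum likelihood rather than the paper's L-moments, so this only illustrates the mechanics:

    import numpy as np
    from scipy.stats import genextreme

    # Pooled regional sample of annual peaks scaled by their site means (simulated)
    regional = genextreme.rvs(c=-0.1, loc=0.9, scale=0.35, size=600, random_state=1)

    params = genextreme.fit(regional)                # shape, loc, scale
    T = np.array([10, 50, 100])                      # return periods in years
    growth = genextreme.ppf(1 - 1 / T, *params)      # regional growth factors

    mean_annual_flood = 85.0                         # m^3/s at a gauged site (assumed)
    print(dict(zip(T, np.round(growth * mean_annual_flood, 1))))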

  5. Model reference tracking control of an aircraft: a robust adaptive approach

    NASA Astrophysics Data System (ADS)

    Tanyer, Ilker; Tatlicioglu, Enver; Zergeroglu, Erkan

    2017-05-01

    This work presents the design and the corresponding analysis of a nonlinear robust adaptive controller for model reference tracking of an aircraft that has parametric uncertainties in its system matrices and additive state- and/or time-dependent nonlinear disturbance-like terms in its dynamics. Specifically, a robust integral of the sign of the error feedback term and an adaptive term are fused with a proportional-integral controller. Lyapunov-based stability analysis techniques are utilised to prove global asymptotic convergence of the output tracking error. Extensive numerical simulations are presented to illustrate the performance of the proposed robust adaptive controller.

  6. Robust-mode analysis of hydrodynamic flows

    NASA Astrophysics Data System (ADS)

    Roy, Sukesh; Gord, James R.; Hua, Jia-Chen; Gunaratne, Gemunu H.

    2017-04-01

    The emergence of techniques to extract high-frequency high-resolution data introduces a new avenue for modal decomposition to assess the underlying dynamics, especially of complex flows. However, this task requires the differentiation of robust, repeatable flow constituents from noise and other irregular features of a flow. Traditional approaches involving low-pass filtering and principal component analysis have shortcomings. The approach outlined here, referred to as robust-mode analysis, is based on Koopman decomposition. Three applications to (a) a counter-rotating cellular flame state, (b) variations in financial markets, and (c) turbulent injector flows are provided.
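
    Koopman-based decompositions are commonly computed via dynamic mode decomposition (DMD); a minimal exact-DMD sketch on synthetic snapshot data follows. Robust-mode analysis additionally retains only modes that recur across repeated decompositions, a step omitted here:

    import numpy as np

    rng = np.random.default_rng(12)
    n_space, n_time = 64, 200
    t = np.linspace(0, 20, n_time)
    x = np.linspace(0, 1, n_space)[:, None]
    data = (np.sin(6 * x) * np.cos(2.3 * t)
            + 0.5 * np.cos(15 * x) * np.sin(5.1 * t)
            + 0.05 * rng.normal(size=(n_space, n_time)))   # two modes plus noise

    X1, X2 = data[:, :-1], data[:, 1:]                     # snapshot pairs
    U, s, Vt = np.linalg.svd(X1, full_matrices=False)
    r = 6                                                  # truncation rank (assumed)
    U, s, Vt = U[:, :r], s[:r], Vt[:r]
    A_tilde = U.conj().T @ X2 @ Vt.conj().T / s            # projected linear propagator
    eigvals, W = np.linalg.eig(A_tilde)
    modes = X2 @ Vt.conj().T @ np.diag(1 / s) @ W          # exact DMD modes

    dt = t[1] - t[0]
    freqs = np.angle(eigvals) / (2 * np.pi * dt)           # mode frequencies
    print(np.sort(np.abs(freqs)))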

  7. Household air pollution and stillbirths in India: analysis of the DLHS-II National Survey.

    PubMed

    Lakshmi, P V M; Virdi, Navkiran Kaur; Sharma, Atul; Tripathy, Jaya Prasad; Smith, Kirk R; Bates, Michael N; Kumar, Rajesh

    2013-02-01

    Several studies have linked biomass cooking fuel with adverse pregnancy outcomes such as preterm births, low birth weight and post-neonatal infant mortality, but very few have studied the associations with cooking fuel independent of other factors associated with stillbirths. We analyzed the data from 188,917 ever-married women aged 15-49 included in India's 2003-2004 District Level Household Survey-II to investigate the association between household use of cooking fuels (liquid petroleum gas/electricity, kerosene, biomass) and risk of stillbirth. Prevalence ratios (PRs) were obtained using Poisson regression with robust standard errors after controlling for several potentially confounding factors (socio-demographic and maternal health characteristics). Risk factors significantly associated with occurrence of stillbirth in the Poisson regression with robust standard errors model were: literacy status of the mother and father, lighting fuel and cooking fuel used, gravida status, history of previous abortion, whether the woman had an antenatal check up, age at last pregnancy >35 years, labor complications, bleeding complications, fetal and other complications, prematurity and home delivery. After controlling the effect of these factors, women who cook with firewood (PR 1.24; 95% CI: 1.08-1.41, p=0.003) or kerosene (PR 1.36; 95% CI: 1.10-1.67, p=0.004) were more likely to have experienced a stillbirth than those who cook with LPG/electricity. Kerosene lamp use was also associated with stillbirths compared to electric lighting (PR 1.15; 95% CI: 1.06-1.25, p=0.001). The population attributable risk of firewood as cooking fuel for stillbirths in India was 11% and 1% for kerosene cooking. Biomass and kerosene cooking fuels are associated with stillbirth occurrence in this population sample. Assuming these associations are causal, about 12% of stillbirths in India could be prevented by providing access to cleaner cooking fuel. Copyright © 2012 Elsevier Inc. All rights reserved.
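
    A sketch of "Poisson regression with robust standard errors" for a binary outcome, which yields prevalence ratios directly; the simulated exposure, assumed PR of 1.24, and variable names are placeholders, not DLHS-II variables:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(13)
    n = 5000
    df = pd.DataFrame({"firewood": rng.integers(0, 2, size=n)})
    p_still = 0.03 * np.where(df["firewood"] == 1, 1.24, 1.0)   # assumed true PR
    df["stillbirth"] = (rng.random(n) < p_still).astype(int)

    # Poisson family on a binary outcome, with a robust (sandwich) covariance
    fit = smf.glm("stillbirth ~ firewood", data=df,
                  family=sm.families.Poisson()).fit(cov_type="HC1")
    pr = np.exp(fit.params["firewood"])
    ci = np.exp(fit.conf_int().loc["firewood"])
    print(f"PR = {pr:.2f} (95% CI {ci[0]:.2f}-{ci[1]:.2f})")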

  8. FRAIL Questionnaire Screening Tool and Short-Term Outcomes in Geriatric Fracture Patients.

    PubMed

    Gleason, Lauren Jan; Benton, Emily A; Alvarez-Nebreda, M Loreto; Weaver, Michael J; Harris, Mitchel B; Javedan, Houman

    2017-12-01

    There are limited screening tools to predict adverse postoperative outcomes for the geriatric surgical fracture population. Frailty is increasingly recognized as a risk assessment to capture complexity. The goal of this study was to use a short screening tool, the FRAIL scale, to categorize the level of frailty of older adults admitted with a fracture to determine the association of each frailty category with postoperative and 30-day outcomes. Retrospective cohort study. Level 1 trauma center. A total of 175 consecutive patients over age 70 years admitted to co-managed orthopedic trauma and geriatrics services. The FRAIL scale (short 5-question assessment of fatigue, resistance, aerobic capacity, illnesses, and loss of weight) classified the patients into 3 categories: robust (score = 0), prefrail (score = 1-2), and frail (score = 3-5). Postoperative outcome variables collected were postoperative complications, unplanned intensive care unit admission, length of stay (LOS), discharge disposition, and orthopedic follow-up after surgery. Thirty-day outcomes measured were 30-day readmission and 30-day mortality. Analysis of variance (1-way) and Kruskal-Wallis tests were used to compare continuous variables across the 3 FRAIL categories. Fisher exact tests were used to compare categorical variables. Multiple regression analysis, adjusted by age, sex, and Charlson index, was conducted to study the association between frailty category and outcomes. The FRAIL scale categorized the patients into 3 groups: robust (n = 29), prefrail (n = 73), and frail (n = 73). There were statistically significant differences between groups in terms of age, comorbidity, dementia, functional dependency, polypharmacy, and rate of institutionalization, being higher in the frailest patients. Hip fracture was the most frequent fracture, and it was more frequent as the frailty of the patient increased (48%, 61%, and 75% in robust, prefrail, and frail groups, respectively). The American Society of Anesthesiologists preoperative risk significantly correlated with the frailty of the patient (American Society of Anesthesiologists score 3-4: 41%, 82% and 86%, in robust, prefrail, and frail groups, P < .001). After adjustment by age, sex, and comorbidity, there was a statistically significant association between frailty and both LOS and the development of any complication after surgery (LOS: 4.2, 5.0, and 7.1 days, P = .002; any complication: 3.4%, 26%, and 39.7%, P = .03; in robust, prefrail, and frail groups). There were also significant differences in discharge disposition (31% of robust vs 4.1% frail, P = .008) and follow-up completion (97% of robust vs 69% of the frail ones). Differences in time to surgery, unplanned intensive care unit admission, and 30-day readmission and mortality, although showing a trend, did not reach statistical significance. Frailty, measured by the FRAIL scale, was associated with increased LOS, complications after surgery, and discharge to a rehabilitation facility in geriatric fracture patients. The FRAIL scale is a promising short screen to stratify and help operationalize the perioperative care of older surgical patients. Copyright © 2017 AMDA – The Society for Post-Acute and Long-Term Care Medicine. Published by Elsevier Inc. All rights reserved.

  9. Color normalization for robust evaluation of microscopy images

    NASA Astrophysics Data System (ADS)

    Švihlík, Jan; Kybic, Jan; Habart, David

    2015-09-01

    This paper deals with color normalization of microscopy images of Langerhans islets in order to increase robustness of the islet segmentation to illumination changes. The main application is automatic quantitative evaluation of the islet parameters, useful for determining the feasibility of islet transplantation in diabetes. First, background illumination inhomogeneity is compensated and a preliminary foreground/background segmentation is performed. The color normalization itself is done in either lαβ or logarithmic RGB color spaces, by comparison with a reference image. The color-normalized images are segmented using color-based features and pixel-wise logistic regression, trained on manually labeled images. Finally, relevant statistics such as the total islet area are evaluated in order to determine the success likelihood of the transplantation.

  10. The influence of inequality on the standard of living: worldwide anthropometric evidence from the 19th and 20th centuries.

    PubMed

    Blum, Matthias

    2013-12-01

    We provide empirical evidence on the existence of the Pigou-Dalton principle. The latter indicates that aggregate welfare is - ceteris paribus - maximized when incomes of all individuals are equalized (and therefore marginal utility from income is as well). Using anthropometric panel data on 101 countries during the 19th and 20th centuries, we determine that there is a systematic negative and concave relationship between height inequality and average height. The robustness of this relationship is tested by means of several robustness checks, including two instrumental variable regressions. These findings help to elucidate the impact of economic inequality on welfare. Copyright © 2012 Elsevier B.V. All rights reserved.

  11. Aerial robot intelligent control method based on back-stepping

    NASA Astrophysics Data System (ADS)

    Zhou, Jian; Xue, Qian

    2018-05-01

    The aerial robot is characterized by strong nonlinearity, high coupling and parameter uncertainty, so a self-adaptive back-stepping control method based on a neural network is proposed in this paper. The uncertain part of the aerial robot model is compensated online by a Cerebellar Model Articulation Controller neural network, and robust control terms are designed to overcome the uncertainty error of the system during online learning. At the same time, a particle swarm algorithm is used to optimize and fix the parameters so as to improve dynamic performance, and the control law is obtained by back-stepping recursion. Simulation results show that the designed control law achieves the desired attitude tracking performance and good robustness in the presence of uncertainties and large errors in the model parameters.

  12. MOST: most-similar ligand based approach to target prediction.

    PubMed

    Huang, Tao; Mi, Hong; Lin, Cheng-Yuan; Zhao, Ling; Zhong, Linda L D; Liu, Feng-Bin; Zhang, Ge; Lu, Ai-Ping; Bian, Zhao-Xiang

    2017-03-11

    Many computational approaches have been used for target prediction, including machine learning, reverse docking, bioactivity spectra analysis, and chemical similarity searching. Recent studies have suggested that chemical similarity searching may be driven by the most-similar ligand. However, the extent of bioactivity of most-similar ligands has been oversimplified or even neglected in these studies, and this has impaired the prediction power. Here we propose the MOst-Similar ligand-based Target inference approach, namely MOST, which uses fingerprint similarity and the explicit bioactivity of the most-similar ligands to predict targets of the query compound. Performance of MOST was evaluated using combinations of different fingerprint schemes, machine learning methods, and bioactivity representations. In sevenfold cross-validation with a benchmark Ki dataset from CHEMBL release 19, containing 61,937 bioactivity data points for 173 human targets, MOST achieved high average prediction accuracy (0.95 for pKi ≥ 5, and 0.87 for pKi ≥ 6). The Morgan fingerprint was shown to be slightly better than FP2. Logistic Regression and Random Forest methods performed better than Naïve Bayes. In a temporal validation, the Ki dataset from CHEMBL19 was used to train models and predict the bioactivity of newly deposited ligands in CHEMBL20. MOST also performed well, with high accuracy (0.90 for pKi ≥ 5, and 0.76 for pKi ≥ 6), when Logistic Regression and the Morgan fingerprint were employed. Furthermore, the p values associated with explicit bioactivity were found to be a robust index for removing false positive predictions. Implicit bioactivity did not offer this capability. Finally, p values generated with Logistic Regression, the Morgan fingerprint and explicit activity were integrated with a false discovery rate (FDR) control procedure to reduce false positives in the multiple-target prediction scenario, and the success of this strategy was demonstrated with the case of fluanisone. In the case of aloe-emodin's laxative effect, MOST predicted that acetylcholinesterase was the mechanism-of-action target; in vivo studies validated this prediction. Using the MOST approach can result in highly accurate and robust target prediction. Integrated with an FDR control procedure, MOST provides a reliable framework for multiple-target inference. It has prospective applications in drug repurposing and mechanism-of-action target prediction.
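
    The most-similar-ligand rule itself reduces to a nearest-neighbour lookup under Tanimoto similarity. A self-contained sketch with random stand-in fingerprints (real use would employ Morgan fingerprints, e.g. via RDKit) and an assumed pKi ≥ 5 activity cut-off:

    import numpy as np

    def tanimoto(a, b):
        """Tanimoto similarity of two binary fingerprint vectors."""
        inter = np.sum(a & b)
        union = np.sum(a | b)
        return inter / union if union else 0.0

    rng = np.random.default_rng(14)
    library_fps = rng.integers(0, 2, size=(1000, 2048), dtype=np.uint8)  # stand-in FPs
    library_targets = rng.integers(0, 173, size=1000)    # target index per ligand
    library_pki = rng.uniform(3, 9, size=1000)           # explicit bioactivity (pKi)

    query = rng.integers(0, 2, size=2048, dtype=np.uint8)
    sims = np.array([tanimoto(query, fp) for fp in library_fps])
    best = sims.argmax()                                 # the most-similar ligand
    if library_pki[best] >= 5:                           # activity cut-off, an assumption
        print(f"predicted target {library_targets[best]} (sim = {sims[best]:.2f})")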

  13. A fast and direct spectrophotometric method for the simultaneous determination of methyl paraben and hydroquinone in cosmetic products using successive projections algorithm.

    PubMed

    Esteki, M; Nouroozi, S; Shahsavari, Z

    2016-02-01

    To develop a simple and efficient spectrophotometric technique combined with chemometrics for the simultaneous determination of methyl paraben (MP) and hydroquinone (HQ) in cosmetic products, and specifically, to: (i) evaluate the potential of applying the successive projections algorithm (SPA) to derivative spectrophotometric data in order to provide sufficient accuracy and model robustness and (ii) determine MP and HQ concentrations in cosmetics without tedious pre-treatments such as derivatization or extraction techniques, which are time-consuming and require hazardous solvents. The absorption spectra were measured in the wavelength range of 200-350 nm. Prior to building the chemometric models, the original and first-derivative absorption spectra of binary mixtures were used as calibration matrices. Variables selected by the successive projections algorithm were used to obtain multiple linear regression (MLR) models based on a small subset of wavelengths. The number of wavelengths and the starting vector were optimized, and the comparison of the root mean square errors of calibration (RMSEC) and cross-validation (RMSECV) was applied to select effective wavelengths with the least collinearity and redundancy. Principal component regression (PCR) and partial least squares (PLS) models were also developed for comparison. The concentrations of the calibration matrix ranged from 0.1 to 20 μg mL(-1) for MP, and from 0.1 to 25 μg mL(-1) for HQ. The constructed models were tested on an external validation data set and finally on cosmetic samples. The results indicated that successive projections algorithm-multiple linear regression (SPA-MLR), applied to the first-derivative spectra, achieved the optimal performance for the two compounds when compared with full-spectrum PCR and PLS. The root mean square error of prediction (RMSEP) was 0.083 and 0.314 for MP and HQ, respectively. To verify the accuracy of the proposed method, a recovery study on real cosmetic samples was carried out with satisfactory results (84-112%). The proposed method, which is an environmentally friendly approach using a minimum amount of solvent, is a simple, fast and low-cost analysis method that can provide high accuracy and robust models. The suggested method does not need any complex extraction procedure, which is time-consuming and requires hazardous solvents. © 2015 Society of Cosmetic Scientists and the Société Française de Cosmétologie.

  14. Mapping the natural variation in whole bone stiffness and strength across skeletal sites.

    PubMed

    Schlecht, Stephen H; Bigelow, Erin M R; Jepsen, Karl J

    2014-10-01

    Traits of the skeletal system are coordinately adjusted to establish mechanical homeostasis in response to genetic and environmental factors. Prior work demonstrated that this 'complex adaptive' process is not perfect, revealing a two-fold difference in whole bone stiffness of the tibia across a population. Robustness (specifically, total cross-sectional area relative to length) varies widely across skeletal sites and between sexes. However, it is unknown whether the natural variation in whole bone stiffness and strength also varies across skeletal sites and between men and women. We tested the hypotheses that: 1) all major long bones of the appendicular skeleton demonstrate inherent, systemic constraints in the degree to which morphological and compositional traits can be adjusted for a given robustness; and 2) these traits covary in a predictable manner independent of body size and robustness. We assessed the functional relationships among robustness, cortical area (Ct.Ar), cortical tissue mineral density (Ct.TMD), and bone strength index (BSI) across the long bones of the upper and lower limbs of 115 adult men and women. All bones showed a significant (p<0.001) positive regression between BSI and robustness after adjusting for body size, with slender bones being 1.7-2.3 times less stiff and strong in men and 1.3-2.8 times less stiff and strong in women compared to robust bones. Our findings are the first to document the natural inter-individual variation in whole bone stiffness and strength that exist within populations and that is predictable based on skeletal robustness for all major long bones. Documenting and further understanding this natural variation in strength may be critical for differentially diagnosing and treating skeletal fragility. Copyright © 2014 Elsevier Inc. All rights reserved.

  15. Vascular robustness: The missing parameter in cardiovascular risk prediction.

    PubMed

    Kraushaar, Lutz E; Dressel, Alexander; Maßmann, Alexander

    2018-03-01

    Undetected high risk for premature death of cardiovascular disease (CVD) among individuals with low-to-moderate risk factor scores is an acknowledged obstacle to CVD prevention. The vasculature's functional robustness against risk factor derailment may serve as a novel discriminator of mortality risk under similar risk factor loads. To test this assumption, we hypothesized that the expected inverse robustness-mortality association is verifiable as a significant trend along the age spectrum of risk factor-challenged cohorts. This is a retrospective cohort study of 372 adults (mean age 56.1 years, range 21-92; 45% female) with a variety of CV risk factors. An arterial model (VascAssist 2, iSYMED GmbH, Germany) was used to derive global parameters of arterial function from non-invasively acquired pulse pressure waves. Participants were stratified by health status: apparently healthy (AH; n = 221); with hypertension and/or hypercholesterolemia (CC; n = 61); with history of CV event(s) (CVE; n = 90). Multivariate linear regression was used to derive a robustness score, which was calibrated against the CVD mortality hazard rate of a sub-cohort of the LURIC study (n = 1369; mean age 59.1 years, range 20-75; 37% female). Robustness correlated linearly with calendar age in CC (F(1, 59) = 10.42; p < 0.01) and CVE (F(1, 88) = 40.34; p < 0.0001) but not in the AH strata, supporting the hypothesis of preferential elimination of less robust individuals along the aging trajectory under risk factor challenges. Vascular robustness may serve as a biomarker of vulnerability to CVD risk factor challenges, prognosticating otherwise undetectable elevated risk for premature CVD mortality.

  16. Mapping the natural variation in whole bone stiffness and strength across skeletal sites

    PubMed Central

    Schlecht, Stephen H.; Bigelow, Erin M.R.; Jepsen, Karl J.

    2016-01-01

    Traits of the skeletal system are coordinately adjusted to establish mechanical homeostasis in response to genetic and environmental factors. Prior work demonstrated that this 'complex adaptive' process is not perfect, revealing a two-fold difference in whole bone stiffness of the tibia across a population. Robustness (specifically, total cross-sectional area relative to length) varies widely across skeletal sites and between sexes. However, it is unknown whether the natural variation in whole bone stiffness and strength also varies across skeletal sites and between men and women. We tested the hypotheses that: 1) all major long bones of the appendicular skeleton demonstrate inherent, systemic constraints in the degree to which morphological and compositional traits can be adjusted for a given robustness; and 2) these traits covary in a predictable manner independent of body size and robustness. We assessed the functional relationships among robustness, cortical area (Ct.Ar), cortical tissue mineral density (Ct.TMD), and bone strength index (BSI) across the long bones of the upper and lower limbs of 115 adult men and women. All bones showed a significant (p < 0.001) positive regression between BSI and robustness after adjusting for body size, with slender bones being 1.7–2.3 times less stiff and strong in men and 1.3–2.8 times less stiff and strong in women compared to robust bones. Our findings are the first to document the natural inter-individual variation in whole bone stiffness and strength that exists within populations and is predictable from skeletal robustness for all major long bones. Documenting and further understanding this natural variation in strength may be critical for differentially diagnosing and treating skeletal fragility. PMID:24999223

  17. Transportation Infrastructure Robustness : Joint Engineering and Economic Analysis

    DOT National Transportation Integrated Search

    2017-11-01

    The objectives of this study are to develop a methodology for assessing the robustness of transportation infrastructure facilities and assess the effect of damage to such facilities on travel demand and on the welfare of the facilities' users. The robustness...

  18. Enabling Rapid and Robust Structural Analysis During Conceptual Design

    NASA Technical Reports Server (NTRS)

    Eldred, Lloyd B.; Padula, Sharon L.; Li, Wu

    2015-01-01

    This paper describes a multi-year effort to add a structural analysis subprocess to a supersonic aircraft conceptual design process. The desired capabilities include parametric geometry, automatic finite element mesh generation, static and aeroelastic analysis, and structural sizing. The paper discusses implementation details of the new subprocess, captures lessons learned, and suggests future improvements. The subprocess quickly compares concepts and robustly handles large changes in wing or fuselage geometry. The subprocess can rank concepts with regard to their structural feasibility and can identify promising regions of the design space. The automated structural analysis subprocess is deemed robust and rapid enough to be included in multidisciplinary conceptual design and optimization studies.

  19. Detecting sea-level hazards: Simple regression-based methods for calculating the acceleration of sea level

    USGS Publications Warehouse

    Doran, Kara S.; Howd, Peter A.; Sallenger, Asbury H.

    2016-01-04

    Recent studies, and most of their predecessors, use tide gage data to quantify sea-level (SL) acceleration, ASL(t). In the current study, three techniques were used to calculate acceleration from tide gage data, and of those examined, it was determined that the two techniques based on sliding a regression window through the time series are more robust compared to the technique that fits a single quadratic form to the entire time series, particularly if there is temporal variation in the magnitude of the acceleration. The single-fit quadratic regression method has been the most commonly used technique for determining acceleration in tide gage data. The inability of the single-fit method to account for time-varying acceleration may explain some of the inconsistent findings between investigators. Properly quantifying ASL(t) from field measurements is of particular importance in evaluating numerical models of past, present, and future SL rise (SLR) resulting from anticipated climate change.
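
    To make the contrast between the two techniques concrete, here is a short synthetic sketch: a single quadratic fit yields one acceleration value for the whole record, while sliding a quadratic regression window through the series recovers acceleration as a function of time. The record length, window width, and noise level are assumptions for illustration only.

```python
# Single-fit versus sliding-window quadratic regression for acceleration.
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(1920, 2020, 1.0 / 12)                 # monthly tide-gage record
# Synthetic sea level (mm) with acceleration that switches on around 1970.
sl = 1.5 * (t - 1920) + 0.01 * (t - 1970).clip(0) ** 2 + rng.normal(0, 5, t.size)

# Single-fit method: one quadratic over the whole record; accel = 2*c2.
c2, c1, c0 = np.polyfit(t, sl, 2)
print(f"single-fit acceleration: {2 * c2:.4f} mm/yr^2")

# Sliding-window method: acceleration as a function of window-center time.
win = 30 * 12                                       # 30-year window
accel = []
for i in range(t.size - win):
    a2 = np.polyfit(t[i:i + win], sl[i:i + win], 2)[0]
    accel.append(2 * a2)
print(f"sliding-window acceleration range: {min(accel):.4f} to {max(accel):.4f} mm/yr^2")
```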

  20. A quantile regression model for failure-time data with time-dependent covariates

    PubMed Central

    Gorfine, Malka; Goldberg, Yair; Ritov, Ya’acov

    2017-01-01

    Since survival data occur over time, often important covariates that we wish to consider also change over time. Such covariates are referred to as time-dependent covariates. Quantile regression offers flexible modeling of survival data by allowing the effects of the covariates to vary with quantiles. This article provides a novel quantile regression model accommodating time-dependent covariates, for analyzing survival data subject to right censoring. Our simple estimation technique assumes the existence of instrumental variables. In addition, we present a doubly-robust estimator in the sense of Robins and Rotnitzky (1992, Recovery of information and adjustment for dependent censoring using surrogate markers. In: Jewell, N. P., Dietz, K. and Farewell, V. T. (editors), AIDS Epidemiology. Boston: Birkhäuser, pp. 297–331.). The asymptotic properties of the estimators are rigorously studied. Finite-sample properties are demonstrated by a simulation study. The utility of the proposed methodology is demonstrated using the Stanford heart transplant dataset. PMID:27485534

  1. A Fast Gradient Method for Nonnegative Sparse Regression With Self-Dictionary

    NASA Astrophysics Data System (ADS)

    Gillis, Nicolas; Luce, Robert

    2018-01-01

    A nonnegative matrix factorization (NMF) can be computed efficiently under the separability assumption, which asserts that all the columns of the given input data matrix belong to the cone generated by a (small) subset of them. The provably most robust methods to identify these conic basis columns are based on nonnegative sparse regression and self dictionaries, and require the solution of large-scale convex optimization problems. In this paper we study a particular nonnegative sparse regression model with self dictionary. As opposed to previously proposed models, this model yields a smooth optimization problem where the sparsity is enforced through linear constraints. We show that the Euclidean projection on the polyhedron defined by these constraints can be computed efficiently, and propose a fast gradient method to solve our model. We compare our algorithm with several state-of-the-art methods on synthetic data sets and real-world hyperspectral images.
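
    As a rough sketch of the self-dictionary idea (not the authors' algorithm), the snippet below fits X ≈ XW with W ≥ 0 by plain projected gradient descent, so that columns carrying large row weight in W act as candidate conic basis columns. The paper's model additionally enforces sparsity through linear constraints and accelerates the iteration with a fast (Nesterov-type) gradient method and an efficient polyhedral projection, none of which is reproduced here.

```python
# Toy projected-gradient sketch of nonnegative self-dictionary regression.
import numpy as np

rng = np.random.default_rng(2)
basis = rng.random((20, 4))                    # 4 true endmember columns
mix = basis @ rng.dirichlet(np.ones(4), 46).T  # 46 columns inside their cone
X = np.hstack([basis, mix])                    # separable data matrix

n = X.shape[1]
W = np.full((n, n), 1.0 / n)
L = np.linalg.norm(X.T @ X, 2)                 # Lipschitz constant of the gradient
for _ in range(500):
    grad = X.T @ (X @ W - X)                   # gradient of 0.5*||X - XW||_F^2
    W = np.maximum(W - grad / L, 0.0)          # gradient step + projection onto W >= 0

# Heavy rows of W tend to flag basis columns; the sparsity constraints of
# the actual model make this identification far more reliable.
importance = np.linalg.norm(W, axis=1)
print("top candidate basis columns:", np.argsort(importance)[-4:])
```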

  2. Parametric Human Body Reconstruction Based on Sparse Key Points.

    PubMed

    Cheng, Ke-Li; Tong, Ruo-Feng; Tang, Min; Qian, Jing-Ye; Sarkis, Michel

    2016-11-01

    We propose an automatic parametric human body reconstruction algorithm which can efficiently construct a model using a single Kinect sensor. A user needs to stand still in front of the sensor for a couple of seconds so that the range data can be measured. The user's body shape and pose will then be automatically constructed in several seconds. Traditional methods optimize dense correspondences between range data and meshes. In contrast, our proposed scheme relies on sparse key points for the reconstruction. It employs regression to find the corresponding key points between the scanned range data and some annotated training data. We design two kinds of feature descriptors as well as corresponding regression stages to make the regression robust and accurate. Our scheme concludes with a dense refinement step, in which a pre-factorization method is applied to improve the computational efficiency. Compared with other methods, our scheme achieves similar reconstruction accuracy but significantly reduces runtime.

  3. Bayesian median regression for temporal gene expression data

    NASA Astrophysics Data System (ADS)

    Yu, Keming; Vinciotti, Veronica; Liu, Xiaohui; 't Hoen, Peter A. C.

    2007-09-01

    Most of the existing methods for the identification of biologically interesting genes in a temporal expression profiling dataset do not fully exploit the temporal ordering in the dataset and are based on normality assumptions for the gene expression. In this paper, we introduce a Bayesian median regression model to detect genes whose temporal profile is significantly different across a number of biological conditions. The regression model is defined by a polynomial function where both time and condition effects as well as interactions between the two are included. MCMC-based inference returns the posterior distribution of the polynomial coefficients. From this, a simple Bayes factor test is proposed to test for significance. The estimation of the median rather than the mean, and within a Bayesian framework, increases the robustness of the method compared to a Hotelling T²-test previously suggested. This is shown on simulated data and on muscular dystrophy gene expression data.
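
    A frequentist stand-in for the model described above can be sketched with statsmodels' quantile regression at the median: the design matrix combines a polynomial in time, a condition effect, and a time-by-condition interaction. The paper's Bayesian MCMC estimation and Bayes factor test are not reproduced; data and coefficients below are synthetic.

```python
# Median (0.5-quantile) regression with polynomial time and condition terms.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
time = np.tile(np.arange(6), 8)                    # 6 time points, 8 replicates
cond = np.repeat([0, 1], 24)                       # two biological conditions
# Heavy-tailed noise (Student t) motivates estimating the median, not the mean.
expr = 1.0 + 0.5 * time - 0.04 * time**2 + 0.6 * cond * time + rng.standard_t(3, 48)

df = pd.DataFrame({"expr": expr, "t": time, "cond": cond})
fit = smf.quantreg("expr ~ t + I(t**2) + cond + cond:t", df).fit(q=0.5)
print(fit.params)    # median-regression polynomial coefficients
```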

  4. Regression Analysis by Example. 5th Edition

    ERIC Educational Resources Information Center

    Chatterjee, Samprit; Hadi, Ali S.

    2012-01-01

    Regression analysis is a conceptually simple method for investigating relationships among variables. Carrying out a successful application of regression analysis, however, requires a balance of theoretical results, empirical rules, and subjective judgment. "Regression Analysis by Example, Fifth Edition" has been expanded and thoroughly…

  5. Quantifying mineral abundances of complex mixtures by coupling spectral deconvolution of SWIR spectra (2.1-2.4 μm) and regression tree analysis

    USGS Publications Warehouse

    Mulder, V.L.; Plotze, Michael; de Bruin, Sytze; Schaepman, Michael E.; Mavris, C.; Kokaly, Raymond F.; Egli, Markus

    2013-01-01

    This paper presents a methodology for assessing mineral abundances of mixtures having more than two constituents using absorption features in the 2.1-2.4 μm wavelength region. In the first step, the absorption behaviour of mineral mixtures is parameterised by exponential Gaussian optimisation. Next, mineral abundances are predicted by regression tree analysis using these parameters as inputs. The approach is demonstrated on a range of prepared samples with known abundances of kaolinite, dioctahedral mica, smectite, calcite and quartz and on a set of field samples from Morocco. The latter contained varying quantities of other minerals, some of which did not have diagnostic absorption features in the 2.1-2.4 μm region. Cross validation showed that the prepared samples of kaolinite, dioctahedral mica, smectite and calcite were predicted with a root mean square error (RMSE) less than 9 wt.%. For the field samples, the RMSE was less than 8 wt.% for calcite, dioctahedral mica and kaolinite abundances. Smectite could not be well predicted, which was attributed to spectral variation of the cations within the dioctahedral layered smectites. Substitution of part of the quartz by chlorite at the prediction phase hardly affected the accuracy of the predicted mineral content; this suggests that the method is robust in handling the omission of minerals during the training phase. The degree of expression of absorption components was different between the field sample and the laboratory mixtures. This demonstrates that the method should be calibrated and trained on local samples. Our method allows the simultaneous quantification of more than two minerals within a complex mixture and thereby enhances the perspectives of spectral analysis for mineral abundances.
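
    A minimal sketch of the second step, assuming the absorption-feature parameters from the Gaussian-optimisation stage are already available as a feature matrix: a regression tree maps those parameters to a (synthetic) mineral abundance, and cross-validation reports an RMSE in weight percent as in the study.

```python
# Regression tree mapping spectral-feature parameters to mineral abundance.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
# Columns stand in for fitted absorption parameters: depth, width, position.
features = rng.random((120, 3))
abundance = 60 * features[:, 0] - 15 * features[:, 1] + rng.normal(0, 3, 120)

tree = DecisionTreeRegressor(max_depth=4, random_state=0)
scores = cross_val_score(tree, features, abundance,
                         scoring="neg_root_mean_squared_error", cv=5)
print(f"cross-validated RMSE: {-scores.mean():.2f} wt.%")
```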

  6. Comparison of multi-subject ICA methods for analysis of fMRI data

    PubMed Central

    Erhardt, Erik Barry; Rachakonda, Srinivas; Bedrick, Edward; Allen, Elena; Adali, Tülay; Calhoun, Vince D.

    2010-01-01

    Spatial independent component analysis (ICA) applied to functional magnetic resonance imaging (fMRI) data identifies functionally connected networks by estimating spatially independent patterns from their linearly mixed fMRI signals. Several multi-subject ICA approaches estimating subject-specific time courses (TCs) and spatial maps (SMs) have been developed; however, there has not yet been a full comparison of the implications of their use. Here, we provide extensive comparisons of four multi-subject ICA approaches in combination with data reduction methods for simulated and fMRI task data. For multi-subject ICA, the data first undergo reduction at the subject and group levels using principal component analysis (PCA). Comparisons of subject-specific, spatial concatenation, and group data mean subject-level reduction strategies using PCA and probabilistic PCA (PPCA) show that computationally intensive PPCA is equivalent to PCA, and that subject-specific and group data mean subject-level PCA are preferred because of well-estimated TCs and SMs. Second, aggregate independent components are estimated using either noise-free ICA or probabilistic ICA (PICA). Third, subject-specific SMs and TCs are estimated using back-reconstruction. We compare several direct group ICA (GICA) back-reconstruction approaches (GICA1-GICA3) and an indirect back-reconstruction approach, spatio-temporal regression (STR, or dual regression). Results show the earlier group ICA (GICA1) approximates STR; however, STR has contradictory assumptions and may show mixed-component artifacts in estimated SMs. Our evidence-based recommendation is to use GICA3, introduced here, with subject-specific PCA and noise-free ICA, providing the most robust and accurate estimated SMs and TCs in addition to offering an intuitive interpretation. PMID:21162045
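
    The spatio-temporal (dual) regression step mentioned above reduces to two least-squares problems, which the following numpy sketch makes explicit under assumed array shapes; the maps, data, and dimensions are synthetic placeholders.

```python
# Dual regression: group maps -> subject time courses -> subject maps.
import numpy as np

rng = np.random.default_rng(5)
T, V, C = 200, 5000, 10                   # time points, voxels, components
group_maps = rng.standard_normal((C, V))  # aggregate ICA spatial maps
subject = rng.standard_normal((T, V))     # one subject's fMRI data matrix

# Stage 1: regress group maps against the data to get subject time courses.
tcs = subject @ np.linalg.pinv(group_maps)        # shape (T, C)
# Stage 2: regress those time courses back to get subject-specific maps.
subj_maps = np.linalg.pinv(tcs) @ subject         # shape (C, V)
print(tcs.shape, subj_maps.shape)
```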

  7. Correlation Between Body Temperature and Survival Rate in Patients With Hospital-Acquired Bacteremia: A Prospective Observational Study.

    PubMed

    Dai, Yu-Tzu; Lu, Shu-Hua; Chen, Yee-Chun; Ko, Wen-Je

    2015-10-01

    Fever is a complex and major sign of a patient's acute response to infection. However, analysis of the risks and benefits associated with the change in body temperature of an infected host remains controversial. To examine the relationship between the intensity of the change in body temperature and the mortality of patients with hospital-acquired bacteremia. A prospective observational study. Subjects were hospitalized adult patients who developed clinical signs of infection 48 hr or more after admission and had documented bacterial growth in blood culture. The maximum body temperature (maxTe) during the early period of infection measurements (i.e., the day before, the day of, and 2 days after the day of blood culture) was used to indicate the intensity of the body temperature response. Patients were categorized as discharged alive or as having died in hospital. Cox regression analysis was employed to analyze the data. The cohort consisted of 502 subjects. The mean maxTe of subjects was 38.6°C, and 14.9% had a maxTe lower than 38.0°C. The in-hospital mortality rate was 18.9%. The highest in-hospital mortality was found in subjects with a maxTe lower than 38°C (30.7%). Multivariate Cox regression analysis determined that maxTe and the severity of comorbidity are the two variables associated with in-hospital mortality. Lack of a robust febrile response may be associated with greater risk of mortality in patients with bacteremia. Clinicians must be vigilant in identifying patients at risk for a blunted febrile response to bacteremia for more intensive monitoring. © The Author(s) 2014.
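
    A minimal sketch of the kind of multivariate Cox model reported above, assuming the lifelines package and illustrative column names (maxTe and a comorbidity-severity score regressed against in-hospital survival); the synthetic data carry no clinical meaning.

```python
# Cox proportional-hazards regression on synthetic bacteremia-style data.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(6)
n = 502
df = pd.DataFrame({
    "max_te": rng.normal(38.6, 0.7, n),            # maximum early temperature
    "comorbidity": rng.integers(0, 10, n),         # severity score
    "days": rng.exponential(20, n).round() + 1,    # follow-up time in days
    "died": rng.integers(0, 2, n),                 # in-hospital death indicator
})

cph = CoxPHFitter()
cph.fit(df, duration_col="days", event_col="died")
cph.print_summary()    # hazard ratios for maxTe and comorbidity severity
```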

  8. BCI Competition IV – Data Set I: Learning Discriminative Patterns for Self-Paced EEG-Based Motor Imagery Detection

    PubMed Central

    Zhang, Haihong; Guan, Cuntai; Ang, Kai Keng; Wang, Chuanchu

    2012-01-01

    Detecting motor imagery activities versus non-control in brain signals is the basis of self-paced brain-computer interfaces (BCIs), but it also poses a considerable challenge to signal processing due to the complex and non-stationary characteristics of motor imagery as well as non-control. This paper presents a self-paced BCI based on a robust learning mechanism that extracts and selects spatio-spectral features for differentiating multiple EEG classes. It also employs a non-linear regression and post-processing technique for predicting the time series of class labels from the spatio-spectral features. The method was validated in BCI Competition IV on Data Set I, where it produced the lowest prediction error for the continuous class-label time series. This report also presents and discusses analysis of the method using the competition data set. PMID:22347153

  9. Antimicrobial Drug Use and Resistance in Europe

    PubMed Central

    van de Sande-Bruinsma, Nienke; Verloo, Didier; Tiemersma, Edine; Monen, Jos; Goossens, Herman; Ferech, Matus

    2008-01-01

    Our study compares the use of antimicrobial agents in ambulatory care with the resistance trends of 2 major pathogens, Streptococcus pneumoniae and Escherichia coli, in 21 European countries in 2000–2005 and explores whether the notion that antimicrobial drug use determines resistance can be supported by surveillance data at national aggregation levels. The data obtained from the European Surveillance of Antimicrobial Consumption and the European Antimicrobial Resistance Surveillance System suggest that variation of consumption coincides with the occurrence of resistance at the country level. Linear regression analysis showed that the association between antimicrobial drug use and resistance was specific and robust for 2 of 3 compound-pathogen combinations, stable over time, but not sensitive enough to explain all of the observed variations. Ecologic studies based on routine surveillance data indicate a relation between use and resistance and support interventions designed to reduce antimicrobial drug consumption at a national level in Europe. PMID:18976555

  10. Real Time 3D Facial Movement Tracking Using a Monocular Camera

    PubMed Central

    Dong, Yanchao; Wang, Yanming; Yue, Jiguang; Hu, Zhencheng

    2016-01-01

    The paper proposes a robust framework for 3D facial movement tracking in real time using a monocular camera. It is designed to estimate the 3D face pose and local facial animation such as eyelid movement and mouth movement. The framework first utilizes the Discriminative Shape Regression method to locate the facial feature points on the 2D image and fuses the 2D data with a 3D face model using an Extended Kalman Filter to yield 3D facial movement information. An alternating optimizing strategy is adopted to fit to different persons automatically. Experiments show that the proposed framework could track the 3D facial movement across various poses and illumination conditions. Given the real face scale, the framework could track the eyelid with an error of 1 mm and the mouth with an error of 2 mm. The tracking result is reliable for expression analysis or mental state inference. PMID:27463714

  11. Real Time 3D Facial Movement Tracking Using a Monocular Camera.

    PubMed

    Dong, Yanchao; Wang, Yanming; Yue, Jiguang; Hu, Zhencheng

    2016-07-25

    The paper proposes a robust framework for 3D facial movement tracking in real time using a monocular camera. It is designed to estimate the 3D face pose and local facial animation such as eyelid movement and mouth movement. The framework first utilizes the Discriminative Shape Regression method to locate the facial feature points on the 2D image and fuses the 2D data with a 3D face model using an Extended Kalman Filter to yield 3D facial movement information. An alternating optimizing strategy is adopted to fit to different persons automatically. Experiments show that the proposed framework could track the 3D facial movement across various poses and illumination conditions. Given the real face scale, the framework could track the eyelid with an error of 1 mm and the mouth with an error of 2 mm. The tracking result is reliable for expression analysis or mental state inference.

  12. Leveraging AMI data for distribution system model calibration and situational awareness

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Peppanen, Jouni; Reno, Matthew J.; Thakkar, Mohini

    The many new distributed energy resources being installed at the distribution system level require increased visibility into system operations that will be enabled by distribution system state estimation (DSSE) and situational awareness applications. Reliable and accurate DSSE requires both robust methods for managing the big data provided by smart meters and quality distribution system models. This paper presents intelligent methods for detecting and dealing with missing or inaccurate smart meter data, as well as ways to process the data for different applications. It also presents an efficient and flexible parameter estimation method based on the voltage drop equation and regression analysis to enhance distribution system model accuracy. Finally, it presents a 3-D graphical user interface for advanced visualization of the system state and events. The methods are demonstrated on a university distribution network with a state-of-the-art real-time and historical smart meter data infrastructure.
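
    The voltage-drop-based parameter estimation can be illustrated with a toy regression: under the common linearization ΔV ≈ (RP + XQ)/V, AMI power readings and measured voltage drops identify the line parameters R and X by least squares. The feeder values below are invented for the sketch and are not from the paper.

```python
# Least-squares recovery of line R and X from AMI-style measurements.
import numpy as np

rng = np.random.default_rng(7)
R_true, X_true = 0.08, 0.05                  # ohms, unknown in practice
V0 = 240.0                                   # nominal source voltage (V)
P = rng.uniform(500, 5000, 2000)             # meter real power readings (W)
Q = P * rng.uniform(0.1, 0.5, 2000)          # reactive power readings (var)
# Linearized voltage drop plus measurement noise.
drop = (R_true * P + X_true * Q) / V0 + rng.normal(0, 0.05, 2000)

A = np.column_stack([P, Q]) / V0             # regression design matrix
(R_hat, X_hat), *_ = np.linalg.lstsq(A, drop, rcond=None)
print(f"estimated R = {R_hat:.3f} ohm, X = {X_hat:.3f} ohm")
```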

  13. Measurement of pattern roughness and local size variation using CD-SEM: current status

    NASA Astrophysics Data System (ADS)

    Fukuda, Hiroshi; Kawasaki, Takahiro; Kawada, Hiroki; Sakai, Kei; Kato, Takashi; Yamaguchi, Satoru; Ikota, Masami; Momonoi, Yoshinori

    2018-03-01

    Measurement of line edge roughness (LER) is discussed from four aspects: edge detection, PSD prediction, sampling strategy, and noise mitigation, and general guidelines and practical solutions for LER measurement today are introduced. Advanced edge detection algorithms such as the wave-matching method are shown to be effective for robustly detecting edges in low-SNR images, while a conventional algorithm with weak filtering is still effective in suppressing SEM noise and aliasing. Advanced PSD prediction methods such as the multi-taper method are effective in suppressing sampling noise within a line edge, while a sufficient number of lines is still required to suppress line-to-line variation. Two types of SEM noise mitigation methods, "apparent noise floor" subtraction and LER-noise decomposition using regression analysis, are verified to successfully mitigate SEM noise in PSD curves. These results are extended to local CD uniformity (LCDU) measurement to clarify the impact of SEM noise and sampling noise on LCDU.
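
    The "apparent noise floor" subtraction idea can be shown in a few lines: SEM noise adds a roughly flat floor to the measured roughness PSD, which can be estimated from the high-frequency plateau and removed before the roughness value is integrated. The Lorentzian-like PSD and floor level below are synthetic assumptions.

```python
# Noise-floor subtraction from a synthetic LER power spectral density.
import numpy as np

f = np.fft.rfftfreq(1024, d=1.0)[1:]           # spatial frequency axis
true_psd = 10.0 / (1.0 + (f / 0.02) ** 2)      # Lorentzian-like LER PSD
measured = true_psd + 0.4                      # plus flat SEM noise floor

floor = measured[f > 0.3].mean()               # estimate floor from plateau
corrected = np.clip(measured - floor, 0.0, None)

step = f[1] - f[0]                             # uniform frequency spacing
sigma = lambda psd: np.sqrt(np.sum(psd) * step)  # roughness from one-sided PSD
print(f"3-sigma LER biased: {3 * sigma(measured):.2f}, "
      f"corrected: {3 * sigma(corrected):.2f}")
```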

  14. Ebola Virus Disease, Democratic Republic of the Congo, 2014.

    PubMed

    Nanclares, Carolina; Kapetshi, Jimmy; Lionetto, Fanshen; de la Rosa, Olimpia; Tamfun, Jean-Jacques Muyembe; Alia, Miriam; Kobinger, Gary; Bernasconi, Andrea

    2016-09-01

    During July-November 2014, the Democratic Republic of the Congo underwent its seventh Ebola virus disease (EVD) outbreak. The etiologic agent was Zaire Ebola virus; 66 cases were reported (overall case-fatality rate 74.2%). Through a retrospective observational study of confirmed EVD in 25 patients admitted to either of 2 Ebola treatment centers, we described clinical features and investigated correlates associated with death. Clinical features were mainly generic. At admission, 76% of patients had >1 gastrointestinal symptom and 28% >1 hemorrhagic symptom. The case-fatality rate in this group was 48% and was higher for female patients (67%). Cox regression analysis correlated death with initial low cycle threshold, indicating high viral load. Cycle threshold was a robust predictor of death, as were fever, hiccups, diarrhea, dyspnea, dehydration, disorientation, hematemesis, bloody feces during hospitalization, and anorexia in recent medical history. Differences from other outbreaks could suggest guidance for optimizing clinical management and disease control.

  15. Exposure to pesticide mixtures and DNA damage among rice field workers.

    PubMed

    Varona-Uribe, Marcela Eugenia; Torres-Rey, Carlos H; Díaz-Criollo, Sonia; Palma-Parra, Ruth Marien; Narváez, Diana María; Carmona, Sandra Patricia; Briceño, Leonardo; Idrovo, Alvaro J

    2016-01-01

    This study describes the use of pesticide mixtures and their potential association with comet assay results in 223 rice field workers in Colombia. Thirty-one pesticides were quantified in blood, serum, and urine (15 organochlorines, 10 organophosphorus, 5 carbamates, and ethylenethiourea), and the comet assay was performed. Twenty-four (77.42%) pesticides were present in the workers. Maximum-likelihood factor analysis identified 8 different mixtures. Afterwards, robust regressions were used to explore associations between the identified factors and the comet assay. Two groups of mixtures were associated with a higher percentage of DNA damage and with comet tail length, respectively: α-benzene hexachloride (α-BHC), hexachlorobenzene (HCB), and β-BHC (β: 1.21, 95% confidence interval [CI]: 0.33-2.10); and pirimiphos-methyl, malathion, bromophos-methyl, and bromophos-ethyl (β: 11.97, 95% CI: 2.34-21.60). The findings suggest that exposure to pesticides varies greatly among rice field workers.
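
    A short sketch of the robust-regression step, with made-up data: a mixture's factor score is regressed against comet tail length using an M-estimator with Huber weights (via statsmodels), so a handful of extreme workers does not dominate the estimated association.

```python
# Huber M-estimator regression of comet-assay outcome on a factor score.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
factor_score = rng.standard_normal(223)            # mixture exposure factor
tail = 12 * factor_score + rng.normal(0, 4, 223)   # comet tail length
tail[:5] += 80                                     # a few extreme outliers

X = sm.add_constant(factor_score)
fit = sm.RLM(tail, X, M=sm.robust.norms.HuberT()).fit()
print(fit.params, fit.bse)    # robust slope estimate and standard error
```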

  16. Photodynamic therapy monitoring with optical coherence angiography

    NASA Astrophysics Data System (ADS)

    Sirotkina, M. A.; Matveev, L. A.; Shirmanova, M. V.; Zaitsev, V. Y.; Buyanova, N. L.; Elagin, V. V.; Gelikonov, G. V.; Kuznetsov, S. S.; Kiseleva, E. B.; Moiseev, A. A.; Gamayunov, S. V.; Zagaynova, E. V.; Feldchtein, F. I.; Vitkin, A.; Gladkova, N. D.

    2017-02-01

    Photodynamic therapy (PDT) is a promising modern approach for cancer therapy with low normal tissue toxicity. This study focused on vascular-targeting chlorin e6-mediated PDT. A new angiographic imaging approach known as M-mode-like optical coherence angiography (MML-OCA) was able to sensitively detect PDT-induced microvascular alterations in the mouse ear tumour model CT26. Histological analysis showed that the main mechanisms of vascular PDT were thrombosis of blood vessels and hemorrhage, which agrees with the angiographic imaging by MML-OCA. The relationship between MML-OCA-detected early microvascular damage post PDT (within 24 hours) and tumour regression/regrowth was confirmed by histology. The advantages of MML-OCA, such as direct image acquisition, fast processing, robust and affordable system opto-electronics, and label-free high-contrast 3D visualization of the microvasculature, suggest attractive possibilities for this method in practical clinical monitoring of cancer therapies with microvascular involvement.

  17. Leveraging AMI data for distribution system model calibration and situational awareness

    DOE PAGES

    Peppanen, Jouni; Reno, Matthew J.; Thakkar, Mohini; ...

    2015-01-15

    The many new distributed energy resources being installed at the distribution system level require increased visibility into system operations that will be enabled by distribution system state estimation (DSSE) and situational awareness applications. Reliable and accurate DSSE requires both robust methods for managing the big data provided by smart meters and quality distribution system models. This paper presents intelligent methods for detecting and dealing with missing or inaccurate smart meter data, as well as ways to process the data for different applications. It also presents an efficient and flexible parameter estimation method based on the voltage drop equation and regression analysis to enhance distribution system model accuracy. Finally, it presents a 3-D graphical user interface for advanced visualization of the system state and events. The methods are demonstrated on a university distribution network with a state-of-the-art real-time and historical smart meter data infrastructure.

  18. 21st Century Trends in the Potential for Ozone Depletion

    NASA Astrophysics Data System (ADS)

    Hurwitz, M. M.; Newman, P. A.

    2009-05-01

    We find robust trends in the area where Antarctic stratospheric temperatures are below the threshold for polar stratospheric cloud (PSC) formation in Goddard Earth Observing System (GEOS) chemistry-climate model (CCM) simulations of the 21st century. In late winter (September-October-November), cold area trends are consistent with the respective trends in equivalent effective stratospheric chlorine (EESC), i.e. negative cold area trends in 'realistic future' simulations where EESC decreases and the ozone layer recovers. In the early winter (April through June), regardless of EESC scenario, we find an increasing cold area trend in all simulations; multiple linear regression analysis shows that this early winter cooling trend is associated with the predicted increase in greenhouse gas concentrations in the future. We compare the seasonality of the potential for Antarctic ozone depletion in two versions of the GEOS CCM and assess the impact of the above-mentioned cold area trends on polar stratospheric chemistry.

  19. Dynamic multifactor clustering of financial networks

    NASA Astrophysics Data System (ADS)

    Ross, Gordon J.

    2014-02-01

    We investigate the tendency for financial instruments to form clusters when there are multiple factors influencing the correlation structure. Specifically, we consider a stock portfolio which contains companies from different industrial sectors, located in several different countries. Both sector membership and geography combine to create a complex clustering structure where companies seem to first be divided based on sector, with geographical subclusters emerging within each industrial sector. We argue that standard techniques for detecting overlapping clusters and communities are not able to capture this type of structure and show how robust regression techniques can instead be used to remove the influence of both sector and geography from the correlation matrix separately. Our analysis reveals that prior to the 2008 financial crisis, companies did not tend to form clusters based on geography. This changed immediately following the crisis, with geography becoming a more important determinant of clustering structure.
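
    A stylized sketch of the de-factoring step described above: each instrument's returns are robustly regressed on its sector factor, and the correlation matrix of the residuals then exposes the remaining (here geographic) cluster structure. Factors, group labels, and loadings are synthetic, and the Huber M-estimator stands in for whichever robust regression the paper uses.

```python
# Remove a sector factor by robust regression, then correlate residuals.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
T, n = 500, 12
sector = rng.standard_normal((T, 2))               # two sector factors
geo = rng.standard_normal((T, 3))                  # three country factors
sec_id, geo_id = np.arange(n) % 2, np.arange(n) % 3
returns = (sector[:, sec_id] + 0.6 * geo[:, geo_id]
           + 0.5 * rng.standard_normal((T, n)))

resid = np.empty_like(returns)
for j in range(n):
    X = sm.add_constant(sector[:, sec_id[j]])      # stock j's sector factor
    resid[:, j] = sm.RLM(returns[:, j], X,
                         M=sm.robust.norms.HuberT()).fit().resid
# Geographic subclusters remain visible once the sector factor is removed.
print(np.round(np.corrcoef(resid.T), 2))
```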

  20. A bootstrap method for estimating uncertainty of water quality trends

    USGS Publications Warehouse

    Hirsch, Robert M.; Archfield, Stacey A.; DeCicco, Laura

    2015-01-01

    Estimation of the direction and magnitude of trends in surface water quality remains a problem of great scientific and practical interest. The Weighted Regressions on Time, Discharge, and Season (WRTDS) method was recently introduced as an exploratory data analysis tool to provide flexible and robust estimates of water quality trends. This paper enhances the WRTDS method through the introduction of the WRTDS Bootstrap Test (WBT), an extension of WRTDS that quantifies the uncertainty in WRTDS estimates of water quality trends and offers various ways to visualize and communicate these uncertainties. Monte Carlo experiments are applied to estimate the Type I error probabilities for this method. WBT is compared to other water-quality trend-testing methods appropriate for data sets of one to three decades in length with sampling frequencies of 6–24 observations per year. The software to conduct the test is in the EGRETci R-package.
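
    A much-simplified analogue of the bootstrap idea: resample a water-quality series in blocks, re-estimate the trend on each replicate, and read the uncertainty off the bootstrap distribution. A plain linear trend stands in for the full WRTDS fit on time, discharge, and season, and the series below is synthetic.

```python
# Block-bootstrap confidence interval for a water-quality trend slope.
import numpy as np

rng = np.random.default_rng(10)
t = np.arange(0, 25, 1 / 12)                        # 25 years, monthly samples
conc = 3.0 - 0.02 * t + rng.normal(0, 0.4, t.size)  # declining concentration

block = 24                                          # 2-year blocks
n_blocks = t.size // block
slopes = []
for _ in range(1000):
    # Draw random block start points and stitch the blocks together.
    idx = np.concatenate([np.arange(s, s + block)
                          for s in rng.integers(0, t.size - block, n_blocks)])
    slopes.append(np.polyfit(t[idx], conc[idx], 1)[0])

lo, hi = np.percentile(slopes, [2.5, 97.5])
print(f"trend: {np.mean(slopes):.4f} (95% CI {lo:.4f} to {hi:.4f}) per year")
```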
