Sample records for quantile regression models

  1. Quantile regression models of animal habitat relationships

    USGS Publications Warehouse

    Cade, Brian S.

    2003-01-01

    Typically, all factors that limit an organism are not measured and included in statistical models used to investigate relationships with their environment. If important unmeasured variables interact multiplicatively with the measured variables, the statistical models often will have heterogeneous response distributions with unequal variances. Quantile regression is an approach for estimating the conditional quantiles of a response variable distribution in the linear model, providing a more complete view of possible causal relationships between variables in ecological processes. Chapter 1 introduces quantile regression and discusses the ordering characteristics, interval nature, sampling variation, weighting, and interpretation of estimates for homogeneous and heterogeneous regression models. Chapter 2 evaluates performance of quantile rankscore tests used for hypothesis testing and constructing confidence intervals for linear quantile regression estimates (0 ≤ τ ≤ 1). A permutation F test maintained better Type I errors than the Chi-square T test for models with smaller n, greater number of parameters p, and more extreme quantiles τ. Both versions of the test required weighting to maintain correct Type I errors when there was heterogeneity under the alternative model. An example application related trout densities to stream channel width:depth. Chapter 3 evaluates a drop in dispersion, F-ratio like permutation test for hypothesis testing and constructing confidence intervals for linear quantile regression estimates (0 ≤ τ ≤ 1). Chapter 4 simulates from a large (N = 10,000) finite population representing grid areas on a landscape to demonstrate various forms of hidden bias that might occur when the effect of a measured habitat variable on some animal was confounded with the effect of another unmeasured variable (spatially and not spatially structured). Depending on whether interactions of the measured habitat and unmeasured variable were negative

  2. Quantile regression via vector generalized additive models.

    PubMed

    Yee, Thomas W

    2004-07-30

    One of the most popular methods for quantile regression is the LMS method of Cole and Green. The method naturally falls within a penalized likelihood framework, and consequently allows for considerable flexible because all three parameters may be modelled by cubic smoothing splines. The model is also very understandable: for a given value of the covariate, the LMS method applies a Box-Cox transformation to the response in order to transform it to standard normality; to obtain the quantiles, an inverse Box-Cox transformation is applied to the quantiles of the standard normal distribution. The purposes of this article are three-fold. Firstly, LMS quantile regression is presented within the framework of the class of vector generalized additive models. This confers a number of advantages such as a unifying theory and estimation process. Secondly, a new LMS method based on the Yeo-Johnson transformation is proposed, which has the advantage that the response is not restricted to be positive. Lastly, this paper describes a software implementation of three LMS quantile regression methods in the S language. This includes the LMS-Yeo-Johnson method, which is estimated efficiently by a new numerical integration scheme. The LMS-Yeo-Johnson method is illustrated by way of a large cross-sectional data set from a New Zealand working population. Copyright 2004 John Wiley & Sons, Ltd.

  3. SEMIPARAMETRIC QUANTILE REGRESSION WITH HIGH-DIMENSIONAL COVARIATES

    PubMed Central

    Zhu, Liping; Huang, Mian; Li, Runze

    2012-01-01

    This paper is concerned with quantile regression for a semiparametric regression model, in which both the conditional mean and conditional variance function of the response given the covariates admit a single-index structure. This semiparametric regression model enables us to reduce the dimension of the covariates and simultaneously retains the flexibility of nonparametric regression. Under mild conditions, we show that the simple linear quantile regression offers a consistent estimate of the index parameter vector. This is a surprising and interesting result because the single-index model is possibly misspecified under the linear quantile regression. With a root-n consistent estimate of the index vector, one may employ a local polynomial regression technique to estimate the conditional quantile function. This procedure is computationally efficient, which is very appealing in high-dimensional data analysis. We show that the resulting estimator of the quantile function performs asymptotically as efficiently as if the true value of the index vector were known. The methodologies are demonstrated through comprehensive simulation studies and an application to a real dataset. PMID:24501536

  4. Modeling energy expenditure in children and adolescents using quantile regression

    USDA-ARS?s Scientific Manuscript database

    Advanced mathematical models have the potential to capture the complex metabolic and physiological processes that result in energy expenditure (EE). Study objective is to apply quantile regression (QR) to predict EE and determine quantile-dependent variation in covariate effects in nonobese and obes...

  5. GLOBALLY ADAPTIVE QUANTILE REGRESSION WITH ULTRA-HIGH DIMENSIONAL DATA

    PubMed Central

    Zheng, Qi; Peng, Limin; He, Xuming

    2015-01-01

    Quantile regression has become a valuable tool to analyze heterogeneous covaraite-response associations that are often encountered in practice. The development of quantile regression methodology for high dimensional covariates primarily focuses on examination of model sparsity at a single or multiple quantile levels, which are typically prespecified ad hoc by the users. The resulting models may be sensitive to the specific choices of the quantile levels, leading to difficulties in interpretation and erosion of confidence in the results. In this article, we propose a new penalization framework for quantile regression in the high dimensional setting. We employ adaptive L1 penalties, and more importantly, propose a uniform selector of the tuning parameter for a set of quantile levels to avoid some of the potential problems with model selection at individual quantile levels. Our proposed approach achieves consistent shrinkage of regression quantile estimates across a continuous range of quantiles levels, enhancing the flexibility and robustness of the existing penalized quantile regression methods. Our theoretical results include the oracle rate of uniform convergence and weak convergence of the parameter estimators. We also use numerical studies to confirm our theoretical findings and illustrate the practical utility of our proposal. PMID:26604424

  6. Censored quantile regression with recursive partitioning-based weights

    PubMed Central

    Wey, Andrew; Wang, Lan; Rudser, Kyle

    2014-01-01

    Censored quantile regression provides a useful alternative to the Cox proportional hazards model for analyzing survival data. It directly models the conditional quantile of the survival time and hence is easy to interpret. Moreover, it relaxes the proportionality constraint on the hazard function associated with the popular Cox model and is natural for modeling heterogeneity of the data. Recently, Wang and Wang (2009. Locally weighted censored quantile regression. Journal of the American Statistical Association 103, 1117–1128) proposed a locally weighted censored quantile regression approach that allows for covariate-dependent censoring and is less restrictive than other censored quantile regression methods. However, their kernel smoothing-based weighting scheme requires all covariates to be continuous and encounters practical difficulty with even a moderate number of covariates. We propose a new weighting approach that uses recursive partitioning, e.g. survival trees, that offers greater flexibility in handling covariate-dependent censoring in moderately high dimensions and can incorporate both continuous and discrete covariates. We prove that this new weighting scheme leads to consistent estimation of the quantile regression coefficients and demonstrate its effectiveness via Monte Carlo simulations. We also illustrate the new method using a widely recognized data set from a clinical trial on primary biliary cirrhosis. PMID:23975800

  7. Quantile regression applied to spectral distance decay

    USGS Publications Warehouse

    Rocchini, D.; Cade, B.S.

    2008-01-01

    Remotely sensed imagery has long been recognized as a powerful support for characterizing and estimating biodiversity. Spectral distance among sites has proven to be a powerful approach for detecting species composition variability. Regression analysis of species similarity versus spectral distance allows us to quantitatively estimate the amount of turnover in species composition with respect to spectral and ecological variability. In classical regression analysis, the residual sum of squares is minimized for the mean of the dependent variable distribution. However, many ecological data sets are characterized by a high number of zeroes that add noise to the regression model. Quantile regressions can be used to evaluate trend in the upper quantiles rather than a mean trend across the whole distribution of the dependent variable. In this letter, we used ordinary least squares (OLS) and quantile regressions to estimate the decay of species similarity versus spectral distance. The achieved decay rates were statistically nonzero (p < 0.01), considering both OLS and quantile regressions. Nonetheless, the OLS regression estimate of the mean decay rate was only half the decay rate indicated by the upper quantiles. Moreover, the intercept value, representing the similarity reached when the spectral distance approaches zero, was very low compared with the intercepts of the upper quantiles, which detected high species similarity when habitats are more similar. In this letter, we demonstrated the power of using quantile regressions applied to spectral distance decay to reveal species diversity patterns otherwise lost or underestimated by OLS regression. ?? 2008 IEEE.

  8. Quantile Regression Models for Current Status Data

    PubMed Central

    Ou, Fang-Shu; Zeng, Donglin; Cai, Jianwen

    2016-01-01

    Current status data arise frequently in demography, epidemiology, and econometrics where the exact failure time cannot be determined but is only known to have occurred before or after a known observation time. We propose a quantile regression model to analyze current status data, because it does not require distributional assumptions and the coefficients can be interpreted as direct regression effects on the distribution of failure time in the original time scale. Our model assumes that the conditional quantile of failure time is a linear function of covariates. We assume conditional independence between the failure time and observation time. An M-estimator is developed for parameter estimation which is computed using the concave-convex procedure and its confidence intervals are constructed using a subsampling method. Asymptotic properties for the estimator are derived and proven using modern empirical process theory. The small sample performance of the proposed method is demonstrated via simulation studies. Finally, we apply the proposed method to analyze data from the Mayo Clinic Study of Aging. PMID:27994307

  9. Efficient Regressions via Optimally Combining Quantile Information*

    PubMed Central

    Zhao, Zhibiao; Xiao, Zhijie

    2014-01-01

    We develop a generally applicable framework for constructing efficient estimators of regression models via quantile regressions. The proposed method is based on optimally combining information over multiple quantiles and can be applied to a broad range of parametric and nonparametric settings. When combining information over a fixed number of quantiles, we derive an upper bound on the distance between the efficiency of the proposed estimator and the Fisher information. As the number of quantiles increases, this upper bound decreases and the asymptotic variance of the proposed estimator approaches the Cramér-Rao lower bound under appropriate conditions. In the case of non-regular statistical estimation, the proposed estimator leads to super-efficient estimation. We illustrate the proposed method for several widely used regression models. Both asymptotic theory and Monte Carlo experiments show the superior performance over existing methods. PMID:25484481

  10. Modeling energy expenditure in children and adolescents using quantile regression

    PubMed Central

    Yang, Yunwen; Adolph, Anne L.; Puyau, Maurice R.; Vohra, Firoz A.; Zakeri, Issa F.

    2013-01-01

    Advanced mathematical models have the potential to capture the complex metabolic and physiological processes that result in energy expenditure (EE). Study objective is to apply quantile regression (QR) to predict EE and determine quantile-dependent variation in covariate effects in nonobese and obese children. First, QR models will be developed to predict minute-by-minute awake EE at different quantile levels based on heart rate (HR) and physical activity (PA) accelerometry counts, and child characteristics of age, sex, weight, and height. Second, the QR models will be used to evaluate the covariate effects of weight, PA, and HR across the conditional EE distribution. QR and ordinary least squares (OLS) regressions are estimated in 109 children, aged 5–18 yr. QR modeling of EE outperformed OLS regression for both nonobese and obese populations. Average prediction errors for QR compared with OLS were not only smaller at the median τ = 0.5 (18.6 vs. 21.4%), but also substantially smaller at the tails of the distribution (10.2 vs. 39.2% at τ = 0.1 and 8.7 vs. 19.8% at τ = 0.9). Covariate effects of weight, PA, and HR on EE for the nonobese and obese children differed across quantiles (P < 0.05). The associations (linear and quadratic) between PA and HR with EE were stronger for the obese than nonobese population (P < 0.05). In conclusion, QR provided more accurate predictions of EE compared with conventional OLS regression, especially at the tails of the distribution, and revealed substantially different covariate effects of weight, PA, and HR on EE in nonobese and obese children. PMID:23640591

  11. Estimating effects of limiting factors with regression quantiles

    USGS Publications Warehouse

    Cade, B.S.; Terrell, J.W.; Schroeder, R.L.

    1999-01-01

    In a recent Concepts paper in Ecology, Thomson et al. emphasized that assumptions of conventional correlation and regression analyses fundamentally conflict with the ecological concept of limiting factors, and they called for new statistical procedures to address this problem. The analytical issue is that unmeasured factors may be the active limiting constraint and may induce a pattern of unequal variation in the biological response variable through an interaction with the measured factors. Consequently, changes near the maxima, rather than at the center of response distributions, are better estimates of the effects expected when the observed factor is the active limiting constraint. Regression quantiles provide estimates for linear models fit to any part of a response distribution, including near the upper bounds, and require minimal assumptions about the form of the error distribution. Regression quantiles extend the concept of one-sample quantiles to the linear model by solving an optimization problem of minimizing an asymmetric function of absolute errors. Rank-score tests for regression quantiles provide tests of hypotheses and confidence intervals for parameters in linear models with heteroscedastic errors, conditions likely to occur in models of limiting ecological relations. We used selected regression quantiles (e.g., 5th, 10th, ..., 95th) and confidence intervals to test hypotheses that parameters equal zero for estimated changes in average annual acorn biomass due to forest canopy cover of oak (Quercus spp.) and oak species diversity. Regression quantiles also were used to estimate changes in glacier lily (Erythronium grandiflorum) seedling numbers as a function of lily flower numbers, rockiness, and pocket gopher (Thomomys talpoides fossor) activity, data that motivated the query by Thomson et al. for new statistical procedures. Both example applications showed that effects of limiting factors estimated by changes in some upper regression quantile (e

  12. Consistent model identification of varying coefficient quantile regression with BIC tuning parameter selection

    PubMed Central

    Zheng, Qi; Peng, Limin

    2016-01-01

    Quantile regression provides a flexible platform for evaluating covariate effects on different segments of the conditional distribution of response. As the effects of covariates may change with quantile level, contemporaneously examining a spectrum of quantiles is expected to have a better capacity to identify variables with either partial or full effects on the response distribution, as compared to focusing on a single quantile. Under this motivation, we study a general adaptively weighted LASSO penalization strategy in the quantile regression setting, where a continuum of quantile index is considered and coefficients are allowed to vary with quantile index. We establish the oracle properties of the resulting estimator of coefficient function. Furthermore, we formally investigate a BIC-type uniform tuning parameter selector and show that it can ensure consistent model selection. Our numerical studies confirm the theoretical findings and illustrate an application of the new variable selection procedure. PMID:28008212

  13. A gentle introduction to quantile regression for ecologists

    USGS Publications Warehouse

    Cade, B.S.; Noon, B.R.

    2003-01-01

    Quantile regression is a way to estimate the conditional quantiles of a response variable distribution in the linear model that provides a more complete view of possible causal relationships between variables in ecological processes. Typically, all the factors that affect ecological processes are not measured and included in the statistical models used to investigate relationships between variables associated with those processes. As a consequence, there may be a weak or no predictive relationship between the mean of the response variable (y) distribution and the measured predictive factors (X). Yet there may be stronger, useful predictive relationships with other parts of the response variable distribution. This primer relates quantile regression estimates to prediction intervals in parametric error distribution regression models (eg least squares), and discusses the ordering characteristics, interval nature, sampling variation, weighting, and interpretation of the estimates for homogeneous and heterogeneous regression models.

  14. Analysis of the Influence of Quantile Regression Model on Mainland Tourists' Service Satisfaction Performance

    PubMed Central

    Wang, Wen-Cheng; Cho, Wen-Chien; Chen, Yin-Jen

    2014-01-01

    It is estimated that mainland Chinese tourists travelling to Taiwan can bring annual revenues of 400 billion NTD to the Taiwan economy. Thus, how the Taiwanese Government formulates relevant measures to satisfy both sides is the focus of most concern. Taiwan must improve the facilities and service quality of its tourism industry so as to attract more mainland tourists. This paper conducted a questionnaire survey of mainland tourists and used grey relational analysis in grey mathematics to analyze the satisfaction performance of all satisfaction question items. The first eight satisfaction items were used as independent variables, and the overall satisfaction performance was used as a dependent variable for quantile regression model analysis to discuss the relationship between the dependent variable under different quantiles and independent variables. Finally, this study further discussed the predictive accuracy of the least mean regression model and each quantile regression model, as a reference for research personnel. The analysis results showed that other variables could also affect the overall satisfaction performance of mainland tourists, in addition to occupation and age. The overall predictive accuracy of quantile regression model Q0.25 was higher than that of the other three models. PMID:24574916

  15. Analysis of the influence of quantile regression model on mainland tourists' service satisfaction performance.

    PubMed

    Wang, Wen-Cheng; Cho, Wen-Chien; Chen, Yin-Jen

    2014-01-01

    It is estimated that mainland Chinese tourists travelling to Taiwan can bring annual revenues of 400 billion NTD to the Taiwan economy. Thus, how the Taiwanese Government formulates relevant measures to satisfy both sides is the focus of most concern. Taiwan must improve the facilities and service quality of its tourism industry so as to attract more mainland tourists. This paper conducted a questionnaire survey of mainland tourists and used grey relational analysis in grey mathematics to analyze the satisfaction performance of all satisfaction question items. The first eight satisfaction items were used as independent variables, and the overall satisfaction performance was used as a dependent variable for quantile regression model analysis to discuss the relationship between the dependent variable under different quantiles and independent variables. Finally, this study further discussed the predictive accuracy of the least mean regression model and each quantile regression model, as a reference for research personnel. The analysis results showed that other variables could also affect the overall satisfaction performance of mainland tourists, in addition to occupation and age. The overall predictive accuracy of quantile regression model Q0.25 was higher than that of the other three models.

  16. Variable Selection for Nonparametric Quantile Regression via Smoothing Spline AN OVA

    PubMed Central

    Lin, Chen-Yen; Bondell, Howard; Zhang, Hao Helen; Zou, Hui

    2014-01-01

    Quantile regression provides a more thorough view of the effect of covariates on a response. Nonparametric quantile regression has become a viable alternative to avoid restrictive parametric assumption. The problem of variable selection for quantile regression is challenging, since important variables can influence various quantiles in different ways. We tackle the problem via regularization in the context of smoothing spline ANOVA models. The proposed sparse nonparametric quantile regression (SNQR) can identify important variables and provide flexible estimates for quantiles. Our numerical study suggests the promising performance of the new procedure in variable selection and function estimation. Supplementary materials for this article are available online. PMID:24554792

  17. Quantile Regression in the Study of Developmental Sciences

    PubMed Central

    Petscher, Yaacov; Logan, Jessica A. R.

    2014-01-01

    Linear regression analysis is one of the most common techniques applied in developmental research, but only allows for an estimate of the average relations between the predictor(s) and the outcome. This study describes quantile regression, which provides estimates of the relations between the predictor(s) and outcome, but across multiple points of the outcome’s distribution. Using data from the High School and Beyond and U.S. Sustained Effects Study databases, quantile regression is demonstrated and contrasted with linear regression when considering models with: (a) one continuous predictor, (b) one dichotomous predictor, (c) a continuous and a dichotomous predictor, and (d) a longitudinal application. Results from each example exhibited the differential inferences which may be drawn using linear or quantile regression. PMID:24329596

  18. Quantile Regression with Censored Data

    ERIC Educational Resources Information Center

    Lin, Guixian

    2009-01-01

    The Cox proportional hazards model and the accelerated failure time model are frequently used in survival data analysis. They are powerful, yet have limitation due to their model assumptions. Quantile regression offers a semiparametric approach to model data with possible heterogeneity. It is particularly powerful for censored responses, where the…

  19. Shrinkage Estimation of Varying Covariate Effects Based On Quantile Regression

    PubMed Central

    Peng, Limin; Xu, Jinfeng; Kutner, Nancy

    2013-01-01

    Varying covariate effects often manifest meaningful heterogeneity in covariate-response associations. In this paper, we adopt a quantile regression model that assumes linearity at a continuous range of quantile levels as a tool to explore such data dynamics. The consideration of potential non-constancy of covariate effects necessitates a new perspective for variable selection, which, under the assumed quantile regression model, is to retain variables that have effects on all quantiles of interest as well as those that influence only part of quantiles considered. Current work on l1-penalized quantile regression either does not concern varying covariate effects or may not produce consistent variable selection in the presence of covariates with partial effects, a practical scenario of interest. In this work, we propose a shrinkage approach by adopting a novel uniform adaptive LASSO penalty. The new approach enjoys easy implementation without requiring smoothing. Moreover, it can consistently identify the true model (uniformly across quantiles) and achieve the oracle estimation efficiency. We further extend the proposed shrinkage method to the case where responses are subject to random right censoring. Numerical studies confirm the theoretical results and support the utility of our proposals. PMID:25332515

  20. A quantile regression model for failure-time data with time-dependent covariates

    PubMed Central

    Gorfine, Malka; Goldberg, Yair; Ritov, Ya’acov

    2017-01-01

    Summary Since survival data occur over time, often important covariates that we wish to consider also change over time. Such covariates are referred as time-dependent covariates. Quantile regression offers flexible modeling of survival data by allowing the covariates to vary with quantiles. This article provides a novel quantile regression model accommodating time-dependent covariates, for analyzing survival data subject to right censoring. Our simple estimation technique assumes the existence of instrumental variables. In addition, we present a doubly-robust estimator in the sense of Robins and Rotnitzky (1992, Recovery of information and adjustment for dependent censoring using surrogate markers. In: Jewell, N. P., Dietz, K. and Farewell, V. T. (editors), AIDS Epidemiology. Boston: Birkhaäuser, pp. 297–331.). The asymptotic properties of the estimators are rigorously studied. Finite-sample properties are demonstrated by a simulation study. The utility of the proposed methodology is demonstrated using the Stanford heart transplant dataset. PMID:27485534

  1. Boosting structured additive quantile regression for longitudinal childhood obesity data.

    PubMed

    Fenske, Nora; Fahrmeir, Ludwig; Hothorn, Torsten; Rzehak, Peter; Höhle, Michael

    2013-07-25

    Childhood obesity and the investigation of its risk factors has become an important public health issue. Our work is based on and motivated by a German longitudinal study including 2,226 children with up to ten measurements on their body mass index (BMI) and risk factors from birth to the age of 10 years. We introduce boosting of structured additive quantile regression as a novel distribution-free approach for longitudinal quantile regression. The quantile-specific predictors of our model include conventional linear population effects, smooth nonlinear functional effects, varying-coefficient terms, and individual-specific effects, such as intercepts and slopes. Estimation is based on boosting, a computer intensive inference method for highly complex models. We propose a component-wise functional gradient descent boosting algorithm that allows for penalized estimation of the large variety of different effects, particularly leading to individual-specific effects shrunken toward zero. This concept allows us to flexibly estimate the nonlinear age curves of upper quantiles of the BMI distribution, both on population and on individual-specific level, adjusted for further risk factors and to detect age-varying effects of categorical risk factors. Our model approach can be regarded as the quantile regression analog of Gaussian additive mixed models (or structured additive mean regression models), and we compare both model classes with respect to our obesity data.

  2. Non-stationary hydrologic frequency analysis using B-spline quantile regression

    NASA Astrophysics Data System (ADS)

    Nasri, B.; Bouezmarni, T.; St-Hilaire, A.; Ouarda, T. B. M. J.

    2017-11-01

    Hydrologic frequency analysis is commonly used by engineers and hydrologists to provide the basic information on planning, design and management of hydraulic and water resources systems under the assumption of stationarity. However, with increasing evidence of climate change, it is possible that the assumption of stationarity, which is prerequisite for traditional frequency analysis and hence, the results of conventional analysis would become questionable. In this study, we consider a framework for frequency analysis of extremes based on B-Spline quantile regression which allows to model data in the presence of non-stationarity and/or dependence on covariates with linear and non-linear dependence. A Markov Chain Monte Carlo (MCMC) algorithm was used to estimate quantiles and their posterior distributions. A coefficient of determination and Bayesian information criterion (BIC) for quantile regression are used in order to select the best model, i.e. for each quantile, we choose the degree and number of knots of the adequate B-spline quantile regression model. The method is applied to annual maximum and minimum streamflow records in Ontario, Canada. Climate indices are considered to describe the non-stationarity in the variable of interest and to estimate the quantiles in this case. The results show large differences between the non-stationary quantiles and their stationary equivalents for an annual maximum and minimum discharge with high annual non-exceedance probabilities.

  3. Statistical downscaling modeling with quantile regression using lasso to estimate extreme rainfall

    NASA Astrophysics Data System (ADS)

    Santri, Dewi; Wigena, Aji Hamim; Djuraidah, Anik

    2016-02-01

    Rainfall is one of the climatic elements with high diversity and has many negative impacts especially extreme rainfall. Therefore, there are several methods that required to minimize the damage that may occur. So far, Global circulation models (GCM) are the best method to forecast global climate changes include extreme rainfall. Statistical downscaling (SD) is a technique to develop the relationship between GCM output as a global-scale independent variables and rainfall as a local- scale response variable. Using GCM method will have many difficulties when assessed against observations because GCM has high dimension and multicollinearity between the variables. The common method that used to handle this problem is principal components analysis (PCA) and partial least squares regression. The new method that can be used is lasso. Lasso has advantages in simultaneuosly controlling the variance of the fitted coefficients and performing automatic variable selection. Quantile regression is a method that can be used to detect extreme rainfall in dry and wet extreme. Objective of this study is modeling SD using quantile regression with lasso to predict extreme rainfall in Indramayu. The results showed that the estimation of extreme rainfall (extreme wet in January, February and December) in Indramayu could be predicted properly by the model at quantile 90th.

  4. Linear Regression Quantile Mapping (RQM) - A new approach to bias correction with consistent quantile trends

    NASA Astrophysics Data System (ADS)

    Passow, Christian; Donner, Reik

    2017-04-01

    Quantile mapping (QM) is an established concept that allows to correct systematic biases in multiple quantiles of the distribution of a climatic observable. It shows remarkable results in correcting biases in historical simulations through observational data and outperforms simpler correction methods which relate only to the mean or variance. Since it has been shown that bias correction of future predictions or scenario runs with basic QM can result in misleading trends in the projection, adjusted, trend preserving, versions of QM were introduced in the form of detrended quantile mapping (DQM) and quantile delta mapping (QDM) (Cannon, 2015, 2016). Still, all previous versions and applications of QM based bias correction rely on the assumption of time-independent quantiles over the investigated period, which can be misleading in the context of a changing climate. Here, we propose a novel combination of linear quantile regression (QR) with the classical QM method to introduce a consistent, time-dependent and trend preserving approach of bias correction for historical and future projections. Since QR is a regression method, it is possible to estimate quantiles in the same resolution as the given data and include trends or other dependencies. We demonstrate the performance of the new method of linear regression quantile mapping (RQM) in correcting biases of temperature and precipitation products from historical runs (1959 - 2005) of the COSMO model in climate mode (CCLM) from the Euro-CORDEX ensemble relative to gridded E-OBS data of the same spatial and temporal resolution. A thorough comparison with established bias correction methods highlights the strengths and potential weaknesses of the new RQM approach. References: A.J. Cannon, S.R. Sorbie, T.Q. Murdock: Bias Correction of GCM Precipitation by Quantile Mapping - How Well Do Methods Preserve Changes in Quantiles and Extremes? Journal of Climate, 28, 6038, 2015 A.J. Cannon: Multivariate Bias Correction of Climate

  5. Spectral distance decay: Assessing species beta-diversity by quantile regression

    USGS Publications Warehouse

    Rocchinl, D.; Nagendra, H.; Ghate, R.; Cade, B.S.

    2009-01-01

    Remotely sensed data represents key information for characterizing and estimating biodiversity. Spectral distance among sites has proven to be a powerful approach for detecting species composition variability. Regression analysis of species similarity versus spectral distance may allow us to quantitatively estimate how beta-diversity in species changes with respect to spectral and ecological variability. In classical regression analysis, the residual sum of squares is minimized for the mean of the dependent variable distribution. However, many ecological datasets are characterized by a high number of zeroes that can add noise to the regression model. Quantile regression can be used to evaluate trend in the upper quantiles rather than a mean trend across the whole distribution of the dependent variable. In this paper, we used ordinary least square (ols) and quantile regression to estimate the decay of species similarity versus spectral distance. The achieved decay rates were statistically nonzero (p < 0.05) considering both ols and quantile regression. Nonetheless, ols regression estimate of mean decay rate was only half the decay rate indicated by the upper quantiles. Moreover, the intercept value, representing the similarity reached when spectral distance approaches zero, was very low compared with the intercepts of upper quantiles, which detected high species similarity when habitats are more similar. In this paper we demonstrated the power of using quantile regressions applied to spectral distance decay in order to reveal species diversity patterns otherwise lost or underestimated by ordinary least square regression. ?? 2009 American Society for Photogrammetry and Remote Sensing.

  6. Estimating risks to aquatic life using quantile regression

    USGS Publications Warehouse

    Schmidt, Travis S.; Clements, William H.; Cade, Brian S.

    2012-01-01

    One of the primary goals of biological assessment is to assess whether contaminants or other stressors limit the ecological potential of running waters. It is important to interpret responses to contaminants relative to other environmental factors, but necessity or convenience limit quantification of all factors that influence ecological potential. In these situations, the concept of limiting factors is useful for data interpretation. We used quantile regression to measure risks to aquatic life exposed to metals by including all regression quantiles (τ  =  0.05–0.95, by increments of 0.05), not just the upper limit of density (e.g., 90th quantile). We measured population densities (individuals/0.1 m2) of 2 mayflies (Rhithrogena spp., Drunella spp.) and a caddisfly (Arctopsyche grandis), aqueous metal mixtures (Cd, Cu, Zn), and other limiting factors (basin area, site elevation, discharge, temperature) at 125 streams in Colorado. We used a model selection procedure to test which factor was most limiting to density. Arctopsyche grandis was limited by other factors, whereas metals limited most quantiles of density for the 2 mayflies. Metals reduced mayfly densities most at sites where other factors were not limiting. Where other factors were limiting, low mayfly densities were observed despite metal concentrations. Metals affected mayfly densities most at quantiles above the mean and not just at the upper limit of density. Risk models developed from quantile regression showed that mayfly densities observed at background metal concentrations are improbable when metal mixtures are at US Environmental Protection Agency criterion continuous concentrations. We conclude that metals limit potential density, not realized average density. The most obvious effects on mayfly populations were at upper quantiles and not mean density. Therefore, we suggest that policy developed from mean-based measures of effects may not be as useful as policy based on the concept of

  7. Composite marginal quantile regression analysis for longitudinal adolescent body mass index data.

    PubMed

    Yang, Chi-Chuan; Chen, Yi-Hau; Chang, Hsing-Yi

    2017-09-20

    Childhood and adolescenthood overweight or obesity, which may be quantified through the body mass index (BMI), is strongly associated with adult obesity and other health problems. Motivated by the child and adolescent behaviors in long-term evolution (CABLE) study, we are interested in individual, family, and school factors associated with marginal quantiles of longitudinal adolescent BMI values. We propose a new method for composite marginal quantile regression analysis for longitudinal outcome data, which performs marginal quantile regressions at multiple quantile levels simultaneously. The proposed method extends the quantile regression coefficient modeling method introduced by Frumento and Bottai (Biometrics 2016; 72:74-84) to longitudinal data accounting suitably for the correlation structure in longitudinal observations. A goodness-of-fit test for the proposed modeling is also developed. Simulation results show that the proposed method can be much more efficient than the analysis without taking correlation into account and the analysis performing separate quantile regressions at different quantile levels. The application to the longitudinal adolescent BMI data from the CABLE study demonstrates the practical utility of our proposal. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.

  8. Bayesian quantile regression-based partially linear mixed-effects joint models for longitudinal data with multiple features.

    PubMed

    Zhang, Hanze; Huang, Yangxin; Wang, Wei; Chen, Henian; Langland-Orban, Barbara

    2017-01-01

    In longitudinal AIDS studies, it is of interest to investigate the relationship between HIV viral load and CD4 cell counts, as well as the complicated time effect. Most of common models to analyze such complex longitudinal data are based on mean-regression, which fails to provide efficient estimates due to outliers and/or heavy tails. Quantile regression-based partially linear mixed-effects models, a special case of semiparametric models enjoying benefits of both parametric and nonparametric models, have the flexibility to monitor the viral dynamics nonparametrically and detect the varying CD4 effects parametrically at different quantiles of viral load. Meanwhile, it is critical to consider various data features of repeated measurements, including left-censoring due to a limit of detection, covariate measurement error, and asymmetric distribution. In this research, we first establish a Bayesian joint models that accounts for all these data features simultaneously in the framework of quantile regression-based partially linear mixed-effects models. The proposed models are applied to analyze the Multicenter AIDS Cohort Study (MACS) data. Simulation studies are also conducted to assess the performance of the proposed methods under different scenarios.

  9. Predicting Word Reading Ability: A Quantile Regression Study

    ERIC Educational Resources Information Center

    McIlraith, Autumn L.

    2018-01-01

    Predictors of early word reading are well established. However, it is unclear if these predictors hold for readers across a range of word reading abilities. This study used quantile regression to investigate predictive relationships at different points in the distribution of word reading. Quantile regression analyses used preschool and…

  10. Quantile Regression for Recurrent Gap Time Data

    PubMed Central

    Luo, Xianghua; Huang, Chiung-Yu; Wang, Lan

    2014-01-01

    Summary Evaluating covariate effects on gap times between successive recurrent events is of interest in many medical and public health studies. While most existing methods for recurrent gap time analysis focus on modeling the hazard function of gap times, a direct interpretation of the covariate effects on the gap times is not available through these methods. In this article, we consider quantile regression that can provide direct assessment of covariate effects on the quantiles of the gap time distribution. Following the spirit of the weighted risk-set method by Luo and Huang (2011, Statistics in Medicine 30, 301–311), we extend the martingale-based estimating equation method considered by Peng and Huang (2008, Journal of the American Statistical Association 103, 637–649) for univariate survival data to analyze recurrent gap time data. The proposed estimation procedure can be easily implemented in existing software for univariate censored quantile regression. Uniform consistency and weak convergence of the proposed estimators are established. Monte Carlo studies demonstrate the effectiveness of the proposed method. An application to data from the Danish Psychiatric Central Register is presented to illustrate the methods developed in this article. PMID:23489055

  11. Modeling distributional changes in winter precipitation of Canada using Bayesian spatiotemporal quantile regression subjected to different teleconnections

    NASA Astrophysics Data System (ADS)

    Tan, Xuezhi; Gan, Thian Yew; Chen, Shu; Liu, Bingjun

    2018-05-01

    Climate change and large-scale climate patterns may result in changes in probability distributions of climate variables that are associated with changes in the mean and variability, and severity of extreme climate events. In this paper, we applied a flexible framework based on the Bayesian spatiotemporal quantile (BSTQR) model to identify climate changes at different quantile levels and their teleconnections to large-scale climate patterns such as El Niño-Southern Oscillation (ENSO), Pacific Decadal Oscillation (PDO), North Atlantic Oscillation (NAO) and Pacific-North American (PNA). Using the BSTQR model with time (year) as a covariate, we estimated changes in Canadian winter precipitation and their uncertainties at different quantile levels. There were some stations in eastern Canada showing distributional changes in winter precipitation such as an increase in low quantiles but a decrease in high quantiles. Because quantile functions in the BSTQR model vary with space and time and assimilate spatiotemporal precipitation data, the BSTQR model produced much spatially smoother and less uncertain quantile changes than the classic regression without considering spatiotemporal correlations. Using the BSTQR model with five teleconnection indices (i.e., SOI, PDO, PNA, NP and NAO) as covariates, we investigated effects of large-scale climate patterns on Canadian winter precipitation at different quantile levels. Winter precipitation responses to these five teleconnections were found to occur differently at different quantile levels. Effects of five teleconnections on Canadian winter precipitation were stronger at low and high than at medium quantile levels.

  12. Robust and efficient estimation with weighted composite quantile regression

    NASA Astrophysics Data System (ADS)

    Jiang, Xuejun; Li, Jingzhi; Xia, Tian; Yan, Wanfeng

    2016-09-01

    In this paper we introduce a weighted composite quantile regression (CQR) estimation approach and study its application in nonlinear models such as exponential models and ARCH-type models. The weighted CQR is augmented by using a data-driven weighting scheme. With the error distribution unspecified, the proposed estimators share robustness from quantile regression and achieve nearly the same efficiency as the oracle maximum likelihood estimator (MLE) for a variety of error distributions including the normal, mixed-normal, Student's t, Cauchy distributions, etc. We also suggest an algorithm for the fast implementation of the proposed methodology. Simulations are carried out to compare the performance of different estimators, and the proposed approach is used to analyze the daily S&P 500 Composite index, which verifies the effectiveness and efficiency of our theoretical results.

  13. Influences of spatial and temporal variation on fish-habitat relationships defined by regression quantiles

    Treesearch

    Jason B. Dunham; Brian S. Cade; James W. Terrell

    2002-01-01

    We used regression quantiles to model potentially limiting relationships between the standing crop of cutthroat trout Oncorhynchus clarki and measures of stream channel morphology. Regression quantile models indicated that variation in fish density was inversely related to the width:depth ratio of streams but not to stream width or depth alone. The...

  14. Applying quantile regression for modeling equivalent property damage only crashes to identify accident blackspots.

    PubMed

    Washington, Simon; Haque, Md Mazharul; Oh, Jutaek; Lee, Dongmin

    2014-05-01

    Hot spot identification (HSID) aims to identify potential sites-roadway segments, intersections, crosswalks, interchanges, ramps, etc.-with disproportionately high crash risk relative to similar sites. An inefficient HSID methodology might result in either identifying a safe site as high risk (false positive) or a high risk site as safe (false negative), and consequently lead to the misuse the available public funds, to poor investment decisions, and to inefficient risk management practice. Current HSID methods suffer from issues like underreporting of minor injury and property damage only (PDO) crashes, challenges of accounting for crash severity into the methodology, and selection of a proper safety performance function to model crash data that is often heavily skewed by a preponderance of zeros. Addressing these challenges, this paper proposes a combination of a PDO equivalency calculation and quantile regression technique to identify hot spots in a transportation network. In particular, issues related to underreporting and crash severity are tackled by incorporating equivalent PDO crashes, whilst the concerns related to the non-count nature of equivalent PDO crashes and the skewness of crash data are addressed by the non-parametric quantile regression technique. The proposed method identifies covariate effects on various quantiles of a population, rather than the population mean like most methods in practice, which more closely corresponds with how black spots are identified in practice. The proposed methodology is illustrated using rural road segment data from Korea and compared against the traditional EB method with negative binomial regression. Application of a quantile regression model on equivalent PDO crashes enables identification of a set of high-risk sites that reflect the true safety costs to the society, simultaneously reduces the influence of under-reported PDO and minor injury crashes, and overcomes the limitation of traditional NB model in dealing

  15. Quality of life in breast cancer patients--a quantile regression analysis.

    PubMed

    Pourhoseingholi, Mohamad Amin; Safaee, Azadeh; Moghimi-Dehkordi, Bijan; Zeighami, Bahram; Faghihzadeh, Soghrat; Tabatabaee, Hamid Reza; Pourhoseingholi, Asma

    2008-01-01

    Quality of life study has an important role in health care especially in chronic diseases, in clinical judgment and in medical resources supplying. Statistical tools like linear regression are widely used to assess the predictors of quality of life. But when the response is not normal the results are misleading. The aim of this study is to determine the predictors of quality of life in breast cancer patients, using quantile regression model and compare to linear regression. A cross-sectional study conducted on 119 breast cancer patients that admitted and treated in chemotherapy ward of Namazi hospital in Shiraz. We used QLQ-C30 questionnaire to assessment quality of life in these patients. A quantile regression was employed to assess the assocciated factors and the results were compared to linear regression. All analysis carried out using SAS. The mean score for the global health status for breast cancer patients was 64.92+/-11.42. Linear regression showed that only grade of tumor, occupational status, menopausal status, financial difficulties and dyspnea were statistically significant. In spite of linear regression, financial difficulties were not significant in quantile regression analysis and dyspnea was only significant for first quartile. Also emotion functioning and duration of disease statistically predicted the QOL score in the third quartile. The results have demonstrated that using quantile regression leads to better interpretation and richer inference about predictors of the breast cancer patient quality of life.

  16. Quantile regression in the presence of monotone missingness with sensitivity analysis

    PubMed Central

    Liu, Minzhao; Daniels, Michael J.; Perri, Michael G.

    2016-01-01

    In this paper, we develop methods for longitudinal quantile regression when there is monotone missingness. In particular, we propose pattern mixture models with a constraint that provides a straightforward interpretation of the marginal quantile regression parameters. Our approach allows sensitivity analysis which is an essential component in inference for incomplete data. To facilitate computation of the likelihood, we propose a novel way to obtain analytic forms for the required integrals. We conduct simulations to examine the robustness of our approach to modeling assumptions and compare its performance to competing approaches. The model is applied to data from a recent clinical trial on weight management. PMID:26041008

  17. Relationship between Urbanization and Cancer Incidence in Iran Using Quantile Regression.

    PubMed

    Momenyan, Somayeh; Sadeghifar, Majid; Sarvi, Fatemeh; Khodadost, Mahmoud; Mosavi-Jarrahi, Alireza; Ghaffari, Mohammad Ebrahim; Sekhavati, Eghbal

    2016-01-01

    Quantile regression is an efficient method for predicting and estimating the relationship between explanatory variables and percentile points of the response distribution, particularly for extreme percentiles of the distribution. To study the relationship between urbanization and cancer morbidity, we here applied quantile regression. This cross-sectional study was conducted for 9 cancers in 345 cities in 2007 in Iran. Data were obtained from the Ministry of Health and Medical Education and the relationship between urbanization and cancer morbidity was investigated using quantile regression and least square regression. Fitting models were compared using AIC criteria. R (3.0.1) software and the Quantreg package were used for statistical analysis. With the quantile regression model all percentiles for breast, colorectal, prostate, lung and pancreas cancers demonstrated increasing incidence rate with urbanization. The maximum increase for breast cancer was in the 90th percentile (β=0.13, p-value<0.001), for colorectal cancer was in the 75th percentile (β=0.048, p-value<0.001), for prostate cancer the 95th percentile (β=0.55, p-value<0.001), for lung cancer was in 95th percentile (β=0.52, p-value=0.006), for pancreas cancer was in 10th percentile (β=0.011, p-value<0.001). For gastric, esophageal and skin cancers, with increasing urbanization, the incidence rate was decreased. The maximum decrease for gastric cancer was in the 90th percentile(β=0.003, p-value<0.001), for esophageal cancer the 95th (β=0.04, p-value=0.4) and for skin cancer also the 95th (β=0.145, p-value=0.071). The AIC showed that for upper percentiles, the fitting of quantile regression was better than least square regression. According to the results of this study, the significant impact of urbanization on cancer morbidity requirs more effort and planning by policymakers and administrators in order to reduce risk factors such as pollution in urban areas and ensure proper nutrition

  18. Quantile regression for the statistical analysis of immunological data with many non-detects.

    PubMed

    Eilers, Paul H C; Röder, Esther; Savelkoul, Huub F J; van Wijk, Roy Gerth

    2012-07-07

    Immunological parameters are hard to measure. A well-known problem is the occurrence of values below the detection limit, the non-detects. Non-detects are a nuisance, because classical statistical analyses, like ANOVA and regression, cannot be applied. The more advanced statistical techniques currently available for the analysis of datasets with non-detects can only be used if a small percentage of the data are non-detects. Quantile regression, a generalization of percentiles to regression models, models the median or higher percentiles and tolerates very high numbers of non-detects. We present a non-technical introduction and illustrate it with an implementation to real data from a clinical trial. We show that by using quantile regression, groups can be compared and that meaningful linear trends can be computed, even if more than half of the data consists of non-detects. Quantile regression is a valuable addition to the statistical methods that can be used for the analysis of immunological datasets with non-detects.

  19. Regularized quantile regression for SNP marker estimation of pig growth curves.

    PubMed

    Barroso, L M A; Nascimento, M; Nascimento, A C C; Silva, F F; Serão, N V L; Cruz, C D; Resende, M D V; Silva, F L; Azevedo, C F; Lopes, P S; Guimarães, S E F

    2017-01-01

    Genomic growth curves are generally defined only in terms of population mean; an alternative approach that has not yet been exploited in genomic analyses of growth curves is the Quantile Regression (QR). This methodology allows for the estimation of marker effects at different levels of the variable of interest. We aimed to propose and evaluate a regularized quantile regression for SNP marker effect estimation of pig growth curves, as well as to identify the chromosome regions of the most relevant markers and to estimate the genetic individual weight trajectory over time (genomic growth curve) under different quantiles (levels). The regularized quantile regression (RQR) enabled the discovery, at different levels of interest (quantiles), of the most relevant markers allowing for the identification of QTL regions. We found the same relevant markers simultaneously affecting different growth curve parameters (mature weight and maturity rate): two (ALGA0096701 and ALGA0029483) for RQR(0.2), one (ALGA0096701) for RQR(0.5), and one (ALGA0003761) for RQR(0.8). Three average genomic growth curves were obtained and the behavior was explained by the curve in quantile 0.2, which differed from the others. RQR allowed for the construction of genomic growth curves, which is the key to identifying and selecting the most desirable animals for breeding purposes. Furthermore, the proposed model enabled us to find, at different levels of interest (quantiles), the most relevant markers for each trait (growth curve parameter estimates) and their respective chromosomal positions (identification of new QTL regions for growth curves in pigs). These markers can be exploited under the context of marker assisted selection while aiming to change the shape of pig growth curves.

  20. Estimating equivalence with quantile regression

    USGS Publications Warehouse

    Cade, B.S.

    2011-01-01

    Equivalence testing and corresponding confidence interval estimates are used to provide more enlightened statistical statements about parameter estimates by relating them to intervals of effect sizes deemed to be of scientific or practical importance rather than just to an effect size of zero. Equivalence tests and confidence interval estimates are based on a null hypothesis that a parameter estimate is either outside (inequivalence hypothesis) or inside (equivalence hypothesis) an equivalence region, depending on the question of interest and assignment of risk. The former approach, often referred to as bioequivalence testing, is often used in regulatory settings because it reverses the burden of proof compared to a standard test of significance, following a precautionary principle for environmental protection. Unfortunately, many applications of equivalence testing focus on establishing average equivalence by estimating differences in means of distributions that do not have homogeneous variances. I discuss how to compare equivalence across quantiles of distributions using confidence intervals on quantile regression estimates that detect differences in heterogeneous distributions missed by focusing on means. I used one-tailed confidence intervals based on inequivalence hypotheses in a two-group treatment-control design for estimating bioequivalence of arsenic concentrations in soils at an old ammunition testing site and bioequivalence of vegetation biomass at a reclaimed mining site. Two-tailed confidence intervals based both on inequivalence and equivalence hypotheses were used to examine quantile equivalence for negligible trends over time for a continuous exponential model of amphibian abundance. ?? 2011 by the Ecological Society of America.

  1. Goodness of Fit and Misspecification in Quantile Regressions

    ERIC Educational Resources Information Center

    Furno, Marilena

    2011-01-01

    The article considers a test of specification for quantile regressions. The test relies on the increase of the objective function and the worsening of the fit when unnecessary constraints are imposed. It compares the objective functions of restricted and unrestricted models and, in its different formulations, it verifies (a) forecast ability, (b)…

  2. Quantile regression reveals hidden bias and uncertainty in habitat models

    Treesearch

    Brian S. Cade; Barry R. Noon; Curtis H. Flather

    2005-01-01

    We simulated the effects of missing information on statistical distributions of animal response that covaried with measured predictors of habitat to evaluate the utility and performance of quantile regression for providing more useful intervals of uncertainty in habitat relationships. These procedures were evaulated for conditions in which heterogeneity and hidden bias...

  3. Heritability Across the Distribution: An Application of Quantile Regression

    PubMed Central

    Petrill, Stephen A.; Hart, Sara A.; Schatschneider, Christopher; Thompson, Lee A.; Deater-Deckard, Kirby; DeThorne, Laura S.; Bartlett, Christopher

    2016-01-01

    We introduce a new method for analyzing twin data called quantile regression. Through the application presented here, quantile regression is able to assess the genetic and environmental etiology of any skill or ability, at multiple points in the distribution of that skill or ability. This method is compared to the Cherny et al. (Behav Genet 22:153–162, 1992) method in an application to four different reading-related outcomes in 304 pairs of first-grade same sex twins enrolled in the Western Reserve Reading Project. Findings across the two methods were similar; both indicated some variation across the distribution of the genetic and shared environmental influences on non-word reading. However, quantile regression provides more details about the location and size of the measured effect. Applications of the technique are discussed. PMID:21877231

  4. Groundwater depth prediction in a shallow aquifer in north China by a quantile regression model

    NASA Astrophysics Data System (ADS)

    Li, Fawen; Wei, Wan; Zhao, Yong; Qiao, Jiale

    2017-01-01

    There is a close relationship between groundwater level in a shallow aquifer and the surface ecological environment; hence, it is important to accurately simulate and predict the groundwater level in eco-environmental construction projects. The multiple linear regression (MLR) model is one of the most useful methods to predict groundwater level (depth); however, the predicted values by this model only reflect the mean distribution of the observations and cannot effectively fit the extreme distribution data (outliers). The study reported here builds a prediction model of groundwater-depth dynamics in a shallow aquifer using the quantile regression (QR) method on the basis of the observed data of groundwater depth and related factors. The proposed approach was applied to five sites in Tianjin city, north China, and the groundwater depth was calculated in different quantiles, from which the optimal quantile was screened out according to the box plot method and compared to the values predicted by the MLR model. The results showed that the related factors in the five sites did not follow the standard normal distribution and that there were outliers in the precipitation and last-month (initial state) groundwater-depth factors because the basic assumptions of the MLR model could not be achieved, thereby causing errors. Nevertheless, these conditions had no effect on the QR model, as it could more effectively describe the distribution of original data and had a higher precision in fitting the outliers.

  5. Forecasting peak asthma admissions in London: an application of quantile regression models.

    PubMed

    Soyiri, Ireneous N; Reidpath, Daniel D; Sarran, Christophe

    2013-07-01

    Asthma is a chronic condition of great public health concern globally. The associated morbidity, mortality and healthcare utilisation place an enormous burden on healthcare infrastructure and services. This study demonstrates a multistage quantile regression approach to predicting excess demand for health care services in the form of asthma daily admissions in London, using retrospective data from the Hospital Episode Statistics, weather and air quality. Trivariate quantile regression models (QRM) of asthma daily admissions were fitted to a 14-day range of lags of environmental factors, accounting for seasonality in a hold-in sample of the data. Representative lags were pooled to form multivariate predictive models, selected through a systematic backward stepwise reduction approach. Models were cross-validated using a hold-out sample of the data, and their respective root mean square error measures, sensitivity, specificity and predictive values compared. Two of the predictive models were able to detect extreme number of daily asthma admissions at sensitivity levels of 76 % and 62 %, as well as specificities of 66 % and 76 %. Their positive predictive values were slightly higher for the hold-out sample (29 % and 28 %) than for the hold-in model development sample (16 % and 18 %). QRMs can be used in multistage to select suitable variables to forecast extreme asthma events. The associations between asthma and environmental factors, including temperature, ozone and carbon monoxide can be exploited in predicting future events using QRMs.

  6. Forecasting peak asthma admissions in London: an application of quantile regression models

    NASA Astrophysics Data System (ADS)

    Soyiri, Ireneous N.; Reidpath, Daniel D.; Sarran, Christophe

    2013-07-01

    Asthma is a chronic condition of great public health concern globally. The associated morbidity, mortality and healthcare utilisation place an enormous burden on healthcare infrastructure and services. This study demonstrates a multistage quantile regression approach to predicting excess demand for health care services in the form of asthma daily admissions in London, using retrospective data from the Hospital Episode Statistics, weather and air quality. Trivariate quantile regression models (QRM) of asthma daily admissions were fitted to a 14-day range of lags of environmental factors, accounting for seasonality in a hold-in sample of the data. Representative lags were pooled to form multivariate predictive models, selected through a systematic backward stepwise reduction approach. Models were cross-validated using a hold-out sample of the data, and their respective root mean square error measures, sensitivity, specificity and predictive values compared. Two of the predictive models were able to detect extreme number of daily asthma admissions at sensitivity levels of 76 % and 62 %, as well as specificities of 66 % and 76 %. Their positive predictive values were slightly higher for the hold-out sample (29 % and 28 %) than for the hold-in model development sample (16 % and 18 %). QRMs can be used in multistage to select suitable variables to forecast extreme asthma events. The associations between asthma and environmental factors, including temperature, ozone and carbon monoxide can be exploited in predicting future events using QRMs.

  7. Multiple imputation for cure rate quantile regression with censored data.

    PubMed

    Wu, Yuanshan; Yin, Guosheng

    2017-03-01

    The main challenge in the context of cure rate analysis is that one never knows whether censored subjects are cured or uncured, or whether they are susceptible or insusceptible to the event of interest. Considering the susceptible indicator as missing data, we propose a multiple imputation approach to cure rate quantile regression for censored data with a survival fraction. We develop an iterative algorithm to estimate the conditionally uncured probability for each subject. By utilizing this estimated probability and Bernoulli sample imputation, we can classify each subject as cured or uncured, and then employ the locally weighted method to estimate the quantile regression coefficients with only the uncured subjects. Repeating the imputation procedure multiple times and taking an average over the resultant estimators, we obtain consistent estimators for the quantile regression coefficients. Our approach relaxes the usual global linearity assumption, so that we can apply quantile regression to any particular quantile of interest. We establish asymptotic properties for the proposed estimators, including both consistency and asymptotic normality. We conduct simulation studies to assess the finite-sample performance of the proposed multiple imputation method and apply it to a lung cancer study as an illustration. © 2016, The International Biometric Society.

  8. Rank score and permutation testing alternatives for regression quantile estimates

    USGS Publications Warehouse

    Cade, B.S.; Richards, J.D.; Mielke, P.W.

    2006-01-01

    Performance of quantile rank score tests used for hypothesis testing and constructing confidence intervals for linear quantile regression estimates (0 ≤ τ ≤ 1) were evaluated by simulation for models with p = 2 and 6 predictors, moderate collinearity among predictors, homogeneous and hetero-geneous errors, small to moderate samples (n = 20–300), and central to upper quantiles (0.50–0.99). Test statistics evaluated were the conventional quantile rank score T statistic distributed as χ2 random variable with q degrees of freedom (where q parameters are constrained by H 0:) and an F statistic with its sampling distribution approximated by permutation. The permutation F-test maintained better Type I errors than the T-test for homogeneous error models with smaller n and more extreme quantiles τ. An F distributional approximation of the F statistic provided some improvements in Type I errors over the T-test for models with > 2 parameters, smaller n, and more extreme quantiles but not as much improvement as the permutation approximation. Both rank score tests required weighting to maintain correct Type I errors when heterogeneity under the alternative model increased to 5 standard deviations across the domain of X. A double permutation procedure was developed to provide valid Type I errors for the permutation F-test when null models were forced through the origin. Power was similar for conditions where both T- and F-tests maintained correct Type I errors but the F-test provided some power at smaller n and extreme quantiles when the T-test had no power because of excessively conservative Type I errors. When the double permutation scheme was required for the permutation F-test to maintain valid Type I errors, power was less than for the T-test with decreasing sample size and increasing quantiles. Confidence intervals on parameters and tolerance intervals for future predictions were constructed based on test inversion for an example application

  9. HIGHLIGHTING DIFFERENCES BETWEEN CONDITIONAL AND UNCONDITIONAL QUANTILE REGRESSION APPROACHES THROUGH AN APPLICATION TO ASSESS MEDICATION ADHERENCE

    PubMed Central

    BORAH, BIJAN J.; BASU, ANIRBAN

    2014-01-01

    The quantile regression (QR) framework provides a pragmatic approach in understanding the differential impacts of covariates along the distribution of an outcome. However, the QR framework that has pervaded the applied economics literature is based on the conditional quantile regression method. It is used to assess the impact of a covariate on a quantile of the outcome conditional on specific values of other covariates. In most cases, conditional quantile regression may generate results that are often not generalizable or interpretable in a policy or population context. In contrast, the unconditional quantile regression method provides more interpretable results as it marginalizes the effect over the distributions of other covariates in the model. In this paper, the differences between these two regression frameworks are highlighted, both conceptually and econometrically. Additionally, using real-world claims data from a large US health insurer, alternative QR frameworks are implemented to assess the differential impacts of covariates along the distribution of medication adherence among elderly patients with Alzheimer’s disease. PMID:23616446

  10. Simultaneous multiple non-crossing quantile regression estimation using kernel constraints

    PubMed Central

    Liu, Yufeng; Wu, Yichao

    2011-01-01

    Quantile regression (QR) is a very useful statistical tool for learning the relationship between the response variable and covariates. For many applications, one often needs to estimate multiple conditional quantile functions of the response variable given covariates. Although one can estimate multiple quantiles separately, it is of great interest to estimate them simultaneously. One advantage of simultaneous estimation is that multiple quantiles can share strength among them to gain better estimation accuracy than individually estimated quantile functions. Another important advantage of joint estimation is the feasibility of incorporating simultaneous non-crossing constraints of QR functions. In this paper, we propose a new kernel-based multiple QR estimation technique, namely simultaneous non-crossing quantile regression (SNQR). We use kernel representations for QR functions and apply constraints on the kernel coefficients to avoid crossing. Both unregularised and regularised SNQR techniques are considered. Asymptotic properties such as asymptotic normality of linear SNQR and oracle properties of the sparse linear SNQR are developed. Our numerical results demonstrate the competitive performance of our SNQR over the original individual QR estimation. PMID:22190842

  11. Spatial quantile regression using INLA with applications to childhood overweight in Malawi.

    PubMed

    Mtambo, Owen P L; Masangwi, Salule J; Kazembe, Lawrence N M

    2015-04-01

    Analyses of childhood overweight have mainly used mean regression. However, using quantile regression is more appropriate as it provides flexibility to analyse the determinants of overweight corresponding to quantiles of interest. The main objective of this study was to fit a Bayesian additive quantile regression model with structured spatial effects for childhood overweight in Malawi using the 2010 Malawi DHS data. Inference was fully Bayesian using R-INLA package. The significant determinants of childhood overweight ranged from socio-demographic factors such as type of residence to child and maternal factors such as child age and maternal BMI. We observed significant positive structured spatial effects on childhood overweight in some districts of Malawi. We recommended that the childhood malnutrition policy makers should consider timely interventions based on risk factors as identified in this paper including spatial targets of interventions. Copyright © 2015 Elsevier Ltd. All rights reserved.

  12. Birthweight Related Factors in Northwestern Iran: Using Quantile Regression Method.

    PubMed

    Fallah, Ramazan; Kazemnejad, Anoshirvan; Zayeri, Farid; Shoghli, Alireza

    2015-11-18

    Birthweight is one of the most important predicting indicators of the health status in adulthood. Having a balanced birthweight is one of the priorities of the health system in most of the industrial and developed countries. This indicator is used to assess the growth and health status of the infants. The aim of this study was to assess the birthweight of the neonates by using quantile regression in Zanjan province. This analytical descriptive study was carried out using pre-registered (March 2010 - March 2012) data of neonates in urban/rural health centers of Zanjan province using multiple-stage cluster sampling. Data were analyzed using multiple linear regressions andquantile regression method and SAS 9.2 statistical software. From 8456 newborn baby, 4146 (49%) were female. The mean age of the mothers was 27.1±5.4 years. The mean birthweight of the neonates was 3104 ± 431 grams. Five hundred and seventy-three patients (6.8%) of the neonates were less than 2500 grams. In all quantiles, gestational age of neonates (p<0.05), weight and educational level of the mothers (p<0.05) showed a linear significant relationship with the i of the neonates. However, sex and birth rank of the neonates, mothers age, place of residence (urban/rural) and career were not significant in all quantiles (p>0.05). This study revealed the results of multiple linear regression and quantile regression were not identical. We strictly recommend the use of quantile regression when an asymmetric response variable or data with outliers is available.

  13. Birthweight Related Factors in Northwestern Iran: Using Quantile Regression Method

    PubMed Central

    Fallah, Ramazan; Kazemnejad, Anoshirvan; Zayeri, Farid; Shoghli, Alireza

    2016-01-01

    Introduction: Birthweight is one of the most important predicting indicators of the health status in adulthood. Having a balanced birthweight is one of the priorities of the health system in most of the industrial and developed countries. This indicator is used to assess the growth and health status of the infants. The aim of this study was to assess the birthweight of the neonates by using quantile regression in Zanjan province. Methods: This analytical descriptive study was carried out using pre-registered (March 2010 - March 2012) data of neonates in urban/rural health centers of Zanjan province using multiple-stage cluster sampling. Data were analyzed using multiple linear regressions andquantile regression method and SAS 9.2 statistical software. Results: From 8456 newborn baby, 4146 (49%) were female. The mean age of the mothers was 27.1±5.4 years. The mean birthweight of the neonates was 3104 ± 431 grams. Five hundred and seventy-three patients (6.8%) of the neonates were less than 2500 grams. In all quantiles, gestational age of neonates (p<0.05), weight and educational level of the mothers (p<0.05) showed a linear significant relationship with the i of the neonates. However, sex and birth rank of the neonates, mothers age, place of residence (urban/rural) and career were not significant in all quantiles (p>0.05). Conclusion: This study revealed the results of multiple linear regression and quantile regression were not identical. We strictly recommend the use of quantile regression when an asymmetric response variable or data with outliers is available. PMID:26925889

  14. Hospital charges associated with motorcycle crash factors: a quantile regression analysis.

    PubMed

    Olsen, Cody S; Thomas, Andrea M; Cook, Lawrence J

    2014-08-01

    Previous studies of motorcycle crash (MC) related hospital charges use trauma registries and hospital records, and do not adjust for the number of motorcyclists not requiring medical attention. This may lead to conservative estimates of helmet use effectiveness. MC records were probabilistically linked with emergency department and hospital records to obtain total hospital charges. Missing data were imputed. Multivariable quantile regression estimated reductions in hospital charges associated with helmet use and other crash factors. Motorcycle helmets were associated with reduced median hospital charges of $256 (42% reduction) and reduced 98th percentile of $32,390 (33% reduction). After adjusting for other factors, helmets were associated with reductions in charges in all upper percentiles studied. Quantile regression models described homogenous and heterogeneous associations between other crash factors and charges. Quantile regression comprehensively describes associations between crash factors and hospital charges. Helmet use among motorcyclists is associated with decreased hospital charges. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.

  15. Highlighting differences between conditional and unconditional quantile regression approaches through an application to assess medication adherence.

    PubMed

    Borah, Bijan J; Basu, Anirban

    2013-09-01

    The quantile regression (QR) framework provides a pragmatic approach in understanding the differential impacts of covariates along the distribution of an outcome. However, the QR framework that has pervaded the applied economics literature is based on the conditional quantile regression method. It is used to assess the impact of a covariate on a quantile of the outcome conditional on specific values of other covariates. In most cases, conditional quantile regression may generate results that are often not generalizable or interpretable in a policy or population context. In contrast, the unconditional quantile regression method provides more interpretable results as it marginalizes the effect over the distributions of other covariates in the model. In this paper, the differences between these two regression frameworks are highlighted, both conceptually and econometrically. Additionally, using real-world claims data from a large US health insurer, alternative QR frameworks are implemented to assess the differential impacts of covariates along the distribution of medication adherence among elderly patients with Alzheimer's disease. Copyright © 2013 John Wiley & Sons, Ltd.

  16. Analysis of the labor productivity of enterprises via quantile regression

    NASA Astrophysics Data System (ADS)

    Türkan, Semra

    2017-07-01

    In this study, we have analyzed the factors that affect the performance of Turkey's Top 500 Industrial Enterprises using quantile regression. The variable about labor productivity of enterprises is considered as dependent variable, the variableabout assets is considered as independent variable. The distribution of labor productivity of enterprises is right-skewed. If the dependent distribution is skewed, linear regression could not catch important aspects of the relationships between the dependent variable and its predictors due to modeling only the conditional mean. Hence, the quantile regression, which allows modelingany quantilesof the dependent distribution, including the median,appears to be useful. It examines whether relationships between dependent and independent variables are different for low, medium, and high percentiles. As a result of analyzing data, the effect of total assets is relatively constant over the entire distribution, except the upper tail. It hasa moderately stronger effect in the upper tail.

  17. Influences of spatial and temporal variation on fish-habitat relationships defined by regression quantiles

    USGS Publications Warehouse

    Dunham, J.B.; Cade, B.S.; Terrell, J.W.

    2002-01-01

    We used regression quantiles to model potentially limiting relationships between the standing crop of cutthroat trout Oncorhynchus clarki and measures of stream channel morphology. Regression quantile models indicated that variation in fish density was inversely related to the width:depth ratio of streams but not to stream width or depth alone. The spatial and temporal stability of model predictions were examined across years and streams, respectively. Variation in fish density with width:depth ratio (10th-90th regression quantiles) modeled for streams sampled in 1993-1997 predicted the variation observed in 1998-1999, indicating similar habitat relationships across years. Both linear and nonlinear models described the limiting relationships well, the latter performing slightly better. Although estimated relationships were transferable in time, results were strongly dependent on the influence of spatial variation in fish density among streams. Density changes with width:depth ratio in a single stream were responsible for the significant (P < 0.10) negative slopes estimated for the higher quantiles (>80th). This suggests that stream-scale factors other than width:depth ratio play a more direct role in determining population density. Much of the variation in densities of cutthroat trout among streams was attributed to the occurrence of nonnative brook trout Salvelinus fontinalis (a possible competitor) or connectivity to migratory habitats. Regression quantiles can be useful for estimating the effects of limiting factors when ecological responses are highly variable, but our results indicate that spatiotemporal variability in the data should be explicitly considered. In this study, data from individual streams and stream-specific characteristics (e.g., the occurrence of nonnative species and habitat connectivity) strongly affected our interpretation of the relationship between width:depth ratio and fish density.

  18. Principles of Quantile Regression and an Application

    ERIC Educational Resources Information Center

    Chen, Fang; Chalhoub-Deville, Micheline

    2014-01-01

    Newer statistical procedures are typically introduced to help address the limitations of those already in practice or to deal with emerging research needs. Quantile regression (QR) is introduced in this paper as a relatively new methodology, which is intended to overcome some of the limitations of least squares mean regression (LMR). QR is more…

  19. Interquantile Shrinkage in Regression Models

    PubMed Central

    Jiang, Liewen; Wang, Huixia Judy; Bondell, Howard D.

    2012-01-01

    Conventional analysis using quantile regression typically focuses on fitting the regression model at different quantiles separately. However, in situations where the quantile coefficients share some common feature, joint modeling of multiple quantiles to accommodate the commonality often leads to more efficient estimation. One example of common features is that a predictor may have a constant effect over one region of quantile levels but varying effects in other regions. To automatically perform estimation and detection of the interquantile commonality, we develop two penalization methods. When the quantile slope coefficients indeed do not change across quantile levels, the proposed methods will shrink the slopes towards constant and thus improve the estimation efficiency. We establish the oracle properties of the two proposed penalization methods. Through numerical investigations, we demonstrate that the proposed methods lead to estimations with competitive or higher efficiency than the standard quantile regression estimation in finite samples. Supplemental materials for the article are available online. PMID:24363546

  20. Quantile Regression in the Study of Developmental Sciences

    ERIC Educational Resources Information Center

    Petscher, Yaacov; Logan, Jessica A. R.

    2014-01-01

    Linear regression analysis is one of the most common techniques applied in developmental research, but only allows for an estimate of the average relations between the predictor(s) and the outcome. This study describes quantile regression, which provides estimates of the relations between the predictor(s) and outcome, but across multiple points of…

  1. Using Gamma and Quantile Regressions to Explore the Association between Job Strain and Adiposity in the ELSA-Brasil Study: Does Gender Matter?

    PubMed

    Fonseca, Maria de Jesus Mendes da; Juvanhol, Leidjaira Lopes; Rotenberg, Lúcia; Nobre, Aline Araújo; Griep, Rosane Härter; Alves, Márcia Guimarães de Mello; Cardoso, Letícia de Oliveira; Giatti, Luana; Nunes, Maria Angélica; Aquino, Estela M L; Chor, Dóra

    2017-11-17

    This paper explores the association between job strain and adiposity, using two statistical analysis approaches and considering the role of gender. The research evaluated 11,960 active baseline participants (2008-2010) in the ELSA-Brasil study. Job strain was evaluated through a demand-control questionnaire, while body mass index (BMI) and waist circumference (WC) were evaluated in continuous form. The associations were estimated using gamma regression models with an identity link function. Quantile regression models were also estimated from the final set of co-variables established by gamma regression. The relationship that was found varied by analytical approach and gender. Among the women, no association was observed between job strain and adiposity in the fitted gamma models. In the quantile models, a pattern of increasing effects of high strain was observed at higher BMI and WC distribution quantiles. Among the men, high strain was associated with adiposity in the gamma regression models. However, when quantile regression was used, that association was found not to be homogeneous across outcome distributions. In addition, in the quantile models an association was observed between active jobs and BMI. Our results point to an association between job strain and adiposity, which follows a heterogeneous pattern. Modelling strategies can produce different results and should, accordingly, be used to complement one another.

  2. Contrasting OLS and Quantile Regression Approaches to Student "Growth" Percentiles

    ERIC Educational Resources Information Center

    Castellano, Katherine Elizabeth; Ho, Andrew Dean

    2013-01-01

    Regression methods can locate student test scores in a conditional distribution, given past scores. This article contrasts and clarifies two approaches to describing these locations in terms of readily interpretable percentile ranks or "conditional status percentile ranks." The first is Betebenner's quantile regression approach that results in…

  3. Regional trends in short-duration precipitation extremes: a flexible multivariate monotone quantile regression approach

    NASA Astrophysics Data System (ADS)

    Cannon, Alex

    2017-04-01

    univariate technique, and cannot incorporate information from additional covariates, for example ENSO state or physiographic controls on extreme rainfall within a region. Here, the univariate MQR model is extended to allow the use of multiple covariates. Multivariate monotone quantile regression (MMQR) is based on a single hidden-layer feedforward network with the quantile regression error function and partial monotonicity constraints. The MMQR model is demonstrated via Monte Carlo simulations and the estimation and visualization of regional trends in moderate rainfall extremes based on homogenized sub-daily precipitation data at stations in Canada.

  4. Comparing least-squares and quantile regression approaches to analyzing median hospital charges.

    PubMed

    Olsen, Cody S; Clark, Amy E; Thomas, Andrea M; Cook, Lawrence J

    2012-07-01

    Emergency department (ED) and hospital charges obtained from administrative data sets are useful descriptors of injury severity and the burden to EDs and the health care system. However, charges are typically positively skewed due to costly procedures, long hospital stays, and complicated or prolonged treatment for few patients. The median is not affected by extreme observations and is useful in describing and comparing distributions of hospital charges. A least-squares analysis employing a log transformation is one approach for estimating median hospital charges, corresponding confidence intervals (CIs), and differences between groups; however, this method requires certain distributional properties. An alternate method is quantile regression, which allows estimation and inference related to the median without making distributional assumptions. The objective was to compare the log-transformation least-squares method to the quantile regression approach for estimating median hospital charges, differences in median charges between groups, and associated CIs. The authors performed simulations using repeated sampling of observed statewide ED and hospital charges and charges randomly generated from a hypothetical lognormal distribution. The median and 95% CI and the multiplicative difference between the median charges of two groups were estimated using both least-squares and quantile regression methods. Performance of the two methods was evaluated. In contrast to least squares, quantile regression produced estimates that were unbiased and had smaller mean square errors in simulations of observed ED and hospital charges. Both methods performed well in simulations of hypothetical charges that met least-squares method assumptions. When the data did not follow the assumed distribution, least-squares estimates were often biased, and the associated CIs had lower than expected coverage as sample size increased. Quantile regression analyses of hospital charges provide unbiased

  5. Estimating normative limits of Heidelberg Retina Tomograph optic disc rim area with quantile regression.

    PubMed

    Artes, Paul H; Crabb, David P

    2010-01-01

    To investigate why the specificity of the Moorfields Regression Analysis (MRA) of the Heidelberg Retina Tomograph (HRT) varies with disc size, and to derive accurate normative limits for neuroretinal rim area to address this problem. Two datasets from healthy subjects (Manchester, UK, n = 88; Halifax, Nova Scotia, Canada, n = 75) were used to investigate the physiological relationship between the optic disc and neuroretinal rim area. Normative limits for rim area were derived by quantile regression (QR) and compared with those of the MRA (derived by linear regression). Logistic regression analyses were performed to quantify the association between disc size and positive classifications with the MRA, as well as with the QR-derived normative limits. In both datasets, the specificity of the MRA depended on optic disc size. The odds of observing a borderline or outside-normal-limits classification increased by approximately 10% for each 0.1 mm(2) increase in disc area (P < 0.1). The lower specificity of the MRA with large optic discs could be explained by the failure of linear regression to model the extremes of the rim area distribution (observations far from the mean). In comparison, the normative limits predicted by QR were larger for smaller discs (less specific, more sensitive), and smaller for larger discs, such that false-positive rates became independent of optic disc size. Normative limits derived by quantile regression appear to remove the size-dependence of specificity with the MRA. Because quantile regression does not rely on the restrictive assumptions of standard linear regression, it may be a more appropriate method for establishing normative limits in other clinical applications where the underlying distributions are nonnormal or have nonconstant variance.

  6. A quantile count model of water depth constraints on Cape Sable seaside sparrows

    USGS Publications Warehouse

    Cade, B.S.; Dong, Q.

    2008-01-01

    1. A quantile regression model for counts of breeding Cape Sable seaside sparrows Ammodramus maritimus mirabilis (L.) as a function of water depth and previous year abundance was developed based on extensive surveys, 1992-2005, in the Florida Everglades. The quantile count model extends linear quantile regression methods to discrete response variables, providing a flexible alternative to discrete parametric distributional models, e.g. Poisson, negative binomial and their zero-inflated counterparts. 2. Estimates from our multiplicative model demonstrated that negative effects of increasing water depth in breeding habitat on sparrow numbers were dependent on recent occupation history. Upper 10th percentiles of counts (one to three sparrows) decreased with increasing water depth from 0 to 30 cm when sites were not occupied in previous years. However, upper 40th percentiles of counts (one to six sparrows) decreased with increasing water depth for sites occupied in previous years. 3. Greatest decreases (-50% to -83%) in upper quantiles of sparrow counts occurred as water depths increased from 0 to 15 cm when previous year counts were 1, but a small proportion of sites (5-10%) held at least one sparrow even as water depths increased to 20 or 30 cm. 4. A zero-inflated Poisson regression model provided estimates of conditional means that also decreased with increasing water depth but rates of change were lower and decreased with increasing previous year counts compared to the quantile count model. Quantiles computed for the zero-inflated Poisson model enhanced interpretation of this model but had greater lack-of-fit for water depths > 0 cm and previous year counts 1, conditions where the negative effect of water depths were readily apparent and fitted better with the quantile count model.

  7. Modeling soil organic carbon with Quantile Regression: Dissecting predictors' effects on carbon stocks

    NASA Astrophysics Data System (ADS)

    Lombardo, Luigi; Saia, Sergio; Schillaci, Calogero; Mai, P. Martin; Huser, Raphaël

    2018-05-01

    Soil Organic Carbon (SOC) estimation is crucial to manage both natural and anthropic ecosystems and has recently been put under the magnifying glass after the Paris agreement 2016 due to its relationship with greenhouse gas. Statistical applications have dominated the SOC stock mapping at regional scale so far. However, the community has hardly ever attempted to implement Quantile Regression (QR) to spatially predict the SOC distribution. In this contribution, we test QR to estimate SOC stock (0-30 $cm$ depth) in the agricultural areas of a highly variable semi-arid region (Sicily, Italy, around 25,000 $km2$) by using topographic and remotely sensed predictors. We also compare the results with those from available SOC stock measurement. The QR models produced robust performances and allowed to recognize dominant effects among the predictors with respect to the considered quantile. This information, currently lacking, suggests that QR can discern predictor influences on SOC stock at specific sub-domains of each predictors. In this work, the predictive map generated at the median shows lower errors than those of the Joint Research Centre and International Soil Reference, and Information Centre benchmarks. The results suggest the use of QR as a comprehensive and effective method to map SOC using legacy data in agro-ecosystems. The R code scripted in this study for QR is included.

  8. Quantile regression analyses of associated factors for body mass index in Korean adolescents.

    PubMed

    Kim, T H; Lee, E K; Han, E

    2015-05-01

    This study examined the influence of home and school environments, and individual health-risk behaviours on body weight outcomes in Korean adolescents. This was a cross-sectional observational study. Quantile regression models to explore heterogeneity in the association of specific factors with body mass index (BMI) over the entire conditional BMI distribution was used. A nationally representative web-based survey for youths was used. Paternal education level of college or more education was associated with lower BMI for girls, whereas college or more education of mothers was associated with higher BMI for boys; for both, the magnitude of association became larger at the upper quantiles of the conditional BMI distribution. Girls with good family economic status were more likely to have higher BMIs than those with average family economic status, particularly at the upper quantile of the conditional BMI distribution. Attending a co-ed school was associated with lower BMI for both genders with a larger association at the upper quantiles. Substantial screen time for TV watching, video games, or internet surfing was associated with a higher BMI with a larger association at the upper quantiles for both girls and boys. Dental prevention was negatively associated with BMI, whereas suicide consideration was positively associated with BMIs of both genders with a larger association at a higher quantile. These findings suggest that interventions aimed at behavioural changes and positive parental roles are needed to effectively address high adolescent BMI. Copyright © 2015 The Royal Society for Public Health. Published by Elsevier Ltd. All rights reserved.

  9. A quantile regression approach can reveal the effect of fruit and vegetable consumption on plasma homocysteine levels.

    PubMed

    Verly, Eliseu; Steluti, Josiane; Fisberg, Regina Mara; Marchioni, Dirce Maria Lobo

    2014-01-01

    A reduction in homocysteine concentration due to the use of supplemental folic acid is well recognized, although evidence of the same effect for natural folate sources, such as fruits and vegetables (FV), is lacking. The traditional statistical analysis approaches do not provide further information. As an alternative, quantile regression allows for the exploration of the effects of covariates through percentiles of the conditional distribution of the dependent variable. To investigate how the associations of FV intake with plasma total homocysteine (tHcy) differ through percentiles in the distribution using quantile regression. A cross-sectional population-based survey was conducted among 499 residents of Sao Paulo City, Brazil. The participants provided food intake and fasting blood samples. Fruit and vegetable intake was predicted by adjusting for day-to-day variation using a proper measurement error model. We performed a quantile regression to verify the association between tHcy and the predicted FV intake. The predicted values of tHcy for each percentile model were calculated considering an increase of 200 g in the FV intake for each percentile. The results showed that tHcy was inversely associated with FV intake when assessed by linear regression whereas, the association was different when using quantile regression. The relationship with FV consumption was inverse and significant for almost all percentiles of tHcy. The coefficients increased as the percentile of tHcy increased. A simulated increase of 200 g in the FV intake could decrease the tHcy levels in the overall percentiles, but the higher percentiles of tHcy benefited more. This study confirms that the effect of FV intake on lowering the tHcy levels is dependent on the level of tHcy using an innovative statistical approach. From a public health point of view, encouraging people to increase FV intake would benefit people with high levels of tHcy.

  10. Finite-sample and asymptotic sign-based tests for parameters of non-linear quantile regression with Markov noise

    NASA Astrophysics Data System (ADS)

    Sirenko, M. A.; Tarasenko, P. F.; Pushkarev, M. I.

    2017-01-01

    One of the most noticeable features of sign-based statistical procedures is an opportunity to build an exact test for simple hypothesis testing of parameters in a regression model. In this article, we expanded a sing-based approach to the nonlinear case with dependent noise. The examined model is a multi-quantile regression, which makes it possible to test hypothesis not only of regression parameters, but of noise parameters as well.

  11. Logistic quantile regression provides improved estimates for bounded avian counts: A case study of California Spotted Owl fledgling production

    USGS Publications Warehouse

    Cade, Brian S.; Noon, Barry R.; Scherer, Rick D.; Keane, John J.

    2017-01-01

    Counts of avian fledglings, nestlings, or clutch size that are bounded below by zero and above by some small integer form a discrete random variable distribution that is not approximated well by conventional parametric count distributions such as the Poisson or negative binomial. We developed a logistic quantile regression model to provide estimates of the empirical conditional distribution of a bounded discrete random variable. The logistic quantile regression model requires that counts are randomly jittered to a continuous random variable, logit transformed to bound them between specified lower and upper values, then estimated in conventional linear quantile regression, repeating the 3 steps and averaging estimates. Back-transformation to the original discrete scale relies on the fact that quantiles are equivariant to monotonic transformations. We demonstrate this statistical procedure by modeling 20 years of California Spotted Owl fledgling production (0−3 per territory) on the Lassen National Forest, California, USA, as related to climate, demographic, and landscape habitat characteristics at territories. Spotted Owl fledgling counts increased nonlinearly with decreasing precipitation in the early nesting period, in the winter prior to nesting, and in the prior growing season; with increasing minimum temperatures in the early nesting period; with adult compared to subadult parents; when there was no fledgling production in the prior year; and when percentage of the landscape surrounding nesting sites (202 ha) with trees ≥25 m height increased. Changes in production were primarily driven by changes in the proportion of territories with 2 or 3 fledglings. Average variances of the discrete cumulative distributions of the estimated fledgling counts indicated that temporal changes in climate and parent age class explained 18% of the annual variance in owl fledgling production, which was 34% of the total variance. Prior fledgling production explained as much of

  12. Identifying the Safety Factors over Traffic Signs in State Roads using a Panel Quantile Regression Approach.

    PubMed

    Šarić, Željko; Xu, Xuecai; Duan, Li; Babić, Darko

    2018-06-20

    This study intended to investigate the interactions between accident rate and traffic signs in state roads located in Croatia, and accommodate the heterogeneity attributed to unobserved factors. The data from 130 state roads between 2012 and 2016 were collected from Traffic Accident Database System maintained by the Republic of Croatia Ministry of the Interior. To address the heterogeneity, a panel quantile regression model was proposed, in which quantile regression model offers a more complete view and a highly comprehensive analysis of the relationship between accident rate and traffic signs, while the panel data model accommodates the heterogeneity attributed to unobserved factors. Results revealed that (1) low visibility of material damage (MD) and death or injured (DI) increased the accident rate; (2) the number of mandatory signs and the number of warning signs were more likely to reduce the accident rate; (3)average speed limit and the number of invalid traffic signs per km exhibited a high accident rate. To our knowledge, it's the first attempt to analyze the interactions between accident consequences and traffic signs by employing a panel quantile regression model; by involving the visibility, the present study demonstrates that the low visibility causes a relatively higher risk of MD and DI; It is noteworthy that average speed limit corresponds with accident rate positively; The number of mandatory signs and the number of warning signs are more likely to reduce the accident rate; The number of invalid traffic signs per km are significant for accident rate, thus regular maintenance should be kept for a safer roadway environment.

  13. [Spatial heterogeneity in body condition of small yellow croaker in Yellow Sea and East China Sea based on mixed-effects model and quantile regression analysis].

    PubMed

    Liu, Zun-Lei; Yuan, Xing-Wei; Yan, Li-Ping; Yang, Lin-Lin; Cheng, Jia-Hua

    2013-09-01

    By using the 2008-2010 investigation data about the body condition of small yellow croaker in the offshore waters of southern Yellow Sea (SYS), open waters of northern East China Sea (NECS), and offshore waters of middle East China Sea (MECS), this paper analyzed the spatial heterogeneity of body length-body mass of juvenile and adult small yellow croakers by the statistical approaches of mean regression model and quantile regression model. The results showed that the residual standard errors from the analysis of covariance (ANCOVA) and the linear mixed-effects model were similar, and those from the simple linear regression were the highest. For the juvenile small yellow croakers, their mean body mass in SYS and NECS estimated by the mixed-effects mean regression model was higher than the overall average mass across the three regions, while the mean body mass in MECS was below the overall average. For the adult small yellow croakers, their mean body mass in NECS was higher than the overall average, while the mean body mass in SYS and MECS was below the overall average. The results from quantile regression indicated the substantial differences in the allometric relationships of juvenile small yellow croakers between SYS, NECS, and MECS, with the estimated mean exponent of the allometric relationship in SYS being 2.85, and the interquartile range being from 2.63 to 2.96, which indicated the heterogeneity of body form. The results from ANCOVA showed that the allometric body length-body mass relationships were significantly different between the 25th and 75th percentile exponent values (F=6.38, df=1737, P<0.01) and the 25th percentile and median exponent values (F=2.35, df=1737, P=0.039). The relationship was marginally different between the median and 75th percentile exponent values (F=2.21, df=1737, P=0.051). The estimated body length-body mass exponent of adult small yellow croakers in SYS was 3.01 (10th and 95th percentiles = 2.77 and 3.1, respectively). The

  14. Understanding Child Stunting in India: A Comprehensive Analysis of Socio-Economic, Nutritional and Environmental Determinants Using Additive Quantile Regression

    PubMed Central

    Fenske, Nora; Burns, Jacob; Hothorn, Torsten; Rehfuess, Eva A.

    2013-01-01

    Background Most attempts to address undernutrition, responsible for one third of global child deaths, have fallen behind expectations. This suggests that the assumptions underlying current modelling and intervention practices should be revisited. Objective We undertook a comprehensive analysis of the determinants of child stunting in India, and explored whether the established focus on linear effects of single risks is appropriate. Design Using cross-sectional data for children aged 0–24 months from the Indian National Family Health Survey for 2005/2006, we populated an evidence-based diagram of immediate, intermediate and underlying determinants of stunting. We modelled linear, non-linear, spatial and age-varying effects of these determinants using additive quantile regression for four quantiles of the Z-score of standardized height-for-age and logistic regression for stunting and severe stunting. Results At least one variable within each of eleven groups of determinants was significantly associated with height-for-age in the 35% Z-score quantile regression. The non-modifiable risk factors child age and sex, and the protective factors household wealth, maternal education and BMI showed the largest effects. Being a twin or multiple birth was associated with dramatically decreased height-for-age. Maternal age, maternal BMI, birth order and number of antenatal visits influenced child stunting in non-linear ways. Findings across the four quantile and two logistic regression models were largely comparable. Conclusions Our analysis confirms the multifactorial nature of child stunting. It emphasizes the need to pursue a systems-based approach and to consider non-linear effects, and suggests that differential effects across the height-for-age distribution do not play a major role. PMID:24223839

  15. Understanding child stunting in India: a comprehensive analysis of socio-economic, nutritional and environmental determinants using additive quantile regression.

    PubMed

    Fenske, Nora; Burns, Jacob; Hothorn, Torsten; Rehfuess, Eva A

    2013-01-01

    Most attempts to address undernutrition, responsible for one third of global child deaths, have fallen behind expectations. This suggests that the assumptions underlying current modelling and intervention practices should be revisited. We undertook a comprehensive analysis of the determinants of child stunting in India, and explored whether the established focus on linear effects of single risks is appropriate. Using cross-sectional data for children aged 0-24 months from the Indian National Family Health Survey for 2005/2006, we populated an evidence-based diagram of immediate, intermediate and underlying determinants of stunting. We modelled linear, non-linear, spatial and age-varying effects of these determinants using additive quantile regression for four quantiles of the Z-score of standardized height-for-age and logistic regression for stunting and severe stunting. At least one variable within each of eleven groups of determinants was significantly associated with height-for-age in the 35% Z-score quantile regression. The non-modifiable risk factors child age and sex, and the protective factors household wealth, maternal education and BMI showed the largest effects. Being a twin or multiple birth was associated with dramatically decreased height-for-age. Maternal age, maternal BMI, birth order and number of antenatal visits influenced child stunting in non-linear ways. Findings across the four quantile and two logistic regression models were largely comparable. Our analysis confirms the multifactorial nature of child stunting. It emphasizes the need to pursue a systems-based approach and to consider non-linear effects, and suggests that differential effects across the height-for-age distribution do not play a major role.

  16. Local Composite Quantile Regression Smoothing for Harris Recurrent Markov Processes

    PubMed Central

    Li, Degui; Li, Runze

    2016-01-01

    In this paper, we study the local polynomial composite quantile regression (CQR) smoothing method for the nonlinear and nonparametric models under the Harris recurrent Markov chain framework. The local polynomial CQR regression method is a robust alternative to the widely-used local polynomial method, and has been well studied in stationary time series. In this paper, we relax the stationarity restriction on the model, and allow that the regressors are generated by a general Harris recurrent Markov process which includes both the stationary (positive recurrent) and nonstationary (null recurrent) cases. Under some mild conditions, we establish the asymptotic theory for the proposed local polynomial CQR estimator of the mean regression function, and show that the convergence rate for the estimator in nonstationary case is slower than that in stationary case. Furthermore, a weighted type local polynomial CQR estimator is provided to improve the estimation efficiency, and a data-driven bandwidth selection is introduced to choose the optimal bandwidth involved in the nonparametric estimators. Finally, we give some numerical studies to examine the finite sample performance of the developed methodology and theory. PMID:27667894

  17. Incremental Treatment Costs Attributable to Overweight and Obesity in Patients with Diabetes: Quantile Regression Approach.

    PubMed

    Lee, Seung-Mi; Choi, In-Sun; Han, Euna; Suh, David; Shin, Eun-Kyung; Je, Seyunghe; Lee, Sung Su; Suh, Dong-Churl

    2018-01-01

    This study aimed to estimate treatment costs attributable to overweight and obesity in patients with diabetes who were less than 65 years of age in the United States. This study used data from the Medical Expenditure Panel Survey from 2001 to 2013. Patients with diabetes were identified by using the International Classification of Diseases, Ninth Revision, Clinical Modification code (250), clinical classification codes (049 and 050), or self-reported physician diagnoses. Total treatment costs attributable to overweight and obesity were calculated as the differences in the adjusted costs compared with individuals with diabetes and normal weight. Adjusted costs were estimated by using generalized linear models or unconditional quantile regression models. The mean annual treatment costs attributable to obesity were $1,852 higher than those attributable to normal weight, while costs attributable to overweight were $133 higher. The unconditional quantile regression results indicated that the impact of obesity on total treatment costs gradually became more significant as treatment costs approached the upper quantile. Among patients with diabetes who were less than 65 years of age, patients with diabetes and obesity have significantly higher treatment costs than patients with diabetes and normal weight. The economic burden of diabetes to society will continue to increase unless more proactive preventive measures are taken to effectively treat patients with overweight or obesity. © 2017 The Obesity Society.

  18. The quantile regression approach to efficiency measurement: insights from Monte Carlo simulations.

    PubMed

    Liu, Chunping; Laporte, Audrey; Ferguson, Brian S

    2008-09-01

    In the health economics literature there is an ongoing debate over approaches used to estimate the efficiency of health systems at various levels, from the level of the individual hospital - or nursing home - up to that of the health system as a whole. The two most widely used approaches to evaluating the efficiency with which various units deliver care are non-parametric data envelopment analysis (DEA) and parametric stochastic frontier analysis (SFA). Productivity researchers tend to have very strong preferences over which methodology to use for efficiency estimation. In this paper, we use Monte Carlo simulation to compare the performance of DEA and SFA in terms of their ability to accurately estimate efficiency. We also evaluate quantile regression as a potential alternative approach. A Cobb-Douglas production function, random error terms and a technical inefficiency term with different distributions are used to calculate the observed output. The results, based on these experiments, suggest that neither DEA nor SFA can be regarded as clearly dominant, and that, depending on the quantile estimated, the quantile regression approach may be a useful addition to the armamentarium of methods for estimating technical efficiency.

  19. Quantile Regression for Analyzing Heterogeneity in Ultra-high Dimension

    PubMed Central

    Wang, Lan; Wu, Yichao

    2012-01-01

    Ultra-high dimensional data often display heterogeneity due to either heteroscedastic variance or other forms of non-location-scale covariate effects. To accommodate heterogeneity, we advocate a more general interpretation of sparsity which assumes that only a small number of covariates influence the conditional distribution of the response variable given all candidate covariates; however, the sets of relevant covariates may differ when we consider different segments of the conditional distribution. In this framework, we investigate the methodology and theory of nonconvex penalized quantile regression in ultra-high dimension. The proposed approach has two distinctive features: (1) it enables us to explore the entire conditional distribution of the response variable given the ultra-high dimensional covariates and provides a more realistic picture of the sparsity pattern; (2) it requires substantially weaker conditions compared with alternative methods in the literature; thus, it greatly alleviates the difficulty of model checking in the ultra-high dimension. In theoretic development, it is challenging to deal with both the nonsmooth loss function and the nonconvex penalty function in ultra-high dimensional parameter space. We introduce a novel sufficient optimality condition which relies on a convex differencing representation of the penalized loss function and the subdifferential calculus. Exploring this optimality condition enables us to establish the oracle property for sparse quantile regression in the ultra-high dimension under relaxed conditions. The proposed method greatly enhances existing tools for ultra-high dimensional data analysis. Monte Carlo simulations demonstrate the usefulness of the proposed procedure. The real data example we analyzed demonstrates that the new approach reveals substantially more information compared with alternative methods. PMID:23082036

  20. Managing more than the mean: Using quantile regression to identify factors related to large elk groups

    USGS Publications Warehouse

    Brennan, Angela K.; Cross, Paul C.; Creely, Scott

    2015-01-01

    Synthesis and applications. Our analysis of elk group size distributions using quantile regression suggests that private land, irrigation, open habitat, elk density and wolf abundance can affect large elk group sizes. Thus, to manage larger groups by removal or dispersal of individuals, we recommend incentivizing hunting on private land (particularly if irrigated) during the regular and late hunting seasons, promoting tolerance of wolves on private land (if elk aggregate in these areas to avoid wolves) and creating more winter range and varied habitats. Relationships to the variables of interest also differed by quantile, highlighting the importance of using quantile regression to examine response variables more completely to uncover relationships important to conservation and management.

  1. Modeling the human development index and the percentage of poor people using quantile smoothing splines

    NASA Astrophysics Data System (ADS)

    Mulyani, Sri; Andriyana, Yudhie; Sudartianto

    2017-03-01

    Mean regression is a statistical method to explain the relationship between the response variable and the predictor variable based on the central tendency of the data (mean) of the response variable. The parameter estimation in mean regression (with Ordinary Least Square or OLS) generates a problem if we apply it to the data with a symmetric, fat-tailed, or containing outlier. Hence, an alternative method is necessary to be used to that kind of data, for example quantile regression method. The quantile regression is a robust technique to the outlier. This model can explain the relationship between the response variable and the predictor variable, not only on the central tendency of the data (median) but also on various quantile, in order to obtain complete information about that relationship. In this study, a quantile regression is developed with a nonparametric approach such as smoothing spline. Nonparametric approach is used if the prespecification model is difficult to determine, the relation between two variables follow the unknown function. We will apply that proposed method to poverty data. Here, we want to estimate the Percentage of Poor People as the response variable involving the Human Development Index (HDI) as the predictor variable.

  2. Environmental influence on mussel (Mytilus edulis) growth - A quantile regression approach

    NASA Astrophysics Data System (ADS)

    Bergström, Per; Lindegarth, Mats

    2016-03-01

    The need for methods for sustainable management and use of coastal ecosystems has increased in the last century. A key aspect for obtaining ecologically and economically sustainable aquaculture in threatened coastal areas is the requirement of geographic information of growth and potential production capacity. Growth varies over time and space and depends on a complex pattern of interactions between the bivalve and a diverse range of environmental factors (e.g. temperature, salinity, food availability). Understanding these processes and modelling the environmental control of bivalve growth has been central in aquaculture. In contrast to the most conventional modelling techniques, quantile regression can handle cases where not all factors are measured and provide the possibility to estimate the effect at different levels of the response distribution and give therefore a more complete picture of the relationship between environmental factors and biological response. Observation of the relationships between environmental factors and growth of the bivalve Mytilus edulis revealed relationships that varied both among level of growth rate and within the range of environmental variables along the Swedish west coast. The strongest patterns were found for water oxygen concentration level which had a negative effect on growth for all oxygen levels and growth levels. However, these patterns coincided with differences in growth among periods and very little of the remaining variability within periods could be explained indicating that interactive processes masked the importance of the individual variables. By using quantile regression and local regression (LOESS) this study was able to provide valuable information on environmental factors influencing the growth of M. edulis and important insight for the development of ecosystem based management tools of aquaculture activities, its use in mitigation efforts and successful management of human use of coastal areas.

  3. A Quantile Regression Approach to Understanding the Relations Between Morphological Awareness, Vocabulary, and Reading Comprehension in Adult Basic Education Students

    PubMed Central

    Tighe, Elizabeth L.; Schatschneider, Christopher

    2015-01-01

    The purpose of this study was to investigate the joint and unique contributions of morphological awareness and vocabulary knowledge at five reading comprehension levels in Adult Basic Education (ABE) students. We introduce the statistical technique of multiple quantile regression, which enabled us to assess the predictive utility of morphological awareness and vocabulary knowledge at multiple points (quantiles) along the continuous distribution of reading comprehension. To demonstrate the efficacy of our multiple quantile regression analysis, we compared and contrasted our results with a traditional multiple regression analytic approach. Our results indicated that morphological awareness and vocabulary knowledge accounted for a large portion of the variance (82-95%) in reading comprehension skills across all quantiles. Morphological awareness exhibited the greatest unique predictive ability at lower levels of reading comprehension whereas vocabulary knowledge exhibited the greatest unique predictive ability at higher levels of reading comprehension. These results indicate the utility of using multiple quantile regression to assess trajectories of component skills across multiple levels of reading comprehension. The implications of our findings for ABE programs are discussed. PMID:25351773

  4. A Quantile Regression Approach to Understanding the Relations Among Morphological Awareness, Vocabulary, and Reading Comprehension in Adult Basic Education Students.

    PubMed

    Tighe, Elizabeth L; Schatschneider, Christopher

    2016-07-01

    The purpose of this study was to investigate the joint and unique contributions of morphological awareness and vocabulary knowledge at five reading comprehension levels in adult basic education (ABE) students. We introduce the statistical technique of multiple quantile regression, which enabled us to assess the predictive utility of morphological awareness and vocabulary knowledge at multiple points (quantiles) along the continuous distribution of reading comprehension. To demonstrate the efficacy of our multiple quantile regression analysis, we compared and contrasted our results with a traditional multiple regression analytic approach. Our results indicated that morphological awareness and vocabulary knowledge accounted for a large portion of the variance (82%-95%) in reading comprehension skills across all quantiles. Morphological awareness exhibited the greatest unique predictive ability at lower levels of reading comprehension whereas vocabulary knowledge exhibited the greatest unique predictive ability at higher levels of reading comprehension. These results indicate the utility of using multiple quantile regression to assess trajectories of component skills across multiple levels of reading comprehension. The implications of our findings for ABE programs are discussed. © Hammill Institute on Disabilities 2014.

  5. Ordinary Least Squares and Quantile Regression: An Inquiry-Based Learning Approach to a Comparison of Regression Methods

    ERIC Educational Resources Information Center

    Helmreich, James E.; Krog, K. Peter

    2018-01-01

    We present a short, inquiry-based learning course on concepts and methods underlying ordinary least squares (OLS), least absolute deviation (LAD), and quantile regression (QR). Students investigate squared, absolute, and weighted absolute distance functions (metrics) as location measures. Using differential calculus and properties of convex…

  6. Bayesian quantitative precipitation forecasts in terms of quantiles

    NASA Astrophysics Data System (ADS)

    Bentzien, Sabrina; Friederichs, Petra

    2014-05-01

    Ensemble prediction systems (EPS) for numerical weather predictions on the mesoscale are particularly developed to obtain probabilistic guidance for high impact weather. An EPS not only issues a deterministic future state of the atmosphere but a sample of possible future states. Ensemble postprocessing then translates such a sample of forecasts into probabilistic measures. This study focus on probabilistic quantitative precipitation forecasts in terms of quantiles. Quantiles are particular suitable to describe precipitation at various locations, since no assumption is required on the distribution of precipitation. The focus is on the prediction during high-impact events and related to the Volkswagen Stiftung funded project WEX-MOP (Mesoscale Weather Extremes - Theory, Spatial Modeling and Prediction). Quantile forecasts are derived from the raw ensemble and via quantile regression. Neighborhood method and time-lagging are effective tools to inexpensively increase the ensemble spread, which results in more reliable forecasts especially for extreme precipitation events. Since an EPS provides a large amount of potentially informative predictors, a variable selection is required in order to obtain a stable statistical model. A Bayesian formulation of quantile regression allows for inference about the selection of predictive covariates by the use of appropriate prior distributions. Moreover, the implementation of an additional process layer for the regression parameters accounts for spatial variations of the parameters. Bayesian quantile regression and its spatially adaptive extension is illustrated for the German-focused mesoscale weather prediction ensemble COSMO-DE-EPS, which runs (pre)operationally since December 2010 at the German Meteorological Service (DWD). Objective out-of-sample verification uses the quantile score (QS), a weighted absolute error between quantile forecasts and observations. The QS is a proper scoring function and can be decomposed into

  7. Effects of environmental variables on invasive amphibian activity: Using model selection on quantiles for counts

    USGS Publications Warehouse

    Muller, Benjamin J.; Cade, Brian S.; Schwarzkoph, Lin

    2018-01-01

    Many different factors influence animal activity. Often, the value of an environmental variable may influence significantly the upper or lower tails of the activity distribution. For describing relationships with heterogeneous boundaries, quantile regressions predict a quantile of the conditional distribution of the dependent variable. A quantile count model extends linear quantile regression methods to discrete response variables, and is useful if activity is quantified by trapping, where there may be many tied (equal) values in the activity distribution, over a small range of discrete values. Additionally, different environmental variables in combination may have synergistic or antagonistic effects on activity, so examining their effects together, in a modeling framework, is a useful approach. Thus, model selection on quantile counts can be used to determine the relative importance of different variables in determining activity, across the entire distribution of capture results. We conducted model selection on quantile count models to describe the factors affecting activity (numbers of captures) of cane toads (Rhinella marina) in response to several environmental variables (humidity, temperature, rainfall, wind speed, and moon luminosity) over eleven months of trapping. Environmental effects on activity are understudied in this pest animal. In the dry season, model selection on quantile count models suggested that rainfall positively affected activity, especially near the lower tails of the activity distribution. In the wet season, wind speed limited activity near the maximum of the distribution, while minimum activity increased with minimum temperature. This statistical methodology allowed us to explore, in depth, how environmental factors influenced activity across the entire distribution, and is applicable to any survey or trapping regime, in which environmental variables affect activity.

  8. Accelerating Approximate Bayesian Computation with Quantile Regression: application to cosmological redshift distributions

    NASA Astrophysics Data System (ADS)

    Kacprzak, T.; Herbel, J.; Amara, A.; Réfrégier, A.

    2018-02-01

    Approximate Bayesian Computation (ABC) is a method to obtain a posterior distribution without a likelihood function, using simulations and a set of distance metrics. For that reason, it has recently been gaining popularity as an analysis tool in cosmology and astrophysics. Its drawback, however, is a slow convergence rate. We propose a novel method, which we call qABC, to accelerate ABC with Quantile Regression. In this method, we create a model of quantiles of distance measure as a function of input parameters. This model is trained on a small number of simulations and estimates which regions of the prior space are likely to be accepted into the posterior. Other regions are then immediately rejected. This procedure is then repeated as more simulations are available. We apply it to the practical problem of estimation of redshift distribution of cosmological samples, using forward modelling developed in previous work. The qABC method converges to nearly same posterior as the basic ABC. It uses, however, only 20% of the number of simulations compared to basic ABC, achieving a fivefold gain in execution time for our problem. For other problems the acceleration rate may vary; it depends on how close the prior is to the final posterior. We discuss possible improvements and extensions to this method.

  9. A simulation study of nonparametric total deviation index as a measure of agreement based on quantile regression.

    PubMed

    Lin, Lawrence; Pan, Yi; Hedayat, A S; Barnhart, Huiman X; Haber, Michael

    2016-01-01

    Total deviation index (TDI) captures a prespecified quantile of the absolute deviation of paired observations from raters, observers, methods, assays, instruments, etc. We compare the performance of TDI using nonparametric quantile regression to the TDI assuming normality (Lin, 2000). This simulation study considers three distributions: normal, Poisson, and uniform at quantile levels of 0.8 and 0.9 for cases with and without contamination. Study endpoints include the bias of TDI estimates (compared with their respective theoretical values), standard error of TDI estimates (compared with their true simulated standard errors), and test size (compared with 0.05), and power. Nonparametric TDI using quantile regression, although it slightly underestimates and delivers slightly less power for data without contamination, works satisfactorily under all simulated cases even for moderate (say, ≥40) sample sizes. The performance of the TDI based on a quantile of 0.8 is in general superior to that of 0.9. The performances of nonparametric and parametric TDI methods are compared with a real data example. Nonparametric TDI can be very useful when the underlying distribution on the difference is not normal, especially when it has a heavy tail.

  10. Using nonlinear quantile regression to estimate the self-thinning boundary curve

    Treesearch

    Quang V. Cao; Thomas J. Dean

    2015-01-01

    The relationship between tree size (quadratic mean diameter) and tree density (number of trees per unit area) has been a topic of research and discussion for many decades. Starting with Reineke in 1933, the maximum size-density relationship, on a log-log scale, has been assumed to be linear. Several techniques, including linear quantile regression, have been employed...

  11. Early Warning Signals of Financial Crises with Multi-Scale Quantile Regressions of Log-Periodic Power Law Singularities.

    PubMed

    Zhang, Qun; Zhang, Qunzhi; Sornette, Didier

    2016-01-01

    We augment the existing literature using the Log-Periodic Power Law Singular (LPPLS) structures in the log-price dynamics to diagnose financial bubbles by providing three main innovations. First, we introduce the quantile regression to the LPPLS detection problem. This allows us to disentangle (at least partially) the genuine LPPLS signal and the a priori unknown complicated residuals. Second, we propose to combine the many quantile regressions with a multi-scale analysis, which aggregates and consolidates the obtained ensembles of scenarios. Third, we define and implement the so-called DS LPPLS Confidence™ and Trust™ indicators that enrich considerably the diagnostic of bubbles. Using a detailed study of the "S&P 500 1987" bubble and presenting analyses of 16 historical bubbles, we show that the quantile regression of LPPLS signals contributes useful early warning signals. The comparison between the constructed signals and the price development in these 16 historical bubbles demonstrates their significant predictive ability around the real critical time when the burst/rally occurs.

  12. Assessment of Weighted Quantile Sum Regression for Modeling Chemical Mixtures and Cancer Risk

    PubMed Central

    Czarnota, Jenna; Gennings, Chris; Wheeler, David C

    2015-01-01

    In evaluation of cancer risk related to environmental chemical exposures, the effect of many chemicals on disease is ultimately of interest. However, because of potentially strong correlations among chemicals that occur together, traditional regression methods suffer from collinearity effects, including regression coefficient sign reversal and variance inflation. In addition, penalized regression methods designed to remediate collinearity may have limitations in selecting the truly bad actors among many correlated components. The recently proposed method of weighted quantile sum (WQS) regression attempts to overcome these problems by estimating a body burden index, which identifies important chemicals in a mixture of correlated environmental chemicals. Our focus was on assessing through simulation studies the accuracy of WQS regression in detecting subsets of chemicals associated with health outcomes (binary and continuous) in site-specific analyses and in non-site-specific analyses. We also evaluated the performance of the penalized regression methods of lasso, adaptive lasso, and elastic net in correctly classifying chemicals as bad actors or unrelated to the outcome. We based the simulation study on data from the National Cancer Institute Surveillance Epidemiology and End Results Program (NCI-SEER) case–control study of non-Hodgkin lymphoma (NHL) to achieve realistic exposure situations. Our results showed that WQS regression had good sensitivity and specificity across a variety of conditions considered in this study. The shrinkage methods had a tendency to incorrectly identify a large number of components, especially in the case of strong association with the outcome. PMID:26005323

  13. Assessment of weighted quantile sum regression for modeling chemical mixtures and cancer risk.

    PubMed

    Czarnota, Jenna; Gennings, Chris; Wheeler, David C

    2015-01-01

    In evaluation of cancer risk related to environmental chemical exposures, the effect of many chemicals on disease is ultimately of interest. However, because of potentially strong correlations among chemicals that occur together, traditional regression methods suffer from collinearity effects, including regression coefficient sign reversal and variance inflation. In addition, penalized regression methods designed to remediate collinearity may have limitations in selecting the truly bad actors among many correlated components. The recently proposed method of weighted quantile sum (WQS) regression attempts to overcome these problems by estimating a body burden index, which identifies important chemicals in a mixture of correlated environmental chemicals. Our focus was on assessing through simulation studies the accuracy of WQS regression in detecting subsets of chemicals associated with health outcomes (binary and continuous) in site-specific analyses and in non-site-specific analyses. We also evaluated the performance of the penalized regression methods of lasso, adaptive lasso, and elastic net in correctly classifying chemicals as bad actors or unrelated to the outcome. We based the simulation study on data from the National Cancer Institute Surveillance Epidemiology and End Results Program (NCI-SEER) case-control study of non-Hodgkin lymphoma (NHL) to achieve realistic exposure situations. Our results showed that WQS regression had good sensitivity and specificity across a variety of conditions considered in this study. The shrinkage methods had a tendency to incorrectly identify a large number of components, especially in the case of strong association with the outcome.

  14. Examining Predictive Validity of Oral Reading Fluency Slope in Upper Elementary Grades Using Quantile Regression.

    PubMed

    Cho, Eunsoo; Capin, Philip; Roberts, Greg; Vaughn, Sharon

    2017-07-01

    Within multitiered instructional delivery models, progress monitoring is a key mechanism for determining whether a child demonstrates an adequate response to instruction. One measure commonly used to monitor the reading progress of students is oral reading fluency (ORF). This study examined the extent to which ORF slope predicts reading comprehension outcomes for fifth-grade struggling readers ( n = 102) participating in an intensive reading intervention. Quantile regression models showed that ORF slope significantly predicted performance on a sentence-level fluency and comprehension assessment, regardless of the students' reading skills, controlling for initial ORF performance. However, ORF slope was differentially predictive of a passage-level comprehension assessment based on students' reading skills when controlling for initial ORF status. Results showed that ORF explained unique variance for struggling readers whose posttest performance was at the upper quantiles at the end of the reading intervention, but slope was not a significant predictor of passage-level comprehension for students whose reading problems were the most difficult to remediate.

  15. More green space is related to less antidepressant prescription rates in the Netherlands: A Bayesian geoadditive quantile regression approach.

    PubMed

    Helbich, Marco; Klein, Nadja; Roberts, Hannah; Hagedoorn, Paulien; Groenewegen, Peter P

    2018-06-20

    Exposure to green space seems to be beneficial for self-reported mental health. In this study we used an objective health indicator, namely antidepressant prescription rates. Current studies rely exclusively upon mean regression models assuming linear associations. It is, however, plausible that the presence of green space is non-linearly related with different quantiles of the outcome antidepressant prescription rates. These restrictions may contribute to inconsistent findings. Our aim was: a) to assess antidepressant prescription rates in relation to green space, and b) to analyze how the relationship varies non-linearly across different quantiles of antidepressant prescription rates. We used cross-sectional data for the year 2014 at a municipality level in the Netherlands. Ecological Bayesian geoadditive quantile regressions were fitted for the 15%, 50%, and 85% quantiles to estimate green space-prescription rate correlations, controlling for physical activity levels, socio-demographics, urbanicity, etc. RESULTS: The results suggested that green space was overall inversely and non-linearly associated with antidepressant prescription rates. More important, the associations differed across the quantiles, although the variation was modest. Significant non-linearities were apparent: The associations were slightly positive in the lower quantile and strongly negative in the upper one. Our findings imply that an increased availability of green space within a municipality may contribute to a reduction in the number of antidepressant prescriptions dispensed. Green space is thus a central health and community asset, whilst a minimum level of 28% needs to be established for health gains. The highest effectiveness occurred at a municipality surface percentage higher than 79%. This inverse dose-dependent relation has important implications for setting future community-level health and planning policies. Copyright © 2018 Elsevier Inc. All rights reserved.

  16. Alternative configurations of Quantile Regression for estimating predictive uncertainty in water level forecasts for the Upper Severn River: a comparison

    NASA Astrophysics Data System (ADS)

    Lopez, Patricia; Verkade, Jan; Weerts, Albrecht; Solomatine, Dimitri

    2014-05-01

    Hydrological forecasting is subject to many sources of uncertainty, including those originating in initial state, boundary conditions, model structure and model parameters. Although uncertainty can be reduced, it can never be fully eliminated. Statistical post-processing techniques constitute an often used approach to estimate the hydrological predictive uncertainty, where a model of forecast error is built using a historical record of past forecasts and observations. The present study focuses on the use of the Quantile Regression (QR) technique as a hydrological post-processor. It estimates the predictive distribution of water levels using deterministic water level forecasts as predictors. This work aims to thoroughly verify uncertainty estimates using the implementation of QR that was applied in an operational setting in the UK National Flood Forecasting System, and to inter-compare forecast quality and skill in various, differing configurations of QR. These configurations are (i) 'classical' QR, (ii) QR constrained by a requirement that quantiles do not cross, (iii) QR derived on time series that have been transformed into the Normal domain (Normal Quantile Transformation - NQT), and (iv) a piecewise linear derivation of QR models. The QR configurations are applied to fourteen hydrological stations on the Upper Severn River with different catchments characteristics. Results of each QR configuration are conditionally verified for progressively higher flood levels, in terms of commonly used verification metrics and skill scores. These include Brier's probability score (BS), the continuous ranked probability score (CRPS) and corresponding skill scores as well as the Relative Operating Characteristic score (ROCS). Reliability diagrams are also presented and analysed. The results indicate that none of the four Quantile Regression configurations clearly outperforms the others.

  17. Early Warning Signals of Financial Crises with Multi-Scale Quantile Regressions of Log-Periodic Power Law Singularities

    PubMed Central

    Zhang, Qun; Zhang, Qunzhi; Sornette, Didier

    2016-01-01

    We augment the existing literature using the Log-Periodic Power Law Singular (LPPLS) structures in the log-price dynamics to diagnose financial bubbles by providing three main innovations. First, we introduce the quantile regression to the LPPLS detection problem. This allows us to disentangle (at least partially) the genuine LPPLS signal and the a priori unknown complicated residuals. Second, we propose to combine the many quantile regressions with a multi-scale analysis, which aggregates and consolidates the obtained ensembles of scenarios. Third, we define and implement the so-called DS LPPLS Confidence™ and Trust™ indicators that enrich considerably the diagnostic of bubbles. Using a detailed study of the “S&P 500 1987” bubble and presenting analyses of 16 historical bubbles, we show that the quantile regression of LPPLS signals contributes useful early warning signals. The comparison between the constructed signals and the price development in these 16 historical bubbles demonstrates their significant predictive ability around the real critical time when the burst/rally occurs. PMID:27806093

  18. Economic policy uncertainty, equity premium and dependence between their quantiles: Evidence from quantile-on-quantile approach

    NASA Astrophysics Data System (ADS)

    Raza, Syed Ali; Zaighum, Isma; Shah, Nida

    2018-02-01

    This paper examines the relationship between economic policy uncertainty and equity premium in G7 countries over a period of the monthly data from January 1989 to December 2015 using a novel technique namely QQ regression proposed by Sim and Zhou (2015). Based on QQ approach, we estimate how the quantiles of the economic policy uncertainty affect the quantiles of the equity premium. Thus, it provides a comprehensive insight into the overall dependence structure between the equity premium and economic policy uncertainty as compared to traditional techniques like OLS or quantile regression. Overall, our empirical evidence suggests the existence of a negative association between equity premium and EPU predominately in all G7 countries, especially in the extreme low and extreme high tails. However, differences exist among countries and across different quantiles of EPU and the equity premium within each country. The existence of this heterogeneity among countries is due to the differences in terms of dependency on economic policy, other stock markets, and the linkages with other country's equity market.

  19. Using Quantile and Asymmetric Least Squares Regression for Optimal Risk Adjustment.

    PubMed

    Lorenz, Normann

    2017-06-01

    In this paper, we analyze optimal risk adjustment for direct risk selection (DRS). Integrating insurers' activities for risk selection into a discrete choice model of individuals' health insurance choice shows that DRS has the structure of a contest. For the contest success function (csf) used in most of the contest literature (the Tullock-csf), optimal transfers for a risk adjustment scheme have to be determined by means of a restricted quantile regression, irrespective of whether insurers are primarily engaged in positive DRS (attracting low risks) or negative DRS (repelling high risks). This is at odds with the common practice of determining transfers by means of a least squares regression. However, this common practice can be rationalized for a new csf, but only if positive and negative DRSs are equally important; if they are not, optimal transfers have to be calculated by means of a restricted asymmetric least squares regression. Using data from German and Swiss health insurers, we find considerable differences between the three types of regressions. Optimal transfers therefore critically depend on which csf represents insurers' incentives for DRS and, if it is not the Tullock-csf, whether insurers are primarily engaged in positive or negative DRS. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.

  20. Longitudinal analysis of the strengths and difficulties questionnaire scores of the Millennium Cohort Study children in England using M-quantile random-effects regression.

    PubMed

    Tzavidis, Nikos; Salvati, Nicola; Schmid, Timo; Flouri, Eirini; Midouhas, Emily

    2016-02-01

    Multilevel modelling is a popular approach for longitudinal data analysis. Statistical models conventionally target a parameter at the centre of a distribution. However, when the distribution of the data is asymmetric, modelling other location parameters, e.g. percentiles, may be more informative. We present a new approach, M -quantile random-effects regression, for modelling multilevel data. The proposed method is used for modelling location parameters of the distribution of the strengths and difficulties questionnaire scores of children in England who participate in the Millennium Cohort Study. Quantile mixed models are also considered. The analyses offer insights to child psychologists about the differential effects of risk factors on children's outcomes.

  1. Estimating the Extreme Behaviors of Students Performance Using Quantile Regression--Evidences from Taiwan

    ERIC Educational Resources Information Center

    Chen, Sheng-Tung; Kuo, Hsiao-I.; Chen, Chi-Chung

    2012-01-01

    The two-stage least squares approach together with quantile regression analysis is adopted here to estimate the educational production function. Such a methodology is able to capture the extreme behaviors of the two tails of students' performance and the estimation outcomes have important policy implications. Our empirical study is applied to the…

  2. Bumps in river profiles: uncertainty assessment and smoothing using quantile regression techniques

    NASA Astrophysics Data System (ADS)

    Schwanghart, Wolfgang; Scherler, Dirk

    2017-12-01

    The analysis of longitudinal river profiles is an important tool for studying landscape evolution. However, characterizing river profiles based on digital elevation models (DEMs) suffers from errors and artifacts that particularly prevail along valley bottoms. The aim of this study is to characterize uncertainties that arise from the analysis of river profiles derived from different, near-globally available DEMs. We devised new algorithms - quantile carving and the CRS algorithm - that rely on quantile regression to enable hydrological correction and the uncertainty quantification of river profiles. We find that globally available DEMs commonly overestimate river elevations in steep topography. The distributions of elevation errors become increasingly wider and right skewed if adjacent hillslope gradients are steep. Our analysis indicates that the AW3D DEM has the highest precision and lowest bias for the analysis of river profiles in mountainous topography. The new 12 m resolution TanDEM-X DEM has a very low precision, most likely due to the combined effect of steep valley walls and the presence of water surfaces in valley bottoms. Compared to the conventional approaches of carving and filling, we find that our new approach is able to reduce the elevation bias and errors in longitudinal river profiles.

  3. Heterogeneous effects of oil shocks on exchange rates: evidence from a quantile regression approach.

    PubMed

    Su, Xianfang; Zhu, Huiming; You, Wanhai; Ren, Yinghua

    2016-01-01

    The determinants of exchange rates have attracted considerable attention among researchers over the past several decades. Most studies, however, ignore the possibility that the impact of oil shocks on exchange rates could vary across the exchange rate returns distribution. We employ a quantile regression approach to address this issue. Our results indicate that the effect of oil shocks on exchange rates is heterogeneous across quantiles. A large US depreciation or appreciation tends to heighten the effects of oil shocks on exchange rate returns. Positive oil demand shocks lead to appreciation pressures in oil-exporting countries and this result is robust across lower and upper return distributions. These results offer rich and useful information for investors and decision-makers.

  4. Trait Mindfulness as a Limiting Factor for Residual Depressive Symptoms: An Explorative Study Using Quantile Regression

    PubMed Central

    Radford, Sholto; Eames, Catrin; Brennan, Kate; Lambert, Gwladys; Crane, Catherine; Williams, J. Mark G.; Duggan, Danielle S.; Barnhofer, Thorsten

    2014-01-01

    Mindfulness has been suggested to be an important protective factor for emotional health. However, this effect might vary with regard to context. This study applied a novel statistical approach, quantile regression, in order to investigate the relation between trait mindfulness and residual depressive symptoms in individuals with a history of recurrent depression, while taking into account symptom severity and number of episodes as contextual factors. Rather than fitting to a single indicator of central tendency, quantile regression allows exploration of relations across the entire range of the response variable. Analysis of self-report data from 274 participants with a history of three or more previous episodes of depression showed that relatively higher levels of mindfulness were associated with relatively lower levels of residual depressive symptoms. This relationship was most pronounced near the upper end of the response distribution and moderated by the number of previous episodes of depression at the higher quantiles. The findings suggest that with lower levels of mindfulness, residual symptoms are less constrained and more likely to be influenced by other factors. Further, the limiting effect of mindfulness on residual symptoms is most salient in those with higher numbers of episodes. PMID:24988072

  5. The effectiveness of drinking and driving policies for different alcohol-related fatalities: a quantile regression analysis.

    PubMed

    Ying, Yung-Hsiang; Wu, Chin-Chih; Chang, Koyin

    2013-09-27

    To understand the impact of drinking and driving laws on drinking and driving fatality rates, this study explored the different effects these laws have on areas with varying severity rates for drinking and driving. Unlike previous studies, this study employed quantile regression analysis. Empirical results showed that policies based on local conditions must be used to effectively reduce drinking and driving fatality rates; that is, different measures should be adopted to target the specific conditions in various regions. For areas with low fatality rates (low quantiles), people's habits and attitudes toward alcohol should be emphasized instead of transportation safety laws because "preemptive regulations" are more effective. For areas with high fatality rates (or high quantiles), "ex-post regulations" are more effective, and impact these areas approximately 0.01% to 0.05% more than they do areas with low fatality rates.

  6. Factors associated with the income distribution of full-time physicians: a quantile regression approach.

    PubMed

    Shih, Ya-Chen Tina; Konrad, Thomas R

    2007-10-01

    Physician income is generally high, but quite variable; hence, physicians have divergent perspectives regarding health policy initiatives and market reforms that could affect their incomes. We investigated factors underlying the distribution of income within the physician population. Full-time physicians (N=10,777) from the restricted version of the 1996-1997 Community Tracking Study Physician Survey (CTS-PS), 1996 Area Resource File, and 1996 health maintenance organization penetration data. We conducted separate analyses for primary care physicians (PCPs) and specialists. We employed least square and quantile regression models to examine factors associated with physician incomes at the mean and at various points of the income distribution, respectively. We accounted for the complex survey design for the CTS-PS data using appropriate weighted procedures and explored endogeneity using an instrumental variables method. We detected widespread and subtle effects of many variables on physician incomes at different points (10th, 25th, 75th, and 90th percentiles) in the distribution that were undetected when employing regression estimations focusing on only the means or medians. Our findings show that the effects of managed care penetration are demonstrable at the mean of specialist incomes, but are more pronounced at higher levels. Conversely, a gender gap in earnings occurs at all levels of income of both PCPs and specialists, but is more pronounced at lower income levels. The quantile regression technique offers an analytical tool to evaluate policy effects beyond the means. A longitudinal application of this approach may enable health policy makers to identify winners and losers among segments of the physician workforce and assess how market dynamics and health policy initiatives affect the overall physician income distribution over various time intervals.

  7. Estimating geographic variation on allometric growth and body condition of Blue Suckers with quantile regression

    USGS Publications Warehouse

    Cade, B.S.; Terrell, J.W.; Neely, B.C.

    2011-01-01

    Increasing our understanding of how environmental factors affect fish body condition and improving its utility as a metric of aquatic system health require reliable estimates of spatial variation in condition (weight at length). We used three statistical approaches that varied in how they accounted for heterogeneity in allometric growth to estimate differences in body condition of blue suckers Cycleptus elongatus across 19 large-river locations in the central USA. Quantile regression of an expanded allometric growth model provided the most comprehensive estimates, including variation in exponents within and among locations (range = 2.88–4.24). Blue suckers from more-southerly locations had the largest exponents. Mixed-effects mean regression of a similar expanded allometric growth model allowed exponents to vary among locations (range = 3.03–3.60). Mean relative weights compared across selected intervals of total length (TL = 510–594 and 594–692 mm) in a multiplicative model involved the implicit assumption that allometric exponents within and among locations were similar to the exponent (3.46) for the standard weight equation. Proportionate differences in the quantiles of weight at length for adult blue suckers (TL = 510, 594, 644, and 692 mm) compared with their average across locations ranged from 1.08 to 1.30 for southern locations (Texas, Mississippi) and from 0.84 to 1.00 for northern locations (Montana, North Dakota); proportionate differences for mean weight ranged from 1.13 to 1.17 and from 0.87 to 0.95, respectively, and those for mean relative weight ranged from 1.10 to 1.18 and from 0.86 to 0.98, respectively. Weights for fish at longer lengths varied by 600–700 g within a location and by as much as 2,000 g among southern and northern locations. Estimates for the Wabash River, Indiana (0.96–1.07 times the average; greatest increases for lower weights at shorter TLs), and for the Missouri River from Blair, Nebraska, to Sioux City, Iowa (0.90

  8. Logistic quantile regression provides improved estimates for bounded avian counts: a case study of California Spotted Owl fledgling production

    Treesearch

    Brian S. Cade; Barry R. Noon; Rick D. Scherer; John J. Keane

    2017-01-01

    Counts of avian fledglings, nestlings, or clutch size that are bounded below by zero and above by some small integer form a discrete random variable distribution that is not approximated well by conventional parametric count distributions such as the Poisson or negative binomial. We developed a logistic quantile regression model to provide estimates of the empirical...

  9. The Effectiveness of Drinking and Driving Policies for Different Alcohol-Related Fatalities: A Quantile Regression Analysis

    PubMed Central

    Ying, Yung-Hsiang; Wu, Chin-Chih; Chang, Koyin

    2013-01-01

    To understand the impact of drinking and driving laws on drinking and driving fatality rates, this study explored the different effects these laws have on areas with varying severity rates for drinking and driving. Unlike previous studies, this study employed quantile regression analysis. Empirical results showed that policies based on local conditions must be used to effectively reduce drinking and driving fatality rates; that is, different measures should be adopted to target the specific conditions in various regions. For areas with low fatality rates (low quantiles), people’s habits and attitudes toward alcohol should be emphasized instead of transportation safety laws because “preemptive regulations” are more effective. For areas with high fatality rates (or high quantiles), “ex-post regulations” are more effective, and impact these areas approximately 0.01% to 0.05% more than they do areas with low fatality rates. PMID:24084673

  10. Determinants of Academic Attainment in the United States: A Quantile Regression Analysis of Test Scores

    ERIC Educational Resources Information Center

    Haile, Getinet Astatike; Nguyen, Anh Ngoc

    2008-01-01

    We investigate the determinants of high school students' academic attainment in mathematics, reading and science in the United States; focusing particularly on possible differential impacts of ethnicity and family background across the distribution of test scores. Using data from the NELS2000 and employing quantile regression, we find two…

  11. Quantile uncertainty and value-at-risk model risk.

    PubMed

    Alexander, Carol; Sarabia, José María

    2012-08-01

    This article develops a methodology for quantifying model risk in quantile risk estimates. The application of quantile estimates to risk assessment has become common practice in many disciplines, including hydrology, climate change, statistical process control, insurance and actuarial science, and the uncertainty surrounding these estimates has long been recognized. Our work is particularly important in finance, where quantile estimates (called Value-at-Risk) have been the cornerstone of banking risk management since the mid 1980s. A recent amendment to the Basel II Accord recommends additional market risk capital to cover all sources of "model risk" in the estimation of these quantiles. We provide a novel and elegant framework whereby quantile estimates are adjusted for model risk, relative to a benchmark which represents the state of knowledge of the authority that is responsible for model risk. A simulation experiment in which the degree of model risk is controlled illustrates how to quantify Value-at-Risk model risk and compute the required regulatory capital add-on for banks. An empirical example based on real data shows how the methodology can be put into practice, using only two time series (daily Value-at-Risk and daily profit and loss) from a large bank. We conclude with a discussion of potential applications to nonfinancial risks. © 2012 Society for Risk Analysis.

  12. Factors Associated with the Income Distribution of Full-Time Physicians: A Quantile Regression Approach

    PubMed Central

    Shih, Ya-Chen Tina; Konrad, Thomas R

    2007-01-01

    Objective Physician income is generally high, but quite variable; hence, physicians have divergent perspectives regarding health policy initiatives and market reforms that could affect their incomes. We investigated factors underlying the distribution of income within the physician population. Data Sources Full-time physicians (N=10,777) from the restricted version of the 1996–1997 Community Tracking Study Physician Survey (CTS-PS), 1996 Area Resource File, and 1996 health maintenance organization penetration data. Study Design We conducted separate analyses for primary care physicians (PCPs) and specialists. We employed least square and quantile regression models to examine factors associated with physician incomes at the mean and at various points of the income distribution, respectively. We accounted for the complex survey design for the CTS-PS data using appropriate weighted procedures and explored endogeneity using an instrumental variables method. Principal Findings We detected widespread and subtle effects of many variables on physician incomes at different points (10th, 25th, 75th, and 90th percentiles) in the distribution that were undetected when employing regression estimations focusing on only the means or medians. Our findings show that the effects of managed care penetration are demonstrable at the mean of specialist incomes, but are more pronounced at higher levels. Conversely, a gender gap in earnings occurs at all levels of income of both PCPs and specialists, but is more pronounced at lower income levels. Conclusions The quantile regression technique offers an analytical tool to evaluate policy effects beyond the means. A longitudinal application of this approach may enable health policy makers to identify winners and losers among segments of the physician workforce and assess how market dynamics and health policy initiatives affect the overall physician income distribution over various time intervals. PMID:17850525

  13. Quantile regression and clustering analysis of standardized precipitation index in the Tarim River Basin, Xinjiang, China

    NASA Astrophysics Data System (ADS)

    Yang, Peng; Xia, Jun; Zhang, Yongyong; Han, Jian; Wu, Xia

    2017-11-01

    Because drought is a very common and widespread natural disaster, it has attracted a great deal of academic interest. Based on 12-month time scale standardized precipitation indices (SPI12) calculated from precipitation data recorded between 1960 and 2015 at 22 weather stations in the Tarim River Basin (TRB), this study aims to identify the trends of SPI and drought duration, severity, and frequency at various quantiles and to perform cluster analysis of drought events in the TRB. The results indicated that (1) both precipitation and temperature at most stations in the TRB exhibited significant positive trends during 1960-2015; (2) multiple scales of SPIs changed significantly around 1986; (3) based on quantile regression analysis of temporal drought changes, the positive SPI slopes indicated less severe and less frequent droughts at lower quantiles, but clear variation was detected in the drought frequency; and (4) significantly different trends were found in drought frequency probably between severe droughts and drought frequency.

  14. Gender Gaps in Mathematics, Science and Reading Achievements in Muslim Countries: A Quantile Regression Approach

    ERIC Educational Resources Information Center

    Shafiq, M. Najeeb

    2013-01-01

    Using quantile regression analyses, this study examines gender gaps in mathematics, science, and reading in Azerbaijan, Indonesia, Jordan, the Kyrgyz Republic, Qatar, Tunisia, and Turkey among 15-year-old students. The analyses show that girls in Azerbaijan achieve as well as boys in mathematics and science and overachieve in reading. In Jordan,…

  15. Variable screening via quantile partial correlation

    PubMed Central

    Ma, Shujie; Tsai, Chih-Ling

    2016-01-01

    In quantile linear regression with ultra-high dimensional data, we propose an algorithm for screening all candidate variables and subsequently selecting relevant predictors. Specifically, we first employ quantile partial correlation for screening, and then we apply the extended Bayesian information criterion (EBIC) for best subset selection. Our proposed method can successfully select predictors when the variables are highly correlated, and it can also identify variables that make a contribution to the conditional quantiles but are marginally uncorrelated or weakly correlated with the response. Theoretical results show that the proposed algorithm can yield the sure screening set. By controlling the false selection rate, model selection consistency can be achieved theoretically. In practice, we proposed using EBIC for best subset selection so that the resulting model is screening consistent. Simulation studies demonstrate that the proposed algorithm performs well, and an empirical example is presented. PMID:28943683

  16. Gender Gaps in Mathematics, Science and Reading Achievements in Muslim Countries: Evidence from Quantile Regression Analyses

    ERIC Educational Resources Information Center

    Shafiq, M. Najeeb

    2011-01-01

    Using quantile regression analyses, this study examines gender gaps in mathematics, science, and reading in Azerbaijan, Indonesia, Jordan, the Kyrgyz Republic, Qatar, Tunisia, and Turkey among 15 year-old students. The analyses show that girls in Azerbaijan achieve as well as boys in mathematics and science and overachieve in reading. In Jordan,…

  17. Customized Fetal Growth Charts for Parents' Characteristics, Race, and Parity by Quantile Regression Analysis: A Cross-sectional Multicenter Italian Study.

    PubMed

    Ghi, Tullio; Cariello, Luisa; Rizzo, Ludovica; Ferrazzi, Enrico; Periti, Enrico; Prefumo, Federico; Stampalija, Tamara; Viora, Elsa; Verrotti, Carla; Rizzo, Giuseppe

    2016-01-01

    The purpose of this study was to construct fetal biometric charts between 16 and 40 weeks' gestation that were customized for parental characteristics, race, and parity, using quantile regression analysis. In a multicenter cross-sectional study, 8070 sonographic examinations from low-risk pregnancies between 16 and 40 weeks' gestation were analyzed. The fetal measurements obtained were biparietal diameter, head circumference, abdominal circumference, and femur diaphysis length. Quantile regression was used to examine the impact of parental height and weight, parity, and race across biometric percentiles for the fetal measurements considered. Paternal and maternal height were significant covariates for all of the measurements considered (P < .05). Maternal weight significantly influenced head circumference, abdominal circumference, and femur diaphysis length. Parity was significantly associated with biparietal diameter and head circumference. Central African race was associated with head circumference and femur diaphysis length, whereas North African race was only associated with femur diaphysis length. In this study we constructed customized biometric growth charts using quantile regression in a large cohort of low-risk pregnancies. These charts offer the advantage of defining individualized normal ranges of fetal biometric parameters at each specific percentile corrected for parental height and weight, parity, and race. This study supports the importance of including these variables in routine sonographic screening for fetal growth abnormalities.

  18. An application of quantile random forests for predictive mapping of forest attributes

    Treesearch

    E.A. Freeman; G.G. Moisen

    2015-01-01

    Increasingly, random forest models are used in predictive mapping of forest attributes. Traditional random forests output the mean prediction from the random trees. Quantile regression forests (QRF) is an extension of random forests developed by Nicolai Meinshausen that provides non-parametric estimates of the median predicted value as well as prediction quantiles. It...

  19. Quantile regression analysis of body mass and wages.

    PubMed

    Johar, Meliyanni; Katayama, Hajime

    2012-05-01

    Using the National Longitudinal Survey of Youth 1979, we explore the relationship between body mass and wages. We use quantile regression to provide a broad description of the relationship across the wage distribution. We also allow the relationship to vary by the degree of social skills involved in different jobs. Our results find that for female workers body mass and wages are negatively correlated at all points in their wage distribution. The strength of the relationship is larger at higher-wage levels. For male workers, the relationship is relatively constant across wage distribution but heterogeneous across ethnic groups. When controlling for the endogeneity of body mass, we find that additional body mass has a negative causal impact on the wages of white females earning more than the median wages and of white males around the median wages. Among these workers, the wage penalties are larger for those employed in jobs that require extensive social skills. These findings may suggest that labor markets reward white workers for good physical shape differently, depending on the level of wages and the type of job a worker has. Copyright © 2011 John Wiley & Sons, Ltd.

  20. Ensuring the consistancy of Flow Direction Curve reconstructions: the 'quantile solidarity' approach

    NASA Astrophysics Data System (ADS)

    Poncelet, Carine; Andreassian, Vazken; Oudin, Ludovic

    2015-04-01

    Flow Duration Curves (FDCs) are a hydrologic tool describing the distribution of streamflows at a catchment outlet. FDCs are usually used for calibration of hydrological models, managing water quality and classifying catchments, among others. For gauged catchments, empirical FDCs can be computed from streamflow records. For ungauged catchments, on the other hand, FDCs cannot be obtained from streamflow records and must therefore be obtained in another manner, for example through reconstructions. Regression-based reconstructions are methods relying on the evaluation of quantiles separately from catchments' attributes (climatic or physical features).The advantage of this category of methods is that it is informative about the processes and it is non-parametric. However, the large number of parameters required can cause unwanted artifacts, typically reconstructions that do not always produce increasing quantiles. In this paper we propose a new approach named Quantile Solidarity (QS), which is applied under strict proxy-basin test conditions (Klemes, 1986) to a set of 600 French catchments. Half of the catchments are considered as gauged and used to calibrate the regression and compute residuals of the regression. The QS approach consists in a three-step regionalization scheme, which first links quantile values to physical descriptors, then reduces the number of regression parameters and finally exploits the spatial correlation of the residuals. The innovation is the utilisation of the parameters continuity across the quantiles to dramatically reduce the number of parameters. The second half of catchment is used as an independent validation set over which we show that the QS approach ensures strictly growing FDC reconstructions in ungauged conditions. Reference: V. KLEMEŠ (1986) Operational testing of hydrological simulation models, Hydrological Sciences Journal, 31:1, 13-24

  1. Percentile-Based ETCCDI Temperature Extremes Indices for CMIP5 Model Output: New Results through Semiparametric Quantile Regression Approach

    NASA Astrophysics Data System (ADS)

    Li, L.; Yang, C.

    2017-12-01

    Climate extremes often manifest as rare events in terms of surface air temperature and precipitation with an annual reoccurrence period. In order to represent the manifold characteristics of climate extremes for monitoring and analysis, the Expert Team on Climate Change Detection and Indices (ETCCDI) had worked out a set of 27 core indices based on daily temperature and precipitation data, describing extreme weather and climate events on an annual basis. The CLIMDEX project (http://www.climdex.org) had produced public domain datasets of such indices for data from a variety of sources, including output from global climate models (GCM) participating in the Coupled Model Intercomparison Project Phase 5 (CMIP5). Among the 27 ETCCDI indices, there are six percentile-based temperature extremes indices that may fall into two groups: exceedance rates (ER) (TN10p, TN90p, TX10p and TX90p) and durations (CSDI and WSDI). Percentiles must be estimated prior to the calculation of the indices, and could more or less be biased by the adopted algorithm. Such biases will in turn be propagated to the final results of indices. The CLIMDEX used an empirical quantile estimator combined with a bootstrap resampling procedure to reduce the inhomogeneity in the annual series of the ER indices. However, there are still some problems remained in the CLIMDEX datasets, namely the overestimated climate variability due to unaccounted autocorrelation in the daily temperature data, seasonally varying biases and inconsistency between algorithms applied to the ER indices and to the duration indices. We now present new results of the six indices through a semiparametric quantile regression approach for the CMIP5 model output. By using the base-period data as a whole and taking seasonality and autocorrelation into account, this approach successfully addressed the aforementioned issues and came out with consistent results. The new datasets cover the historical and three projected (RCP2.6, RCP4.5 and RCP

  2. Structured Additive Quantile Regression for Assessing the Determinants of Childhood Anemia in Rwanda.

    PubMed

    Habyarimana, Faustin; Zewotir, Temesgen; Ramroop, Shaun

    2017-06-17

    Childhood anemia is among the most significant health problems faced by public health departments in developing countries. This study aims at assessing the determinants and possible spatial effects associated with childhood anemia in Rwanda. The 2014/2015 Rwanda Demographic and Health Survey (RDHS) data was used. The analysis was done using the structured spatial additive quantile regression model. The findings of this study revealed that the child's age; the duration of breastfeeding; gender of the child; the nutritional status of the child (whether underweight and/or wasting); whether the child had a fever; had a cough in the two weeks prior to the survey or not; whether the child received vitamin A supplementation in the six weeks before the survey or not; the household wealth index; literacy of the mother; mother's anemia status; mother's age at the birth are all significant factors associated with childhood anemia in Rwanda. Furthermore, significant structured spatial location effects on childhood anemia was found.

  3. Flood quantile estimation at ungauged sites by Bayesian networks

    NASA Astrophysics Data System (ADS)

    Mediero, L.; Santillán, D.; Garrote, L.

    2012-04-01

    Estimating flood quantiles at a site for which no observed measurements are available is essential for water resources planning and management. Ungauged sites have no observations about the magnitude of floods, but some site and basin characteristics are known. The most common technique used is the multiple regression analysis, which relates physical and climatic basin characteristic to flood quantiles. Regression equations are fitted from flood frequency data and basin characteristics at gauged sites. Regression equations are a rigid technique that assumes linear relationships between variables and cannot take the measurement errors into account. In addition, the prediction intervals are estimated in a very simplistic way from the variance of the residuals in the estimated model. Bayesian networks are a probabilistic computational structure taken from the field of Artificial Intelligence, which have been widely and successfully applied to many scientific fields like medicine and informatics, but application to the field of hydrology is recent. Bayesian networks infer the joint probability distribution of several related variables from observations through nodes, which represent random variables, and links, which represent causal dependencies between them. A Bayesian network is more flexible than regression equations, as they capture non-linear relationships between variables. In addition, the probabilistic nature of Bayesian networks allows taking the different sources of estimation uncertainty into account, as they give a probability distribution as result. A homogeneous region in the Tagus Basin was selected as case study. A regression equation was fitted taking the basin area, the annual maximum 24-hour rainfall for a given recurrence interval and the mean height as explanatory variables. Flood quantiles at ungauged sites were estimated by Bayesian networks. Bayesian networks need to be learnt from a huge enough data set. As observational data are reduced, a

  4. Simulating Quantile Models with Applications to Economics and Management

    NASA Astrophysics Data System (ADS)

    Machado, José A. F.

    2010-05-01

    The massive increase in the speed of computers over the past forty years changed the way that social scientists, applied economists and statisticians approach their trades and also the very nature of the problems that they could feasibly tackle. The new methods that use intensively computer power go by the names of "computer-intensive" or "simulation". My lecture will start with bird's eye view of the uses of simulation in Economics and Statistics. Then I will turn out to my own research on uses of computer- intensive methods. From a methodological point of view the question I address is how to infer marginal distributions having estimated a conditional quantile process, (Counterfactual Decomposition of Changes in Wage Distributions using Quantile Regression," Journal of Applied Econometrics 20, 2005). Illustrations will be provided of the use of the method to perform counterfactual analysis in several different areas of knowledge.

  5. A Quantile Regression Approach to Understanding the Relations among Morphological Awareness, Vocabulary, and Reading Comprehension in Adult Basic Education Students

    ERIC Educational Resources Information Center

    Tighe, Elizabeth L.; Schatschneider, Christopher

    2016-01-01

    The purpose of this study was to investigate the joint and unique contributions of morphological awareness and vocabulary knowledge at five reading comprehension levels in adult basic education (ABE) students. We introduce the statistical technique of multiple quantile regression, which enabled us to assess the predictive utility of morphological…

  6. Asymptotics of nonparametric L-1 regression models with dependent data

    PubMed Central

    ZHAO, ZHIBIAO; WEI, YING; LIN, DENNIS K.J.

    2013-01-01

    We investigate asymptotic properties of least-absolute-deviation or median quantile estimates of the location and scale functions in nonparametric regression models with dependent data from multiple subjects. Under a general dependence structure that allows for longitudinal data and some spatially correlated data, we establish uniform Bahadur representations for the proposed median quantile estimates. The obtained Bahadur representations provide deep insights into the asymptotic behavior of the estimates. Our main theoretical development is based on studying the modulus of continuity of kernel weighted empirical process through a coupling argument. Progesterone data is used for an illustration. PMID:24955016

  7. Hybrid ARIMAX quantile regression method for forecasting short term electricity consumption in east java

    NASA Astrophysics Data System (ADS)

    Prastuti, M.; Suhartono; Salehah, NA

    2018-04-01

    The need for energy supply, especially for electricity in Indonesia has been increasing in the last past years. Furthermore, the high electricity usage by people at different times leads to the occurrence of heteroscedasticity issue. Estimate the electricity supply that could fulfilled the community’s need is very important, but the heteroscedasticity issue often made electricity forecasting hard to be done. An accurate forecast of electricity consumptions is one of the key challenges for energy provider to make better resources and service planning and also take control actions in order to balance the electricity supply and demand for community. In this paper, hybrid ARIMAX Quantile Regression (ARIMAX-QR) approach was proposed to predict the short-term electricity consumption in East Java. This method will also be compared to time series regression using RMSE, MAPE, and MdAPE criteria. The data used in this research was the electricity consumption per half-an-hour data during the period of September 2015 to April 2016. The results show that the proposed approach can be a competitive alternative to forecast short-term electricity in East Java. ARIMAX-QR using lag values and dummy variables as predictors yield more accurate prediction in both in-sample and out-sample data. Moreover, both time series regression and ARIMAX-QR methods with addition of lag values as predictor could capture accurately the patterns in the data. Hence, it produces better predictions compared to the models that not use additional lag variables.

  8. Estimation of peak discharge quantiles for selected annual exceedance probabilities in northeastern Illinois

    USGS Publications Warehouse

    Over, Thomas M.; Saito, Riki J.; Veilleux, Andrea G.; Sharpe, Jennifer B.; Soong, David T.; Ishii, Audrey L.

    2016-06-28

    This report provides two sets of equations for estimating peak discharge quantiles at annual exceedance probabilities (AEPs) of 0.50, 0.20, 0.10, 0.04, 0.02, 0.01, 0.005, and 0.002 (recurrence intervals of 2, 5, 10, 25, 50, 100, 200, and 500 years, respectively) for watersheds in Illinois based on annual maximum peak discharge data from 117 watersheds in and near northeastern Illinois. One set of equations was developed through a temporal analysis with a two-step least squares-quantile regression technique that measures the average effect of changes in the urbanization of the watersheds used in the study. The resulting equations can be used to adjust rural peak discharge quantiles for the effect of urbanization, and in this study the equations also were used to adjust the annual maximum peak discharges from the study watersheds to 2010 urbanization conditions.The other set of equations was developed by a spatial analysis. This analysis used generalized least-squares regression to fit the peak discharge quantiles computed from the urbanization-adjusted annual maximum peak discharges from the study watersheds to drainage-basin characteristics. The peak discharge quantiles were computed by using the Expected Moments Algorithm following the removal of potentially influential low floods defined by a multiple Grubbs-Beck test. To improve the quantile estimates, regional skew coefficients were obtained from a newly developed regional skew model in which the skew increases with the urbanized land use fraction. The drainage-basin characteristics used as explanatory variables in the spatial analysis include drainage area, the fraction of developed land, the fraction of land with poorly drained soils or likely water, and the basin slope estimated as the ratio of the basin relief to basin perimeter.This report also provides the following: (1) examples to illustrate the use of the spatial and urbanization-adjustment equations for estimating peak discharge quantiles at ungaged

  9. On Quantile Regression in Reproducing Kernel Hilbert Spaces with Data Sparsity Constraint

    PubMed Central

    Zhang, Chong; Liu, Yufeng; Wu, Yichao

    2015-01-01

    For spline regressions, it is well known that the choice of knots is crucial for the performance of the estimator. As a general learning framework covering the smoothing splines, learning in a Reproducing Kernel Hilbert Space (RKHS) has a similar issue. However, the selection of training data points for kernel functions in the RKHS representation has not been carefully studied in the literature. In this paper we study quantile regression as an example of learning in a RKHS. In this case, the regular squared norm penalty does not perform training data selection. We propose a data sparsity constraint that imposes thresholding on the kernel function coefficients to achieve a sparse kernel function representation. We demonstrate that the proposed data sparsity method can have competitive prediction performance for certain situations, and have comparable performance in other cases compared to that of the traditional squared norm penalty. Therefore, the data sparsity method can serve as a competitive alternative to the squared norm penalty method. Some theoretical properties of our proposed method using the data sparsity constraint are obtained. Both simulated and real data sets are used to demonstrate the usefulness of our data sparsity constraint. PMID:27134575

  10. Growth curves of preschool children in the northeast of iran: a population based study using quantile regression approach.

    PubMed

    Payande, Abolfazl; Tabesh, Hamed; Shakeri, Mohammad Taghi; Saki, Azadeh; Safarian, Mohammad

    2013-01-14

    Growth charts are widely used to assess children's growth status and can provide a trajectory of growth during early important months of life. The objectives of this study are going to construct growth charts and normal values of weight-for-age for children aged 0 to 5 years using a powerful and applicable methodology. The results compare with the World Health Organization (WHO) references and semi-parametric LMS method of Cole and Green. A total of 70737 apparently healthy boys and girls aged 0 to 5 years were recruited in July 2004 for 20 days from those attending community clinics for routine health checks as a part of a national survey. Anthropometric measurements were done by trained health staff using WHO methodology. The nonparametric quantile regression method obtained by local constant kernel estimation of conditional quantiles curves using for estimation of curves and normal values. The weight-for-age growth curves for boys and girls aged from 0 to 5 years were derived utilizing a population of children living in the northeast of Iran. The results were similar to the ones obtained by the semi-parametric LMS method in the same data. Among all age groups from 0 to 5 years, the median values of children's weight living in the northeast of Iran were lower than the corresponding values in WHO reference data. The weight curves of boys were higher than those of girls in all age groups. The differences between growth patterns of children living in the northeast of Iran versus international ones necessitate using local and regional growth charts. International normal values may not properly recognize the populations at risk for growth problems in Iranian children. Quantile regression (QR) as a flexible method which doesn't require restricted assumptions, proposed for estimation reference curves and normal values.

  11. Growth Curves of Preschool Children in the Northeast of Iran: A Population Based Study Using Quantile Regression Approach

    PubMed Central

    Payande, Abolfazl; Tabesh, Hamed; Shakeri, Mohammad Taghi; Saki, Azadeh; Safarian, Mohammad

    2013-01-01

    Introduction: Growth charts are widely used to assess children’s growth status and can provide a trajectory of growth during early important months of life. The objectives of this study are going to construct growth charts and normal values of weight-for-age for children aged 0 to 5 years using a powerful and applicable methodology. The results compare with the World Health Organization (WHO) references and semi-parametric LMS method of Cole and Green. Methods: A total of 70737 apparently healthy boys and girls aged 0 to 5 years were recruited in July 2004 for 20 days from those attending community clinics for routine health checks as a part of a national survey. Anthropometric measurements were done by trained health staff using WHO methodology. The nonparametric quantile regression method obtained by local constant kernel estimation of conditional quantiles curves using for estimation of curves and normal values. Results: The weight-for-age growth curves for boys and girls aged from 0 to 5 years were derived utilizing a population of children living in the northeast of Iran. The results were similar to the ones obtained by the semi-parametric LMS method in the same data. Among all age groups from 0 to 5 years, the median values of children’s weight living in the northeast of Iran were lower than the corresponding values in WHO reference data. The weight curves of boys were higher than those of girls in all age groups. Conclusion: The differences between growth patterns of children living in the northeast of Iran versus international ones necessitate using local and regional growth charts. International normal values may not properly recognize the populations at risk for growth problems in Iranian children. Quantile regression (QR) as a flexible method which doesn’t require restricted assumptions, proposed for estimation reference curves and normal values. PMID:23618470

  12. Assessing the impact of local meteorological variables on surface ozone in Hong Kong during 2000-2015 using quantile and multiple line regression models

    NASA Astrophysics Data System (ADS)

    Zhao, Wei; Fan, Shaojia; Guo, Hai; Gao, Bo; Sun, Jiaren; Chen, Laiguo

    2016-11-01

    The quantile regression (QR) method has been increasingly introduced to atmospheric environmental studies to explore the non-linear relationship between local meteorological conditions and ozone mixing ratios. In this study, we applied QR for the first time, together with multiple linear regression (MLR), to analyze the dominant meteorological parameters influencing the mean, 10th percentile, 90th percentile and 99th percentile of maximum daily 8-h average (MDA8) ozone concentrations in 2000-2015 in Hong Kong. The dominance analysis (DA) was used to assess the relative importance of meteorological variables in the regression models. Results showed that the MLR models worked better at suburban and rural sites than at urban sites, and worked better in winter than in summer. QR models performed better in summer for 99th and 90th percentiles and performed better in autumn and winter for 10th percentile. And QR models also performed better in suburban and rural areas for 10th percentile. The top 3 dominant variables associated with MDA8 ozone concentrations, changing with seasons and regions, were frequently associated with the six meteorological parameters: boundary layer height, humidity, wind direction, surface solar radiation, total cloud cover and sea level pressure. Temperature rarely became a significant variable in any season, which could partly explain the peak of monthly average ozone concentrations in October in Hong Kong. And we found the effect of solar radiation would be enhanced during extremely ozone pollution episodes (i.e., the 99th percentile). Finally, meteorological effects on MDA8 ozone had no significant changes before and after the 2010 Asian Games.

  13. Physical Activity and Pediatric Obesity: A Quantile Regression Analysis

    PubMed Central

    Mitchell, Jonathan A.; Dowda, Marsha; Pate, Russell R.; Kordas, Katarzyna; Froberg, Karsten; Sardinha, Luís B.; Kolle, Elin; Page, Angela

    2016-01-01

    Purpose We aimed to determine if moderate-to-vigorous physical activity (MVPA) and sedentary behavior (SB) were independently associated with body mass index (BMI) and waist circumference (WC) in children and adolescents. Methods Data from the International Children’s Accelerometry Database (ICAD) were used to address our objectives (N=11,115; 6-18y; 51% female). We calculated age and gender specific body mass index (BMI) and waist circumference (WC) Z-scores and used accelerometry to estimate MVPA and total SB. Self-reported television viewing was used as a measure of leisure time SB. Quantile regression was used to analyze the data. Results MVPA and total SB were associated with lower and higher BMI and WC Z-scores, respectively. These associations were strongest at the higher percentiles of the Z-score distributions. After including MVPA and total SB in the same model the MVPA associations remained, but the SB associations were no longer present. For example, each additional hour per day of MVPA was not associated with BMI Z-score at the 10th percentile (b=-0.02, P=0.170), but was associated with lower BMI Z-score at the 50th (b=-0.19, P<0.001) and 90th percentiles (b=-0.41, P<0.001). More television viewing was associated with higher BMI and WC and the associations were strongest at the higher percentiles of the Z-score distributions, with adjustment for MVPA and total SB. Conclusions Our observation of stronger associations at the higher percentiles indicate that increasing MVPA and decreasing television viewing at the population-level could shift the upper tails of the BMI and WC frequency distributions to lower values, thereby lowering the number of children and adolescents classified as obese. PMID:27755284

  14. The N-shaped environmental Kuznets curve: an empirical evaluation using a panel quantile regression approach.

    PubMed

    Allard, Alexandra; Takman, Johanna; Uddin, Gazi Salah; Ahmed, Ali

    2018-02-01

    We evaluate the N-shaped environmental Kuznets curve (EKC) using panel quantile regression analysis. We investigate the relationship between CO 2 emissions and GDP per capita for 74 countries over the period of 1994-2012. We include additional explanatory variables, such as renewable energy consumption, technological development, trade, and institutional quality. We find evidence for the N-shaped EKC in all income groups, except for the upper-middle-income countries. Heterogeneous characteristics are, however, observed over the N-shaped EKC. Finally, we find a negative relationship between renewable energy consumption and CO 2 emissions, which highlights the importance of promoting greener energy in order to combat global warming.

  15. Quantile regression and Bayesian cluster detection to identify radon prone areas.

    PubMed

    Sarra, Annalina; Fontanella, Lara; Valentini, Pasquale; Palermi, Sergio

    2016-11-01

    Albeit the dominant source of radon in indoor environments is the geology of the territory, many studies have demonstrated that indoor radon concentrations also depend on dwelling-specific characteristics. Following a stepwise analysis, in this study we propose a combined approach to delineate radon prone areas. We first investigate the impact of various building covariates on indoor radon concentrations. To achieve a more complete picture of this association, we exploit the flexible formulation of a Bayesian spatial quantile regression, which is also equipped with parameters that controls the spatial dependence across data. The quantitative knowledge of the influence of each significant building-specific factor on the measured radon levels is employed to predict the radon concentrations that would have been found if the sampled buildings had possessed standard characteristics. Those normalised radon measures should reflect the geogenic radon potential of the underlying ground, which is a quantity directly related to the geological environment. The second stage of the analysis is aimed at identifying radon prone areas, and to this end, we adopt a Bayesian model for spatial cluster detection using as reference unit the building with standard characteristics. The case study is based on a data set of more than 2000 indoor radon measures, available for the Abruzzo region (Central Italy) and collected by the Agency of Environmental Protection of Abruzzo, during several indoor radon monitoring surveys. Copyright © 2016 Elsevier Ltd. All rights reserved.

  16. Bayesian estimation of extreme flood quantiles using a rainfall-runoff model and a stochastic daily rainfall generator

    NASA Astrophysics Data System (ADS)

    Costa, Veber; Fernandes, Wilson

    2017-11-01

    Extreme flood estimation has been a key research topic in hydrological sciences. Reliable estimates of such events are necessary as structures for flood conveyance are continuously evolving in size and complexity and, as a result, their failure-associated hazards become more and more pronounced. Due to this fact, several estimation techniques intended to improve flood frequency analysis and reducing uncertainty in extreme quantile estimation have been addressed in the literature in the last decades. In this paper, we develop a Bayesian framework for the indirect estimation of extreme flood quantiles from rainfall-runoff models. In the proposed approach, an ensemble of long daily rainfall series is simulated with a stochastic generator, which models extreme rainfall amounts with an upper-bounded distribution function, namely, the 4-parameter lognormal model. The rationale behind the generation model is that physical limits for rainfall amounts, and consequently for floods, exist and, by imposing an appropriate upper bound for the probabilistic model, more plausible estimates can be obtained for those rainfall quantiles with very low exceedance probabilities. Daily rainfall time series are converted into streamflows by routing each realization of the synthetic ensemble through a conceptual hydrologic model, the Rio Grande rainfall-runoff model. Calibration of parameters is performed through a nonlinear regression model, by means of the specification of a statistical model for the residuals that is able to accommodate autocorrelation, heteroscedasticity and nonnormality. By combining the outlined steps in a Bayesian structure of analysis, one is able to properly summarize the resulting uncertainty and estimating more accurate credible intervals for a set of flood quantiles of interest. The method for extreme flood indirect estimation was applied to the American river catchment, at the Folsom dam, in the state of California, USA. Results show that most floods

  17. The use of quantile regression to forecast higher than expected respiratory deaths in a daily time series: a study of New York City data 1987-2000.

    PubMed

    Soyiri, Ireneous N; Reidpath, Daniel D

    2013-01-01

    Forecasting higher than expected numbers of health events provides potentially valuable insights in its own right, and may contribute to health services management and syndromic surveillance. This study investigates the use of quantile regression to predict higher than expected respiratory deaths. Data taken from 70,830 deaths occurring in New York were used. Temporal, weather and air quality measures were fitted using quantile regression at the 90th-percentile with half the data (in-sample). Four QR models were fitted: an unconditional model predicting the 90th-percentile of deaths (Model 1), a seasonal/temporal (Model 2), a seasonal, temporal plus lags of weather and air quality (Model 3), and a seasonal, temporal model with 7-day moving averages of weather and air quality. Models were cross-validated with the out of sample data. Performance was measured as proportionate reduction in weighted sum of absolute deviations by a conditional, over unconditional models; i.e., the coefficient of determination (R1). The coefficient of determination showed an improvement over the unconditional model between 0.16 and 0.19. The greatest improvement in predictive and forecasting accuracy of daily mortality was associated with the inclusion of seasonal and temporal predictors (Model 2). No gains were made in the predictive models with the addition of weather and air quality predictors (Models 3 and 4). However, forecasting models that included weather and air quality predictors performed slightly better than the seasonal and temporal model alone (i.e., Model 3 > Model 4 > Model 2) This study provided a new approach to predict higher than expected numbers of respiratory related-deaths. The approach, while promising, has limitations and should be treated at this stage as a proof of concept.

  18. The Use of Quantile Regression to Forecast Higher Than Expected Respiratory Deaths in a Daily Time Series: A Study of New York City Data 1987-2000

    PubMed Central

    Soyiri, Ireneous N.; Reidpath, Daniel D.

    2013-01-01

    Forecasting higher than expected numbers of health events provides potentially valuable insights in its own right, and may contribute to health services management and syndromic surveillance. This study investigates the use of quantile regression to predict higher than expected respiratory deaths. Data taken from 70,830 deaths occurring in New York were used. Temporal, weather and air quality measures were fitted using quantile regression at the 90th-percentile with half the data (in-sample). Four QR models were fitted: an unconditional model predicting the 90th-percentile of deaths (Model 1), a seasonal / temporal (Model 2), a seasonal, temporal plus lags of weather and air quality (Model 3), and a seasonal, temporal model with 7-day moving averages of weather and air quality. Models were cross-validated with the out of sample data. Performance was measured as proportionate reduction in weighted sum of absolute deviations by a conditional, over unconditional models; i.e., the coefficient of determination (R1). The coefficient of determination showed an improvement over the unconditional model between 0.16 and 0.19. The greatest improvement in predictive and forecasting accuracy of daily mortality was associated with the inclusion of seasonal and temporal predictors (Model 2). No gains were made in the predictive models with the addition of weather and air quality predictors (Models 3 and 4). However, forecasting models that included weather and air quality predictors performed slightly better than the seasonal and temporal model alone (i.e., Model 3 > Model 4 > Model 2) This study provided a new approach to predict higher than expected numbers of respiratory related-deaths. The approach, while promising, has limitations and should be treated at this stage as a proof of concept. PMID:24147122

  19. Use of Quantile Regression to Determine the Impact on Total Health Care Costs of Surgical Site Infections Following Common Ambulatory Procedures

    PubMed Central

    Olsen, Margaret A.; Tian, Fang; Wallace, Anna E.; Nickel, Katelin B.; Warren, David K.; Fraser, Victoria J.; Selvam, Nandini; Hamilton, Barton H.

    2017-01-01

    Objective To determine the impact of surgical site infections (SSIs) on healthcare costs following common ambulatory surgical procedures throughout the cost distribution. Background Data on costs of SSIs following ambulatory surgery are sparse, particularly variation beyond just mean costs. Methods We performed a retrospective cohort study of persons undergoing cholecystectomy, breast-conserving surgery (BCS), anterior cruciate ligament reconstruction (ACL), and hernia repair from 12/31/2004–12/31/2010 using commercial insurer claims data. SSIs within 90 days post-procedure were identified; infections during a hospitalization or requiring surgery were considered serious. We used quantile regression, controlling for patient, operative, and postoperative factors to examine the impact of SSIs on 180-day healthcare costs throughout the cost distribution. Results The incidence of serious and non-serious SSIs were 0.8% and 0.2% after 21,062 ACL, 0.5% and 0.3% after 57,750 cholecystectomy, 0.6% and 0.5% after 60,681 hernia, and 0.8% and 0.8% after 42,489 BCS procedures. Serious SSIs were associated with significantly higher costs than non-serious SSIs for all 4 procedures throughout the cost distribution. The attributable cost of serious SSIs increased for both cholecystectomy and hernia repair as the quantile of total costs increased ($38,410 for cholecystectomy with serious SSI vs. no SSI at the 70th percentile of costs, up to $89,371 at the 90th percentile). Conclusions SSIs, particularly serious infections resulting in hospitalization or surgical treatment, were associated with significantly increased healthcare costs after 4 common surgical procedures. Quantile regression illustrated the differential effect of serious SSIs on healthcare costs at the upper end of the cost distribution. PMID:28059961

  20. Use of Quantile Regression to Determine the Impact on Total Health Care Costs of Surgical Site Infections Following Common Ambulatory Procedures.

    PubMed

    Olsen, Margaret A; Tian, Fang; Wallace, Anna E; Nickel, Katelin B; Warren, David K; Fraser, Victoria J; Selvam, Nandini; Hamilton, Barton H

    2017-02-01

    To determine the impact of surgical site infections (SSIs) on health care costs following common ambulatory surgical procedures throughout the cost distribution. Data on costs of SSIs following ambulatory surgery are sparse, particularly variation beyond just mean costs. We performed a retrospective cohort study of persons undergoing cholecystectomy, breast-conserving surgery, anterior cruciate ligament reconstruction, and hernia repair from December 31, 2004 to December 31, 2010 using commercial insurer claims data. SSIs within 90 days post-procedure were identified; infections during a hospitalization or requiring surgery were considered serious. We used quantile regression, controlling for patient, operative, and postoperative factors to examine the impact of SSIs on 180-day health care costs throughout the cost distribution. The incidence of serious and nonserious SSIs was 0.8% and 0.2%, respectively, after 21,062 anterior cruciate ligament reconstruction, 0.5% and 0.3% after 57,750 cholecystectomy, 0.6% and 0.5% after 60,681 hernia, and 0.8% and 0.8% after 42,489 breast-conserving surgery procedures. Serious SSIs were associated with significantly higher costs than nonserious SSIs for all 4 procedures throughout the cost distribution. The attributable cost of serious SSIs increased for both cholecystectomy and hernia repair as the quantile of total costs increased ($38,410 for cholecystectomy with serious SSI vs no SSI at the 70th percentile of costs, up to $89,371 at the 90th percentile). SSIs, particularly serious infections resulting in hospitalization or surgical treatment, were associated with significantly increased health care costs after 4 common surgical procedures. Quantile regression illustrated the differential effect of serious SSIs on health care costs at the upper end of the cost distribution.

  1. Estimation of effects of factors related to preschooler body mass index using quantile regression model.

    PubMed

    Kim, Hee Soon; Park, Yun Hee; Park, Hyun Bong; Kim, Su Hee

    2014-12-01

    The purpose of this study was to investigate Korean preschoolers' obesity-related factors through an ecological approach and to identify Korean preschoolers' obesity-related factors and the different effects of ecological variables on body mass index and its quantiles through an ecological approach. The study design was cross-sectional. Through convenience sampling, 241 cases were collected from three kindergartens and seven nurseries in the Seoul metropolitan area and Kyunggi Province in April 2013 using self-administered questionnaires from preschoolers' mothers and homeroom teachers. Results of ordinary least square regression analysis show that mother's sedentary behavior (p < .001), sedentary behavior parenting (p = .039), healthy eating parenting (p = .027), physical activity-related social capital (p = .029) were significant factors of preschoolers' body mass index. While in the 5% body mass index distribution group, gender (p = .031), preference for physical activity (p = .015), mother's sedentary behavior parenting (p = .032), healthy eating parenting (p = .005), and teacher's sedentary behavior (p = .037) showed significant influences. In the 25% group, the effects of gender and preference for physical activity were no longer significant. In the 75% and 95% group, only mother's sedentary behavior showed a statistically significant influence (p < .001, p = .012 respectively). Efforts to lower the obesity rate of preschoolers should focus on their environment, especially on the sedentary behavior of mothers, as mothers are the main nurturers of this age group. Copyright © 2014. Published by Elsevier B.V.

  2. Gender differences in French GPs' activity: the contribution of quantile regressions.

    PubMed

    Dumontet, Magali; Franc, Carine

    2015-05-01

    In any fee-for-service system, doctors may be encouraged to increase the number of services (private activity) they provide to receive a higher income. Studying private activity determinants helps to predict doctors' provision of care. In the context of strong feminization and heterogeneity in general practitioners' (GP) behavior, we first aim to measure the effects of the determinants of private activity. Second, we study the evolution of these effects along the private activity distribution. Third, we examine the differences between male and female GPs. From an exhaustive database of French GPs working in private practice in 2008, we performed an ordinary least squares (OLS) regression and quantile regressions (QR) on the GPs' private activity. Among other determinants, we examined the trade-offs within the GPs' household considering his/her marital status, spousal income, and children. While the OLS results showed that female GPs had less private activity than male GPs (-13%), the QR results emphasized a private activity gender gap that increased significantly in the upper tail of the distribution. We also find gender differences in the private activity determinants, including family structure, practice characteristics, and case-mix variables. For instance, having a youngest child under 12 years old had a positive effect on the level of private activity for male GPs and a negative effect for female GPs. The results allow us to understand to what extent the supply of care differs between male and female GPs. In the context of strong feminization, this is essential to consider for organizing and forecasting the GPs' supply of care.

  3. Using the Quantile Mapping to improve a weather generator

    NASA Astrophysics Data System (ADS)

    Chen, Y.; Themessl, M.; Gobiet, A.

    2012-04-01

    We developed a weather generator (WG) by using statistical and stochastic methods, among them are quantile mapping (QM), Monte-Carlo, auto-regression, empirical orthogonal function (EOF). One of the important steps in the WG is using QM, through which all the variables, no matter what distribution they originally are, are transformed into normal distributed variables. Therefore, the WG can work on normally distributed variables, which greatly facilitates the treatment of random numbers in the WG. Monte-Carlo and auto-regression are used to generate the realization; EOFs are employed for preserving spatial relationships and the relationships between different meteorological variables. We have established a complete model named WGQM (weather generator and quantile mapping), which can be applied flexibly to generate daily or hourly time series. For example, with 30-year daily (hourly) data and 100-year monthly (daily) data as input, the 100-year daily (hourly) data would be relatively reasonably produced. Some evaluation experiments with WGQM have been carried out in the area of Austria and the evaluation results will be presented.

  4. Patient characteristics associated with differences in radiation exposure from pediatric abdomen-pelvis CT scans: a quantile regression analysis.

    PubMed

    Cooper, Jennifer N; Lodwick, Daniel L; Adler, Brent; Lee, Choonsik; Minneci, Peter C; Deans, Katherine J

    2017-06-01

    Computed tomography (CT) is a widely used diagnostic tool in pediatric medicine. However, due to concerns regarding radiation exposure, it is essential to identify patient characteristics associated with higher radiation burden from CT imaging, in order to more effectively target efforts towards dose reduction. Our objective was to identify the effects of various demographic and clinical patient characteristics on radiation exposure from single abdomen/pelvis CT scans in children. CT scans performed at our institution between January 2013 and August 2015 in patients under 16 years of age were processed using a software tool that estimates patient-specific organ and effective doses and merges these estimates with data from the electronic health record and billing record. Quantile regression models at the 50th, 75th, and 90th percentiles were used to estimate the effects of patients' demographic and clinical characteristics on effective dose. 2390 abdomen/pelvis CT scans (median effective dose 1.52mSv) were included. Of all characteristics examined, only older age, female gender, higher BMI, and whether the scan was a multiphase exam or an exam that required repeating for movement were significant predictors of higher effective dose at each quantile examined (all p<0.05). The effects of obesity and multiphase or repeat scanning on effective dose were magnified in higher dose scans. Older age, female gender, obesity, and multiphase or repeat scanning are all associated with increased effective dose from abdomen/pelvis CT. Targeted efforts to reduce dose from abdominal CT in these groups should be undertaken. Copyright © 2017 Elsevier Ltd. All rights reserved.

  5. Spatially Modeling the Effects of Meteorological Drivers of PM2.5 in the Eastern United States via a Local Linear Penalized Quantile Regression Estimator.

    PubMed

    Russell, Brook T; Wang, Dewei; McMahan, Christopher S

    2017-08-01

    Fine particulate matter (PM 2.5 ) poses a significant risk to human health, with long-term exposure being linked to conditions such as asthma, chronic bronchitis, lung cancer, atherosclerosis, etc. In order to improve current pollution control strategies and to better shape public policy, the development of a more comprehensive understanding of this air pollutant is necessary. To this end, this work attempts to quantify the relationship between certain meteorological drivers and the levels of PM 2.5 . It is expected that the set of important meteorological drivers will vary both spatially and within the conditional distribution of PM 2.5 levels. To account for these characteristics, a new local linear penalized quantile regression methodology is developed. The proposed estimator uniquely selects the set of important drivers at every spatial location and for each quantile of the conditional distribution of PM 2.5 levels. The performance of the proposed methodology is illustrated through simulation, and it is then used to determine the association between several meteorological drivers and PM 2.5 over the Eastern United States (US). This analysis suggests that the primary drivers throughout much of the Eastern US tend to differ based on season and geographic location, with similarities existing between "typical" and "high" PM 2.5 levels.

  6. Quantile rank maps: a new tool for understanding individual brain development.

    PubMed

    Chen, Huaihou; Kelly, Clare; Castellanos, F Xavier; He, Ye; Zuo, Xi-Nian; Reiss, Philip T

    2015-05-01

    We propose a novel method for neurodevelopmental brain mapping that displays how an individual's values for a quantity of interest compare with age-specific norms. By estimating smoothly age-varying distributions at a set of brain regions of interest, we derive age-dependent region-wise quantile ranks for a given individual, which can be presented in the form of a brain map. Such quantile rank maps could potentially be used for clinical screening. Bootstrap-based confidence intervals are proposed for the quantile rank estimates. We also propose a recalibrated Kolmogorov-Smirnov test for detecting group differences in the age-varying distribution. This test is shown to be more robust to model misspecification than a linear regression-based test. The proposed methods are applied to brain imaging data from the Nathan Kline Institute Rockland Sample and from the Autism Brain Imaging Data Exchange (ABIDE) sample. Copyright © 2015 Elsevier Inc. All rights reserved.

  7. Socio-demographic, clinical characteristics and utilization of mental health care services associated with SF-6D utility scores in patients with mental disorders: contributions of the quantile regression.

    PubMed

    Prigent, Amélie; Kamendje-Tchokobou, Blaise; Chevreul, Karine

    2017-11-01

    Health-related quality of life (HRQoL) is a widely used concept in the assessment of health care. Some generic HRQoL instruments, based on specific algorithms, can generate utility scores which reflect the preferences of the general population for the different health states described by the instrument. This study aimed to investigate the relationships between utility scores and potentially associated factors in patients with mental disorders followed in inpatient and/or outpatient care settings using two statistical methods. Patients were recruited in four psychiatric sectors in France. Patient responses to the SF-36 generic HRQoL instrument were used to calculate SF-6D utility scores. The relationships between utility scores and patient socio-demographic, clinical characteristics, and mental health care utilization, considered as potentially associated factors, were studied using OLS and quantile regressions. One hundred and seventy six patients were included. Women, severely ill patients and those hospitalized full-time tended to report lower utility scores, whereas psychotic disorders (as opposed to mood disorders) and part-time care were associated with higher scores. The quantile regression highlighted that the size of the associations between the utility scores and some patient characteristics varied along with the utility score distribution, and provided more accurate estimated values than OLS regression. The quantile regression may constitute a relevant complement for the analysis of factors associated with utility scores. For policy decision-making, the association of full-time hospitalization with lower utility scores while part-time care was associated with higher scores supports the further development of alternatives to full-time hospitalizations.

  8. Association of Perceived Stress with Stressful Life Events, Lifestyle and Sociodemographic Factors: A Large-Scale Community-Based Study Using Logistic Quantile Regression

    PubMed Central

    Feizi, Awat; Aliyari, Roqayeh; Roohafza, Hamidreza

    2012-01-01

    Objective. The present paper aimed at investigating the association between perceived stress and major life events stressors in Iranian general population. Methods. In a cross-sectional large-scale community-based study, 4583 people aged 19 and older, living in Isfahan, Iran, were investigated. Logistic quantile regression was used for modeling perceived stress, measured by GHQ questionnaire, as the bounded outcome (dependent), variable, and as a function of most important stressful life events, as the predictor variables, controlling for major lifestyle and sociodemographic factors. This model provides empirical evidence of the predictors' effects heterogeneity depending on individual location on the distribution of perceived stress. Results. The results showed that among four stressful life events, family conflicts and social problems were more correlated with level of perceived stress. Higher levels of education were negatively associated with perceived stress and its coefficients monotonically decrease beyond the 30th percentile. Also, higher levels of physical activity were associated with perception of low levels of stress. The pattern of gender's coefficient over the majority of quantiles implied that females are more affected by stressors. Also high perceived stress was associated with low or middle levels of income. Conclusions. The results of current research suggested that in a developing society with high prevalence of stress, interventions targeted toward promoting financial and social equalities, social skills training, and healthy lifestyle may have the potential benefits for large parts of the population, most notably female and lower educated people. PMID:23091560

  9. [Socioeconomic factors conditioning obesity in adults. Evidence based on quantile regression and panel data].

    PubMed

    Temporelli, Karina L; Viego, Valentina N

    2016-08-01

    Objective To measure the effect of socioeconomic variables on the prevalence of obesity. Factors such as income level, urbanization, incorporation of women into the labor market and access to unhealthy foods are considered in this paper. Method Econometric estimates of the proportion of obese men and women by country were calculated using models based on panel data and quantile regressions, with data from 192 countries for the period 2002-2005.Levels of per capita income, urbanization, income/big mac ratio price and labor indicators for female population were considered as explanatory variables. Results Factors that have influence over obesity in adults differ between men and women; accessibility to fast food is related to male obesity, while the employment mode causes higher rates in women. The underlying socioeconomic factors for obesity are also different depending on the magnitude of this problem in each country; in countries with low prevalence, a greater level of income favor the transition to obesogenic habits, while a higher income level mitigates the problem in those countries with high rates of obesity. Discussion Identifying the socio-economic causes of the significant increase in the prevalence of obesity is essential for the implementation of effective strategies for prevention, since this condition not only affects the quality of life of those who suffer from it but also puts pressure on health systems due to the treatment costs of associated diseases.

  10. Using instant messaging to enhance the interpersonal relationships of Taiwanese adolescents: evidence from quantile regression analysis.

    PubMed

    Lee, Yueh-Chiang; Sun, Ya Chung

    2009-01-01

    Even though use of the internet by adolescents has grown exponentially, little is known about the correlation between their interaction via Instant Messaging (IM) and the evolution of their interpersonal relationships in real life. In the present study, 369 junior high school students in Taiwan responded to questions regarding their IM usage and their dispositional measures of real-life interpersonal relationships. Descriptive statistics, factor analysis, and quantile regression methods were used to analyze the data. Results indicate that (1) IM helps define adolescents' self-identity (forming and maintaining individual friendships) and social-identity (belonging to a peer group), and (2) how development of an interpersonal relationship is impacted by the use of IM since it appears that adolescents use IM to improve their interpersonal relationships in real life.

  11. Technical note: Combining quantile forecasts and predictive distributions of streamflows

    NASA Astrophysics Data System (ADS)

    Bogner, Konrad; Liechti, Katharina; Zappa, Massimiliano

    2017-11-01

    The enhanced availability of many different hydro-meteorological modelling and forecasting systems raises the issue of how to optimally combine this great deal of information. Especially the usage of deterministic and probabilistic forecasts with sometimes widely divergent predicted future streamflow values makes it even more complicated for decision makers to sift out the relevant information. In this study multiple streamflow forecast information will be aggregated based on several different predictive distributions, and quantile forecasts. For this combination the Bayesian model averaging (BMA) approach, the non-homogeneous Gaussian regression (NGR), also known as the ensemble model output statistic (EMOS) techniques, and a novel method called Beta-transformed linear pooling (BLP) will be applied. By the help of the quantile score (QS) and the continuous ranked probability score (CRPS), the combination results for the Sihl River in Switzerland with about 5 years of forecast data will be compared and the differences between the raw and optimally combined forecasts will be highlighted. The results demonstrate the importance of applying proper forecast combination methods for decision makers in the field of flood and water resource management.

  12. Quantile Functions, Convergence in Quantile, and Extreme Value Distribution Theory.

    DTIC Science & Technology

    1980-11-01

    Gnanadesikan (1968). Quantile functions are advocated by Parzen (1979) as providing an approach to probability-based data analysis. Quantile functions are... Gnanadesikan , R. (1968). Probability Plotting Methods for the Analysis of Data, Biomtrika, 55, 1-17.

  13. A hierarchical Bayesian GEV model for improving local and regional flood quantile estimates

    NASA Astrophysics Data System (ADS)

    Lima, Carlos H. R.; Lall, Upmanu; Troy, Tara; Devineni, Naresh

    2016-10-01

    We estimate local and regional Generalized Extreme Value (GEV) distribution parameters for flood frequency analysis in a multilevel, hierarchical Bayesian framework, to explicitly model and reduce uncertainties. As prior information for the model, we assume that the GEV location and scale parameters for each site come from independent log-normal distributions, whose mean parameter scales with the drainage area. From empirical and theoretical arguments, the shape parameter for each site is shrunk towards a common mean. Non-informative prior distributions are assumed for the hyperparameters and the MCMC method is used to sample from the joint posterior distribution. The model is tested using annual maximum series from 20 streamflow gauges located in an 83,000 km2 flood prone basin in Southeast Brazil. The results show a significant reduction of uncertainty estimates of flood quantile estimates over the traditional GEV model, particularly for sites with shorter records. For return periods within the range of the data (around 50 years), the Bayesian credible intervals for the flood quantiles tend to be narrower than the classical confidence limits based on the delta method. As the return period increases beyond the range of the data, the confidence limits from the delta method become unreliable and the Bayesian credible intervals provide a way to estimate satisfactory confidence bands for the flood quantiles considering parameter uncertainties and regional information. In order to evaluate the applicability of the proposed hierarchical Bayesian model for regional flood frequency analysis, we estimate flood quantiles for three randomly chosen out-of-sample sites and compare with classical estimates using the index flood method. The posterior distributions of the scaling law coefficients are used to define the predictive distributions of the GEV location and scale parameters for the out-of-sample sites given only their drainage areas and the posterior distribution of the

  14. Using quantile regression to examine health care expenditures during the Great Recession.

    PubMed

    Chen, Jie; Vargas-Bustamante, Arturo; Mortensen, Karoline; Thomas, Stephen B

    2014-04-01

    To examine the association between the Great Recession of 2007-2009 and health care expenditures along the health care spending distribution, with a focus on racial/ethnic disparities. Secondary data analyses of the Medical Expenditure Panel Survey (2005-2006 and 2008-2009). Quantile multivariate regressions are employed to measure the different associations between the economic recession of 2007-2009 and health care spending. Race/ethnicity and interaction terms between race/ethnicity and a recession indicator are controlled to examine whether minorities encountered disproportionately lower health spending during the economic recession. The Great Recession was significantly associated with reductions in health care expenditures at the 10th-50th percentiles of the distribution, but not at the 75th-90th percentiles. Racial and ethnic disparities were more substantial at the lower end of the health expenditure distribution; however, on average the reduction in expenditures was similar for all race/ethnic groups. The Great Recession was also positively associated with spending on emergency department visits. This study shows that the relationship between the Great Recession and health care spending varied along the health expenditure distribution. More variability was observed in the lower end of the health spending distribution compared to the higher end. © Health Research and Educational Trust.

  15. Detecting Long-term Trend of Water Quality Indices of Dong-gang River, Taiwan Using Quantile Regression

    NASA Astrophysics Data System (ADS)

    Yang, D.; Shiau, J.

    2013-12-01

    ABSTRACT BODY: Abstract Surface water quality is an essential issue in water-supply for human uses and sustaining healthy ecosystem of rivers. However, water quality of rivers is easily influenced by anthropogenic activities such as urban development and wastewater disposal. Long-term monitoring of water quality can assess whether water quality of rivers deteriorates or not. Taiwan is a population-dense area and heavily depends on surface water for domestic, industrial, and agricultural uses. Dong-gang River is one of major resources in southern Taiwan for agricultural requirements. The water-quality data of four monitoring stations of the Dong-gang River for the period of 2000-2012 are selected for trend analysis. The parameters used to characterize water quality of rivers include biochemical oxygen demand (BOD), dissolved oxygen (DO), suspended solids (SS), and ammonia nitrogen (NH3-N). These four water-quality parameters are integrated into an index called river pollution index (RPI) to indicate the pollution level of rivers. Although widely used non-parametric Mann-Kendall test and linear regression exhibit computational efficiency to identify trends of water-quality indices, limitations of such approaches include sensitive to outliers and estimations of conditional mean only. Quantile regression, capable of identifying changes over time of any percentile values, is employed in this study to detect long-term trend of water-quality indices for the Dong-gang River located in southern Taiwan. The results show that Dong-gang River 4 stations from 2000 to 2012 monthly long-term trends in water quality.To analyze s Dong-gang River long-term water quality trends and pollution characteristics. The results showed that the bridge measuring ammonia Long-dong, BOD5 measure in that station on a downward trend, DO, and SS is on the rise, River Pollution Index (RPI) on a downward trend. The results form Chau-Jhou station also ahowed simialar trends .more and more near the

  16. Obesity inequality in Malaysia: decomposing differences by gender and ethnicity using quantile regression.

    PubMed

    Dunn, Richard A; Tan, Andrew K G; Nayga, Rodolfo M

    2012-01-01

    Obesity prevalence is unequally distributed across gender and ethnic group in Malaysia. In this paper, we examine the role of socioeconomic inequality in explaining these disparities. The body mass index (BMI) distributions of Malays and Chinese, the two largest ethnic groups in Malaysia, are estimated through the use of quantile regression. The differences in the BMI distributions are then decomposed into two parts: attributable to differences in socioeconomic endowments and attributable to differences in responses to endowments. For both males and females, the BMI distribution of Malays is shifted toward the right of the distribution of Chinese, i.e., Malays exhibit higher obesity rates. In the lower 75% of the distribution, differences in socioeconomic endowments explain none of this difference. At the 90th percentile, differences in socioeconomic endowments account for no more than 30% of the difference in BMI between ethnic groups. Our results demonstrate that the higher levels of income and education that accrue with economic development will likely not eliminate obesity inequality. This leads us to conclude that reduction of obesity inequality, as well the overall level of obesity, requires increased efforts to alter the lifestyle behaviors of Malaysians.

  17. Hospital ownership and drug utilization under a global budget: a quantile regression analysis.

    PubMed

    Zhang, Jing Hua; Chou, Shin-Yi; Deily, Mary E; Lien, Hsien-Ming

    2014-03-01

    A global budgeting system helps control the growth of healthcare spending by setting expenditure ceilings. However, the hospital global budget implemented in Taiwan in 2002 included a special provision: drug expenditures are reimbursed at face value, while other expenditures are subject to discounting. That gives hospitals, particularly those that are for-profit, an incentive to increase drug expenditures in treating patients. We calculated monthly drug expenditures by hospital departments from January 1997 to June 2006, using a sample of 348 193 patient claims to Taiwan National Health Insurance. To allow for variation among responses by departments with differing reliance on drugs and among hospitals of different ownerships, we used quantile regression to identify the effect of the hospital global budget on drug expenditures. Although drug expenditure increased in all hospital departments after the enactment of the hospital global budget, departments in for-profit hospitals that rely more heavily on drug treatments increased drug spending more, relative to public hospitals. Our findings suggest that a global budgeting system with special reimbursement provisions for certain treatment categories may alter treatment decisions and may undermine cost-containment goals, particularly among for-profit hospitals.

  18. Application of empirical mode decomposition with local linear quantile regression in financial time series forecasting.

    PubMed

    Jaber, Abobaker M; Ismail, Mohd Tahir; Altaher, Alsaidi M

    2014-01-01

    This paper mainly forecasts the daily closing price of stock markets. We propose a two-stage technique that combines the empirical mode decomposition (EMD) with nonparametric methods of local linear quantile (LLQ). We use the proposed technique, EMD-LLQ, to forecast two stock index time series. Detailed experiments are implemented for the proposed method, in which EMD-LPQ, EMD, and Holt-Winter methods are compared. The proposed EMD-LPQ model is determined to be superior to the EMD and Holt-Winter methods in predicting the stock closing prices.

  19. Automatic coronary artery segmentation based on multi-domains remapping and quantile regression in angiographies.

    PubMed

    Li, Zhixun; Zhang, Yingtao; Gong, Huiling; Li, Weimin; Tang, Xianglong

    2016-12-01

    Coronary artery disease has become the most dangerous diseases to human life. And coronary artery segmentation is the basis of computer aided diagnosis and analysis. Existing segmentation methods are difficult to handle the complex vascular texture due to the projective nature in conventional coronary angiography. Due to large amount of data and complex vascular shapes, any manual annotation has become increasingly unrealistic. A fully automatic segmentation method is necessary in clinic practice. In this work, we study a method based on reliable boundaries via multi-domains remapping and robust discrepancy correction via distance balance and quantile regression for automatic coronary artery segmentation of angiography images. The proposed method can not only segment overlapping vascular structures robustly, but also achieve good performance in low contrast regions. The effectiveness of our approach is demonstrated on a variety of coronary blood vessels compared with the existing methods. The overall segmentation performances si, fnvf, fvpf and tpvf were 95.135%, 3.733%, 6.113%, 96.268%, respectively. Copyright © 2016 Elsevier Ltd. All rights reserved.

  20. Quantile based Tsallis entropy in residual lifetime

    NASA Astrophysics Data System (ADS)

    Khammar, A. H.; Jahanshahi, S. M. A.

    2018-02-01

    Tsallis entropy is a generalization of type α of the Shannon entropy, that is a nonadditive entropy unlike the Shannon entropy. Shannon entropy may be negative for some distributions, but Tsallis entropy can always be made nonnegative by choosing appropriate value of α. In this paper, we derive the quantile form of this nonadditive's entropy function in the residual lifetime, namely the residual quantile Tsallis entropy (RQTE) and get the bounds for it, depending on the Renyi's residual quantile entropy. Also, we obtain relationship between RQTE and concept of proportional hazards model in the quantile setup. Based on the new measure, we propose a stochastic order and aging classes, and study its properties. Finally, we prove characterizations theorems for some well known lifetime distributions. It is shown that RQTE uniquely determines the parent distribution unlike the residual Tsallis entropy.

  1. Using Quantile Regression to Examine Health Care Expenditures during the Great Recession

    PubMed Central

    Chen, Jie; Vargas-Bustamante, Arturo; Mortensen, Karoline; Thomas, Stephen B

    2014-01-01

    Objective To examine the association between the Great Recession of 2007–2009 and health care expenditures along the health care spending distribution, with a focus on racial/ethnic disparities. Data Sources/Study Setting Secondary data analyses of the Medical Expenditure Panel Survey (2005–2006 and 2008–2009). Study Design Quantile multivariate regressions are employed to measure the different associations between the economic recession of 2007–2009 and health care spending. Race/ethnicity and interaction terms between race/ethnicity and a recession indicator are controlled to examine whether minorities encountered disproportionately lower health spending during the economic recession. Principal Findings The Great Recession was significantly associated with reductions in health care expenditures at the 10th–50th percentiles of the distribution, but not at the 75th–90th percentiles. Racial and ethnic disparities were more substantial at the lower end of the health expenditure distribution; however, on average the reduction in expenditures was similar for all race/ethnic groups. The Great Recession was also positively associated with spending on emergency department visits. Conclusion This study shows that the relationship between the Great Recession and health care spending varied along the health expenditure distribution. More variability was observed in the lower end of the health spending distribution compared to the higher end. PMID:24134797

  2. How important are determinants of obesity measured at the individual level for explaining geographic variation in body mass index distributions? Observational evidence from Canada using Quantile Regression and Blinder-Oaxaca Decomposition.

    PubMed

    Dutton, Daniel J; McLaren, Lindsay

    2016-04-01

    Obesity prevalence varies between geographic regions in Canada. The reasons for this variation are unclear but most likely implicate both individual-level and population-level factors. The objective of this study was to examine whether equalising correlates of body mass index (BMI) across these geographic regions could be reasonably expected to reduce differences in BMI distributions between regions. Using data from three cycles of the Canadian Community Health Survey (CCHS) 2001, 2003 and 2007 for males and females, we modelled between-region BMI cross-sectionally using quantile regression and Blinder-Oaxaca decomposition of the quantile regression results. We show that while individual-level variables (ie, age, income, education, physical activity level, fruit and vegetable consumption, smoking status, drinking status, family doctor status, rural status, employment in the past 12 months and marital status) may be Caucasian important correlates of BMI within geographic regions, those variables are not capable of explaining variation in BMI between regions. Equalisation of common correlates of BMI between regions cannot be reasonably expected to reduce differences in the BMI distributions between regions. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/

  3. Estimating earnings losses due to mental illness: a quantile regression approach.

    PubMed

    Marcotte, Dave E; Wilcox-Gök, Virginia

    2003-09-01

    The ability of workers to remain productive and sustain earnings when afflicted with mental illness depends importantly on access to appropriate treatment and on flexibility and support from employers. In the United States there is substantial variation in access to health care and sick leave and other employment flexibilities across the earnings distribution. Consequently, a worker's ability to work and how much his/her earnings are impeded likely depend upon his/her position in the earnings distribution. Because of this, focusing on average earnings losses may provide insufficient information on the impact of mental illness in the labor market. In this paper, we examine the effects of mental illness on earnings by recognizing that effects could vary across the distribution of earnings. Using data from the National Comorbidity Survey, we employ a quantile regression estimator to identify the effects at key points in the earnings distribution. We find that earnings effects vary importantly across the distribution. While average effects are often not large, mental illness more commonly imposes earnings losses at the lower tail of the distribution, especially for women. In only one case do we find an illness to have negative effects across the distribution. Mental illness can have larger negative impacts on economic outcomes than previously estimated, even if those effects are not uniform. Consequently, researchers and policy makers alike should not be placated by findings that mean earnings effects are relatively small. Such estimates miss important features of how and where mental illness is associated with real economic losses for the ill.

  4. Asymmetric impact of rainfall on India's food grain production: evidence from quantile autoregressive distributed lag model

    NASA Astrophysics Data System (ADS)

    Pal, Debdatta; Mitra, Subrata Kumar

    2018-01-01

    This study used a quantile autoregressive distributed lag (QARDL) model to capture asymmetric impact of rainfall on food production in India. It was found that the coefficient corresponding to the rainfall in the QARDL increased till the 75th quantile and started decreasing thereafter, though it remained in the positive territory. Another interesting finding is that at the 90th quantile and above the coefficients of rainfall though remained positive was not statistically significant and therefore, the benefit of high rainfall on crop production was not conclusive. However, the impact of other determinants, such as fertilizer and pesticide consumption, is quite uniform over the whole range of the distribution of food grain production.

  5. Fitness adjusted racial disparities in central adiposity among women in the USA using quantile regression.

    PubMed

    McDonald, S; Ortaglia, A; Supino, C; Kacka, M; Clenin, M; Bottai, M

    2017-06-01

    This study comprehensively explores racial/ethnic disparities in waist circumference (WC) after adjusting for cardiorespiratory fitness (CRF), among both adult and adolescent women, across WC percentiles. Analysis was conducted using data from the 1999 to 2004 National Health and Nutrition Examination Survey. Female participants ( n  = 3,977) aged 12-49 years with complete data on CRF, height, weight and WC were included. Quantile regression models, stratified by age groups (12-15, 16-19 and 20-49 years), were used to assess the association between WC and race/ethnicity adjusting for CRF, height and age across WC percentiles (10th, 25th, 50th, 75th, 90th and 95th). For non-Hispanic (NH) Black, in both the 16-19 and 20-49 years age groups, estimated WC was significantly greater than for NH White across percentiles above the median with estimates ranging from 5.2 to 11.5 cm. For Mexican Americans, in all age groups, estimated WC tended to be significantly greater than for NH White particularly for middle percentiles (50th and 75th) with point estimates ranging from 1.9 to 8.4 cm. Significant disparities in WC between NH Black and Mexican women, as compared to NH White, remain even after adjustment for CRF. The magnitude of the disparities associated with race/ethnicity differs across WC percentiles and age groups.

  6. Nonuniform sampling by quantiles

    NASA Astrophysics Data System (ADS)

    Craft, D. Levi; Sonstrom, Reilly E.; Rovnyak, Virginia G.; Rovnyak, David

    2018-03-01

    A flexible strategy for choosing samples nonuniformly from a Nyquist grid using the concept of statistical quantiles is presented for broad classes of NMR experimentation. Quantile-directed scheduling is intuitive and flexible for any weighting function, promotes reproducibility and seed independence, and is generalizable to multiple dimensions. In brief, weighting functions are divided into regions of equal probability, which define the samples to be acquired. Quantile scheduling therefore achieves close adherence to a probability distribution function, thereby minimizing gaps for any given degree of subsampling of the Nyquist grid. A characteristic of quantile scheduling is that one-dimensional, weighted NUS schedules are deterministic, however higher dimensional schedules are similar within a user-specified jittering parameter. To develop unweighted sampling, we investigated the minimum jitter needed to disrupt subharmonic tracts, and show that this criterion can be met in many cases by jittering within 25-50% of the subharmonic gap. For nD-NUS, three supplemental components to choosing samples by quantiles are proposed in this work: (i) forcing the corner samples to ensure sampling to specified maximum values in indirect evolution times, (ii) providing an option to triangular backfill sampling schedules to promote dense/uniform tracts at the beginning of signal evolution periods, and (iii) providing an option to force the edges of nD-NUS schedules to be identical to the 1D quantiles. Quantile-directed scheduling meets the diverse needs of current NUS experimentation, but can also be used for future NUS implementations such as off-grid NUS and more. A computer program implementing these principles (a.k.a. QSched) in 1D- and 2D-NUS is available under the general public license.

  7. Nonuniform sampling by quantiles.

    PubMed

    Craft, D Levi; Sonstrom, Reilly E; Rovnyak, Virginia G; Rovnyak, David

    2018-03-01

    A flexible strategy for choosing samples nonuniformly from a Nyquist grid using the concept of statistical quantiles is presented for broad classes of NMR experimentation. Quantile-directed scheduling is intuitive and flexible for any weighting function, promotes reproducibility and seed independence, and is generalizable to multiple dimensions. In brief, weighting functions are divided into regions of equal probability, which define the samples to be acquired. Quantile scheduling therefore achieves close adherence to a probability distribution function, thereby minimizing gaps for any given degree of subsampling of the Nyquist grid. A characteristic of quantile scheduling is that one-dimensional, weighted NUS schedules are deterministic, however higher dimensional schedules are similar within a user-specified jittering parameter. To develop unweighted sampling, we investigated the minimum jitter needed to disrupt subharmonic tracts, and show that this criterion can be met in many cases by jittering within 25-50% of the subharmonic gap. For nD-NUS, three supplemental components to choosing samples by quantiles are proposed in this work: (i) forcing the corner samples to ensure sampling to specified maximum values in indirect evolution times, (ii) providing an option to triangular backfill sampling schedules to promote dense/uniform tracts at the beginning of signal evolution periods, and (iii) providing an option to force the edges of nD-NUS schedules to be identical to the 1D quantiles. Quantile-directed scheduling meets the diverse needs of current NUS experimentation, but can also be used for future NUS implementations such as off-grid NUS and more. A computer program implementing these principles (a.k.a. QSched) in 1D- and 2D-NUS is available under the general public license. Copyright © 2018 Elsevier Inc. All rights reserved.

  8. The heterogeneous effects of urbanization and income inequality on CO2 emissions in BRICS economies: evidence from panel quantile regression.

    PubMed

    Zhu, Huiming; Xia, Hang; Guo, Yawei; Peng, Cheng

    2018-04-12

    This paper empirically examines the effects of urbanization and income inequality on CO 2 emissions in the BRICS economies (i.e., Brazil, Russia, India, China, and South Africa) during the periods 1994-2013. The method we used is the panel quantile regression, which takes into account the unobserved individual heterogeneity and distributional heterogeneity. Our empirical results indicate that urbanization has a significant and negative impact on carbon emissions, except in the 80 th , 90 th , and 95 th quantiles. We also quantitatively investigate the direct and indirect effect of urbanization on carbon emissions, and the results show that we may underestimate urbanization's effect on carbon emissions if we ignore its indirect effect. In addition, in middle- and high-emission countries, income inequality has a significant and positive impact on carbon emissions. The results of our study indicate that in the BRICS economies, there is an inverted U-shaped environmental Kuznets curve (EKC) between the GDP per capita and carbon emissions. The conclusions of this study have important policy implications for policymakers. Policymakers should try to narrow the income gap between the rich and the poor to improve environmental quality; the BRICS economies can speed up urbanization to reduce carbon emissions, but they must improve energy efficiency and use clean energy to the greatest extent in the process.

  9. Environmental determinants of different blood lead levels in children: a quantile analysis from a nationwide survey.

    PubMed

    Etchevers, Anne; Le Tertre, Alain; Lucas, Jean-Paul; Bretin, Philippe; Oulhote, Youssef; Le Bot, Barbara; Glorennec, Philippe

    2015-01-01

    Blood lead levels (BLLs) have substantially decreased in recent decades in children in France. However, further reducing exposure is a public health goal because there is no clear toxicological threshold. The identification of the environmental determinants of BLLs as well as risk factors associated with high BLLs is important to update prevention strategies. We aimed to estimate the contribution of environmental sources of lead to different BLLs in children in France. We enrolled 484 children aged from 6months to 6years, in a nationwide cross-sectional survey in 2008-2009. We measured lead concentrations in blood and environmental samples (water, soils, household settled dusts, paints, cosmetics and traditional cookware). We performed two models: a multivariate generalized additive model on the geometric mean (GM), and a quantile regression model on the 10th, 25th, 50th, 75th and 90th quantile of BLLs. The GM of BLLs was 13.8μg/L (=1.38μg/dL) (95% confidence intervals (CI): 12.7-14.9) and the 90th quantile was 25.7μg/L (CI: 24.2-29.5). Household and common area dust, tap water, interior paint, ceramic cookware, traditional cosmetics, playground soil and dust, and environmental tobacco smoke were associated with the GM of BLLs. Household dust and tap water made the largest contributions to both the GM and the 90th quantile of BLLs. The concentration of lead in dust was positively correlated with all quantiles of BLLs even at low concentrations. Lead concentrations in tap water above 5μg/L were also positively correlated with the GM, 75th and 90th quantiles of BLLs in children drinking tap water. Preventative actions must target household settled dust and tap water to reduce the BLLs of children in France. The use of traditional cosmetics should be avoided whereas ceramic cookware should be limited to decorative purposes. Copyright © 2014 Elsevier Ltd. All rights reserved.

  10. Early Home Activities and Oral Language Skills in Middle Childhood: A Quantile Analysis

    ERIC Educational Resources Information Center

    Law, James; Rush, Robert; King, Tom; Westrupp, Elizabeth; Reilly, Sheena

    2018-01-01

    Oral language development is a key outcome of elementary school, and it is important to identify factors that predict it most effectively. Commonly researchers use ordinary least squares regression with conclusions restricted to average performance conditional on relevant covariates. Quantile regression offers a more sophisticated alternative.…

  11. High dimensional linear regression models under long memory dependence and measurement error

    NASA Astrophysics Data System (ADS)

    Kaul, Abhishek

    This dissertation consists of three chapters. The first chapter introduces the models under consideration and motivates problems of interest. A brief literature review is also provided in this chapter. The second chapter investigates the properties of Lasso under long range dependent model errors. Lasso is a computationally efficient approach to model selection and estimation, and its properties are well studied when the regression errors are independent and identically distributed. We study the case, where the regression errors form a long memory moving average process. We establish a finite sample oracle inequality for the Lasso solution. We then show the asymptotic sign consistency in this setup. These results are established in the high dimensional setup (p> n) where p can be increasing exponentially with n. Finally, we show the consistency, n½ --d-consistency of Lasso, along with the oracle property of adaptive Lasso, in the case where p is fixed. Here d is the memory parameter of the stationary error sequence. The performance of Lasso is also analysed in the present setup with a simulation study. The third chapter proposes and investigates the properties of a penalized quantile based estimator for measurement error models. Standard formulations of prediction problems in high dimension regression models assume the availability of fully observed covariates and sub-Gaussian and homogeneous model errors. This makes these methods inapplicable to measurement errors models where covariates are unobservable and observations are possibly non sub-Gaussian and heterogeneous. We propose weighted penalized corrected quantile estimators for the regression parameter vector in linear regression models with additive measurement errors, where unobservable covariates are nonrandom. The proposed estimators forgo the need for the above mentioned model assumptions. We study these estimators in both the fixed dimension and high dimensional sparse setups, in the latter setup, the

  12. A nonparametric method for assessment of interactions in a median regression model for analyzing right censored data.

    PubMed

    Lee, MinJae; Rahbar, Mohammad H; Talebi, Hooshang

    2018-01-01

    We propose a nonparametric test for interactions when we are concerned with investigation of the simultaneous effects of two or more factors in a median regression model with right censored survival data. Our approach is developed to detect interaction in special situations, when the covariates have a finite number of levels with a limited number of observations in each level, and it allows varying levels of variance and censorship at different levels of the covariates. Through simulation studies, we compare the power of detecting an interaction between the study group variable and a covariate using our proposed procedure with that of the Cox Proportional Hazard (PH) model and censored quantile regression model. We also assess the impact of censoring rate and type on the standard error of the estimators of parameters. Finally, we illustrate application of our proposed method to real life data from Prospective Observational Multicenter Major Trauma Transfusion (PROMMTT) study to test an interaction effect between type of injury and study sites using median time for a trauma patient to receive three units of red blood cells. The results from simulation studies indicate that our procedure performs better than both Cox PH model and censored quantile regression model based on statistical power for detecting the interaction, especially when the number of observations is small. It is also relatively less sensitive to censoring rates or even the presence of conditionally independent censoring that is conditional on the levels of covariates.

  13. Locally Weighted Score Estimation for Quantile Classification in Binary Regression Models

    PubMed Central

    Rice, John D.; Taylor, Jeremy M. G.

    2016-01-01

    One common use of binary response regression methods is classification based on an arbitrary probability threshold dictated by the particular application. Since this is given to us a priori, it is sensible to incorporate the threshold into our estimation procedure. Specifically, for the linear logistic model, we solve a set of locally weighted score equations, using a kernel-like weight function centered at the threshold. The bandwidth for the weight function is selected by cross validation of a novel hybrid loss function that combines classification error and a continuous measure of divergence between observed and fitted values; other possible cross-validation functions based on more common binary classification metrics are also examined. This work has much in common with robust estimation, but diers from previous approaches in this area in its focus on prediction, specifically classification into high- and low-risk groups. Simulation results are given showing the reduction in error rates that can be obtained with this method when compared with maximum likelihood estimation, especially under certain forms of model misspecification. Analysis of a melanoma data set is presented to illustrate the use of the method in practice. PMID:28018492

  14. Superquantile Regression: Theory, Algorithms, and Applications

    DTIC Science & Technology

    2014-12-01

    Example C: Stack loss data scatterplot matrix. 91 Regression α c0 caf cwt cac R̄ 2 α R̄ 2 α,Adj Least Squares NA -39.9197 0.7156 1.2953 -0.1521 0.9136...This is due to a small 92 Model Regression α c0 cwt cwt2 R̄ 2 α R̄ 2 α,Adj f2 Least Squares NA -41.9109 2.8174 — 0.7665 0.7542 Quantile 0.25 -32.0000

  15. A method to preserve trends in quantile mapping bias correction of climate modeled temperature

    NASA Astrophysics Data System (ADS)

    Grillakis, Manolis G.; Koutroulis, Aristeidis G.; Daliakopoulos, Ioannis N.; Tsanis, Ioannis K.

    2017-09-01

    Bias correction of climate variables is a standard practice in climate change impact (CCI) studies. Various methodologies have been developed within the framework of quantile mapping. However, it is well known that quantile mapping may significantly modify the long-term statistics due to the time dependency of the temperature bias. Here, a method to overcome this issue without compromising the day-to-day correction statistics is presented. The methodology separates the modeled temperature signal into a normalized and a residual component relative to the modeled reference period climatology, in order to adjust the biases only for the former and preserve the signal of the later. The results show that this method allows for the preservation of the originally modeled long-term signal in the mean, the standard deviation and higher and lower percentiles of temperature. To illustrate the improvements, the methodology is tested on daily time series obtained from five Euro CORDEX regional climate models (RCMs).

  16. Quantile equivalence to evaluate compliance with habitat management objectives

    USGS Publications Warehouse

    Cade, Brian S.; Johnson, Pamela R.

    2011-01-01

    Equivalence estimated with linear quantile regression was used to evaluate compliance with habitat management objectives at Arapaho National Wildlife Refuge based on monitoring data collected in upland (5,781 ha; n = 511 transects) and riparian and meadow (2,856 ha, n = 389 transects) habitats from 2005 to 2008. Quantiles were used because the management objectives specified proportions of the habitat area that needed to comply with vegetation criteria. The linear model was used to obtain estimates that were averaged across 4 y. The equivalence testing framework allowed us to interpret confidence intervals for estimated proportions with respect to intervals of vegetative criteria (equivalence regions) in either a liberal, benefit-of-doubt or conservative, fail-safe approach associated with minimizing alternative risks. Simple Boolean conditional arguments were used to combine the quantile equivalence results for individual vegetation components into a joint statement for the multivariable management objectives. For example, management objective 2A required at least 809 ha of upland habitat with a shrub composition ≥0.70 sagebrush (Artemisia spp.), 20–30% canopy cover of sagebrush ≥25 cm in height, ≥20% canopy cover of grasses, and ≥10% canopy cover of forbs on average over 4 y. Shrub composition and canopy cover of grass each were readily met on >3,000 ha under either conservative or liberal interpretations of sampling variability. However, there were only 809–1,214 ha (conservative to liberal) with ≥10% forb canopy cover and 405–1,098 ha with 20–30%canopy cover of sagebrush ≥25 cm in height. Only 91–180 ha of uplands simultaneously met criteria for all four components, primarily because canopy cover of sagebrush and forbs was inversely related when considered at the spatial scale (30 m) of a sample transect. We demonstrate how the quantile equivalence analyses also can help refine the numerical specification of habitat objectives and explore

  17. Can quantile mapping improve precipitation extremes from regional climate models?

    NASA Astrophysics Data System (ADS)

    Tani, Satyanarayana; Gobiet, Andreas

    2015-04-01

    The ability of quantile mapping to accurately bias correct regard to precipitation extremes is investigated in this study. We developed new methods by extending standard quantile mapping (QMα) to improve the quality of bias corrected extreme precipitation events as simulated by regional climate model (RCM) output. The new QM version (QMβ) was developed by combining parametric and nonparametric bias correction methods. The new nonparametric method is tested with and without a controlling shape parameter (Qmβ1 and Qmβ0, respectively). Bias corrections are applied on hindcast simulations for a small ensemble of RCMs at six different locations over Europe. We examined the quality of the extremes through split sample and cross validation approaches of these three bias correction methods. This split-sample approach mimics the application to future climate scenarios. A cross validation framework with particular focus on new extremes was developed. Error characteristics, q-q plots and Mean Absolute Error (MAEx) skill scores are used for evaluation. We demonstrate the unstable behaviour of correction function at higher quantiles with QMα, whereas the correction functions with for QMβ0 and QMβ1 are smoother, with QMβ1 providing the most reasonable correction values. The result from q-q plots demonstrates that, all bias correction methods are capable of producing new extremes but QMβ1 reproduces new extremes with low biases in all seasons compared to QMα, QMβ0. Our results clearly demonstrate the inherent limitations of empirical bias correction methods employed for extremes, particularly new extremes, and our findings reveals that the new bias correction method (Qmß1) produces more reliable climate scenarios for new extremes. These findings present a methodology that can better capture future extreme precipitation events, which is necessary to improve regional climate change impact studies.

  18. Improving Global Forecast System of extreme precipitation events with regional statistical model: Application of quantile-based probabilistic forecasts

    NASA Astrophysics Data System (ADS)

    Shastri, Hiteshri; Ghosh, Subimal; Karmakar, Subhankar

    2017-02-01

    Forecasting of extreme precipitation events at a regional scale is of high importance due to their severe impacts on society. The impacts are stronger in urban regions due to high flood potential as well high population density leading to high vulnerability. Although significant scientific improvements took place in the global models for weather forecasting, they are still not adequate at a regional scale (e.g., for an urban region) with high false alarms and low detection. There has been a need to improve the weather forecast skill at a local scale with probabilistic outcome. Here we develop a methodology with quantile regression, where the reliably simulated variables from Global Forecast System are used as predictors and different quantiles of rainfall are generated corresponding to that set of predictors. We apply this method to a flood-prone coastal city of India, Mumbai, which has experienced severe floods in recent years. We find significant improvements in the forecast with high detection and skill scores. We apply the methodology to 10 ensemble members of Global Ensemble Forecast System and find a reduction in ensemble uncertainty of precipitation across realizations with respect to that of original precipitation forecasts. We validate our model for the monsoon season of 2006 and 2007, which are independent of the training/calibration data set used in the study. We find promising results and emphasize to implement such data-driven methods for a better probabilistic forecast at an urban scale primarily for an early flood warning.

  19. Performance and robustness of probabilistic river forecasts computed with quantile regression based on multiple independent variables in the North Central USA

    NASA Astrophysics Data System (ADS)

    Hoss, F.; Fischbeck, P. S.

    2014-10-01

    This study further develops the method of quantile regression (QR) to predict exceedance probabilities of flood stages by post-processing forecasts. Using data from the 82 river gages, for which the National Weather Service's North Central River Forecast Center issues forecasts daily, this is the first QR application to US American river gages. Archived forecasts for lead times up to six days from 2001-2013 were analyzed. Earlier implementations of QR used the forecast itself as the only independent variable (Weerts et al., 2011; López López et al., 2014). This study adds the rise rate of the river stage in the last 24 and 48 h and the forecast error 24 and 48 h ago to the QR model. Including those four variables significantly improved the forecasts, as measured by the Brier Skill Score (BSS). Mainly, the resolution increases, as the original QR implementation already delivered high reliability. Combining the forecast with the other four variables results in much less favorable BSSs. Lastly, the forecast performance does not depend on the size of the training dataset, but on the year, the river gage, lead time and event threshold that are being forecast. We find that each event threshold requires a separate model configuration or at least calibration.

  20. Factors Associated with Adherence to Adjuvant Endocrine Therapy Among Privately Insured and Newly Diagnosed Breast Cancer Patients: A Quantile Regression Analysis.

    PubMed

    Farias, Albert J; Hansen, Ryan N; Zeliadt, Steven B; Ornelas, India J; Li, Christopher I; Thompson, Beti

    2016-08-01

    Adherence to adjuvant endocrine therapy (AET) for estrogen receptor-positive breast cancer remains suboptimal, which suggests that women are not getting the full benefit of the treatment to reduce breast cancer recurrence and mortality. The majority of studies on adherence to AET focus on identifying factors among those women at the highest levels of adherence and provide little insight on factors that influence medication use across the distribution of adherence. To understand how factors influence adherence among women across low and high levels of adherence. A retrospective evaluation was conducted using the Truven Health MarketScan Commercial Claims and Encounters Database from 2007-2011. Privately insured women aged 18-64 years who were recently diagnosed and treated for breast cancer and who initiated AET within 12 months of primary treatment were assessed. Adherence was measured as the proportion of days covered (PDC) over a 12-month period. Simultaneous multivariable quantile regression was used to assess the association between treatment and demographic factors, use of mail order pharmacies, medication switching, and out-of-pocket costs and adherence. The effect of each variable was examined at the 40th, 60th, 80th, and 95th quantiles. Among the 6,863 women in the cohort, mail order pharmacies had the greatest influence on adherence at the 40th quantile, associated with a 29.6% (95% CI = 22.2-37.0) higher PDC compared with retail pharmacies. Out-of-pocket cost for a 30-day supply of AET greater than $20 was associated with an 8.6% (95% CI = 2.8-14.4) lower PDC versus $0-$9.99. The main factors that influenced adherence at the 95th quantile were mail order pharmacies, associated with a 4.4% higher PDC (95% CI = 3.8-5.0) versus retail pharmacies, and switching AET medication 2 or more times, associated with a 5.6% lower PDC versus not switching (95% CI = 2.3-9.0). Factors associated with adherence differed across quantiles. Addressing the use of mail order

  1. Smooth conditional distribution function and quantiles under random censorship.

    PubMed

    Leconte, Eve; Poiraud-Casanova, Sandrine; Thomas-Agnan, Christine

    2002-09-01

    We consider a nonparametric random design regression model in which the response variable is possibly right censored. The aim of this paper is to estimate the conditional distribution function and the conditional alpha-quantile of the response variable. We restrict attention to the case where the response variable as well as the explanatory variable are unidimensional and continuous. We propose and discuss two classes of estimators which are smooth with respect to the response variable as well as to the covariate. Some simulations demonstrate that the new methods have better mean square error performances than the generalized Kaplan-Meier estimator introduced by Beran (1981) and considered in the literature by Dabrowska (1989, 1992) and Gonzalez-Manteiga and Cadarso-Suarez (1994).

  2. Robust small area estimation of poverty indicators using M-quantile approach (Case study: Sub-district level in Bogor district)

    NASA Astrophysics Data System (ADS)

    Girinoto, Sadik, Kusman; Indahwati

    2017-03-01

    The National Socio-Economic Survey samples are designed to produce estimates of parameters of planned domains (provinces and districts). The estimation of unplanned domains (sub-districts and villages) has its limitation to obtain reliable direct estimates. One of the possible solutions to overcome this problem is employing small area estimation techniques. The popular choice of small area estimation is based on linear mixed models. However, such models need strong distributional assumptions and do not easy allow for outlier-robust estimation. As an alternative approach for this purpose, M-quantile regression approach to small area estimation based on modeling specific M-quantile coefficients of conditional distribution of study variable given auxiliary covariates. It obtained outlier-robust estimation from influence function of M-estimator type and also no need strong distributional assumptions. In this paper, the aim of study is to estimate the poverty indicator at sub-district level in Bogor District-West Java using M-quantile models for small area estimation. Using data taken from National Socioeconomic Survey and Villages Potential Statistics, the results provide a detailed description of pattern of incidence and intensity of poverty within Bogor district. We also compare the results with direct estimates. The results showed the framework may be preferable when direct estimate having no incidence of poverty at all in the small area.

  3. Multivariate quantile mapping bias correction: an N-dimensional probability density function transform for climate model simulations of multiple variables

    NASA Astrophysics Data System (ADS)

    Cannon, Alex J.

    2018-01-01

    Most bias correction algorithms used in climatology, for example quantile mapping, are applied to univariate time series. They neglect the dependence between different variables. Those that are multivariate often correct only limited measures of joint dependence, such as Pearson or Spearman rank correlation. Here, an image processing technique designed to transfer colour information from one image to another—the N-dimensional probability density function transform—is adapted for use as a multivariate bias correction algorithm (MBCn) for climate model projections/predictions of multiple climate variables. MBCn is a multivariate generalization of quantile mapping that transfers all aspects of an observed continuous multivariate distribution to the corresponding multivariate distribution of variables from a climate model. When applied to climate model projections, changes in quantiles of each variable between the historical and projection period are also preserved. The MBCn algorithm is demonstrated on three case studies. First, the method is applied to an image processing example with characteristics that mimic a climate projection problem. Second, MBCn is used to correct a suite of 3-hourly surface meteorological variables from the Canadian Centre for Climate Modelling and Analysis Regional Climate Model (CanRCM4) across a North American domain. Components of the Canadian Forest Fire Weather Index (FWI) System, a complicated set of multivariate indices that characterizes the risk of wildfire, are then calculated and verified against observed values. Third, MBCn is used to correct biases in the spatial dependence structure of CanRCM4 precipitation fields. Results are compared against a univariate quantile mapping algorithm, which neglects the dependence between variables, and two multivariate bias correction algorithms, each of which corrects a different form of inter-variable correlation structure. MBCn outperforms these alternatives, often by a large margin

  4. Quantile regression of microgeographic variation in population characteristics of an invasive vertebrate predator

    USGS Publications Warehouse

    Siers, Shane R.; Savidge, Julie A.; Reed, Robert

    2017-01-01

    Localized ecological conditions have the potential to induce variation in population characteristics such as size distributions and body conditions. The ability to generalize the influence of ecological characteristics on such population traits may be particularly meaningful when those traits influence prospects for successful management interventions. To characterize variability in invasive Brown Treesnake population attributes within and among habitat types, we conducted systematic and seasonally-balanced surveys, collecting 100 snakes from each of 18 sites: three replicates within each of six major habitat types comprising 95% of Guam’s geographic expanse. Our study constitutes one of the most comprehensive and controlled samplings of any published snake study. Quantile regression on snake size and body condition indicated significant ecological heterogeneity, with a general trend of relative consistency of size classes and body conditions within and among scrub and Leucaena forest habitat types and more heterogeneity among ravine forest, savanna, and urban residential sites. Larger and more robust snakes were found within some savanna and urban habitat replicates, likely due to relative availability of larger prey. Compared to more homogeneous samples in the wet season, variability in size distributions and body conditions was greater during the dry season. Although there is evidence of habitat influencing Brown Treesnake populations at localized scales (e.g., the higher prevalence of larger snakes—particularly males—in savanna and urban sites), the level of variability among sites within habitat types indicates little ability to make meaningful predictions about these traits at unsampled locations. Seasonal variability within sites and habitats indicates that localized population characterization should include sampling in both wet and dry seasons. Extreme values at single replicates occasionally influenced overall habitat patterns, while pooling

  5. Quantile regression of microgeographic variation in population characteristics of an invasive vertebrate predator

    PubMed Central

    Siers, Shane R.; Savidge, Julie A.; Reed, Robert N.

    2017-01-01

    Localized ecological conditions have the potential to induce variation in population characteristics such as size distributions and body conditions. The ability to generalize the influence of ecological characteristics on such population traits may be particularly meaningful when those traits influence prospects for successful management interventions. To characterize variability in invasive Brown Treesnake population attributes within and among habitat types, we conducted systematic and seasonally-balanced surveys, collecting 100 snakes from each of 18 sites: three replicates within each of six major habitat types comprising 95% of Guam’s geographic expanse. Our study constitutes one of the most comprehensive and controlled samplings of any published snake study. Quantile regression on snake size and body condition indicated significant ecological heterogeneity, with a general trend of relative consistency of size classes and body conditions within and among scrub and Leucaena forest habitat types and more heterogeneity among ravine forest, savanna, and urban residential sites. Larger and more robust snakes were found within some savanna and urban habitat replicates, likely due to relative availability of larger prey. Compared to more homogeneous samples in the wet season, variability in size distributions and body conditions was greater during the dry season. Although there is evidence of habitat influencing Brown Treesnake populations at localized scales (e.g., the higher prevalence of larger snakes—particularly males—in savanna and urban sites), the level of variability among sites within habitat types indicates little ability to make meaningful predictions about these traits at unsampled locations. Seasonal variability within sites and habitats indicates that localized population characterization should include sampling in both wet and dry seasons. Extreme values at single replicates occasionally influenced overall habitat patterns, while pooling

  6. Quantile regression of microgeographic variation in population characteristics of an invasive vertebrate predator.

    PubMed

    Siers, Shane R; Savidge, Julie A; Reed, Robert N

    2017-01-01

    Localized ecological conditions have the potential to induce variation in population characteristics such as size distributions and body conditions. The ability to generalize the influence of ecological characteristics on such population traits may be particularly meaningful when those traits influence prospects for successful management interventions. To characterize variability in invasive Brown Treesnake population attributes within and among habitat types, we conducted systematic and seasonally-balanced surveys, collecting 100 snakes from each of 18 sites: three replicates within each of six major habitat types comprising 95% of Guam's geographic expanse. Our study constitutes one of the most comprehensive and controlled samplings of any published snake study. Quantile regression on snake size and body condition indicated significant ecological heterogeneity, with a general trend of relative consistency of size classes and body conditions within and among scrub and Leucaena forest habitat types and more heterogeneity among ravine forest, savanna, and urban residential sites. Larger and more robust snakes were found within some savanna and urban habitat replicates, likely due to relative availability of larger prey. Compared to more homogeneous samples in the wet season, variability in size distributions and body conditions was greater during the dry season. Although there is evidence of habitat influencing Brown Treesnake populations at localized scales (e.g., the higher prevalence of larger snakes-particularly males-in savanna and urban sites), the level of variability among sites within habitat types indicates little ability to make meaningful predictions about these traits at unsampled locations. Seasonal variability within sites and habitats indicates that localized population characterization should include sampling in both wet and dry seasons. Extreme values at single replicates occasionally influenced overall habitat patterns, while pooling replicates

  7. Removing Batch Effects from Longitudinal Gene Expression - Quantile Normalization Plus ComBat as Best Approach for Microarray Transcriptome Data

    PubMed Central

    Müller, Christian; Schillert, Arne; Röthemeier, Caroline; Trégouët, David-Alexandre; Proust, Carole; Binder, Harald; Pfeiffer, Norbert; Beutel, Manfred; Lackner, Karl J.; Schnabel, Renate B.; Tiret, Laurence; Wild, Philipp S.; Blankenberg, Stefan

    2016-01-01

    Technical variation plays an important role in microarray-based gene expression studies, and batch effects explain a large proportion of this noise. It is therefore mandatory to eliminate technical variation while maintaining biological variability. Several strategies have been proposed for the removal of batch effects, although they have not been evaluated in large-scale longitudinal gene expression data. In this study, we aimed at identifying a suitable method for batch effect removal in a large study of microarray-based longitudinal gene expression. Monocytic gene expression was measured in 1092 participants of the Gutenberg Health Study at baseline and 5-year follow up. Replicates of selected samples were measured at both time points to identify technical variability. Deming regression, Passing-Bablok regression, linear mixed models, non-linear models as well as ReplicateRUV and ComBat were applied to eliminate batch effects between replicates. In a second step, quantile normalization prior to batch effect correction was performed for each method. Technical variation between batches was evaluated by principal component analysis. Associations between body mass index and transcriptomes were calculated before and after batch removal. Results from association analyses were compared to evaluate maintenance of biological variability. Quantile normalization, separately performed in each batch, combined with ComBat successfully reduced batch effects and maintained biological variability. ReplicateRUV performed perfectly in the replicate data subset of the study, but failed when applied to all samples. All other methods did not substantially reduce batch effects in the replicate data subset. Quantile normalization plus ComBat appears to be a valuable approach for batch correction in longitudinal gene expression data. PMID:27272489

  8. Restoration of Monotonicity Respecting in Dynamic Regression

    PubMed Central

    Huang, Yijian

    2017-01-01

    Dynamic regression models, including the quantile regression model and Aalen’s additive hazards model, are widely adopted to investigate evolving covariate effects. Yet lack of monotonicity respecting with standard estimation procedures remains an outstanding issue. Advances have recently been made, but none provides a complete resolution. In this article, we propose a novel adaptive interpolation method to restore monotonicity respecting, by successively identifying and then interpolating nearest monotonicity-respecting points of an original estimator. Under mild regularity conditions, the resulting regression coefficient estimator is shown to be asymptotically equivalent to the original. Our numerical studies have demonstrated that the proposed estimator is much more smooth and may have better finite-sample efficiency than the original as well as, when available as only in special cases, other competing monotonicity-respecting estimators. Illustration with a clinical study is provided. PMID:29430068

  9. Development of Growth Charts of Pakistani Children Aged 4-15 Years Using Quantile Regression: A Cross-sectional Study

    PubMed Central

    Khan, Nazeer; Siddiqui, Junaid S; Baig-Ansari, Naila

    2018-01-01

    Background Growth charts are essential tools used by pediatricians as well as public health researchers in assessing and monitoring the well-being of pediatric populations. Development of these growth charts, especially for children above five years of age, is challenging and requires current anthropometric data and advanced statistical analysis. These growth charts are generally presented as a series of smooth centile curves. A number of modeling approaches are available for generating growth charts and applying these on national datasets is important for generating country-specific reference growth charts. Objective To demonstrate that quantile regression (QR) as a viable statistical approach to construct growth reference charts and to assess the applicability of the World Health Organization (WHO) 2007 growth standards to a large Pakistani population of school-going children. Methodology This is a secondary data analysis using anthropometric data of 9,515 students from a Pakistani survey conducted between 2007 and 2014 in four cities of Pakistan. Growth reference charts were created using QR as well as the LMS (Box-Cox transformation (L), the median (M), and the generalized coefficient of variation (S)) method and then compared with WHO 2007 growth standards. Results Centile values estimated by the LMS method and QR procedure had few differences. The centile values attained from QR procedure of BMI-for-age, weight-for-age, and height-for-age of Pakistani children were lower than the standard WHO 2007 centile. Conclusion QR should be considered as an alternative method to develop growth charts for its simplicity and lack of necessity to transform data. WHO 2007 standards are not suitable for Pakistani children. PMID:29632748

  10. Development of Growth Charts of Pakistani Children Aged 4-15 Years Using Quantile Regression: A Cross-sectional Study.

    PubMed

    Iftikhar, Sundus; Khan, Nazeer; Siddiqui, Junaid S; Baig-Ansari, Naila

    2018-02-02

    Background Growth charts are essential tools used by pediatricians as well as public health researchers in assessing and monitoring the well-being of pediatric populations. Development of these growth charts, especially for children above five years of age, is challenging and requires current anthropometric data and advanced statistical analysis. These growth charts are generally presented as a series of smooth centile curves. A number of modeling approaches are available for generating growth charts and applying these on national datasets is important for generating country-specific reference growth charts. Objective To demonstrate that quantile regression (QR) as a viable statistical approach to construct growth reference charts and to assess the applicability of the World Health Organization (WHO) 2007 growth standards to a large Pakistani population of school-going children. Methodology This is a secondary data analysis using anthropometric data of 9,515 students from a Pakistani survey conducted between 2007 and 2014 in four cities of Pakistan. Growth reference charts were created using QR as well as the LMS (Box-Cox transformation (L), the median (M), and the generalized coefficient of variation (S)) method and then compared with WHO 2007 growth standards. Results Centile values estimated by the LMS method and QR procedure had few differences. The centile values attained from QR procedure of BMI-for-age, weight-for-age, and height-for-age of Pakistani children were lower than the standard WHO 2007 centile. Conclusion QR should be considered as an alternative method to develop growth charts for its simplicity and lack of necessity to transform data. WHO 2007 standards are not suitable for Pakistani children.

  11. Smooth quantile normalization.

    PubMed

    Hicks, Stephanie C; Okrah, Kwame; Paulson, Joseph N; Quackenbush, John; Irizarry, Rafael A; Bravo, Héctor Corrada

    2018-04-01

    Between-sample normalization is a critical step in genomic data analysis to remove systematic bias and unwanted technical variation in high-throughput data. Global normalization methods are based on the assumption that observed variability in global properties is due to technical reasons and are unrelated to the biology of interest. For example, some methods correct for differences in sequencing read counts by scaling features to have similar median values across samples, but these fail to reduce other forms of unwanted technical variation. Methods such as quantile normalization transform the statistical distributions across samples to be the same and assume global differences in the distribution are induced by only technical variation. However, it remains unclear how to proceed with normalization if these assumptions are violated, for example, if there are global differences in the statistical distributions between biological conditions or groups, and external information, such as negative or control features, is not available. Here, we introduce a generalization of quantile normalization, referred to as smooth quantile normalization (qsmooth), which is based on the assumption that the statistical distribution of each sample should be the same (or have the same distributional shape) within biological groups or conditions, but allowing that they may differ between groups. We illustrate the advantages of our method on several high-throughput datasets with global differences in distributions corresponding to different biological conditions. We also perform a Monte Carlo simulation study to illustrate the bias-variance tradeoff and root mean squared error of qsmooth compared to other global normalization methods. A software implementation is available from https://github.com/stephaniehicks/qsmooth.

  12. Quantile-Specific Penetrance of Genes Affecting Lipoproteins, Adiposity and Height

    PubMed Central

    Williams, Paul T.

    2012-01-01

    Quantile-dependent penetrance is proposed to occur when the phenotypic expression of a SNP depends upon the population percentile of the phenotype. To illustrate the phenomenon, quantiles of height, body mass index (BMI), and plasma lipids and lipoproteins were compared to genetic risk scores (GRS) derived from single nucleotide polymorphisms (SNP)s having established genome-wide significance: 180 SNPs for height, 32 for BMI, 37 for low-density lipoprotein (LDL)-cholesterol, 47 for high-density lipoprotein (HDL)-cholesterol, 52 for total cholesterol, and 31 for triglycerides in 1930 subjects. Both phenotypes and GRSs were adjusted for sex, age, study, and smoking status. Quantile regression showed that the slope of the genotype-phenotype relationships increased with the percentile of BMI (P = 0.002), LDL-cholesterol (P = 3×10−8), HDL-cholesterol (P = 5×10−6), total cholesterol (P = 2.5×10−6), and triglyceride distribution (P = 7.5×10−6), but not height (P = 0.09). Compared to a GRS's phenotypic effect at the 10th population percentile, its effect at the 90th percentile was 4.2-fold greater for BMI, 4.9-fold greater for LDL-cholesterol, 1.9-fold greater for HDL-cholesterol, 3.1-fold greater for total cholesterol, and 3.3-fold greater for triglycerides. Moreover, the effect of the rs1558902 (FTO) risk allele was 6.7-fold greater at the 90th than the 10th percentile of the BMI distribution, and that of the rs3764261 (CETP) risk allele was 2.4-fold greater at the 90th than the 10th percentile of the HDL-cholesterol distribution. Conceptually, it maybe useful to distinguish environmental effects on the phenotype that in turn alters a gene's phenotypic expression (quantile-dependent penetrance) from environmental effects affecting the gene's phenotypic expression directly (gene-environment interaction). PMID:22235250

  13. Multi-element stochastic spectral projection for high quantile estimation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ko, Jordan, E-mail: jordan.ko@mac.com; Garnier, Josselin

    2013-06-15

    We investigate quantile estimation by multi-element generalized Polynomial Chaos (gPC) metamodel where the exact numerical model is approximated by complementary metamodels in overlapping domains that mimic the model’s exact response. The gPC metamodel is constructed by the non-intrusive stochastic spectral projection approach and function evaluation on the gPC metamodel can be considered as essentially free. Thus, large number of Monte Carlo samples from the metamodel can be used to estimate α-quantile, for moderate values of α. As the gPC metamodel is an expansion about the means of the inputs, its accuracy may worsen away from these mean values where themore » extreme events may occur. By increasing the approximation accuracy of the metamodel, we may eventually improve accuracy of quantile estimation but it is very expensive. A multi-element approach is therefore proposed by combining a global metamodel in the standard normal space with supplementary local metamodels constructed in bounded domains about the design points corresponding to the extreme events. To improve the accuracy and to minimize the sampling cost, sparse-tensor and anisotropic-tensor quadratures are tested in addition to the full-tensor Gauss quadrature in the construction of local metamodels; different bounds of the gPC expansion are also examined. The global and local metamodels are combined in the multi-element gPC (MEgPC) approach and it is shown that MEgPC can be more accurate than Monte Carlo or importance sampling methods for high quantile estimations for input dimensions roughly below N=8, a limit that is very much case- and α-dependent.« less

  14. The Applicability of Confidence Intervals of Quantiles for the Generalized Logistic Distribution

    NASA Astrophysics Data System (ADS)

    Shin, H.; Heo, J.; Kim, T.; Jung, Y.

    2007-12-01

    The generalized logistic (GL) distribution has been widely used for frequency analysis. However, there is a little study related to the confidence intervals that indicate the prediction accuracy of distribution for the GL distribution. In this paper, the estimation of the confidence intervals of quantiles for the GL distribution is presented based on the method of moments (MOM), maximum likelihood (ML), and probability weighted moments (PWM) and the asymptotic variances of each quantile estimator are derived as functions of the sample sizes, return periods, and parameters. Monte Carlo simulation experiments are also performed to verify the applicability of the derived confidence intervals of quantile. As the results, the relative bias (RBIAS) and relative root mean square error (RRMSE) of the confidence intervals generally increase as return period increases and reverse as sample size increases. And PWM for estimating the confidence intervals performs better than the other methods in terms of RRMSE when the data is almost symmetric while ML shows the smallest RBIAS and RRMSE when the data is more skewed and sample size is moderately large. The GL model was applied to fit the distribution of annual maximum rainfall data. The results show that there are little differences in the estimated quantiles between ML and PWM while distinct differences in MOM.

  15. Removing technical variability in RNA-seq data using conditional quantile normalization.

    PubMed

    Hansen, Kasper D; Irizarry, Rafael A; Wu, Zhijin

    2012-04-01

    The ability to measure gene expression on a genome-wide scale is one of the most promising accomplishments in molecular biology. Microarrays, the technology that first permitted this, were riddled with problems due to unwanted sources of variability. Many of these problems are now mitigated, after a decade's worth of statistical methodology development. The recently developed RNA sequencing (RNA-seq) technology has generated much excitement in part due to claims of reduced variability in comparison to microarrays. However, we show that RNA-seq data demonstrate unwanted and obscuring variability similar to what was first observed in microarrays. In particular, we find guanine-cytosine content (GC-content) has a strong sample-specific effect on gene expression measurements that, if left uncorrected, leads to false positives in downstream results. We also report on commonly observed data distortions that demonstrate the need for data normalization. Here, we describe a statistical methodology that improves precision by 42% without loss of accuracy. Our resulting conditional quantile normalization algorithm combines robust generalized regression to remove systematic bias introduced by deterministic features such as GC-content and quantile normalization to correct for global distortions.

  16. Quantile-based Bayesian maximum entropy approach for spatiotemporal modeling of ambient air quality levels.

    PubMed

    Yu, Hwa-Lung; Wang, Chih-Hsin

    2013-02-05

    Understanding the daily changes in ambient air quality concentrations is important to the assessing human exposure and environmental health. However, the fine temporal scales (e.g., hourly) involved in this assessment often lead to high variability in air quality concentrations. This is because of the complex short-term physical and chemical mechanisms among the pollutants. Consequently, high heterogeneity is usually present in not only the averaged pollution levels, but also the intraday variance levels of the daily observations of ambient concentration across space and time. This characteristic decreases the estimation performance of common techniques. This study proposes a novel quantile-based Bayesian maximum entropy (QBME) method to account for the nonstationary and nonhomogeneous characteristics of ambient air pollution dynamics. The QBME method characterizes the spatiotemporal dependence among the ambient air quality levels based on their location-specific quantiles and accounts for spatiotemporal variations using a local weighted smoothing technique. The epistemic framework of the QBME method can allow researchers to further consider the uncertainty of space-time observations. This study presents the spatiotemporal modeling of daily CO and PM10 concentrations across Taiwan from 1998 to 2009 using the QBME method. Results show that the QBME method can effectively improve estimation accuracy in terms of lower mean absolute errors and standard deviations over space and time, especially for pollutants with strong nonhomogeneous variances across space. In addition, the epistemic framework can allow researchers to assimilate the site-specific secondary information where the observations are absent because of the common preferential sampling issues of environmental data. The proposed QBME method provides a practical and powerful framework for the spatiotemporal modeling of ambient pollutants.

  17. Parameter Heterogeneity In Breast Cancer Cost Regressions – Evidence From Five European Countries

    PubMed Central

    Banks, Helen; Campbell, Harry; Douglas, Anne; Fletcher, Eilidh; McCallum, Alison; Moger, Tron Anders; Peltola, Mikko; Sveréus, Sofia; Wild, Sarah; Williams, Linda J.; Forbes, John

    2015-01-01

    Abstract We investigate parameter heterogeneity in breast cancer 1‐year cumulative hospital costs across five European countries as part of the EuroHOPE project. The paper aims to explore whether conditional mean effects provide a suitable representation of the national variation in hospital costs. A cohort of patients with a primary diagnosis of invasive breast cancer (ICD‐9 codes 174 and ICD‐10 C50 codes) is derived using routinely collected individual breast cancer data from Finland, the metropolitan area of Turin (Italy), Norway, Scotland and Sweden. Conditional mean effects are estimated by ordinary least squares for each country, and quantile regressions are used to explore heterogeneity across the conditional quantile distribution. Point estimates based on conditional mean effects provide a good approximation of treatment response for some key demographic and diagnostic specific variables (e.g. age and ICD‐10 diagnosis) across the conditional quantile distribution. For many policy variables of interest, however, there is considerable evidence of parameter heterogeneity that is concealed if decisions are based solely on conditional mean results. The use of quantile regression methods reinforce the need to consider beyond an average effect given the greater recognition that breast cancer is a complex disease reflecting patient heterogeneity. © 2015 The Authors. Health Economics Published by John Wiley & Sons Ltd. PMID:26633866

  18. Modified Regression Correlation Coefficient for Poisson Regression Model

    NASA Astrophysics Data System (ADS)

    Kaengthong, Nattacha; Domthong, Uthumporn

    2017-09-01

    This study gives attention to indicators in predictive power of the Generalized Linear Model (GLM) which are widely used; however, often having some restrictions. We are interested in regression correlation coefficient for a Poisson regression model. This is a measure of predictive power, and defined by the relationship between the dependent variable (Y) and the expected value of the dependent variable given the independent variables [E(Y|X)] for the Poisson regression model. The dependent variable is distributed as Poisson. The purpose of this research was modifying regression correlation coefficient for Poisson regression model. We also compare the proposed modified regression correlation coefficient with the traditional regression correlation coefficient in the case of two or more independent variables, and having multicollinearity in independent variables. The result shows that the proposed regression correlation coefficient is better than the traditional regression correlation coefficient based on Bias and the Root Mean Square Error (RMSE).

  19. Intersection of All Top Quantile

    EPA Pesticide Factsheets

    This layer combines the Top quantiles of the CES, CEVA, and EJSM layers so that viewers can see the overlap of 00e2??hot spots00e2?? for each method. This layer was created by James Sadd of Occidental College of Los Angeles

  20. Parametric regression model for survival data: Weibull regression model as an example

    PubMed Central

    2016-01-01

    Weibull regression model is one of the most popular forms of parametric regression model that it provides estimate of baseline hazard function, as well as coefficients for covariates. Because of technical difficulties, Weibull regression model is seldom used in medical literature as compared to the semi-parametric proportional hazard model. To make clinical investigators familiar with Weibull regression model, this article introduces some basic knowledge on Weibull regression model and then illustrates how to fit the model with R software. The SurvRegCensCov package is useful in converting estimated coefficients to clinical relevant statistics such as hazard ratio (HR) and event time ratio (ETR). Model adequacy can be assessed by inspecting Kaplan-Meier curves stratified by categorical variable. The eha package provides an alternative method to model Weibull regression model. The check.dist() function helps to assess goodness-of-fit of the model. Variable selection is based on the importance of a covariate, which can be tested using anova() function. Alternatively, backward elimination starting from a full model is an efficient way for model development. Visualization of Weibull regression model after model development is interesting that it provides another way to report your findings. PMID:28149846

  1. Regional flow duration curves: Geostatistical techniques versus multivariate regression

    USGS Publications Warehouse

    Pugliese, Alessio; Farmer, William H.; Castellarin, Attilio; Archfield, Stacey A.; Vogel, Richard M.

    2016-01-01

    A period-of-record flow duration curve (FDC) represents the relationship between the magnitude and frequency of daily streamflows. Prediction of FDCs is of great importance for locations characterized by sparse or missing streamflow observations. We present a detailed comparison of two methods which are capable of predicting an FDC at ungauged basins: (1) an adaptation of the geostatistical method, Top-kriging, employing a linear weighted average of dimensionless empirical FDCs, standardised with a reference streamflow value; and (2) regional multiple linear regression of streamflow quantiles, perhaps the most common method for the prediction of FDCs at ungauged sites. In particular, Top-kriging relies on a metric for expressing the similarity between catchments computed as the negative deviation of the FDC from a reference streamflow value, which we termed total negative deviation (TND). Comparisons of these two methods are made in 182 largely unregulated river catchments in the southeastern U.S. using a three-fold cross-validation algorithm. Our results reveal that the two methods perform similarly throughout flow-regimes, with average Nash-Sutcliffe Efficiencies 0.566 and 0.662, (0.883 and 0.829 on log-transformed quantiles) for the geostatistical and the linear regression models, respectively. The differences between the reproduction of FDC's occurred mostly for low flows with exceedance probability (i.e. duration) above 0.98.

  2. Association between Physical Activity and Teacher-Reported Academic Performance among Fifth-Graders in Shanghai: A Quantile Regression

    PubMed Central

    Zhang, Yunting; Zhang, Donglan; Jiang, Yanrui; Sun, Wanqi; Wang, Yan; Chen, Wenjuan; Li, Shenghui; Shi, Lu; Shen, Xiaoming; Zhang, Jun; Jiang, Fan

    2015-01-01

    Introduction A growing body of literature reveals the causal pathways between physical activity and brain function, indicating that increasing physical activity among children could improve rather than undermine their scholastic performance. However, past studies of physical activity and scholastic performance among students often relied on parent-reported grade information, and did not explore whether the association varied among different levels of scholastic performance. Our study among fifth-grade students in Shanghai sought to determine the association between regular physical activity and teacher-reported academic performance scores (APS), with special attention to the differential associational patterns across different strata of scholastic performance. Method A total of 2,225 students were chosen through a stratified random sampling, and a complete sample of 1470 observations were used for analysis. We used a quantile regression analysis to explore whether the association between physical activity and teacher-reported APS differs by distribution of APS. Results Minimal-intensity physical activity such as walking was positively associated with academic performance scores (β = 0.13, SE = 0.04). The magnitude of the association tends to be larger at the lower end of the APS distribution (β = 0.24, SE = 0.08) than in the higher end of the distribution (β = 0.00, SE = 0.07). Conclusion Based upon teacher-reported student academic performance, there is no evidence that spending time on frequent physical activity would undermine student’s APS. Those students who are below the average in their academic performance could be worse off in academic performance if they give up minimal-intensity physical activity. Therefore, cutting physical activity time in schools could hurt the scholastic performance among those students who were already at higher risk for dropping out due to inadequate APS. PMID:25774525

  3. Non-inferiority tests for anti-infective drugs using control group quantiles.

    PubMed

    Fay, Michael P; Follmann, Dean A

    2016-12-01

    In testing for non-inferiority of anti-infective drugs, the primary endpoint is often the difference in the proportion of failures between the test and control group at a landmark time. The landmark time is chosen to approximately correspond to the qth historic quantile of the control group, and the non-inferiority margin is selected to be reasonable for the target level q. For designing these studies, a troubling issue is that the landmark time must be pre-specified, but there is no guarantee that the proportion of control failures at the landmark time will be close to the target level q. If the landmark time is far from the target control quantile, then the pre-specified non-inferiority margin may not longer be reasonable. Exact variable margin tests have been developed by Röhmel and Kieser to address this problem, but these tests can have poor power if the observed control failure rate at the landmark time is far from its historic value. We develop a new variable margin non-inferiority test where we continue sampling until a pre-specified proportion of failures, q, have occurred in the control group, where q is the target quantile level. The test does not require any assumptions on the failure time distributions, and hence, no knowledge of the true [Formula: see text] control quantile for the study is needed. Our new test is exact and has power comparable to (or greater than) its competitors when the true control quantile from the study equals (or differs moderately from) its historic value. Our nivm R package performs the test and gives confidence intervals on the difference in failure rates at the true target control quantile. The tests can be applied to time to cure or other numeric variables as well. A substantial proportion of new anti-infective drugs being developed use non-inferiority tests in their development, and typically, a pre-specified landmark time and its associated difference margin are set at the design stage to match a specific target control

  4. Intersection of Screening Methods High Quantile

    EPA Pesticide Factsheets

    This layer combines the high quantiles of the CES, CEVA, and EJSM layers so that viewers can see the overlap of 00e2??hot spots00e2?? for each method. This layer was created by James Sadd of Occidental College of Los Angeles

  5. Using quantile regression to examine the effects of inequality across the mortality distribution in the U.S. counties

    PubMed Central

    Yang, Tse-Chuan; Chen, Vivian Yi-Ju; Shoff, Carla; Matthews, Stephen A.

    2012-01-01

    The U.S. has experienced a resurgence of income inequality in the past decades. The evidence regarding the mortality implications of this phenomenon has been mixed. This study employs a rarely used method in mortality research, quantile regression (QR), to provide insight into the ongoing debate of whether income inequality is a determinant of mortality and to investigate the varying relationship between inequality and mortality throughout the mortality distribution. Analyzing a U.S. dataset where the five-year (1998–2002) average mortality rates were combined with other county-level covariates, we found that the association between inequality and mortality was not constant throughout the mortality distribution and the impact of inequality on mortality steadily increased until the 80th percentile. When accounting for all potential confounders, inequality was significantly and positively related to mortality; however, this inequality–mortality relationship did not hold across the mortality distribution. A series of Wald tests confirmed this varying inequality–mortality relationship, especially between the lower and upper tails. The large variation in the estimated coefficients of the Gini index suggested that inequality had the greatest influence on those counties with a mortality rate of roughly 9.95 deaths per 1000 population (80th percentile) compared to any other counties. Furthermore, our results suggest that the traditional analytic methods that focus on mean or median value of the dependent variable can be, at most, applied to a narrow 20 percent of observations. This study demonstrates the value of QR. Our findings provide some insight as to why the existing evidence for the inequality–mortality relationship is mixed and suggest that analytical issues may play a role in clarifying whether inequality is a robust determinant of population health. PMID:22497847

  6. Data quantile-quantile plots: quantifying the time evolution of space climatology

    NASA Astrophysics Data System (ADS)

    Tindale, Elizabeth; Chapman, Sandra

    2017-04-01

    The solar wind is inherently variable across a wide range of spatio-temporal scales; embedded in the flow are the signatures of distinct non-linear physical processes from evolving turbulence to the dynamical solar corona. In-situ satellite observations of solar wind magnetic field and velocity are at minute and below time resolution and now extend over several solar cycles. Each solar cycle is unique, and the space climatology challenge is to quantify how solar wind variability changes within, and across, each distinct solar cycle, and how this in turn drives space weather at earth. We will demonstrate a novel statistical method, that of data-data quantile-quantile (DQQ) plots, which quantifies how the underlying statistical distribution of a given observable is changing in time. Importantly this method does not require any assumptions concerning the underlying functional form of the distribution and can identify multi-component behaviour that is changing in time. This can be used to determine when a sub-range of a given observable is undergoing a change in statistical distribution, or where the moments of the distribution only are changing and the functional form of the underlying distribution is not changing in time. The method is quite general; for this application we use data from the WIND satellite to compare the solar wind across the minima and maxima of solar cycles 23 and 24 [1], and how these changes are manifest in parameters that quantify coupling to the earth's magnetosphere. [1] Tindale, E., and S.C. Chapman (2016), Geophys. Res. Lett., 43(11), doi: 10.1002/2016GL068920.

  7. Topological and canonical kriging for design flood prediction in ungauged catchments: an improvement over a traditional regional regression approach?

    USGS Publications Warehouse

    Archfield, Stacey A.; Pugliese, Alessio; Castellarin, Attilio; Skøien, Jon O.; Kiang, Julie E.

    2013-01-01

    In the United States, estimation of flood frequency quantiles at ungauged locations has been largely based on regional regression techniques that relate measurable catchment descriptors to flood quantiles. More recently, spatial interpolation techniques of point data have been shown to be effective for predicting streamflow statistics (i.e., flood flows and low-flow indices) in ungauged catchments. Literature reports successful applications of two techniques, canonical kriging, CK (or physiographical-space-based interpolation, PSBI), and topological kriging, TK (or top-kriging). CK performs the spatial interpolation of the streamflow statistic of interest in the two-dimensional space of catchment descriptors. TK predicts the streamflow statistic along river networks taking both the catchment area and nested nature of catchments into account. It is of interest to understand how these spatial interpolation methods compare with generalized least squares (GLS) regression, one of the most common approaches to estimate flood quantiles at ungauged locations. By means of a leave-one-out cross-validation procedure, the performance of CK and TK was compared to GLS regression equations developed for the prediction of 10, 50, 100 and 500 yr floods for 61 streamgauges in the southeast United States. TK substantially outperforms GLS and CK for the study area, particularly for large catchments. The performance of TK over GLS highlights an important distinction between the treatments of spatial correlation when using regression-based or spatial interpolation methods to estimate flood quantiles at ungauged locations. The analysis also shows that coupling TK with CK slightly improves the performance of TK; however, the improvement is marginal when compared to the improvement in performance over GLS.

  8. Quantiles for Finite Mixtures of Normal Distributions

    ERIC Educational Resources Information Center

    Rahman, Mezbahur; Rahman, Rumanur; Pearson, Larry M.

    2006-01-01

    Quantiles for finite mixtures of normal distributions are computed. The difference between a linear combination of independent normal random variables and a linear combination of independent normal densities is emphasized. (Contains 3 tables and 1 figure.)

  9. Downscaling of daily precipitation using a hybrid model of Artificial Neural Network, Wavelet, and Quantile Mapping in Gharehsoo River Basin, Iran

    NASA Astrophysics Data System (ADS)

    Taie Semiromi, M.; Koch, M.

    2017-12-01

    Although linear/regression statistical downscaling methods are very straightforward and widely used, and they can be applied to a single predictor-predictand pair or spatial fields of predictors-predictands, the greatest constraint is the requirement of a normal distribution of the predictor and the predictand values, which means that it cannot be used to predict the distribution of daily rainfall because it is typically non-normal. To tacked with such a limitation, the current study aims to introduce a new developed hybrid technique taking advantages from Artificial Neural Networks (ANNs), Wavelet and Quantile Mapping (QM) for downscaling of daily precipitation for 10 rain-gauge stations located in Gharehsoo River Basin, Iran. With the purpose of daily precipitation downscaling, the study makes use of Second Generation Canadian Earth System Model (CanESM2) developed by Canadian Centre for Climate Modeling and Analysis (CCCma). Climate projections are available for three representative concentration pathways (RCPs) namely RCP 2.6, RCP 4.5 and RCP 8.5 for up to 2100. In this regard, 26 National Centers for Environmental Prediction (NCEP) reanalysis large-scale variables which have potential physical relationships with precipitation, were selected as candidate predictors. Afterwards, predictor screening was conducted using correlation, partial correlation and explained variance between predictors and predictand (precipitation). Depending on each rain-gauge station between two and three predictors were selected which their decomposed details (D) and approximation (A) obtained from discrete wavelet analysis were fed as inputs to the neural networks. After downscaling of daily precipitation, bias correction was conducted using quantile mapping. Out of the complete time series available, i.e. 1978-2005, two third of which namely 1978-1996 was used for calibration of QM and the reminder, i.e. 1997-2005 was considered for the validation. Result showed that the proposed

  10. Spline methods for approximating quantile functions and generating random samples

    NASA Technical Reports Server (NTRS)

    Schiess, J. R.; Matthews, C. G.

    1985-01-01

    Two cubic spline formulations are presented for representing the quantile function (inverse cumulative distribution function) of a random sample of data. Both B-spline and rational spline approximations are compared with analytic representations of the quantile function. It is also shown how these representations can be used to generate random samples for use in simulation studies. Comparisons are made on samples generated from known distributions and a sample of experimental data. The spline representations are more accurate for multimodal and skewed samples and to require much less time to generate samples than the analytic representation.

  11. Evaluating differential effects using regression interactions and regression mixture models

    PubMed Central

    Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung

    2015-01-01

    Research increasingly emphasizes understanding differential effects. This paper focuses on understanding regression mixture models, a relatively new statistical methods for assessing differential effects by comparing results to using an interactive term in linear regression. The research questions which each model answers, their formulation, and their assumptions are compared using Monte Carlo simulations and real data analysis. The capabilities of regression mixture models are described and specific issues to be addressed when conducting regression mixtures are proposed. The paper aims to clarify the role that regression mixtures can take in the estimation of differential effects and increase awareness of the benefits and potential pitfalls of this approach. Regression mixture models are shown to be a potentially effective exploratory method for finding differential effects when these effects can be defined by a small number of classes of respondents who share a typical relationship between a predictor and an outcome. It is also shown that the comparison between regression mixture models and interactions becomes substantially more complex as the number of classes increases. It is argued that regression interactions are well suited for direct tests of specific hypotheses about differential effects and regression mixtures provide a useful approach for exploring effect heterogeneity given adequate samples and study design. PMID:26556903

  12. Explaining Variation in Instructional Time: An Application of Quantile Regression

    ERIC Educational Resources Information Center

    Corey, Douglas Lyman; Phelps, Geoffrey; Ball, Deborah Loewenberg; Demonte, Jenny; Harrison, Delena

    2012-01-01

    This research is conducted in the context of a large-scale study of three nationally disseminated comprehensive school reform projects (CSRs) and examines how school- and classroom-level factors contribute to variation in instructional time in English language arts and mathematics. When using mean-based OLS regression techniques such as…

  13. Unitary Response Regression Models

    ERIC Educational Resources Information Center

    Lipovetsky, S.

    2007-01-01

    The dependent variable in a regular linear regression is a numerical variable, and in a logistic regression it is a binary or categorical variable. In these models the dependent variable has varying values. However, there are problems yielding an identity output of a constant value which can also be modelled in a linear or logistic regression with…

  14. On the distortion of elevation dependent warming signals by quantile mapping

    NASA Astrophysics Data System (ADS)

    Jury, Martin W.; Mendlik, Thomas; Maraun, Douglas

    2017-04-01

    Elevation dependent warming (EDW), the amplification of warming under climate change with elevation, is likely to accelerate changes in e.g. cryospheric and hydrological systems. Responsible for EDW is a mixture of processes including snow albedo feedback, cloud formations or the location of aerosols. The degree of incorporation of this processes varies across state of the art climate models. In a recent study we were preparing bias corrected model output of CMIP5 GCMs and CORDEX RCMs over the Himalayan region for the glacier modelling community. In a first attempt we used quantile mapping (QM) to generate this data. A beforehand model evaluation showed that more than two third of the 49 included climate models were able to reproduce positive trend differences between areas of higher and lower elevations in winter, clearly visible in all of our five observational datasets used. Regrettably, we noticed that height dependent trend signals provided by models were distorted, most of the time in the direction of less EDW, sometimes even reversing EDW signals present in the models before the bias correction. As a consequence, we refrained from using quantile mapping for our task, as EDW poses one important factor influencing the climate in high altitudes for the nearer and more distant future, and used a climate change signal preserving bias correction approach. Here we present our findings of the distortion of the EDW temperature change by QM and discuss the influence of QM on different statistical properties as well as their modifications.

  15. Evaluation of three statistical prediction models for forensic age prediction based on DNA methylation.

    PubMed

    Smeers, Inge; Decorte, Ronny; Van de Voorde, Wim; Bekaert, Bram

    2018-05-01

    DNA methylation is a promising biomarker for forensic age prediction. A challenge that has emerged in recent studies is the fact that prediction errors become larger with increasing age due to interindividual differences in epigenetic ageing rates. This phenomenon of non-constant variance or heteroscedasticity violates an assumption of the often used method of ordinary least squares (OLS) regression. The aim of this study was to evaluate alternative statistical methods that do take heteroscedasticity into account in order to provide more accurate, age-dependent prediction intervals. A weighted least squares (WLS) regression is proposed as well as a quantile regression model. Their performances were compared against an OLS regression model based on the same dataset. Both models provided age-dependent prediction intervals which account for the increasing variance with age, but WLS regression performed better in terms of success rate in the current dataset. However, quantile regression might be a preferred method when dealing with a variance that is not only non-constant, but also not normally distributed. Ultimately the choice of which model to use should depend on the observed characteristics of the data. Copyright © 2018 Elsevier B.V. All rights reserved.

  16. Matching a Distribution by Matching Quantiles Estimation

    PubMed Central

    Sgouropoulos, Nikolaos; Yao, Qiwei; Yastremiz, Claudia

    2015-01-01

    Motivated by the problem of selecting representative portfolios for backtesting counterparty credit risks, we propose a matching quantiles estimation (MQE) method for matching a target distribution by that of a linear combination of a set of random variables. An iterative procedure based on the ordinary least-squares estimation (OLS) is proposed to compute MQE. MQE can be easily modified by adding a LASSO penalty term if a sparse representation is desired, or by restricting the matching within certain range of quantiles to match a part of the target distribution. The convergence of the algorithm and the asymptotic properties of the estimation, both with or without LASSO, are established. A measure and an associated statistical test are proposed to assess the goodness-of-match. The finite sample properties are illustrated by simulation. An application in selecting a counterparty representative portfolio with a real dataset is reported. The proposed MQE also finds applications in portfolio tracking, which demonstrates the usefulness of combining MQE with LASSO. PMID:26692592

  17. Confidence intervals for expected moments algorithm flood quantile estimates

    USGS Publications Warehouse

    Cohn, Timothy A.; Lane, William L.; Stedinger, Jery R.

    2001-01-01

    Historical and paleoflood information can substantially improve flood frequency estimates if appropriate statistical procedures are properly applied. However, the Federal guidelines for flood frequency analysis, set forth in Bulletin 17B, rely on an inefficient “weighting” procedure that fails to take advantage of historical and paleoflood information. This has led researchers to propose several more efficient alternatives including the Expected Moments Algorithm (EMA), which is attractive because it retains Bulletin 17B's statistical structure (method of moments with the Log Pearson Type 3 distribution) and thus can be easily integrated into flood analyses employing the rest of the Bulletin 17B approach. The practical utility of EMA, however, has been limited because no closed‐form method has been available for quantifying the uncertainty of EMA‐based flood quantile estimates. This paper addresses that concern by providing analytical expressions for the asymptotic variance of EMA flood‐quantile estimators and confidence intervals for flood quantile estimates. Monte Carlo simulations demonstrate the properties of such confidence intervals for sites where a 25‐ to 100‐year streamgage record is augmented by 50 to 150 years of historical information. The experiments show that the confidence intervals, though not exact, should be acceptable for most purposes.

  18. Comparison of different hydrological similarity measures to estimate flow quantiles

    NASA Astrophysics Data System (ADS)

    Rianna, M.; Ridolfi, E.; Napolitano, F.

    2017-07-01

    This paper aims to evaluate the influence of hydrological similarity measures on the definition of homogeneous regions. To this end, several attribute sets have been analyzed in the context of the Region of Influence (ROI) procedure. Several combinations of geomorphological, climatological, and geographical characteristics are also used to cluster potentially homogeneous regions. To verify the goodness of the resulting pooled sites, homogeneity tests arecarried out. Through a Monte Carlo simulation and a jack-knife procedure, flow quantiles areestimated for the regions effectively resulting as homogeneous. The analysis areperformed in both the so-called gauged and ungauged scenarios to analyze the effect of hydrological measures on flow quantiles estimation.

  19. Regression modeling of ground-water flow

    USGS Publications Warehouse

    Cooley, R.L.; Naff, R.L.

    1985-01-01

    Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)

  20. Evaluating Differential Effects Using Regression Interactions and Regression Mixture Models

    ERIC Educational Resources Information Center

    Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung

    2015-01-01

    Research increasingly emphasizes understanding differential effects. This article focuses on understanding regression mixture models, which are relatively new statistical methods for assessing differential effects by comparing results to using an interactive term in linear regression. The research questions which each model answers, their…

  1. Bootstrap Enhanced Penalized Regression for Variable Selection with Neuroimaging Data

    PubMed Central

    Abram, Samantha V.; Helwig, Nathaniel E.; Moodie, Craig A.; DeYoung, Colin G.; MacDonald, Angus W.; Waller, Niels G.

    2016-01-01

    Recent advances in fMRI research highlight the use of multivariate methods for examining whole-brain connectivity. Complementary data-driven methods are needed for determining the subset of predictors related to individual differences. Although commonly used for this purpose, ordinary least squares (OLS) regression may not be ideal due to multi-collinearity and over-fitting issues. Penalized regression is a promising and underutilized alternative to OLS regression. In this paper, we propose a nonparametric bootstrap quantile (QNT) approach for variable selection with neuroimaging data. We use real and simulated data, as well as annotated R code, to demonstrate the benefits of our proposed method. Our results illustrate the practical potential of our proposed bootstrap QNT approach. Our real data example demonstrates how our method can be used to relate individual differences in neural network connectivity with an externalizing personality measure. Also, our simulation results reveal that the QNT method is effective under a variety of data conditions. Penalized regression yields more stable estimates and sparser models than OLS regression in situations with large numbers of highly correlated neural predictors. Our results demonstrate that penalized regression is a promising method for examining associations between neural predictors and clinically relevant traits or behaviors. These findings have important implications for the growing field of functional connectivity research, where multivariate methods produce numerous, highly correlated brain networks. PMID:27516732

  2. Bootstrap Enhanced Penalized Regression for Variable Selection with Neuroimaging Data.

    PubMed

    Abram, Samantha V; Helwig, Nathaniel E; Moodie, Craig A; DeYoung, Colin G; MacDonald, Angus W; Waller, Niels G

    2016-01-01

    Recent advances in fMRI research highlight the use of multivariate methods for examining whole-brain connectivity. Complementary data-driven methods are needed for determining the subset of predictors related to individual differences. Although commonly used for this purpose, ordinary least squares (OLS) regression may not be ideal due to multi-collinearity and over-fitting issues. Penalized regression is a promising and underutilized alternative to OLS regression. In this paper, we propose a nonparametric bootstrap quantile (QNT) approach for variable selection with neuroimaging data. We use real and simulated data, as well as annotated R code, to demonstrate the benefits of our proposed method. Our results illustrate the practical potential of our proposed bootstrap QNT approach. Our real data example demonstrates how our method can be used to relate individual differences in neural network connectivity with an externalizing personality measure. Also, our simulation results reveal that the QNT method is effective under a variety of data conditions. Penalized regression yields more stable estimates and sparser models than OLS regression in situations with large numbers of highly correlated neural predictors. Our results demonstrate that penalized regression is a promising method for examining associations between neural predictors and clinically relevant traits or behaviors. These findings have important implications for the growing field of functional connectivity research, where multivariate methods produce numerous, highly correlated brain networks.

  3. Survival Data and Regression Models

    NASA Astrophysics Data System (ADS)

    Grégoire, G.

    2014-12-01

    We start this chapter by introducing some basic elements for the analysis of censored survival data. Then we focus on right censored data and develop two types of regression models. The first one concerns the so-called accelerated failure time models (AFT), which are parametric models where a function of a parameter depends linearly on the covariables. The second one is a semiparametric model, where the covariables enter in a multiplicative form in the expression of the hazard rate function. The main statistical tool for analysing these regression models is the maximum likelihood methodology and, in spite we recall some essential results about the ML theory, we refer to the chapter "Logistic Regression" for a more detailed presentation.

  4. Use of Flood Seasonality in Pooling-Group Formation and Quantile Estimation: An Application in Great Britain

    NASA Astrophysics Data System (ADS)

    Formetta, Giuseppe; Bell, Victoria; Stewart, Elizabeth

    2018-02-01

    Regional flood frequency analysis is one of the most commonly applied methods for estimating extreme flood events at ungauged sites or locations with short measurement records. It is based on: (i) the definition of a homogeneous group (pooling-group) of catchments, and on (ii) the use of the pooling-group data to estimate flood quantiles. Although many methods to define a pooling-group (pooling schemes, PS) are based on catchment physiographic similarity measures, in the last decade methods based on flood seasonality similarity have been contemplated. In this paper, two seasonality-based PS are proposed and tested both in terms of the homogeneity of the pooling-groups they generate and in terms of the accuracy in estimating extreme flood events. The method has been applied in 420 catchments in Great Britain (considered as both gauged and ungauged) and compared against the current Flood Estimation Handbook (FEH) PS. Results for gauged sites show that, compared to the current PS, the seasonality-based PS performs better both in terms of homogeneity of the pooling-group and in terms of the accuracy of flood quantile estimates. For ungauged locations, a national-scale hydrological model has been used for the first time to quantify flood seasonality. Results show that in 75% of the tested locations the seasonality-based PS provides an improvement in the accuracy of the flood quantile estimates. The remaining 25% were located in highly urbanized, groundwater-dependent catchments. The promising results support the aspiration that large-scale hydrological models complement traditional methods for estimating design floods.

  5. Extreme climatic events drive mammal irruptions: regression analysis of 100-year trends in desert rainfall and temperature

    PubMed Central

    Greenville, Aaron C; Wardle, Glenda M; Dickman, Chris R

    2012-01-01

    Extreme climatic events, such as flooding rains, extended decadal droughts and heat waves have been identified increasingly as important regulators of natural populations. Climate models predict that global warming will drive changes in rainfall and increase the frequency and severity of extreme events. Consequently, to anticipate how organisms will respond we need to document how changes in extremes of temperature and rainfall compare to trends in the mean values of these variables and over what spatial scales the patterns are consistent. Using the longest historical weather records available for central Australia – 100 years – and quantile regression methods, we investigate if extreme climate events have changed at similar rates to median events, if annual rainfall has increased in variability, and if the frequency of large rainfall events has increased over this period. Specifically, we compared local (individual weather stations) and regional (Simpson Desert) spatial scales, and quantified trends in median (50th quantile) and extreme weather values (5th, 10th, 90th, and 95th quantiles). We found that median and extreme annual minimum and maximum temperatures have increased at both spatial scales over the past century. Rainfall changes have been inconsistent across the Simpson Desert; individual weather stations showed increases in annual rainfall, increased frequency of large rainfall events or more prolonged droughts, depending on the location. In contrast to our prediction, we found no evidence that intra-annual rainfall had become more variable over time. Using long-term live-trapping records (22 years) of desert small mammals as a case study, we demonstrate that irruptive events are driven by extreme rainfalls (>95th quantile) and that increases in the magnitude and frequency of extreme rainfall events are likely to drive changes in the populations of these species through direct and indirect changes in predation pressure and wildfires. PMID:23170202

  6. Stochastic variability in stress, sleep duration, and sleep quality across the distribution of body mass index: insights from quantile regression.

    PubMed

    Yang, Tse-Chuan; Matthews, Stephen A; Chen, Vivian Y-J

    2014-04-01

    Obesity has become a problem in the USA and identifying modifiable factors at the individual level may help to address this public health concern. A burgeoning literature has suggested that sleep and stress may be associated with obesity; however, little is know about whether these two factors moderate each other and even less is known about whether their impacts on obesity differ by gender. This study investigates whether sleep and stress are associated with body mass index (BMI) respectively, explores whether the combination of stress and sleep is also related to BMI, and demonstrates how these associations vary across the distribution of BMI values. We analyze the data from 3,318 men and 6,689 women in the Philadelphia area using quantile regression (QR) to evaluate the relationships between sleep, stress, and obesity by gender. Our substantive findings include: (1) high and/or extreme stress were related to roughly an increase of 1.2 in BMI after accounting for other covariates; (2) the pathways linking sleep and BMI differed by gender, with BMI for men increasing by 0.77-1 units with reduced sleep duration and BMI for women declining by 0.12 unit with 1 unit increase in sleep quality; (3) stress- and sleep-related variables were confounded, but there was little evidence for moderation between these two; (4) the QR results demonstrate that the association between high and/or extreme stress to BMI varied stochastically across the distribution of BMI values, with an upward trend, suggesting that stress played a more important role among adults with higher BMI (i.e., BMI > 26 for both genders); and (5) the QR plots of sleep-related variables show similar patterns, with stronger effects on BMI at the upper end of BMI distribution. Our findings suggested that sleep and stress were two seemingly independent predictors for BMI and their relationships with BMI were not constant across the BMI distribution.

  7. Interpretation of commonly used statistical regression models.

    PubMed

    Kasza, Jessica; Wolfe, Rory

    2014-01-01

    A review of some regression models commonly used in respiratory health applications is provided in this article. Simple linear regression, multiple linear regression, logistic regression and ordinal logistic regression are considered. The focus of this article is on the interpretation of the regression coefficients of each model, which are illustrated through the application of these models to a respiratory health research study. © 2013 The Authors. Respirology © 2013 Asian Pacific Society of Respirology.

  8. Regionalisation of a distributed method for flood quantiles estimation: Revaluation of local calibration hypothesis to enhance the spatial structure of the optimised parameter

    NASA Astrophysics Data System (ADS)

    Odry, Jean; Arnaud, Patrick

    2016-04-01

    The SHYREG method (Aubert et al., 2014) associates a stochastic rainfall generator and a rainfall-runoff model to produce rainfall and flood quantiles on a 1 km2 mesh covering the whole French territory. The rainfall generator is based on the description of rainy events by descriptive variables following probability distributions and is characterised by a high stability. This stochastic generator is fully regionalised, and the rainfall-runoff transformation is calibrated with a single parameter. Thanks to the stability of the approach, calibration can be performed against only flood quantiles associated with observated frequencies which can be extracted from relatively short time series. The aggregation of SHYREG flood quantiles to the catchment scale is performed using an areal reduction factor technique unique on the whole territory. Past studies demonstrated the accuracy of SHYREG flood quantiles estimation for catchments where flow data are available (Arnaud et al., 2015). Nevertheless, the parameter of the rainfall-runoff model is independently calibrated for each target catchment. As a consequence, this parameter plays a corrective role and compensates approximations and modelling errors which makes difficult to identify its proper spatial pattern. It is an inherent objective of the SHYREG approach to be completely regionalised in order to provide a complete and accurate flood quantiles database throughout France. Consequently, it appears necessary to identify the model configuration in which the calibrated parameter could be regionalised with acceptable performances. The revaluation of some of the method hypothesis is a necessary step before the regionalisation. Especially the inclusion or the modification of the spatial variability of imposed parameters (like production and transfer reservoir size, base flow addition and quantiles aggregation function) should lead to more realistic values of the only calibrated parameter. The objective of the work presented

  9. Modeling Longitudinal Data Containing Non-Normal Within Subject Errors

    NASA Technical Reports Server (NTRS)

    Feiveson, Alan; Glenn, Nancy L.

    2013-01-01

    The mission of the National Aeronautics and Space Administration’s (NASA) human research program is to advance safe human spaceflight. This involves conducting experiments, collecting data, and analyzing data. The data are longitudinal and result from a relatively few number of subjects; typically 10 – 20. A longitudinal study refers to an investigation where participant outcomes and possibly treatments are collected at multiple follow-up times. Standard statistical designs such as mean regression with random effects and mixed–effects regression are inadequate for such data because the population is typically not approximately normally distributed. Hence, more advanced data analysis methods are necessary. This research focuses on four such methods for longitudinal data analysis: the recently proposed linear quantile mixed models (lqmm) by Geraci and Bottai (2013), quantile regression, multilevel mixed–effects linear regression, and robust regression. This research also provides computational algorithms for longitudinal data that scientists can directly use for human spaceflight and other longitudinal data applications, then presents statistical evidence that verifies which method is best for specific situations. This advances the study of longitudinal data in a broad range of applications including applications in the sciences, technology, engineering and mathematics fields.

  10. Poisson Mixture Regression Models for Heart Disease Prediction.

    PubMed

    Mufudza, Chipo; Erol, Hamza

    2016-01-01

    Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models is here addressed under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model due to its low Bayesian Information Criteria value. Furthermore, a Zero Inflated Poisson Mixture Regression model turned out to be the best model for heart prediction over all models as it both clusters individuals into high or low risk category and predicts rate to heart disease componentwise given clusters available. It is deduced that heart disease prediction can be effectively done by identifying the major risks componentwise using Poisson mixture regression model.

  11. Poisson Mixture Regression Models for Heart Disease Prediction

    PubMed Central

    Erol, Hamza

    2016-01-01

    Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models is here addressed under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model due to its low Bayesian Information Criteria value. Furthermore, a Zero Inflated Poisson Mixture Regression model turned out to be the best model for heart prediction over all models as it both clusters individuals into high or low risk category and predicts rate to heart disease componentwise given clusters available. It is deduced that heart disease prediction can be effectively done by identifying the major risks componentwise using Poisson mixture regression model. PMID:27999611

  12. Quantile Mapping Bias correction for daily precipitation over Vietnam in a regional climate model

    NASA Astrophysics Data System (ADS)

    Trinh, L. T.; Matsumoto, J.; Ngo-Duc, T.

    2017-12-01

    In the past decades, Regional Climate Models (RCMs) have been developed significantly, allowing climate simulation to be conducted at a higher resolution. However, RCMs often contained biases when comparing with observations. Therefore, statistical correction methods were commonly employed to reduce/minimize the model biases. In this study, outputs of the Regional Climate Model (RegCM) version 4.3 driven by the CNRM-CM5 global products were evaluated with and without the Quantile Mapping (QM) bias correction method. The model domain covered the area from 90oE to 145oE and from 15oS to 40oN with a horizontal resolution of 25km. The QM bias correction processes were implemented by using the Vietnam Gridded precipitation dataset (VnGP) and the outputs of RegCM historical run in the period 1986-1995 and then validated for the period 1996-2005. Based on the statistical quantity of spatial correlation and intensity distributions, the QM method showed a significant improvement in rainfall compared to the non-bias correction method. The improvements both in time and space were recognized in all seasons and all climatic sub-regions of Vietnam. Moreover, not only the rainfall amount but also some extreme indices such as R10m, R20mm, R50m, CDD, CWD, R95pTOT, R99pTOT were much better after the correction. The results suggested that the QM correction method should be taken into practice for the projections of the future precipitation over Vietnam.

  13. L-statistics for Repeated Measurements Data With Application to Trimmed Means, Quantiles and Tolerance Intervals.

    PubMed

    Assaad, Houssein I; Choudhary, Pankaj K

    2013-01-01

    The L -statistics form an important class of estimators in nonparametric statistics. Its members include trimmed means and sample quantiles and functions thereof. This article is devoted to theory and applications of L -statistics for repeated measurements data, wherein the measurements on the same subject are dependent and the measurements from different subjects are independent. This article has three main goals: (a) Show that the L -statistics are asymptotically normal for repeated measurements data. (b) Present three statistical applications of this result, namely, location estimation using trimmed means, quantile estimation and construction of tolerance intervals. (c) Obtain a Bahadur representation for sample quantiles. These results are generalizations of similar results for independently and identically distributed data. The practical usefulness of these results is illustrated by analyzing a real data set involving measurement of systolic blood pressure. The properties of the proposed point and interval estimators are examined via simulation.

  14. [From clinical judgment to linear regression model.

    PubMed

    Palacios-Cruz, Lino; Pérez, Marcela; Rivas-Ruiz, Rodolfo; Talavera, Juan O

    2013-01-01

    When we think about mathematical models, such as linear regression model, we think that these terms are only used by those engaged in research, a notion that is far from the truth. Legendre described the first mathematical model in 1805, and Galton introduced the formal term in 1886. Linear regression is one of the most commonly used regression models in clinical practice. It is useful to predict or show the relationship between two or more variables as long as the dependent variable is quantitative and has normal distribution. Stated in another way, the regression is used to predict a measure based on the knowledge of at least one other variable. Linear regression has as it's first objective to determine the slope or inclination of the regression line: Y = a + bx, where "a" is the intercept or regression constant and it is equivalent to "Y" value when "X" equals 0 and "b" (also called slope) indicates the increase or decrease that occurs when the variable "x" increases or decreases in one unit. In the regression line, "b" is called regression coefficient. The coefficient of determination (R 2 ) indicates the importance of independent variables in the outcome.

  15. Inferring river bathymetry via Image-to-Depth Quantile Transformation (IDQT)

    USGS Publications Warehouse

    Legleiter, Carl

    2016-01-01

    Conventional, regression-based methods of inferring depth from passive optical image data undermine the advantages of remote sensing for characterizing river systems. This study introduces and evaluates a more flexible framework, Image-to-Depth Quantile Transformation (IDQT), that involves linking the frequency distribution of pixel values to that of depth. In addition, a new image processing workflow involving deep water correction and Minimum Noise Fraction (MNF) transformation can reduce a hyperspectral data set to a single variable related to depth and thus suitable for input to IDQT. Applied to a gravel bed river, IDQT avoided negative depth estimates along channel margins and underpredictions of pool depth. Depth retrieval accuracy (R25 0.79) and precision (0.27 m) were comparable to an established band ratio-based method, although a small shallow bias (0.04 m) was observed. Several ways of specifying distributions of pixel values and depths were evaluated but had negligible impact on the resulting depth estimates, implying that IDQT was robust to these implementation details. In essence, IDQT uses frequency distributions of pixel values and depths to achieve an aspatial calibration; the image itself provides information on the spatial distribution of depths. The approach thus reduces sensitivity to misalignment between field and image data sets and allows greater flexibility in the timing of field data collection relative to image acquisition, a significant advantage in dynamic channels. IDQT also creates new possibilities for depth retrieval in the absence of field data if a model could be used to predict the distribution of depths within a reach.

  16. Log Pearson type 3 quantile estimators with regional skew information and low outlier adjustments

    USGS Publications Warehouse

    Griffis, V.W.; Stedinger, Jery R.; Cohn, T.A.

    2004-01-01

    The recently developed expected moments algorithm (EMA) [Cohn et al., 1997] does as well as maximum likelihood estimations at estimating log‐Pearson type 3 (LP3) flood quantiles using systematic and historical flood information. Needed extensions include use of a regional skewness estimator and its precision to be consistent with Bulletin 17B. Another issue addressed by Bulletin 17B is the treatment of low outliers. A Monte Carlo study compares the performance of Bulletin 17B using the entire sample with and without regional skew with estimators that use regional skew and censor low outliers, including an extended EMA estimator, the conditional probability adjustment (CPA) from Bulletin 17B, and an estimator that uses probability plot regression (PPR) to compute substitute values for low outliers. Estimators that neglect regional skew information do much worse than estimators that use an informative regional skewness estimator. For LP3 data the low outlier rejection procedure generally results in no loss of overall accuracy, and the differences between the MSEs of the estimators that used an informative regional skew are generally modest in the skewness range of real interest. Samples contaminated to model actual flood data demonstrate that estimators which give special treatment to low outliers significantly outperform estimators that make no such adjustment.

  17. Log Pearson type 3 quantile estimators with regional skew information and low outlier adjustments

    NASA Astrophysics Data System (ADS)

    Griffis, V. W.; Stedinger, J. R.; Cohn, T. A.

    2004-07-01

    The recently developed expected moments algorithm (EMA) [, 1997] does as well as maximum likelihood estimations at estimating log-Pearson type 3 (LP3) flood quantiles using systematic and historical flood information. Needed extensions include use of a regional skewness estimator and its precision to be consistent with Bulletin 17B. Another issue addressed by Bulletin 17B is the treatment of low outliers. A Monte Carlo study compares the performance of Bulletin 17B using the entire sample with and without regional skew with estimators that use regional skew and censor low outliers, including an extended EMA estimator, the conditional probability adjustment (CPA) from Bulletin 17B, and an estimator that uses probability plot regression (PPR) to compute substitute values for low outliers. Estimators that neglect regional skew information do much worse than estimators that use an informative regional skewness estimator. For LP3 data the low outlier rejection procedure generally results in no loss of overall accuracy, and the differences between the MSEs of the estimators that used an informative regional skew are generally modest in the skewness range of real interest. Samples contaminated to model actual flood data demonstrate that estimators which give special treatment to low outliers significantly outperform estimators that make no such adjustment.

  18. Categorical regression dose-response modeling

    EPA Science Inventory

    The goal of this training is to provide participants with training on the use of the U.S. EPA’s Categorical Regression soft¬ware (CatReg) and its application to risk assessment. Categorical regression fits mathematical models to toxicity data that have been assigned ord...

  19. Moderation analysis using a two-level regression model.

    PubMed

    Yuan, Ke-Hai; Cheng, Ying; Maxwell, Scott

    2014-10-01

    Moderation analysis is widely used in social and behavioral research. The most commonly used model for moderation analysis is moderated multiple regression (MMR) in which the explanatory variables of the regression model include product terms, and the model is typically estimated by least squares (LS). This paper argues for a two-level regression model in which the regression coefficients of a criterion variable on predictors are further regressed on moderator variables. An algorithm for estimating the parameters of the two-level model by normal-distribution-based maximum likelihood (NML) is developed. Formulas for the standard errors (SEs) of the parameter estimates are provided and studied. Results indicate that, when heteroscedasticity exists, NML with the two-level model gives more efficient and more accurate parameter estimates than the LS analysis of the MMR model. When error variances are homoscedastic, NML with the two-level model leads to essentially the same results as LS with the MMR model. Most importantly, the two-level regression model permits estimating the percentage of variance of each regression coefficient that is due to moderator variables. When applied to data from General Social Surveys 1991, NML with the two-level model identified a significant moderation effect of race on the regression of job prestige on years of education while LS with the MMR model did not. An R package is also developed and documented to facilitate the application of the two-level model.

  20. Variable selection and model choice in geoadditive regression models.

    PubMed

    Kneib, Thomas; Hothorn, Torsten; Tutz, Gerhard

    2009-06-01

    Model choice and variable selection are issues of major concern in practical regression analyses, arising in many biometric applications such as habitat suitability analyses, where the aim is to identify the influence of potentially many environmental conditions on certain species. We describe regression models for breeding bird communities that facilitate both model choice and variable selection, by a boosting algorithm that works within a class of geoadditive regression models comprising spatial effects, nonparametric effects of continuous covariates, interaction surfaces, and varying coefficients. The major modeling components are penalized splines and their bivariate tensor product extensions. All smooth model terms are represented as the sum of a parametric component and a smooth component with one degree of freedom to obtain a fair comparison between the model terms. A generic representation of the geoadditive model allows us to devise a general boosting algorithm that automatically performs model choice and variable selection.

  1. Interaction Models for Functional Regression.

    PubMed

    Usset, Joseph; Staicu, Ana-Maria; Maity, Arnab

    2016-02-01

    A functional regression model with a scalar response and multiple functional predictors is proposed that accommodates two-way interactions in addition to their main effects. The proposed estimation procedure models the main effects using penalized regression splines, and the interaction effect by a tensor product basis. Extensions to generalized linear models and data observed on sparse grids or with measurement error are presented. A hypothesis testing procedure for the functional interaction effect is described. The proposed method can be easily implemented through existing software. Numerical studies show that fitting an additive model in the presence of interaction leads to both poor estimation performance and lost prediction power, while fitting an interaction model where there is in fact no interaction leads to negligible losses. The methodology is illustrated on the AneuRisk65 study data.

  2. Introduction to the use of regression models in epidemiology.

    PubMed

    Bender, Ralf

    2009-01-01

    Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs so that they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed in dependence on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrating examples from cancer research.

  3. Model selection for logistic regression models

    NASA Astrophysics Data System (ADS)

    Duller, Christine

    2012-09-01

    Model selection for logistic regression models decides which of some given potential regressors have an effect and hence should be included in the final model. The second interesting question is whether a certain factor is heterogeneous among some subsets, i.e. whether the model should include a random intercept or not. In this paper these questions will be answered with classical as well as with Bayesian methods. The application show some results of recent research projects in medicine and business administration.

  4. Real estate value prediction using multivariate regression models

    NASA Astrophysics Data System (ADS)

    Manjula, R.; Jain, Shubham; Srivastava, Sharad; Rajiv Kher, Pranav

    2017-11-01

    The real estate market is one of the most competitive in terms of pricing and the same tends to vary significantly based on a lot of factors, hence it becomes one of the prime fields to apply the concepts of machine learning to optimize and predict the prices with high accuracy. Therefore in this paper, we present various important features to use while predicting housing prices with good accuracy. We have described regression models, using various features to have lower Residual Sum of Squares error. While using features in a regression model some feature engineering is required for better prediction. Often a set of features (multiple regressions) or polynomial regression (applying a various set of powers in the features) is used for making better model fit. For these models are expected to be susceptible towards over fitting ridge regression is used to reduce it. This paper thus directs to the best application of regression models in addition to other techniques to optimize the result.

  5. Error Covariance Penalized Regression: A novel multivariate model combining penalized regression with multivariate error structure.

    PubMed

    Allegrini, Franco; Braga, Jez W B; Moreira, Alessandro C O; Olivieri, Alejandro C

    2018-06-29

    A new multivariate regression model, named Error Covariance Penalized Regression (ECPR) is presented. Following a penalized regression strategy, the proposed model incorporates information about the measurement error structure of the system, using the error covariance matrix (ECM) as a penalization term. Results are reported from both simulations and experimental data based on replicate mid and near infrared (MIR and NIR) spectral measurements. The results for ECPR are better under non-iid conditions when compared with traditional first-order multivariate methods such as ridge regression (RR), principal component regression (PCR) and partial least-squares regression (PLS). Copyright © 2018 Elsevier B.V. All rights reserved.

  6. Ridge Regression for Interactive Models.

    ERIC Educational Resources Information Center

    Tate, Richard L.

    1988-01-01

    An exploratory study of the value of ridge regression for interactive models is reported. Assuming that the linear terms in a simple interactive model are centered to eliminate non-essential multicollinearity, a variety of common models, representing both ordinal and disordinal interactions, are shown to have "orientations" that are…

  7. Superquantile/CVaR Risk Measures: Second-Order Theory

    DTIC Science & Technology

    2015-07-31

    order superquantile risk minimization as well as superquantile regression , a proposed second-order version of quantile regression . Keywords...minimization as well as superquantile regression , a proposed second-order version of quantile regression . 15. SUBJECT TERMS 16. SECURITY...superquantilies, because it is deeply tied to generalized regression . The joint formula (3) is central to quantile regression , a well known alternative

  8. Rank-preserving regression: a more robust rank regression model against outliers.

    PubMed

    Chen, Tian; Kowalski, Jeanne; Chen, Rui; Wu, Pan; Zhang, Hui; Feng, Changyong; Tu, Xin M

    2016-08-30

    Mean-based semi-parametric regression models such as the popular generalized estimating equations are widely used to improve robustness of inference over parametric models. Unfortunately, such models are quite sensitive to outlying observations. The Wilcoxon-score-based rank regression (RR) provides more robust estimates over generalized estimating equations against outliers. However, the RR and its extensions do not sufficiently address missing data arising in longitudinal studies. In this paper, we propose a new approach to address outliers under a different framework based on the functional response models. This functional-response-model-based alternative not only addresses limitations of the RR and its extensions for longitudinal data, but, with its rank-preserving property, even provides more robust estimates than these alternatives. The proposed approach is illustrated with both real and simulated data. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.

  9. A simple approach to power and sample size calculations in logistic regression and Cox regression models.

    PubMed

    Vaeth, Michael; Skovlund, Eva

    2004-06-15

    For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination. Copyright 2004 John Wiley & Sons, Ltd.

  10. Regression Models For Multivariate Count Data.

    PubMed

    Zhang, Yiwen; Zhou, Hua; Zhou, Jin; Sun, Wei

    2017-01-01

    Data with multivariate count responses frequently occur in modern applications. The commonly used multinomial-logit model is limiting due to its restrictive mean-variance structure. For instance, analyzing count data from the recent RNA-seq technology by the multinomial-logit model leads to serious errors in hypothesis testing. The ubiquity of over-dispersion and complicated correlation structures among multivariate counts calls for more flexible regression models. In this article, we study some generalized linear models that incorporate various correlation structures among the counts. Current literature lacks a treatment of these models, partly due to the fact that they do not belong to the natural exponential family. We study the estimation, testing, and variable selection for these models in a unifying framework. The regression models are compared on both synthetic and real RNA-seq data.

  11. Regression Models For Multivariate Count Data

    PubMed Central

    Zhang, Yiwen; Zhou, Hua; Zhou, Jin; Sun, Wei

    2016-01-01

    Data with multivariate count responses frequently occur in modern applications. The commonly used multinomial-logit model is limiting due to its restrictive mean-variance structure. For instance, analyzing count data from the recent RNA-seq technology by the multinomial-logit model leads to serious errors in hypothesis testing. The ubiquity of over-dispersion and complicated correlation structures among multivariate counts calls for more flexible regression models. In this article, we study some generalized linear models that incorporate various correlation structures among the counts. Current literature lacks a treatment of these models, partly due to the fact that they do not belong to the natural exponential family. We study the estimation, testing, and variable selection for these models in a unifying framework. The regression models are compared on both synthetic and real RNA-seq data. PMID:28348500

  12. Building Regression Models: The Importance of Graphics.

    ERIC Educational Resources Information Center

    Dunn, Richard

    1989-01-01

    Points out reasons for using graphical methods to teach simple and multiple regression analysis. Argues that a graphically oriented approach has considerable pedagogic advantages in the exposition of simple and multiple regression. Shows that graphical methods may play a central role in the process of building regression models. (Author/LS)

  13. Censored Quantile Instrumental Variable Estimates of the Price Elasticity of Expenditure on Medical Care.

    PubMed

    Kowalski, Amanda

    2016-01-02

    Efforts to control medical care costs depend critically on how individuals respond to prices. I estimate the price elasticity of expenditure on medical care using a censored quantile instrumental variable (CQIV) estimator. CQIV allows estimates to vary across the conditional expenditure distribution, relaxes traditional censored model assumptions, and addresses endogeneity with an instrumental variable. My instrumental variable strategy uses a family member's injury to induce variation in an individual's own price. Across the conditional deciles of the expenditure distribution, I find elasticities that vary from -0.76 to -1.49, which are an order of magnitude larger than previous estimates.

  14. Gaussian Process Regression Model in Spatial Logistic Regression

    NASA Astrophysics Data System (ADS)

    Sofro, A.; Oktaviarina, A.

    2018-01-01

    Spatial analysis has developed very quickly in the last decade. One of the favorite approaches is based on the neighbourhood of the region. Unfortunately, there are some limitations such as difficulty in prediction. Therefore, we offer Gaussian process regression (GPR) to accommodate the issue. In this paper, we will focus on spatial modeling with GPR for binomial data with logit link function. The performance of the model will be investigated. We will discuss the inference of how to estimate the parameters and hyper-parameters and to predict as well. Furthermore, simulation studies will be explained in the last section.

  15. Robust mislabel logistic regression without modeling mislabel probabilities.

    PubMed

    Hung, Hung; Jou, Zhi-Yu; Huang, Su-Yun

    2018-03-01

    Logistic regression is among the most widely used statistical methods for linear discriminant analysis. In many applications, we only observe possibly mislabeled responses. Fitting a conventional logistic regression can then lead to biased estimation. One common resolution is to fit a mislabel logistic regression model, which takes into consideration of mislabeled responses. Another common method is to adopt a robust M-estimation by down-weighting suspected instances. In this work, we propose a new robust mislabel logistic regression based on γ-divergence. Our proposal possesses two advantageous features: (1) It does not need to model the mislabel probabilities. (2) The minimum γ-divergence estimation leads to a weighted estimating equation without the need to include any bias correction term, that is, it is automatically bias-corrected. These features make the proposed γ-logistic regression more robust in model fitting and more intuitive for model interpretation through a simple weighting scheme. Our method is also easy to implement, and two types of algorithms are included. Simulation studies and the Pima data application are presented to demonstrate the performance of γ-logistic regression. © 2017, The International Biometric Society.

  16. Predicting birth weight with conditionally linear transformation models.

    PubMed

    Möst, Lisa; Schmid, Matthias; Faschingbauer, Florian; Hothorn, Torsten

    2016-12-01

    Low and high birth weight (BW) are important risk factors for neonatal morbidity and mortality. Gynecologists must therefore accurately predict BW before delivery. Most prediction formulas for BW are based on prenatal ultrasound measurements carried out within one week prior to birth. Although successfully used in clinical practice, these formulas focus on point predictions of BW but do not systematically quantify uncertainty of the predictions, i.e. they result in estimates of the conditional mean of BW but do not deliver prediction intervals. To overcome this problem, we introduce conditionally linear transformation models (CLTMs) to predict BW. Instead of focusing only on the conditional mean, CLTMs model the whole conditional distribution function of BW given prenatal ultrasound parameters. Consequently, the CLTM approach delivers both point predictions of BW and fetus-specific prediction intervals. Prediction intervals constitute an easy-to-interpret measure of prediction accuracy and allow identification of fetuses subject to high prediction uncertainty. Using a data set of 8712 deliveries at the Perinatal Centre at the University Clinic Erlangen (Germany), we analyzed variants of CLTMs and compared them to standard linear regression estimation techniques used in the past and to quantile regression approaches. The best-performing CLTM variant was competitive with quantile regression and linear regression approaches in terms of conditional coverage and average length of the prediction intervals. We propose that CLTMs be used because they are able to account for possible heteroscedasticity, kurtosis, and skewness of the distribution of BWs. © The Author(s) 2014.

  17. The 2011 heat wave in Greater Houston: Effects of land use on temperature.

    PubMed

    Zhou, Weihe; Ji, Shuang; Chen, Tsun-Hsuan; Hou, Yi; Zhang, Kai

    2014-11-01

    Effects of land use on temperatures during severe heat waves have been rarely studied. This paper examines land use-temperature associations during the 2011 heat wave in Greater Houston. We obtained high resolution of satellite-derived land use data from the US National Land Cover Database, and temperature observations at 138 weather stations from Weather Underground, Inc (WU) during the August of 2011, which was the hottest month in Houston since 1889. Land use regression and quantile regression methods were applied to the monthly averages of daily maximum/mean/minimum temperatures and 114 land use-related predictors. Although selected variables vary with temperature metric, distance to the coastline consistently appears among all models. Other variables are generally related to high developed intensity, open water or wetlands. In addition, our quantile regression analysis shows that distance to the coastline and high developed intensity areas have larger impacts on daily average temperatures at higher quantiles, and open water area has greater impacts on daily minimum temperatures at lower quantiles. By utilizing both land use regression and quantile regression on a recent heat wave in one of the largest US metropolitan areas, this paper provides a new perspective on the impacts of land use on temperatures. Our models can provide estimates of heat exposures for epidemiological studies, and our findings can be combined with demographic variables, air conditioning and relevant diseases information to identify 'hot spots' of population vulnerability for public health interventions to reduce heat-related health effects during heat waves. Copyright © 2014 Elsevier Inc. All rights reserved.

  18. Tests of Sunspot Number Sequences: 3. Effects of Regression Procedures on the Calibration of Historic Sunspot Data

    NASA Astrophysics Data System (ADS)

    Lockwood, M.; Owens, M. J.; Barnard, L.; Usoskin, I. G.

    2016-11-01

    We use sunspot-group observations from the Royal Greenwich Observatory (RGO) to investigate the effects of intercalibrating data from observers with different visual acuities. The tests are made by counting the number of groups [RB] above a variable cut-off threshold of observed total whole spot area (uncorrected for foreshortening) to simulate what a lower-acuity observer would have seen. The synthesised annual means of RB are then re-scaled to the full observed RGO group number [RA] using a variety of regression techniques. It is found that a very high correlation between RA and RB (r_{AB} > 0.98) does not prevent large errors in the intercalibration (for example sunspot-maximum values can be over 30 % too large even for such levels of r_{AB}). In generating the backbone sunspot number [R_{BB}], Svalgaard and Schatten ( Solar Phys., 2016) force regression fits to pass through the scatter-plot origin, which generates unreliable fits (the residuals do not form a normal distribution) and causes sunspot-cycle amplitudes to be exaggerated in the intercalibrated data. It is demonstrated that the use of Quantile-Quantile ("Q-Q") plots to test for a normal distribution is a useful indicator of erroneous and misleading regression fits. Ordinary least-squares linear fits, not forced to pass through the origin, are sometimes reliable (although the optimum method used is shown to be different when matching peak and average sunspot-group numbers). However, other fits are only reliable if non-linear regression is used. From these results it is entirely possible that the inflation of solar-cycle amplitudes in the backbone group sunspot number as one goes back in time, relative to related solar-terrestrial parameters, is entirely caused by the use of inappropriate and non-robust regression techniques to calibrate the sunspot data.

  19. Impact of multicollinearity on small sample hydrologic regression models

    NASA Astrophysics Data System (ADS)

    Kroll, Charles N.; Song, Peter

    2013-06-01

    Often hydrologic regression models are developed with ordinary least squares (OLS) procedures. The use of OLS with highly correlated explanatory variables produces multicollinearity, which creates highly sensitive parameter estimators with inflated variances and improper model selection. It is not clear how to best address multicollinearity in hydrologic regression models. Here a Monte Carlo simulation is developed to compare four techniques to address multicollinearity: OLS, OLS with variance inflation factor screening (VIF), principal component regression (PCR), and partial least squares regression (PLS). The performance of these four techniques was observed for varying sample sizes, correlation coefficients between the explanatory variables, and model error variances consistent with hydrologic regional regression models. The negative effects of multicollinearity are magnified at smaller sample sizes, higher correlations between the variables, and larger model error variances (smaller R2). The Monte Carlo simulation indicates that if the true model is known, multicollinearity is present, and the estimation and statistical testing of regression parameters are of interest, then PCR or PLS should be employed. If the model is unknown, or if the interest is solely on model predictions, is it recommended that OLS be employed since using more complicated techniques did not produce any improvement in model performance. A leave-one-out cross-validation case study was also performed using low-streamflow data sets from the eastern United States. Results indicate that OLS with stepwise selection generally produces models across study regions with varying levels of multicollinearity that are as good as biased regression techniques such as PCR and PLS.

  20. Probabilistic forecasting for extreme NO2 pollution episodes.

    PubMed

    Aznarte, José L

    2017-10-01

    In this study, we investigate the convenience of quantile regression to predict extreme concentrations of NO 2 . Contrarily to the usual point-forecasting, where a single value is forecast for each horizon, probabilistic forecasting through quantile regression allows for the prediction of the full probability distribution, which in turn allows to build models specifically fit for the tails of this distribution. Using data from the city of Madrid, including NO 2 concentrations as well as meteorological measures, we build models that predict extreme NO 2 concentrations, outperforming point-forecasting alternatives, and we prove that the predictions are accurate, reliable and sharp. Besides, we study the relative importance of the independent variables involved, and show how the important variables for the median quantile are different than those important for the upper quantiles. Furthermore, we present a method to compute the probability of exceedance of thresholds, which is a simple and comprehensible manner to present probabilistic forecasts maximizing their usefulness. Copyright © 2017 Elsevier Ltd. All rights reserved.

  1. Testing homogeneity in Weibull-regression models.

    PubMed

    Bolfarine, Heleno; Valença, Dione M

    2005-10-01

    In survival studies with families or geographical units it may be of interest testing whether such groups are homogeneous for given explanatory variables. In this paper we consider score type tests for group homogeneity based on a mixing model in which the group effect is modelled as a random variable. As opposed to hazard-based frailty models, this model presents survival times that conditioned on the random effect, has an accelerated failure time representation. The test statistics requires only estimation of the conventional regression model without the random effect and does not require specifying the distribution of the random effect. The tests are derived for a Weibull regression model and in the uncensored situation, a closed form is obtained for the test statistic. A simulation study is used for comparing the power of the tests. The proposed tests are applied to real data sets with censored data.

  2. Detection of epistatic effects with logic regression and a classical linear regression model.

    PubMed

    Malina, Magdalena; Ickstadt, Katja; Schwender, Holger; Posch, Martin; Bogdan, Małgorzata

    2014-02-01

    To locate multiple interacting quantitative trait loci (QTL) influencing a trait of interest within experimental populations, usually methods as the Cockerham's model are applied. Within this framework, interactions are understood as the part of the joined effect of several genes which cannot be explained as the sum of their additive effects. However, if a change in the phenotype (as disease) is caused by Boolean combinations of genotypes of several QTLs, this Cockerham's approach is often not capable to identify them properly. To detect such interactions more efficiently, we propose a logic regression framework. Even though with the logic regression approach a larger number of models has to be considered (requiring more stringent multiple testing correction) the efficient representation of higher order logic interactions in logic regression models leads to a significant increase of power to detect such interactions as compared to a Cockerham's approach. The increase in power is demonstrated analytically for a simple two-way interaction model and illustrated in more complex settings with simulation study and real data analysis.

  3. A quantile-based scenario analysis approach to biomass supply chain optimization under uncertainty

    DOE PAGES

    Zamar, David S.; Gopaluni, Bhushan; Sokhansanj, Shahab; ...

    2016-11-21

    Supply chain optimization for biomass-based power plants is an important research area due to greater emphasis on renewable power energy sources. Biomass supply chain design and operational planning models are often formulated and studied using deterministic mathematical models. While these models are beneficial for making decisions, their applicability to real world problems may be limited because they do not capture all the complexities in the supply chain, including uncertainties in the parameters. This study develops a statistically robust quantile-based approach for stochastic optimization under uncertainty, which builds upon scenario analysis. We apply and evaluate the performance of our approach tomore » address the problem of analyzing competing biomass supply chains subject to stochastic demand and supply. Finally, the proposed approach was found to outperform alternative methods in terms of computational efficiency and ability to meet the stochastic problem requirements.« less

  4. A quantile-based scenario analysis approach to biomass supply chain optimization under uncertainty

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zamar, David S.; Gopaluni, Bhushan; Sokhansanj, Shahab

    Supply chain optimization for biomass-based power plants is an important research area due to greater emphasis on renewable power energy sources. Biomass supply chain design and operational planning models are often formulated and studied using deterministic mathematical models. While these models are beneficial for making decisions, their applicability to real world problems may be limited because they do not capture all the complexities in the supply chain, including uncertainties in the parameters. This study develops a statistically robust quantile-based approach for stochastic optimization under uncertainty, which builds upon scenario analysis. We apply and evaluate the performance of our approach tomore » address the problem of analyzing competing biomass supply chains subject to stochastic demand and supply. Finally, the proposed approach was found to outperform alternative methods in terms of computational efficiency and ability to meet the stochastic problem requirements.« less

  5. Parameters Estimation of Geographically Weighted Ordinal Logistic Regression (GWOLR) Model

    NASA Astrophysics Data System (ADS)

    Zuhdi, Shaifudin; Retno Sari Saputro, Dewi; Widyaningsih, Purnami

    2017-06-01

    A regression model is the representation of relationship between independent variable and dependent variable. The dependent variable has categories used in the logistic regression model to calculate odds on. The logistic regression model for dependent variable has levels in the logistics regression model is ordinal. GWOLR model is an ordinal logistic regression model influenced the geographical location of the observation site. Parameters estimation in the model needed to determine the value of a population based on sample. The purpose of this research is to parameters estimation of GWOLR model using R software. Parameter estimation uses the data amount of dengue fever patients in Semarang City. Observation units used are 144 villages in Semarang City. The results of research get GWOLR model locally for each village and to know probability of number dengue fever patient categories.

  6. Censored Quantile Instrumental Variable Estimates of the Price Elasticity of Expenditure on Medical Care

    PubMed Central

    Kowalski, Amanda

    2015-01-01

    Efforts to control medical care costs depend critically on how individuals respond to prices. I estimate the price elasticity of expenditure on medical care using a censored quantile instrumental variable (CQIV) estimator. CQIV allows estimates to vary across the conditional expenditure distribution, relaxes traditional censored model assumptions, and addresses endogeneity with an instrumental variable. My instrumental variable strategy uses a family member’s injury to induce variation in an individual’s own price. Across the conditional deciles of the expenditure distribution, I find elasticities that vary from −0.76 to −1.49, which are an order of magnitude larger than previous estimates. PMID:26977117

  7. Building Your Own Regression Model

    ERIC Educational Resources Information Center

    Horton, Robert, M.; Phillips, Vicki; Kenelly, John

    2004-01-01

    Spreadsheets to explore regression with an algebra 2 class in a medium-sized rural high school are presented. The use of spreadsheets can help students develop sophisticated understanding of mathematical models and use them to describe real-world phenomena.

  8. Superquantile/CVaR Risk Measures: Second-Order Theory

    DTIC Science & Technology

    2014-07-17

    order version of quantile regression . Keywords: superquantiles, conditional value-at-risk, second-order superquantiles, mixed superquan- tiles... quantile regression . 15. SUBJECT TERMS 16. SECURITY CLASSIFICATION OF: 17. LIMITATION OF ABSTRACT Same as Report (SAR) 18. NUMBER OF PAGES 26 19a...second-order superquantiles is in the domain of generalized regression . We laid out in [16] a parallel methodology to that of quantile regression

  9. Spatial Assessment of Model Errors from Four Regression Techniques

    Treesearch

    Lianjun Zhang; Jeffrey H. Gove; Jeffrey H. Gove

    2005-01-01

    Fomst modelers have attempted to account for the spatial autocorrelations among trees in growth and yield models by applying alternative regression techniques such as linear mixed models (LMM), generalized additive models (GAM), and geographicalIy weighted regression (GWR). However, the model errors are commonly assessed using average errors across the entire study...

  10. Estimation of peak discharge quantiles for selected annual exceedance probabilities in Northeastern Illinois.

    DOT National Transportation Integrated Search

    2016-06-01

    This report provides two sets of equations for estimating peak discharge quantiles at annual exceedance probabilities (AEPs) of 0.50, 0.20, 0.10, : 0.04, 0.02, 0.01, 0.005, and 0.002 (recurrence intervals of 2, 5, 10, 25, 50, 100, 200, and 500 years,...

  11. The microcomputer scientific software series 2: general linear model--regression.

    Treesearch

    Harold M. Rauscher

    1983-01-01

    The general linear model regression (GLMR) program provides the microcomputer user with a sophisticated regression analysis capability. The output provides a regression ANOVA table, estimators of the regression model coefficients, their confidence intervals, confidence intervals around the predicted Y-values, residuals for plotting, a check for multicollinearity, a...

  12. Penalized spline estimation for functional coefficient regression models.

    PubMed

    Cao, Yanrong; Lin, Haiqun; Wu, Tracy Z; Yu, Yan

    2010-04-01

    The functional coefficient regression models assume that the regression coefficients vary with some "threshold" variable, providing appreciable flexibility in capturing the underlying dynamics in data and avoiding the so-called "curse of dimensionality" in multivariate nonparametric estimation. We first investigate the estimation, inference, and forecasting for the functional coefficient regression models with dependent observations via penalized splines. The P-spline approach, as a direct ridge regression shrinkage type global smoothing method, is computationally efficient and stable. With established fixed-knot asymptotics, inference is readily available. Exact inference can be obtained for fixed smoothing parameter λ, which is most appealing for finite samples. Our penalized spline approach gives an explicit model expression, which also enables multi-step-ahead forecasting via simulations. Furthermore, we examine different methods of choosing the important smoothing parameter λ: modified multi-fold cross-validation (MCV), generalized cross-validation (GCV), and an extension of empirical bias bandwidth selection (EBBS) to P-splines. In addition, we implement smoothing parameter selection using mixed model framework through restricted maximum likelihood (REML) for P-spline functional coefficient regression models with independent observations. The P-spline approach also easily allows different smoothness for different functional coefficients, which is enabled by assigning different penalty λ accordingly. We demonstrate the proposed approach by both simulation examples and a real data application.

  13. RRegrs: an R package for computer-aided model selection with multiple regression models.

    PubMed

    Tsiliki, Georgia; Munteanu, Cristian R; Seoane, Jose A; Fernandez-Lozano, Carlos; Sarimveis, Haralambos; Willighagen, Egon L

    2015-01-01

    Predictive regression models can be created with many different modelling approaches. Choices need to be made for data set splitting, cross-validation methods, specific regression parameters and best model criteria, as they all affect the accuracy and efficiency of the produced predictive models, and therefore, raising model reproducibility and comparison issues. Cheminformatics and bioinformatics are extensively using predictive modelling and exhibit a need for standardization of these methodologies in order to assist model selection and speed up the process of predictive model development. A tool accessible to all users, irrespectively of their statistical knowledge, would be valuable if it tests several simple and complex regression models and validation schemes, produce unified reports, and offer the option to be integrated into more extensive studies. Additionally, such methodology should be implemented as a free programming package, in order to be continuously adapted and redistributed by others. We propose an integrated framework for creating multiple regression models, called RRegrs. The tool offers the option of ten simple and complex regression methods combined with repeated 10-fold and leave-one-out cross-validation. Methods include Multiple Linear regression, Generalized Linear Model with Stepwise Feature Selection, Partial Least Squares regression, Lasso regression, and Support Vector Machines Recursive Feature Elimination. The new framework is an automated fully validated procedure which produces standardized reports to quickly oversee the impact of choices in modelling algorithms and assess the model and cross-validation results. The methodology was implemented as an open source R package, available at https://www.github.com/enanomapper/RRegrs, by reusing and extending on the caret package. The universality of the new methodology is demonstrated using five standard data sets from different scientific fields. Its efficiency in cheminformatics and QSAR

  14. Wavelet regression model in forecasting crude oil price

    NASA Astrophysics Data System (ADS)

    Hamid, Mohd Helmie; Shabri, Ani

    2017-05-01

    This study presents the performance of wavelet multiple linear regression (WMLR) technique in daily crude oil forecasting. WMLR model was developed by integrating the discrete wavelet transform (DWT) and multiple linear regression (MLR) model. The original time series was decomposed to sub-time series with different scales by wavelet theory. Correlation analysis was conducted to assist in the selection of optimal decomposed components as inputs for the WMLR model. The daily WTI crude oil price series has been used in this study to test the prediction capability of the proposed model. The forecasting performance of WMLR model were also compared with regular multiple linear regression (MLR), Autoregressive Moving Average (ARIMA) and Generalized Autoregressive Conditional Heteroscedasticity (GARCH) using root mean square errors (RMSE) and mean absolute errors (MAE). Based on the experimental results, it appears that the WMLR model performs better than the other forecasting technique tested in this study.

  15. Testing Different Model Building Procedures Using Multiple Regression.

    ERIC Educational Resources Information Center

    Thayer, Jerome D.

    The stepwise regression method of selecting predictors for computer assisted multiple regression analysis was compared with forward, backward, and best subsets regression, using 16 data sets. The results indicated the stepwise method was preferred because of its practical nature, when the models chosen by different selection methods were similar…

  16. Regression Model Optimization for the Analysis of Experimental Data

    NASA Technical Reports Server (NTRS)

    Ulbrich, N.

    2009-01-01

    A candidate math model search algorithm was developed at Ames Research Center that determines a recommended math model for the multivariate regression analysis of experimental data. The search algorithm is applicable to classical regression analysis problems as well as wind tunnel strain gage balance calibration analysis applications. The algorithm compares the predictive capability of different regression models using the standard deviation of the PRESS residuals of the responses as a search metric. This search metric is minimized during the search. Singular value decomposition is used during the search to reject math models that lead to a singular solution of the regression analysis problem. Two threshold dependent constraints are also applied. The first constraint rejects math models with insignificant terms. The second constraint rejects math models with near-linear dependencies between terms. The math term hierarchy rule may also be applied as an optional constraint during or after the candidate math model search. The final term selection of the recommended math model depends on the regressor and response values of the data set, the user s function class combination choice, the user s constraint selections, and the result of the search metric minimization. A frequently used regression analysis example from the literature is used to illustrate the application of the search algorithm to experimental data.

  17. Logistic models--an odd(s) kind of regression.

    PubMed

    Jupiter, Daniel C

    2013-01-01

    The logistic regression model bears some similarity to the multivariable linear regression with which we are familiar. However, the differences are great enough to warrant a discussion of the need for and interpretation of logistic regression. Copyright © 2013 American College of Foot and Ankle Surgeons. Published by Elsevier Inc. All rights reserved.

  18. Stochastic Approximation Methods for Latent Regression Item Response Models

    ERIC Educational Resources Information Center

    von Davier, Matthias; Sinharay, Sandip

    2010-01-01

    This article presents an application of a stochastic approximation expectation maximization (EM) algorithm using a Metropolis-Hastings (MH) sampler to estimate the parameters of an item response latent regression model. Latent regression item response models are extensions of item response theory (IRT) to a latent variable model with covariates…

  19. Simple linear and multivariate regression models.

    PubMed

    Rodríguez del Águila, M M; Benítez-Parejo, N

    2011-01-01

    In biomedical research it is common to find problems in which we wish to relate a response variable to one or more variables capable of describing the behaviour of the former variable by means of mathematical models. Regression techniques are used to this effect, in which an equation is determined relating the two variables. While such equations can have different forms, linear equations are the most widely used form and are easy to interpret. The present article describes simple and multiple linear regression models, how they are calculated, and how their applicability assumptions are checked. Illustrative examples are provided, based on the use of the freely accessible R program. Copyright © 2011 SEICAP. Published by Elsevier Espana. All rights reserved.

  20. Geographically weighted regression model on poverty indicator

    NASA Astrophysics Data System (ADS)

    Slamet, I.; Nugroho, N. F. T. A.; Muslich

    2017-12-01

    In this research, we applied geographically weighted regression (GWR) for analyzing the poverty in Central Java. We consider Gaussian Kernel as weighted function. The GWR uses the diagonal matrix resulted from calculating kernel Gaussian function as a weighted function in the regression model. The kernel weights is used to handle spatial effects on the data so that a model can be obtained for each location. The purpose of this paper is to model of poverty percentage data in Central Java province using GWR with Gaussian kernel weighted function and to determine the influencing factors in each regency/city in Central Java province. Based on the research, we obtained geographically weighted regression model with Gaussian kernel weighted function on poverty percentage data in Central Java province. We found that percentage of population working as farmers, population growth rate, percentage of households with regular sanitation, and BPJS beneficiaries are the variables that affect the percentage of poverty in Central Java province. In this research, we found the determination coefficient R2 are 68.64%. There are two categories of district which are influenced by different of significance factors.

  1. Effect of threatening life experiences and adverse family relations in ulcerative colitis: analysis using structural equation modeling and comparison with Crohn's disease.

    PubMed

    Slonim-Nevo, Vered; Sarid, Orly; Friger, Michael; Schwartz, Doron; Sergienko, Ruslan; Pereg, Avihu; Vardi, Hillel; Singer, Terri; Chernin, Elena; Greenberg, Dan; Odes, Shmuel

    2017-05-01

    We published that threatening life experiences and adverse family relations impact Crohn's disease (CD) adversely. In this study, we examine the influence of these stressors in ulcerative colitis (UC). Patients completed demography, economic status (ES), the Patient-Simple Clinical Colitis Activity Index (P-SCCAI), the Short Inflammatory Bowel Disease Questionnaire (SIBDQ), the Short-Form Health Survey (SF-36), the Brief Symptom Inventory (BSI), the Family Assessment Device (FAD), and the List of Threatening Life Experiences (LTE). Analysis included multiple linear and quantile regressions and structural equation modeling, comparing CD. UC patients (N=148, age 47.55±16.04 years, 50.6% women) had scores [median (interquartile range)] as follows: SCAAI, 2 (0.3-4.8); FAD, 1.8 (1.3-2.2); LTE, 1.0 (0-2.0); SF-36 Physical Health, 49.4 (36.8-55.1); SF-36 Mental Health, 45 (33.6-54.5); Brief Symptom Inventory-Global Severity Index (GSI), 0.5 (0.2-1.0). SIBDQ was 49.76±14.91. There were significant positive associations for LTE and SCAAI (25, 50, 75% quantiles), FAD and SF-36 Mental Health, FAD and LTE with GSI (50, 75, 90% quantiles), and ES with SF-36 and SIBDQ. The negative associations were as follows: LTE with SF-36 Physical/Mental Health, SIBDQ with FAD and LTE, ES with GSI (all quantiles), and P-SCCAI (75, 90% quantiles). In structural equation modeling analysis, LTE impacted ES negatively and ES impacted GSI negatively; LTE impacted GSI positively and GSI impacted P-SCCAI positively. In a split model, ES had a greater effect on GSI in UC than CD, whereas other path magnitudes were similar. Threatening life experiences, adverse family relations, and poor ES make UC patients less healthy both physically and mentally. The impact of ES is worse in UC than CD.

  2. Bayesian Estimation of Multivariate Latent Regression Models: Gauss versus Laplace

    ERIC Educational Resources Information Center

    Culpepper, Steven Andrew; Park, Trevor

    2017-01-01

    A latent multivariate regression model is developed that employs a generalized asymmetric Laplace (GAL) prior distribution for regression coefficients. The model is designed for high-dimensional applications where an approximate sparsity condition is satisfied, such that many regression coefficients are near zero after accounting for all the model…

  3. Quantile-based bias correction and uncertainty quantification of extreme event attribution statements

    DOE PAGES

    Jeon, Soyoung; Paciorek, Christopher J.; Wehner, Michael F.

    2016-02-16

    Extreme event attribution characterizes how anthropogenic climate change may have influenced the probability and magnitude of selected individual extreme weather and climate events. Attribution statements often involve quantification of the fraction of attributable risk (FAR) or the risk ratio (RR) and associated confidence intervals. Many such analyses use climate model output to characterize extreme event behavior with and without anthropogenic influence. However, such climate models may have biases in their representation of extreme events. To account for discrepancies in the probabilities of extreme events between observational datasets and model datasets, we demonstrate an appropriate rescaling of the model output basedmore » on the quantiles of the datasets to estimate an adjusted risk ratio. Our methodology accounts for various components of uncertainty in estimation of the risk ratio. In particular, we present an approach to construct a one-sided confidence interval on the lower bound of the risk ratio when the estimated risk ratio is infinity. We demonstrate the methodology using the summer 2011 central US heatwave and output from the Community Earth System Model. In this example, we find that the lower bound of the risk ratio is relatively insensitive to the magnitude and probability of the actual event.« less

  4. Regression Models for Identifying Noise Sources in Magnetic Resonance Images

    PubMed Central

    Zhu, Hongtu; Li, Yimei; Ibrahim, Joseph G.; Shi, Xiaoyan; An, Hongyu; Chen, Yashen; Gao, Wei; Lin, Weili; Rowe, Daniel B.; Peterson, Bradley S.

    2009-01-01

    Stochastic noise, susceptibility artifacts, magnetic field and radiofrequency inhomogeneities, and other noise components in magnetic resonance images (MRIs) can introduce serious bias into any measurements made with those images. We formally introduce three regression models including a Rician regression model and two associated normal models to characterize stochastic noise in various magnetic resonance imaging modalities, including diffusion-weighted imaging (DWI) and functional MRI (fMRI). Estimation algorithms are introduced to maximize the likelihood function of the three regression models. We also develop a diagnostic procedure for systematically exploring MR images to identify noise components other than simple stochastic noise, and to detect discrepancies between the fitted regression models and MRI data. The diagnostic procedure includes goodness-of-fit statistics, measures of influence, and tools for graphical display. The goodness-of-fit statistics can assess the key assumptions of the three regression models, whereas measures of influence can isolate outliers caused by certain noise components, including motion artifacts. The tools for graphical display permit graphical visualization of the values for the goodness-of-fit statistic and influence measures. Finally, we conduct simulation studies to evaluate performance of these methods, and we analyze a real dataset to illustrate how our diagnostic procedure localizes subtle image artifacts by detecting intravoxel variability that is not captured by the regression models. PMID:19890478

  5. A SEMIPARAMETRIC BAYESIAN MODEL FOR CIRCULAR-LINEAR REGRESSION

    EPA Science Inventory

    We present a Bayesian approach to regress a circular variable on a linear predictor. The regression coefficients are assumed to have a nonparametric distribution with a Dirichlet process prior. The semiparametric Bayesian approach gives added flexibility to the model and is usefu...

  6. Model building strategy for logistic regression: purposeful selection.

    PubMed

    Zhang, Zhongheng

    2016-03-01

    Logistic regression is one of the most commonly used models to account for confounders in medical literature. The article introduces how to perform purposeful selection model building strategy with R. I stress on the use of likelihood ratio test to see whether deleting a variable will have significant impact on model fit. A deleted variable should also be checked for whether it is an important adjustment of remaining covariates. Interaction should be checked to disentangle complex relationship between covariates and their synergistic effect on response variable. Model should be checked for the goodness-of-fit (GOF). In other words, how the fitted model reflects the real data. Hosmer-Lemeshow GOF test is the most widely used for logistic regression model.

  7. SPReM: Sparse Projection Regression Model For High-dimensional Linear Regression *

    PubMed Central

    Sun, Qiang; Zhu, Hongtu; Liu, Yufeng; Ibrahim, Joseph G.

    2014-01-01

    The aim of this paper is to develop a sparse projection regression modeling (SPReM) framework to perform multivariate regression modeling with a large number of responses and a multivariate covariate of interest. We propose two novel heritability ratios to simultaneously perform dimension reduction, response selection, estimation, and testing, while explicitly accounting for correlations among multivariate responses. Our SPReM is devised to specifically address the low statistical power issue of many standard statistical approaches, such as the Hotelling’s T2 test statistic or a mass univariate analysis, for high-dimensional data. We formulate the estimation problem of SPREM as a novel sparse unit rank projection (SURP) problem and propose a fast optimization algorithm for SURP. Furthermore, we extend SURP to the sparse multi-rank projection (SMURP) by adopting a sequential SURP approximation. Theoretically, we have systematically investigated the convergence properties of SURP and the convergence rate of SURP estimates. Our simulation results and real data analysis have shown that SPReM out-performs other state-of-the-art methods. PMID:26527844

  8. Influence diagnostics in meta-regression model.

    PubMed

    Shi, Lei; Zuo, ShanShan; Yu, Dalei; Zhou, Xiaohua

    2017-09-01

    This paper studies the influence diagnostics in meta-regression model including case deletion diagnostic and local influence analysis. We derive the subset deletion formulae for the estimation of regression coefficient and heterogeneity variance and obtain the corresponding influence measures. The DerSimonian and Laird estimation and maximum likelihood estimation methods in meta-regression are considered, respectively, to derive the results. Internal and external residual and leverage measure are defined. The local influence analysis based on case-weights perturbation scheme, responses perturbation scheme, covariate perturbation scheme, and within-variance perturbation scheme are explored. We introduce a method by simultaneous perturbing responses, covariate, and within-variance to obtain the local influence measure, which has an advantage of capable to compare the influence magnitude of influential studies from different perturbations. An example is used to illustrate the proposed methodology. Copyright © 2017 John Wiley & Sons, Ltd.

  9. Variable Selection for Regression Models of Percentile Flows

    NASA Astrophysics Data System (ADS)

    Fouad, G.

    2017-12-01

    Percentile flows describe the flow magnitude equaled or exceeded for a given percent of time, and are widely used in water resource management. However, these statistics are normally unavailable since most basins are ungauged. Percentile flows of ungauged basins are often predicted using regression models based on readily observable basin characteristics, such as mean elevation. The number of these independent variables is too large to evaluate all possible models. A subset of models is typically evaluated using automatic procedures, like stepwise regression. This ignores a large variety of methods from the field of feature (variable) selection and physical understanding of percentile flows. A study of 918 basins in the United States was conducted to compare an automatic regression procedure to the following variable selection methods: (1) principal component analysis, (2) correlation analysis, (3) random forests, (4) genetic programming, (5) Bayesian networks, and (6) physical understanding. The automatic regression procedure only performed better than principal component analysis. Poor performance of the regression procedure was due to a commonly used filter for multicollinearity, which rejected the strongest models because they had cross-correlated independent variables. Multicollinearity did not decrease model performance in validation because of a representative set of calibration basins. Variable selection methods based strictly on predictive power (numbers 2-5 from above) performed similarly, likely indicating a limit to the predictive power of the variables. Similar performance was also reached using variables selected based on physical understanding, a finding that substantiates recent calls to emphasize physical understanding in modeling for predictions in ungauged basins. The strongest variables highlighted the importance of geology and land cover, whereas widely used topographic variables were the weakest predictors. Variables suffered from a high

  10. Regression Simulation Model. Appendix X. Users Manual,

    DTIC Science & Technology

    1981-03-01

    change as the prediction equations become refined. Whereas no notice will be provided when the changes are made, the programs will be modified such that...NATIONAL BUREAU Of STANDARDS 1963 A ___,_ __ _ __ _ . APPENDIX X ( R4/ EGRESSION IMULATION ’jDEL. Ape’A ’) 7 USERS MANUA submitted to The Great River...regression analysis and to establish a prediction equation (model). The prediction equation contains the partial regression coefficients (B-weights) which

  11. Modelling subject-specific childhood growth using linear mixed-effect models with cubic regression splines.

    PubMed

    Grajeda, Laura M; Ivanescu, Andrada; Saito, Mayuko; Crainiceanu, Ciprian; Jaganath, Devan; Gilman, Robert H; Crabtree, Jean E; Kelleher, Dermott; Cabrera, Lilia; Cama, Vitaliano; Checkley, William

    2016-01-01

    Childhood growth is a cornerstone of pediatric research. Statistical models need to consider individual trajectories to adequately describe growth outcomes. Specifically, well-defined longitudinal models are essential to characterize both population and subject-specific growth. Linear mixed-effect models with cubic regression splines can account for the nonlinearity of growth curves and provide reasonable estimators of population and subject-specific growth, velocity and acceleration. We provide a stepwise approach that builds from simple to complex models, and account for the intrinsic complexity of the data. We start with standard cubic splines regression models and build up to a model that includes subject-specific random intercepts and slopes and residual autocorrelation. We then compared cubic regression splines vis-à-vis linear piecewise splines, and with varying number of knots and positions. Statistical code is provided to ensure reproducibility and improve dissemination of methods. Models are applied to longitudinal height measurements in a cohort of 215 Peruvian children followed from birth until their fourth year of life. Unexplained variability, as measured by the variance of the regression model, was reduced from 7.34 when using ordinary least squares to 0.81 (p < 0.001) when using a linear mixed-effect models with random slopes and a first order continuous autoregressive error term. There was substantial heterogeneity in both the intercept (p < 0.001) and slopes (p < 0.001) of the individual growth trajectories. We also identified important serial correlation within the structure of the data (ρ = 0.66; 95 % CI 0.64 to 0.68; p < 0.001), which we modeled with a first order continuous autoregressive error term as evidenced by the variogram of the residuals and by a lack of association among residuals. The final model provides a parametric linear regression equation for both estimation and prediction of population- and individual-level growth

  12. [Evaluation of estimation of prevalence ratio using bayesian log-binomial regression model].

    PubMed

    Gao, W L; Lin, H; Liu, X N; Ren, X W; Li, J S; Shen, X P; Zhu, S L

    2017-03-10

    To evaluate the estimation of prevalence ratio ( PR ) by using bayesian log-binomial regression model and its application, we estimated the PR of medical care-seeking prevalence to caregivers' recognition of risk signs of diarrhea in their infants by using bayesian log-binomial regression model in Openbugs software. The results showed that caregivers' recognition of infant' s risk signs of diarrhea was associated significantly with a 13% increase of medical care-seeking. Meanwhile, we compared the differences in PR 's point estimation and its interval estimation of medical care-seeking prevalence to caregivers' recognition of risk signs of diarrhea and convergence of three models (model 1: not adjusting for the covariates; model 2: adjusting for duration of caregivers' education, model 3: adjusting for distance between village and township and child month-age based on model 2) between bayesian log-binomial regression model and conventional log-binomial regression model. The results showed that all three bayesian log-binomial regression models were convergence and the estimated PRs were 1.130(95 %CI : 1.005-1.265), 1.128(95 %CI : 1.001-1.264) and 1.132(95 %CI : 1.004-1.267), respectively. Conventional log-binomial regression model 1 and model 2 were convergence and their PRs were 1.130(95 % CI : 1.055-1.206) and 1.126(95 % CI : 1.051-1.203), respectively, but the model 3 was misconvergence, so COPY method was used to estimate PR , which was 1.125 (95 %CI : 1.051-1.200). In addition, the point estimation and interval estimation of PRs from three bayesian log-binomial regression models differed slightly from those of PRs from conventional log-binomial regression model, but they had a good consistency in estimating PR . Therefore, bayesian log-binomial regression model can effectively estimate PR with less misconvergence and have more advantages in application compared with conventional log-binomial regression model.

  13. Prediction models for clustered data: comparison of a random intercept and standard regression model.

    PubMed

    Bouwmeester, Walter; Twisk, Jos W R; Kappen, Teus H; van Klei, Wilton A; Moons, Karel G M; Vergouwe, Yvonne

    2013-02-15

    When study data are clustered, standard regression analysis is considered inappropriate and analytical techniques for clustered data need to be used. For prediction research in which the interest of predictor effects is on the patient level, random effect regression models are probably preferred over standard regression analysis. It is well known that the random effect parameter estimates and the standard logistic regression parameter estimates are different. Here, we compared random effect and standard logistic regression models for their ability to provide accurate predictions. Using an empirical study on 1642 surgical patients at risk of postoperative nausea and vomiting, who were treated by one of 19 anesthesiologists (clusters), we developed prognostic models either with standard or random intercept logistic regression. External validity of these models was assessed in new patients from other anesthesiologists. We supported our results with simulation studies using intra-class correlation coefficients (ICC) of 5%, 15%, or 30%. Standard performance measures and measures adapted for the clustered data structure were estimated. The model developed with random effect analysis showed better discrimination than the standard approach, if the cluster effects were used for risk prediction (standard c-index of 0.69 versus 0.66). In the external validation set, both models showed similar discrimination (standard c-index 0.68 versus 0.67). The simulation study confirmed these results. For datasets with a high ICC (≥15%), model calibration was only adequate in external subjects, if the used performance measure assumed the same data structure as the model development method: standard calibration measures showed good calibration for the standard developed model, calibration measures adapting the clustered data structure showed good calibration for the prediction model with random intercept. The models with random intercept discriminate better than the standard model only

  14. Modeling maximum daily temperature using a varying coefficient regression model

    Treesearch

    Han Li; Xinwei Deng; Dong-Yum Kim; Eric P. Smith

    2014-01-01

    Relationships between stream water and air temperatures are often modeled using linear or nonlinear regression methods. Despite a strong relationship between water and air temperatures and a variety of models that are effective for data summarized on a weekly basis, such models did not yield consistently good predictions for summaries such as daily maximum temperature...

  15. The Application of the Cumulative Logistic Regression Model to Automated Essay Scoring

    ERIC Educational Resources Information Center

    Haberman, Shelby J.; Sinharay, Sandip

    2010-01-01

    Most automated essay scoring programs use a linear regression model to predict an essay score from several essay features. This article applied a cumulative logit model instead of the linear regression model to automated essay scoring. Comparison of the performances of the linear regression model and the cumulative logit model was performed on a…

  16. Time series regression model for infectious disease and weather.

    PubMed

    Imai, Chisato; Armstrong, Ben; Chalabi, Zaid; Mangtani, Punam; Hashizume, Masahiro

    2015-10-01

    Time series regression has been developed and long used to evaluate the short-term associations of air pollution and weather with mortality or morbidity of non-infectious diseases. The application of the regression approaches from this tradition to infectious diseases, however, is less well explored and raises some new issues. We discuss and present potential solutions for five issues often arising in such analyses: changes in immune population, strong autocorrelations, a wide range of plausible lag structures and association patterns, seasonality adjustments, and large overdispersion. The potential approaches are illustrated with datasets of cholera cases and rainfall from Bangladesh and influenza and temperature in Tokyo. Though this article focuses on the application of the traditional time series regression to infectious diseases and weather factors, we also briefly introduce alternative approaches, including mathematical modeling, wavelet analysis, and autoregressive integrated moving average (ARIMA) models. Modifications proposed to standard time series regression practice include using sums of past cases as proxies for the immune population, and using the logarithm of lagged disease counts to control autocorrelation due to true contagion, both of which are motivated from "susceptible-infectious-recovered" (SIR) models. The complexity of lag structures and association patterns can often be informed by biological mechanisms and explored by using distributed lag non-linear models. For overdispersed models, alternative distribution models such as quasi-Poisson and negative binomial should be considered. Time series regression can be used to investigate dependence of infectious diseases on weather, but may need modifying to allow for features specific to this context. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.

  17. Linear regression crash prediction models : issues and proposed solutions.

    DOT National Transportation Integrated Search

    2010-05-01

    The paper develops a linear regression model approach that can be applied to : crash data to predict vehicle crashes. The proposed approach involves novice data aggregation : to satisfy linear regression assumptions; namely error structure normality ...

  18. Identifying Factors That Predict Promotion Time to E-4 and Re-Enlistment Eligibility for U.S. Marine Corps Field Radio Operators

    DTIC Science & Technology

    2014-12-01

    Primary Military Occupational Specialty PRO Proficiency Q-Q Quantile - Quantile RSS Residual Sum of Squares SI Shop Information T&R Training and...construct multivariate linear regression models to estimate Marines’ Computed Tier Score and time to achieve E-4 based on their individual personal...Science (GS) score, ASVAB Mathematics Knowledge (MK) score, ASVAB Paragraph Comprehension (PC) score, weight , and whether a Marine receives a weight

  19. Forecasting conditional climate-change using a hybrid approach

    USGS Publications Warehouse

    Esfahani, Akbar Akbari; Friedel, Michael J.

    2014-01-01

    A novel approach is proposed to forecast the likelihood of climate-change across spatial landscape gradients. This hybrid approach involves reconstructing past precipitation and temperature using the self-organizing map technique; determining quantile trends in the climate-change variables by quantile regression modeling; and computing conditional forecasts of climate-change variables based on self-similarity in quantile trends using the fractionally differenced auto-regressive integrated moving average technique. The proposed modeling approach is applied to states (Arizona, California, Colorado, Nevada, New Mexico, and Utah) in the southwestern U.S., where conditional forecasts of climate-change variables are evaluated against recent (2012) observations, evaluated at a future time period (2030), and evaluated as future trends (2009–2059). These results have broad economic, political, and social implications because they quantify uncertainty in climate-change forecasts affecting various sectors of society. Another benefit of the proposed hybrid approach is that it can be extended to any spatiotemporal scale providing self-similarity exists.

  20. Robust geographically weighted regression of modeling the Air Polluter Standard Index (APSI)

    NASA Astrophysics Data System (ADS)

    Warsito, Budi; Yasin, Hasbi; Ispriyanti, Dwi; Hoyyi, Abdul

    2018-05-01

    The Geographically Weighted Regression (GWR) model has been widely applied to many practical fields for exploring spatial heterogenity of a regression model. However, this method is inherently not robust to outliers. Outliers commonly exist in data sets and may lead to a distorted estimate of the underlying regression model. One of solution to handle the outliers in the regression model is to use the robust models. So this model was called Robust Geographically Weighted Regression (RGWR). This research aims to aid the government in the policy making process related to air pollution mitigation by developing a standard index model for air polluter (Air Polluter Standard Index - APSI) based on the RGWR approach. In this research, we also consider seven variables that are directly related to the air pollution level, which are the traffic velocity, the population density, the business center aspect, the air humidity, the wind velocity, the air temperature, and the area size of the urban forest. The best model is determined by the smallest AIC value. There are significance differences between Regression and RGWR in this case, but Basic GWR using the Gaussian kernel is the best model to modeling APSI because it has smallest AIC.

  1. Analysis of Sting Balance Calibration Data Using Optimized Regression Models

    NASA Technical Reports Server (NTRS)

    Ulbrich, N.; Bader, Jon B.

    2010-01-01

    Calibration data of a wind tunnel sting balance was processed using a candidate math model search algorithm that recommends an optimized regression model for the data analysis. During the calibration the normal force and the moment at the balance moment center were selected as independent calibration variables. The sting balance itself had two moment gages. Therefore, after analyzing the connection between calibration loads and gage outputs, it was decided to choose the difference and the sum of the gage outputs as the two responses that best describe the behavior of the balance. The math model search algorithm was applied to these two responses. An optimized regression model was obtained for each response. Classical strain gage balance load transformations and the equations of the deflection of a cantilever beam under load are used to show that the search algorithm s two optimized regression models are supported by a theoretical analysis of the relationship between the applied calibration loads and the measured gage outputs. The analysis of the sting balance calibration data set is a rare example of a situation when terms of a regression model of a balance can directly be derived from first principles of physics. In addition, it is interesting to note that the search algorithm recommended the correct regression model term combinations using only a set of statistical quality metrics that were applied to the experimental data during the algorithm s term selection process.

  2. Prediction models for clustered data: comparison of a random intercept and standard regression model

    PubMed Central

    2013-01-01

    Background When study data are clustered, standard regression analysis is considered inappropriate and analytical techniques for clustered data need to be used. For prediction research in which the interest of predictor effects is on the patient level, random effect regression models are probably preferred over standard regression analysis. It is well known that the random effect parameter estimates and the standard logistic regression parameter estimates are different. Here, we compared random effect and standard logistic regression models for their ability to provide accurate predictions. Methods Using an empirical study on 1642 surgical patients at risk of postoperative nausea and vomiting, who were treated by one of 19 anesthesiologists (clusters), we developed prognostic models either with standard or random intercept logistic regression. External validity of these models was assessed in new patients from other anesthesiologists. We supported our results with simulation studies using intra-class correlation coefficients (ICC) of 5%, 15%, or 30%. Standard performance measures and measures adapted for the clustered data structure were estimated. Results The model developed with random effect analysis showed better discrimination than the standard approach, if the cluster effects were used for risk prediction (standard c-index of 0.69 versus 0.66). In the external validation set, both models showed similar discrimination (standard c-index 0.68 versus 0.67). The simulation study confirmed these results. For datasets with a high ICC (≥15%), model calibration was only adequate in external subjects, if the used performance measure assumed the same data structure as the model development method: standard calibration measures showed good calibration for the standard developed model, calibration measures adapting the clustered data structure showed good calibration for the prediction model with random intercept. Conclusion The models with random intercept discriminate

  3. A Skew-Normal Mixture Regression Model

    ERIC Educational Resources Information Center

    Liu, Min; Lin, Tsung-I

    2014-01-01

    A challenge associated with traditional mixture regression models (MRMs), which rest on the assumption of normally distributed errors, is determining the number of unobserved groups. Specifically, even slight deviations from normality can lead to the detection of spurious classes. The current work aims to (a) examine how sensitive the commonly…

  4. Detecting influential observations in nonlinear regression modeling of groundwater flow

    USGS Publications Warehouse

    Yager, Richard M.

    1998-01-01

    Nonlinear regression is used to estimate optimal parameter values in models of groundwater flow to ensure that differences between predicted and observed heads and flows do not result from nonoptimal parameter values. Parameter estimates can be affected, however, by observations that disproportionately influence the regression, such as outliers that exert undue leverage on the objective function. Certain statistics developed for linear regression can be used to detect influential observations in nonlinear regression if the models are approximately linear. This paper discusses the application of Cook's D, which measures the effect of omitting a single observation on a set of estimated parameter values, and the statistical parameter DFBETAS, which quantifies the influence of an observation on each parameter. The influence statistics were used to (1) identify the influential observations in the calibration of a three-dimensional, groundwater flow model of a fractured-rock aquifer through nonlinear regression, and (2) quantify the effect of omitting influential observations on the set of estimated parameter values. Comparison of the spatial distribution of Cook's D with plots of model sensitivity shows that influential observations correspond to areas where the model heads are most sensitive to certain parameters, and where predicted groundwater flow rates are largest. Five of the six discharge observations were identified as influential, indicating that reliable measurements of groundwater flow rates are valuable data in model calibration. DFBETAS are computed and examined for an alternative model of the aquifer system to identify a parameterization error in the model design that resulted in overestimation of the effect of anisotropy on horizontal hydraulic conductivity.

  5. Hidden Connections between Regression Models of Strain-Gage Balance Calibration Data

    NASA Technical Reports Server (NTRS)

    Ulbrich, Norbert

    2013-01-01

    Hidden connections between regression models of wind tunnel strain-gage balance calibration data are investigated. These connections become visible whenever balance calibration data is supplied in its design format and both the Iterative and Non-Iterative Method are used to process the data. First, it is shown how the regression coefficients of the fitted balance loads of a force balance can be approximated by using the corresponding regression coefficients of the fitted strain-gage outputs. Then, data from the manual calibration of the Ames MK40 six-component force balance is chosen to illustrate how estimates of the regression coefficients of the fitted balance loads can be obtained from the regression coefficients of the fitted strain-gage outputs. The study illustrates that load predictions obtained by applying the Iterative or the Non-Iterative Method originate from two related regression solutions of the balance calibration data as long as balance loads are given in the design format of the balance, gage outputs behave highly linear, strict statistical quality metrics are used to assess regression models of the data, and regression model term combinations of the fitted loads and gage outputs can be obtained by a simple variable exchange.

  6. Maximum Entropy Discrimination Poisson Regression for Software Reliability Modeling.

    PubMed

    Chatzis, Sotirios P; Andreou, Andreas S

    2015-11-01

    Reliably predicting software defects is one of the most significant tasks in software engineering. Two of the major components of modern software reliability modeling approaches are: 1) extraction of salient features for software system representation, based on appropriately designed software metrics and 2) development of intricate regression models for count data, to allow effective software reliability data modeling and prediction. Surprisingly, research in the latter frontier of count data regression modeling has been rather limited. More specifically, a lack of simple and efficient algorithms for posterior computation has made the Bayesian approaches appear unattractive, and thus underdeveloped in the context of software reliability modeling. In this paper, we try to address these issues by introducing a novel Bayesian regression model for count data, based on the concept of max-margin data modeling, effected in the context of a fully Bayesian model treatment with simple and efficient posterior distribution updates. Our novel approach yields a more discriminative learning technique, making more effective use of our training data during model inference. In addition, it allows of better handling uncertainty in the modeled data, which can be a significant problem when the training data are limited. We derive elegant inference algorithms for our model under the mean-field paradigm and exhibit its effectiveness using the publicly available benchmark data sets.

  7. Modelling fourier regression for time series data- a case study: modelling inflation in foods sector in Indonesia

    NASA Astrophysics Data System (ADS)

    Prahutama, Alan; Suparti; Wahyu Utami, Tiani

    2018-03-01

    Regression analysis is an analysis to model the relationship between response variables and predictor variables. The parametric approach to the regression model is very strict with the assumption, but nonparametric regression model isn’t need assumption of model. Time series data is the data of a variable that is observed based on a certain time, so if the time series data wanted to be modeled by regression, then we should determined the response and predictor variables first. Determination of the response variable in time series is variable in t-th (yt), while the predictor variable is a significant lag. In nonparametric regression modeling, one developing approach is to use the Fourier series approach. One of the advantages of nonparametric regression approach using Fourier series is able to overcome data having trigonometric distribution. In modeling using Fourier series needs parameter of K. To determine the number of K can be used Generalized Cross Validation method. In inflation modeling for the transportation sector, communication and financial services using Fourier series yields an optimal K of 120 parameters with R-square 99%. Whereas if it was modeled by multiple linear regression yield R-square 90%.

  8. A Spline Regression Model for Latent Variables

    ERIC Educational Resources Information Center

    Harring, Jeffrey R.

    2014-01-01

    Spline (or piecewise) regression models have been used in the past to account for patterns in observed data that exhibit distinct phases. The changepoint or knot marking the shift from one phase to the other, in many applications, is an unknown parameter to be estimated. As an extension of this framework, this research considers modeling the…

  9. Predicting recycling behaviour: Comparison of a linear regression model and a fuzzy logic model.

    PubMed

    Vesely, Stepan; Klöckner, Christian A; Dohnal, Mirko

    2016-03-01

    In this paper we demonstrate that fuzzy logic can provide a better tool for predicting recycling behaviour than the customarily used linear regression. To show this, we take a set of empirical data on recycling behaviour (N=664), which we randomly divide into two halves. The first half is used to estimate a linear regression model of recycling behaviour, and to develop a fuzzy logic model of recycling behaviour. As the first comparison, the fit of both models to the data included in estimation of the models (N=332) is evaluated. As the second comparison, predictive accuracy of both models for "new" cases (hold-out data not included in building the models, N=332) is assessed. In both cases, the fuzzy logic model significantly outperforms the regression model in terms of fit. To conclude, when accurate predictions of recycling and possibly other environmental behaviours are needed, fuzzy logic modelling seems to be a promising technique. Copyright © 2015 Elsevier Ltd. All rights reserved.

  10. Classification of Satellite Derived Chlorophyll a Space-Time Series by Means of Quantile Regression: An Application to the Adriatic Sea

    NASA Astrophysics Data System (ADS)

    Girardi, P.; Pastres, R.; Gaetan, C.; Mangin, A.; Taji, M. A.

    2015-12-01

    In this paper, we present the results of a classification of Adriatic waters, based on spatial time series of remotely sensed Chlorophyll type-a. The study was carried out using a clustering procedure combining quantile smoothing and an agglomerative clustering algorithms. The smoothing function includes a seasonal term, thus allowing one to classify areas according to “similar” seasonal evolution, as well as according to “similar” trends. This methodology, which is here applied for the first time to Ocean Colour data, is more robust with respect to other classical methods, as it does not require any assumption on the probability distribution of the data. This approach was applied to the classification of an eleven year long time series, from January 2002 to December 2012, of monthly values of Chlorophyll type-a concentrations covering the whole Adriatic Sea. The data set was made available by ACRI (http://hermes.acri.fr) in the framework of the Glob-Colour Project (http://www.globcolour.info). Data were obtained by calibrating Ocean Colour data provided by different satellite missions, such as MERIS, SeaWiFS and MODIS. The results clearly show the presence of North-South and West-East gradient in the level of Chlorophyll, which is consistent with literature findings. This analysis could provide a sound basis for the identification of “water bodies” and of Chlorophyll type-a thresholds which define their Good Ecological Status, in terms of trophic level, as required by the implementation of the Marine Strategy Framework Directive. The forthcoming availability of Sentinel-3 OLCI data, in continuity of the previous missions, and with perspective of more than a 15-year monitoring system, offers a real opportunity of expansion of our study as a strong support to the implementation of both the EU Marine Strategy Framework Directive and the UNEP-MAP Ecosystem Approach in the Mediterranean.

  11. Predicting Quantitative Traits With Regression Models for Dense Molecular Markers and Pedigree

    PubMed Central

    de los Campos, Gustavo; Naya, Hugo; Gianola, Daniel; Crossa, José; Legarra, Andrés; Manfredi, Eduardo; Weigel, Kent; Cotes, José Miguel

    2009-01-01

    The availability of genomewide dense markers brings opportunities and challenges to breeding programs. An important question concerns the ways in which dense markers and pedigrees, together with phenotypic records, should be used to arrive at predictions of genetic values for complex traits. If a large number of markers are included in a regression model, marker-specific shrinkage of regression coefficients may be needed. For this reason, the Bayesian least absolute shrinkage and selection operator (LASSO) (BL) appears to be an interesting approach for fitting marker effects in a regression model. This article adapts the BL to arrive at a regression model where markers, pedigrees, and covariates other than markers are considered jointly. Connections between BL and other marker-based regression models are discussed, and the sensitivity of BL with respect to the choice of prior distributions assigned to key parameters is evaluated using simulation. The proposed model was fitted to two data sets from wheat and mouse populations, and evaluated using cross-validation methods. Results indicate that inclusion of markers in the regression further improved the predictive ability of models. An R program that implements the proposed model is freely available. PMID:19293140

  12. Adjusted variable plots for Cox's proportional hazards regression model.

    PubMed

    Hall, C B; Zeger, S L; Bandeen-Roche, K J

    1996-01-01

    Adjusted variable plots are useful in linear regression for outlier detection and for qualitative evaluation of the fit of a model. In this paper, we extend adjusted variable plots to Cox's proportional hazards model for possibly censored survival data. We propose three different plots: a risk level adjusted variable (RLAV) plot in which each observation in each risk set appears, a subject level adjusted variable (SLAV) plot in which each subject is represented by one point, and an event level adjusted variable (ELAV) plot in which the entire risk set at each failure event is represented by a single point. The latter two plots are derived from the RLAV by combining multiple points. In each point, the regression coefficient and standard error from a Cox proportional hazards regression is obtained by a simple linear regression through the origin fit to the coordinates of the pictured points. The plots are illustrated with a reanalysis of a dataset of 65 patients with multiple myeloma.

  13. Attributing uncertainty in streamflow simulations due to variable inputs via the Quantile Flow Deviation metric

    NASA Astrophysics Data System (ADS)

    Shoaib, Syed Abu; Marshall, Lucy; Sharma, Ashish

    2018-06-01

    Every model to characterise a real world process is affected by uncertainty. Selecting a suitable model is a vital aspect of engineering planning and design. Observation or input errors make the prediction of modelled responses more uncertain. By way of a recently developed attribution metric, this study is aimed at developing a method for analysing variability in model inputs together with model structure variability to quantify their relative contributions in typical hydrological modelling applications. The Quantile Flow Deviation (QFD) metric is used to assess these alternate sources of uncertainty. The Australian Water Availability Project (AWAP) precipitation data for four different Australian catchments is used to analyse the impact of spatial rainfall variability on simulated streamflow variability via the QFD. The QFD metric attributes the variability in flow ensembles to uncertainty associated with the selection of a model structure and input time series. For the case study catchments, the relative contribution of input uncertainty due to rainfall is higher than that due to potential evapotranspiration, and overall input uncertainty is significant compared to model structure and parameter uncertainty. Overall, this study investigates the propagation of input uncertainty in a daily streamflow modelling scenario and demonstrates how input errors manifest across different streamflow magnitudes.

  14. Analysis of Sting Balance Calibration Data Using Optimized Regression Models

    NASA Technical Reports Server (NTRS)

    Ulbrich, Norbert; Bader, Jon B.

    2009-01-01

    Calibration data of a wind tunnel sting balance was processed using a search algorithm that identifies an optimized regression model for the data analysis. The selected sting balance had two moment gages that were mounted forward and aft of the balance moment center. The difference and the sum of the two gage outputs were fitted in the least squares sense using the normal force and the pitching moment at the balance moment center as independent variables. The regression model search algorithm predicted that the difference of the gage outputs should be modeled using the intercept and the normal force. The sum of the two gage outputs, on the other hand, should be modeled using the intercept, the pitching moment, and the square of the pitching moment. Equations of the deflection of a cantilever beam are used to show that the search algorithm s two recommended math models can also be obtained after performing a rigorous theoretical analysis of the deflection of the sting balance under load. The analysis of the sting balance calibration data set is a rare example of a situation when regression models of balance calibration data can directly be derived from first principles of physics and engineering. In addition, it is interesting to see that the search algorithm recommended the same regression models for the data analysis using only a set of statistical quality metrics.

  15. Optimization of Regression Models of Experimental Data Using Confirmation Points

    NASA Technical Reports Server (NTRS)

    Ulbrich, N.

    2010-01-01

    A new search metric is discussed that may be used to better assess the predictive capability of different math term combinations during the optimization of a regression model of experimental data. The new search metric can be determined for each tested math term combination if the given experimental data set is split into two subsets. The first subset consists of data points that are only used to determine the coefficients of the regression model. The second subset consists of confirmation points that are exclusively used to test the regression model. The new search metric value is assigned after comparing two values that describe the quality of the fit of each subset. The first value is the standard deviation of the PRESS residuals of the data points. The second value is the standard deviation of the response residuals of the confirmation points. The greater of the two values is used as the new search metric value. This choice guarantees that both standard deviations are always less or equal to the value that is used during the optimization. Experimental data from the calibration of a wind tunnel strain-gage balance is used to illustrate the application of the new search metric. The new search metric ultimately generates an optimized regression model that was already tested at regression model independent confirmation points before it is ever used to predict an unknown response from a set of regressors.

  16. Modeling Fire Occurrence at the City Scale: A Comparison between Geographically Weighted Regression and Global Linear Regression.

    PubMed

    Song, Chao; Kwan, Mei-Po; Zhu, Jiping

    2017-04-08

    An increasing number of fires are occurring with the rapid development of cities, resulting in increased risk for human beings and the environment. This study compares geographically weighted regression-based models, including geographically weighted regression (GWR) and geographically and temporally weighted regression (GTWR), which integrates spatial and temporal effects and global linear regression models (LM) for modeling fire risk at the city scale. The results show that the road density and the spatial distribution of enterprises have the strongest influences on fire risk, which implies that we should focus on areas where roads and enterprises are densely clustered. In addition, locations with a large number of enterprises have fewer fire ignition records, probably because of strict management and prevention measures. A changing number of significant variables across space indicate that heterogeneity mainly exists in the northern and eastern rural and suburban areas of Hefei city, where human-related facilities or road construction are only clustered in the city sub-centers. GTWR can capture small changes in the spatiotemporal heterogeneity of the variables while GWR and LM cannot. An approach that integrates space and time enables us to better understand the dynamic changes in fire risk. Thus governments can use the results to manage fire safety at the city scale.

  17. Modeling Fire Occurrence at the City Scale: A Comparison between Geographically Weighted Regression and Global Linear Regression

    PubMed Central

    Song, Chao; Kwan, Mei-Po; Zhu, Jiping

    2017-01-01

    An increasing number of fires are occurring with the rapid development of cities, resulting in increased risk for human beings and the environment. This study compares geographically weighted regression-based models, including geographically weighted regression (GWR) and geographically and temporally weighted regression (GTWR), which integrates spatial and temporal effects and global linear regression models (LM) for modeling fire risk at the city scale. The results show that the road density and the spatial distribution of enterprises have the strongest influences on fire risk, which implies that we should focus on areas where roads and enterprises are densely clustered. In addition, locations with a large number of enterprises have fewer fire ignition records, probably because of strict management and prevention measures. A changing number of significant variables across space indicate that heterogeneity mainly exists in the northern and eastern rural and suburban areas of Hefei city, where human-related facilities or road construction are only clustered in the city sub-centers. GTWR can capture small changes in the spatiotemporal heterogeneity of the variables while GWR and LM cannot. An approach that integrates space and time enables us to better understand the dynamic changes in fire risk. Thus governments can use the results to manage fire safety at the city scale. PMID:28397745

  18. Deep ensemble learning of sparse regression models for brain disease diagnosis

    PubMed Central

    Suk, Heung-Il; Lee, Seong-Whan; Shen, Dinggang

    2018-01-01

    Recent studies on brain imaging analysis witnessed the core roles of machine learning techniques in computer-assisted intervention for brain disease diagnosis. Of various machine-learning techniques, sparse regression models have proved their effectiveness in handling high-dimensional data but with a small number of training samples, especially in medical problems. In the meantime, deep learning methods have been making great successes by outperforming the state-of-the-art performances in various applications. In this paper, we propose a novel framework that combines the two conceptually different methods of sparse regression and deep learning for Alzheimer’s disease/mild cognitive impairment diagnosis and prognosis. Specifically, we first train multiple sparse regression models, each of which is trained with different values of a regularization control parameter. Thus, our multiple sparse regression models potentially select different feature subsets from the original feature set; thereby they have different powers to predict the response values, i.e., clinical label and clinical scores in our work. By regarding the response values from our sparse regression models as target-level representations, we then build a deep convolutional neural network for clinical decision making, which thus we call ‘ Deep Ensemble Sparse Regression Network.’ To our best knowledge, this is the first work that combines sparse regression models with deep neural network. In our experiments with the ADNI cohort, we validated the effectiveness of the proposed method by achieving the highest diagnostic accuracies in three classification tasks. We also rigorously analyzed our results and compared with the previous studies on the ADNI cohort in the literature. PMID:28167394

  19. Rapid performance modeling and parameter regression of geodynamic models

    NASA Astrophysics Data System (ADS)

    Brown, J.; Duplyakin, D.

    2016-12-01

    Geodynamic models run in a parallel environment have many parameters with complicated effects on performance and scientifically-relevant functionals. Manually choosing an efficient machine configuration and mapping out the parameter space requires a great deal of expert knowledge and time-consuming experiments. We propose an active learning technique based on Gaussion Process Regression to automatically select experiments to map out the performance landscape with respect to scientific and machine parameters. The resulting performance model is then used to select optimal experiments for improving the accuracy of a reduced order model per unit of computational cost. We present the framework and evaluate its quality and capability using popular lithospheric dynamics models.

  20. Analyzing industrial energy use through ordinary least squares regression models

    NASA Astrophysics Data System (ADS)

    Golden, Allyson Katherine

    Extensive research has been performed using regression analysis and calibrated simulations to create baseline energy consumption models for residential buildings and commercial institutions. However, few attempts have been made to discuss the applicability of these methodologies to establish baseline energy consumption models for industrial manufacturing facilities. In the few studies of industrial facilities, the presented linear change-point and degree-day regression analyses illustrate ideal cases. It follows that there is a need in the established literature to discuss the methodologies and to determine their applicability for establishing baseline energy consumption models of industrial manufacturing facilities. The thesis determines the effectiveness of simple inverse linear statistical regression models when establishing baseline energy consumption models for industrial manufacturing facilities. Ordinary least squares change-point and degree-day regression methods are used to create baseline energy consumption models for nine different case studies of industrial manufacturing facilities located in the southeastern United States. The influence of ambient dry-bulb temperature and production on total facility energy consumption is observed. The energy consumption behavior of industrial manufacturing facilities is only sometimes sufficiently explained by temperature, production, or a combination of the two variables. This thesis also provides methods for generating baseline energy models that are straightforward and accessible to anyone in the industrial manufacturing community. The methods outlined in this thesis may be easily replicated by anyone that possesses basic spreadsheet software and general knowledge of the relationship between energy consumption and weather, production, or other influential variables. With the help of simple inverse linear regression models, industrial manufacturing facilities may better understand their energy consumption and

  1. Modeling absolute differences in life expectancy with a censored skew-normal regression approach

    PubMed Central

    Clough-Gorr, Kerri; Zwahlen, Marcel

    2015-01-01

    Parameter estimates from commonly used multivariable parametric survival regression models do not directly quantify differences in years of life expectancy. Gaussian linear regression models give results in terms of absolute mean differences, but are not appropriate in modeling life expectancy, because in many situations time to death has a negative skewed distribution. A regression approach using a skew-normal distribution would be an alternative to parametric survival models in the modeling of life expectancy, because parameter estimates can be interpreted in terms of survival time differences while allowing for skewness of the distribution. In this paper we show how to use the skew-normal regression so that censored and left-truncated observations are accounted for. With this we model differences in life expectancy using data from the Swiss National Cohort Study and from official life expectancy estimates and compare the results with those derived from commonly used survival regression models. We conclude that a censored skew-normal survival regression approach for left-truncated observations can be used to model differences in life expectancy across covariates of interest. PMID:26339544

  2. Validation of a heteroscedastic hazards regression model.

    PubMed

    Wu, Hong-Dar Isaac; Hsieh, Fushing; Chen, Chen-Hsin

    2002-03-01

    A Cox-type regression model accommodating heteroscedasticity, with a power factor of the baseline cumulative hazard, is investigated for analyzing data with crossing hazards behavior. Since the approach of partial likelihood cannot eliminate the baseline hazard, an overidentified estimating equation (OEE) approach is introduced in the estimation procedure. It by-product, a model checking statistic, is presented to test for the overall adequacy of the heteroscedastic model. Further, under the heteroscedastic model setting, we propose two statistics to test the proportional hazards assumption. Implementation of this model is illustrated in a data analysis of a cancer clinical trial.

  3. Linear regression models for solvent accessibility prediction in proteins.

    PubMed

    Wagner, Michael; Adamczak, Rafał; Porollo, Aleksey; Meller, Jarosław

    2005-04-01

    The relative solvent accessibility (RSA) of an amino acid residue in a protein structure is a real number that represents the solvent exposed surface area of this residue in relative terms. The problem of predicting the RSA from the primary amino acid sequence can therefore be cast as a regression problem. Nevertheless, RSA prediction has so far typically been cast as a classification problem. Consequently, various machine learning techniques have been used within the classification framework to predict whether a given amino acid exceeds some (arbitrary) RSA threshold and would thus be predicted to be "exposed," as opposed to "buried." We have recently developed novel methods for RSA prediction using nonlinear regression techniques which provide accurate estimates of the real-valued RSA and outperform classification-based approaches with respect to commonly used two-class projections. However, while their performance seems to provide a significant improvement over previously published approaches, these Neural Network (NN) based methods are computationally expensive to train and involve several thousand parameters. In this work, we develop alternative regression models for RSA prediction which are computationally much less expensive, involve orders-of-magnitude fewer parameters, and are still competitive in terms of prediction quality. In particular, we investigate several regression models for RSA prediction using linear L1-support vector regression (SVR) approaches as well as standard linear least squares (LS) regression. Using rigorously derived validation sets of protein structures and extensive cross-validation analysis, we compare the performance of the SVR with that of LS regression and NN-based methods. In particular, we show that the flexibility of the SVR (as encoded by metaparameters such as the error insensitivity and the error penalization terms) can be very beneficial to optimize the prediction accuracy for buried residues. We conclude that the simple

  4. Inferring gene regression networks with model trees

    PubMed Central

    2010-01-01

    Background Novel strategies are required in order to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes building the so-called gene co-expression networks. They are typically generated using correlation statistics as pairwise similarity measures. Correlation-based methods are very useful in order to determine whether two genes have a strong global similarity but do not detect local similarities. Results We propose model trees as a method to identify gene interaction networks. While correlation-based methods analyze each pair of genes, in our approach we generate a single regression tree for each gene from the remaining genes. Finally, a graph from all the relationships among output and input genes is built taking into account whether the pair of genes is statistically significant. For this reason we apply a statistical procedure to control the false discovery rate. The performance of our approach, named REGNET, is experimentally tested on two well-known data sets: Saccharomyces Cerevisiae and E.coli data set. First, the biological coherence of the results are tested. Second the E.coli transcriptional network (in the Regulon database) is used as control to compare the results to that of a correlation-based method. This experiment shows that REGNET performs more accurately at detecting true gene associations than the Pearson and Spearman zeroth and first-order correlation-based methods. Conclusions REGNET generates gene association networks from gene expression data, and differs from correlation-based methods in that the relationship between one gene and others is calculated simultaneously. Model trees are very useful techniques to estimate the numerical values for the target genes by linear regression functions. They are very often more precise than linear regression models because they can add just different linear regressions to separate

  5. Modeling Polytomous Item Responses Using Simultaneously Estimated Multinomial Logistic Regression Models

    ERIC Educational Resources Information Center

    Anderson, Carolyn J.; Verkuilen, Jay; Peyton, Buddy L.

    2010-01-01

    Survey items with multiple response categories and multiple-choice test questions are ubiquitous in psychological and educational research. We illustrate the use of log-multiplicative association (LMA) models that are extensions of the well-known multinomial logistic regression model for multiple dependent outcome variables to reanalyze a set of…

  6. Deep ensemble learning of sparse regression models for brain disease diagnosis.

    PubMed

    Suk, Heung-Il; Lee, Seong-Whan; Shen, Dinggang

    2017-04-01

    Recent studies on brain imaging analysis witnessed the core roles of machine learning techniques in computer-assisted intervention for brain disease diagnosis. Of various machine-learning techniques, sparse regression models have proved their effectiveness in handling high-dimensional data but with a small number of training samples, especially in medical problems. In the meantime, deep learning methods have been making great successes by outperforming the state-of-the-art performances in various applications. In this paper, we propose a novel framework that combines the two conceptually different methods of sparse regression and deep learning for Alzheimer's disease/mild cognitive impairment diagnosis and prognosis. Specifically, we first train multiple sparse regression models, each of which is trained with different values of a regularization control parameter. Thus, our multiple sparse regression models potentially select different feature subsets from the original feature set; thereby they have different powers to predict the response values, i.e., clinical label and clinical scores in our work. By regarding the response values from our sparse regression models as target-level representations, we then build a deep convolutional neural network for clinical decision making, which thus we call 'Deep Ensemble Sparse Regression Network.' To our best knowledge, this is the first work that combines sparse regression models with deep neural network. In our experiments with the ADNI cohort, we validated the effectiveness of the proposed method by achieving the highest diagnostic accuracies in three classification tasks. We also rigorously analyzed our results and compared with the previous studies on the ADNI cohort in the literature. Copyright © 2017 Elsevier B.V. All rights reserved.

  7. A computational approach to compare regression modelling strategies in prediction research.

    PubMed

    Pajouheshnia, Romin; Pestman, Wiebe R; Teerenstra, Steven; Groenwold, Rolf H H

    2016-08-25

    It is often unclear which approach to fit, assess and adjust a model will yield the most accurate prediction model. We present an extension of an approach for comparing modelling strategies in linear regression to the setting of logistic regression and demonstrate its application in clinical prediction research. A framework for comparing logistic regression modelling strategies by their likelihoods was formulated using a wrapper approach. Five different strategies for modelling, including simple shrinkage methods, were compared in four empirical data sets to illustrate the concept of a priori strategy comparison. Simulations were performed in both randomly generated data and empirical data to investigate the influence of data characteristics on strategy performance. We applied the comparison framework in a case study setting. Optimal strategies were selected based on the results of a priori comparisons in a clinical data set and the performance of models built according to each strategy was assessed using the Brier score and calibration plots. The performance of modelling strategies was highly dependent on the characteristics of the development data in both linear and logistic regression settings. A priori comparisons in four empirical data sets found that no strategy consistently outperformed the others. The percentage of times that a model adjustment strategy outperformed a logistic model ranged from 3.9 to 94.9 %, depending on the strategy and data set. However, in our case study setting the a priori selection of optimal methods did not result in detectable improvement in model performance when assessed in an external data set. The performance of prediction modelling strategies is a data-dependent process and can be highly variable between data sets within the same clinical domain. A priori strategy comparison can be used to determine an optimal logistic regression modelling strategy for a given data set before selecting a final modelling approach.

  8. A generalized right truncated bivariate Poisson regression model with applications to health data.

    PubMed

    Islam, M Ataharul; Chowdhury, Rafiqul I

    2017-01-01

    A generalized right truncated bivariate Poisson regression model is proposed in this paper. Estimation and tests for goodness of fit and over or under dispersion are illustrated for both untruncated and right truncated bivariate Poisson regression models using marginal-conditional approach. Estimation and test procedures are illustrated for bivariate Poisson regression models with applications to Health and Retirement Study data on number of health conditions and the number of health care services utilized. The proposed test statistics are easy to compute and it is evident from the results that the models fit the data very well. A comparison between the right truncated and untruncated bivariate Poisson regression models using the test for nonnested models clearly shows that the truncated model performs significantly better than the untruncated model.

  9. Procedures for adjusting regional regression models of urban-runoff quality using local data

    USGS Publications Warehouse

    Hoos, A.B.; Sisolak, J.K.

    1993-01-01

    Statistical operations termed model-adjustment procedures (MAP?s) can be used to incorporate local data into existing regression models to improve the prediction of urban-runoff quality. Each MAP is a form of regression analysis in which the local data base is used as a calibration data set. Regression coefficients are determined from the local data base, and the resulting `adjusted? regression models can then be used to predict storm-runoff quality at unmonitored sites. The response variable in the regression analyses is the observed load or mean concentration of a constituent in storm runoff for a single storm. The set of explanatory variables used in the regression analyses is different for each MAP, but always includes the predicted value of load or mean concentration from a regional regression model. The four MAP?s examined in this study were: single-factor regression against the regional model prediction, P, (termed MAP-lF-P), regression against P,, (termed MAP-R-P), regression against P, and additional local variables (termed MAP-R-P+nV), and a weighted combination of P, and a local-regression prediction (termed MAP-W). The procedures were tested by means of split-sample analysis, using data from three cities included in the Nationwide Urban Runoff Program: Denver, Colorado; Bellevue, Washington; and Knoxville, Tennessee. The MAP that provided the greatest predictive accuracy for the verification data set differed among the three test data bases and among model types (MAP-W for Denver and Knoxville, MAP-lF-P and MAP-R-P for Bellevue load models, and MAP-R-P+nV for Bellevue concentration models) and, in many cases, was not clearly indicated by the values of standard error of estimate for the calibration data set. A scheme to guide MAP selection, based on exploratory data analysis of the calibration data set, is presented and tested. The MAP?s were tested for sensitivity to the size of a calibration data set. As expected, predictive accuracy of all MAP?s for

  10. Climate variations and salmonellosis transmission in Adelaide, South Australia: a comparison between regression models

    NASA Astrophysics Data System (ADS)

    Zhang, Ying; Bi, Peng; Hiller, Janet

    2008-01-01

    This is the first study to identify appropriate regression models for the association between climate variation and salmonellosis transmission. A comparison between different regression models was conducted using surveillance data in Adelaide, South Australia. By using notified salmonellosis cases and climatic variables from the Adelaide metropolitan area over the period 1990-2003, four regression methods were examined: standard Poisson regression, autoregressive adjusted Poisson regression, multiple linear regression, and a seasonal autoregressive integrated moving average (SARIMA) model. Notified salmonellosis cases in 2004 were used to test the forecasting ability of the four models. Parameter estimation, goodness-of-fit and forecasting ability of the four regression models were compared. Temperatures occurring 2 weeks prior to cases were positively associated with cases of salmonellosis. Rainfall was also inversely related to the number of cases. The comparison of the goodness-of-fit and forecasting ability suggest that the SARIMA model is better than the other three regression models. Temperature and rainfall may be used as climatic predictors of salmonellosis cases in regions with climatic characteristics similar to those of Adelaide. The SARIMA model could, thus, be adopted to quantify the relationship between climate variations and salmonellosis transmission.

  11. Effects of export concentration on CO2 emissions in developed countries: an empirical analysis.

    PubMed

    Apergis, Nicholas; Can, Muhlis; Gozgor, Giray; Lau, Chi Keung Marco

    2018-03-08

    This paper provides the evidence on the short- and the long-run effects of the export product concentration on the level of CO 2 emissions in 19 developed (high-income) economies, spanning the period 1962-2010. To this end, the paper makes use of the nonlinear panel unit root and cointegration tests with multiple endogenous structural breaks. It also considers the mean group estimations, the autoregressive distributed lag model, and the panel quantile regression estimations. The findings illustrate that the environmental Kuznets curve (EKC) hypothesis is valid in the panel dataset of 19 developed economies. In addition, it documents that a higher level of the product concentration of exports leads to lower CO 2 emissions. The results from the panel quantile regressions also indicate that the effect of the export product concentration upon the per capita CO 2 emissions is relatively high at the higher quantiles.

  12. The Application of Censored Regression Models in Low Streamflow Analyses

    NASA Astrophysics Data System (ADS)

    Kroll, C.; Luz, J.

    2003-12-01

    Estimation of low streamflow statistics at gauged and ungauged river sites is often a daunting task. This process is further confounded by the presence of intermittent streamflows, where streamflow is sometimes reported as zero, within a region. Streamflows recorded as zero may be zero, or may be less than the measurement detection limit. Such data is often referred to as censored data. Numerous methods have been developed to characterize intermittent streamflow series. Logit regression has been proposed to develop regional models of the probability annual lowflows series (such as 7-day lowflows) are zero. In addition, Tobit regression, a method of regression that allows for censored dependent variables, has been proposed for lowflow regional regression models in regions where the lowflow statistic of interest estimated as zero at some sites in the region. While these methods have been proposed, their use in practice has been limited. Here a delete-one jackknife simulation is presented to examine the performance of Logit and Tobit models of 7-day annual minimum flows in 6 USGS water resource regions in the United States. For the Logit model, an assessment is made of whether sites are correctly classified as having at least 10% of 7-day annual lowflows equal to zero. In such a situation, the 7-day, 10-year lowflow (Q710), a commonly employed low streamflow statistic, would be reported as zero. For the Tobit model, a comparison is made between results from the Tobit model, and from performing either ordinary least squares (OLS) or principal component regression (PCR) after the zero sites are dropped from the analysis. Initial results for the Logit model indicate this method to have a high probability of correctly classifying sites into groups with Q710s as zero and non-zero. Initial results also indicate the Tobit model produces better results than PCR and OLS when more than 5% of the sites in the region have Q710 values calculated as zero.

  13. Regression Model Term Selection for the Analysis of Strain-Gage Balance Calibration Data

    NASA Technical Reports Server (NTRS)

    Ulbrich, Norbert Manfred; Volden, Thomas R.

    2010-01-01

    The paper discusses the selection of regression model terms for the analysis of wind tunnel strain-gage balance calibration data. Different function class combinations are presented that may be used to analyze calibration data using either a non-iterative or an iterative method. The role of the intercept term in a regression model of calibration data is reviewed. In addition, useful algorithms and metrics originating from linear algebra and statistics are recommended that will help an analyst (i) to identify and avoid both linear and near-linear dependencies between regression model terms and (ii) to make sure that the selected regression model of the calibration data uses only statistically significant terms. Three different tests are suggested that may be used to objectively assess the predictive capability of the final regression model of the calibration data. These tests use both the original data points and regression model independent confirmation points. Finally, data from a simplified manual calibration of the Ames MK40 balance is used to illustrate the application of some of the metrics and tests to a realistic calibration data set.

  14. Exact Analysis of Squared Cross-Validity Coefficient in Predictive Regression Models

    ERIC Educational Resources Information Center

    Shieh, Gwowen

    2009-01-01

    In regression analysis, the notion of population validity is of theoretical interest for describing the usefulness of the underlying regression model, whereas the presumably more important concept of population cross-validity represents the predictive effectiveness for the regression equation in future research. It appears that the inference…

  15. Linear regression metamodeling as a tool to summarize and present simulation model results.

    PubMed

    Jalal, Hawre; Dowd, Bryan; Sainfort, François; Kuntz, Karen M

    2013-10-01

    Modelers lack a tool to systematically and clearly present complex model results, including those from sensitivity analyses. The objective was to propose linear regression metamodeling as a tool to increase transparency of decision analytic models and better communicate their results. We used a simplified cancer cure model to demonstrate our approach. The model computed the lifetime cost and benefit of 3 treatment options for cancer patients. We simulated 10,000 cohorts in a probabilistic sensitivity analysis (PSA) and regressed the model outcomes on the standardized input parameter values in a set of regression analyses. We used the regression coefficients to describe measures of sensitivity analyses, including threshold and parameter sensitivity analyses. We also compared the results of the PSA to deterministic full-factorial and one-factor-at-a-time designs. The regression intercept represented the estimated base-case outcome, and the other coefficients described the relative parameter uncertainty in the model. We defined simple relationships that compute the average and incremental net benefit of each intervention. Metamodeling produced outputs similar to traditional deterministic 1-way or 2-way sensitivity analyses but was more reliable since it used all parameter values. Linear regression metamodeling is a simple, yet powerful, tool that can assist modelers in communicating model characteristics and sensitivity analyses.

  16. Developing and testing a global-scale regression model to quantify mean annual streamflow

    NASA Astrophysics Data System (ADS)

    Barbarossa, Valerio; Huijbregts, Mark A. J.; Hendriks, A. Jan; Beusen, Arthur H. W.; Clavreul, Julie; King, Henry; Schipper, Aafke M.

    2017-01-01

    Quantifying mean annual flow of rivers (MAF) at ungauged sites is essential for assessments of global water supply, ecosystem integrity and water footprints. MAF can be quantified with spatially explicit process-based models, which might be overly time-consuming and data-intensive for this purpose, or with empirical regression models that predict MAF based on climate and catchment characteristics. Yet, regression models have mostly been developed at a regional scale and the extent to which they can be extrapolated to other regions is not known. In this study, we developed a global-scale regression model for MAF based on a dataset unprecedented in size, using observations of discharge and catchment characteristics from 1885 catchments worldwide, measuring between 2 and 106 km2. In addition, we compared the performance of the regression model with the predictive ability of the spatially explicit global hydrological model PCR-GLOBWB by comparing results from both models to independent measurements. We obtained a regression model explaining 89% of the variance in MAF based on catchment area and catchment averaged mean annual precipitation and air temperature, slope and elevation. The regression model performed better than PCR-GLOBWB for the prediction of MAF, as root-mean-square error (RMSE) values were lower (0.29-0.38 compared to 0.49-0.57) and the modified index of agreement (d) was higher (0.80-0.83 compared to 0.72-0.75). Our regression model can be applied globally to estimate MAF at any point of the river network, thus providing a feasible alternative to spatially explicit process-based global hydrological models.

  17. A generalized right truncated bivariate Poisson regression model with applications to health data

    PubMed Central

    Islam, M. Ataharul; Chowdhury, Rafiqul I.

    2017-01-01

    A generalized right truncated bivariate Poisson regression model is proposed in this paper. Estimation and tests for goodness of fit and over or under dispersion are illustrated for both untruncated and right truncated bivariate Poisson regression models using marginal-conditional approach. Estimation and test procedures are illustrated for bivariate Poisson regression models with applications to Health and Retirement Study data on number of health conditions and the number of health care services utilized. The proposed test statistics are easy to compute and it is evident from the results that the models fit the data very well. A comparison between the right truncated and untruncated bivariate Poisson regression models using the test for nonnested models clearly shows that the truncated model performs significantly better than the untruncated model. PMID:28586344

  18. Solvency supervision based on a total balance sheet approach

    NASA Astrophysics Data System (ADS)

    Pitselis, Georgios

    2009-11-01

    In this paper we investigate the adequacy of the own funds a company requires in order to remain healthy and avoid insolvency. Two methods are applied here; the quantile regression method and the method of mixed effects models. Quantile regression is capable of providing a more complete statistical analysis of the stochastic relationship among random variables than least squares estimation. The estimated mixed effects line can be considered as an internal industry equation (norm), which explains a systematic relation between a dependent variable (such as own funds) with independent variables (e.g. financial characteristics, such as assets, provisions, etc.). The above two methods are implemented with two data sets.

  19. The association of fatigue, pain, depression and anxiety with work and activity impairment in immune mediated inflammatory diseases.

    PubMed

    Enns, Murray W; Bernstein, Charles N; Kroeker, Kristine; Graff, Lesley; Walker, John R; Lix, Lisa M; Hitchon, Carol A; El-Gabalawy, Renée; Fisk, John D; Marrie, Ruth Ann

    2018-01-01

    Impairment in work function is a frequent outcome in patients with chronic conditions such as immune-mediated inflammatory diseases (IMID), depression and anxiety disorders. The personal and economic costs of work impairment in these disorders are immense. Symptoms of pain, fatigue, depression and anxiety are potentially remediable forms of distress that may contribute to work impairment in chronic health conditions such as IMID. The present study evaluated the association between pain [Medical Outcomes Study Pain Effects Scale], fatigue [Daily Fatigue Impact Scale], depression and anxiety [Hospital Anxiety and Depression Scale] and work impairment [Work Productivity and Activity Impairment Scale] in four patient populations: multiple sclerosis (n = 255), inflammatory bowel disease (n = 248, rheumatoid arthritis (n = 154) and a depression and anxiety group (n = 307), using quantile regression, controlling for the effects of sociodemographic factors, physical disability, and cognitive deficits. Each of pain, depression symptoms, anxiety symptoms, and fatigue individually showed significant associations with work absenteeism, presenteeism, and general activity impairment (quantile regression standardized estimates ranging from 0.3 to 1.0). When the distress variables were entered concurrently into the regression models, fatigue was a significant predictor of work and activity impairment in all models (quantile regression standardized estimates ranging from 0.2 to 0.5). These findings have important clinical implications for understanding the determinants of work impairment and for improving work-related outcomes in chronic disease.

  20. Analysis of Multivariate Experimental Data Using A Simplified Regression Model Search Algorithm

    NASA Technical Reports Server (NTRS)

    Ulbrich, Norbert M.

    2013-01-01

    A new regression model search algorithm was developed that may be applied to both general multivariate experimental data sets and wind tunnel strain-gage balance calibration data. The algorithm is a simplified version of a more complex algorithm that was originally developed for the NASA Ames Balance Calibration Laboratory. The new algorithm performs regression model term reduction to prevent overfitting of data. It has the advantage that it needs only about one tenth of the original algorithm's CPU time for the completion of a regression model search. In addition, extensive testing showed that the prediction accuracy of math models obtained from the simplified algorithm is similar to the prediction accuracy of math models obtained from the original algorithm. The simplified algorithm, however, cannot guarantee that search constraints related to a set of statistical quality requirements are always satisfied in the optimized regression model. Therefore, the simplified algorithm is not intended to replace the original algorithm. Instead, it may be used to generate an alternate optimized regression model of experimental data whenever the application of the original search algorithm fails or requires too much CPU time. Data from a machine calibration of NASA's MK40 force balance is used to illustrate the application of the new search algorithm.

  1. Evaluation of Regression Models of Balance Calibration Data Using an Empirical Criterion

    NASA Technical Reports Server (NTRS)

    Ulbrich, Norbert; Volden, Thomas R.

    2012-01-01

    An empirical criterion for assessing the significance of individual terms of regression models of wind tunnel strain gage balance outputs is evaluated. The criterion is based on the percent contribution of a regression model term. It considers a term to be significant if its percent contribution exceeds the empirical threshold of 0.05%. The criterion has the advantage that it can easily be computed using the regression coefficients of the gage outputs and the load capacities of the balance. First, a definition of the empirical criterion is provided. Then, it is compared with an alternate statistical criterion that is widely used in regression analysis. Finally, calibration data sets from a variety of balances are used to illustrate the connection between the empirical and the statistical criterion. A review of these results indicated that the empirical criterion seems to be suitable for a crude assessment of the significance of a regression model term as the boundary between a significant and an insignificant term cannot be defined very well. Therefore, regression model term reduction should only be performed by using the more universally applicable statistical criterion.

  2. Regression Models and Fuzzy Logic Prediction of TBM Penetration Rate

    NASA Astrophysics Data System (ADS)

    Minh, Vu Trieu; Katushin, Dmitri; Antonov, Maksim; Veinthal, Renno

    2017-03-01

    This paper presents statistical analyses of rock engineering properties and the measured penetration rate of tunnel boring machine (TBM) based on the data of an actual project. The aim of this study is to analyze the influence of rock engineering properties including uniaxial compressive strength (UCS), Brazilian tensile strength (BTS), rock brittleness index (BI), the distance between planes of weakness (DPW), and the alpha angle (Alpha) between the tunnel axis and the planes of weakness on the TBM rate of penetration (ROP). Four (4) statistical regression models (two linear and two nonlinear) are built to predict the ROP of TBM. Finally a fuzzy logic model is developed as an alternative method and compared to the four statistical regression models. Results show that the fuzzy logic model provides better estimations and can be applied to predict the TBM performance. The R-squared value (R2) of the fuzzy logic model scores the highest value of 0.714 over the second runner-up of 0.667 from the multiple variables nonlinear regression model.

  3. Modelling infant mortality rate in Central Java, Indonesia use generalized poisson regression method

    NASA Astrophysics Data System (ADS)

    Prahutama, Alan; Sudarno

    2018-05-01

    The infant mortality rate is the number of deaths under one year of age occurring among the live births in a given geographical area during a given year, per 1,000 live births occurring among the population of the given geographical area during the same year. This problem needs to be addressed because it is an important element of a country’s economic development. High infant mortality rate will disrupt the stability of a country as it relates to the sustainability of the population in the country. One of regression model that can be used to analyze the relationship between dependent variable Y in the form of discrete data and independent variable X is Poisson regression model. Recently The regression modeling used for data with dependent variable is discrete, among others, poisson regression, negative binomial regression and generalized poisson regression. In this research, generalized poisson regression modeling gives better AIC value than poisson regression. The most significant variable is the Number of health facilities (X1), while the variable that gives the most influence to infant mortality rate is the average breastfeeding (X9).

  4. Functional Generalized Additive Models.

    PubMed

    McLean, Mathew W; Hooker, Giles; Staicu, Ana-Maria; Scheipl, Fabian; Ruppert, David

    2014-01-01

    We introduce the functional generalized additive model (FGAM), a novel regression model for association studies between a scalar response and a functional predictor. We model the link-transformed mean response as the integral with respect to t of F { X ( t ), t } where F (·,·) is an unknown regression function and X ( t ) is a functional covariate. Rather than having an additive model in a finite number of principal components as in Müller and Yao (2008), our model incorporates the functional predictor directly and thus our model can be viewed as the natural functional extension of generalized additive models. We estimate F (·,·) using tensor-product B-splines with roughness penalties. A pointwise quantile transformation of the functional predictor is also considered to ensure each tensor-product B-spline has observed data on its support. The methods are evaluated using simulated data and their predictive performance is compared with other competing scalar-on-function regression alternatives. We illustrate the usefulness of our approach through an application to brain tractography, where X ( t ) is a signal from diffusion tensor imaging at position, t , along a tract in the brain. In one example, the response is disease-status (case or control) and in a second example, it is the score on a cognitive test. R code for performing the simulations and fitting the FGAM can be found in supplemental materials available online.

  5. Default Bayes Factors for Model Selection in Regression

    ERIC Educational Resources Information Center

    Rouder, Jeffrey N.; Morey, Richard D.

    2012-01-01

    In this article, we present a Bayes factor solution for inference in multiple regression. Bayes factors are principled measures of the relative evidence from data for various models or positions, including models that embed null hypotheses. In this regard, they may be used to state positive evidence for a lack of an effect, which is not possible…

  6. Multiple linear regression and regression with time series error models in forecasting PM10 concentrations in Peninsular Malaysia.

    PubMed

    Ng, Kar Yong; Awang, Norhashidah

    2018-01-06

    Frequent haze occurrences in Malaysia have made the management of PM 10 (particulate matter with aerodynamic less than 10 μm) pollution a critical task. This requires knowledge on factors associating with PM 10 variation and good forecast of PM 10 concentrations. Hence, this paper demonstrates the prediction of 1-day-ahead daily average PM 10 concentrations based on predictor variables including meteorological parameters and gaseous pollutants. Three different models were built. They were multiple linear regression (MLR) model with lagged predictor variables (MLR1), MLR model with lagged predictor variables and PM 10 concentrations (MLR2) and regression with time series error (RTSE) model. The findings revealed that humidity, temperature, wind speed, wind direction, carbon monoxide and ozone were the main factors explaining the PM 10 variation in Peninsular Malaysia. Comparison among the three models showed that MLR2 model was on a same level with RTSE model in terms of forecasting accuracy, while MLR1 model was the worst.

  7. Numerical analysis of the accuracy of bivariate quantile distributions utilizing copulas compared to the GUM supplement 2 for oil pressure balance uncertainties

    NASA Astrophysics Data System (ADS)

    Ramnath, Vishal

    2017-11-01

    In the field of pressure metrology the effective area is Ae = A0 (1 + λP) where A0 is the zero-pressure area and λ is the distortion coefficient and the conventional practise is to construct univariate probability density functions (PDFs) for A0 and λ. As a result analytical generalized non-Gaussian bivariate joint PDFs has not featured prominently in pressure metrology. Recently extended lambda distribution based quantile functions have been successfully utilized for summarizing univariate arbitrary PDF distributions of gas pressure balances. Motivated by this development we investigate the feasibility and utility of extending and applying quantile functions to systems which naturally exhibit bivariate PDFs. Our approach is to utilize the GUM Supplement 1 methodology to solve and generate Monte Carlo based multivariate uncertainty data for an oil based pressure balance laboratory standard that is used to generate known high pressures, and which are in turn cross-floated against another pressure balance transfer standard in order to deduce the transfer standard's respective area. We then numerically analyse the uncertainty data by formulating and constructing an approximate bivariate quantile distribution that directly couples A0 and λ in order to compare and contrast its accuracy to an exact GUM Supplement 2 based uncertainty quantification analysis.

  8. Do Our Means of Inquiry Match our Intentions?

    PubMed Central

    Petscher, Yaacov

    2016-01-01

    A key stage of the scientific method is the analysis of data, yet despite the variety of methods that are available to researchers they are most frequently distilled to a model that focuses on the average relation between variables. Although research questions are frequently conceived with broad inquiry in mind, most regression methods are limited in comprehensively evaluating how observed behaviors are related to each other. Quantile regression is a largely unknown yet well-suited analytic technique similar to traditional regression analysis, but allows for a more systematic approach to understanding complex associations among observed phenomena in the psychological sciences. Data from the National Education Longitudinal Study of 1988/2000 are used to illustrate how quantile regression overcomes the limitations of average associations in linear regression by showing that psychological well-being and sex each differentially relate to reading achievement depending on one’s level of reading achievement. PMID:27486410

  9. Evaluation of land use regression models in Detroit, Michigan

    EPA Science Inventory

    Introduction: Land use regression (LUR) models have emerged as a cost-effective tool for characterizing exposure in epidemiologic health studies. However, little critical attention has been focused on validation of these models as a step toward temporal and spatial extension of ...

  10. Linear models: permutation methods

    USGS Publications Warehouse

    Cade, B.S.; Everitt, B.S.; Howell, D.C.

    2005-01-01

    Permutation tests (see Permutation Based Inference) for the linear model have applications in behavioral studies when traditional parametric assumptions about the error term in a linear model are not tenable. Improved validity of Type I error rates can be achieved with properly constructed permutation tests. Perhaps more importantly, increased statistical power, improved robustness to effects of outliers, and detection of alternative distributional differences can be achieved by coupling permutation inference with alternative linear model estimators. For example, it is well-known that estimates of the mean in linear model are extremely sensitive to even a single outlying value of the dependent variable compared to estimates of the median [7, 19]. Traditionally, linear modeling focused on estimating changes in the center of distributions (means or medians). However, quantile regression allows distributional changes to be estimated in all or any selected part of a distribution or responses, providing a more complete statistical picture that has relevance to many biological questions [6]...

  11. Approximating prediction uncertainty for random forest regression models

    Treesearch

    John W. Coulston; Christine E. Blinn; Valerie A. Thomas; Randolph H. Wynne

    2016-01-01

    Machine learning approaches such as random forest have increased for the spatial modeling and mapping of continuous variables. Random forest is a non-parametric ensemble approach, and unlike traditional regression approaches there is no direct quantification of prediction error. Understanding prediction uncertainty is important when using model-based continuous maps as...

  12. QQ-SNV: single nucleotide variant detection at low frequency by comparing the quality quantiles.

    PubMed

    Van der Borght, Koen; Thys, Kim; Wetzels, Yves; Clement, Lieven; Verbist, Bie; Reumers, Joke; van Vlijmen, Herman; Aerssens, Jeroen

    2015-11-10

    Next generation sequencing enables studying heterogeneous populations of viral infections. When the sequencing is done at high coverage depth ("deep sequencing"), low frequency variants can be detected. Here we present QQ-SNV (http://sourceforge.net/projects/qqsnv), a logistic regression classifier model developed for the Illumina sequencing platforms that uses the quantiles of the quality scores, to distinguish true single nucleotide variants from sequencing errors based on the estimated SNV probability. To train the model, we created a dataset of an in silico mixture of five HIV-1 plasmids. Testing of our method in comparison to the existing methods LoFreq, ShoRAH, and V-Phaser 2 was performed on two HIV and four HCV plasmid mixture datasets and one influenza H1N1 clinical dataset. For default application of QQ-SNV, variants were called using a SNV probability cutoff of 0.5 (QQ-SNV(D)). To improve the sensitivity we used a SNV probability cutoff of 0.0001 (QQ-SNV(HS)). To also increase specificity, SNVs called were overruled when their frequency was below the 80(th) percentile calculated on the distribution of error frequencies (QQ-SNV(HS-P80)). When comparing QQ-SNV versus the other methods on the plasmid mixture test sets, QQ-SNV(D) performed similarly to the existing approaches. QQ-SNV(HS) was more sensitive on all test sets but with more false positives. QQ-SNV(HS-P80) was found to be the most accurate method over all test sets by balancing sensitivity and specificity. When applied to a paired-end HCV sequencing study, with lowest spiked-in true frequency of 0.5%, QQ-SNV(HS-P80) revealed a sensitivity of 100% (vs. 40-60% for the existing methods) and a specificity of 100% (vs. 98.0-99.7% for the existing methods). In addition, QQ-SNV required the least overall computation time to process the test sets. Finally, when testing on a clinical sample, four putative true variants with frequency below 0.5% were consistently detected by QQ-SNV(HS-P80) from different

  13. Analyzing degradation data with a random effects spline regression model

    DOE PAGES

    Fugate, Michael Lynn; Hamada, Michael Scott; Weaver, Brian Phillip

    2017-03-17

    This study proposes using a random effects spline regression model to analyze degradation data. Spline regression avoids having to specify a parametric function for the true degradation of an item. A distribution for the spline regression coefficients captures the variation of the true degradation curves from item to item. We illustrate the proposed methodology with a real example using a Bayesian approach. The Bayesian approach allows prediction of degradation of a population over time and estimation of reliability is easy to perform.

  14. Analyzing degradation data with a random effects spline regression model

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fugate, Michael Lynn; Hamada, Michael Scott; Weaver, Brian Phillip

    This study proposes using a random effects spline regression model to analyze degradation data. Spline regression avoids having to specify a parametric function for the true degradation of an item. A distribution for the spline regression coefficients captures the variation of the true degradation curves from item to item. We illustrate the proposed methodology with a real example using a Bayesian approach. The Bayesian approach allows prediction of degradation of a population over time and estimation of reliability is easy to perform.

  15. Accounting for measurement error in log regression models with applications to accelerated testing.

    PubMed

    Richardson, Robert; Tolley, H Dennis; Evenson, William E; Lunt, Barry M

    2018-01-01

    In regression settings, parameter estimates will be biased when the explanatory variables are measured with error. This bias can significantly affect modeling goals. In particular, accelerated lifetime testing involves an extrapolation of the fitted model, and a small amount of bias in parameter estimates may result in a significant increase in the bias of the extrapolated predictions. Additionally, bias may arise when the stochastic component of a log regression model is assumed to be multiplicative when the actual underlying stochastic component is additive. To account for these possible sources of bias, a log regression model with measurement error and additive error is approximated by a weighted regression model which can be estimated using Iteratively Re-weighted Least Squares. Using the reduced Eyring equation in an accelerated testing setting, the model is compared to previously accepted approaches to modeling accelerated testing data with both simulations and real data.

  16. OPLS statistical model versus linear regression to assess sonographic predictors of stroke prognosis.

    PubMed

    Vajargah, Kianoush Fathi; Sadeghi-Bazargani, Homayoun; Mehdizadeh-Esfanjani, Robab; Savadi-Oskouei, Daryoush; Farhoudi, Mehdi

    2012-01-01

    The objective of the present study was to assess the comparable applicability of orthogonal projections to latent structures (OPLS) statistical model vs traditional linear regression in order to investigate the role of trans cranial doppler (TCD) sonography in predicting ischemic stroke prognosis. The study was conducted on 116 ischemic stroke patients admitted to a specialty neurology ward. The Unified Neurological Stroke Scale was used once for clinical evaluation on the first week of admission and again six months later. All data was primarily analyzed using simple linear regression and later considered for multivariate analysis using PLS/OPLS models through the SIMCA P+12 statistical software package. The linear regression analysis results used for the identification of TCD predictors of stroke prognosis were confirmed through the OPLS modeling technique. Moreover, in comparison to linear regression, the OPLS model appeared to have higher sensitivity in detecting the predictors of ischemic stroke prognosis and detected several more predictors. Applying the OPLS model made it possible to use both single TCD measures/indicators and arbitrarily dichotomized measures of TCD single vessel involvement as well as the overall TCD result. In conclusion, the authors recommend PLS/OPLS methods as complementary rather than alternative to the available classical regression models such as linear regression.

  17. A Technique of Fuzzy C-Mean in Multiple Linear Regression Model toward Paddy Yield

    NASA Astrophysics Data System (ADS)

    Syazwan Wahab, Nur; Saifullah Rusiman, Mohd; Mohamad, Mahathir; Amira Azmi, Nur; Che Him, Norziha; Ghazali Kamardan, M.; Ali, Maselan

    2018-04-01

    In this paper, we propose a hybrid model which is a combination of multiple linear regression model and fuzzy c-means method. This research involved a relationship between 20 variates of the top soil that are analyzed prior to planting of paddy yields at standard fertilizer rates. Data used were from the multi-location trials for rice carried out by MARDI at major paddy granary in Peninsular Malaysia during the period from 2009 to 2012. Missing observations were estimated using mean estimation techniques. The data were analyzed using multiple linear regression model and a combination of multiple linear regression model and fuzzy c-means method. Analysis of normality and multicollinearity indicate that the data is normally scattered without multicollinearity among independent variables. Analysis of fuzzy c-means cluster the yield of paddy into two clusters before the multiple linear regression model can be used. The comparison between two method indicate that the hybrid of multiple linear regression model and fuzzy c-means method outperform the multiple linear regression model with lower value of mean square error.

  18. Estimation of variance in Cox's regression model with shared gamma frailties.

    PubMed

    Andersen, P K; Klein, J P; Knudsen, K M; Tabanera y Palacios, R

    1997-12-01

    The Cox regression model with a shared frailty factor allows for unobserved heterogeneity or for statistical dependence between the observed survival times. Estimation in this model when the frailties are assumed to follow a gamma distribution is reviewed, and we address the problem of obtaining variance estimates for regression coefficients, frailty parameter, and cumulative baseline hazards using the observed nonparametric information matrix. A number of examples are given comparing this approach with fully parametric inference in models with piecewise constant baseline hazards.

  19. SCI model structure determination program (OSR) user's guide. [optimal subset regression

    NASA Technical Reports Server (NTRS)

    1979-01-01

    The computer program, OSR (Optimal Subset Regression) which estimates models for rotorcraft body and rotor force and moment coefficients is described. The technique used is based on the subset regression algorithm. Given time histories of aerodynamic coefficients, aerodynamic variables, and control inputs, the program computes correlation between various time histories. The model structure determination is based on these correlations. Inputs and outputs of the program are given.

  20. Robust, Adaptive Functional Regression in Functional Mixed Model Framework.

    PubMed

    Zhu, Hongxiao; Brown, Philip J; Morris, Jeffrey S

    2011-09-01

    Functional data are increasingly encountered in scientific studies, and their high dimensionality and complexity lead to many analytical challenges. Various methods for functional data analysis have been developed, including functional response regression methods that involve regression of a functional response on univariate/multivariate predictors with nonparametrically represented functional coefficients. In existing methods, however, the functional regression can be sensitive to outlying curves and outlying regions of curves, so is not robust. In this paper, we introduce a new Bayesian method, robust functional mixed models (R-FMM), for performing robust functional regression within the general functional mixed model framework, which includes multiple continuous or categorical predictors and random effect functions accommodating potential between-function correlation induced by the experimental design. The underlying model involves a hierarchical scale mixture model for the fixed effects, random effect and residual error functions. These modeling assumptions across curves result in robust nonparametric estimators of the fixed and random effect functions which down-weight outlying curves and regions of curves, and produce statistics that can be used to flag global and local outliers. These assumptions also lead to distributions across wavelet coefficients that have outstanding sparsity and adaptive shrinkage properties, with great flexibility for the data to determine the sparsity and the heaviness of the tails. Together with the down-weighting of outliers, these within-curve properties lead to fixed and random effect function estimates that appear in our simulations to be remarkably adaptive in their ability to remove spurious features yet retain true features of the functions. We have developed general code to implement this fully Bayesian method that is automatic, requiring the user to only provide the functional data and design matrices. It is efficient

  1. Robust, Adaptive Functional Regression in Functional Mixed Model Framework

    PubMed Central

    Zhu, Hongxiao; Brown, Philip J.; Morris, Jeffrey S.

    2012-01-01

    Functional data are increasingly encountered in scientific studies, and their high dimensionality and complexity lead to many analytical challenges. Various methods for functional data analysis have been developed, including functional response regression methods that involve regression of a functional response on univariate/multivariate predictors with nonparametrically represented functional coefficients. In existing methods, however, the functional regression can be sensitive to outlying curves and outlying regions of curves, so is not robust. In this paper, we introduce a new Bayesian method, robust functional mixed models (R-FMM), for performing robust functional regression within the general functional mixed model framework, which includes multiple continuous or categorical predictors and random effect functions accommodating potential between-function correlation induced by the experimental design. The underlying model involves a hierarchical scale mixture model for the fixed effects, random effect and residual error functions. These modeling assumptions across curves result in robust nonparametric estimators of the fixed and random effect functions which down-weight outlying curves and regions of curves, and produce statistics that can be used to flag global and local outliers. These assumptions also lead to distributions across wavelet coefficients that have outstanding sparsity and adaptive shrinkage properties, with great flexibility for the data to determine the sparsity and the heaviness of the tails. Together with the down-weighting of outliers, these within-curve properties lead to fixed and random effect function estimates that appear in our simulations to be remarkably adaptive in their ability to remove spurious features yet retain true features of the functions. We have developed general code to implement this fully Bayesian method that is automatic, requiring the user to only provide the functional data and design matrices. It is efficient

  2. Evaluation and application of regional turbidity-sediment regression models in Virginia

    USGS Publications Warehouse

    Hyer, Kenneth; Jastram, John D.; Moyer, Douglas; Webber, James S.; Chanat, Jeffrey G.

    2015-01-01

    Conventional thinking has long held that turbidity-sediment surrogate-regression equations are site specific and that regression equations developed at a single monitoring station should not be applied to another station; however, few studies have evaluated this issue in a rigorous manner. If robust regional turbidity-sediment models can be developed successfully, their applications could greatly expand the usage of these methods. Suspended sediment load estimation could occur as soon as flow and turbidity monitoring commence at a site, suspended sediment sampling frequencies for various projects potentially could be reduced, and special-project applications (sediment monitoring following dam removal, for example) could be significantly enhanced. The objective of this effort was to investigate the turbidity-suspended sediment concentration (SSC) relations at all available USGS monitoring sites within Virginia to determine whether meaningful turbidity-sediment regression models can be developed by combining the data from multiple monitoring stations into a single model, known as a “regional” model. Following the development of the regional model, additional objectives included a comparison of predicted SSCs between the regional model and commonly used site-specific models, as well as an evaluation of why specific monitoring stations did not fit the regional model.

  3. SPSS macros to compare any two fitted values from a regression model.

    PubMed

    Weaver, Bruce; Dubois, Sacha

    2012-12-01

    In regression models with first-order terms only, the coefficient for a given variable is typically interpreted as the change in the fitted value of Y for a one-unit increase in that variable, with all other variables held constant. Therefore, each regression coefficient represents the difference between two fitted values of Y. But the coefficients represent only a fraction of the possible fitted value comparisons that might be of interest to researchers. For many fitted value comparisons that are not captured by any of the regression coefficients, common statistical software packages do not provide the standard errors needed to compute confidence intervals or carry out statistical tests-particularly in more complex models that include interactions, polynomial terms, or regression splines. We describe two SPSS macros that implement a matrix algebra method for comparing any two fitted values from a regression model. The !OLScomp and !MLEcomp macros are for use with models fitted via ordinary least squares and maximum likelihood estimation, respectively. The output from the macros includes the standard error of the difference between the two fitted values, a 95% confidence interval for the difference, and a corresponding statistical test with its p-value.

  4. Investigating the Performance of Alternate Regression Weights by Studying All Possible Criteria in Regression Models with a Fixed Set of Predictors

    ERIC Educational Resources Information Center

    Waller, Niels; Jones, Jeff

    2011-01-01

    We describe methods for assessing all possible criteria (i.e., dependent variables) and subsets of criteria for regression models with a fixed set of predictors, x (where x is an n x 1 vector of independent variables). Our methods build upon the geometry of regression coefficients (hereafter called regression weights) in n-dimensional space. For a…

  5. Can We Use Regression Modeling to Quantify Mean Annual Streamflow at a Global-Scale?

    NASA Astrophysics Data System (ADS)

    Barbarossa, V.; Huijbregts, M. A. J.; Hendriks, J. A.; Beusen, A.; Clavreul, J.; King, H.; Schipper, A.

    2016-12-01

    Quantifying mean annual flow of rivers (MAF) at ungauged sites is essential for a number of applications, including assessments of global water supply, ecosystem integrity and water footprints. MAF can be quantified with spatially explicit process-based models, which might be overly time-consuming and data-intensive for this purpose, or with empirical regression models that predict MAF based on climate and catchment characteristics. Yet, regression models have mostly been developed at a regional scale and the extent to which they can be extrapolated to other regions is not known. In this study, we developed a global-scale regression model for MAF using observations of discharge and catchment characteristics from 1,885 catchments worldwide, ranging from 2 to 106 km2 in size. In addition, we compared the performance of the regression model with the predictive ability of the spatially explicit global hydrological model PCR-GLOBWB [van Beek et al., 2011] by comparing results from both models to independent measurements. We obtained a regression model explaining 89% of the variance in MAF based on catchment area, mean annual precipitation and air temperature, average slope and elevation. The regression model performed better than PCR-GLOBWB for the prediction of MAF, as root-mean-square error values were lower (0.29 - 0.38 compared to 0.49 - 0.57) and the modified index of agreement was higher (0.80 - 0.83 compared to 0.72 - 0.75). Our regression model can be applied globally at any point of the river network, provided that the input parameters are within the range of values employed in the calibration of the model. The performance is reduced for water scarce regions and further research should focus on improving such an aspect for regression-based global hydrological models.

  6. Modeling Governance KB with CATPCA to Overcome Multicollinearity in the Logistic Regression

    NASA Astrophysics Data System (ADS)

    Khikmah, L.; Wijayanto, H.; Syafitri, U. D.

    2017-04-01

    The problem often encounters in logistic regression modeling are multicollinearity problems. Data that have multicollinearity between explanatory variables with the result in the estimation of parameters to be bias. Besides, the multicollinearity will result in error in the classification. In general, to overcome multicollinearity in regression used stepwise regression. They are also another method to overcome multicollinearity which involves all variable for prediction. That is Principal Component Analysis (PCA). However, classical PCA in only for numeric data. Its data are categorical, one method to solve the problems is Categorical Principal Component Analysis (CATPCA). Data were used in this research were a part of data Demographic and Population Survey Indonesia (IDHS) 2012. This research focuses on the characteristic of women of using the contraceptive methods. Classification results evaluated using Area Under Curve (AUC) values. The higher the AUC value, the better. Based on AUC values, the classification of the contraceptive method using stepwise method (58.66%) is better than the logistic regression model (57.39%) and CATPCA (57.39%). Evaluation of the results of logistic regression using sensitivity, shows the opposite where CATPCA method (99.79%) is better than logistic regression method (92.43%) and stepwise (92.05%). Therefore in this study focuses on major class classification (using a contraceptive method), then the selected model is CATPCA because it can raise the level of the major class model accuracy.

  7. Time series modeling by a regression approach based on a latent process.

    PubMed

    Chamroukhi, Faicel; Samé, Allou; Govaert, Gérard; Aknin, Patrice

    2009-01-01

    Time series are used in many domains including finance, engineering, economics and bioinformatics generally to represent the change of a measurement over time. Modeling techniques may then be used to give a synthetic representation of such data. A new approach for time series modeling is proposed in this paper. It consists of a regression model incorporating a discrete hidden logistic process allowing for activating smoothly or abruptly different polynomial regression models. The model parameters are estimated by the maximum likelihood method performed by a dedicated Expectation Maximization (EM) algorithm. The M step of the EM algorithm uses a multi-class Iterative Reweighted Least-Squares (IRLS) algorithm to estimate the hidden process parameters. To evaluate the proposed approach, an experimental study on simulated data and real world data was performed using two alternative approaches: a heteroskedastic piecewise regression model using a global optimization algorithm based on dynamic programming, and a Hidden Markov Regression Model whose parameters are estimated by the Baum-Welch algorithm. Finally, in the context of the remote monitoring of components of the French railway infrastructure, and more particularly the switch mechanism, the proposed approach has been applied to modeling and classifying time series representing the condition measurements acquired during switch operations.

  8. Differential Language Influence on Math Achievement

    ERIC Educational Resources Information Center

    Chen, Fang

    2010-01-01

    New models are commonly designed to solve certain limitations of other ones. Quantile regression is introduced in this paper because it can provide information that a regular mean regression misses. This research aims to demonstrate its utility in the educational research and measurement field for questions that may not be detected otherwise.…

  9. Comparison and Contrast of Two General Functional Regression Modeling Frameworks.

    PubMed

    Morris, Jeffrey S

    2017-02-01

    In this article, Greven and Scheipl describe an impressively general framework for performing functional regression that builds upon the generalized additive modeling framework. Over the past number of years, my collaborators and I have also been developing a general framework for functional regression, functional mixed models, which shares many similarities with this framework, but has many differences as well. In this discussion, I compare and contrast these two frameworks, to hopefully illuminate characteristics of each, highlighting their respecitve strengths and weaknesses, and providing recommendations regarding the settings in which each approach might be preferable.

  10. Comparison Between Linear and Non-parametric Regression Models for Genome-Enabled Prediction in Wheat

    PubMed Central

    Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne

    2012-01-01

    In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models. PMID:23275882

  11. Comparison between linear and non-parametric regression models for genome-enabled prediction in wheat.

    PubMed

    Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne

    2012-12-01

    In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models.

  12. Adjusting for Confounding in Early Postlaunch Settings: Going Beyond Logistic Regression Models.

    PubMed

    Schmidt, Amand F; Klungel, Olaf H; Groenwold, Rolf H H

    2016-01-01

    Postlaunch data on medical treatments can be analyzed to explore adverse events or relative effectiveness in real-life settings. These analyses are often complicated by the number of potential confounders and the possibility of model misspecification. We conducted a simulation study to compare the performance of logistic regression, propensity score, disease risk score, and stabilized inverse probability weighting methods to adjust for confounding. Model misspecification was induced in the independent derivation dataset. We evaluated performance using relative bias confidence interval coverage of the true effect, among other metrics. At low events per coefficient (1.0 and 0.5), the logistic regression estimates had a large relative bias (greater than -100%). Bias of the disease risk score estimates was at most 13.48% and 18.83%. For the propensity score model, this was 8.74% and >100%, respectively. At events per coefficient of 1.0 and 0.5, inverse probability weighting frequently failed or reduced to a crude regression, resulting in biases of -8.49% and 24.55%. Coverage of logistic regression estimates became less than the nominal level at events per coefficient ≤5. For the disease risk score, inverse probability weighting, and propensity score, coverage became less than nominal at events per coefficient ≤2.5, ≤1.0, and ≤1.0, respectively. Bias of misspecified disease risk score models was 16.55%. In settings with low events/exposed subjects per coefficient, disease risk score methods can be useful alternatives to logistic regression models, especially when propensity score models cannot be used. Despite better performance of disease risk score methods than logistic regression and propensity score models in small events per coefficient settings, bias, and coverage still deviated from nominal.

  13. Regression and multivariate models for predicting particulate matter concentration level.

    PubMed

    Nazif, Amina; Mohammed, Nurul Izma; Malakahmad, Amirhossein; Abualqumboz, Motasem S

    2018-01-01

    The devastating health effects of particulate matter (PM 10 ) exposure by susceptible populace has made it necessary to evaluate PM 10 pollution. Meteorological parameters and seasonal variation increases PM 10 concentration levels, especially in areas that have multiple anthropogenic activities. Hence, stepwise regression (SR), multiple linear regression (MLR) and principal component regression (PCR) analyses were used to analyse daily average PM 10 concentration levels. The analyses were carried out using daily average PM 10 concentration, temperature, humidity, wind speed and wind direction data from 2006 to 2010. The data was from an industrial air quality monitoring station in Malaysia. The SR analysis established that meteorological parameters had less influence on PM 10 concentration levels having coefficient of determination (R 2 ) result from 23 to 29% based on seasoned and unseasoned analysis. While, the result of the prediction analysis showed that PCR models had a better R 2 result than MLR methods. The results for the analyses based on both seasoned and unseasoned data established that MLR models had R 2 result from 0.50 to 0.60. While, PCR models had R 2 result from 0.66 to 0.89. In addition, the validation analysis using 2016 data also recognised that the PCR model outperformed the MLR model, with the PCR model for the seasoned analysis having the best result. These analyses will aid in achieving sustainable air quality management strategies.

  14. No causal impact of serum vascular endothelial growth factor level on temporal changes in body mass index in Japanese male workers: a five-year longitudinal study.

    PubMed

    Imatoh, Takuya; Kamimura, Seiichiro; Miyazaki, Motonobu

    2017-03-01

    It has been reported that adipocytes secrete vascular endothelial growth factor. Therefore, we conducted a 5-year longitudinal epidemiological study to further elucidate the association between vascular endothelial growth factor levels and temporal changes in body mass index. Our study subjects were Japanese male workers, who had regular health check-ups. Vascular endothelial growth factor levels were measured at baseline. To examine the association between vascular endothelial growth factor levels and overweight, we calculated the odds ratio using a multivariate logistic regression model. Moreover, linear mixed effect models were used to assess the association between vascular endothelial growth factor level and temporal changes in body mass index during the 5-year follow-up period. Vascular endothelial growth factor levels were marginally higher in subjects with a body mass index greater than 25 kg/m 2 compared with in those with a body mass index less than 25 kg/m 2 (505.4 vs. 465.5 pg/mL, P = 0.1) and were weakly correlated with leptin levels (β: 0.05, P = 0.07). In multivariate logistic regression, subjects in the highest vascular endothelial growth factor quantile were significantly associated with an increased risk for overweight compared with those in the lowest quantile (odds ratio 1.65, 95 % confidential interval: 1.10-2.50). Moreover P for trend was significant (P for trend = 0.003). However, the linear mixed effect model revealed that vascular endothelial growth factor levels were not associated with changes in body mass index over a 5-year period (quantile 2, β: 0.06, P = 0.46; quantile 3, β: -0.06, P = 0.45; quantile 4, β: -0.10, P = 0.22; quantile 1 as reference). Our results suggested that high vascular endothelial growth factor levels were significantly associated with overweight in Japanese males but high vascular endothelial growth factor levels did not necessarily cause obesity.

  15. Bootstrap investigation of the stability of a Cox regression model.

    PubMed

    Altman, D G; Andersen, P K

    1989-07-01

    We describe a bootstrap investigation of the stability of a Cox proportional hazards regression model resulting from the analysis of a clinical trial of azathioprine versus placebo in patients with primary biliary cirrhosis. We have considered stability to refer both to the choice of variables included in the model and, more importantly, to the predictive ability of the model. In stepwise Cox regression analyses of 100 bootstrap samples using 17 candidate variables, the most frequently selected variables were those selected in the original analysis, and no other important variable was identified. Thus there was no reason to doubt the model obtained in the original analysis. For each patient in the trial, bootstrap confidence intervals were constructed for the estimated probability of surviving two years. It is shown graphically that these intervals are markedly wider than those obtained from the original model.

  16. Logistic regression for risk factor modelling in stuttering research.

    PubMed

    Reed, Phil; Wu, Yaqionq

    2013-06-01

    To outline the uses of logistic regression and other statistical methods for risk factor analysis in the context of research on stuttering. The principles underlying the application of a logistic regression are illustrated, and the types of questions to which such a technique has been applied in the stuttering field are outlined. The assumptions and limitations of the technique are discussed with respect to existing stuttering research, and with respect to formulating appropriate research strategies to accommodate these considerations. Finally, some alternatives to the approach are briefly discussed. The way the statistical procedures are employed are demonstrated with some hypothetical data. Research into several practical issues concerning stuttering could benefit if risk factor modelling were used. Important examples are early diagnosis, prognosis (whether a child will recover or persist) and assessment of treatment outcome. After reading this article you will: (a) Summarize the situations in which logistic regression can be applied to a range of issues about stuttering; (b) Follow the steps in performing a logistic regression analysis; (c) Describe the assumptions of the logistic regression technique and the precautions that need to be checked when it is employed; (d) Be able to summarize its advantages over other techniques like estimation of group differences and simple regression. Copyright © 2012 Elsevier Inc. All rights reserved.

  17. A land use regression model for ambient ultrafine particles in Montreal, Canada: A comparison of linear regression and a machine learning approach.

    PubMed

    Weichenthal, Scott; Ryswyk, Keith Van; Goldstein, Alon; Bagg, Scott; Shekkarizfard, Maryam; Hatzopoulou, Marianne

    2016-04-01

    Existing evidence suggests that ambient ultrafine particles (UFPs) (<0.1µm) may contribute to acute cardiorespiratory morbidity. However, few studies have examined the long-term health effects of these pollutants owing in part to a need for exposure surfaces that can be applied in large population-based studies. To address this need, we developed a land use regression model for UFPs in Montreal, Canada using mobile monitoring data collected from 414 road segments during the summer and winter months between 2011 and 2012. Two different approaches were examined for model development including standard multivariable linear regression and a machine learning approach (kernel-based regularized least squares (KRLS)) that learns the functional form of covariate impacts on ambient UFP concentrations from the data. The final models included parameters for population density, ambient temperature and wind speed, land use parameters (park space and open space), length of local roads and rail, and estimated annual average NOx emissions from traffic. The final multivariable linear regression model explained 62% of the spatial variation in ambient UFP concentrations whereas the KRLS model explained 79% of the variance. The KRLS model performed slightly better than the linear regression model when evaluated using an external dataset (R(2)=0.58 vs. 0.55) or a cross-validation procedure (R(2)=0.67 vs. 0.60). In general, our findings suggest that the KRLS approach may offer modest improvements in predictive performance compared to standard multivariable linear regression models used to estimate spatial variations in ambient UFPs. However, differences in predictive performance were not statistically significant when evaluated using the cross-validation procedure. Crown Copyright © 2015. Published by Elsevier Inc. All rights reserved.

  18. Advanced statistics: linear regression, part I: simple linear regression.

    PubMed

    Marill, Keith A

    2004-01-01

    Simple linear regression is a mathematical technique used to model the relationship between a single independent predictor variable and a single dependent outcome variable. In this, the first of a two-part series exploring concepts in linear regression analysis, the four fundamental assumptions and the mechanics of simple linear regression are reviewed. The most common technique used to derive the regression line, the method of least squares, is described. The reader will be acquainted with other important concepts in simple linear regression, including: variable transformations, dummy variables, relationship to inference testing, and leverage. Simplified clinical examples with small datasets and graphic models are used to illustrate the points. This will provide a foundation for the second article in this series: a discussion of multiple linear regression, in which there are multiple predictor variables.

  19. Advanced statistics: linear regression, part II: multiple linear regression.

    PubMed

    Marill, Keith A

    2004-01-01

    The applications of simple linear regression in medical research are limited, because in most situations, there are multiple relevant predictor variables. Univariate statistical techniques such as simple linear regression use a single predictor variable, and they often may be mathematically correct but clinically misleading. Multiple linear regression is a mathematical technique used to model the relationship between multiple independent predictor variables and a single dependent outcome variable. It is used in medical research to model observational data, as well as in diagnostic and therapeutic studies in which the outcome is dependent on more than one factor. Although the technique generally is limited to data that can be expressed with a linear function, it benefits from a well-developed mathematical framework that yields unique solutions and exact confidence intervals for regression coefficients. Building on Part I of this series, this article acquaints the reader with some of the important concepts in multiple regression analysis. These include multicollinearity, interaction effects, and an expansion of the discussion of inference testing, leverage, and variable transformations to multivariate models. Examples from the first article in this series are expanded on using a primarily graphic, rather than mathematical, approach. The importance of the relationships among the predictor variables and the dependence of the multivariate model coefficients on the choice of these variables are stressed. Finally, concepts in regression model building are discussed.

  20. Developing global regression models for metabolite concentration prediction regardless of cell line.

    PubMed

    André, Silvère; Lagresle, Sylvain; Da Sliva, Anthony; Heimendinger, Pierre; Hannas, Zahia; Calvosa, Éric; Duponchel, Ludovic

    2017-11-01

    Following the Process Analytical Technology (PAT) of the Food and Drug Administration (FDA), drug manufacturers are encouraged to develop innovative techniques in order to monitor and understand their processes in a better way. Within this framework, it has been demonstrated that Raman spectroscopy coupled with chemometric tools allow to predict critical parameters of mammalian cell cultures in-line and in real time. However, the development of robust and predictive regression models clearly requires many batches in order to take into account inter-batch variability and enhance models accuracy. Nevertheless, this heavy procedure has to be repeated for every new line of cell culture involving many resources. This is why we propose in this paper to develop global regression models taking into account different cell lines. Such models are finally transferred to any culture of the cells involved. This article first demonstrates the feasibility of developing regression models, not only for mammalian cell lines (CHO and HeLa cell cultures), but also for insect cell lines (Sf9 cell cultures). Then global regression models are generated, based on CHO cells, HeLa cells, and Sf9 cells. Finally, these models are evaluated considering a fourth cell line(HEK cells). In addition to suitable predictions of glucose and lactate concentration of HEK cell cultures, we expose that by adding a single HEK-cell culture to the calibration set, the predictive ability of the regression models are substantially increased. In this way, we demonstrate that using global models, it is not necessary to consider many cultures of a new cell line in order to obtain accurate models. Biotechnol. Bioeng. 2017;114: 2550-2559. © 2017 Wiley Periodicals, Inc. © 2017 Wiley Periodicals, Inc.

  1. Evaluation of weighted regression and sample size in developing a taper model for loblolly pine

    Treesearch

    Kenneth L. Cormier; Robin M. Reich; Raymond L. Czaplewski; William A. Bechtold

    1992-01-01

    A stem profile model, fit using pseudo-likelihood weighted regression, was used to estimate merchantable volume of loblolly pine (Pinus taeda L.) in the southeast. The weighted regression increased model fit marginally, but did not substantially increase model performance. In all cases, the unweighted regression models performed as well as the...

  2. Crime Modeling using Spatial Regression Approach

    NASA Astrophysics Data System (ADS)

    Saleh Ahmar, Ansari; Adiatma; Kasim Aidid, M.

    2018-01-01

    Act of criminality in Indonesia increased both variety and quantity every year. As murder, rape, assault, vandalism, theft, fraud, fencing, and other cases that make people feel unsafe. Risk of society exposed to crime is the number of reported cases in the police institution. The higher of the number of reporter to the police institution then the number of crime in the region is increasing. In this research, modeling criminality in South Sulawesi, Indonesia with the dependent variable used is the society exposed to the risk of crime. Modelling done by area approach is the using Spatial Autoregressive (SAR) and Spatial Error Model (SEM) methods. The independent variable used is the population density, the number of poor population, GDP per capita, unemployment and the human development index (HDI). Based on the analysis using spatial regression can be shown that there are no dependencies spatial both lag or errors in South Sulawesi.

  3. The bivariate regression model and its application

    NASA Astrophysics Data System (ADS)

    Pratikno, B.; Sulistia, L.; Saniyah

    2018-03-01

    The paper studied a bivariate regression model (BRM) and its application. The maximum power and minimum size are used to choose the eligible tests using non-sample prior information (NSPI). In the simulation study on real data, we used Wilk’s lamda to determine the best model of the BRM. The result showed that the power of the pre-test-test (PTT) on the NSPI is a significant choice of the tests among unrestricted test (UT) and restricted test (RT), and the best model of the BRM is Y (1) = ‑894 + 46X and Y (2) = 78 + 0.2X with significant Wilk’s lamda 0.88 < 0.90 (Wilk’s table).

  4. Determining factors influencing survival of breast cancer by fuzzy logistic regression model.

    PubMed

    Nikbakht, Roya; Bahrampour, Abbas

    2017-01-01

    Fuzzy logistic regression model can be used for determining influential factors of disease. This study explores the important factors of actual predictive survival factors of breast cancer's patients. We used breast cancer data which collected by cancer registry of Kerman University of Medical Sciences during the period of 2000-2007. The variables such as morphology, grade, age, and treatments (surgery, radiotherapy, and chemotherapy) were applied in the fuzzy logistic regression model. Performance of model was determined in terms of mean degree of membership (MDM). The study results showed that almost 41% of patients were in neoplasm and malignant group and more than two-third of them were still alive after 5-year follow-up. Based on the fuzzy logistic model, the most important factors influencing survival were chemotherapy, morphology, and radiotherapy, respectively. Furthermore, the MDM criteria show that the fuzzy logistic regression have a good fit on the data (MDM = 0.86). Fuzzy logistic regression model showed that chemotherapy is more important than radiotherapy in survival of patients with breast cancer. In addition, another ability of this model is calculating possibilistic odds of survival in cancer patients. The results of this study can be applied in clinical research. Furthermore, there are few studies which applied the fuzzy logistic models. Furthermore, we recommend using this model in various research areas.

  5. Modeling the frequency of opposing left-turn conflicts at signalized intersections using generalized linear regression models.

    PubMed

    Zhang, Xin; Liu, Pan; Chen, Yuguang; Bai, Lu; Wang, Wei

    2014-01-01

    The primary objective of this study was to identify whether the frequency of traffic conflicts at signalized intersections can be modeled. The opposing left-turn conflicts were selected for the development of conflict predictive models. Using data collected at 30 approaches at 20 signalized intersections, the underlying distributions of the conflicts under different traffic conditions were examined. Different conflict-predictive models were developed to relate the frequency of opposing left-turn conflicts to various explanatory variables. The models considered include a linear regression model, a negative binomial model, and separate models developed for four traffic scenarios. The prediction performance of different models was compared. The frequency of traffic conflicts follows a negative binominal distribution. The linear regression model is not appropriate for the conflict frequency data. In addition, drivers behaved differently under different traffic conditions. Accordingly, the effects of conflicting traffic volumes on conflict frequency vary across different traffic conditions. The occurrences of traffic conflicts at signalized intersections can be modeled using generalized linear regression models. The use of conflict predictive models has potential to expand the uses of surrogate safety measures in safety estimation and evaluation.

  6. Use of empirical likelihood to calibrate auxiliary information in partly linear monotone regression models.

    PubMed

    Chen, Baojiang; Qin, Jing

    2014-05-10

    In statistical analysis, a regression model is needed if one is interested in finding the relationship between a response variable and covariates. When the response depends on the covariate, then it may also depend on the function of this covariate. If one has no knowledge of this functional form but expect for monotonic increasing or decreasing, then the isotonic regression model is preferable. Estimation of parameters for isotonic regression models is based on the pool-adjacent-violators algorithm (PAVA), where the monotonicity constraints are built in. With missing data, people often employ the augmented estimating method to improve estimation efficiency by incorporating auxiliary information through a working regression model. However, under the framework of the isotonic regression model, the PAVA does not work as the monotonicity constraints are violated. In this paper, we develop an empirical likelihood-based method for isotonic regression model to incorporate the auxiliary information. Because the monotonicity constraints still hold, the PAVA can be used for parameter estimation. Simulation studies demonstrate that the proposed method can yield more efficient estimates, and in some situations, the efficiency improvement is substantial. We apply this method to a dementia study. Copyright © 2013 John Wiley & Sons, Ltd.

  7. Comparison and Contrast of Two General Functional Regression Modeling Frameworks

    PubMed Central

    Morris, Jeffrey S.

    2017-01-01

    In this article, Greven and Scheipl describe an impressively general framework for performing functional regression that builds upon the generalized additive modeling framework. Over the past number of years, my collaborators and I have also been developing a general framework for functional regression, functional mixed models, which shares many similarities with this framework, but has many differences as well. In this discussion, I compare and contrast these two frameworks, to hopefully illuminate characteristics of each, highlighting their respecitve strengths and weaknesses, and providing recommendations regarding the settings in which each approach might be preferable. PMID:28736502

  8. Survival Regression Modeling Strategies in CVD Prediction.

    PubMed

    Barkhordari, Mahnaz; Padyab, Mojgan; Sardarinia, Mahsa; Hadaegh, Farzad; Azizi, Fereidoun; Bozorgmanesh, Mohammadreza

    2016-04-01

    A fundamental part of prevention is prediction. Potential predictors are the sine qua non of prediction models. However, whether incorporating novel predictors to prediction models could be directly translated to added predictive value remains an area of dispute. The difference between the predictive power of a predictive model with (enhanced model) and without (baseline model) a certain predictor is generally regarded as an indicator of the predictive value added by that predictor. Indices such as discrimination and calibration have long been used in this regard. Recently, the use of added predictive value has been suggested while comparing the predictive performances of the predictive models with and without novel biomarkers. User-friendly statistical software capable of implementing novel statistical procedures is conspicuously lacking. This shortcoming has restricted implementation of such novel model assessment methods. We aimed to construct Stata commands to help researchers obtain the aforementioned statistical indices. We have written Stata commands that are intended to help researchers obtain the following. 1, Nam-D'Agostino X 2 goodness of fit test; 2, Cut point-free and cut point-based net reclassification improvement index (NRI), relative absolute integrated discriminatory improvement index (IDI), and survival-based regression analyses. We applied the commands to real data on women participating in the Tehran lipid and glucose study (TLGS) to examine if information relating to a family history of premature cardiovascular disease (CVD), waist circumference, and fasting plasma glucose can improve predictive performance of Framingham's general CVD risk algorithm. The command is adpredsurv for survival models. Herein we have described the Stata package "adpredsurv" for calculation of the Nam-D'Agostino X 2 goodness of fit test as well as cut point-free and cut point-based NRI, relative and absolute IDI, and survival-based regression analyses. We hope this

  9. Analysis of Multivariate Experimental Data Using A Simplified Regression Model Search Algorithm

    NASA Technical Reports Server (NTRS)

    Ulbrich, Norbert Manfred

    2013-01-01

    A new regression model search algorithm was developed in 2011 that may be used to analyze both general multivariate experimental data sets and wind tunnel strain-gage balance calibration data. The new algorithm is a simplified version of a more complex search algorithm that was originally developed at the NASA Ames Balance Calibration Laboratory. The new algorithm has the advantage that it needs only about one tenth of the original algorithm's CPU time for the completion of a search. In addition, extensive testing showed that the prediction accuracy of math models obtained from the simplified algorithm is similar to the prediction accuracy of math models obtained from the original algorithm. The simplified algorithm, however, cannot guarantee that search constraints related to a set of statistical quality requirements are always satisfied in the optimized regression models. Therefore, the simplified search algorithm is not intended to replace the original search algorithm. Instead, it may be used to generate an alternate optimized regression model of experimental data whenever the application of the original search algorithm either fails or requires too much CPU time. Data from a machine calibration of NASA's MK40 force balance is used to illustrate the application of the new regression model search algorithm.

  10. Evaluation of accuracy of linear regression models in predicting urban stormwater discharge characteristics.

    PubMed

    Madarang, Krish J; Kang, Joo-Hyon

    2014-06-01

    Stormwater runoff has been identified as a source of pollution for the environment, especially for receiving waters. In order to quantify and manage the impacts of stormwater runoff on the environment, predictive models and mathematical models have been developed. Predictive tools such as regression models have been widely used to predict stormwater discharge characteristics. Storm event characteristics, such as antecedent dry days (ADD), have been related to response variables, such as pollutant loads and concentrations. However it has been a controversial issue among many studies to consider ADD as an important variable in predicting stormwater discharge characteristics. In this study, we examined the accuracy of general linear regression models in predicting discharge characteristics of roadway runoff. A total of 17 storm events were monitored in two highway segments, located in Gwangju, Korea. Data from the monitoring were used to calibrate United States Environmental Protection Agency's Storm Water Management Model (SWMM). The calibrated SWMM was simulated for 55 storm events, and the results of total suspended solid (TSS) discharge loads and event mean concentrations (EMC) were extracted. From these data, linear regression models were developed. R(2) and p-values of the regression of ADD for both TSS loads and EMCs were investigated. Results showed that pollutant loads were better predicted than pollutant EMC in the multiple regression models. Regression may not provide the true effect of site-specific characteristics, due to uncertainty in the data. Copyright © 2014 The Research Centre for Eco-Environmental Sciences, Chinese Academy of Sciences. Published by Elsevier B.V. All rights reserved.

  11. Estimation of Standard Error of Regression Effects in Latent Regression Models Using Binder's Linearization. Research Report. ETS RR-07-09

    ERIC Educational Resources Information Center

    Li, Deping; Oranje, Andreas

    2007-01-01

    Two versions of a general method for approximating standard error of regression effect estimates within an IRT-based latent regression model are compared. The general method is based on Binder's (1983) approach, accounting for complex samples and finite populations by Taylor series linearization. In contrast, the current National Assessment of…

  12. A regularization corrected score method for nonlinear regression models with covariate error.

    PubMed

    Zucker, David M; Gorfine, Malka; Li, Yi; Tadesse, Mahlet G; Spiegelman, Donna

    2013-03-01

    Many regression analyses involve explanatory variables that are measured with error, and failing to account for this error is well known to lead to biased point and interval estimates of the regression coefficients. We present here a new general method for adjusting for covariate error. Our method consists of an approximate version of the Stefanski-Nakamura corrected score approach, using the method of regularization to obtain an approximate solution of the relevant integral equation. We develop the theory in the setting of classical likelihood models; this setting covers, for example, linear regression, nonlinear regression, logistic regression, and Poisson regression. The method is extremely general in terms of the types of measurement error models covered, and is a functional method in the sense of not involving assumptions on the distribution of the true covariate. We discuss the theoretical properties of the method and present simulation results in the logistic regression setting (univariate and multivariate). For illustration, we apply the method to data from the Harvard Nurses' Health Study concerning the relationship between physical activity and breast cancer mortality in the period following a diagnosis of breast cancer. Copyright © 2013, The International Biometric Society.

  13. A Semiparametric Change-Point Regression Model for Longitudinal Observations.

    PubMed

    Xing, Haipeng; Ying, Zhiliang

    2012-12-01

    Many longitudinal studies involve relating an outcome process to a set of possibly time-varying covariates, giving rise to the usual regression models for longitudinal data. When the purpose of the study is to investigate the covariate effects when experimental environment undergoes abrupt changes or to locate the periods with different levels of covariate effects, a simple and easy-to-interpret approach is to introduce change-points in regression coefficients. In this connection, we propose a semiparametric change-point regression model, in which the error process (stochastic component) is nonparametric and the baseline mean function (functional part) is completely unspecified, the observation times are allowed to be subject-specific, and the number, locations and magnitudes of change-points are unknown and need to be estimated. We further develop an estimation procedure which combines the recent advance in semiparametric analysis based on counting process argument and multiple change-points inference, and discuss its large sample properties, including consistency and asymptotic normality, under suitable regularity conditions. Simulation results show that the proposed methods work well under a variety of scenarios. An application to a real data set is also given.

  14. Food away from home and body mass outcomes: taking heterogeneity into account enhances quality of results.

    PubMed

    Kim, Tae Hyun; Lee, Eui-Kyung; Han, Euna

    2014-09-01

    The aim of this study was to explore the heterogeneous association of consumption of food away from home (FAFH) with individual body mass outcomes including body mass index and waist circumference over the entire conditional distribution of each outcome. Information on 16,403 adults obtained from nationally representative data on nutrition and behavior in Korea was used. A quantile regression model captured the variability of the association of FAFH with body mass outcomes across the entire conditional distribution of each outcome measure. Heavy FAFH consumption was defined as obtaining ≥1400 kcal from FAFH on a single day. Heavy FAFH consumption, specifically at full-service restaurants, was significantly associated with higher body mass index (+0.46 kg/m2 at the 50th quantile, 0.55 at the 75th, 0.66 at the 90th, and 0.44 at the 95th) and waist circumference (+0.96 cm at the 25th quantile, 1.06 cm at the 50th, 1.35 cm at the 75th, and 0.96 cm at the 90th quantiles) with overall larger associations at higher quantiles. Findings of the study indicate that conventional regression methods may mask important heterogeneity in the association between heavy FAFH consumption and body mass outcomes. Further public health efforts are needed to improve the nutritional quality of affordable FAFH choices and nutrition education and to establish a healthy food consumption environment. Copyright © 2014 Elsevier Inc. All rights reserved.

  15. Bivariate least squares linear regression: Towards a unified analytic formalism. I. Functional models

    NASA Astrophysics Data System (ADS)

    Caimmi, R.

    2011-08-01

    Concerning bivariate least squares linear regression, the classical approach pursued for functional models in earlier attempts ( York, 1966, 1969) is reviewed using a new formalism in terms of deviation (matrix) traces which, for unweighted data, reduce to usual quantities leaving aside an unessential (but dimensional) multiplicative factor. Within the framework of classical error models, the dependent variable relates to the independent variable according to the usual additive model. The classes of linear models considered are regression lines in the general case of correlated errors in X and in Y for weighted data, and in the opposite limiting situations of (i) uncorrelated errors in X and in Y, and (ii) completely correlated errors in X and in Y. The special case of (C) generalized orthogonal regression is considered in detail together with well known subcases, namely: (Y) errors in X negligible (ideally null) with respect to errors in Y; (X) errors in Y negligible (ideally null) with respect to errors in X; (O) genuine orthogonal regression; (R) reduced major-axis regression. In the limit of unweighted data, the results determined for functional models are compared with their counterparts related to extreme structural models i.e. the instrumental scatter is negligible (ideally null) with respect to the intrinsic scatter ( Isobe et al., 1990; Feigelson and Babu, 1992). While regression line slope and intercept estimators for functional and structural models necessarily coincide, the contrary holds for related variance estimators even if the residuals obey a Gaussian distribution, with the exception of Y models. An example of astronomical application is considered, concerning the [O/H]-[Fe/H] empirical relations deduced from five samples related to different stars and/or different methods of oxygen abundance determination. For selected samples and assigned methods, different regression models yield consistent results within the errors (∓ σ) for both

  16. Benchmark dose analysis via nonparametric regression modeling

    PubMed Central

    Piegorsch, Walter W.; Xiong, Hui; Bhattacharya, Rabi N.; Lin, Lizhen

    2013-01-01

    Estimation of benchmark doses (BMDs) in quantitative risk assessment traditionally is based upon parametric dose-response modeling. It is a well-known concern, however, that if the chosen parametric model is uncertain and/or misspecified, inaccurate and possibly unsafe low-dose inferences can result. We describe a nonparametric approach for estimating BMDs with quantal-response data based on an isotonic regression method, and also study use of corresponding, nonparametric, bootstrap-based confidence limits for the BMD. We explore the confidence limits’ small-sample properties via a simulation study, and illustrate the calculations with an example from cancer risk assessment. It is seen that this nonparametric approach can provide a useful alternative for BMD estimation when faced with the problem of parametric model uncertainty. PMID:23683057

  17. Modelling the behaviour of unemployment rates in the US over time and across space

    NASA Astrophysics Data System (ADS)

    Holmes, Mark J.; Otero, Jesús; Panagiotidis, Theodore

    2013-11-01

    This paper provides evidence that unemployment rates across US states are stationary and therefore behave according to the natural rate hypothesis. We provide new insights by considering the effect of key variables on the speed of adjustment associated with unemployment shocks. A highly-dimensional VAR analysis of the half-lives associated with shocks to unemployment rates in pairs of states suggests that the distance between states and vacancy rates respectively exert a positive and negative influence. We find that higher homeownership rates do not lead to higher half-lives. When the symmetry assumption is relaxed through quantile regression, support for the Oswald hypothesis through a positive relationship between homeownership rates and half-lives is found at the higher quantiles.

  18. Forecasting daily meteorological time series using ARIMA and regression models

    NASA Astrophysics Data System (ADS)

    Murat, Małgorzata; Malinowska, Iwona; Gos, Magdalena; Krzyszczak, Jaromir

    2018-04-01

    The daily air temperature and precipitation time series recorded between January 1, 1980 and December 31, 2010 in four European sites (Jokioinen, Dikopshof, Lleida and Lublin) from different climatic zones were modeled and forecasted. In our forecasting we used the methods of the Box-Jenkins and Holt- Winters seasonal auto regressive integrated moving-average, the autoregressive integrated moving-average with external regressors in the form of Fourier terms and the time series regression, including trend and seasonality components methodology with R software. It was demonstrated that obtained models are able to capture the dynamics of the time series data and to produce sensible forecasts.

  19. Analyzing Student Learning Outcomes: Usefulness of Logistic and Cox Regression Models. IR Applications, Volume 5

    ERIC Educational Resources Information Center

    Chen, Chau-Kuang

    2005-01-01

    Logistic and Cox regression methods are practical tools used to model the relationships between certain student learning outcomes and their relevant explanatory variables. The logistic regression model fits an S-shaped curve into a binary outcome with data points of zero and one. The Cox regression model allows investigators to study the duration…

  20. Replica analysis of overfitting in regression models for time-to-event data

    NASA Astrophysics Data System (ADS)

    Coolen, A. C. C.; Barrett, J. E.; Paga, P.; Perez-Vicente, C. J.

    2017-09-01

    Overfitting, which happens when the number of parameters in a model is too large compared to the number of data points available for determining these parameters, is a serious and growing problem in survival analysis. While modern medicine presents us with data of unprecedented dimensionality, these data cannot yet be used effectively for clinical outcome prediction. Standard error measures in maximum likelihood regression, such as p-values and z-scores, are blind to overfitting, and even for Cox’s proportional hazards model (the main tool of medical statisticians), one finds in literature only rules of thumb on the number of samples required to avoid overfitting. In this paper we present a mathematical theory of overfitting in regression models for time-to-event data, which aims to increase our quantitative understanding of the problem and provide practical tools with which to correct regression outcomes for the impact of overfitting. It is based on the replica method, a statistical mechanical technique for the analysis of heterogeneous many-variable systems that has been used successfully for several decades in physics, biology, and computer science, but not yet in medical statistics. We develop the theory initially for arbitrary regression models for time-to-event data, and verify its predictions in detail for the popular Cox model.

  1. Cross-validation pitfalls when selecting and assessing regression and classification models.

    PubMed

    Krstajic, Damjan; Buturovic, Ljubomir J; Leahy, David E; Thomas, Simon

    2014-03-29

    We address the problem of selecting and assessing classification and regression models using cross-validation. Current state-of-the-art methods can yield models with high variance, rendering them unsuitable for a number of practical applications including QSAR. In this paper we describe and evaluate best practices which improve reliability and increase confidence in selected models. A key operational component of the proposed methods is cloud computing which enables routine use of previously infeasible approaches. We describe in detail an algorithm for repeated grid-search V-fold cross-validation for parameter tuning in classification and regression, and we define a repeated nested cross-validation algorithm for model assessment. As regards variable selection and parameter tuning we define two algorithms (repeated grid-search cross-validation and double cross-validation), and provide arguments for using the repeated grid-search in the general case. We show results of our algorithms on seven QSAR datasets. The variation of the prediction performance, which is the result of choosing different splits of the dataset in V-fold cross-validation, needs to be taken into account when selecting and assessing classification and regression models. We demonstrate the importance of repeating cross-validation when selecting an optimal model, as well as the importance of repeating nested cross-validation when assessing a prediction error.

  2. Robust inference under the beta regression model with application to health care studies.

    PubMed

    Ghosh, Abhik

    2017-01-01

    Data on rates, percentages, or proportions arise frequently in many different applied disciplines like medical biology, health care, psychology, and several others. In this paper, we develop a robust inference procedure for the beta regression model, which is used to describe such response variables taking values in (0, 1) through some related explanatory variables. In relation to the beta regression model, the issue of robustness has been largely ignored in the literature so far. The existing maximum likelihood-based inference has serious lack of robustness against outliers in data and generate drastically different (erroneous) inference in the presence of data contamination. Here, we develop the robust minimum density power divergence estimator and a class of robust Wald-type tests for the beta regression model along with several applications. We derive their asymptotic properties and describe their robustness theoretically through the influence function analyses. Finite sample performances of the proposed estimators and tests are examined through suitable simulation studies and real data applications in the context of health care and psychology. Although we primarily focus on the beta regression models with a fixed dispersion parameter, some indications are also provided for extension to the variable dispersion beta regression models with an application.

  3. Functional mixture regression.

    PubMed

    Yao, Fang; Fu, Yuejiao; Lee, Thomas C M

    2011-04-01

    In functional linear models (FLMs), the relationship between the scalar response and the functional predictor process is often assumed to be identical for all subjects. Motivated by both practical and methodological considerations, we relax this assumption and propose a new class of functional regression models that allow the regression structure to vary for different groups of subjects. By projecting the predictor process onto its eigenspace, the new functional regression model is simplified to a framework that is similar to classical mixture regression models. This leads to the proposed approach named as functional mixture regression (FMR). The estimation of FMR can be readily carried out using existing software implemented for functional principal component analysis and mixture regression. The practical necessity and performance of FMR are illustrated through applications to a longevity analysis of female medflies and a human growth study. Theoretical investigations concerning the consistent estimation and prediction properties of FMR along with simulation experiments illustrating its empirical properties are presented in the supplementary material available at Biostatistics online. Corresponding results demonstrate that the proposed approach could potentially achieve substantial gains over traditional FLMs.

  4. Regression mixture models: Does modeling the covariance between independent variables and latent classes improve the results?

    PubMed Central

    Lamont, Andrea E.; Vermunt, Jeroen K.; Van Horn, M. Lee

    2016-01-01

    Regression mixture models are increasingly used as an exploratory approach to identify heterogeneity in the effects of a predictor on an outcome. In this simulation study, we test the effects of violating an implicit assumption often made in these models – i.e., independent variables in the model are not directly related to latent classes. Results indicated that the major risk of failing to model the relationship between predictor and latent class was an increase in the probability of selecting additional latent classes and biased class proportions. Additionally, this study tests whether regression mixture models can detect a piecewise relationship between a predictor and outcome. Results suggest that these models are able to detect piecewise relations, but only when the relationship between the latent class and the predictor is included in model estimation. We illustrate the implications of making this assumption through a re-analysis of applied data examining heterogeneity in the effects of family resources on academic achievement. We compare previous results (which assumed no relation between independent variables and latent class) to the model where this assumption is lifted. Implications and analytic suggestions for conducting regression mixture based on these findings are noted. PMID:26881956

  5. [Comparison of predictive effect between the single auto regressive integrated moving average (ARIMA) model and the ARIMA-generalized regression neural network (GRNN) combination model on the incidence of scarlet fever].

    PubMed

    Zhu, Yu; Xia, Jie-lai; Wang, Jing

    2009-09-01

    Application of the 'single auto regressive integrated moving average (ARIMA) model' and the 'ARIMA-generalized regression neural network (GRNN) combination model' in the research of the incidence of scarlet fever. Establish the auto regressive integrated moving average model based on the data of the monthly incidence on scarlet fever of one city, from 2000 to 2006. The fitting values of the ARIMA model was used as input of the GRNN, and the actual values were used as output of the GRNN. After training the GRNN, the effect of the single ARIMA model and the ARIMA-GRNN combination model was then compared. The mean error rate (MER) of the single ARIMA model and the ARIMA-GRNN combination model were 31.6%, 28.7% respectively and the determination coefficient (R(2)) of the two models were 0.801, 0.872 respectively. The fitting efficacy of the ARIMA-GRNN combination model was better than the single ARIMA, which had practical value in the research on time series data such as the incidence of scarlet fever.

  6. An operational GLS model for hydrologic regression

    USGS Publications Warehouse

    Tasker, Gary D.; Stedinger, J.R.

    1989-01-01

    Recent Monte Carlo studies have documented the value of generalized least squares (GLS) procedures to estimate empirical relationships between streamflow statistics and physiographic basin characteristics. This paper presents a number of extensions of the GLS method that deal with realities and complexities of regional hydrologic data sets that were not addressed in the simulation studies. These extensions include: (1) a more realistic model of the underlying model errors; (2) smoothed estimates of cross correlation of flows; (3) procedures for including historical flow data; (4) diagnostic statistics describing leverage and influence for GLS regression; and (5) the formulation of a mathematical program for evaluating future gaging activities. ?? 1989.

  7. Goodness-Of-Fit Test for Nonparametric Regression Models: Smoothing Spline ANOVA Models as Example.

    PubMed

    Teran Hidalgo, Sebastian J; Wu, Michael C; Engel, Stephanie M; Kosorok, Michael R

    2018-06-01

    Nonparametric regression models do not require the specification of the functional form between the outcome and the covariates. Despite their popularity, the amount of diagnostic statistics, in comparison to their parametric counter-parts, is small. We propose a goodness-of-fit test for nonparametric regression models with linear smoother form. In particular, we apply this testing framework to smoothing spline ANOVA models. The test can consider two sources of lack-of-fit: whether covariates that are not currently in the model need to be included, and whether the current model fits the data well. The proposed method derives estimated residuals from the model. Then, statistical dependence is assessed between the estimated residuals and the covariates using the HSIC. If dependence exists, the model does not capture all the variability in the outcome associated with the covariates, otherwise the model fits the data well. The bootstrap is used to obtain p-values. Application of the method is demonstrated with a neonatal mental development data analysis. We demonstrate correct type I error as well as power performance through simulations.

  8. Linear Multivariable Regression Models for Prediction of Eddy Dissipation Rate from Available Meteorological Data

    NASA Technical Reports Server (NTRS)

    MCKissick, Burnell T. (Technical Monitor); Plassman, Gerald E.; Mall, Gerald H.; Quagliano, John R.

    2005-01-01

    Linear multivariable regression models for predicting day and night Eddy Dissipation Rate (EDR) from available meteorological data sources are defined and validated. Model definition is based on a combination of 1997-2000 Dallas/Fort Worth (DFW) data sources, EDR from Aircraft Vortex Spacing System (AVOSS) deployment data, and regression variables primarily from corresponding Automated Surface Observation System (ASOS) data. Model validation is accomplished through EDR predictions on a similar combination of 1994-1995 Memphis (MEM) AVOSS and ASOS data. Model forms include an intercept plus a single term of fixed optimal power for each of these regression variables; 30-minute forward averaged mean and variance of near-surface wind speed and temperature, variance of wind direction, and a discrete cloud cover metric. Distinct day and night models, regressing on EDR and the natural log of EDR respectively, yield best performance and avoid model discontinuity over day/night data boundaries.

  9. Reconstruction of missing daily streamflow data using dynamic regression models

    NASA Astrophysics Data System (ADS)

    Tencaliec, Patricia; Favre, Anne-Catherine; Prieur, Clémentine; Mathevet, Thibault

    2015-12-01

    River discharge is one of the most important quantities in hydrology. It provides fundamental records for water resources management and climate change monitoring. Even very short data-gaps in this information can cause extremely different analysis outputs. Therefore, reconstructing missing data of incomplete data sets is an important step regarding the performance of the environmental models, engineering, and research applications, thus it presents a great challenge. The objective of this paper is to introduce an effective technique for reconstructing missing daily discharge data when one has access to only daily streamflow data. The proposed procedure uses a combination of regression and autoregressive integrated moving average models (ARIMA) called dynamic regression model. This model uses the linear relationship between neighbor and correlated stations and then adjusts the residual term by fitting an ARIMA structure. Application of the model to eight daily streamflow data for the Durance river watershed showed that the model yields reliable estimates for the missing data in the time series. Simulation studies were also conducted to evaluate the performance of the procedure.

  10. Development of LACIE CCEA-1 weather/wheat yield models. [regression analysis

    NASA Technical Reports Server (NTRS)

    Strommen, N. D.; Sakamoto, C. M.; Leduc, S. K.; Umberger, D. E. (Principal Investigator)

    1979-01-01

    The advantages and disadvantages of the casual (phenological, dynamic, physiological), statistical regression, and analog approaches to modeling for grain yield are examined. Given LACIE's primary goal of estimating wheat production for the large areas of eight major wheat-growing regions, the statistical regression approach of correlating historical yield and climate data offered the Center for Climatic and Environmental Assessment the greatest potential return within the constraints of time and data sources. The basic equation for the first generation wheat-yield model is given. Topics discussed include truncation, trend variable, selection of weather variables, episodic events, strata selection, operational data flow, weighting, and model results.

  11. Association Between Awareness of Hypertension and Health-Related Quality of Life in a Cross-Sectional Population-Based Study in Rural Area of Northwest China.

    PubMed

    Mi, Baibing; Dang, Shaonong; Li, Qiang; Zhao, Yaling; Yang, Ruihai; Wang, Duolao; Yan, Hong

    2015-07-01

    Hypertensive patients have more complex health care needs and are more likely to have poorer health-related quality of life than normotensive people. The awareness of hypertension could be related to reduce health-related quality of life. We propose the use of quantile regression to explore more detailed relationships between awareness of hypertension and health-related quality of life. In a cross-sectional, population-based study, 2737 participants (including 1035 hypertensive patients and 1702 normotensive participants) completed the Short-Form Health Survey. A quantile regression model was employed to investigate the association of physical component summary scores and mental component summary scores with awareness of hypertension and to evaluate the associated factors. Patients who were aware of hypertension (N = 554) had lower scores than patients who were unaware of hypertension (N = 481). The median (IQR) of physical component summary scores: 48.20 (13.88) versus 53.27 (10.79), P < 0.01; the mental component summary scores: 50.68 (15.09) versus 51.70 (10.65), P = 0.03. adjusting for covariates, the quantile regression results suggest awareness of hypertension was associated with most physical component summary scores quantiles (P < 0.05 except 10th and 20th quantiles) in which the β-estimates from -2.14 (95% CI: -3.80 to -0.48) to -1.45 (95% CI: -2.42 to -0.47), as the same significant trend with some poorer mental component summary scores quantiles in which the β-estimates from -3.47 (95% CI: -6.65 to -0.39) to -2.18 (95% CI: -4.30 to -0.06). The awareness of hypertension has a greater effect on those with intermediate physical component summary status: the β-estimates were equal to -2.04 (95% CI: -3.51 to -0.57, P < 0.05) at the 40th and decreased further to -1.45 (95% CI: -2.42 to -0.47, P < 0.01) at the 90th quantile. Awareness of hypertension was negatively related to health-related quality of life in hypertensive patients in rural western China

  12. Criterion for evaluating the predictive ability of nonlinear regression models without cross-validation.

    PubMed

    Kaneko, Hiromasa; Funatsu, Kimito

    2013-09-23

    We propose predictive performance criteria for nonlinear regression models without cross-validation. The proposed criteria are the determination coefficient and the root-mean-square error for the midpoints between k-nearest-neighbor data points. These criteria can be used to evaluate predictive ability after the regression models are updated, whereas cross-validation cannot be performed in such a situation. The proposed method is effective and helpful in handling big data when cross-validation cannot be applied. By analyzing data from numerical simulations and quantitative structural relationships, we confirm that the proposed criteria enable the predictive ability of the nonlinear regression models to be appropriately quantified.

  13. Improved model of the retardance in citric acid coated ferrofluids using stepwise regression

    NASA Astrophysics Data System (ADS)

    Lin, J. F.; Qiu, X. R.

    2017-06-01

    Citric acid (CA) coated Fe3O4 ferrofluids (FFs) have been conducted for biomedical application. The magneto-optical retardance of CA coated FFs was measured by a Stokes polarimeter. Optimization and multiple regression of retardance in FFs were executed by Taguchi method and Microsoft Excel previously, and the F value of regression model was large enough. However, the model executed by Excel was not systematic. Instead we adopted the stepwise regression to model the retardance of CA coated FFs. From the results of stepwise regression by MATLAB, the developed model had highly predictable ability owing to F of 2.55897e+7 and correlation coefficient of one. The average absolute error of predicted retardances to measured retardances was just 0.0044%. Using the genetic algorithm (GA) in MATLAB, the optimized parametric combination was determined as [4.709 0.12 39.998 70.006] corresponding to the pH of suspension, molar ratio of CA to Fe3O4, CA volume, and coating temperature. The maximum retardance was found as 31.712°, close to that obtained by evolutionary solver in Excel and a relative error of -0.013%. Above all, the stepwise regression method was successfully used to model the retardance of CA coated FFs, and the maximum global retardance was determined by the use of GA.

  14. A quadratic regression modelling on paddy production in the area of Perlis

    NASA Astrophysics Data System (ADS)

    Goh, Aizat Hanis Annas; Ali, Zalila; Nor, Norlida Mohd; Baharum, Adam; Ahmad, Wan Muhamad Amir W.

    2017-08-01

    Polynomial regression models are useful in situations in which the relationship between a response variable and predictor variables is curvilinear. Polynomial regression fits the nonlinear relationship into a least squares linear regression model by decomposing the predictor variables into a kth order polynomial. The polynomial order determines the number of inflexions on the curvilinear fitted line. A second order polynomial forms a quadratic expression (parabolic curve) with either a single maximum or minimum, a third order polynomial forms a cubic expression with both a relative maximum and a minimum. This study used paddy data in the area of Perlis to model paddy production based on paddy cultivation characteristics and environmental characteristics. The results indicated that a quadratic regression model best fits the data and paddy production is affected by urea fertilizer application and the interaction between amount of average rainfall and percentage of area defected by pest and disease. Urea fertilizer application has a quadratic effect in the model which indicated that if the number of days of urea fertilizer application increased, paddy production is expected to decrease until it achieved a minimum value and paddy production is expected to increase at higher number of days of urea application. The decrease in paddy production with an increased in rainfall is greater, the higher the percentage of area defected by pest and disease.

  15. Neural Network and Regression Soft Model Extended for PAX-300 Aircraft Engine

    NASA Technical Reports Server (NTRS)

    Patnaik, Surya N.; Hopkins, Dale A.

    2002-01-01

    In fiscal year 2001, the neural network and regression capabilities of NASA Glenn Research Center's COMETBOARDS design optimization testbed were extended to generate approximate models for the PAX-300 aircraft engine. The analytical model of the engine is defined through nine variables: the fan efficiency factor, the low pressure of the compressor, the high pressure of the compressor, the high pressure of the turbine, the low pressure of the turbine, the operating pressure, and three critical temperatures (T(sub 4), T(sub vane), and T(sub metal)). Numerical Propulsion System Simulation (NPSS) calculations of the specific fuel consumption (TSFC), as a function of the variables can become time consuming, and numerical instabilities can occur during these design calculations. "Soft" models can alleviate both deficiencies. These approximate models are generated from a set of high-fidelity input-output pairs obtained from the NPSS code and a design of the experiment strategy. A neural network and a regression model with 45 weight factors were trained for the input/output pairs. Then, the trained models were validated through a comparison with the original NPSS code. Comparisons of TSFC versus the operating pressure and of TSFC versus the three temperatures (T(sub 4), T(sub vane), and T(sub metal)) are depicted in the figures. The overall performance was satisfactory for both the regression and the neural network model. The regression model required fewer calculations than the neural network model, and it produced marginally superior results. Training the approximate methods is time consuming. Once trained, the approximate methods generated the solution with only a trivial computational effort, reducing the solution time from hours to less than a minute.

  16. Modelling Nitrogen Oxides in Los Angeles Using a Hybrid Dispersion/Land Use Regression Model

    NASA Astrophysics Data System (ADS)

    Wilton, Darren C.

    The goal of this dissertation is to develop models capable of predicting long term annual average NOx concentrations in urban areas. Predictions from simple meteorological dispersion models and seasonal proxies for NO2 oxidation were included as covariates in a land use regression (LUR) model for NOx in Los Angeles, CA. The NO x measurements were obtained from a comprehensive measurement campaign that is part of the Multi-Ethnic Study of Atherosclerosis Air Pollution Study (MESA Air). Simple land use regression models were initially developed using a suite of GIS-derived land use variables developed from various buffer sizes (R²=0.15). Caline3, a simple steady-state Gaussian line source model, was initially incorporated into the land-use regression framework. The addition of this spatio-temporally varying Caline3 covariate improved the simple LUR model predictions. The extent of improvement was much more pronounced for models based solely on the summer measurements (simple LUR: R²=0.45; Caline3/LUR: R²=0.70), than it was for models based on all seasons (R²=0.20). We then used a Lagrangian dispersion model to convert static land use covariates for population density, commercial/industrial area into spatially and temporally varying covariates. The inclusion of these covariates resulted in significant improvement in model prediction (R²=0.57). In addition to the dispersion model covariates described above, a two-week average value of daily peak-hour ozone was included as a surrogate of the oxidation of NO2 during the different sampling periods. This additional covariate further improved overall model performance for all models. The best model by 10-fold cross validation (R²=0.73) contained the Caline3 prediction, a static covariate for length of A3 roads within 50 meters, the Calpuff-adjusted covariates derived from both population density and industrial/commercial land area, and the ozone covariate. This model was tested against annual average NOx

  17. Accounting for spatial effects in land use regression for urban air pollution modeling.

    PubMed

    Bertazzon, Stefania; Johnson, Markey; Eccles, Kristin; Kaplan, Gilaad G

    2015-01-01

    In order to accurately assess air pollution risks, health studies require spatially resolved pollution concentrations. Land-use regression (LUR) models estimate ambient concentrations at a fine spatial scale. However, spatial effects such as spatial non-stationarity and spatial autocorrelation can reduce the accuracy of LUR estimates by increasing regression errors and uncertainty; and statistical methods for resolving these effects--e.g., spatially autoregressive (SAR) and geographically weighted regression (GWR) models--may be difficult to apply simultaneously. We used an alternate approach to address spatial non-stationarity and spatial autocorrelation in LUR models for nitrogen dioxide. Traditional models were re-specified to include a variable capturing wind speed and direction, and re-fit as GWR models. Mean R(2) values for the resulting GWR-wind models (summer: 0.86, winter: 0.73) showed a 10-20% improvement over traditional LUR models. GWR-wind models effectively addressed both spatial effects and produced meaningful predictive models. These results suggest a useful method for improving spatially explicit models. Copyright © 2015 The Authors. Published by Elsevier Ltd.. All rights reserved.

  18. The prediction of intelligence in preschool children using alternative models to regression.

    PubMed

    Finch, W Holmes; Chang, Mei; Davis, Andrew S; Holden, Jocelyn E; Rothlisberg, Barbara A; McIntosh, David E

    2011-12-01

    Statistical prediction of an outcome variable using multiple independent variables is a common practice in the social and behavioral sciences. For example, neuropsychologists are sometimes called upon to provide predictions of preinjury cognitive functioning for individuals who have suffered a traumatic brain injury. Typically, these predictions are made using standard multiple linear regression models with several demographic variables (e.g., gender, ethnicity, education level) as predictors. Prior research has shown conflicting evidence regarding the ability of such models to provide accurate predictions of outcome variables such as full-scale intelligence (FSIQ) test scores. The present study had two goals: (1) to demonstrate the utility of a set of alternative prediction methods that have been applied extensively in the natural sciences and business but have not been frequently explored in the social sciences and (2) to develop models that can be used to predict premorbid cognitive functioning in preschool children. Predictions of Stanford-Binet 5 FSIQ scores for preschool-aged children is used to compare the performance of a multiple regression model with several of these alternative methods. Results demonstrate that classification and regression trees provided more accurate predictions of FSIQ scores than does the more traditional regression approach. Implications of these results are discussed.

  19. A primer for biomedical scientists on how to execute model II linear regression analysis.

    PubMed

    Ludbrook, John

    2012-04-01

    1. There are two very different ways of executing linear regression analysis. One is Model I, when the x-values are fixed by the experimenter. The other is Model II, in which the x-values are free to vary and are subject to error. 2. I have received numerous complaints from biomedical scientists that they have great difficulty in executing Model II linear regression analysis. This may explain the results of a Google Scholar search, which showed that the authors of articles in journals of physiology, pharmacology and biochemistry rarely use Model II regression analysis. 3. I repeat my previous arguments in favour of using least products linear regression analysis for Model II regressions. I review three methods for executing ordinary least products (OLP) and weighted least products (WLP) regression analysis: (i) scientific calculator and/or computer spreadsheet; (ii) specific purpose computer programs; and (iii) general purpose computer programs. 4. Using a scientific calculator and/or computer spreadsheet, it is easy to obtain correct values for OLP slope and intercept, but the corresponding 95% confidence intervals (CI) are inaccurate. 5. Using specific purpose computer programs, the freeware computer program smatr gives the correct OLP regression coefficients and obtains 95% CI by bootstrapping. In addition, smatr can be used to compare the slopes of OLP lines. 6. When using general purpose computer programs, I recommend the commercial programs systat and Statistica for those who regularly undertake linear regression analysis and I give step-by-step instructions in the Supplementary Information as to how to use loss functions. © 2011 The Author. Clinical and Experimental Pharmacology and Physiology. © 2011 Blackwell Publishing Asia Pty Ltd.

  20. REGRESSION MODELS OF RESIDENTIAL EXPOSURE TO CHLORPYRIFOS AND DIAZINON

    EPA Science Inventory

    This study examines the ability of regression models to predict residential exposures to chlorpyrifos and diazinon, based on the information from the NHEXAS-AZ database. The robust method was used to generate "fill-in" values for samples that are below the detection l...

  1. Properties of added variable plots in Cox's regression model.

    PubMed

    Lindkvist, M

    2000-03-01

    The added variable plot is useful for examining the effect of a covariate in regression models. The plot provides information regarding the inclusion of a covariate, and is useful in identifying influential observations on the parameter estimates. Hall et al. (1996) proposed a plot for Cox's proportional hazards model derived by regarding the Cox model as a generalized linear model. This paper proves and discusses properties of this plot. These properties make the plot a valuable tool in model evaluation. Quantities considered include parameter estimates, residuals, leverage, case influence measures and correspondence to previously proposed residuals and diagnostics.

  2. Predictive and mechanistic multivariate linear regression models for reaction development

    PubMed Central

    Santiago, Celine B.; Guo, Jing-Yao

    2018-01-01

    Multivariate Linear Regression (MLR) models utilizing computationally-derived and empirically-derived physical organic molecular descriptors are described in this review. Several reports demonstrating the effectiveness of this methodological approach towards reaction optimization and mechanistic interrogation are discussed. A detailed protocol to access quantitative and predictive MLR models is provided as a guide for model development and parameter analysis. PMID:29719711

  3. Detection of Cutting Tool Wear using Statistical Analysis and Regression Model

    NASA Astrophysics Data System (ADS)

    Ghani, Jaharah A.; Rizal, Muhammad; Nuawi, Mohd Zaki; Haron, Che Hassan Che; Ramli, Rizauddin

    2010-10-01

    This study presents a new method for detecting the cutting tool wear based on the measured cutting force signals. A statistical-based method called Integrated Kurtosis-based Algorithm for Z-Filter technique, called I-kaz was used for developing a regression model and 3D graphic presentation of I-kaz 3D coefficient during machining process. The machining tests were carried out using a CNC turning machine Colchester Master Tornado T4 in dry cutting condition. A Kistler 9255B dynamometer was used to measure the cutting force signals, which were transmitted, analyzed, and displayed in the DasyLab software. Various force signals from machining operation were analyzed, and each has its own I-kaz 3D coefficient. This coefficient was examined and its relationship with flank wear lands (VB) was determined. A regression model was developed due to this relationship, and results of the regression model shows that the I-kaz 3D coefficient value decreases as tool wear increases. The result then is used for real time tool wear monitoring.

  4. Gender difference in the association between food away-from-home consumption and body weight outcomes among Chinese adults.

    PubMed

    Du, Wen-Wen; Zhang, Bing; Wang, Hui-Jun; Wang, Zhi-Hong; Su, Chang; Zhang, Ji-Guo; Zhang, Ji; Jia, Xiao-Fang; Jiang, Hong-Ru

    2016-11-01

    The present study aimed to explore the associations between food away-from-home (FAFH) consumption and body weight outcomes among Chinese adults. FAFH was defined as food prepared at restaurants and the percentage of energy from FAFH was calculated. Measured BMI and waist circumference (WC) were used as body weight outcomes. Quantile regression models for BMI and WC were performed separately by gender. Information on demographic, socio-economic, diet and health parameters at individual, household and community levels was collected in twelve provinces of China. A cross-sectional sample of 7738 non-pregnant individuals aged 18-60 years from the China Health and Nutrition Survey 2011 was analysed. For males, quantile regression models showed that percentage of energy from FAFH was associated with an increase in BMI of 0·01, 0·01, 0·01, 0·02, 0·02 and 0·03 kg/m2 at the 5th, 25th, 50th, 75th, 90th and 95th quantile, and an increase in WC of 0·04, 0·06, 0·06, 0·04, 0·06, 0·05 and 0·07 cm at the 5th, 10th, 25th, 50th, 75th, 90th and 95th quantile. For females, percentage of energy from FAFH was associated with 0·01, 0·01, 0·01 and 0·02 kg/m2 increase in BMI at the 10th, 25th, 90th and 95th quantile, and with 0·05, 0·04, 0·03 and 0·03 cm increase in WC at the 5th, 10th, 25th and 75th quantile. Our findings suggest that FAFH consumption is relatively more important for BMI and WC among males rather than females in China. Public health initiatives are needed to encourage Chinese adults to make healthy food choices when eating out.

  5. Efficient least angle regression for identification of linear-in-the-parameters models

    PubMed Central

    Beach, Thomas H.; Rezgui, Yacine

    2017-01-01

    Least angle regression, as a promising model selection method, differentiates itself from conventional stepwise and stagewise methods, in that it is neither too greedy nor too slow. It is closely related to L1 norm optimization, which has the advantage of low prediction variance through sacrificing part of model bias property in order to enhance model generalization capability. In this paper, we propose an efficient least angle regression algorithm for model selection for a large class of linear-in-the-parameters models with the purpose of accelerating the model selection process. The entire algorithm works completely in a recursive manner, where the correlations between model terms and residuals, the evolving directions and other pertinent variables are derived explicitly and updated successively at every subset selection step. The model coefficients are only computed when the algorithm finishes. The direct involvement of matrix inversions is thereby relieved. A detailed computational complexity analysis indicates that the proposed algorithm possesses significant computational efficiency, compared with the original approach where the well-known efficient Cholesky decomposition is involved in solving least angle regression. Three artificial and real-world examples are employed to demonstrate the effectiveness, efficiency and numerical stability of the proposed algorithm. PMID:28293140

  6. Robust neural network with applications to credit portfolio data analysis.

    PubMed

    Feng, Yijia; Li, Runze; Sudjianto, Agus; Zhang, Yiyun

    2010-01-01

    In this article, we study nonparametric conditional quantile estimation via neural network structure. We proposed an estimation method that combines quantile regression and neural network (robust neural network, RNN). It provides good smoothing performance in the presence of outliers and can be used to construct prediction bands. A Majorization-Minimization (MM) algorithm was developed for optimization. Monte Carlo simulation study is conducted to assess the performance of RNN. Comparison with other nonparametric regression methods (e.g., local linear regression and regression splines) in real data application demonstrate the advantage of the newly proposed procedure.

  7. Regression models to predict hip joint centers in pathological hip population.

    PubMed

    Mantovani, Giulia; Ng, K C Geoffrey; Lamontagne, Mario

    2016-02-01

    The purpose was to investigate the validity of Harrington's and Davis's hip joint center (HJC) regression equations on a population affected by a hip deformity, (i.e., femoroacetabular impingement). Sixty-seven participants (21 healthy controls, 46 with a cam-type deformity) underwent pelvic CT imaging. Relevant bony landmarks and geometric HJCs were digitized from the images, and skin thickness was measured for the anterior and posterior superior iliac spines. Non-parametric statistical and Bland-Altman tests analyzed differences between the predicted HJC (from regression equations) and the actual HJC (from CT images). The error from Davis's model (25.0 ± 6.7 mm) was larger than Harrington's (12.3 ± 5.9 mm, p<0.001). There were no differences between groups, thus, studies on femoroacetabular impingement can implement conventional regression models. Measured skin thickness was 9.7 ± 7.0mm and 19.6 ± 10.9 mm for the anterior and posterior bony landmarks, respectively, and correlated with body mass index. Skin thickness estimates can be considered to reduce the systematic error introduced by surface markers. New adult-specific regression equations were developed from the CT dataset, with the hypothesis that they could provide better estimates when tuned to a larger adult-specific dataset. The linear models were validated on external datasets and using leave-one-out cross-validation techniques; Prediction errors were comparable to those of Harrington's model, despite the adult-specific population and the larger sample size, thus, prediction accuracy obtained from these parameters could not be improved. Copyright © 2015 Elsevier B.V. All rights reserved.

  8. Two levels ARIMAX and regression models for forecasting time series data with calendar variation effects

    NASA Astrophysics Data System (ADS)

    Suhartono, Lee, Muhammad Hisyam; Prastyo, Dedy Dwi

    2015-12-01

    The aim of this research is to develop a calendar variation model for forecasting retail sales data with the Eid ul-Fitr effect. The proposed model is based on two methods, namely two levels ARIMAX and regression methods. Two levels ARIMAX and regression models are built by using ARIMAX for the first level and regression for the second level. Monthly men's jeans and women's trousers sales in a retail company for the period January 2002 to September 2009 are used as case study. In general, two levels of calendar variation model yields two models, namely the first model to reconstruct the sales pattern that already occurred, and the second model to forecast the effect of increasing sales due to Eid ul-Fitr that affected sales at the same and the previous months. The results show that the proposed two level calendar variation model based on ARIMAX and regression methods yields better forecast compared to the seasonal ARIMA model and Neural Networks.

  9. Establishment of Biological Reference Intervals and Reference Curve for Urea by Exploratory Parametric and Non-Parametric Quantile Regression Models.

    PubMed

    Sarkar, Rajarshi

    2013-07-01

    The validity of the entire renal function tests as a diagnostic tool depends substantially on the Biological Reference Interval (BRI) of urea. Establishment of BRI of urea is difficult partly because exclusion criteria for selection of reference data are quite rigid and partly due to the compartmentalization considerations regarding age and sex of the reference individuals. Moreover, construction of Biological Reference Curve (BRC) of urea is imperative to highlight the partitioning requirements. This a priori study examines the data collected by measuring serum urea of 3202 age and sex matched individuals, aged between 1 and 80 years, by a kinetic UV Urease/GLDH method on a Roche Cobas 6000 auto-analyzer. Mann-Whitney U test of the reference data confirmed the partitioning requirement by both age and sex. Further statistical analysis revealed the incompatibility of the data for a proposed parametric model. Hence the data was non-parametrically analysed. BRI was found to be identical for both sexes till the 2(nd) decade, and the BRI for males increased progressively 6(th) decade onwards. Four non-parametric models were postulated for construction of BRC: Gaussian kernel, double kernel, local mean and local constant, of which the last one generated the best-fitting curves. Clinical decision making should become easier and diagnostic implications of renal function tests should become more meaningful if this BRI is followed and the BRC is used as a desktop tool in conjunction with similar data for serum creatinine.

  10. Nonlinear-regression groundwater flow modeling of a deep regional aquifer system

    USGS Publications Warehouse

    Cooley, Richard L.; Konikow, Leonard F.; Naff, Richard L.

    1986-01-01

    A nonlinear regression groundwater flow model, based on a Galerkin finite-element discretization, was used to analyze steady state two-dimensional groundwater flow in the areally extensive Madison aquifer in a 75,000 mi2 area of the Northern Great Plains. Regression parameters estimated include intrinsic permeabilities of the main aquifer and separate lineament zones, discharges from eight major springs surrounding the Black Hills, and specified heads on the model boundaries. Aquifer thickness and temperature variations were included as specified functions. The regression model was applied using sequential F testing so that the fewest number and simplest zonation of intrinsic permeabilities, combined with the simplest overall model, were evaluated initially; additional complexities (such as subdivisions of zones and variations in temperature and thickness) were added in stages to evaluate the subsequent degree of improvement in the model results. It was found that only the eight major springs, a single main aquifer intrinsic permeability, two separate lineament intrinsic permeabilities of much smaller values, and temperature variations are warranted by the observed data (hydraulic heads and prior information on some parameters) for inclusion in a model that attempts to explain significant controls on groundwater flow. Addition of thickness variations did not significantly improve model results; however, thickness variations were included in the final model because they are fairly well defined. Effects on the observed head distribution from other features, such as vertical leakage and regional variations in intrinsic permeability, apparently were overshadowed by measurement errors in the observed heads. Estimates of the parameters correspond well to estimates obtained from other independent sources.

  11. Nonlinear-Regression Groundwater Flow Modeling of a Deep Regional Aquifer System

    NASA Astrophysics Data System (ADS)

    Cooley, Richard L.; Konikow, Leonard F.; Naff, Richard L.

    1986-12-01

    A nonlinear regression groundwater flow model, based on a Galerkin finite-element discretization, was used to analyze steady state two-dimensional groundwater flow in the areally extensive Madison aquifer in a 75,000 mi2 area of the Northern Great Plains. Regression parameters estimated include intrinsic permeabilities of the main aquifer and separate lineament zones, discharges from eight major springs surrounding the Black Hills, and specified heads on the model boundaries. Aquifer thickness and temperature variations were included as specified functions. The regression model was applied using sequential F testing so that the fewest number and simplest zonation of intrinsic permeabilities, combined with the simplest overall model, were evaluated initially; additional complexities (such as subdivisions of zones and variations in temperature and thickness) were added in stages to evaluate the subsequent degree of improvement in the model results. It was found that only the eight major springs, a single main aquifer intrinsic permeability, two separate lineament intrinsic permeabilities of much smaller values, and temperature variations are warranted by the observed data (hydraulic heads and prior information on some parameters) for inclusion in a model that attempts to explain significant controls on groundwater flow. Addition of thickness variations did not significantly improve model results; however, thickness variations were included in the final model because they are fairly well defined. Effects on the observed head distribution from other features, such as vertical leakage and regional variations in intrinsic permeability, apparently were overshadowed by measurement errors in the observed heads. Estimates of the parameters correspond well to estimates obtained from other independent sources.

  12. Regional regression models of watershed suspended-sediment discharge for the eastern United States

    NASA Astrophysics Data System (ADS)

    Roman, David C.; Vogel, Richard M.; Schwarz, Gregory E.

    2012-11-01

    SummaryEstimates of mean annual watershed sediment discharge, derived from long-term measurements of suspended-sediment concentration and streamflow, often are not available at locations of interest. The goal of this study was to develop multivariate regression models to enable prediction of mean annual suspended-sediment discharge from available basin characteristics useful for most ungaged river locations in the eastern United States. The models are based on long-term mean sediment discharge estimates and explanatory variables obtained from a combined dataset of 1201 US Geological Survey (USGS) stations derived from a SPAtially Referenced Regression on Watershed attributes (SPARROW) study and the Geospatial Attributes of Gages for Evaluating Streamflow (GAGES) database. The resulting regional regression models summarized for major US water resources regions 1-8, exhibited prediction R2 values ranging from 76.9% to 92.7% and corresponding average model prediction errors ranging from 56.5% to 124.3%. Results from cross-validation experiments suggest that a majority of the models will perform similarly to calibration runs. The 36-parameter regional regression models also outperformed a 16-parameter national SPARROW model of suspended-sediment discharge and indicate that mean annual sediment loads in the eastern United States generally correlates with a combination of basin area, land use patterns, seasonal precipitation, soil composition, hydrologic modification, and to a lesser extent, topography.

  13. Regional regression models of watershed suspended-sediment discharge for the eastern United States

    USGS Publications Warehouse

    Roman, David C.; Vogel, Richard M.; Schwarz, Gregory E.

    2012-01-01

    Estimates of mean annual watershed sediment discharge, derived from long-term measurements of suspended-sediment concentration and streamflow, often are not available at locations of interest. The goal of this study was to develop multivariate regression models to enable prediction of mean annual suspended-sediment discharge from available basin characteristics useful for most ungaged river locations in the eastern United States. The models are based on long-term mean sediment discharge estimates and explanatory variables obtained from a combined dataset of 1201 US Geological Survey (USGS) stations derived from a SPAtially Referenced Regression on Watershed attributes (SPARROW) study and the Geospatial Attributes of Gages for Evaluating Streamflow (GAGES) database. The resulting regional regression models summarized for major US water resources regions 1–8, exhibited prediction R2 values ranging from 76.9% to 92.7% and corresponding average model prediction errors ranging from 56.5% to 124.3%. Results from cross-validation experiments suggest that a majority of the models will perform similarly to calibration runs. The 36-parameter regional regression models also outperformed a 16-parameter national SPARROW model of suspended-sediment discharge and indicate that mean annual sediment loads in the eastern United States generally correlates with a combination of basin area, land use patterns, seasonal precipitation, soil composition, hydrologic modification, and to a lesser extent, topography.

  14. A general framework for the use of logistic regression models in meta-analysis.

    PubMed

    Simmonds, Mark C; Higgins, Julian Pt

    2016-12-01

    Where individual participant data are available for every randomised trial in a meta-analysis of dichotomous event outcomes, "one-stage" random-effects logistic regression models have been proposed as a way to analyse these data. Such models can also be used even when individual participant data are not available and we have only summary contingency table data. One benefit of this one-stage regression model over conventional meta-analysis methods is that it maximises the correct binomial likelihood for the data and so does not require the common assumption that effect estimates are normally distributed. A second benefit of using this model is that it may be applied, with only minor modification, in a range of meta-analytic scenarios, including meta-regression, network meta-analyses and meta-analyses of diagnostic test accuracy. This single model can potentially replace the variety of often complex methods used in these areas. This paper considers, with a range of meta-analysis examples, how random-effects logistic regression models may be used in a number of different types of meta-analyses. This one-stage approach is compared with widely used meta-analysis methods including Bayesian network meta-analysis and the bivariate and hierarchical summary receiver operating characteristic (ROC) models for meta-analyses of diagnostic test accuracy. © The Author(s) 2014.

  15. Regression analysis of informative current status data with the additive hazards model.

    PubMed

    Zhao, Shishun; Hu, Tao; Ma, Ling; Wang, Peijie; Sun, Jianguo

    2015-04-01

    This paper discusses regression analysis of current status failure time data arising from the additive hazards model in the presence of informative censoring. Many methods have been developed for regression analysis of current status data under various regression models if the censoring is noninformative, and also there exists a large literature on parametric analysis of informative current status data in the context of tumorgenicity experiments. In this paper, a semiparametric maximum likelihood estimation procedure is presented and in the method, the copula model is employed to describe the relationship between the failure time of interest and the censoring time. Furthermore, I-splines are used to approximate the nonparametric functions involved and the asymptotic consistency and normality of the proposed estimators are established. A simulation study is conducted and indicates that the proposed approach works well for practical situations. An illustrative example is also provided.

  16. Differential gene expression detection and sample classification using penalized linear regression models.

    PubMed

    Wu, Baolin

    2006-02-15

    Differential gene expression detection and sample classification using microarray data have received much research interest recently. Owing to the large number of genes p and small number of samples n (p > n), microarray data analysis poses big challenges for statistical analysis. An obvious problem owing to the 'large p small n' is over-fitting. Just by chance, we are likely to find some non-differentially expressed genes that can classify the samples very well. The idea of shrinkage is to regularize the model parameters to reduce the effects of noise and produce reliable inferences. Shrinkage has been successfully applied in the microarray data analysis. The SAM statistics proposed by Tusher et al. and the 'nearest shrunken centroid' proposed by Tibshirani et al. are ad hoc shrinkage methods. Both methods are simple, intuitive and prove to be useful in empirical studies. Recently Wu proposed the penalized t/F-statistics with shrinkage by formally using the (1) penalized linear regression models for two-class microarray data, showing good performance. In this paper we systematically discussed the use of penalized regression models for analyzing microarray data. We generalize the two-class penalized t/F-statistics proposed by Wu to multi-class microarray data. We formally derive the ad hoc shrunken centroid used by Tibshirani et al. using the (1) penalized regression models. And we show that the penalized linear regression models provide a rigorous and unified statistical framework for sample classification and differential gene expression detection.

  17. Prediction of unwanted pregnancies using logistic regression, probit regression and discriminant analysis

    PubMed Central

    Ebrahimzadeh, Farzad; Hajizadeh, Ebrahim; Vahabi, Nasim; Almasian, Mohammad; Bakhteyar, Katayoon

    2015-01-01

    Background: Unwanted pregnancy not intended by at least one of the parents has undesirable consequences for the family and the society. In the present study, three classification models were used and compared to predict unwanted pregnancies in an urban population. Methods: In this cross-sectional study, 887 pregnant mothers referring to health centers in Khorramabad, Iran, in 2012 were selected by the stratified and cluster sampling; relevant variables were measured and for prediction of unwanted pregnancy, logistic regression, discriminant analysis, and probit regression models and SPSS software version 21 were used. To compare these models, indicators such as sensitivity, specificity, the area under the ROC curve, and the percentage of correct predictions were used. Results: The prevalence of unwanted pregnancies was 25.3%. The logistic and probit regression models indicated that parity and pregnancy spacing, contraceptive methods, household income and number of living male children were related to unwanted pregnancy. The performance of the models based on the area under the ROC curve was 0.735, 0.733, and 0.680 for logistic regression, probit regression, and linear discriminant analysis, respectively. Conclusion: Given the relatively high prevalence of unwanted pregnancies in Khorramabad, it seems necessary to revise family planning programs. Despite the similar accuracy of the models, if the researcher is interested in the interpretability of the results, the use of the logistic regression model is recommended. PMID:26793655

  18. Prediction of unwanted pregnancies using logistic regression, probit regression and discriminant analysis.

    PubMed

    Ebrahimzadeh, Farzad; Hajizadeh, Ebrahim; Vahabi, Nasim; Almasian, Mohammad; Bakhteyar, Katayoon

    2015-01-01

    Unwanted pregnancy not intended by at least one of the parents has undesirable consequences for the family and the society. In the present study, three classification models were used and compared to predict unwanted pregnancies in an urban population. In this cross-sectional study, 887 pregnant mothers referring to health centers in Khorramabad, Iran, in 2012 were selected by the stratified and cluster sampling; relevant variables were measured and for prediction of unwanted pregnancy, logistic regression, discriminant analysis, and probit regression models and SPSS software version 21 were used. To compare these models, indicators such as sensitivity, specificity, the area under the ROC curve, and the percentage of correct predictions were used. The prevalence of unwanted pregnancies was 25.3%. The logistic and probit regression models indicated that parity and pregnancy spacing, contraceptive methods, household income and number of living male children were related to unwanted pregnancy. The performance of the models based on the area under the ROC curve was 0.735, 0.733, and 0.680 for logistic regression, probit regression, and linear discriminant analysis, respectively. Given the relatively high prevalence of unwanted pregnancies in Khorramabad, it seems necessary to revise family planning programs. Despite the similar accuracy of the models, if the researcher is interested in the interpretability of the results, the use of the logistic regression model is recommended.

  19. Convergent Time-Varying Regression Models for Data Streams: Tracking Concept Drift by the Recursive Parzen-Based Generalized Regression Neural Networks.

    PubMed

    Duda, Piotr; Jaworski, Maciej; Rutkowski, Leszek

    2018-03-01

    One of the greatest challenges in data mining is related to processing and analysis of massive data streams. Contrary to traditional static data mining problems, data streams require that each element is processed only once, the amount of allocated memory is constant and the models incorporate changes of investigated streams. A vast majority of available methods have been developed for data stream classification and only a few of them attempted to solve regression problems, using various heuristic approaches. In this paper, we develop mathematically justified regression models working in a time-varying environment. More specifically, we study incremental versions of generalized regression neural networks, called IGRNNs, and we prove their tracking properties - weak (in probability) and strong (with probability one) convergence assuming various concept drift scenarios. First, we present the IGRNNs, based on the Parzen kernels, for modeling stationary systems under nonstationary noise. Next, we extend our approach to modeling time-varying systems under nonstationary noise. We present several types of concept drifts to be handled by our approach in such a way that weak and strong convergence holds under certain conditions. Finally, in the series of simulations, we compare our method with commonly used heuristic approaches, based on forgetting mechanism or sliding windows, to deal with concept drift. Finally, we apply our concept in a real life scenario solving the problem of currency exchange rates prediction.

  20. A simulation study on Bayesian Ridge regression models for several collinearity levels

    NASA Astrophysics Data System (ADS)

    Efendi, Achmad; Effrihan

    2017-12-01

    When analyzing data with multiple regression model if there are collinearities, then one or several predictor variables are usually omitted from the model. However, there sometimes some reasons, for instance medical or economic reasons, the predictors are all important and should be included in the model. Ridge regression model is not uncommon in some researches to use to cope with collinearity. Through this modeling, weights for predictor variables are used for estimating parameters. The next estimation process could follow the concept of likelihood. Furthermore, for the estimation nowadays the Bayesian version could be an alternative. This estimation method does not match likelihood one in terms of popularity due to some difficulties; computation and so forth. Nevertheless, with the growing improvement of computational methodology recently, this caveat should not at the moment become a problem. This paper discusses about simulation process for evaluating the characteristic of Bayesian Ridge regression parameter estimates. There are several simulation settings based on variety of collinearity levels and sample sizes. The results show that Bayesian method gives better performance for relatively small sample sizes, and for other settings the method does perform relatively similar to the likelihood method.

  1. A hybrid neural network model for noisy data regression.

    PubMed

    Lee, Eric W M; Lim, Chee Peng; Yuen, Richard K K; Lo, S M

    2004-04-01

    A hybrid neural network model, based on the fusion of fuzzy adaptive resonance theory (FA ART) and the general regression neural network (GRNN), is proposed in this paper. Both FA and the GRNN are incremental learning systems and are very fast in network training. The proposed hybrid model, denoted as GRNNFA, is able to retain these advantages and, at the same time, to reduce the computational requirements in calculating and storing information of the kernels. A clustering version of the GRNN is designed with data compression by FA for noise removal. An adaptive gradient-based kernel width optimization algorithm has also been devised. Convergence of the gradient descent algorithm can be accelerated by the geometric incremental growth of the updating factor. A series of experiments with four benchmark datasets have been conducted to assess and compare effectiveness of GRNNFA with other approaches. The GRNNFA model is also employed in a novel application task for predicting the evacuation time of patrons at typical karaoke centers in Hong Kong in the event of fire. The results positively demonstrate the applicability of GRNNFA in noisy data regression problems.

  2. Prediction models for CO2 emission in Malaysia using best subsets regression and multi-linear regression

    NASA Astrophysics Data System (ADS)

    Tan, C. H.; Matjafri, M. Z.; Lim, H. S.

    2015-10-01

    This paper presents the prediction models which analyze and compute the CO2 emission in Malaysia. Each prediction model for CO2 emission will be analyzed based on three main groups which is transportation, electricity and heat production as well as residential buildings and commercial and public services. The prediction models were generated using data obtained from World Bank Open Data. Best subset method will be used to remove irrelevant data and followed by multi linear regression to produce the prediction models. From the results, high R-square (prediction) value was obtained and this implies that the models are reliable to predict the CO2 emission by using specific data. In addition, the CO2 emissions from these three groups are forecasted using trend analysis plots for observation purpose.

  3. Development and Evaluation of Land-Use Regression Models Using Modeled Air Quality Concentrations

    EPA Science Inventory

    Abstract Land-use regression (LUR) models have emerged as a preferred methodology for estimating individual exposure to ambient air pollution in epidemiologic studies in absence of subject-specific measurements. Although there is a growing literature focused on LUR evaluation, fu...

  4. Extended cox regression model: The choice of timefunction

    NASA Astrophysics Data System (ADS)

    Isik, Hatice; Tutkun, Nihal Ata; Karasoy, Durdu

    2017-07-01

    Cox regression model (CRM), which takes into account the effect of censored observations, is one the most applicative and usedmodels in survival analysis to evaluate the effects of covariates. Proportional hazard (PH), requires a constant hazard ratio over time, is the assumptionofCRM. Using extended CRM provides the test of including a time dependent covariate to assess the PH assumption or an alternative model in case of nonproportional hazards. In this study, the different types of real data sets are used to choose the time function and the differences between time functions are analyzed and discussed.

  5. Poisson regression models outperform the geometrical model in estimating the peak-to-trough ratio of seasonal variation: a simulation study.

    PubMed

    Christensen, A L; Lundbye-Christensen, S; Dethlefsen, C

    2011-12-01

    Several statistical methods of assessing seasonal variation are available. Brookhart and Rothman [3] proposed a second-order moment-based estimator based on the geometrical model derived by Edwards [1], and reported that this estimator is superior in estimating the peak-to-trough ratio of seasonal variation compared with Edwards' estimator with respect to bias and mean squared error. Alternatively, seasonal variation may be modelled using a Poisson regression model, which provides flexibility in modelling the pattern of seasonal variation and adjustments for covariates. Based on a Monte Carlo simulation study three estimators, one based on the geometrical model, and two based on log-linear Poisson regression models, were evaluated in regards to bias and standard deviation (SD). We evaluated the estimators on data simulated according to schemes varying in seasonal variation and presence of a secular trend. All methods and analyses in this paper are available in the R package Peak2Trough[13]. Applying a Poisson regression model resulted in lower absolute bias and SD for data simulated according to the corresponding model assumptions. Poisson regression models had lower bias and SD for data simulated to deviate from the corresponding model assumptions than the geometrical model. This simulation study encourages the use of Poisson regression models in estimating the peak-to-trough ratio of seasonal variation as opposed to the geometrical model. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.

  6. Hazard Regression Models of Early Mortality in Trauma Centers

    PubMed Central

    Clark, David E; Qian, Jing; Winchell, Robert J; Betensky, Rebecca A

    2013-01-01

    Background Factors affecting early hospital deaths after trauma may be different from factors affecting later hospital deaths, and the distribution of short and long prehospital times may vary among hospitals. Hazard regression (HR) models may therefore be more useful than logistic regression (LR) models for analysis of trauma mortality, especially when treatment effects at different time points are of interest. Study Design We obtained data for trauma center patients from the 2008–9 National Trauma Data Bank (NTDB). Cases were included if they had complete data for prehospital times, hospital times, survival outcome, age, vital signs, and severity scores. Cases were excluded if pulseless on admission, transferred in or out, or ISS<9. Using covariates proposed for the Trauma Quality Improvement Program and an indicator for each hospital, we compared LR models predicting survival at 8 hours after injury to HR models with survival censored at 8 hours. HR models were then modified to allow time-varying hospital effects. Results 85,327 patients in 161 hospitals met inclusion criteria. Crude hazards peaked initially, then steadily declined. When hazard ratios were assumed constant in HR models, they were similar to odds ratios in LR models associating increased mortality with increased age, firearm mechanism, increased severity, more deranged physiology, and estimated hospital-specific effects. However, when hospital effects were allowed to vary by time, HR models demonstrated that hospital outliers were not the same at different times after injury. Conclusions HR models with time-varying hazard ratios reveal inconsistencies in treatment effects, data quality, and/or timing of early death among trauma centers. HR models are generally more flexible than LR models, can be adapted for censored data, and potentially offer a better tool for analysis of factors affecting early death after injury. PMID:23036828

  7. Testing a single regression coefficient in high dimensional linear models

    PubMed Central

    Zhong, Ping-Shou; Li, Runze; Wang, Hansheng; Tsai, Chih-Ling

    2017-01-01

    In linear regression models with high dimensional data, the classical z-test (or t-test) for testing the significance of each single regression coefficient is no longer applicable. This is mainly because the number of covariates exceeds the sample size. In this paper, we propose a simple and novel alternative by introducing the Correlated Predictors Screening (CPS) method to control for predictors that are highly correlated with the target covariate. Accordingly, the classical ordinary least squares approach can be employed to estimate the regression coefficient associated with the target covariate. In addition, we demonstrate that the resulting estimator is consistent and asymptotically normal even if the random errors are heteroscedastic. This enables us to apply the z-test to assess the significance of each covariate. Based on the p-value obtained from testing the significance of each covariate, we further conduct multiple hypothesis testing by controlling the false discovery rate at the nominal level. Then, we show that the multiple hypothesis testing achieves consistent model selection. Simulation studies and empirical examples are presented to illustrate the finite sample performance and the usefulness of the proposed method, respectively. PMID:28663668

  8. Testing a single regression coefficient in high dimensional linear models.

    PubMed

    Lan, Wei; Zhong, Ping-Shou; Li, Runze; Wang, Hansheng; Tsai, Chih-Ling

    2016-11-01

    In linear regression models with high dimensional data, the classical z -test (or t -test) for testing the significance of each single regression coefficient is no longer applicable. This is mainly because the number of covariates exceeds the sample size. In this paper, we propose a simple and novel alternative by introducing the Correlated Predictors Screening (CPS) method to control for predictors that are highly correlated with the target covariate. Accordingly, the classical ordinary least squares approach can be employed to estimate the regression coefficient associated with the target covariate. In addition, we demonstrate that the resulting estimator is consistent and asymptotically normal even if the random errors are heteroscedastic. This enables us to apply the z -test to assess the significance of each covariate. Based on the p -value obtained from testing the significance of each covariate, we further conduct multiple hypothesis testing by controlling the false discovery rate at the nominal level. Then, we show that the multiple hypothesis testing achieves consistent model selection. Simulation studies and empirical examples are presented to illustrate the finite sample performance and the usefulness of the proposed method, respectively.

  9. Improving regression-model-based streamwater constituent load estimates derived from serially correlated data

    USGS Publications Warehouse

    Aulenbach, Brent T.

    2013-01-01

    A regression-model based approach is a commonly used, efficient method for estimating streamwater constituent load when there is a relationship between streamwater constituent concentration and continuous variables such as streamwater discharge, season and time. A subsetting experiment using a 30-year dataset of daily suspended sediment observations from the Mississippi River at Thebes, Illinois, was performed to determine optimal sampling frequency, model calibration period length, and regression model methodology, as well as to determine the effect of serial correlation of model residuals on load estimate precision. Two regression-based methods were used to estimate streamwater loads, the Adjusted Maximum Likelihood Estimator (AMLE), and the composite method, a hybrid load estimation approach. While both methods accurately and precisely estimated loads at the model’s calibration period time scale, precisions were progressively worse at shorter reporting periods, from annually to monthly. Serial correlation in model residuals resulted in observed AMLE precision to be significantly worse than the model calculated standard errors of prediction. The composite method effectively improved upon AMLE loads for shorter reporting periods, but required a sampling interval of at least 15-days or shorter, when the serial correlations in the observed load residuals were greater than 0.15. AMLE precision was better at shorter sampling intervals and when using the shortest model calibration periods, such that the regression models better fit the temporal changes in the concentration–discharge relationship. The models with the largest errors typically had poor high flow sampling coverage resulting in unrepresentative models. Increasing sampling frequency and/or targeted high flow sampling are more efficient approaches to ensure sufficient sampling and to avoid poorly performing models, than increasing calibration period length.

  10. Alternative Statistical Frameworks for Student Growth Percentile Estimation

    ERIC Educational Resources Information Center

    Lockwood, J. R.; Castellano, Katherine E.

    2015-01-01

    This article suggests two alternative statistical approaches for estimating student growth percentiles (SGP). The first is to estimate percentile ranks of current test scores conditional on past test scores directly, by modeling the conditional cumulative distribution functions, rather than indirectly through quantile regressions. This would…

  11. Incremental impact of body mass status with modifiable unhealthy lifestyle behaviors on pharmaceutical expenditure.

    PubMed

    Kim, Tae Hyun; Lee, Eui-Kyung; Han, Euna

    Overweight/obesity is a growing health risk in Korea. The impact of overweight/obesity on pharmaceutical expenditure can be larger if individuals have multiple risk factors and multiple comorbidities. The current study estimated the combined effects of overweight/obesity and other unhealthy behaviors on pharmaceutical expenditure. An instrumental variable quantile regression model was estimated using Korea Health Panel Study data. The current study extracted data from 3 waves (2009, 2010, and 2011). The final sample included 7148 person-year observations for adults aged 20 years or older. Overweight/obese individuals had higher pharmaceutical expenditure than their non-obese counterparts only at the upper quantiles of the conditional distribution of pharmaceutical expenditure (by 119% at the 90th quantile and 115% at the 95th). The current study found a stronger association at the upper quantiles among men (152%, 144%, and 150% at the 75th, 90th, and 95th quantiles, respectively) than among women (152%, 150%, and 148% at the 75th, 90th, and 95th quantiles, respectively). The association at the upper quantiles was stronger when combined with moderate to heavy drinking and no regular physical check-up, particularly among males. The current study confirms that the association of overweight/obesity with modifiable unhealthy behaviors on pharmaceutical expenditure is larger than with overweight/obesity alone. Assessing the effect of overweight/obesity with lifestyle risk factors can help target groups for public health intervention programs. Copyright © 2015 Elsevier Inc. All rights reserved.

  12. Regression model estimation of early season crop proportions: North Dakota, some preliminary results

    NASA Technical Reports Server (NTRS)

    Lin, K. K. (Principal Investigator)

    1982-01-01

    To estimate crop proportions early in the season, an approach is proposed based on: use of a regression-based prediction equation to obtain an a priori estimate for specific major crop groups; modification of this estimate using current-year LANDSAT and weather data; and a breakdown of the major crop groups into specific crops by regression models. Results from the development and evaluation of appropriate regression models for the first portion of the proposed approach are presented. The results show that the model predicts 1980 crop proportions very well at both county and crop reporting district levels. In terms of planted acreage, the model underpredicted 9.1 percent of the 1980 published data on planted acreage at the county level. It predicted almost exactly the 1980 published data on planted acreage at the crop reporting district level and overpredicted the planted acreage by just 0.92 percent.

  13. Ordinal regression models to describe tourist satisfaction with Sintra's world heritage

    NASA Astrophysics Data System (ADS)

    Mouriño, Helena

    2013-10-01

    In Tourism Research, ordinal regression models are becoming a very powerful tool in modelling the relationship between an ordinal response variable and a set of explanatory variables. In August and September 2010, we conducted a pioneering Tourist Survey in Sintra, Portugal. The data were obtained by face-to-face interviews at the entrances of the Palaces and Parks of Sintra. The work developed in this paper focus on two main points: tourists' perception of the entrance fees; overall level of satisfaction with this heritage site. For attaining these goals, ordinal regression models were developed. We concluded that tourist's nationality was the only significant variable to describe the perception of the admission fees. Also, Sintra's image among tourists depends not only on their nationality, but also on previous knowledge about Sintra's World Heritage status.

  14. Uncertainty analysis of an inflow forecasting model: extension of the UNEEC machine learning-based method

    NASA Astrophysics Data System (ADS)

    Pianosi, Francesca; Lal Shrestha, Durga; Solomatine, Dimitri

    2010-05-01

    This research presents an extension of UNEEC (Uncertainty Estimation based on Local Errors and Clustering, Shrestha and Solomatine, 2006, 2008 & Solomatine and Shrestha, 2009) method in the direction of explicit inclusion of parameter uncertainty. UNEEC method assumes that there is an optimal model and the residuals of the model can be used to assess the uncertainty of the model prediction. It is assumed that all sources of uncertainty including input, parameter and model structure uncertainty are explicitly manifested in the model residuals. In this research, theses assumptions are relaxed, and the UNEEC method is extended to consider parameter uncertainty as well (abbreviated as UNEEC-P). In UNEEC-P, first we use Monte Carlo (MC) sampling in parameter space to generate N model realizations (each of which is a time series), estimate the prediction quantiles based on the empirical distribution functions of the model residuals considering all the residual realizations, and only then apply the standard UNEEC method that encapsulates the uncertainty of a hydrologic model (expressed by quantiles of the error distribution) in a machine learning model (e.g., ANN). UNEEC-P is applied first to a linear regression model of synthetic data, and then to a real case study of forecasting inflow to lake Lugano in northern Italy. The inflow forecasting model is a stochastic heteroscedastic model (Pianosi and Soncini-Sessa, 2009). The preliminary results show that the UNEEC-P method produces wider uncertainty bounds, which is consistent with the fact that the method considers also parameter uncertainty of the optimal model. In the future UNEEC method will be further extended to consider input and structure uncertainty which will provide more realistic estimation of model predictions.

  15. Breeding value accuracy estimates for growth traits using random regression and multi-trait models in Nelore cattle.

    PubMed

    Boligon, A A; Baldi, F; Mercadante, M E Z; Lobo, R B; Pereira, R J; Albuquerque, L G

    2011-06-28

    We quantified the potential increase in accuracy of expected breeding value for weights of Nelore cattle, from birth to mature age, using multi-trait and random regression models on Legendre polynomials and B-spline functions. A total of 87,712 weight records from 8144 females were used, recorded every three months from birth to mature age from the Nelore Brazil Program. For random regression analyses, all female weight records from birth to eight years of age (data set I) were considered. From this general data set, a subset was created (data set II), which included only nine weight records: at birth, weaning, 365 and 550 days of age, and 2, 3, 4, 5, and 6 years of age. Data set II was analyzed using random regression and multi-trait models. The model of analysis included the contemporary group as fixed effects and age of dam as a linear and quadratic covariable. In the random regression analyses, average growth trends were modeled using a cubic regression on orthogonal polynomials of age. Residual variances were modeled by a step function with five classes. Legendre polynomials of fourth and sixth order were utilized to model the direct genetic and animal permanent environmental effects, respectively, while third-order Legendre polynomials were considered for maternal genetic and maternal permanent environmental effects. Quadratic polynomials were applied to model all random effects in random regression models on B-spline functions. Direct genetic and animal permanent environmental effects were modeled using three segments or five coefficients, and genetic maternal and maternal permanent environmental effects were modeled with one segment or three coefficients in the random regression models on B-spline functions. For both data sets (I and II), animals ranked differently according to expected breeding value obtained by random regression or multi-trait models. With random regression models, the highest gains in accuracy were obtained at ages with a low number of

  16. Gender Gap in Mathematics and Physics in Chinese Middle Schools: A Case Study of A Beijing's District

    ERIC Educational Resources Information Center

    Li, Manli; Zhang, Yu; Wang, Yihan

    2017-01-01

    This study examines the gender gaps in mathematics and physics in Chinese middle schools. The data is from the Education Bureau management database which includes all middle school students who took high school entrance exam in a district of Beijing from 2006-2013. The ordinary least square model and quantile regression model are applied. This…

  17. Gender Gap in the National College Entrance Exam Performance in China: A Case Study of a Typical Chinese Municipality

    ERIC Educational Resources Information Center

    Zhang, Yu; Tsang, Mun

    2015-01-01

    This is one of the first studies to investigate gender achievement gap in the National College Entrance Exam in a typical municipality in China, which is the crucial examination for the transition from high school to higher education in that country. Using ordinary least square model and quantile regression model, the study consistently finds that…

  18. EMD-regression for modelling multi-scale relationships, and application to weather-related cardiovascular mortality

    NASA Astrophysics Data System (ADS)

    Masselot, Pierre; Chebana, Fateh; Bélanger, Diane; St-Hilaire, André; Abdous, Belkacem; Gosselin, Pierre; Ouarda, Taha B. M. J.

    2018-01-01

    In a number of environmental studies, relationships between natural processes are often assessed through regression analyses, using time series data. Such data are often multi-scale and non-stationary, leading to a poor accuracy of the resulting regression models and therefore to results with moderate reliability. To deal with this issue, the present paper introduces the EMD-regression methodology consisting in applying the empirical mode decomposition (EMD) algorithm on data series and then using the resulting components in regression models. The proposed methodology presents a number of advantages. First, it accounts of the issues of non-stationarity associated to the data series. Second, this approach acts as a scan for the relationship between a response variable and the predictors at different time scales, providing new insights about this relationship. To illustrate the proposed methodology it is applied to study the relationship between weather and cardiovascular mortality in Montreal, Canada. The results shed new knowledge concerning the studied relationship. For instance, they show that the humidity can cause excess mortality at the monthly time scale, which is a scale not visible in classical models. A comparison is also conducted with state of the art methods which are the generalized additive models and distributed lag models, both widely used in weather-related health studies. The comparison shows that EMD-regression achieves better prediction performances and provides more details than classical models concerning the relationship.

  19. Application of nonlinear least-squares regression to ground-water flow modeling, west-central Florida

    USGS Publications Warehouse

    Yobbi, D.K.

    2000-01-01

    A nonlinear least-squares regression technique for estimation of ground-water flow model parameters was applied to an existing model of the regional aquifer system underlying west-central Florida. The regression technique minimizes the differences between measured and simulated water levels. Regression statistics, including parameter sensitivities and correlations, were calculated for reported parameter values in the existing model. Optimal parameter values for selected hydrologic variables of interest are estimated by nonlinear regression. Optimal estimates of parameter values are about 140 times greater than and about 0.01 times less than reported values. Independently estimating all parameters by nonlinear regression was impossible, given the existing zonation structure and number of observations, because of parameter insensitivity and correlation. Although the model yields parameter values similar to those estimated by other methods and reproduces the measured water levels reasonably accurately, a simpler parameter structure should be considered. Some possible ways of improving model calibration are to: (1) modify the defined parameter-zonation structure by omitting and/or combining parameters to be estimated; (2) carefully eliminate observation data based on evidence that they are likely to be biased; (3) collect additional water-level data; (4) assign values to insensitive parameters, and (5) estimate the most sensitive parameters first, then, using the optimized values for these parameters, estimate the entire data set.

  20. A Linear Regression and Markov Chain Model for the Arabian Horse Registry

    DTIC Science & Technology

    1993-04-01

    as a tax deduction? Yes No T-4367 68 26. Regardless of previous equine tax deductions, do you consider your current horse activities to be... (Mark one...E L T-4367 A Linear Regression and Markov Chain Model For the Arabian Horse Registry Accesion For NTIS CRA&I UT 7 4:iC=D 5 D-IC JA" LI J:13tjlC,3 lO...the Arabian Horse Registry, which needed to forecast its future registration of purebred Arabian horses . A linear regression model was utilized to

  1. New robust statistical procedures for the polytomous logistic regression models.

    PubMed

    Castilla, Elena; Ghosh, Abhik; Martin, Nirian; Pardo, Leandro

    2018-05-17

    This article derives a new family of estimators, namely the minimum density power divergence estimators, as a robust generalization of the maximum likelihood estimator for the polytomous logistic regression model. Based on these estimators, a family of Wald-type test statistics for linear hypotheses is introduced. Robustness properties of both the proposed estimators and the test statistics are theoretically studied through the classical influence function analysis. Appropriate real life examples are presented to justify the requirement of suitable robust statistical procedures in place of the likelihood based inference for the polytomous logistic regression model. The validity of the theoretical results established in the article are further confirmed empirically through suitable simulation studies. Finally, an approach for the data-driven selection of the robustness tuning parameter is proposed with empirical justifications. © 2018, The International Biometric Society.

  2. Socioeconomic and ethnic inequalities in exposure to air and noise pollution in London.

    PubMed

    Tonne, Cathryn; Milà, Carles; Fecht, Daniela; Alvarez, Mar; Gulliver, John; Smith, James; Beevers, Sean; Ross Anderson, H; Kelly, Frank

    2018-06-01

    Transport-related air and noise pollution, exposures linked to adverse health outcomes, varies within cities potentially resulting in exposure inequalities. Relatively little is known regarding inequalities in personal exposure to air pollution or transport-related noise. Our objectives were to quantify socioeconomic and ethnic inequalities in London in 1) air pollution exposure at residence compared to personal exposure; and 2) transport-related noise at residence from different sources. We used individual-level data from the London Travel Demand Survey (n = 45,079) between 2006 and 2010. We modeled residential (CMAQ-urban) and personal (London Hybrid Exposure Model) particulate matter <2.5 μm and nitrogen dioxide (NO 2 ), road-traffic noise at residence (TRANEX) and identified those within 50 dB noise contours of railways and Heathrow airport. We analyzed relationships between household income, area-level income deprivation and ethnicity with air and noise pollution using quantile and logistic regression. We observed inverse patterns in inequalities in air pollution when estimated at residence versus personal exposure with respect to household income (categorical, 8 groups). Compared to the lowest income group (<£10,000), the highest group (>£75,000) had lower residential NO 2 (-1.3 (95% CI -2.1, -0.6) μg/m 3 in the 95th exposure quantile) but higher personal NO 2 exposure (1.9 (95% CI 1.6, 2.3) μg/m 3 in the 95th quantile), which was driven largely by transport mode and duration. Inequalities in residential exposure to NO 2 with respect to area-level deprivation were larger at lower exposure quantiles (e.g. estimate for NO 2 5.1 (95% CI 4.6, 5.5) at quantile 0.15 versus 1.9 (95% CI 1.1, 2.6) at quantile 0.95), reflecting low-deprivation, high residential NO 2 areas in the city centre. Air pollution exposure at residence consistently overestimated personal exposure; this overestimation varied with age, household income, and area-level income

  3. Regional estimation of extreme suspended sediment concentrations using watershed characteristics

    NASA Astrophysics Data System (ADS)

    Tramblay, Yves; Ouarda, Taha B. M. J.; St-Hilaire, André; Poulin, Jimmy

    2010-01-01

    SummaryThe number of stations monitoring daily suspended sediment concentration (SSC) has been decreasing since the 1980s in North America while suspended sediment is considered as a key variable for water quality. The objective of this study is to test the feasibility of regionalising extreme SSC, i.e. estimating SSC extremes values for ungauged basins. Annual maximum SSC for 72 rivers in Canada and USA were modelled with probability distributions in order to estimate quantiles corresponding to different return periods. Regionalisation techniques, originally developed for flood prediction in ungauged basins, were tested using the climatic, topographic, land cover and soils attributes of the watersheds. Two approaches were compared, using either physiographic characteristics or seasonality of extreme SSC to delineate the regions. Multiple regression models to estimate SSC quantiles as a function of watershed characteristics were built in each region, and compared to a global model including all sites. Regional estimates of SSC quantiles were compared with the local values. Results show that regional estimation of extreme SSC is more efficient than a global regression model including all sites. Groups/regions of stations have been identified, using either the watershed characteristics or the seasonality of occurrence for extreme SSC values providing a method to better describe the extreme events of SSC. The most important variables for predicting extreme SSC are the percentage of clay in the soils, precipitation intensity and forest cover.

  4. Random regression models using different functions to model milk flow in dairy cows.

    PubMed

    Laureano, M M M; Bignardi, A B; El Faro, L; Cardoso, V L; Tonhati, H; Albuquerque, L G

    2014-09-12

    We analyzed 75,555 test-day milk flow records from 2175 primiparous Holstein cows that calved between 1997 and 2005. Milk flow was obtained by dividing the mean milk yield (kg) of the 3 daily milking by the total milking time (min) and was expressed as kg/min. Milk flow was grouped into 43 weekly classes. The analyses were performed using a single-trait Random Regression Models that included direct additive genetic, permanent environmental, and residual random effects. In addition, the contemporary group and linear and quadratic effects of cow age at calving were included as fixed effects. Fourth-order orthogonal Legendre polynomial of days in milk was used to model the mean trend in milk flow. The additive genetic and permanent environmental covariance functions were estimated using random regression Legendre polynomials and B-spline functions of days in milk. The model using a third-order Legendre polynomial for additive genetic effects and a sixth-order polynomial for permanent environmental effects, which contained 7 residual classes, proved to be the most adequate to describe variations in milk flow, and was also the most parsimonious. The heritability in milk flow estimated by the most parsimonious model was of moderate to high magnitude.

  5. Revisiting Gaussian Process Regression Modeling for Localization in Wireless Sensor Networks

    PubMed Central

    Richter, Philipp; Toledano-Ayala, Manuel

    2015-01-01

    Signal strength-based positioning in wireless sensor networks is a key technology for seamless, ubiquitous localization, especially in areas where Global Navigation Satellite System (GNSS) signals propagate poorly. To enable wireless local area network (WLAN) location fingerprinting in larger areas while maintaining accuracy, methods to reduce the effort of radio map creation must be consolidated and automatized. Gaussian process regression has been applied to overcome this issue, also with auspicious results, but the fit of the model was never thoroughly assessed. Instead, most studies trained a readily available model, relying on the zero mean and squared exponential covariance function, without further scrutinization. This paper studies the Gaussian process regression model selection for WLAN fingerprinting in indoor and outdoor environments. We train several models for indoor/outdoor- and combined areas; we evaluate them quantitatively and compare them by means of adequate model measures, hence assessing the fit of these models directly. To illuminate the quality of the model fit, the residuals of the proposed model are investigated, as well. Comparative experiments on the positioning performance verify and conclude the model selection. In this way, we show that the standard model is not the most appropriate, discuss alternatives and present our best candidate. PMID:26370996

  6. Bayesian isotonic density regression

    PubMed Central

    Wang, Lianming; Dunson, David B.

    2011-01-01

    Density regression models allow the conditional distribution of the response given predictors to change flexibly over the predictor space. Such models are much more flexible than nonparametric mean regression models with nonparametric residual distributions, and are well supported in many applications. A rich variety of Bayesian methods have been proposed for density regression, but it is not clear whether such priors have full support so that any true data-generating model can be accurately approximated. This article develops a new class of density regression models that incorporate stochastic-ordering constraints which are natural when a response tends to increase or decrease monotonely with a predictor. Theory is developed showing large support. Methods are developed for hypothesis testing, with posterior computation relying on a simple Gibbs sampler. Frequentist properties are illustrated in a simulation study, and an epidemiology application is considered. PMID:22822259

  7. Stochastic Approximation Methods for Latent Regression Item Response Models. Research Report. ETS RR-09-09

    ERIC Educational Resources Information Center

    von Davier, Matthias; Sinharay, Sandip

    2009-01-01

    This paper presents an application of a stochastic approximation EM-algorithm using a Metropolis-Hastings sampler to estimate the parameters of an item response latent regression model. Latent regression models are extensions of item response theory (IRT) to a 2-level latent variable model in which covariates serve as predictors of the…

  8. Weather variability and the incidence of cryptosporidiosis: comparison of time series poisson regression and SARIMA models.

    PubMed

    Hu, Wenbiao; Tong, Shilu; Mengersen, Kerrie; Connell, Des

    2007-09-01

    Few studies have examined the relationship between weather variables and cryptosporidiosis in Australia. This paper examines the potential impact of weather variability on the transmission of cryptosporidiosis and explores the possibility of developing an empirical forecast system. Data on weather variables, notified cryptosporidiosis cases, and population size in Brisbane were supplied by the Australian Bureau of Meteorology, Queensland Department of Health, and Australian Bureau of Statistics for the period of January 1, 1996-December 31, 2004, respectively. Time series Poisson regression and seasonal auto-regression integrated moving average (SARIMA) models were performed to examine the potential impact of weather variability on the transmission of cryptosporidiosis. Both the time series Poisson regression and SARIMA models show that seasonal and monthly maximum temperature at a prior moving average of 1 and 3 months were significantly associated with cryptosporidiosis disease. It suggests that there may be 50 more cases a year for an increase of 1 degrees C maximum temperature on average in Brisbane. Model assessments indicated that the SARIMA model had better predictive ability than the Poisson regression model (SARIMA: root mean square error (RMSE): 0.40, Akaike information criterion (AIC): -12.53; Poisson regression: RMSE: 0.54, AIC: -2.84). Furthermore, the analysis of residuals shows that the time series Poisson regression appeared to violate a modeling assumption, in that residual autocorrelation persisted. The results of this study suggest that weather variability (particularly maximum temperature) may have played a significant role in the transmission of cryptosporidiosis. A SARIMA model may be a better predictive model than a Poisson regression model in the assessment of the relationship between weather variability and the incidence of cryptosporidiosis.

  9. Wheat flour dough Alveograph characteristics predicted by Mixolab regression models.

    PubMed

    Codină, Georgiana Gabriela; Mironeasa, Silvia; Mironeasa, Costel; Popa, Ciprian N; Tamba-Berehoiu, Radiana

    2012-02-01

    In Romania, the Alveograph is the most used device to evaluate the rheological properties of wheat flour dough, but lately the Mixolab device has begun to play an important role in the breadmaking industry. These two instruments are based on different principles but there are some correlations that can be found between the parameters determined by the Mixolab and the rheological properties of wheat dough measured with the Alveograph. Statistical analysis on 80 wheat flour samples using the backward stepwise multiple regression method showed that Mixolab values using the ‘Chopin S’ protocol (40 samples) and ‘Chopin + ’ protocol (40 samples) can be used to elaborate predictive models for estimating the value of the rheological properties of wheat dough: baking strength (W), dough tenacity (P) and extensibility (L). The correlation analysis confirmed significant findings (P < 0.05 and P < 0.01) between the parameters of wheat dough studied by the Mixolab and its rheological properties measured with the Alveograph. A number of six predictive linear equations were obtained. Linear regression models gave multiple regression coefficients with R²(adjusted) > 0.70 for P, R²(adjusted) > 0.70 for W and R²(adjusted) > 0.38 for L, at a 95% confidence interval. Copyright © 2011 Society of Chemical Industry.

  10. Regression Model for Light Weight and Crashworthiness Enhancement Design of Automotive Parts in Frontal CAR Crash

    NASA Astrophysics Data System (ADS)

    Bae, Gihyun; Huh, Hoon; Park, Sungho

    This paper deals with a regression model for light weight and crashworthiness enhancement design of automotive parts in frontal car crash. The ULSAB-AVC model is employed for the crash analysis and effective parts are selected based on the amount of energy absorption during the crash behavior. Finite element analyses are carried out for designated design cases in order to investigate the crashworthiness and weight according to the material and thickness of main energy absorption parts. Based on simulations results, a regression analysis is performed to construct a regression model utilized for light weight and crashworthiness enhancement design of automotive parts. An example for weight reduction of main energy absorption parts demonstrates the validity of a regression model constructed.

  11. Fuzzy regression modeling for tool performance prediction and degradation detection.

    PubMed

    Li, X; Er, M J; Lim, B S; Zhou, J H; Gan, O P; Rutkowski, L

    2010-10-01

    In this paper, the viability of using Fuzzy-Rule-Based Regression Modeling (FRM) algorithm for tool performance and degradation detection is investigated. The FRM is developed based on a multi-layered fuzzy-rule-based hybrid system with Multiple Regression Models (MRM) embedded into a fuzzy logic inference engine that employs Self Organizing Maps (SOM) for clustering. The FRM converts a complex nonlinear problem to a simplified linear format in order to further increase the accuracy in prediction and rate of convergence. The efficacy of the proposed FRM is tested through a case study - namely to predict the remaining useful life of a ball nose milling cutter during a dry machining process of hardened tool steel with a hardness of 52-54 HRc. A comparative study is further made between four predictive models using the same set of experimental data. It is shown that the FRM is superior as compared with conventional MRM, Back Propagation Neural Networks (BPNN) and Radial Basis Function Networks (RBFN) in terms of prediction accuracy and learning speed.

  12. Mixed-effects Gaussian process functional regression models with application to dose-response curve prediction.

    PubMed

    Shi, J Q; Wang, B; Will, E J; West, R M

    2012-11-20

    We propose a new semiparametric model for functional regression analysis, combining a parametric mixed-effects model with a nonparametric Gaussian process regression model, namely a mixed-effects Gaussian process functional regression model. The parametric component can provide explanatory information between the response and the covariates, whereas the nonparametric component can add nonlinearity. We can model the mean and covariance structures simultaneously, combining the information borrowed from other subjects with the information collected from each individual subject. We apply the model to dose-response curves that describe changes in the responses of subjects for differing levels of the dose of a drug or agent and have a wide application in many areas. We illustrate the method for the management of renal anaemia. An individual dose-response curve is improved when more information is included by this mechanism from the subject/patient over time, enabling a patient-specific treatment regime. Copyright © 2012 John Wiley & Sons, Ltd.

  13. Building interpretable predictive models for pediatric hospital readmission using Tree-Lasso logistic regression.

    PubMed

    Jovanovic, Milos; Radovanovic, Sandro; Vukicevic, Milan; Van Poucke, Sven; Delibasic, Boris

    2016-09-01

    Quantification and early identification of unplanned readmission risk have the potential to improve the quality of care during hospitalization and after discharge. However, high dimensionality, sparsity, and class imbalance of electronic health data and the complexity of risk quantification, challenge the development of accurate predictive models. Predictive models require a certain level of interpretability in order to be applicable in real settings and create actionable insights. This paper aims to develop accurate and interpretable predictive models for readmission in a general pediatric patient population, by integrating a data-driven model (sparse logistic regression) and domain knowledge based on the international classification of diseases 9th-revision clinical modification (ICD-9-CM) hierarchy of diseases. Additionally, we propose a way to quantify the interpretability of a model and inspect the stability of alternative solutions. The analysis was conducted on >66,000 pediatric hospital discharge records from California, State Inpatient Databases, Healthcare Cost and Utilization Project between 2009 and 2011. We incorporated domain knowledge based on the ICD-9-CM hierarchy in a data driven, Tree-Lasso regularized logistic regression model, providing the framework for model interpretation. This approach was compared with traditional Lasso logistic regression resulting in models that are easier to interpret by fewer high-level diagnoses, with comparable prediction accuracy. The results revealed that the use of a Tree-Lasso model was as competitive in terms of accuracy (measured by area under the receiver operating characteristic curve-AUC) as the traditional Lasso logistic regression, but integration with the ICD-9-CM hierarchy of diseases provided more interpretable models in terms of high-level diagnoses. Additionally, interpretations of models are in accordance with existing medical understanding of pediatric readmission. Best performing models have

  14. Multiple regression analysis in modelling of carbon dioxide emissions by energy consumption use in Malaysia

    NASA Astrophysics Data System (ADS)

    Keat, Sim Chong; Chun, Beh Boon; San, Lim Hwee; Jafri, Mohd Zubir Mat

    2015-04-01

    Climate change due to carbon dioxide (CO2) emissions is one of the most complex challenges threatening our planet. This issue considered as a great and international concern that primary attributed from different fossil fuels. In this paper, regression model is used for analyzing the causal relationship among CO2 emissions based on the energy consumption in Malaysia using time series data for the period of 1980-2010. The equations were developed using regression model based on the eight major sources that contribute to the CO2 emissions such as non energy, Liquefied Petroleum Gas (LPG), diesel, kerosene, refinery gas, Aviation Turbine Fuel (ATF) and Aviation Gasoline (AV Gas), fuel oil and motor petrol. The related data partly used for predict the regression model (1980-2000) and partly used for validate the regression model (2001-2010). The results of the prediction model with the measured data showed a high correlation coefficient (R2=0.9544), indicating the model's accuracy and efficiency. These results are accurate and can be used in early warning of the population to comply with air quality standards.

  15. Variable-Domain Functional Regression for Modeling ICU Data.

    PubMed

    Gellar, Jonathan E; Colantuoni, Elizabeth; Needham, Dale M; Crainiceanu, Ciprian M

    2014-12-01

    We introduce a class of scalar-on-function regression models with subject-specific functional predictor domains. The fundamental idea is to consider a bivariate functional parameter that depends both on the functional argument and on the width of the functional predictor domain. Both parametric and nonparametric models are introduced to fit the functional coefficient. The nonparametric model is theoretically and practically invariant to functional support transformation, or support registration. Methods were motivated by and applied to a study of association between daily measures of the Intensive Care Unit (ICU) Sequential Organ Failure Assessment (SOFA) score and two outcomes: in-hospital mortality, and physical impairment at hospital discharge among survivors. Methods are generally applicable to a large number of new studies that record a continuous variables over unequal domains.

  16. Digital Image Restoration Under a Regression Model - The Unconstrained, Linear Equality and Inequality Constrained Approaches

    DTIC Science & Technology

    1974-01-01

    REGRESSION MODEL - THE UNCONSTRAINED, LINEAR EQUALITY AND INEQUALITY CONSTRAINED APPROACHES January 1974 Nelson Delfino d’Avila Mascarenha;? Image...Report 520 DIGITAL IMAGE RESTORATION UNDER A REGRESSION MODEL THE UNCONSTRAINED, LINEAR EQUALITY AND INEQUALITY CONSTRAINED APPROACHES January...a two- dimensional form adequately describes the linear model . A dis- cretization is performed by using quadrature methods. By trans

  17. Multilevel Modeling and Ordinary Least Squares Regression: How Comparable Are They?

    ERIC Educational Resources Information Center

    Huang, Francis L.

    2018-01-01

    Studies analyzing clustered data sets using both multilevel models (MLMs) and ordinary least squares (OLS) regression have generally concluded that resulting point estimates, but not the standard errors, are comparable with each other. However, the accuracy of the estimates of OLS models is important to consider, as several alternative techniques…

  18. Harmonic regression of Landsat time series for modeling attributes from national forest inventory data

    NASA Astrophysics Data System (ADS)

    Wilson, Barry T.; Knight, Joseph F.; McRoberts, Ronald E.

    2018-03-01

    Imagery from the Landsat Program has been used frequently as a source of auxiliary data for modeling land cover, as well as a variety of attributes associated with tree cover. With ready access to all scenes in the archive since 2008 due to the USGS Landsat Data Policy, new approaches to deriving such auxiliary data from dense Landsat time series are required. Several methods have previously been developed for use with finer temporal resolution imagery (e.g. AVHRR and MODIS), including image compositing and harmonic regression using Fourier series. The manuscript presents a study, using Minnesota, USA during the years 2009-2013 as the study area and timeframe. The study examined the relative predictive power of land cover models, in particular those related to tree cover, using predictor variables based solely on composite imagery versus those using estimated harmonic regression coefficients. The study used two common non-parametric modeling approaches (i.e. k-nearest neighbors and random forests) for fitting classification and regression models of multiple attributes measured on USFS Forest Inventory and Analysis plots using all available Landsat imagery for the study area and timeframe. The estimated Fourier coefficients developed by harmonic regression of tasseled cap transformation time series data were shown to be correlated with land cover, including tree cover. Regression models using estimated Fourier coefficients as predictor variables showed a two- to threefold increase in explained variance for a small set of continuous response variables, relative to comparable models using monthly image composites. Similarly, the overall accuracies of classification models using the estimated Fourier coefficients were approximately 10-20 percentage points higher than the models using the image composites, with corresponding individual class accuracies between six and 45 percentage points higher.

  19. An appraisal of convergence failures in the application of logistic regression model in published manuscripts.

    PubMed

    Yusuf, O B; Bamgboye, E A; Afolabi, R F; Shodimu, M A

    2014-09-01

    Logistic regression model is widely used in health research for description and predictive purposes. Unfortunately, most researchers are sometimes not aware that the underlying principles of the techniques have failed when the algorithm for maximum likelihood does not converge. Young researchers particularly postgraduate students may not know why separation problem whether quasi or complete occurs, how to identify it and how to fix it. This study was designed to critically evaluate convergence issues in articles that employed logistic regression analysis published in an African Journal of Medicine and medical sciences between 2004 and 2013. Problems of quasi or complete separation were described and were illustrated with the National Demographic and Health Survey dataset. A critical evaluation of articles that employed logistic regression was conducted. A total of 581 articles was reviewed, of which 40 (6.9%) used binary logistic regression. Twenty-four (60.0%) stated the use of logistic regression model in the methodology while none of the articles assessed model fit. Only 3 (12.5%) properly described the procedures. Of the 40 that used the logistic regression model, the problem of convergence occurred in 6 (15.0%) of the articles. Logistic regression tends to be poorly reported in studies published between 2004 and 2013. Our findings showed that the procedure may not be well understood by researchers since very few described the process in their reports and may be totally unaware of the problem of convergence or how to deal with it.

  20. New approach to probability estimate of femoral neck fracture by fall (Slovak regression model).

    PubMed

    Wendlova, J

    2009-01-01

    3,216 Slovak women with primary or secondary osteoporosis or osteopenia, aged 20-89 years, were examined with the bone densitometer DXA (dual energy X-ray absorptiometry, GE, Prodigy - Primo), x = 58.9, 95% C.I. (58.42; 59.38). The values of the following variables for each patient were measured: FSI (femur strength index), T-score total hip left, alpha angle - left, theta angle - left, HAL (hip axis length) left, BMI (body mass index) was calculated from the height and weight of the patients. Regression model determined the following order of independent variables according to the intensity of their influence upon the occurrence of values of dependent FSI variable: 1. BMI, 2. theta angle, 3. T-score total hip, 4. alpha angle, 5. HAL. The regression model equation, calculated from the variables monitored in the study, enables a doctor in praxis to determine the probability magnitude (absolute risk) for the occurrence of pathological value of FSI (FSI < 1) in the femoral neck area, i. e., allows for probability estimate of a femoral neck fracture by fall for Slovak women. 1. The Slovak regression model differs from regression models, published until now, in chosen independent variables and a dependent variable, belonging to biomechanical variables, characterising the bone quality. 2. The Slovak regression model excludes the inaccuracies of other models, which are not able to define precisely the current and past clinical condition of tested patients (e.g., to define the length and dose of exposure to risk factors). 3. The Slovak regression model opens the way to a new method of estimating the probability (absolute risk) or the odds for a femoral neck fracture by fall, based upon the bone quality determination. 4. It is assumed that the development will proceed by improving the methods enabling to measure the bone quality, determining the probability of fracture by fall (Tab. 6, Fig. 3, Ref. 22). Full Text (Free, PDF) www.bmj.sk.

  1. Predicting the occurrence of wildfires with binary structured additive regression models.

    PubMed

    Ríos-Pena, Laura; Kneib, Thomas; Cadarso-Suárez, Carmen; Marey-Pérez, Manuel

    2017-02-01

    Wildfires are one of the main environmental problems facing societies today, and in the case of Galicia (north-west Spain), they are the main cause of forest destruction. This paper used binary structured additive regression (STAR) for modelling the occurrence of wildfires in Galicia. Binary STAR models are a recent contribution to the classical logistic regression and binary generalized additive models. Their main advantage lies in their flexibility for modelling non-linear effects, while simultaneously incorporating spatial and temporal variables directly, thereby making it possible to reveal possible relationships among the variables considered. The results showed that the occurrence of wildfires depends on many covariates which display variable behaviour across space and time, and which largely determine the likelihood of ignition of a fire. The joint possibility of working on spatial scales with a resolution of 1 × 1 km cells and mapping predictions in a colour range makes STAR models a useful tool for plotting and predicting wildfire occurrence. Lastly, it will facilitate the development of fire behaviour models, which can be invaluable when it comes to drawing up fire-prevention and firefighting plans. Copyright © 2016 Elsevier Ltd. All rights reserved.

  2. Scoring and staging systems using cox linear regression modeling and recursive partitioning.

    PubMed

    Lee, J W; Um, S H; Lee, J B; Mun, J; Cho, H

    2006-01-01

    Scoring and staging systems are used to determine the order and class of data according to predictors. Systems used for medical data, such as the Child-Turcotte-Pugh scoring and staging systems for ordering and classifying patients with liver disease, are often derived strictly from physicians' experience and intuition. We construct objective and data-based scoring/staging systems using statistical methods. We consider Cox linear regression modeling and recursive partitioning techniques for censored survival data. In particular, to obtain a target number of stages we propose cross-validation and amalgamation algorithms. We also propose an algorithm for constructing scoring and staging systems by integrating local Cox linear regression models into recursive partitioning, so that we can retain the merits of both methods such as superior predictive accuracy, ease of use, and detection of interactions between predictors. The staging system construction algorithms are compared by cross-validation evaluation of real data. The data-based cross-validation comparison shows that Cox linear regression modeling is somewhat better than recursive partitioning when there are only continuous predictors, while recursive partitioning is better when there are significant categorical predictors. The proposed local Cox linear recursive partitioning has better predictive accuracy than Cox linear modeling and simple recursive partitioning. This study indicates that integrating local linear modeling into recursive partitioning can significantly improve prediction accuracy in constructing scoring and staging systems.

  3. Correlation and simple linear regression.

    PubMed

    Eberly, Lynn E

    2007-01-01

    This chapter highlights important steps in using correlation and simple linear regression to address scientific questions about the association of two continuous variables with each other. These steps include estimation and inference, assessing model fit, the connection between regression and ANOVA, and study design. Examples in microbiology are used throughout. This chapter provides a framework that is helpful in understanding more complex statistical techniques, such as multiple linear regression, linear mixed effects models, logistic regression, and proportional hazards regression.

  4. The effects of climate downscaling technique and observational data set on modeled ecological responses.

    PubMed

    Pourmokhtarian, Afshin; Driscoll, Charles T; Campbell, John L; Hayhoe, Katharine; Stoner, Anne M K

    2016-07-01

    Assessments of future climate change impacts on ecosystems typically rely on multiple climate model projections, but often utilize only one downscaling approach trained on one set of observations. Here, we explore the extent to which modeled biogeochemical responses to changing climate are affected by the selection of the climate downscaling method and training observations used at the montane landscape of the Hubbard Brook Experimental Forest, New Hampshire, USA. We evaluated three downscaling methods: the delta method (or the change factor method), monthly quantile mapping (Bias Correction-Spatial Disaggregation, or BCSD), and daily quantile regression (Asynchronous Regional Regression Model, or ARRM). Additionally, we trained outputs from four atmosphere-ocean general circulation models (AOGCMs) (CCSM3, HadCM3, PCM, and GFDL-CM2.1) driven by higher (A1fi) and lower (B1) future emissions scenarios on two sets of observations (1/8º resolution grid vs. individual weather station) to generate the high-resolution climate input for the forest biogeochemical model PnET-BGC (eight ensembles of six runs).The choice of downscaling approach and spatial resolution of the observations used to train the downscaling model impacted modeled soil moisture and streamflow, which in turn affected forest growth, net N mineralization, net soil nitrification, and stream chemistry. All three downscaling methods were highly sensitive to the observations used, resulting in projections that were significantly different between station-based and grid-based observations. The choice of downscaling method also slightly affected the results, however not as much as the choice of observations. Using spatially smoothed gridded observations and/or methods that do not resolve sub-monthly shifts in the distribution of temperature and/or precipitation can produce biased results in model applications run at greater temporal and/or spatial resolutions. These results underscore the importance of

  5. Calibration of limited-area ensemble precipitation forecasts for hydrological predictions

    NASA Astrophysics Data System (ADS)

    Diomede, Tommaso; Marsigli, Chiara; Montani, Andrea; Nerozzi, Fabrizio; Paccagnella, Tiziana

    2015-04-01

    The main objective of this study is to investigate the impact of calibration for limited-area ensemble precipitation forecasts, to be used for driving discharge predictions up to 5 days in advance. A reforecast dataset, which spans 30 years, based on the Consortium for Small Scale Modeling Limited-Area Ensemble Prediction System (COSMO-LEPS) was used for testing the calibration strategy. Three calibration techniques were applied: quantile-to-quantile mapping, linear regression, and analogs. The performance of these methodologies was evaluated in terms of statistical scores for the precipitation forecasts operationally provided by COSMO-LEPS in the years 2003-2007 over Germany, Switzerland, and the Emilia-Romagna region (northern Italy). The analog-based method seemed to be preferred because of its capability of correct position errors and spread deficiencies. A suitable spatial domain for the analog search can help to handle model spatial errors as systematic errors. However, the performance of the analog-based method may degrade in cases where a limited training dataset is available. A sensitivity test on the length of the training dataset over which to perform the analog search has been performed. The quantile-to-quantile mapping and linear regression methods were less effective, mainly because the forecast-analysis relation was not so strong for the available training dataset. A comparison between the calibration based on the deterministic reforecast and the calibration based on the full operational ensemble used as training dataset has been considered, with the aim to evaluate whether reforecasts are really worthy for calibration, given that their computational cost is remarkable. The verification of the calibration process was then performed by coupling ensemble precipitation forecasts with a distributed rainfall-runoff model. This test was carried out for a medium-sized catchment located in Emilia-Romagna, showing a beneficial impact of the analog

  6. A generalized multivariate regression model for modelling ocean wave heights

    NASA Astrophysics Data System (ADS)

    Wang, X. L.; Feng, Y.; Swail, V. R.

    2012-04-01

    In this study, a generalized multivariate linear regression model is developed to represent the relationship between 6-hourly ocean significant wave heights (Hs) and the corresponding 6-hourly mean sea level pressure (MSLP) fields. The model is calibrated using the ERA-Interim reanalysis of Hs and MSLP fields for 1981-2000, and is validated using the ERA-Interim reanalysis for 2001-2010 and ERA40 reanalysis of Hs and MSLP for 1958-2001. The performance of the fitted model is evaluated in terms of Pierce skill score, frequency bias index, and correlation skill score. Being not normally distributed, wave heights are subjected to a data adaptive Box-Cox transformation before being used in the model fitting. Also, since 6-hourly data are being modelled, lag-1 autocorrelation must be and is accounted for. The models with and without Box-Cox transformation, and with and without accounting for autocorrelation, are inter-compared in terms of their prediction skills. The fitted MSLP-Hs relationship is then used to reconstruct historical wave height climate from the 6-hourly MSLP fields taken from the Twentieth Century Reanalysis (20CR, Compo et al. 2011), and to project possible future wave height climates using CMIP5 model simulations of MSLP fields. The reconstructed and projected wave heights, both seasonal means and maxima, are subject to a trend analysis that allows for non-linear (polynomial) trends.

  7. Cox Regression Models with Functional Covariates for Survival Data.

    PubMed

    Gellar, Jonathan E; Colantuoni, Elizabeth; Needham, Dale M; Crainiceanu, Ciprian M

    2015-06-01

    We extend the Cox proportional hazards model to cases when the exposure is a densely sampled functional process, measured at baseline. The fundamental idea is to combine penalized signal regression with methods developed for mixed effects proportional hazards models. The model is fit by maximizing the penalized partial likelihood, with smoothing parameters estimated by a likelihood-based criterion such as AIC or EPIC. The model may be extended to allow for multiple functional predictors, time varying coefficients, and missing or unequally-spaced data. Methods were inspired by and applied to a study of the association between time to death after hospital discharge and daily measures of disease severity collected in the intensive care unit, among survivors of acute respiratory distress syndrome.

  8. Predicting seasonal influenza transmission using functional regression models with temporal dependence.

    PubMed

    Oviedo de la Fuente, Manuel; Febrero-Bande, Manuel; Muñoz, María Pilar; Domínguez, Àngela

    2018-01-01

    This paper proposes a novel approach that uses meteorological information to predict the incidence of influenza in Galicia (Spain). It extends the Generalized Least Squares (GLS) methods in the multivariate framework to functional regression models with dependent errors. These kinds of models are useful when the recent history of the incidence of influenza are readily unavailable (for instance, by delays on the communication with health informants) and the prediction must be constructed by correcting the temporal dependence of the residuals and using more accessible variables. A simulation study shows that the GLS estimators render better estimations of the parameters associated with the regression model than they do with the classical models. They obtain extremely good results from the predictive point of view and are competitive with the classical time series approach for the incidence of influenza. An iterative version of the GLS estimator (called iGLS) was also proposed that can help to model complicated dependence structures. For constructing the model, the distance correlation measure [Formula: see text] was employed to select relevant information to predict influenza rate mixing multivariate and functional variables. These kinds of models are extremely useful to health managers in allocating resources in advance to manage influenza epidemics.

  9. Genetic parameters for growth characteristics of free-range chickens under univariate random regression models.

    PubMed

    Rovadoscki, Gregori A; Petrini, Juliana; Ramirez-Diaz, Johanna; Pertile, Simone F N; Pertille, Fábio; Salvian, Mayara; Iung, Laiza H S; Rodriguez, Mary Ana P; Zampar, Aline; Gaya, Leila G; Carvalho, Rachel S B; Coelho, Antonio A D; Savino, Vicente J M; Coutinho, Luiz L; Mourão, Gerson B

    2016-09-01

    Repeated measures from the same individual have been analyzed by using repeatability and finite dimension models under univariate or multivariate analyses. However, in the last decade, the use of random regression models for genetic studies with longitudinal data have become more common. Thus, the aim of this research was to estimate genetic parameters for body weight of four experimental chicken lines by using univariate random regression models. Body weight data from hatching to 84 days of age (n = 34,730) from four experimental free-range chicken lines (7P, Caipirão da ESALQ, Caipirinha da ESALQ and Carijó Barbado) were used. The analysis model included the fixed effects of contemporary group (gender and rearing system), fixed regression coefficients for age at measurement, and random regression coefficients for permanent environmental effects and additive genetic effects. Heterogeneous variances for residual effects were considered, and one residual variance was assigned for each of six subclasses of age at measurement. Random regression curves were modeled by using Legendre polynomials of the second and third orders, with the best model chosen based on the Akaike Information Criterion, Bayesian Information Criterion, and restricted maximum likelihood. Multivariate analyses under the same animal mixed model were also performed for the validation of the random regression models. The Legendre polynomials of second order were better for describing the growth curves of the lines studied. Moderate to high heritabilities (h(2) = 0.15 to 0.98) were estimated for body weight between one and 84 days of age, suggesting that selection for body weight at all ages can be used as a selection criteria. Genetic correlations among body weight records obtained through multivariate analyses ranged from 0.18 to 0.96, 0.12 to 0.89, 0.06 to 0.96, and 0.28 to 0.96 in 7P, Caipirão da ESALQ, Caipirinha da ESALQ, and Carijó Barbado chicken lines, respectively. Results indicate that

  10. Modeling Tetanus Neonatorum case using the regression of negative binomial and zero-inflated negative binomial

    NASA Astrophysics Data System (ADS)

    Amaliana, Luthfatul; Sa'adah, Umu; Wayan Surya Wardhani, Ni

    2017-12-01

    Tetanus Neonatorum is an infectious disease that can be prevented by immunization. The number of Tetanus Neonatorum cases in East Java Province is the highest in Indonesia until 2015. Tetanus Neonatorum data contain over dispersion and big enough proportion of zero-inflation. Negative Binomial (NB) regression is an alternative method when over dispersion happens in Poisson regression. However, the data containing over dispersion and zero-inflation are more appropriately analyzed by using Zero-Inflated Negative Binomial (ZINB) regression. The purpose of this study are: (1) to model Tetanus Neonatorum cases in East Java Province with 71.05 percent proportion of zero-inflation by using NB and ZINB regression, (2) to obtain the best model. The result of this study indicates that ZINB is better than NB regression with smaller AIC.

  11. Modeling of geogenic radon in Switzerland based on ordered logistic regression.

    PubMed

    Kropat, Georg; Bochud, François; Murith, Christophe; Palacios Gruson, Martha; Baechler, Sébastien

    2017-01-01

    The estimation of the radon hazard of a future construction site should ideally be based on the geogenic radon potential (GRP), since this estimate is free of anthropogenic influences and building characteristics. The goal of this study was to evaluate terrestrial gamma dose rate (TGD), geology, fault lines and topsoil permeability as predictors for the creation of a GRP map based on logistic regression. Soil gas radon measurements (SRC) are more suited for the estimation of GRP than indoor radon measurements (IRC) since the former do not depend on ventilation and heating habits or building characteristics. However, SRC have only been measured at a few locations in Switzerland. In former studies a good correlation between spatial aggregates of IRC and SRC has been observed. That's why we used IRC measurements aggregated on a 10 km × 10 km grid to calibrate an ordered logistic regression model for geogenic radon potential (GRP). As predictors we took into account terrestrial gamma doserate, regrouped geological units, fault line density and the permeability of the soil. The classification success rate of the model results to 56% in case of the inclusion of all 4 predictor variables. Our results suggest that terrestrial gamma doserate and regrouped geological units are more suited to model GRP than fault line density and soil permeability. Ordered logistic regression is a promising tool for the modeling of GRP maps due to its simplicity and fast computation time. Future studies should account for additional variables to improve the modeling of high radon hazard in the Jura Mountains of Switzerland. Copyright © 2016 The Authors. Published by Elsevier Ltd.. All rights reserved.

  12. Modeling Information Content Via Dirichlet-Multinomial Regression Analysis.

    PubMed

    Ferrari, Alberto

    2017-01-01

    Shannon entropy is being increasingly used in biomedical research as an index of complexity and information content in sequences of symbols, e.g. languages, amino acid sequences, DNA methylation patterns and animal vocalizations. Yet, distributional properties of information entropy as a random variable have seldom been the object of study, leading to researchers mainly using linear models or simulation-based analytical approach to assess differences in information content, when entropy is measured repeatedly in different experimental conditions. Here a method to perform inference on entropy in such conditions is proposed. Building on results coming from studies in the field of Bayesian entropy estimation, a symmetric Dirichlet-multinomial regression model, able to deal efficiently with the issue of mean entropy estimation, is formulated. Through a simulation study the model is shown to outperform linear modeling in a vast range of scenarios and to have promising statistical properties. As a practical example, the method is applied to a data set coming from a real experiment on animal communication.

  13. Normalization Regression Estimation With Application to a Nonorthogonal, Nonrecursive Model of School Learning.

    ERIC Educational Resources Information Center

    Bulcock, J. W.; And Others

    Advantages of normalization regression estimation over ridge regression estimation are demonstrated by reference to Bloom's model of school learning. Theoretical concern centered on the structure of scholastic achievement at grade 10 in Canadian high schools. Data on 886 students were randomly sampled from the Carnegie Human Resources Data Bank.…

  14. The extension of total gain (TG) statistic in survival models: properties and applications.

    PubMed

    Choodari-Oskooei, Babak; Royston, Patrick; Parmar, Mahesh K B

    2015-07-01

    The results of multivariable regression models are usually summarized in the form of parameter estimates for the covariates, goodness-of-fit statistics, and the relevant p-values. These statistics do not inform us about whether covariate information will lead to any substantial improvement in prediction. Predictive ability measures can be used for this purpose since they provide important information about the practical significance of prognostic factors. R (2)-type indices are the most familiar forms of such measures in survival models, but they all have limitations and none is widely used. In this paper, we extend the total gain (TG) measure, proposed for a logistic regression model, to survival models and explore its properties using simulations and real data. TG is based on the binary regression quantile plot, otherwise known as the predictiveness curve. Standardised TG ranges from 0 (no explanatory power) to 1 ('perfect' explanatory power). The results of our simulations show that unlike many of the other R (2)-type predictive ability measures, TG is independent of random censoring. It increases as the effect of a covariate increases and can be applied to different types of survival models, including models with time-dependent covariate effects. We also apply TG to quantify the predictive ability of multivariable prognostic models developed in several disease areas. Overall, TG performs well in our simulation studies and can be recommended as a measure to quantify the predictive ability in survival models.

  15. A mathematical programming method for formulating a fuzzy regression model based on distance criterion.

    PubMed

    Chen, Liang-Hsuan; Hsueh, Chan-Ching

    2007-06-01

    Fuzzy regression models are useful to investigate the relationship between explanatory and response variables with fuzzy observations. Different from previous studies, this correspondence proposes a mathematical programming method to construct a fuzzy regression model based on a distance criterion. The objective of the mathematical programming is to minimize the sum of distances between the estimated and observed responses on the X axis, such that the fuzzy regression model constructed has the minimal total estimation error in distance. Only several alpha-cuts of fuzzy observations are needed as inputs to the mathematical programming model; therefore, the applications are not restricted to triangular fuzzy numbers. Three examples, adopted in the previous studies, and a larger example, modified from the crisp case, are used to illustrate the performance of the proposed approach. The results indicate that the proposed model has better performance than those in the previous studies based on either distance criterion or Kim and Bishu's criterion. In addition, the efficiency and effectiveness for solving the larger example by the proposed model are also satisfactory.

  16. Predicting 30-day Hospital Readmission with Publicly Available Administrative Database. A Conditional Logistic Regression Modeling Approach.

    PubMed

    Zhu, K; Lou, Z; Zhou, J; Ballester, N; Kong, N; Parikh, P

    2015-01-01

    This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare". Hospital readmissions raise healthcare costs and cause significant distress to providers and patients. It is, therefore, of great interest to healthcare organizations to predict what patients are at risk to be readmitted to their hospitals. However, current logistic regression based risk prediction models have limited prediction power when applied to hospital administrative data. Meanwhile, although decision trees and random forests have been applied, they tend to be too complex to understand among the hospital practitioners. Explore the use of conditional logistic regression to increase the prediction accuracy. We analyzed an HCUP statewide inpatient discharge record dataset, which includes patient demographics, clinical and care utilization data from California. We extracted records of heart failure Medicare beneficiaries who had inpatient experience during an 11-month period. We corrected the data imbalance issue with under-sampling. In our study, we first applied standard logistic regression and decision tree to obtain influential variables and derive practically meaning decision rules. We then stratified the original data set accordingly and applied logistic regression on each data stratum. We further explored the effect of interacting variables in the logistic regression modeling. We conducted cross validation to assess the overall prediction performance of conditional logistic regression (CLR) and compared it with standard classification models. The developed CLR models outperformed several standard classification models (e.g., straightforward logistic regression, stepwise logistic regression, random forest, support vector machine). For example, the best CLR model improved the classification accuracy by nearly 20% over the straightforward logistic regression model. Furthermore, the developed CLR models tend to achieve better sensitivity of

  17. Learning accurate and interpretable models based on regularized random forests regression

    PubMed Central

    2014-01-01

    Background Many biology related research works combine data from multiple sources in an effort to understand the underlying problems. It is important to find and interpret the most important information from these sources. Thus it will be beneficial to have an effective algorithm that can simultaneously extract decision rules and select critical features for good interpretation while preserving the prediction performance. Methods In this study, we focus on regression problems for biological data where target outcomes are continuous. In general, models constructed from linear regression approaches are relatively easy to interpret. However, many practical biological applications are nonlinear in essence where we can hardly find a direct linear relationship between input and output. Nonlinear regression techniques can reveal nonlinear relationship of data, but are generally hard for human to interpret. We propose a rule based regression algorithm that uses 1-norm regularized random forests. The proposed approach simultaneously extracts a small number of rules from generated random forests and eliminates unimportant features. Results We tested the approach on some biological data sets. The proposed approach is able to construct a significantly smaller set of regression rules using a subset of attributes while achieving prediction performance comparable to that of random forests regression. Conclusion It demonstrates high potential in aiding prediction and interpretation of nonlinear relationships of the subject being studied. PMID:25350120

  18. Education and inequalities in risk scores for coronary heart disease and body mass index: evidence for a population strategy.

    PubMed

    Liu, Sze Yan; Kawachi, Ichiro; Glymour, M Maria

    2012-09-01

    Concerns have been raised that education may have greater benefits for persons at high risk of coronary heart disease (CHD) than for those at low risk. We estimated the association of education (less than high school, high school, or college graduates) with 10-year CHD risk and body mass index (BMI), using linear and quantile regression models, in the following two nationally representative datasets: the 2006 wave of the Health and Retirement Survey and the 2003-2008 National Health and Nutrition Examination Survey (NHANES). Higher educational attainment was associated with lower 10-year CHD risk for all groups. However, the magnitude of this association varied considerably across quantiles for some subgroups. For example, among women in NHANES, a high school degree was associated with 4% (95% confidence interval = -9% to 1%) and 17% (-24% to -8%) lower CHD risk in the 10th and 90th percentiles, respectively. For BMI, a college degree was associated with uniform decreases across the distribution for women, but with varying increases for men. Compared with those who had not completed high school, male college graduates in the NHANES sample had a BMI that was 6% greater (2% to 11%) at the 10th percentile of the BMI distribution and 7% lower (-10% to -3%) at the 90th percentile (ie, overweight/obese). Estimates from the Health and Retirement Survey sample and the marginal quantile regression models showed similar patterns. Conventional regression methods may mask important variations in the associations between education and CHD risk.

  19. Post-processing ECMWF precipitation and temperature ensemble reforecasts for operational hydrologic forecasting at various spatial scales

    NASA Astrophysics Data System (ADS)

    Verkade, J. S.; Brown, J. D.; Reggiani, P.; Weerts, A. H.

    2013-09-01

    The ECMWF temperature and precipitation ensemble reforecasts are evaluated for biases in the mean, spread and forecast probabilities, and how these biases propagate to streamflow ensemble forecasts. The forcing ensembles are subsequently post-processed to reduce bias and increase skill, and to investigate whether this leads to improved streamflow ensemble forecasts. Multiple post-processing techniques are used: quantile-to-quantile transform, linear regression with an assumption of bivariate normality and logistic regression. Both the raw and post-processed ensembles are run through a hydrologic model of the river Rhine to create streamflow ensembles. The results are compared using multiple verification metrics and skill scores: relative mean error, Brier skill score and its decompositions, mean continuous ranked probability skill score and its decomposition, and the ROC score. Verification of the streamflow ensembles is performed at multiple spatial scales: relatively small headwater basins, large tributaries and the Rhine outlet at Lobith. The streamflow ensembles are verified against simulated streamflow, in order to isolate the effects of biases in the forcing ensembles and any improvements therein. The results indicate that the forcing ensembles contain significant biases, and that these cascade to the streamflow ensembles. Some of the bias in the forcing ensembles is unconditional in nature; this was resolved by a simple quantile-to-quantile transform. Improvements in conditional bias and skill of the forcing ensembles vary with forecast lead time, amount, and spatial scale, but are generally moderate. The translation to streamflow forecast skill is further muted, and several explanations are considered, including limitations in the modelling of the space-time covariability of the forcing ensembles and the presence of storages.

  20. Nonlinear-regression flow model of the Gulf Coast aquifer systems in the south-central United States

    USGS Publications Warehouse

    Kuiper, L.K.

    1994-01-01

    A multiple-regression methodology was used to help answer questions concerning model reliability, and to calibrate a time-dependent variable-density ground-water flow model of the gulf coast aquifer systems in the south-central United States. More than 40 regression models with 2 to 31 regressions parameters are used and detailed results are presented for 12 of the models. More than 3,000 values for grid-element volume-averaged head and hydraulic conductivity are used for the regression model observations. Calculated prediction interval half widths, though perhaps inaccurate due to a lack of normality of the residuals, are the smallest for models with only four regression parameters. In addition, the root-mean weighted residual decreases very little with an increase in the number of regression parameters. The various models showed considerable overlap between the prediction inter- vals for shallow head and hydraulic conductivity. Approximate 95-percent prediction interval half widths for volume-averaged freshwater head exceed 108 feet; for volume-averaged base 10 logarithm hydraulic conductivity, they exceed 0.89. All of the models are unreliable for the prediction of head and ground-water flow in the deeper parts of the aquifer systems, including the amount of flow coming from the underlying geopressured zone. Truncating the domain of solution of one model to exclude that part of the system having a ground-water density greater than 1.005 grams per cubic centimeter or to exclude that part of the systems below a depth of 3,000 feet, and setting the density to that of freshwater does not appreciably change the results for head and ground-water flow, except for locations close to the truncation surface.