Science.gov

Sample records for adjusted regression models

  1. Coercively Adjusted Auto Regression Model for Forecasting in Epilepsy EEG

    PubMed Central

    Kim, Sun-Hee; Faloutsos, Christos; Yang, Hyung-Jeong

    2013-01-01

    Recently, data with complex characteristics, such as epilepsy electroencephalography (EEG) time series, have emerged. Epilepsy EEG data have special characteristics including nonlinearity, nonnormality, and nonperiodicity. Therefore, it is important to find a suitable forecasting method that covers these special characteristics. In this paper, we propose a coercively adjusted autoregression (CA-AR) method that forecasts future values from a multivariable epilepsy EEG time series. We use the technique of random coefficients, which forcefully adjusts the coefficients to lie between −1 and 1. The fractal dimension is used to determine the order of the CA-AR model. We applied the CA-AR method, which reflects the special characteristics of the data, to forecast future values of epilepsy EEG data. Experimental results show that, compared to previous methods, the proposed method forecasts faster and more accurately. PMID:23710252
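
    A toy sketch of the general idea only (not the authors' algorithm): fit an AR(p) model by least squares, then coerce the coefficients into [−1, 1]. The fractal-dimension order selection is replaced by a caller-supplied p, and all helper names are invented for illustration.

        import numpy as np

        def fit_ca_ar(x, p):
            """Least-squares AR(p) fit whose coefficients are then
            coerced into [-1, 1] (the 'coercive adjustment')."""
            # design matrix: row t holds the p most recent lags of x[t]
            X = np.column_stack([x[p - k - 1:len(x) - k - 1] for k in range(p)])
            y = x[p:]
            coef, *_ = np.linalg.lstsq(X, y, rcond=None)
            return np.clip(coef, -1.0, 1.0)

        def forecast_one_step(x, coef):
            p = len(coef)
            return float(np.dot(coef, x[-1:-p - 1:-1]))  # newest lag first

        rng = np.random.default_rng(0)
        x = np.sin(np.linspace(0, 20, 500)) + 0.1 * rng.standard_normal(500)
        print(forecast_one_step(x, fit_ca_ar(x, p=5)))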

  2. Comparison of the Properties of Regression and Categorical Risk-Adjustment Models

    PubMed Central

    Averill, Richard F.; Muldoon, John H.; Hughes, John S.

    2016-01-01

    Clinical risk-adjustment, the ability to standardize the comparison of individuals with different health needs, is based upon 2 main alternative approaches: regression models and clinical categorical models. In this article, we examine the impact of the differences in the way these models are constructed on end user applications. PMID:26945302

  3. Using Wherry's Adjusted R Squared and Mallow's C(p) for Model Selection from All Possible Regressions.

    ERIC Educational Resources Information Center

    Olejnik, Stephen; Mills, Jamie; Keselman, Harvey

    2000-01-01

    Evaluated the use of Mallow's C(p) and Wherry's adjusted R squared (R. Wherry, 1931) statistics to select a final model from a pool of model solutions using computer generated data. Neither statistic identified the underlying regression model any better than, and usually less well than, the stepwise selection method, which itself was poor for…

  4. An evaluation of bias in propensity score-adjusted non-linear regression models.

    PubMed

    Wan, Fei; Mitra, Nandita

    2016-04-19

    Propensity score methods are commonly used to adjust for observed confounding when estimating the conditional treatment effect in observational studies. One popular method, covariate adjustment of the propensity score in a regression model, has been empirically shown to be biased in non-linear models. However, no compelling underlying theoretical reason has been presented. We propose a new framework to investigate bias and consistency of propensity score-adjusted treatment effects in non-linear models that uses a simple geometric approach to forge a link between the consistency of the propensity score estimator and the collapsibility of non-linear models. Under this framework, we demonstrate that adjustment of the propensity score in an outcome model results in the decomposition of observed covariates into the propensity score and a remainder term. Omission of this remainder term from a non-collapsible regression model leads to biased estimates of the conditional odds ratio and conditional hazard ratio, but not for the conditional rate ratio. We further show, via simulation studies, that the bias in these propensity score-adjusted estimators increases with larger treatment effect size, larger covariate effects, and increasing dissimilarity between the coefficients of the covariates in the treatment model versus the outcome model.
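
    The non-collapsibility driving this result is easy to reproduce: even with a randomized treatment (so no confounding at all), omitting a prognostic covariate shrinks the marginal log-odds ratio relative to the conditional one. A simulation sketch, assuming numpy and statsmodels are available:

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(1)
        n = 100_000
        t = rng.binomial(1, 0.5, n)        # randomized treatment: no confounding
        x = rng.standard_normal(n)         # prognostic covariate, independent of t
        logit = -1.0 + 1.0 * t + 2.0 * x   # true conditional log-OR for t is 1.0
        y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

        full = sm.Logit(y, sm.add_constant(np.column_stack([t, x]))).fit(disp=0)
        marg = sm.Logit(y, sm.add_constant(t)).fit(disp=0)
        print(full.params[1])  # close to 1.0 (conditional effect)
        print(marg.params[1])  # shrunk toward 0: non-collapsibility, not confounding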

  5. Procedures for adjusting regional regression models of urban-runoff quality using local data

    USGS Publications Warehouse

    Hoos, A.B.; Sisolak, J.K.

    1993-01-01

    Statistical operations termed model-adjustment procedures (MAPs) can be used to incorporate local data into existing regression models to improve the prediction of urban-runoff quality. Each MAP is a form of regression analysis in which the local data base is used as a calibration data set. Regression coefficients are determined from the local data base, and the resulting 'adjusted' regression models can then be used to predict storm-runoff quality at unmonitored sites. The response variable in the regression analyses is the observed load or mean concentration of a constituent in storm runoff for a single storm. The set of explanatory variables used in the regression analyses is different for each MAP, but always includes the predicted value of load or mean concentration, Pu, from a regional regression model. The four MAPs examined in this study were: single-factor regression against the regional model prediction Pu (termed MAP-1F-P), regression against Pu (termed MAP-R-P), regression against Pu and additional local variables (termed MAP-R-P+nV), and a weighted combination of Pu and a local-regression prediction (termed MAP-W). The procedures were tested by means of split-sample analysis, using data from three cities included in the Nationwide Urban Runoff Program: Denver, Colorado; Bellevue, Washington; and Knoxville, Tennessee. The MAP that provided the greatest predictive accuracy for the verification data set differed among the three test data bases and among model types (MAP-W for Denver and Knoxville, MAP-1F-P and MAP-R-P for Bellevue load models, and MAP-R-P+nV for Bellevue concentration models) and, in many cases, was not clearly indicated by the values of standard error of estimate for the calibration data set. A scheme to guide MAP selection, based on exploratory data analysis of the calibration data set, is presented and tested. The MAPs were tested for sensitivity to the size of a calibration data set. As expected, predictive accuracy of all MAPs for…
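
    MAP-R-P, as described, is an ordinary regression of local observations O against the regional prediction Pu. A minimal sketch of that adjustment step with toy numbers (real applications typically work with log-transformed loads):

        import numpy as np

        def map_r_p(observed, regional_pred):
            """MAP-R-P: fit O = b0 + b1*Pu on local calibration storms and
            return a function that adjusts future regional predictions."""
            X = np.column_stack([np.ones_like(regional_pred), regional_pred])
            b, *_ = np.linalg.lstsq(X, observed, rcond=None)
            return lambda pu: b[0] + b[1] * pu

        # toy calibration data: regional predictions vs. observed storm loads
        adjust = map_r_p(observed=np.array([3.1, 5.4, 2.2, 7.9]),
                         regional_pred=np.array([2.5, 6.0, 2.0, 9.0]))
        print(adjust(4.0))  # locally adjusted estimate for a new storm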

  6. 10 km running performance predicted by a multiple linear regression model with allometrically adjusted variables

    PubMed Central

    Abad, Cesar C. C.; Barros, Ronaldo V.; Bertuzzi, Romulo; Gagliardi, João F. L.; Lima-Silva, Adriano E.; Lambert, Mike I.

    2016-01-01

    The aim of this study was to verify the power of VO2max, peak treadmill running velocity (PTV), and running economy (RE), unadjusted or allometrically adjusted, in predicting 10 km running performance. Eighteen male endurance runners performed: 1) an incremental test to exhaustion to determine VO2max and PTV; 2) a constant submaximal run at 12 km·h−1 on an outdoor track for RE determination; and 3) a 10 km running race. Unadjusted (VO2max, PTV and RE) and adjusted variables (VO2max^0.72, PTV^0.72 and RE^0.60) were investigated through independent multiple regression models to predict 10 km running race time. There were no significant correlations between 10 km running time and either the adjusted or unadjusted VO2max. Significant correlations (p < 0.01) were found between 10 km running time and the adjusted and unadjusted RE and PTV, providing models with effect size > 0.84 and power > 0.88. The allometrically adjusted predictive model was composed of PTV^0.72 and RE^0.60 and explained 83% of the variance in 10 km running time with a standard error of the estimate (SEE) of 1.5 min. The unadjusted model, composed of PTV alone, accounted for 72% of the variance in 10 km running time (SEE of 1.9 min). Both regression models provided powerful estimates of 10 km running time; however, the unadjusted PTV may provide an uncomplicated estimation. PMID:28149382

  7. [Applying temporally-adjusted land use regression models to estimate ambient air pollution exposure during pregnancy].

    PubMed

    Zhang, Y J; Xue, F X; Bai, Z P

    2017-03-06

    The impact of maternal air pollution exposure on offspring health has received much attention. Precise and feasible exposure estimation is particularly important for clarifying exposure-response relationships and reducing heterogeneity among studies. Temporally-adjusted land use regression (LUR) models are exposure assessment methods developed in recent years that have the advantage of high spatial-temporal resolution. Studies on the health effects of outdoor air pollution exposure during pregnancy have been increasingly carried out using this model. In China, research applying LUR models has mostly remained at the model construction stage, and findings from related epidemiological studies have rarely been reported. In this paper, the sources of heterogeneity and the research progress of meta-analyses on the associations between air pollution and adverse pregnancy outcomes are analyzed. The characteristics of temporally-adjusted LUR models and the methods for constructing them are introduced. The current epidemiological studies on adverse pregnancy outcomes that applied this model are systematically summarized. Recommendations for the development and application of LUR models in China are presented. This will encourage the implementation of more valid exposure predictions during pregnancy in large-scale epidemiological studies on the health effects of air pollution in China.

  8. On regression adjustment for the propensity score.

    PubMed

    Vansteelandt, S; Daniel, R M

    2014-10-15

    Propensity scores are widely adopted in observational research because they enable adjustment for high-dimensional confounders without requiring models for their association with the outcome of interest. The results of statistical analyses based on stratification, matching or inverse weighting by the propensity score are therefore less susceptible to model extrapolation than those based solely on outcome regression models. This is attractive because extrapolation in outcome regression models may be alarming, yet difficult to diagnose, when the exposed and unexposed individuals have very different covariate distributions. Standard regression adjustment for the propensity score forms an alternative to the aforementioned propensity score methods, but the benefits of this are less clear because it still involves modelling the outcome in addition to the propensity score. In this article, we develop novel insights into the properties of this adjustment method. We demonstrate that standard tests of the null hypothesis of no exposure effect (based on robust variance estimators), as well as particular standardised effects obtained from such adjusted regression models, are robust against misspecification of the outcome model when a propensity score model is correctly specified; they are thus not vulnerable to the aforementioned problem of extrapolation. We moreover propose efficient estimators for these standardised effects, which retain a useful causal interpretation even when the propensity score model is misspecified, provided the outcome regression model is correctly specified.

  9. What's the Risk? A Simple Approach for Estimating Adjusted Risk Measures from Nonlinear Models Including Logistic Regression

    PubMed Central

    Kleinman, Lawrence C; Norton, Edward C

    2009-01-01

    Objective To develop and validate a general method (called regression risk analysis) to estimate adjusted risk measures from logistic and other nonlinear multiple regression models. We show how to estimate standard errors for these estimates. These measures could supplant various approximations (e.g., adjusted odds ratio [AOR]) that may diverge, especially when outcomes are common. Study Design Regression risk analysis estimates were compared with internal standards as well as with Mantel–Haenszel estimates, Poisson and log-binomial regressions, and a widely used (but flawed) equation to calculate adjusted risk ratios (ARR) from AOR. Data Collection Data sets produced using Monte Carlo simulations. Principal Findings Regression risk analysis accurately estimates ARRs and adjusted risk differences directly from multiple regression models, even when confounders are continuous, distributions are skewed, outcomes are common, and effect size is large. It is statistically sound and intuitive, and has properties favoring it over other methods in many cases. Conclusions Regression risk analysis should be the new standard for presenting findings from multiple regression analysis of dichotomous outcomes for cross-sectional, cohort, and population-based case–control studies, particularly when outcomes are common or effect size is large. PMID:18793213
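
    The computation at the heart of regression risk analysis is closely related to marginal standardization: average the fitted model's predicted risks with exposure forced to 1 and to 0, then take the ratio. A sketch of that step (the paper's standard-error machinery is not reproduced here):

        import numpy as np
        import statsmodels.api as sm

        def adjusted_risk_ratio(result, X, exposure_col):
            """Average predicted risk with exposure set to 1 vs. set to 0."""
            X1, X0 = X.copy(), X.copy()
            X1[:, exposure_col] = 1
            X0[:, exposure_col] = 0
            return result.predict(X1).mean() / result.predict(X0).mean()

        rng = np.random.default_rng(2)
        n = 50_000
        z = rng.standard_normal(n)                 # confounder
        e = rng.binomial(1, 1 / (1 + np.exp(-z)))  # exposure depends on z
        y = rng.binomial(1, 1 / (1 + np.exp(-(-0.5 + 0.7 * e + z))))

        X = sm.add_constant(np.column_stack([e, z]))
        fit = sm.Logit(y, X).fit(disp=0)
        print(adjusted_risk_ratio(fit, X, exposure_col=1))  # ARR, not the AOR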

  10. Verification and adjustment of regional regression models for urban storm-runoff quality using data collected in Little Rock, Arkansas

    USGS Publications Warehouse

    Barks, C.S.

    1995-01-01

    Storm-runoff water-quality data were used to verify and, when appropriate, adjust regional regression models previously developed to estimate urban storm-runoff loads and mean concentrations in Little Rock, Arkansas. Data collected at 5 representative sites during 22 storms from June 1992 through January 1994 compose the Little Rock data base. Comparison of observed values (O) of storm-runoff loads and mean concentrations to the predicted values (Pu) from the regional regression models for nine constituents (chemical oxygen demand, suspended solids, total nitrogen, total ammonia plus organic nitrogen as nitrogen, total phosphorus, dissolved phosphorus, total recoverable copper, total recoverable lead, and total recoverable zinc) shows large prediction errors ranging from 63 to several thousand percent. Prediction errors for six of the regional regression models are less than 100 percent and can be considered reasonable for water-quality models. Differences between O and Pu are due to variability in the Little Rock data base and error in the regional models. Where applicable, a model-adjustment procedure (termed MAP-R-P) based upon regression of O against Pu was applied to improve predictive accuracy. For 11 of the 18 regional water-quality models, O and Pu are significantly correlated; that is, much of the variation in O is explained by the regional models. Five of these 11 regional models consistently overestimate O; therefore, MAP-R-P can be used to provide a better estimate. For the remaining seven regional models, O and Pu are not significantly correlated, thus neither the unadjusted regional models nor the MAP-R-P is appropriate. A simple estimator, such as the mean of the observed values, may be used if the regression models are not appropriate. Standard error of estimate of the adjusted models ranges from 48 to 130 percent. Calibration results may be biased due to the limited data set sizes in the Little Rock data base. The relatively large values of…

  11. Adjusting for unmeasured confounding due to either of two crossed factors with a logistic regression model.

    PubMed

    Li, Li; Brumback, Babette A; Weppelmann, Thomas A; Morris, J Glenn; Ali, Afsar

    2016-08-15

    Motivated by an investigation of the effect of surface water temperature on the presence of Vibrio cholerae in water samples collected from different fixed surface water monitoring sites in Haiti in different months, we investigated methods to adjust for unmeasured confounding due to either of the two crossed factors site and month. In the process, we extended previous methods that adjust for unmeasured confounding due to one nesting factor (such as site, which nests the water samples from different months) to the case of two crossed factors. First, we developed a conditional pseudolikelihood estimator that eliminates fixed effects for the levels of each of the crossed factors from the estimating equation. Using the theory of U-Statistics for independent but non-identically distributed vectors, we show that our estimator is consistent and asymptotically normal, but that its variance depends on the nuisance parameters and thus cannot be easily estimated. Consequently, we apply our estimator in conjunction with a permutation test, and we investigate use of the pigeonhole bootstrap and the jackknife for constructing confidence intervals. We also incorporate our estimator into a diagnostic test for a logistic mixed model with crossed random effects and no unmeasured confounding. For comparison, we investigate between-within models extended to two crossed factors. These generalized linear mixed models include covariate means for each level of each factor in order to adjust for the unmeasured confounding. We conduct simulation studies, and we apply the methods to the Haitian data. Copyright © 2016 John Wiley & Sons, Ltd.

  12. A Proportional Hazards Regression Model for the Sub-distribution with Covariates Adjusted Censoring Weight for Competing Risks Data

    PubMed Central

    HE, PENG; ERIKSSON, FRANK; SCHEIKE, THOMAS H.; ZHANG, MEI-JIE

    2015-01-01

    With competing risks data, one often needs to assess the treatment and covariate effects on the cumulative incidence function. Fine and Gray proposed a proportional hazards regression model for the subdistribution of a competing risk with the assumption that the censoring distribution and the covariates are independent. Covariate-dependent censoring sometimes occurs in medical studies. In this paper, we study the proportional hazards regression model for the subdistribution of a competing risk with proper adjustments for covariate-dependent censoring. We consider a covariate-adjusted weight function by fitting the Cox model for the censoring distribution and using the predictive probability for each individual. Our simulation study shows that the covariate-adjusted weight estimator is basically unbiased when the censoring time depends on the covariates, and the covariate-adjusted weight approach works well for the variance estimator as well. We illustrate our methods with bone marrow transplant data from the Center for International Blood and Marrow Transplant Research (CIBMTR). Here cancer relapse and death in complete remission are two competing risks. PMID:27034534

  13. The performance of automated case-mix adjustment regression model building methods in a health outcome prediction setting.

    PubMed

    Jen, Min-Hua; Bottle, Alex; Kirkwood, Graham; Johnston, Ron; Aylin, Paul

    2011-09-01

    We have previously described a system for monitoring a number of healthcare outcomes using case-mix adjustment models. It is desirable to automate the model fitting process in such a system if monitoring covers a large number of outcome measures or subgroup analyses. Our aim was to compare the performance of three different variable selection strategies: "manual", "automated" backward elimination and re-categorisation, and including all variables at once, irrespective of their apparent importance, with automated re-categorisation. Logistic regression models for predicting in-hospital mortality and emergency readmission within 28 days were fitted to an administrative database for 78 diagnosis groups and 126 procedures from 1996 to 2006 for National Health Service hospital trusts in England. The performance of models was assessed with receiver operating characteristic (ROC) c statistics (measuring discrimination) and the Brier score (assessing the average predictive accuracy). Overall, discrimination was similar for diagnoses and procedures and consistently better for mortality than for emergency readmission. Brier scores were generally low overall (showing higher accuracy) and were lower for procedures than diagnoses, with a few exceptions for emergency readmission within 28 days. Among the three variable selection strategies, the automated procedure had similar performance to the manual method in almost all cases except low-risk groups with few outcome events. For the rapid generation of multiple case-mix models we suggest applying automated modelling to reduce the time required, in particular when examining different outcomes of large numbers of procedures and diseases in routinely collected administrative health data.

  14. Unitary Response Regression Models

    ERIC Educational Resources Information Center

    Lipovetsky, S.

    2007-01-01

    The dependent variable in a regular linear regression is a numerical variable, and in a logistic regression it is a binary or categorical variable. In these models the dependent variable has varying values. However, there are problems yielding an identity output of a constant value which can also be modelled in a linear or logistic regression with…

  15. Using an Adjusted Serfling Regression Model to Improve the Early Warning at the Arrival of Peak Timing of Influenza in Beijing

    PubMed Central

    Wang, Xiaoli; Wu, Shuangsheng; MacIntyre, C. Raina; Zhang, Hongbin; Shi, Weixian; Peng, Xiaomin; Duan, Wei; Yang, Peng; Zhang, Yi; Wang, Quanyi

    2015-01-01

    Serfling-type periodic regression models have been widely used to identify and analyse epidemics of influenza. In these approaches, the baseline is traditionally determined using cleaned historical non-epidemic data. However, we found that the previous exclusion of epidemic seasons was empirical, since year-to-year variations in the seasonal pattern of activity had been ignored. Therefore, excluding fixed ‘epidemic’ months did not seem reasonable. We made some adjustments to the rule of epidemic-period removal to avoid a potentially subjective definition of the start and end of epidemic periods. We fitted the baseline iteratively. First, we established a Serfling regression model based on the actual observations without any removals. After that, instead of manually excluding a predefined ‘epidemic’ period (the traditional method), we excluded observations which exceeded a calculated boundary. We then established the Serfling regression once more using the cleaned data and again excluded observations which exceeded a calculated boundary. We repeated this process until the R2 value stopped increasing. In addition, the definitions of the onset of an influenza epidemic were heterogeneous, which might make it impossible to accurately evaluate the performance of alternative approaches. We therefore used this modified model to detect the peak timing of influenza instead of the onset of an epidemic, and compared it with traditional Serfling models using observed weekly case counts of influenza-like illness (ILI), in terms of sensitivity, specificity and lead time. A better performance was observed. In summary, we provide an adjusted Serfling model which may have improved performance over traditional models in early warning of the arrival of the peak timing of influenza. PMID:25756205
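
    One plausible reading of the iterative baseline fit, condensed into code; the boundary rule (fit plus 1.96 residual standard deviations) and the stopping test (a stable exclusion set rather than a plateauing R2) are stand-in assumptions, not the paper's exact choices:

        import numpy as np

        def serfling_baseline(counts, period=52, n_iter=10, z=1.96):
            """Iteratively fit trend + annual harmonics, excluding points
            above the current fit + z * residual SD."""
            t = np.arange(len(counts), dtype=float)
            X = np.column_stack([np.ones_like(t), t,
                                 np.sin(2 * np.pi * t / period),
                                 np.cos(2 * np.pi * t / period)])
            keep = np.ones(len(counts), dtype=bool)
            for _ in range(n_iter):
                b, *_ = np.linalg.lstsq(X[keep], counts[keep], rcond=None)
                fit = X @ b
                sd = np.std(counts[keep] - fit[keep])
                new_keep = counts <= fit + z * sd
                if np.array_equal(new_keep, keep):
                    break
                keep = new_keep
            return fit, fit + z * sd  # baseline and epidemic boundary

        t = np.arange(520.0)
        ili = 100 + 10 * np.sin(2 * np.pi * t / 52) + np.random.default_rng(3).poisson(5, 520)
        baseline, boundary = serfling_baseline(ili)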

  16. Data for and adjusted regional regression models of volume and quality of urban storm-water runoff in Boise and Garden City, Idaho, 1993-94

    USGS Publications Warehouse

    Kjelstrom, L.C.

    1995-01-01

    Previously developed U.S. Geological Survey regional regression models of runoff and 11 chemical constituents were evaluated to assess their suitability for use in urban areas in Boise and Garden City. Data collected in the study area were used to develop adjusted regional models of storm-runoff volumes and mean concentrations and loads of chemical oxygen demand, dissolved and suspended solids, total nitrogen and total ammonia plus organic nitrogen as nitrogen, total and dissolved phosphorus, and total recoverable cadmium, copper, lead, and zinc. Explanatory variables used in these models were drainage area, impervious area, land-use information, and precipitation data. Mean annual runoff volume and loads at the five outfalls were estimated from 904 individual storms during 1976 through 1993. Two methods were used to compute individual storm loads. The first method used adjusted regional models of storm loads and the second used adjusted regional models for mean concentration and runoff volume. For large storms, the first method seemed to produce excessively high loads for some constituents and the second method provided more reliable results for all constituents except suspended solids. The first method provided more reliable results for large storms for suspended solids.

  17. Small-Sample Adjustments for Tests of Moderators and Model Fit in Robust Variance Estimation in Meta-Regression

    ERIC Educational Resources Information Center

    Tipton, Elizabeth; Pustejovsky, James E.

    2015-01-01

    Randomized experiments are commonly used to evaluate the effectiveness of educational interventions. The goal of the present investigation is to develop small-sample corrections for multiple contrast hypothesis tests (i.e., F-tests) such as the omnibus test of meta-regression fit or a test for equality of three or more levels of a categorical…

  18. A note on calculating asymptotic confidence intervals for the adjusted risk difference and number needed to treat in the Cox regression model.

    PubMed

    Laubender, Ruediger P; Bender, Ralf

    2014-02-28

    Recently, Laubender and Bender (Stat. Med. 2010; 29: 851-859) applied the average risk difference (RD) approach to estimate adjusted RD and corresponding number needed to treat measures in the Cox proportional hazards model. We calculated standard errors and confidence intervals by using bootstrap techniques. In this paper, we develop asymptotic variance estimates of the adjusted RD measures and corresponding asymptotic confidence intervals within counting process theory and evaluate them in a simulation study. We illustrate the use of the asymptotic confidence intervals by means of data from the Düsseldorf Obesity Mortality Study.

  19. Regularized Regression Versus the High-Dimensional Propensity Score for Confounding Adjustment in Secondary Database Analyses.

    PubMed

    Franklin, Jessica M; Eddings, Wesley; Glynn, Robert J; Schneeweiss, Sebastian

    2015-10-01

    Selection and measurement of confounders is critical for successful adjustment in nonrandomized studies. Although the principles behind confounder selection are now well established, variable selection for confounder adjustment remains a difficult problem in practice, particularly in secondary analyses of databases. We present a simulation study that compares the high-dimensional propensity score algorithm for variable selection with approaches that utilize direct adjustment for all potential confounders via regularized regression, including ridge regression and lasso regression. Simulations were based on 2 previously published pharmacoepidemiologic cohorts and used the plasmode simulation framework to create realistic simulated data sets with thousands of potential confounders. Performance of methods was evaluated with respect to bias and mean squared error of the estimated effects of a binary treatment. Simulation scenarios varied the true underlying outcome model, treatment effect, prevalence of exposure and outcome, and presence of unmeasured confounding. Across scenarios, high-dimensional propensity score approaches generally performed better than regularized regression approaches. However, including the variables selected by lasso regression in a regular propensity score model also performed well and may provide a promising alternative variable selection method.

  20. Survival Data and Regression Models

    NASA Astrophysics Data System (ADS)

    Grégoire, G.

    2014-12-01

    We start this chapter by introducing some basic elements for the analysis of censored survival data. Then we focus on right-censored data and develop two types of regression models. The first type concerns the so-called accelerated failure time (AFT) models, which are parametric models where a function of a parameter depends linearly on the covariables. The second is a semiparametric model, where the covariables enter in a multiplicative form in the expression of the hazard rate function. The main statistical tool for analysing these regression models is the maximum likelihood methodology; although we recall some essential results of ML theory, we refer to the chapter "Logistic Regression" for a more detailed presentation.

  1. Introduction to the use of regression models in epidemiology.

    PubMed

    Bender, Ralf

    2009-01-01

    Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models, the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs, so they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed depending on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrative examples from cancer research.
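
    For orientation, the four workhorse models named above map onto standard library calls roughly as follows (a statsmodels-based sketch with simulated placeholder data):

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(4)
        X = sm.add_constant(rng.standard_normal((200, 2)))  # intercept + 2 covariates
        y_cont = X @ [1.0, 0.5, -0.3] + rng.standard_normal(200)
        y_bin = (y_cont > 1.0).astype(int)
        y_cnt = rng.poisson(np.exp(X @ [0.1, 0.2, -0.1]))
        time = rng.exponential(1.0, 200)
        event = rng.binomial(1, 0.7, 200)  # 1 = event observed, 0 = censored

        print(sm.OLS(y_cont, X).fit().params)           # linear: continuous outcome
        print(sm.Logit(y_bin, X).fit(disp=0).params)    # logistic: binary outcome
        print(sm.Poisson(y_cnt, X).fit(disp=0).params)  # Poisson: counts and rates
        print(sm.PHReg(time, X[:, 1:], status=event).fit().params)  # Cox: time-to-event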

  2. Interaction Models for Functional Regression

    PubMed Central

    USSET, JOSEPH; STAICU, ANA-MARIA; MAITY, ARNAB

    2015-01-01

    A functional regression model with a scalar response and multiple functional predictors is proposed that accommodates two-way interactions in addition to their main effects. The proposed estimation procedure models the main effects using penalized regression splines, and the interaction effect by a tensor product basis. Extensions to generalized linear models and data observed on sparse grids or with measurement error are presented. A hypothesis testing procedure for the functional interaction effect is described. The proposed method can be easily implemented through existing software. Numerical studies show that fitting an additive model in the presence of interaction leads to both poor estimation performance and a loss of prediction power, while fitting an interaction model where there is in fact no interaction leads to negligible losses. The methodology is illustrated on the AneuRisk65 study data. PMID:26744549

  3. Evaluating Differential Effects Using Regression Interactions and Regression Mixture Models

    ERIC Educational Resources Information Center

    Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung

    2015-01-01

    Research increasingly emphasizes understanding differential effects. This article focuses on understanding regression mixture models, which are relatively new statistical methods for assessing differential effects by comparing results to using an interactive term in linear regression. The research questions which each model answers, their…

  4. Evaluating differential effects using regression interactions and regression mixture models

    PubMed Central

    Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung

    2015-01-01

    Research increasingly emphasizes understanding differential effects. This paper focuses on understanding regression mixture models, a relatively new statistical method for assessing differential effects, by comparing results to those obtained using an interaction term in linear regression. The research questions which each model answers, their formulation, and their assumptions are compared using Monte Carlo simulations and real data analysis. The capabilities of regression mixture models are described and specific issues to be addressed when conducting regression mixtures are proposed. The paper aims to clarify the role that regression mixtures can take in the estimation of differential effects and to increase awareness of the benefits and potential pitfalls of this approach. Regression mixture models are shown to be a potentially effective exploratory method for finding differential effects when these effects can be defined by a small number of classes of respondents who share a typical relationship between a predictor and an outcome. It is also shown that the comparison between regression mixture models and interactions becomes substantially more complex as the number of classes increases. It is argued that regression interactions are well suited for direct tests of specific hypotheses about differential effects and regression mixtures provide a useful approach for exploring effect heterogeneity given adequate samples and study design. PMID:26556903

  5. Assessing Longitudinal Change: Adjustment for Regression to the Mean Effects

    ERIC Educational Resources Information Center

    Rocconi, Louis M.; Ethington, Corinna A.

    2009-01-01

    Pascarella (J Coll Stud Dev 47:508-520, 2006) has called for an increase in use of longitudinal data with pretest-posttest design when studying effects on college students. However, such designs that use multiple measures to document change are vulnerable to an important threat to internal validity, regression to the mean. Herein, we discuss a…

  6. Using Quantile and Asymmetric Least Squares Regression for Optimal Risk Adjustment.

    PubMed

    Lorenz, Normann

    2016-06-13

    In this paper, we analyze optimal risk adjustment for direct risk selection (DRS). Integrating insurers' activities for risk selection into a discrete choice model of individuals' health insurance choice shows that DRS has the structure of a contest. For the contest success function (csf) used in most of the contest literature (the Tullock-csf), optimal transfers for a risk adjustment scheme have to be determined by means of a restricted quantile regression, irrespective of whether insurers are primarily engaged in positive DRS (attracting low risks) or negative DRS (repelling high risks). This is at odds with the common practice of determining transfers by means of a least squares regression. However, this common practice can be rationalized for a new csf, but only if positive and negative DRSs are equally important; if they are not, optimal transfers have to be calculated by means of a restricted asymmetric least squares regression. Using data from German and Swiss health insurers, we find considerable differences between the three types of regressions. Optimal transfers therefore critically depend on which csf represents insurers' incentives for DRS and, if it is not the Tullock-csf, whether insurers are primarily engaged in positive or negative DRS. Copyright © 2016 John Wiley & Sons, Ltd.
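
    Asymmetric least squares (expectile) regression can be computed by iteratively reweighted least squares: observations above the current fit get weight τ, those below get 1 − τ. A self-contained sketch of the unrestricted version (the restricted regressions the paper calls for add constraints not shown here):

        import numpy as np

        def expectile_regression(X, y, tau=0.8, n_iter=100, tol=1e-8):
            """Asymmetric least squares: weight tau above the fit, 1-tau below."""
            X = np.column_stack([np.ones(len(y)), X])  # add intercept
            b = np.linalg.lstsq(X, y, rcond=None)[0]
            for _ in range(n_iter):
                w = np.where(y > X @ b, tau, 1.0 - tau)
                # weighted least squares step: (X'WX) b = X'Wy
                b_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
                if np.max(np.abs(b_new - b)) < tol:
                    return b_new
                b = b_new
            return b

        rng = np.random.default_rng(5)
        X = rng.standard_normal((500, 2))
        y = X @ [1.0, -2.0] + rng.standard_normal(500)
        print(expectile_regression(X, y, tau=0.8))  # 80th-expectile coefficients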

  7. Moment Adjusted Imputation for Multivariate Measurement Error Data with Applications to Logistic Regression

    PubMed Central

    Thomas, Laine; Stefanski, Leonard A.; Davidian, Marie

    2013-01-01

    In clinical studies, covariates are often measured with error due to biological fluctuations, device error, and other sources. Summary statistics and regression models that are based on mismeasured data will differ from the corresponding analysis based on the “true” covariate. Statistical analysis can be adjusted for measurement error; however, various methods exhibit a trade-off between convenience and performance. Moment Adjusted Imputation (MAI) is a method for handling measurement error in a scalar latent variable that is easy to implement and performs well in a variety of settings. In practice, multiple covariates may be similarly influenced by biological fluctuations, inducing correlated multivariate measurement error. The extension of MAI to the setting of multivariate latent variables involves unique challenges. Alternative strategies are described, including a computationally feasible option that is shown to perform well. PMID:24072947

  8. Adjustment of regional regression equations for urban storm-runoff quality using at-site data

    USGS Publications Warehouse

    Barks, C.S.

    1996-01-01

    Regional regression equations have been developed to estimate urban storm-runoff loads and mean concentrations using a national data base. Four statistical methods using at-site data to adjust the regional equation predictions were developed to provide better local estimates. The four adjustment procedures are a single-factor adjustment, a regression of the observed data against the predicted values, a regression of the observed values against the predicted values and additional local independent variables, and a weighted combination of a local regression with the regional prediction. Data collected at five representative storm-runoff sites during 22 storms in Little Rock, Arkansas, were used to verify, and, when appropriate, adjust the regional regression equation predictions. Comparison of observed values of storm-runoff loads and mean concentrations to the predicted values from the regional regression equations for nine constituents (chemical oxygen demand, suspended solids, total nitrogen as N, total ammonia plus organic nitrogen as N, total phosphorus as P, dissolved phosphorus as P, total recoverable copper, total recoverable lead, and total recoverable zinc) showed large prediction errors ranging from 63 percent to more than several thousand percent. Prediction errors for 6 of the 18 regional regression equations were less than 100 percent and could be considered reasonable for water-quality prediction equations. The regression adjustment procedure was used to adjust five of the regional equation predictions to improve the predictive accuracy. For seven of the regional equations the observed and the predicted values are not significantly correlated. Thus neither the unadjusted regional equations nor any of the adjustments were appropriate. The mean of the observed values was used as a simple estimator when the regional equation predictions and adjusted predictions were not appropriate.

  9. Model selection for logistic regression models

    NASA Astrophysics Data System (ADS)

    Duller, Christine

    2012-09-01

    Model selection for logistic regression models decides which of some given potential regressors have an effect and hence should be included in the final model. The second interesting question is whether a certain factor is heterogeneous among some subsets, i.e. whether the model should include a random intercept or not. In this paper these questions are answered with classical as well as with Bayesian methods. The application shows some results of recent research projects in medicine and business administration.

  10. An Investigation of Nonlinear Controls and Regression-Adjusted Estimators for Variance Reduction in Computer Simulation

    DTIC Science & Technology

    1991-03-01

    Dissertation by Richard L. Ressler, March 1991; dissertation advisor: Peter A.W. Lewis. Approved for public release. This dissertation develops new techniques for variance reduction in computer simulation. It demonstrates that…

  11. Process modeling with the regression network.

    PubMed

    van der Walt, T; Barnard, E; van Deventer, J

    1995-01-01

    A new connectionist network topology called the regression network is proposed. The structural and underlying mathematical features of the regression network are investigated. Emphasis is placed on the intricacies of the optimization process for the regression network and some measures to alleviate these difficulties of optimization are proposed and investigated. The ability of the regression network algorithm to perform either nonparametric or parametric optimization, as well as a combination of both, is also highlighted. It is further shown how the regression network can be used to model systems which are poorly understood on the basis of sparse data. A semi-empirical regression network model is developed for a metallurgical processing operation (a hydrocyclone classifier) by building mechanistic knowledge into the connectionist structure of the regression network model. Poorly understood aspects of the process are provided for by use of nonparametric regions within the structure of the semi-empirical connectionist model. The performance of the regression network model is compared to the corresponding generalization performance results obtained by some other nonparametric regression techniques.

  12. Regression modeling of ground-water flow

    USGS Publications Warehouse

    Cooley, R.L.; Naff, R.L.

    1985-01-01

    Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)

  13. [From clinical judgment to linear regression model].

    PubMed

    Palacios-Cruz, Lino; Pérez, Marcela; Rivas-Ruiz, Rodolfo; Talavera, Juan O

    2013-01-01

    When we think about mathematical models, such as the linear regression model, we think that these terms are only used by those engaged in research, a notion that is far from the truth. Legendre described the first mathematical model in 1805, and Galton introduced the formal term in 1886. Linear regression is one of the most commonly used regression models in clinical practice. It is useful to predict or show the relationship between two or more variables as long as the dependent variable is quantitative and has a normal distribution. Stated another way, regression is used to predict a measure based on the knowledge of at least one other variable. The first objective of linear regression is to determine the slope or inclination of the regression line: Y = a + bx, where "a" is the intercept or regression constant, equivalent to the value of "Y" when "X" equals 0, and "b" (also called the slope) indicates the increase or decrease in "Y" that occurs when the variable "x" increases or decreases by one unit. In the regression line, "b" is called the regression coefficient. The coefficient of determination (R(2)) indicates the importance of the independent variables in the outcome.
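
    A worked numerical example of the Y = a + bx fit just described, computing "a", "b", and R(2) from the textbook formulas:

        import numpy as np

        x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # e.g. dose
        y = np.array([2.1, 3.9, 6.2, 8.0, 9.9])  # e.g. response

        b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
        a = y.mean() - b * x.mean()  # intercept: value of Y when X equals 0
        pred = a + b * x
        r2 = 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
        print(a, b, r2)  # b is the regression coefficient, r2 the determination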

  14. Adjusting for Cell Type Composition in DNA Methylation Data Using a Regression-Based Approach.

    PubMed

    Jones, Meaghan J; Islam, Sumaiya A; Edgar, Rachel D; Kobor, Michael S

    2017-01-01

    Analysis of DNA methylation in a population context has the potential to uncover novel gene and environment interactions as well as markers of health and disease. In order to find such associations it is important to control for factors which may mask or alter DNA methylation signatures. Since tissue of origin and coinciding cell type composition are major contributors to DNA methylation patterns, and can easily confound important findings, it is vital to adjust DNA methylation data for such differences across individuals. Here we describe the use of a regression method to adjust for cell type composition in DNA methylation data. We specifically discuss what information is required to adjust for cell type composition and then provide detailed instructions on how to perform cell type adjustment on high dimensional DNA methylation data. This method has been applied mainly to Illumina 450K data, but can also be adapted to pyrosequencing or genome-wide bisulfite sequencing data.
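
    A regression-based adjustment of this kind is commonly implemented by regressing each CpG's methylation values on the estimated cell-type proportions and keeping the residuals, re-centred on the original means. A generic sketch under that assumption, not the authors' exact pipeline:

        import numpy as np

        def adjust_for_cell_types(beta, cell_props):
            """beta: (n_cpgs, n_samples) methylation matrix.
            cell_props: (n_samples, n_celltypes) estimated proportions.
            Returns methylation with cell-composition effects regressed out."""
            n_samples = beta.shape[1]
            X = np.column_stack([np.ones(n_samples), cell_props])
            # fit all CpGs at once: least-squares solution of X B = beta.T
            coefs, *_ = np.linalg.lstsq(X, beta.T, rcond=None)
            resid = beta - (X @ coefs).T
            return resid + beta.mean(axis=1, keepdims=True)  # keep original scale

        rng = np.random.default_rng(6)
        props = rng.dirichlet(np.ones(5), size=40)  # 40 samples, 5 cell types
        meth = rng.uniform(0, 1, (1000, 40))        # 1000 CpGs
        adjusted = adjust_for_cell_types(meth, props)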

  15. An introduction to multilevel regression models.

    PubMed

    Austin, P C; Goel, V; van Walraven, C

    2001-01-01

    Data in health research are frequently structured hierarchically. For example, data may consist of patients nested within physicians, who in turn may be nested in hospitals or geographic regions. Fitting regression models that ignore the hierarchical structure of the data can lead to false inferences being drawn from the data. Implementing a statistical analysis that takes into account the hierarchical structure of the data requires special methodologies. In this paper, we introduce the concept of hierarchically structured data, and present an introduction to hierarchical regression models. We then compare the performance of a traditional regression model with that of a hierarchical regression model on a dataset relating test utilization at the annual health exam with patient and physician characteristics. In comparing the resultant models, we see that false inferences can be drawn by ignoring the structure of the data.

  16. Modelling of filariasis in East Java with Poisson regression and generalized Poisson regression models

    NASA Astrophysics Data System (ADS)

    Darnah

    2016-04-01

    Poisson regression is used when the response variable is count data based on the Poisson distribution. The Poisson distribution assumes equal dispersion. In practice, count data are often overdispersed or underdispersed, making Poisson regression inappropriate: it may underestimate the standard errors and overstate the significance of the regression parameters, consequently giving misleading inference about them. This paper suggests the generalized Poisson regression model to handle overdispersion and underdispersion in the Poisson regression model. The Poisson regression model and the generalized Poisson regression model are applied to the number of filariasis cases in East Java. Based on the Poisson regression model, the factors influencing filariasis are the percentage of families who do not practice clean and healthy living and the percentage of families who do not have a healthy house. The Poisson regression model exhibits overdispersion, so generalized Poisson regression is used. The best generalized Poisson regression model shows that the factor influencing filariasis is the percentage of families who do not have a healthy house; under this model, each additional 1 percent of families without a healthy house corresponds to one additional filariasis patient.
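
    The overdispersion diagnosis described can be run from a plain Poisson fit by comparing the Pearson chi-square to its residual degrees of freedom. Generalized Poisson regression is not a stock statsmodels family, so the sketch below substitutes a negative binomial fit as the overdispersion-robust step:

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(7)
        X = sm.add_constant(rng.standard_normal((300, 2)))
        mu = np.exp(X @ [0.5, 0.3, -0.2])
        y = rng.negative_binomial(n=2, p=2 / (2 + mu))  # overdispersed counts, mean mu

        pois = sm.GLM(y, X, family=sm.families.Poisson()).fit()
        dispersion = pois.pearson_chi2 / pois.df_resid
        print(dispersion)  # values well above 1 signal overdispersion

        if dispersion > 1.5:  # arbitrary cut-off for the illustration
            nb = sm.GLM(y, X, family=sm.families.NegativeBinomial()).fit()
            print(nb.params)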

  17. Regression Trees Identify Relevant Interactions: Can This Improve the Predictive Performance of Risk Adjustment?

    PubMed

    Buchner, Florian; Wasem, Jürgen; Schillo, Sonja

    2017-01-01

    Risk equalization formulas have been refined since their introduction about two decades ago. Because of the complexity and the abundance of possible interactions between the variables used, hardly any interactions are considered. A regression tree is used to systematically search for interactions, a methodologically new approach in risk equalization. Analyses are based on a data set of nearly 2.9 million individuals from a major German social health insurer. A two-step approach is applied: In the first step a regression tree is built on the basis of the learning data set. Terminal nodes characterized by more than one morbidity-group split represent interaction effects of different morbidity groups. In the second step the 'traditional' weighted least squares regression equation is expanded by adding interaction terms for all interactions detected by the tree, and regression coefficients are recalculated. The resulting risk adjustment formula shows an improvement in the adjusted R(2) from 25.43% to 25.81% on the evaluation data set. Predictive ratios are calculated for subgroups affected by the interactions. The R(2) improvement detected is only marginal. According to the sample-level performance measures used, omitting a considerable number of morbidity interactions entails no relevant loss in accuracy. Copyright © 2015 John Wiley & Sons, Ltd.

  18. A new bivariate negative binomial regression model

    NASA Astrophysics Data System (ADS)

    Faroughi, Pouya; Ismail, Noriszura

    2014-12-01

    This paper introduces a new form of bivariate negative binomial (BNB-1) regression which can be fitted to bivariate and correlated count data with covariates. The BNB regression discussed in this study can be fitted to bivariate and overdispersed count data with positive, zero or negative correlations. The joint p.m.f. of the BNB-1 distribution is derived from the product of two negative binomial marginals with a multiplicative factor parameter. Several testing methods were used to check the overdispersion and goodness-of-fit of the model. Application of BNB-1 regression is illustrated on a Malaysian motor insurance dataset. The results indicate that BNB-1 regression has a better fit than the bivariate Poisson and BNB-2 models with regard to the Akaike information criterion.

  19. A Spline Regression Model for Latent Variables

    ERIC Educational Resources Information Center

    Harring, Jeffrey R.

    2014-01-01

    Spline (or piecewise) regression models have been used in the past to account for patterns in observed data that exhibit distinct phases. The changepoint or knot marking the shift from one phase to the other, in many applications, is an unknown parameter to be estimated. As an extension of this framework, this research considers modeling the…

  20. Parametric modeling of quantile regression coefficient functions.

    PubMed

    Frumento, Paolo; Bottai, Matteo

    2016-03-01

    Estimating the conditional quantiles of outcome variables of interest is frequent in many research areas, and quantile regression is foremost among the utilized methods. The coefficients of a quantile regression model depend on the order of the quantile being estimated. For example, the coefficients for the median are generally different from those of the 10th centile. In this article, we describe an approach to modeling the regression coefficients as parametric functions of the order of the quantile. This approach may have advantages in terms of parsimony, efficiency, and may expand the potential of statistical modeling. Goodness-of-fit measures and testing procedures are discussed, and the results of a simulation study are presented. We apply the method to analyze the data that motivated this work. The described method is implemented in the qrcm R package.

  1. Model building strategy for logistic regression: purposeful selection.

    PubMed

    Zhang, Zhongheng

    2016-03-01

    Logistic regression is one of the most commonly used models to account for confounders in the medical literature. This article introduces how to perform the purposeful selection model building strategy with R. I stress the use of the likelihood ratio test to see whether deleting a variable will have a significant impact on model fit. A deleted variable should also be checked for whether it is an important adjustment of the remaining covariates. Interactions should be checked to disentangle complex relationships between covariates and their synergistic effect on the response variable. The model should be checked for goodness-of-fit (GOF), in other words, how well the fitted model reflects the real data. The Hosmer-Lemeshow GOF test is the most widely used for logistic regression models.
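
    The likelihood ratio test used at each deletion step compares nested fits; the article works in R, but the test itself is a two-line computation, sketched here in Python with statsmodels:

        import numpy as np
        import statsmodels.api as sm
        from scipy import stats

        def lr_test(full, reduced):
            """LR test for nested logistic fits: does the dropped variable matter?"""
            lr = 2 * (full.llf - reduced.llf)
            df = full.df_model - reduced.df_model
            return lr, stats.chi2.sf(lr, df)

        rng = np.random.default_rng(8)
        X = rng.standard_normal((400, 3))
        y = rng.binomial(1, 1 / (1 + np.exp(-0.8 * X[:, 0])))  # only X[:,0] matters

        full = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
        reduced = sm.Logit(y, sm.add_constant(X[:, :2])).fit(disp=0)
        print(lr_test(full, reduced))  # large p-value: X[:,2] can be dropped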

  2. Regression models for estimating coseismic landslide displacement

    USGS Publications Warehouse

    Jibson, R.W.

    2007-01-01

    Newmark's sliding-block model is widely used to estimate coseismic slope performance. Early efforts to develop simple regression models to estimate Newmark displacement were based on analysis of the small number of strong-motion records then available. The current availability of a much larger set of strong-motion records dictates that these regression equations be updated. Regression equations were generated using data derived from a collection of 2270 strong-motion records from 30 worldwide earthquakes. The regression equations predict Newmark displacement in terms of (1) critical acceleration ratio, (2) critical acceleration ratio and earthquake magnitude, (3) Arias intensity and critical acceleration, and (4) Arias intensity and critical acceleration ratio. These equations are well constrained and fit the data well (71% < R2 < 88%), but they have standard deviations of about 0.5 log units, such that the range defined by the mean ± one standard deviation spans about an order of magnitude. These regression models, therefore, are not recommended for use in site-specific design, but rather for regional-scale seismic landslide hazard mapping or for rapid preliminary screening of sites. © 2007 Elsevier B.V. All rights reserved.

  3. Parametric regression model for survival data: Weibull regression model as an example

    PubMed Central

    2016-01-01

    The Weibull regression model is one of the most popular forms of parametric regression model, in that it provides an estimate of the baseline hazard function as well as coefficients for covariates. Because of technical difficulties, the Weibull regression model is seldom used in the medical literature as compared to the semi-parametric proportional hazards model. To make clinical investigators familiar with the Weibull regression model, this article introduces some basic knowledge of the model and then illustrates how to fit it with R software. The SurvRegCensCov package is useful for converting estimated coefficients to clinically relevant statistics such as the hazard ratio (HR) and event time ratio (ETR). Model adequacy can be assessed by inspecting Kaplan-Meier curves stratified by categorical variables. The eha package provides an alternative method to fit the Weibull regression model, and its check.dist() function helps to assess the goodness-of-fit of the model. Variable selection is based on the importance of a covariate, which can be tested using the anova() function. Alternatively, backward elimination starting from a full model is an efficient way for model development. Visualization of the Weibull regression model after model development is also worthwhile, as it provides another way to report the findings. PMID:28149846

  4. Model building in nonproportional hazard regression.

    PubMed

    Rodríguez-Girondo, Mar; Kneib, Thomas; Cadarso-Suárez, Carmen; Abu-Assi, Emad

    2013-12-30

    Recent developments of statistical methods allow for a very flexible modeling of covariates affecting survival times via the hazard rate, including also the inspection of possible time-dependent associations. Despite their immediate appeal in terms of flexibility, these models typically introduce additional difficulties when a subset of covariates and the corresponding modeling alternatives have to be chosen, that is, for building the most suitable model for given data. This is particularly true when potentially time-varying associations are given. We propose to conduct a piecewise exponential representation of the original survival data to link hazard regression with estimation schemes based on of the Poisson likelihood to make recent advances for model building in exponential family regression accessible also in the nonproportional hazard regression context. A two-stage stepwise selection approach, an approach based on doubly penalized likelihood, and a componentwise functional gradient descent approach are adapted to the piecewise exponential regression problem. These three techniques were compared via an intensive simulation study. An application to prognosis after discharge for patients who suffered a myocardial infarction supplements the simulation to demonstrate the pros and cons of the approaches in real data analyses.

  5. A Skew-Normal Mixture Regression Model

    ERIC Educational Resources Information Center

    Liu, Min; Lin, Tsung-I

    2014-01-01

    A challenge associated with traditional mixture regression models (MRMs), which rest on the assumption of normally distributed errors, is determining the number of unobserved groups. Specifically, even slight deviations from normality can lead to the detection of spurious classes. The current work aims to (a) examine how sensitive the commonly…

  6. Variable Selection in Semiparametric Regression Modeling.

    PubMed

    Li, Runze; Liang, Hua

    2008-01-01

    In this paper, we are concerned with how to select significant variables in semiparametric modeling. Variable selection for semiparametric regression models consists of two components: model selection for the nonparametric components and selection of significant variables for the parametric portion. It is therefore much more challenging than variable selection for parametric models such as linear models and generalized linear models, because traditional procedures, including stepwise regression and best subset selection, require model selection for the nonparametric components of each submodel. This leads to a very heavy computational burden. In this paper, we propose a class of variable selection procedures for semiparametric regression models using nonconcave penalized likelihood. The newly proposed procedures are distinguished from traditional ones in that they delete insignificant variables and estimate the coefficients of significant variables simultaneously. This allows us to establish the sampling properties of the resulting estimate. We first establish the rate of convergence of the resulting estimate. With proper choices of penalty functions and regularization parameters, we then establish its asymptotic normality and further demonstrate that the proposed procedures perform as well as an oracle procedure. A semiparametric generalized likelihood ratio test is proposed to select significant variables in the nonparametric component. We investigate the asymptotic behavior of the proposed test and demonstrate that its limiting null distribution follows a chi-squared distribution that is independent of the nuisance parameters. Extensive Monte Carlo simulation studies are conducted to examine the finite sample performance of the proposed variable selection procedures.
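
    SCAD-type nonconcave penalties are not available in mainstream Python libraries; the sketch below substitutes a lasso penalty purely to illustrate the simultaneous delete-and-estimate behavior described above:

      import numpy as np
      from sklearn.linear_model import LassoCV

      rng = np.random.default_rng(1)
      X = rng.normal(size=(200, 10))
      y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=200)  # 2 true signals

      # One fit deletes insignificant variables (zero coefficients) and
      # estimates the remaining coefficients simultaneously.
      fit = LassoCV(cv=5).fit(X, y)
      print(np.nonzero(fit.coef_)[0])  # indices of selected variables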

  7. A regression model analysis of longitudinal dental caries data.

    PubMed

    Ringelberg, M L; Tonascia, J A

    1976-03-01

    Longitudinal data on caries experience were derived from the reexamination and interview of a cohort of 306 subjects with an average follow-up period of 33 years after the baseline examination. Analysis of the data was accomplished by the use of contingency tables utilizing enumeration statistics compared with a multiple regression analysis. The analyses indicated a strong association of caries experience at one point in time with the caries experience of that same person earlier in life. The regression model approach offers adjustment of any given independent variable for the effect of all other independent variables, providing a powerful means of bias reduction. The model is also useful in separating out the specific effect of an independent variable over and above the contribution of other variables. The model used explained 35% of the variability in the DMFS scores recorded. Similar models could be useful adjuncts in the analyses of dental epidemiologic data.

  8. Regularized logistic regression with adjusted adaptive elastic net for gene selection in high dimensional cancer classification.

    PubMed

    Algamal, Zakariya Yahya; Lee, Muhammad Hisyam

    2015-12-01

    Cancer classification and gene selection in high-dimensional data have been popular research topics in genetics and molecular biology. Recently, adaptive regularized logistic regression using the elastic net regularization, which is called the adaptive elastic net, has been successfully applied in high-dimensional cancer classification to tackle both estimating the gene coefficients and performing gene selection simultaneously. The adaptive elastic net originally used elastic net estimates as the initial weight; however, using this weight may not be preferable for two reasons: first, the elastic net estimator is biased in selecting genes, and second, it does not perform well when the pairwise correlations between variables are not high. Adjusted adaptive regularized logistic regression (AAElastic) is proposed to address these issues and to encourage grouping effects simultaneously. The real data results indicate that AAElastic is significantly more consistent in selecting genes than the three competing regularization methods. Additionally, the classification performance of AAElastic is comparable to the adaptive elastic net and better than the other regularization methods. Thus, we can conclude that AAElastic is a reliable adaptive regularized logistic regression method in the field of high-dimensional cancer classification.
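
    A plain elastic-net-penalized logistic regression, the baseline that the adaptive and adjusted variants build on, can be sketched with scikit-learn; the adaptive initial weights of AAElastic are not reproduced here:

      import numpy as np
      from sklearn.linear_model import LogisticRegression

      rng = np.random.default_rng(2)
      X = rng.normal(size=(100, 500))            # high-dimensional "gene" matrix
      y = (X[:, 0] - X[:, 1] + rng.normal(size=100) > 0).astype(int)

      # l1_ratio mixes lasso (sparsity / gene selection) with ridge
      # (grouping of correlated genes), as in the elastic net.
      clf = LogisticRegression(penalty="elasticnet", solver="saga",
                               l1_ratio=0.5, C=0.5, max_iter=5000).fit(X, y)
      print((clf.coef_ != 0).sum(), "genes selected")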

  9. Semiparametric regression in capture-recapture modeling.

    PubMed

    Gimenez, O; Crainiceanu, C; Barbraud, C; Jenouvrier, S; Morgan, B J T

    2006-09-01

    Capture-recapture models were developed to estimate survival using data arising from marking and monitoring wild animals over time. Variation in survival may be explained by incorporating relevant covariates. We propose nonparametric and semiparametric regression methods for estimating survival in capture-recapture models. A fully Bayesian approach using Markov chain Monte Carlo simulations was employed to estimate the model parameters. The work is illustrated by a study of Snow petrels, in which survival probabilities are expressed as nonlinear functions of a climate covariate, using data from a 40-year study on marked individuals, nesting at Petrels Island, Terre Adélie.

  10. Modeling confounding by half-sibling regression

    PubMed Central

    Schölkopf, Bernhard; Hogg, David W.; Wang, Dun; Foreman-Mackey, Daniel; Janzing, Dominik; Simon-Gabriel, Carl-Johann; Peters, Jonas

    2016-01-01

    We describe a method for removing the effect of confounders to reconstruct a latent quantity of interest. The method, referred to as “half-sibling regression,” is inspired by recent work in causal inference using additive noise models. We provide a theoretical justification, discussing both independent and identically distributed as well as time series data, respectively, and illustrate the potential of the method in a challenging astronomy application. PMID:27382154

  11. Modeling confounding by half-sibling regression.

    PubMed

    Schölkopf, Bernhard; Hogg, David W; Wang, Dun; Foreman-Mackey, Daniel; Janzing, Dominik; Simon-Gabriel, Carl-Johann; Peters, Jonas

    2016-07-05

    We describe a method for removing the effect of confounders to reconstruct a latent quantity of interest. The method, referred to as "half-sibling regression," is inspired by recent work in causal inference using additive noise models. We provide a theoretical justification, discussing both independent and identically distributed as well as time series data, respectively, and illustrate the potential of the method in a challenging astronomy application.
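
    The method reduces to regressing the contaminated observation on "sibling" observations that share only the confounder and subtracting the prediction; a toy sketch with invented variable names:

      import numpy as np
      from sklearn.linear_model import Ridge

      rng = np.random.default_rng(3)
      n = 1000
      confounder = rng.normal(size=(n, 1))       # e.g., shared instrument noise
      q = np.sin(np.linspace(0, 20, n))          # latent quantity of interest
      y = q + confounder[:, 0]                   # observed target signal
      siblings = (confounder @ rng.normal(size=(1, 50))
                  + 0.1 * rng.normal(size=(n, 50)))

      # Half-sibling regression: predict y from siblings (which contain the
      # confounder but not q), then remove that predictable part.
      pred = Ridge(alpha=1.0).fit(siblings, y).predict(siblings)
      q_hat = y - pred + pred.mean()             # reconstruct q up to a constant
      print(np.corrcoef(q_hat, q)[0, 1])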

  12. An operational GLS model for hydrologic regression

    USGS Publications Warehouse

    Tasker, Gary D.; Stedinger, J.R.

    1989-01-01

    Recent Monte Carlo studies have documented the value of generalized least squares (GLS) procedures to estimate empirical relationships between streamflow statistics and physiographic basin characteristics. This paper presents a number of extensions of the GLS method that deal with realities and complexities of regional hydrologic data sets that were not addressed in the simulation studies. These extensions include: (1) a more realistic model of the underlying model errors; (2) smoothed estimates of cross correlation of flows; (3) procedures for including historical flow data; (4) diagnostic statistics describing leverage and influence for GLS regression; and (5) the formulation of a mathematical program for evaluating future gaging activities. © 1989.

  13. Validation data-based adjustments for outcome misclassification in logistic regression: an illustration.

    PubMed

    Lyles, Robert H; Tang, Li; Superak, Hillary M; King, Caroline C; Celentano, David D; Lo, Yungtai; Sobel, Jack D

    2011-07-01

    Misclassification of binary outcome variables is a known source of potentially serious bias when estimating adjusted odds ratios. Although researchers have described frequentist and Bayesian methods for dealing with the problem, these methods have seldom fully bridged the gap between statistical research and epidemiologic practice. In particular, there have been few real-world applications of readily grasped and computationally accessible methods that make direct use of internal validation data to adjust for differential outcome misclassification in logistic regression. In this paper, we illustrate likelihood-based methods for this purpose that can be implemented using standard statistical software. Using main study and internal validation data from the HIV Epidemiology Research Study, we demonstrate how misclassification rates can depend on the values of subject-specific covariates, and we illustrate the importance of accounting for this dependence. Simulation studies confirm the effectiveness of the maximum likelihood approach. We emphasize clear exposition of the likelihood function itself, to permit the reader to easily assimilate appended computer code that facilitates sensitivity analyses as well as the efficient handling of main/external and main/internal validation-study data. These methods are readily applicable under random cross-sectional sampling, and we discuss the extent to which the main/internal analysis remains appropriate under outcome-dependent (case-control) sampling.
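
    Assuming constant sensitivity and specificity (the paper's fuller model lets them depend on covariates), the adjusted likelihood can be maximized directly:

      import numpy as np
      from scipy.optimize import minimize

      def neg_loglik(beta, X, ystar, se, sp):
          # True-outcome probability from the logistic model ...
          p = 1.0 / (1.0 + np.exp(-X @ beta))
          # ... mapped to the observed (misclassified) outcome probability.
          pstar = se * p + (1.0 - sp) * (1.0 - p)
          return -np.sum(ystar * np.log(pstar) + (1 - ystar) * np.log(1 - pstar))

      rng = np.random.default_rng(4)
      X = np.column_stack([np.ones(500), rng.normal(size=500)])
      y = rng.binomial(1, 1 / (1 + np.exp(-(X @ np.array([-1.0, 0.8])))))
      se_true, sp_true = 0.9, 0.95
      ystar = np.where(y == 1, rng.binomial(1, se_true, 500),
                       rng.binomial(1, 1 - sp_true, 500))

      fit = minimize(neg_loglik, x0=[0.0, 0.0], args=(X, ystar, se_true, sp_true))
      print(fit.x)  # approximately recovers (-1.0, 0.8)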

  14. Time series regression model for infectious disease and weather.

    PubMed

    Imai, Chisato; Armstrong, Ben; Chalabi, Zaid; Mangtani, Punam; Hashizume, Masahiro

    2015-10-01

    Time series regression has been developed and long used to evaluate the short-term associations of air pollution and weather with mortality or morbidity of non-infectious diseases. The application of the regression approaches from this tradition to infectious diseases, however, is less well explored and raises some new issues. We discuss and present potential solutions for five issues often arising in such analyses: changes in immune population, strong autocorrelations, a wide range of plausible lag structures and association patterns, seasonality adjustments, and large overdispersion. The potential approaches are illustrated with datasets of cholera cases and rainfall from Bangladesh and influenza and temperature in Tokyo. Though this article focuses on the application of the traditional time series regression to infectious diseases and weather factors, we also briefly introduce alternative approaches, including mathematical modeling, wavelet analysis, and autoregressive integrated moving average (ARIMA) models. Modifications proposed to standard time series regression practice include using sums of past cases as proxies for the immune population, and using the logarithm of lagged disease counts to control autocorrelation due to true contagion, both of which are motivated from "susceptible-infectious-recovered" (SIR) models. The complexity of lag structures and association patterns can often be informed by biological mechanisms and explored by using distributed lag non-linear models. For overdispersed models, alternative distribution models such as quasi-Poisson and negative binomial should be considered. Time series regression can be used to investigate dependence of infectious diseases on weather, but may need modifying to allow for features specific to this context.
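
    Two of the proposed modifications, a lagged log case count for contagion-driven autocorrelation and a quasi-Poisson fit for overdispersion, might be coded along these lines (data and variable names are hypothetical):

      import numpy as np
      import pandas as pd
      import statsmodels.api as sm
      import statsmodels.formula.api as smf

      # Hypothetical weekly series of case counts and temperature.
      rng = np.random.default_rng(5)
      df = pd.DataFrame({"cases": rng.poisson(20, 200),
                         "temp": rng.normal(15, 5, 200)})
      df["log_lag_cases"] = np.log(df["cases"].shift(1) + 1)  # contagion proxy
      df = df.dropna()

      # Poisson GLM with scale estimated from Pearson chi-square = quasi-Poisson.
      fit = smf.glm("cases ~ temp + log_lag_cases", data=df,
                    family=sm.families.Poisson()).fit(scale="X2")
      print(fit.summary())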

  15. Regression Models For Saffron Yields in Iran

    NASA Astrophysics Data System (ADS)

    Sanaeinejad, S. H.; Hosseini, S. N.

    Saffron is an important crop in social and economic terms in Khorassan Province (northeast Iran). In this research we tried to evaluate trends in saffron yield in recent years and to study the relationship between saffron yield and climate change. A regression analysis was used to predict saffron yield based on 20 years of yield data from the cities of Birjand, Ghaen and Ferdows. Climatological data for the same periods were provided by the database of the Khorassan Climatology Center. The climatological data included temperature, rainfall, relative humidity and sunshine hours for Model I, and temperature and rainfall for Model II. The results showed that the coefficients of determination for Birjand, Ferdows and Ghaen for Model I were 0.69, 0.50 and 0.81, respectively. The coefficients of determination for the same cities for Model II were 0.53, 0.50 and 0.72, respectively. Multiple regression analysis indicated that among the weather variables, temperature was the key parameter for variation in saffron yield. It was concluded that increasing spring temperature was the main cause of the decline in saffron yield in recent years across the province. Finally, the yield trend was predicted for the last 5 years using time series analysis.

  16. Logistic models--an odd(s) kind of regression.

    PubMed

    Jupiter, Daniel C

    2013-01-01

    The logistic regression model bears some similarity to the multivariable linear regression with which we are familiar. However, the differences are great enough to warrant a discussion of the need for and interpretation of logistic regression.

  17. Quantile regression modeling for Malaysian automobile insurance premium data

    NASA Astrophysics Data System (ADS)

    Fuzi, Mohd Fadzli Mohd; Ismail, Noriszura; Jemain, Abd Aziz

    2015-09-01

    Quantile regression is robust to outliers compared with mean regression models. Traditional mean regression models like the Generalized Linear Model (GLM) are not able to capture the entire distribution of premium data. In this paper we demonstrate how a quantile regression approach can be used to model net premium data to study the effects of changes in the estimates of regression parameters (rating classes) on the magnitude of the response variable (pure premium). We then compare the results of the quantile regression model with a Gamma regression model. The results from quantile regression show that some rating classes increase with increasing quantile and some decrease. Further, we found that the confidence interval of the median regression (τ = 0.5) is always smaller than that of the Gamma regression for all risk factors.
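
    A sketch of the comparison, fitting a median regression alongside a Gamma GLM on hypothetical premium data:

      import numpy as np
      import pandas as pd
      import statsmodels.api as sm
      import statsmodels.formula.api as smf

      rng = np.random.default_rng(6)
      df = pd.DataFrame({"premium": rng.gamma(2.0, 200.0, 300),
                         "vehicle_age": rng.integers(0, 15, 300)})

      median_fit = smf.quantreg("premium ~ vehicle_age", df).fit(q=0.5)
      gamma_fit = smf.glm("premium ~ vehicle_age", df,
                          family=sm.families.Gamma(
                              link=sm.families.links.Log())).fit()

      # Compare confidence intervals for the rating factor, as in the paper.
      print(median_fit.conf_int(), gamma_fit.conf_int(), sep="\n")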

  18. Inferring gene regression networks with model trees

    PubMed Central

    2010-01-01

    Background Novel strategies are required to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes by building so-called gene co-expression networks. These are typically generated using correlation statistics as pairwise similarity measures. Correlation-based methods are very useful for determining whether two genes have a strong global similarity, but they do not detect local similarities. Results We propose model trees as a method to identify gene interaction networks. While correlation-based methods analyze each pair of genes, in our approach a single regression tree is generated for each gene from the remaining genes. Finally, a graph of all the relationships among output and input genes is built, taking into account whether each pair of genes is statistically significant. For this reason we apply a statistical procedure to control the false discovery rate. The performance of our approach, named REGNET, is experimentally tested on two well-known data sets: a Saccharomyces cerevisiae and an E. coli data set. First, the biological coherence of the results is tested. Second, the E. coli transcriptional network (in the Regulon database) is used as a control to compare the results to those of a correlation-based method. This experiment shows that REGNET performs more accurately at detecting true gene associations than the Pearson and Spearman zeroth- and first-order correlation-based methods. Conclusions REGNET generates gene association networks from gene expression data, and differs from correlation-based methods in that the relationship between one gene and others is calculated simultaneously. Model trees are very useful techniques to estimate the numerical values for the target genes by linear regression functions. They are very often more precise than linear regression models because they can add just different linear regressions to separate

  19. Quantile Regression Models for Current Status Data.

    PubMed

    Ou, Fang-Shu; Zeng, Donglin; Cai, Jianwen

    2016-11-01

    Current status data arise frequently in demography, epidemiology, and econometrics where the exact failure time cannot be determined but is only known to have occurred before or after a known observation time. We propose a quantile regression model to analyze current status data, because it does not require distributional assumptions and the coefficients can be interpreted as direct regression effects on the distribution of failure time in the original time scale. Our model assumes that the conditional quantile of failure time is a linear function of covariates. We assume conditional independence between the failure time and observation time. An M-estimator is developed for parameter estimation which is computed using the concave-convex procedure and its confidence intervals are constructed using a subsampling method. Asymptotic properties for the estimator are derived and proven using modern empirical process theory. The small sample performance of the proposed method is demonstrated via simulation studies. Finally, we apply the proposed method to analyze data from the Mayo Clinic Study of Aging.

  20. Adjustments to de Leva-anthropometric regression data for the changes in body proportions in elderly humans.

    PubMed

    Ho Hoang, Khai-Long; Mombaur, Katja

    2015-10-15

    Dynamic modeling of the human body is an important tool for investigating the fundamentals of the biomechanics of human movement. To model the human body as a multi-body system, it is necessary to know the anthropometric parameters of the body segments. For young healthy subjects, several data sets exist that are widely used in the research community, e.g. the tables provided by de Leva. No comparably comprehensive anthropometric parameter sets exist for elderly people. It is, however, well known that body proportions change significantly during aging, e.g. due to degenerative effects in the spine, so that parameters for young people cannot be used to realistically simulate the dynamics of elderly people. In this study, regression equations are derived from the inertial parameters, center of mass positions, and body segment lengths provided by de Leva to be adjustable to the age-related changes in the proportions of the body parts of male and female humans. Additional adjustments are made to the reference points of the parameters for the upper body segments, as these are chosen in a more practicable way in the context of creating a multi-body model in a chain structure with the pelvis as the most proximal segment.

  1. Survival Regression Modeling Strategies in CVD Prediction

    PubMed Central

    Barkhordari, Mahnaz; Padyab, Mojgan; Sardarinia, Mahsa; Hadaegh, Farzad; Azizi, Fereidoun; Bozorgmanesh, Mohammadreza

    2016-01-01

    Background A fundamental part of prevention is prediction. Potential predictors are the sine qua non of prediction models. However, whether incorporating novel predictors to prediction models could be directly translated to added predictive value remains an area of dispute. The difference between the predictive power of a predictive model with (enhanced model) and without (baseline model) a certain predictor is generally regarded as an indicator of the predictive value added by that predictor. Indices such as discrimination and calibration have long been used in this regard. Recently, the use of added predictive value has been suggested while comparing the predictive performances of the predictive models with and without novel biomarkers. Objectives User-friendly statistical software capable of implementing novel statistical procedures is conspicuously lacking. This shortcoming has restricted implementation of such novel model assessment methods. We aimed to construct Stata commands to help researchers obtain the aforementioned statistical indices. Materials and Methods We have written Stata commands that are intended to help researchers obtain the following. 1, Nam-D’Agostino X2 goodness of fit test; 2, Cut point-free and cut point-based net reclassification improvement index (NRI), relative absolute integrated discriminatory improvement index (IDI), and survival-based regression analyses. We applied the commands to real data on women participating in the Tehran lipid and glucose study (TLGS) to examine if information relating to a family history of premature cardiovascular disease (CVD), waist circumference, and fasting plasma glucose can improve predictive performance of Framingham’s general CVD risk algorithm. Results The command is adpredsurv for survival models. Conclusions Herein we have described the Stata package “adpredsurv” for calculation of the Nam-D’Agostino X2 goodness of fit test as well as cut point-free and cut point-based NRI, relative

  2. Poisson Mixture Regression Models for Heart Disease Prediction.

    PubMed

    Mufudza, Chipo; Erol, Hamza

    2016-01-01

    Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model-based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models are addressed under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model, owing to its low Bayesian Information Criterion value. Furthermore, a zero-inflated Poisson mixture regression model turned out to be the best model overall for heart disease prediction, as it both clusters individuals into high- or low-risk categories and predicts the rate of heart disease componentwise given the available clusters. We conclude that heart disease prediction can be done effectively by identifying the major risks componentwise using a Poisson mixture regression model.

  3. Reconstruction of missing daily streamflow data using dynamic regression models

    NASA Astrophysics Data System (ADS)

    Tencaliec, Patricia; Favre, Anne-Catherine; Prieur, Clémentine; Mathevet, Thibault

    2015-12-01

    River discharge is one of the most important quantities in hydrology. It provides fundamental records for water resources management and climate change monitoring. Even very short gaps in these records can lead to markedly different analysis outputs. Therefore, reconstructing missing data in incomplete data sets is an important step for the performance of environmental models, engineering, and research applications, and it presents a great challenge. The objective of this paper is to introduce an effective technique for reconstructing missing daily discharge data when one has access only to daily streamflow data. The proposed procedure uses a combination of regression and autoregressive integrated moving average (ARIMA) models, called a dynamic regression model. This model uses the linear relationship between neighboring, correlated stations and then adjusts the residual term by fitting an ARIMA structure. Application of the model to eight daily streamflow series for the Durance river watershed showed that the model yields reliable estimates of the missing data in the time series. Simulation studies were also conducted to evaluate the performance of the procedure.
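
    Regression on a neighboring station with ARMA-structured errors, the essence of the dynamic regression model, can be sketched with statsmodels (the data are simulated placeholders):

      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(7)
      neighbor = rng.gamma(3.0, 10.0, 500)                 # correlated station
      eps = sm.tsa.arma_generate_sample([1, -0.6], [1], 500) * 2.0
      target = 5.0 + 0.8 * neighbor + eps                  # station with gaps

      # Regression on the neighbor with ARMA(1,1)-structured residuals.
      fit = sm.tsa.SARIMAX(target, exog=neighbor, order=(1, 0, 1),
                           trend="c").fit(disp=False)
      print(fit.params)
      # In-sample predictions from the fitted model can fill gaps in `target`.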

  4. Boosted Regression Tree Models to Explain Watershed ...

    EPA Pesticide Factsheets

    Boosted regression tree (BRT) models were developed to quantify the nonlinear relationships between landscape variables and nutrient concentrations in a mesoscale mixed land cover watershed during base-flow conditions. Factors that affect instream biological components, based on the Index of Biotic Integrity (IBI), were also analyzed. Seasonal BRT models at two spatial scales (watershed and riparian buffered area [RBA]) for nitrite-nitrate (NO2-NO3), total Kjeldahl nitrogen, and total phosphorus (TP) and annual models for the IBI score were developed. Two primary factors — location within the watershed (i.e., geographic position, stream order, and distance to a downstream confluence) and percentage of urban land cover (both scales) — emerged as important predictor variables. Latitude and longitude interacted with other factors to explain the variability in summer NO2-NO3 concentrations and IBI scores. BRT results also suggested that location might be associated with indicators of sources (e.g., land cover), runoff potential (e.g., soil and topographic factors), and processes not easily represented by spatial data indicators. Runoff indicators (e.g., Hydrological Soil Group D and Topographic Wetness Indices) explained a substantial portion of the variability in nutrient concentrations as did point sources for TP in the summer months. The results from our BRT approach can help prioritize areas for nutrient management in mixed-use and heavily impacted watershed
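
    A boosted regression tree of this kind is available off the shelf; predictor names below are invented placeholders for the landscape variables:

      import numpy as np
      from sklearn.ensemble import GradientBoostingRegressor
      from sklearn.inspection import permutation_importance

      rng = np.random.default_rng(8)
      X = rng.uniform(size=(400, 3))  # e.g., urban %, stream density, wetness
      y = 2 * X[:, 0] + np.sin(4 * X[:, 1]) + rng.normal(0, 0.2, 400)

      brt = GradientBoostingRegressor(n_estimators=500, learning_rate=0.05,
                                      max_depth=3).fit(X, y)
      # Rank predictors, analogous to identifying "important predictor variables".
      imp = permutation_importance(brt, X, y, n_repeats=10, random_state=0)
      print(imp.importances_mean)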

  5. Three-Dimensional Modeling in Linear Regression.

    ERIC Educational Resources Information Center

    Herman, James D.

    Linear regression examines the relationship between one or more independent (predictor) variables and a dependent variable. By using a particular formula, regression determines the weights needed to minimize the error term for a given set of predictors. With one predictor variable, the relationship between the predictor and the dependent variable…

  6. Adjustment in mothers of children with Asperger syndrome: an application of the double ABCX model of family adjustment.

    PubMed

    Pakenham, Kenneth I; Samios, Christina; Sofronoff, Kate

    2005-05-01

    The present study examined the applicability of the double ABCX model of family adjustment in explaining maternal adjustment to caring for a child diagnosed with Asperger syndrome. Forty-seven mothers completed questionnaires at a university clinic while their children were participating in an anxiety intervention. The children were aged between 10 and 12 years. Results of correlations showed that each of the model components was related to one or more domains of maternal adjustment in the direction predicted, with the exception of problem-focused coping. Hierarchical regression analyses demonstrated that, after controlling for the effects of relevant demographics, stressor severity, pile-up of demands and coping were related to adjustment. Findings indicate the utility of the double ABCX model in guiding research into parental adjustment when caring for a child with Asperger syndrome. Limitations of the study and clinical implications are discussed.

  7. Mapping Lifetime Brain Volumetry with Covariate-Adjusted Restricted Cubic Spline Regression from Cross-sectional Multi-site MRI.

    PubMed

    Huo, Yuankai; Aboud, Katherine; Kang, Hakmook; Cutting, Laurie E; Landman, Bennett A

    2016-10-01

    Understanding brain volumetry is essential to understand neurodevelopment and disease. Historically, age-related changes have been studied in detail for specific age ranges (e.g., early childhood, teen, young adults, elderly, etc.) or more sparsely sampled for wider considerations of lifetime aging. Recent advancements in data sharing and robust processing have made available considerable quantities of brain images from normal, healthy volunteers. However, existing analysis approaches have had difficulty addressing (1) complex volumetric developments on the large cohort across the life time (e.g., beyond cubic age trends), (2) accounting for confound effects, and (3) maintaining an analysis framework consistent with the general linear model (GLM) approach pervasive in neuroscience. To address these challenges, we propose to use covariate-adjusted restricted cubic spline (C-RCS) regression within a multi-site cross-sectional framework. This model allows for flexible consideration of non-linear age-associated patterns while accounting for traditional covariates and interaction effects. As a demonstration of this approach on lifetime brain aging, we derive normative volumetric trajectories and 95% confidence intervals from 5111 healthy patients from 64 sites while accounting for confounding sex, intracranial volume and field strength effects. The volumetric results are shown to be consistent with traditional studies that have explored more limited age ranges using single-site analyses. This work represents the first integration of C-RCS with neuroimaging and the derivation of structural covariance networks (SCNs) from a large study of multi-site, cross-sectional data.

  8. A comparison of least squares regression and geographically weighted regression modeling of West Nile virus risk based on environmental parameters

    PubMed Central

    Kala, Abhishek K.; Tiwari, Chetan; Mikler, Armin R.

    2017-01-01

    Background The primary aim of the study reported here was to determine the effectiveness of utilizing local spatial variations in environmental data to uncover the statistical relationships between West Nile Virus (WNV) risk and environmental factors. Because least squares regression methods do not account for spatial autocorrelation and non-stationarity of the type of spatial data analyzed for studies that explore the relationship between WNV and environmental determinants, we hypothesized that a geographically weighted regression model would help us better understand how environmental factors are related to WNV risk patterns without the confounding effects of spatial non-stationarity. Methods We examined commonly mapped environmental factors using both ordinary least squares regression (LSR) and geographically weighted regression (GWR). Both types of models were applied to examine the relationship between WNV-infected dead bird counts and various environmental factors for those locations. The goal was to determine which approach yielded a better predictive model. Results LSR efforts led to identifying three environmental variables that were statistically significantly related to WNV infected dead birds (adjusted R2 = 0.61): stream density, road density, and land surface temperature. GWR efforts increased the explanatory value of these three environmental variables with better spatial precision (adjusted R2 = 0.71). Conclusions The spatial granularity resulting from the geographically weighted approach provides a better understanding of how environmental spatial heterogeneity is related to WNV risk as implied by WNV infected dead birds, which should allow improved planning of public health management strategies. PMID:28367364

  9. Moderation analysis using a two-level regression model.

    PubMed

    Yuan, Ke-Hai; Cheng, Ying; Maxwell, Scott

    2014-10-01

    Moderation analysis is widely used in social and behavioral research. The most commonly used model for moderation analysis is moderated multiple regression (MMR) in which the explanatory variables of the regression model include product terms, and the model is typically estimated by least squares (LS). This paper argues for a two-level regression model in which the regression coefficients of a criterion variable on predictors are further regressed on moderator variables. An algorithm for estimating the parameters of the two-level model by normal-distribution-based maximum likelihood (NML) is developed. Formulas for the standard errors (SEs) of the parameter estimates are provided and studied. Results indicate that, when heteroscedasticity exists, NML with the two-level model gives more efficient and more accurate parameter estimates than the LS analysis of the MMR model. When error variances are homoscedastic, NML with the two-level model leads to essentially the same results as LS with the MMR model. Most importantly, the two-level regression model permits estimating the percentage of variance of each regression coefficient that is due to moderator variables. When applied to data from General Social Surveys 1991, NML with the two-level model identified a significant moderation effect of race on the regression of job prestige on years of education while LS with the MMR model did not. An R package is also developed and documented to facilitate the application of the two-level model.
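
    For contrast, the standard LS-estimated MMR model that the two-level approach generalizes is a one-line product-term regression (variable names hypothetical):

      import numpy as np
      import pandas as pd
      import statsmodels.formula.api as smf

      rng = np.random.default_rng(9)
      df = pd.DataFrame({"x": rng.normal(size=300), "m": rng.normal(size=300)})
      df["y"] = (1.0 + 0.5 * df.x + 0.3 * df.m + 0.4 * df.x * df.m
                 + rng.normal(size=300))

      # LS-estimated moderated multiple regression: y ~ x + m + x:m.
      mmr = smf.ols("y ~ x * m", data=df).fit()
      print(mmr.params["x:m"])  # the moderation effect
      # The two-level NML model instead regresses the x-coefficient on m,
      # which matters when errors are heteroscedastic (not shown here).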

  10. Climate variations and salmonellosis transmission in Adelaide, South Australia: a comparison between regression models

    NASA Astrophysics Data System (ADS)

    Zhang, Ying; Bi, Peng; Hiller, Janet

    2008-01-01

    This is the first study to identify appropriate regression models for the association between climate variation and salmonellosis transmission. A comparison between different regression models was conducted using surveillance data in Adelaide, South Australia. By using notified salmonellosis cases and climatic variables from the Adelaide metropolitan area over the period 1990-2003, four regression methods were examined: standard Poisson regression, autoregressive adjusted Poisson regression, multiple linear regression, and a seasonal autoregressive integrated moving average (SARIMA) model. Notified salmonellosis cases in 2004 were used to test the forecasting ability of the four models. Parameter estimation, goodness-of-fit and forecasting ability of the four regression models were compared. Temperatures occurring 2 weeks prior to cases were positively associated with cases of salmonellosis. Rainfall was also inversely related to the number of cases. The comparison of the goodness-of-fit and forecasting ability suggest that the SARIMA model is better than the other three regression models. Temperature and rainfall may be used as climatic predictors of salmonellosis cases in regions with climatic characteristics similar to those of Adelaide. The SARIMA model could, thus, be adopted to quantify the relationship between climate variations and salmonellosis transmission.

  11. Stochastic Approximation Methods for Latent Regression Item Response Models

    ERIC Educational Resources Information Center

    von Davier, Matthias; Sinharay, Sandip

    2010-01-01

    This article presents an application of a stochastic approximation expectation maximization (EM) algorithm using a Metropolis-Hastings (MH) sampler to estimate the parameters of an item response latent regression model. Latent regression item response models are extensions of item response theory (IRT) to a latent variable model with covariates…

  12. [Evaluation of estimation of prevalence ratio using bayesian log-binomial regression model].

    PubMed

    Gao, W L; Lin, H; Liu, X N; Ren, X W; Li, J S; Shen, X P; Zhu, S L

    2017-03-10

    To evaluate the estimation of the prevalence ratio (PR) using a Bayesian log-binomial regression model and its application, we estimated the PR of medical care-seeking in relation to caregivers' recognition of risk signs of diarrhea in their infants using a Bayesian log-binomial regression model in OpenBUGS software. The results showed that caregivers' recognition of an infant's risk signs of diarrhea was significantly associated with a 13% increase in medical care-seeking. Meanwhile, we compared the point and interval estimates of the PR and the convergence of three models (model 1: not adjusting for covariates; model 2: adjusting for duration of caregivers' education; model 3: adjusting for distance between village and township and child age in months, based on model 2) between the Bayesian log-binomial regression model and the conventional log-binomial regression model. All three Bayesian log-binomial regression models converged, and the estimated PRs were 1.130 (95%CI: 1.005-1.265), 1.128 (95%CI: 1.001-1.264) and 1.132 (95%CI: 1.004-1.267), respectively. Conventional log-binomial regression models 1 and 2 converged, with PRs of 1.130 (95%CI: 1.055-1.206) and 1.126 (95%CI: 1.051-1.203), respectively, but model 3 failed to converge, so the COPY method was used to estimate the PR, which was 1.125 (95%CI: 1.051-1.200). In addition, the point and interval estimates of the PRs from the three Bayesian log-binomial regression models differed slightly from those of the conventional log-binomial regression model, but showed good consistency in estimating the PR. Therefore, the Bayesian log-binomial regression model can effectively estimate the PR with fewer convergence problems and has advantages in application over the conventional log-binomial regression model.
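
    A conventional (non-Bayesian) log-binomial fit, the comparator in this study, can be written with statsmodels; the Bayesian version would be fit with OpenBUGS or similar MCMC software:

      import numpy as np
      import pandas as pd
      import statsmodels.api as sm
      import statsmodels.formula.api as smf

      rng = np.random.default_rng(10)
      df = pd.DataFrame({"recognize": rng.binomial(1, 0.5, 400)})
      df["careseek"] = rng.binomial(1, 0.5 * np.exp(0.12 * df.recognize))

      # Binomial GLM with log link: exp(coef) is a prevalence ratio, not an OR.
      fit = smf.glm("careseek ~ recognize", data=df,
                    family=sm.families.Binomial(
                        link=sm.families.links.Log())).fit()
      print(np.exp(fit.params["recognize"]))  # PR estimate (~1.13 here)
      # Note: log-binomial fits can fail to converge, which is exactly the
      # problem the Bayesian approach above is meant to ease.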

  13. Tolerance bounds for log gamma regression models

    NASA Technical Reports Server (NTRS)

    Jones, R. A.; Scholz, F. W.; Ossiander, M.; Shorack, G. R.

    1985-01-01

    The present procedure for finding lower confidence bounds for the quantiles of Weibull populations, on the basis of the solution of a quadratic equation, is more accurate than current Monte Carlo tables and extends to any location-scale family. It is shown that this method is accurate for all members of the log gamma(K) family, where K = 1/2 to infinity, and works well for censored data, while also extending to regression data. An even more accurate procedure involving an approximation to the Lawless (1982) conditional procedure, with numerical integrations whose tables are independent of the data, is also presented. These methods are applied to the case of failure strengths of ceramic specimens from each of three billets of Si3N4, which have undergone flexural strength testing.

  14. A Model for Quadratic Outliers in Linear Regression.

    ERIC Educational Resources Information Center

    Elashoff, Janet Dixon; Elashoff, Robert M.

    This paper introduces a model for describing outliers (observations which are extreme in some sense or violate the apparent pattern of other observations) in linear regression which can be viewed as a mixture of a quadratic and a linear regression. The maximum likelihood estimators of the parameters in the model are derived and their asymptotic…

  15. An Importance Sampling EM Algorithm for Latent Regression Models

    ERIC Educational Resources Information Center

    von Davier, Matthias; Sinharay, Sandip

    2007-01-01

    Reporting methods used in large-scale assessments such as the National Assessment of Educational Progress (NAEP) rely on latent regression models. To fit the latent regression model using the maximum likelihood estimation technique, multivariate integrals must be evaluated. In the computer program MGROUP used by the Educational Testing Service for…

  16. Relative risk regression models with inverse polynomials.

    PubMed

    Ning, Yang; Woodward, Mark

    2013-08-30

    The proportional hazards model assumes that the log hazard ratio is a linear function of parameters. In the current paper, we model the log relative risk as an inverse polynomial, which is particularly suitable for modeling bounded and asymmetric functions. The parameters estimated by maximizing the partial likelihood are consistent and asymptotically normal. The advantages of the inverse polynomial model over the ordinary polynomial model and the fractional polynomial model for fitting various asymmetric log relative risk functions are shown by simulation. The utility of the method is further supported by analyzing two real data sets, addressing the specific question of the location of the minimum risk threshold.

  17. Objective Bayesian model selection for Cox regression.

    PubMed

    Held, Leonhard; Gravestock, Isaac; Sabanés Bové, Daniel

    2016-12-20

    There is now a large literature on objective Bayesian model selection in the linear model based on the g-prior. The methodology has been recently extended to generalized linear models using test-based Bayes factors. In this paper, we show that test-based Bayes factors can also be applied to the Cox proportional hazards model. If the goal is to select a single model, then both the maximum a posteriori and the median probability model can be calculated. For clinical prediction of survival, we shrink the model-specific log hazard ratio estimates with subsequent calculation of the Breslow estimate of the cumulative baseline hazard function. A Bayesian model average can also be employed. We illustrate the proposed methodology with the analysis of survival data on primary biliary cirrhosis patients and the development of a clinical prediction model for future cardiovascular events based on data from the Second Manifestations of ARTerial disease (SMART) cohort study. Cross-validation is applied to compare the predictive performance with alternative model selection approaches based on Harrell's c-Index, the calibration slope and the integrated Brier score. Finally, a novel application of Bayesian variable selection to optimal conditional prediction via landmarking is described. Copyright © 2016 John Wiley & Sons, Ltd.

  18. A Latent Transition Model with Logistic Regression

    ERIC Educational Resources Information Center

    Chung, Hwan; Walls, Theodore A.; Park, Yousung

    2007-01-01

    Latent transition models increasingly include covariates that predict prevalence of latent classes at a given time or transition rates among classes over time. In many situations, the covariate of interest may be latent. This paper describes an approach for handling both manifest and latent covariates in a latent transition model. A Bayesian…

  19. Rank-preserving regression: a more robust rank regression model against outliers.

    PubMed

    Chen, Tian; Kowalski, Jeanne; Chen, Rui; Wu, Pan; Zhang, Hui; Feng, Changyong; Tu, Xin M

    2016-08-30

    Mean-based semi-parametric regression models such as the popular generalized estimating equations are widely used to improve robustness of inference over parametric models. Unfortunately, such models are quite sensitive to outlying observations. The Wilcoxon-score-based rank regression (RR) provides more robust estimates over generalized estimating equations against outliers. However, the RR and its extensions do not sufficiently address missing data arising in longitudinal studies. In this paper, we propose a new approach to address outliers under a different framework based on the functional response models. This functional-response-model-based alternative not only addresses limitations of the RR and its extensions for longitudinal data, but, with its rank-preserving property, even provides more robust estimates than these alternatives. The proposed approach is illustrated with both real and simulated data. Copyright © 2016 John Wiley & Sons, Ltd.

  20. Symbolic regression of generative network models

    PubMed Central

    Menezes, Telmo; Roth, Camille

    2014-01-01

    Networks are a powerful abstraction with applicability to a variety of scientific fields. Models explaining their morphology and growth processes permit a wide range of phenomena to be more systematically analysed and understood. At the same time, creating such models is often challenging and requires insights that may be counter-intuitive. Yet there currently exists no general method to arrive at better models. We have developed an approach to automatically detect realistic decentralised network growth models from empirical data, employing a machine learning technique inspired by natural selection and defining a unified formalism to describe such models as computer programs. As the proposed method is completely general and does not assume any pre-existing models, it can be applied “out of the box” to any given network. To validate our approach empirically, we systematically rediscover pre-defined growth laws underlying several canonical network generation models and credible laws for diverse real-world networks. We were able to find programs that are simple enough to lead to an actual understanding of the mechanisms proposed, namely for a simple brain and a social network. PMID:25190000

  1. Simulation study comparing exposure matching with regression adjustment in an observational safety setting with group sequential monitoring.

    PubMed

    Stratton, Kelly G; Cook, Andrea J; Jackson, Lisa A; Nelson, Jennifer C

    2015-03-30

    Sequential methods are well established for randomized clinical trials (RCTs), and their use in observational settings has increased with the development of national vaccine and drug safety surveillance systems that monitor large healthcare databases. Observational safety monitoring requires that sequential testing methods be better equipped to incorporate confounder adjustment and accommodate rare adverse events. New methods designed specifically for observational surveillance include a group sequential likelihood ratio test that uses exposure matching and a generalized estimating equations approach that involves regression adjustment. However, little is known about the statistical performance of these methods or how they compare to RCT methods in observational and rare outcome settings. We conducted a simulation study to determine the type I error, power and time to surveillance end of the group sequential likelihood ratio test, the generalized estimating equations approach, and RCT methods that construct group sequential Lan-DeMets boundaries using data from a matched (group sequential Lan-DeMets-matching) or unmatched regression (group sequential Lan-DeMets-regression) setting. We also compared the methods using data from a multisite vaccine safety study. All methods had acceptable type I error, but regression methods were more powerful, faster at detecting true safety signals and less prone to implementation difficulties with rare events than exposure matching methods. Method performance also depended on the distribution of information and the extent of confounding by site. Our results suggest that the choice of sequential method, especially the confounder control strategy, is critical in rare event observational settings. These findings provide guidance for choosing methods in this context and, in particular, suggest caution when conducting exposure matching.

  2. A SEMIPARAMETRIC BAYESIAN MODEL FOR CIRCULAR-LINEAR REGRESSION

    EPA Science Inventory

    We present a Bayesian approach to regress a circular variable on a linear predictor. The regression coefficients are assumed to have a nonparametric distribution with a Dirichlet process prior. The semiparametric Bayesian approach gives added flexibility to the model and is usefu...

  3. Measures of Nonlinearity for Segmented Regression Models,

    DTIC Science & Technology

    1983-08-01

    daily consumption is constant at the baseload level a as long as the average outdoor temperature T is above a reference temperature, and increases … discussed below in Section 5. For the energy model, we will consider estimation of the reference temperature, baseload a, and heating rate by the method of …
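
    A segmented model of the baseload-plus-heating form described above can be estimated by nonlinear least squares; all numbers and names below are invented for illustration:

      import numpy as np
      from scipy.optimize import curve_fit

      def energy_model(T, a, b, Tref):
          # Baseload a above the reference temperature; heating slope b below.
          return a + b * np.maximum(Tref - T, 0.0)

      rng = np.random.default_rng(16)
      T = rng.uniform(-5, 25, 200)               # average outdoor temperature
      E = energy_model(T, 10.0, 1.5, 15.0) + rng.normal(0, 0.5, 200)

      params, _ = curve_fit(energy_model, T, E, p0=[5.0, 1.0, 10.0])
      print(params)  # estimates of baseload, heating rate, reference temperature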

  4. [Application of tobit regression models in modelling censored epidemiological variables].

    PubMed

    Bleda Hernández, M J; Tobías Garcés, A

    2002-01-01

    Many variables in epidemiological studies are continuous measures obtained with measurement equipment that has detection limits, generating censored distributions. Censoring, unlike truncation, arises from a defect in the sample data. The distribution of a censored variable is a mixture of a continuous and a categorical distribution. In this case, linear regression models fitted by ordinary least squares will provide biased estimates. With a single censoring point the tobit model must be used, while with several censoring points a generalization of this model should be used. These models are illustrated through the analysis of mercury levels measured in urine in a study of the health effects of a municipal solid-waste incinerator in the county of Mataró (Spain).
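
    With a single lower detection limit, the tobit likelihood is short enough to code directly (the limit and variable names are hypothetical):

      import numpy as np
      from scipy.optimize import minimize
      from scipy.stats import norm

      LIMIT = 1.0  # lower detection limit of the assay

      def tobit_nll(theta, x, y):
          b0, b1, log_s = theta
          s = np.exp(log_s)
          mu = b0 + b1 * x
          cens = y <= LIMIT
          # Censored rows contribute P(Y* <= LIMIT); observed rows contribute
          # the normal density of the latent outcome.
          ll = np.where(cens, norm.logcdf((LIMIT - mu) / s),
                        norm.logpdf(y, mu, s))
          return -ll.sum()

      rng = np.random.default_rng(11)
      x = rng.normal(size=400)
      ystar = 2.0 + 1.5 * x + rng.normal(0, 1.0, 400)  # latent mercury level
      y = np.maximum(ystar, LIMIT)                     # values below limit censored

      fit = minimize(tobit_nll, x0=[0.0, 0.0, 0.0], args=(x, y))
      print(fit.x[:2])  # approximately recovers (2.0, 1.5)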

  5. Complementary Log Regression for Sufficient-Cause Modeling of Epidemiologic Data.

    PubMed

    Lin, Jui-Hsiang; Lee, Wen-Chung

    2016-12-13

    The logistic regression model is the workhorse of epidemiological data analysis. The model helps to clarify the relationship between multiple exposures and a binary outcome. Logistic regression analysis is readily implemented using existing statistical software, and this has contributed to it becoming a routine procedure for epidemiologists. In this paper, the authors focus on a causal model which has recently received much attention from the epidemiologic community, namely, the sufficient-component cause model (causal-pie model). The authors show that the sufficient-component cause model is associated with a particular 'link' function: the complementary log link. In a complementary log regression, the exponentiated coefficient of a main-effect term corresponds to an adjusted 'peril ratio', and the coefficient of a cross-product term can be used directly to test for causal mechanistic interaction (sufficient-cause interaction). The authors provide detailed instructions on how to perform a complementary log regression using existing statistical software and use three datasets to illustrate the methodology. Complementary log regression is the model of choice for sufficient-cause analysis of binary outcomes. Its implementation is as easy as conventional logistic regression.

  6. Genomic selection using regularized linear regression models: ridge regression, lasso, elastic net and their extensions

    PubMed Central

    2012-01-01

    Background Genomic selection (GS) is emerging as an efficient and cost-effective method for estimating breeding values using molecular markers distributed over the entire genome. In essence, it involves estimating the simultaneous effects of all genes or chromosomal segments and combining the estimates to predict the total genomic breeding value (GEBV). Accurate prediction of GEBVs is a central and recurring challenge in plant and animal breeding. The existence of a bewildering array of approaches for predicting breeding values using markers underscores the importance of identifying approaches able to efficiently and accurately predict breeding values. Here, we comparatively evaluate the predictive performance of six regularized linear regression methods-- ridge regression, ridge regression BLUP, lasso, adaptive lasso, elastic net and adaptive elastic net-- for predicting GEBV using dense SNP markers. Methods We predicted GEBVs for a quantitative trait using a dataset on 3000 progenies of 20 sires and 200 dams and an accompanying genome consisting of five chromosomes with 9990 biallelic SNP-marker loci simulated for the QTL-MAS 2011 workshop. We applied all the six methods that use penalty-based (regularization) shrinkage to handle datasets with far more predictors than observations. The lasso, elastic net and their adaptive extensions further possess the desirable property that they simultaneously select relevant predictive markers and optimally estimate their effects. The regression models were trained with a subset of 2000 phenotyped and genotyped individuals and used to predict GEBVs for the remaining 1000 progenies without phenotypes. Predictive accuracy was assessed using the root mean squared error, the Pearson correlation between predicted GEBVs and (1) the true genomic value (TGV), (2) the true breeding value (TBV) and (3) the simulated phenotypic values based on fivefold cross-validation (CV). Results The elastic net, lasso, adaptive lasso and the
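
    Several of the compared penalties are available off the shelf; a sketch of elastic net GEBV prediction on simulated marker data, with dimensions shrunk for illustration:

      import numpy as np
      from sklearn.linear_model import ElasticNetCV

      rng = np.random.default_rng(12)
      markers = rng.integers(0, 3, size=(600, 2000)).astype(float)  # SNP dosages
      beta = np.zeros(2000)
      beta[rng.choice(2000, 30, replace=False)] = rng.normal(size=30)
      pheno = markers @ beta + rng.normal(0, 2.0, 600)

      # Train on 400 phenotyped individuals, predict GEBVs for the rest.
      enet = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=5).fit(markers[:400],
                                                              pheno[:400])
      gebv = enet.predict(markers[400:])
      print(np.corrcoef(gebv, pheno[400:])[0, 1])  # predictive accuracy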

  7. A generalized multivariate regression model for modelling ocean wave heights

    NASA Astrophysics Data System (ADS)

    Wang, X. L.; Feng, Y.; Swail, V. R.

    2012-04-01

    In this study, a generalized multivariate linear regression model is developed to represent the relationship between 6-hourly ocean significant wave heights (Hs) and the corresponding 6-hourly mean sea level pressure (MSLP) fields. The model is calibrated using the ERA-Interim reanalysis of Hs and MSLP fields for 1981-2000, and is validated using the ERA-Interim reanalysis for 2001-2010 and the ERA40 reanalysis of Hs and MSLP for 1958-2001. The performance of the fitted model is evaluated in terms of the Pierce skill score, frequency bias index, and correlation skill score. Because wave heights are not normally distributed, they are subjected to a data-adaptive Box-Cox transformation before being used in the model fitting. Also, since 6-hourly data are being modelled, lag-1 autocorrelation must be and is accounted for. The models with and without the Box-Cox transformation, and with and without accounting for autocorrelation, are inter-compared in terms of their prediction skills. The fitted MSLP-Hs relationship is then used to reconstruct the historical wave height climate from 6-hourly MSLP fields taken from the Twentieth Century Reanalysis (20CR, Compo et al. 2011), and to project possible future wave height climates using CMIP5 model simulations of MSLP fields. The reconstructed and projected wave heights, both seasonal means and maxima, are subjected to a trend analysis that allows for non-linear (polynomial) trends.
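
    The data-adaptive Box-Cox step followed by a linear fit might look like this (the predictor is a stand-in for an MSLP-derived field value):

      import numpy as np
      from scipy import stats
      import statsmodels.api as sm

      rng = np.random.default_rng(15)
      mslp_index = rng.normal(size=500)                        # MSLP predictor
      hs = np.exp(0.5 * mslp_index + rng.normal(0, 0.3, 500))  # skewed heights

      # Data-adaptive Box-Cox transform (lambda estimated by scipy), then OLS.
      hs_bc, lam = stats.boxcox(hs)
      fit = sm.OLS(hs_bc, sm.add_constant(mslp_index)).fit()
      print(lam, fit.params)
      # Lag-1 autocorrelation of 6-hourly data would additionally need an
      # AR(1) error term or lagged response, as the abstract notes.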

  8. Extension of the modified Poisson regression model to prospective studies with correlated binary data.

    PubMed

    Zou, G Y; Donner, Allan

    2013-12-01

    The Poisson regression model using a sandwich variance estimator has become a viable alternative to the logistic regression model for the analysis of prospective studies with independent binary outcomes. The primary advantage of this approach is that it readily provides covariate-adjusted risk ratios and associated standard errors. In this article, the model is extended to studies with correlated binary outcomes as arise in longitudinal or cluster randomization studies. The key step involves a cluster-level grouping strategy for the computation of the middle term in the sandwich estimator. For a single binary exposure variable without covariate adjustment, this approach results in risk ratio estimates and standard errors that are identical to those found in the survey sampling literature. Simulation results suggest that it is reliable for studies with correlated binary data, provided the total number of clusters is at least 50. Data from observational and cluster randomized studies are used to illustrate the methods.
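
    The extension amounts to a Poisson working model fit by GEE with cluster-level grouping for the sandwich variance; statsmodels supports this directly (column names hypothetical):

      import numpy as np
      import pandas as pd
      import statsmodels.api as sm
      import statsmodels.formula.api as smf

      rng = np.random.default_rng(13)
      n_clusters, m = 60, 10
      df = pd.DataFrame({
          "cluster": np.repeat(np.arange(n_clusters), m),
          "exposed": rng.binomial(1, 0.5, n_clusters * m),
      })
      u = np.repeat(rng.normal(0, 0.3, n_clusters), m)  # cluster effect
      p = 1 / (1 + np.exp(-(-1.0 + 0.5 * df["exposed"] + u)))
      df["y"] = rng.binomial(1, p)

      # Poisson working model + robust (sandwich) SEs grouped on `cluster`
      # yields covariate-adjusted risk ratios for correlated binary outcomes.
      fit = smf.gee("y ~ exposed", groups="cluster", data=df,
                    family=sm.families.Poisson(),
                    cov_struct=sm.cov_struct.Independence()).fit()
      print(np.exp(fit.params["exposed"]))  # risk ratio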

  9. Regression Model Optimization for the Analysis of Experimental Data

    NASA Technical Reports Server (NTRS)

    Ulbrich, N.

    2009-01-01

    A candidate math model search algorithm was developed at Ames Research Center that determines a recommended math model for the multivariate regression analysis of experimental data. The search algorithm is applicable to classical regression analysis problems as well as wind tunnel strain gage balance calibration analysis applications. The algorithm compares the predictive capability of different regression models using the standard deviation of the PRESS residuals of the responses as a search metric. This search metric is minimized during the search. Singular value decomposition is used during the search to reject math models that lead to a singular solution of the regression analysis problem. Two threshold dependent constraints are also applied. The first constraint rejects math models with insignificant terms. The second constraint rejects math models with near-linear dependencies between terms. The math term hierarchy rule may also be applied as an optional constraint during or after the candidate math model search. The final term selection of the recommended math model depends on the regressor and response values of the data set, the user's function class combination choice, the user's constraint selections, and the result of the search metric minimization. A frequently used regression analysis example from the literature is used to illustrate the application of the search algorithm to experimental data.
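
    The PRESS-based search metric can be computed in closed form from the hat matrix rather than by refitting; a sketch for comparing candidate models:

      import numpy as np

      def press_std(X, y):
          """Standard deviation of PRESS residuals for a linear model y ~ X."""
          H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix
          resid = y - H @ y
          press = resid / (1.0 - np.diag(H))      # leave-one-out residuals
          return press.std(ddof=X.shape[1])

      rng = np.random.default_rng(14)
      x = rng.uniform(-1, 1, 100)
      y = 1 + 2 * x + 0.5 * x**2 + rng.normal(0, 0.1, 100)

      # Compare candidate math models by the PRESS search metric.
      for degree in (1, 2, 3):
          X = np.vander(x, degree + 1)
          print(degree, press_std(X, y))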

  10. Comparison of regression modeling techniques for resource estimation

    NASA Technical Reports Server (NTRS)

    Card, D. N.

    1983-01-01

    The development and validation of resource utilization models is an active area of software engineering research. Regression analysis is the principal tool employed in these studies. However, little attention has been given to determining which of the various regression methods available is the most appropriate. The objective of this study is to compare three alternative regression procedures by examining the results of their application to a commonly accepted equation for resource estimation. The data studied are summarized, the resource estimation equation is described, the regression procedures are explained, and the results obtained from the procedures are compared.

  11. Analysis of Sting Balance Calibration Data Using Optimized Regression Models

    NASA Technical Reports Server (NTRS)

    Ulbrich, N.; Bader, Jon B.

    2010-01-01

    Calibration data of a wind tunnel sting balance were processed using a candidate math model search algorithm that recommends an optimized regression model for the data analysis. During the calibration the normal force and the moment at the balance moment center were selected as independent calibration variables. The sting balance itself had two moment gages. Therefore, after analyzing the connection between calibration loads and gage outputs, it was decided to choose the difference and the sum of the gage outputs as the two responses that best describe the behavior of the balance. The math model search algorithm was applied to these two responses. An optimized regression model was obtained for each response. Classical strain gage balance load transformations and the equations of the deflection of a cantilever beam under load are used to show that the search algorithm's two optimized regression models are supported by a theoretical analysis of the relationship between the applied calibration loads and the measured gage outputs. The analysis of the sting balance calibration data set is a rare example of a situation in which the terms of a balance regression model can be derived directly from first principles of physics. In addition, it is interesting to note that the search algorithm recommended the correct regression model term combinations using only a set of statistical quality metrics that were applied to the experimental data during the algorithm's term selection process.

  12. A regularization corrected score method for nonlinear regression models with covariate error.

    PubMed

    Zucker, David M; Gorfine, Malka; Li, Yi; Tadesse, Mahlet G; Spiegelman, Donna

    2013-03-01

    Many regression analyses involve explanatory variables that are measured with error, and failing to account for this error is well known to lead to biased point and interval estimates of the regression coefficients. We present here a new general method for adjusting for covariate error. Our method consists of an approximate version of the Stefanski-Nakamura corrected score approach, using the method of regularization to obtain an approximate solution of the relevant integral equation. We develop the theory in the setting of classical likelihood models; this setting covers, for example, linear regression, nonlinear regression, logistic regression, and Poisson regression. The method is extremely general in terms of the types of measurement error models covered, and is a functional method in the sense of not involving assumptions on the distribution of the true covariate. We discuss the theoretical properties of the method and present simulation results in the logistic regression setting (univariate and multivariate). For illustration, we apply the method to data from the Harvard Nurses' Health Study concerning the relationship between physical activity and breast cancer mortality in the period following a diagnosis of breast cancer.
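
    For orientation, the classical additive measurement error model and the defining property of a corrected score can be written as follows; this is the standard Stefanski-Nakamura formulation, and the paper's regularized approximate solution of the integral equation is not reproduced here.

```latex
% observed surrogate W for the unobserved true covariate X
W = X + U, \qquad U \sim N(0, \Sigma_u), \quad U \perp X
% a corrected score \psi^* recovers the true-covariate score \psi on average:
E\{\psi^*(Y, W; \beta) \mid Y, X\} = \psi(Y, X; \beta)
```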

  13. Penalized spline estimation for functional coefficient regression models.

    PubMed

    Cao, Yanrong; Lin, Haiqun; Wu, Tracy Z; Yu, Yan

    2010-04-01

    The functional coefficient regression models assume that the regression coefficients vary with some "threshold" variable, providing appreciable flexibility in capturing the underlying dynamics in data and avoiding the so-called "curse of dimensionality" in multivariate nonparametric estimation. We first investigate the estimation, inference, and forecasting for the functional coefficient regression models with dependent observations via penalized splines. The P-spline approach, as a direct ridge regression shrinkage type global smoothing method, is computationally efficient and stable. With established fixed-knot asymptotics, inference is readily available. Exact inference can be obtained for fixed smoothing parameter λ, which is most appealing for finite samples. Our penalized spline approach gives an explicit model expression, which also enables multi-step-ahead forecasting via simulations. Furthermore, we examine different methods of choosing the important smoothing parameter λ: modified multi-fold cross-validation (MCV), generalized cross-validation (GCV), and an extension of empirical bias bandwidth selection (EBBS) to P-splines. In addition, we implement smoothing parameter selection within a mixed-model framework through restricted maximum likelihood (REML) for P-spline functional coefficient regression models with independent observations. The P-spline approach also easily allows different smoothness for different functional coefficients, enabled by assigning a different penalty λ to each coefficient. We demonstrate the proposed approach by both simulation examples and a real data application.
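
    A minimal numpy sketch of the ridge-type shrinkage character of a penalized spline fit, using a truncated-power basis in place of the paper's B-spline machinery; the function names, data, and λ value are hypothetical, and smoothing-parameter selection (MCV, GCV, EBBS, REML) is omitted.

```python
import numpy as np

def pspline_fit(x, y, n_knots=20, degree=3, lam=1.0):
    """Penalized spline sketch: truncated-power basis with a ridge penalty
    on the knot coefficients only (a direct shrinkage-type global smoother)."""
    knots = np.linspace(x.min(), x.max(), n_knots + 2)[1:-1]
    B = np.column_stack([x**d for d in range(degree + 1)] +
                        [np.clip(x - k, 0, None)**degree for k in knots])
    P = np.diag([0.0] * (degree + 1) + [1.0] * n_knots)   # penalize knots only
    beta = np.linalg.solve(B.T @ B + lam * P, B.T @ y)    # ridge-type solution
    return B @ beta

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 200)
fitted = pspline_fit(x, y, lam=0.1)
```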

  14. Optimization of Regression Models of Experimental Data Using Confirmation Points

    NASA Technical Reports Server (NTRS)

    Ulbrich, N.

    2010-01-01

    A new search metric is discussed that may be used to better assess the predictive capability of different math term combinations during the optimization of a regression model of experimental data. The new search metric can be determined for each tested math term combination if the given experimental data set is split into two subsets. The first subset consists of data points that are only used to determine the coefficients of the regression model. The second subset consists of confirmation points that are exclusively used to test the regression model. The new search metric value is assigned after comparing two values that describe the quality of the fit of each subset. The first value is the standard deviation of the PRESS residuals of the data points. The second value is the standard deviation of the response residuals of the confirmation points. The greater of the two values is used as the new search metric value. This choice guarantees that both standard deviations are always less than or equal to the value that is used during the optimization. Experimental data from the calibration of a wind tunnel strain-gage balance is used to illustrate the application of the new search metric. The new search metric ultimately generates an optimized regression model that has already been tested at regression-model-independent confirmation points before it is ever used to predict an unknown response from a set of regressors.
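
    A hedged numpy sketch of the metric itself, assuming hypothetical fitting and confirmation subsets: the assigned value is the larger of the two standard deviations described above.

```python
import numpy as np

def confirmation_metric(X_fit, y_fit, X_conf, y_conf):
    """Search metric sketch: the larger of (i) the sd of the PRESS residuals
    of the fitting points and (ii) the sd of the response residuals of the
    withheld confirmation points."""
    H = X_fit @ np.linalg.solve(X_fit.T @ X_fit, X_fit.T)
    e = y_fit - H @ y_fit
    sd_press = (e / (1.0 - np.diag(H))).std(ddof=1)
    beta, *_ = np.linalg.lstsq(X_fit, y_fit, rcond=None)
    sd_conf = (y_conf - X_conf @ beta).std(ddof=1)
    return max(sd_press, sd_conf)
```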

  15. copCAR: A Flexible Regression Model for Areal Data.

    PubMed

    Hughes, John

    2015-09-16

    Non-Gaussian spatial data are common in many fields. When fitting regressions for such data, one needs to account for spatial dependence to ensure reliable inference for the regression coefficients. The two most commonly used regression models for spatially aggregated data are the automodel and the areal generalized linear mixed model (GLMM). These models induce spatial dependence in different ways but share the smoothing approach, which is intuitive but problematic. This article develops a new regression model for areal data. The new model is called copCAR because it is copula-based and employs the areal GLMM's conditional autoregression (CAR). copCAR overcomes many of the drawbacks of the automodel and the areal GLMM. Specifically, copCAR (1) is flexible and intuitive, (2) permits positive spatial dependence for all types of data, (3) permits efficient computation, and (4) provides reliable spatial regression inference and information about dependence strength. An implementation is provided by R package copCAR, which is available from the Comprehensive R Archive Network, and supplementary materials are available online.

  16. A zero-augmented generalized gamma regression calibration to adjust for covariate measurement error: A case of an episodically consumed dietary intake.

    PubMed

    Agogo, George O

    2017-01-01

    Measurement error in exposure variables is a serious impediment in epidemiological studies that relate exposures to health outcomes. In nutritional studies, interest could be in the association between long-term dietary intake and disease occurrence. Long-term intake is usually assessed with a food frequency questionnaire (FFQ), which is prone to recall bias. Measurement error in FFQ-reported intakes leads to bias in the parameter estimate that quantifies the association. To adjust for this bias, a calibration study is required to obtain unbiased intake measurements using a short-term instrument such as the 24-hour recall (24HR). The 24HR intakes are used as the response in regression calibration to adjust for bias in the association. For foods not consumed daily, 24HR-reported intakes are usually characterized by excess zeroes, right skewness, and heteroscedasticity, posing serious challenges to regression calibration modeling. We propose a zero-augmented calibration model that adjusts for measurement error in reported intake while handling excess zeroes, skewness, and heteroscedasticity simultaneously, without transforming 24HR intake values. We compared the proposed calibration method with the standard method and with methods that ignore measurement error by estimating long-term intake with 24HR- and FFQ-reported intakes. The comparison was done in real and simulated datasets. With the 24HR, the mean increase in mercury level per ounce of fish intake was about 0.4; with the FFQ intake, the increase was about 1.2. With both calibration methods, the mean increase was about 2.0. A similar trend was observed in the simulation study. In conclusion, the proposed calibration method performs at least as well as the standard method.

  17. Evaluation of land use regression models in Detroit, Michigan

    EPA Science Inventory

    Introduction: Land use regression (LUR) models have emerged as a cost-effective tool for characterizing exposure in epidemiologic health studies. However, little critical attention has been focused on validation of these models as a step toward temporal and spatial extension of ...

  18. Analysis of Sting Balance Calibration Data Using Optimized Regression Models

    NASA Technical Reports Server (NTRS)

    Ulbrich, Norbert; Bader, Jon B.

    2009-01-01

    Calibration data of a wind tunnel sting balance were processed using a search algorithm that identifies an optimized regression model for the data analysis. The selected sting balance had two moment gages that were mounted forward and aft of the balance moment center. The difference and the sum of the two gage outputs were fitted in the least squares sense using the normal force and the pitching moment at the balance moment center as independent variables. The regression model search algorithm predicted that the difference of the gage outputs should be modeled using the intercept and the normal force. The sum of the two gage outputs, on the other hand, should be modeled using the intercept, the pitching moment, and the square of the pitching moment. Equations of the deflection of a cantilever beam are used to show that the search algorithm's two recommended math models can also be obtained after performing a rigorous theoretical analysis of the deflection of the sting balance under load. The analysis of the sting balance calibration data set is a rare example of a situation in which regression models of balance calibration data can be derived directly from first principles of physics and engineering. In addition, it is interesting to see that the search algorithm recommended the same regression models for the data analysis using only a set of statistical quality metrics.

  19. Sugarcane Land Classification with Satellite Imagery using Logistic Regression Model

    NASA Astrophysics Data System (ADS)

    Henry, F.; Herwindiati, D. E.; Mulyono, S.; Hendryli, J.

    2017-03-01

    This paper discusses the classification of sugarcane plantation area from Landsat-8 satellite imagery. The classification process uses the binary logistic regression method with time series of normalized difference vegetation index (NDVI) data as input. The process is divided into two steps: training and classification. The purpose of the training step is to identify the best parameters of the regression model using a gradient descent algorithm. The best-fit model can then be utilized to classify sugarcane and non-sugarcane areas. The experiment shows high accuracy and successfully maps the sugarcane plantation area, achieving a Cohen's Kappa value of 0.7833 (strong agreement) with 89.167% accuracy.
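
    A minimal numpy sketch of the training step in this setup, with hypothetical NDVI features: batch gradient descent on the logistic loss, followed by thresholding for the classification step.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Binary logistic regression trained by batch gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)
        w -= lr * grad
    return w

# hypothetical input: each row is a pixel's NDVI time series plus a bias term
rng = np.random.default_rng(3)
ndvi = rng.uniform(-1, 1, size=(500, 12))     # 12 monthly NDVI values
X = np.column_stack([np.ones(500), ndvi])
y = rng.integers(0, 2, size=500)              # 1 = sugarcane, 0 = other
w = fit_logistic(X, y)
pred = (sigmoid(X @ w) >= 0.5).astype(int)    # classification step
```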

  20. Testing a theoretical model for examining the relationship between family adjustment and expatriates' work adjustment.

    PubMed

    Caligiuri, P M; Hyland, M M; Joshi, A; Bross, A S

    1998-08-01

    Based on theoretical perspectives from the work/family literature, this study tested a model for examining expatriate families' adjustment while on global assignments as an antecedent to expatriates' adjustment to working in a host country. Data were collected from 110 families that had been relocated for global assignments. Longitudinal data, assessing family characteristics before the assignment and cross-cultural adjustment approximately 6 months into the assignment, were coded. This study found that family characteristics (family support, family communication, family adaptability) were related to expatriates' adjustment to working in the host country. As hypothesized, the families' cross-cultural adjustment mediated the effect of family characteristics on expatriates' host-country work adjustment.

  1. Alternatives for logistic regression in cross-sectional studies: an empirical comparison of models that directly estimate the prevalence ratio

    PubMed Central

    Barros, Aluísio JD; Hirakata, Vânia N

    2003-01-01

    Background Cross-sectional studies with binary outcomes analyzed by logistic regression are frequent in the epidemiological literature. However, the odds ratio can substantially overestimate the prevalence ratio, the measure of choice in these studies. Also, controlling for confounding is not equivalent for the two measures. In this paper we explore alternatives for modeling data of such studies with techniques that directly estimate the prevalence ratio. Methods We compared Cox regression with constant time at risk, Poisson regression and log-binomial regression against the standard Mantel-Haenszel estimators. Models with robust variance estimators in Cox and Poisson regressions and variance corrected by the scale parameter in Poisson regression were also evaluated. Results Three outcomes with different levels of prevalence, from a cross-sectional study carried out in Pelotas, Brazil, were explored: weight-for-age deficit (4%), asthma (31%) and mother in a paid job (52%). Unadjusted Cox/Poisson regression and Poisson regression with scale parameter adjusted by deviance performed worst in terms of interval estimates. Poisson regression with scale parameter adjusted by χ2 showed variable performance depending on the outcome prevalence. Cox/Poisson regression with robust variance, and log-binomial regression performed equally well when the model was correctly specified. Conclusions Cox or Poisson regression with robust variance and log-binomial regression provide correct estimates and are a better alternative for the analysis of cross-sectional studies with binary outcomes than logistic regression, since the prevalence ratio is more interpretable and easier to communicate to non-specialists than the odds ratio. However, precautions are needed to avoid estimation problems in specific situations. PMID:14567763
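
    A hedged Python sketch of the recommended analysis, assuming the statsmodels GLM interface: Poisson regression with a robust (HC0 sandwich) variance fitted to simulated cross-sectional data, with the log-binomial alternative indicated in a comment.

```python
import numpy as np
import statsmodels.api as sm

# hypothetical cross-sectional data: binary outcome, exposure + confounder
rng = np.random.default_rng(4)
n = 2000
exposure = rng.integers(0, 2, n)
age = rng.normal(40, 10, n)
p = np.clip(0.20 * np.where(exposure == 1, 1.5, 1.0)
            + 0.002 * (age - 40), 0.01, 0.99)
y = rng.binomial(1, p)

X = sm.add_constant(np.column_stack([exposure, age]))
# Poisson regression with a sandwich variance gives prevalence ratios
# with valid standard errors despite the binary outcome
res = sm.GLM(y, X, family=sm.families.Poisson()).fit(cov_type='HC0')
print(np.exp(res.params[1]))    # adjusted prevalence ratio for the exposure

# log-binomial alternative (may fail to converge near the boundary):
# sm.GLM(y, X, family=sm.families.Binomial(link=sm.families.links.Log())).fit()
```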

  2. Direction of Effects in Multiple Linear Regression Models.

    PubMed

    Wiedermann, Wolfgang; von Eye, Alexander

    2015-01-01

    Previous studies analyzed asymmetric properties of the Pearson correlation coefficient using higher than second order moments. These asymmetric properties can be used to determine the direction of dependence in a linear regression setting (i.e., establish which of two variables is more likely to be on the outcome side) within the framework of cross-sectional observational data. Extant approaches are restricted to the bivariate regression case. The present contribution extends the direction of dependence methodology to a multiple linear regression setting by analyzing distributional properties of residuals of competing multiple regression models. It is shown that, under certain conditions, the third central moments of estimated regression residuals can be used to decide upon direction of effects. In addition, three different approaches for statistical inference are discussed: a combined D'Agostino normality test, a skewness difference test, and a bootstrap difference test. Type I error and power of the procedures are assessed using Monte Carlo simulations, and an empirical example is provided for illustrative purposes. In the discussion, issues concerning the quality of psychological data, possible extensions of the proposed methods to the fourth central moment of regression residuals, and potential applications are addressed.
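
    For intuition, a small numpy/scipy sketch of the underlying principle on simulated data: when the true predictor is skewed and the error is normal, residuals from the correctly directed regression are nearly symmetric, while residuals from the reverse regression inherit the skewness. The paper's formal tests are not reproduced here.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(5)
x = rng.exponential(1.0, 1000)          # skewed (non-normal) true predictor
y = 0.8 * x + rng.normal(0, 1, 1000)    # true direction: x -> y

def residuals(a, b):
    """Residuals from regressing b on a (with intercept)."""
    A = np.column_stack([np.ones_like(a), a])
    beta, *_ = np.linalg.lstsq(A, b, rcond=None)
    return b - A @ beta

# the correctly directed model leaves residuals closer to symmetric
print(abs(skew(residuals(x, y))))   # y ~ x: near zero
print(abs(skew(residuals(y, x))))   # x ~ y: inherits skewness
```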

  3. Maximum Entropy Discrimination Poisson Regression for Software Reliability Modeling.

    PubMed

    Chatzis, Sotirios P; Andreou, Andreas S

    2015-11-01

    Reliably predicting software defects is one of the most significant tasks in software engineering. Two of the major components of modern software reliability modeling approaches are: 1) extraction of salient features for software system representation, based on appropriately designed software metrics and 2) development of intricate regression models for count data, to allow effective software reliability data modeling and prediction. Surprisingly, research in the latter frontier of count data regression modeling has been rather limited. More specifically, a lack of simple and efficient algorithms for posterior computation has made the Bayesian approaches appear unattractive, and thus underdeveloped in the context of software reliability modeling. In this paper, we try to address these issues by introducing a novel Bayesian regression model for count data, based on the concept of max-margin data modeling, effected in the context of a fully Bayesian model treatment with simple and efficient posterior distribution updates. Our novel approach yields a more discriminative learning technique, making more effective use of our training data during model inference. In addition, it allows of better handling uncertainty in the modeled data, which can be a significant problem when the training data are limited. We derive elegant inference algorithms for our model under the mean-field paradigm and exhibit its effectiveness using the publicly available benchmark data sets.

  4. Detecting influential observations in nonlinear regression modeling of groundwater flow

    USGS Publications Warehouse

    Yager, R.M.

    1998-01-01

    Nonlinear regression is used to estimate optimal parameter values in models of groundwater flow to ensure that differences between predicted and observed heads and flows do not result from nonoptimal parameter values. Parameter estimates can be affected, however, by observations that disproportionately influence the regression, such as outliers that exert undue leverage on the objective function. Certain statistics developed for linear regression can be used to detect influential observations in nonlinear regression if the models are approximately linear. This paper discusses the application of Cook's D, which measures the effect of omitting a single observation on a set of estimated parameter values, and the statistic DFBETAS, which quantifies the influence of an observation on each parameter. The influence statistics were used to (1) identify the influential observations in the calibration of a three-dimensional, groundwater flow model of a fractured-rock aquifer through nonlinear regression, and (2) quantify the effect of omitting influential observations on the set of estimated parameter values. Comparison of the spatial distribution of Cook's D with plots of model sensitivity shows that influential observations correspond to areas where the model heads are most sensitive to certain parameters, and where predicted groundwater flow rates are largest. Five of the six discharge observations were identified as influential, indicating that reliable measurements of groundwater flow rates are valuable data in model calibration. DFBETAS are computed and examined for an alternative model of the aquifer system to identify a parameterization error in the model design that resulted in overestimation of the effect of anisotropy on horizontal hydraulic conductivity.
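
    Both statistics are available for linear models in statsmodels; a hedged sketch on simulated data with one planted outlier (the groundwater flow model itself is not reproduced):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import OLSInfluence

rng = np.random.default_rng(6)
X = sm.add_constant(rng.normal(size=(50, 2)))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 0.5, 50)
y[10] += 5.0                                    # plant one outlier

infl = OLSInfluence(sm.OLS(y, X).fit())
cooks_d, _ = infl.cooks_distance                # effect of omitting each obs
dfbetas = infl.dfbetas                          # per-parameter influence
print(np.argmax(cooks_d), cooks_d.max())        # the planted outlier stands out
print(np.abs(dfbetas).max(axis=0))
```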

  5. Spatial stochastic regression modelling of urban land use

    NASA Astrophysics Data System (ADS)

    Arshad, S. H. M.; Jaafar, J.; Abiden, M. Z. Z.; Latif, Z. A.; Rasam, A. R. A.

    2014-02-01

    Urbanization is very closely linked to industrialization, commercialization, and overall economic growth and development. This yields innumerable benefits for the quantity and quality of the urban environment and lifestyle, but on the other hand contributes to unbounded development, urban sprawl, overcrowding, and a decreasing standard of living. Regulation and observation of urban development activities is crucial. An understanding of the urban systems that promote urban growth is also essential for policy making, formulating development strategies, and development plan preparation. This study compares two different stochastic regression modeling techniques for spatial structure models of urban growth in the same study area. Both techniques utilize the same datasets, and their results are analyzed. The work starts by producing urban growth models using two stochastic regression techniques, namely Ordinary Least Squares (OLS) and Geographically Weighted Regression (GWR). When the two techniques are compared, GWR appears to be the more significant stochastic regression model: it gives a smaller AICc (Akaike's Information Corrected Criterion) value and its output is more spatially explainable.
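
    For reference, the corrected criterion used in the comparison has the standard form

```latex
\mathrm{AIC_c} = -2\ln\hat{L} + 2k + \frac{2k(k+1)}{n - k - 1}
```

    where L-hat is the maximized likelihood, k the number of estimated parameters, and n the sample size; the model with the smaller AICc is preferred.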

  6. A Study of Perfectionism, Attachment, and College Student Adjustment: Testing Mediational Models.

    ERIC Educational Resources Information Center

    Hood, Camille A.; Kubal, Anne E.; Pfaller, Joan; Rice, Kenneth G.

    Mediational models predicting college students' adjustment were tested using regression analyses. Contemporary adult attachment theory was employed to explore the cognitive/affective mechanisms by which adult attachment and perfectionism affect various aspects of psychological functioning. Consistent with theoretical expectations, results…

  7. Analyzing industrial energy use through ordinary least squares regression models

    NASA Astrophysics Data System (ADS)

    Golden, Allyson Katherine

    Extensive research has been performed using regression analysis and calibrated simulations to create baseline energy consumption models for residential buildings and commercial institutions. However, few attempts have been made to discuss the applicability of these methodologies to establish baseline energy consumption models for industrial manufacturing facilities. In the few studies of industrial facilities, the presented linear change-point and degree-day regression analyses illustrate ideal cases. It follows that there is a need in the established literature to discuss the methodologies and to determine their applicability for establishing baseline energy consumption models of industrial manufacturing facilities. The thesis determines the effectiveness of simple inverse linear statistical regression models when establishing baseline energy consumption models for industrial manufacturing facilities. Ordinary least squares change-point and degree-day regression methods are used to create baseline energy consumption models for nine different case studies of industrial manufacturing facilities located in the southeastern United States. The influence of ambient dry-bulb temperature and production on total facility energy consumption is observed. The energy consumption behavior of industrial manufacturing facilities is only sometimes sufficiently explained by temperature, production, or a combination of the two variables. This thesis also provides methods for generating baseline energy models that are straightforward and accessible to anyone in the industrial manufacturing community. The methods outlined in this thesis may be easily replicated by anyone that possesses basic spreadsheet software and general knowledge of the relationship between energy consumption and weather, production, or other influential variables. With the help of simple inverse linear regression models, industrial manufacturing facilities may better understand their energy consumption and
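
    A minimal numpy sketch of the kind of inverse change-point model described, with hypothetical variable names: energy use is flat below a balance-point temperature and rises linearly above it, and the change point is located by a simple grid search.

```python
import numpy as np

def fit_change_point(T, E, prod=None):
    """Inverse change-point model sketch:
    E = b0 + b1 * max(T - Tcp, 0) [+ b2 * production],
    with the change point Tcp found by a one-dimensional grid search."""
    best = None
    for tcp in np.linspace(T.min(), T.max(), 101):
        cols = [np.ones_like(T), np.clip(T - tcp, 0, None)]
        if prod is not None:
            cols.append(prod)                  # optional production regressor
        X = np.column_stack(cols)
        beta, *_ = np.linalg.lstsq(X, E, rcond=None)
        sse = ((E - X @ beta) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, tcp, beta)
    return best   # (SSE, change point, coefficients)
```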

  8. Modeling energy expenditure in children and adolescents using quantile regression

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Advanced mathematical models have the potential to capture the complex metabolic and physiological processes that result in energy expenditure (EE). The study objective is to apply quantile regression (QR) to predict EE and determine quantile-dependent variation in covariate effects in nonobese and obese children.

  9. REGRESSION MODELS OF RESIDENTIAL EXPOSURE TO CHLORPYRIFOS AND DIAZINON

    EPA Science Inventory

    This study examines the ability of regression models to predict residential exposures to chlorpyrifos and diazinon, based on the information from the NHEXAS-AZ database. The robust method was used to generate "fill-in" values for samples that are below the detection l...

  10. Estimation of the Regression Effect Using a Latent Trait Model.

    ERIC Educational Resources Information Center

    Quinn, Jimmy L.

    A logistic model was used to generate data to serve as a proxy for an immediate retest from item responses to a fourth grade standardized reading comprehension test of 45 items. Assuming that the actual test may be considered a pretest and the proxy data may be considered a retest, the effect of regression was investigated using a percentage of…

  11. Regression models for mixed Poisson and continuous longitudinal data.

    PubMed

    Yang, Ying; Kang, Jian; Mao, Kai; Zhang, Jie

    2007-09-10

    In this article we develop regression models that are flexible in two respects: they evaluate the influence of covariates on mixed Poisson and continuous responses, and they evaluate how the correlation between the Poisson response and the continuous response changes over time. A strategy is proposed for regression models of mixed continuous and Poisson responses in the presence of heterogeneous variance and time-varying correlation. Our general approach is first to build the joint marginal model and to check whether the variance and correlation change over time via a likelihood ratio test. If they do, we apply a suitable data transformation to properly evaluate the influence of the covariates on the mixed responses. The proposed methods are applied to the Interstitial Cystitis Data Base (ICDB) cohort study, where we find that the positive correlations change significantly over time, suggesting that heterogeneous variances should not be ignored in modelling and inference.

  12. Bayesian Isotonic Regression Dose-response (BIRD) Model.

    PubMed

    Li, Wen; Fu, Haoda

    2016-12-21

    Understanding dose-response relationship is a crucial step in drug development. There are a few parametric methods to estimate dose-response curves, such as the Emax model and the logistic model. These parametric models are easy to interpret and, hence, widely used. However, these models often require the inclusion of patients on high-dose levels; otherwise, the model parameters cannot be reliably estimated. To have robust estimation, nonparametric models are used. However, these models are not able to estimate certain important clinical parameters, such as ED50 and Emax. Furthermore, in many therapeutic areas, dose-response curves can be assumed as non-decreasing functions. This creates an additional challenge for nonparametric methods. In this paper, we propose a new Bayesian isotonic regression dose-response model which features advantages from both parametric and nonparametric models. The ED50 and Emax can be derived from this model. Simulations are provided to evaluate the Bayesian isotonic regression dose-response model performance against two parametric models. We apply this model to a data set from a diabetes dose-finding study.

  13. Effect of flux adjustments on temperature variability in climate models

    NASA Astrophysics Data System (ADS)

    CMIP investigators; Duffy, P. B.; Bell, J.; Covey, C.; Sloan, L.

    2000-03-01

    It has been suggested that “flux adjustments” in climate models suppress simulated temperature variability. If true, this might invalidate the conclusion that at least some of observed temperature increases since 1860 are anthropogenic, since this conclusion is based in part on estimates of natural temperature variability derived from flux-adjusted models. We assess variability of surface air temperatures in 17 simulations of internal temperature variability submitted to the Coupled Model Intercomparison Project. By comparing variability in flux-adjusted vs. non-flux adjusted simulations, we find no evidence that flux adjustments suppress temperature variability in climate models; other, largely unknown, factors are much more important in determining simulated temperature variability. Therefore the conclusion that at least some of observed temperature increases are anthropogenic cannot be questioned on the grounds that it is based in part on results of flux-adjusted models. Also, reducing or eliminating flux adjustments would probably do little to improve simulations of temperature variability.

  14. Batch Mode Active Learning for Regression With Expected Model Change.

    PubMed

    Cai, Wenbin; Zhang, Muhan; Zhang, Ya

    2016-04-20

    While active learning (AL) has been widely studied for classification problems, limited effort has been devoted to AL for regression. In this paper, we introduce a new AL framework for regression, expected model change maximization (EMCM), which aims at choosing the unlabeled data instances that result in the maximum change of the current model once labeled. The model change is quantified as the difference between the current model parameters and the updated parameters after the inclusion of the newly selected examples. In light of the stochastic gradient descent learning rule, we approximate the change as the gradient of the loss function with respect to each single candidate instance. Under the EMCM framework, we propose novel AL algorithms for the linear and nonlinear regression models. In addition, by simulating the behavior of the sequential AL policy when applied for k iterations, we further extend the algorithms to batch-mode AL to simultaneously choose a set of the k most informative instances at each query time. Extensive experimental results on both UCI and StatLib benchmark data sets have demonstrated that the proposed algorithms are highly effective and efficient.
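
    A hedged numpy sketch of the linear-regression case: each unlabeled candidate is scored by the expected gradient norm of the squared loss, with the unknown label approximated by the predictions of a small bootstrap ensemble. The names and the ensemble construction are illustrative, not the paper's exact algorithm.

```python
import numpy as np

def emcm_rank(X_pool, w_current, ensemble_weights):
    """Rank unlabeled candidates by expected model change:
    the gradient of the squared loss at x is (f(x) - y) * x, and the
    unknown label y is replaced by each ensemble member's prediction."""
    f = X_pool @ w_current                                # current predictions
    preds = np.stack([X_pool @ w for w in ensemble_weights])
    scores = []
    for i, x in enumerate(X_pool):
        # average gradient magnitude over the ensemble's plausible labels
        g = np.mean(np.abs(f[i] - preds[:, i])) * np.linalg.norm(x)
        scores.append(g)
    return np.argsort(scores)[::-1]                       # most informative first
```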

  15. A mathematical model of tumour angiogenesis: growth, regression and regrowth.

    PubMed

    Vilanova, Guillermo; Colominas, Ignasi; Gomez, Hector

    2017-01-01

    Cancerous tumours have the ability to recruit new blood vessels through a process called angiogenesis. By stimulating vascular growth, tumours get connected to the circulatory system, receive nutrients and open a way to colonize distant organs. Tumour-induced vascular networks become unstable in the absence of tumour angiogenic factors (TAFs). They may undergo alternating stages of growth, regression and regrowth. Following a phase-field methodology, we propose a model of tumour angiogenesis that reproduces the aforementioned features and highlights the importance of vascular regression and regrowth. In contrast with previous theories which focus on vessel remodelling due to the absence of flow, we model an alternative regression mechanism based on the dependency of tumour-induced vascular networks on TAFs. The model captures capillaries at full scale, the plastic dynamics of tumour-induced vessel networks at long time scales, and shows the key role played by filopodia during angiogenesis. The predictions of our model are in agreement with in vivo experiments and may prove useful for the design of antiangiogenic therapies.

  16. Procedure for Detecting Outliers in a Circular Regression Model

    PubMed Central

    Rambli, Adzhar; Abuzaid, Ali H. M.; Mohamed, Ibrahim Bin; Hussin, Abdul Ghapor

    2016-01-01

    A number of circular regression models have been proposed in the literature. In recent years, strong interest has been shown in the subject of outlier detection in circular regression. An outlier detection procedure can be developed by defining a new statistic in terms of the circular residuals. In this paper, we propose a new measure which transforms the circular residuals into linear measures using a trigonometric function. We then employ the row deletion approach to identify the observations that most affect the measure, each a candidate outlier. The corresponding cut-off points and the performance of the detection procedure when applied to Down and Mardia's model are studied via simulations. For illustration, we apply the procedure to circadian data. PMID:27064566

  17. Modeling energy expenditure in children and adolescents using quantile regression.

    PubMed

    Yang, Yunwen; Adolph, Anne L; Puyau, Maurice R; Vohra, Firoz A; Butte, Nancy F; Zakeri, Issa F

    2013-07-15

    Advanced mathematical models have the potential to capture the complex metabolic and physiological processes that result in energy expenditure (EE). The study objective is to apply quantile regression (QR) to predict EE and determine quantile-dependent variation in covariate effects in nonobese and obese children. First, QR models were developed to predict minute-by-minute awake EE at different quantile levels based on heart rate (HR) and physical activity (PA) accelerometry counts, and child characteristics of age, sex, weight, and height. Second, the QR models were used to evaluate the covariate effects of weight, PA, and HR across the conditional EE distribution. QR and ordinary least squares (OLS) regressions were estimated in 109 children, aged 5-18 yr. QR modeling of EE outperformed OLS regression for both nonobese and obese populations. Average prediction errors for QR compared with OLS were not only smaller at the median τ = 0.5 (18.6 vs. 21.4%), but also substantially smaller at the tails of the distribution (10.2 vs. 39.2% at τ = 0.1 and 8.7 vs. 19.8% at τ = 0.9). Covariate effects of weight, PA, and HR on EE for the nonobese and obese children differed across quantiles (P < 0.05). The associations (linear and quadratic) between PA and HR with EE were stronger for the obese than the nonobese population (P < 0.05). In conclusion, QR provided more accurate predictions of EE compared with conventional OLS regression, especially at the tails of the distribution, and revealed substantially different covariate effects of weight, PA, and HR on EE in nonobese and obese children.
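
    A minimal sketch of the modeling step, assuming the statsmodels QuantReg interface, with simulated stand-ins for the HR, PA, and weight covariates; one fit per quantile level yields the quantile-specific covariate effects.

```python
import numpy as np
import statsmodels.api as sm

# hypothetical minute-level data: EE predicted from HR, PA counts, weight
rng = np.random.default_rng(7)
n = 1000
hr = rng.normal(100, 20, n)
pa = rng.exponential(200, n)
weight = rng.normal(45, 10, n)
ee = 0.01 * hr + 0.002 * pa + 0.03 * weight + rng.normal(0, 0.5, n)

X = sm.add_constant(np.column_stack([hr, pa, weight]))
# fit the conditional median and both tails, as in the study design
for tau in (0.1, 0.5, 0.9):
    res = sm.QuantReg(ee, X).fit(q=tau)
    print(tau, res.params)    # quantile-specific covariate effects
```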

  18. Logistic Regression Model on Antenna Control Unit Autotracking Mode

    DTIC Science & Technology

    2015-10-20

    Daniel T. Laird, Air Force Test Center, Edwards AFB, CA. Report 412TW-PA-15240; approved for public release, distribution is unlimited. Abstract: Over the past several years

  19. Stepwise multiple regression method of greenhouse gas emission modeling in the energy sector in Poland.

    PubMed

    Kolasa-Wiecek, Alicja

    2015-04-01

    The energy sector in Poland is the source of 81% of greenhouse gas (GHG) emissions. Poland occupies a leading position among European Union countries with regard to coal consumption. The Polish energy sector actively participates in efforts to reduce GHG emissions to the atmosphere through a gradual decrease of the share of coal in the fuel mix and the development of renewable energy sources. All evidence that completes the knowledge about issues related to GHG emissions is a valuable source of information. The article presents the results of modeling GHG emissions generated by the energy sector in Poland. For a better understanding of the quantitative relationship between total consumption of primary energy and greenhouse gas emission, a multiple stepwise regression model was applied. The CO2 emission model demonstrates a high relationship (0.97) with the hard coal consumption variable, and its adjustment coefficient to actual data is high, equal to 95%. The backward stepwise regression model for CH4 emission indicated hard coal (0.66), peat and fuel wood (0.34), and solid waste fuels as well as other sources (-0.64) as the most important variables; the adjusted coefficient is suitable and equals R2 = 0.90. For N2O emission modeling the obtained coefficient of determination is low, equal to 43%. A significant variable influencing the amount of N2O emission is peat and wood fuel consumption.
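
    A hedged Python sketch of backward stepwise selection of the kind described, using statsmodels OLS with hypothetical stand-ins for the fuel-mix predictors; the drop rule here removes the least significant variable one at a time.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def backward_step(y, X, alpha=0.05):
    """Backward stepwise OLS sketch: repeatedly drop the predictor with the
    largest p-value until every remaining predictor is significant."""
    X = sm.add_constant(X)
    while True:
        res = sm.OLS(y, X).fit()
        pvals = res.pvalues.drop('const')
        if pvals.empty or pvals.max() <= alpha:
            return res
        X = X.drop(columns=pvals.idxmax())

# hypothetical predictors standing in for the fuel-mix variables
rng = np.random.default_rng(8)
X = pd.DataFrame(rng.normal(size=(40, 4)),
                 columns=['hard_coal', 'peat_wood', 'solid_waste', 'other'])
y = 3.0 * X['hard_coal'] + rng.normal(0, 0.5, 40)
print(backward_step(y, X).params)
```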

  20. Improving regression-model-based streamwater constituent load estimates derived from serially correlated data

    NASA Astrophysics Data System (ADS)

    Aulenbach, Brent T.

    2013-10-01

    A regression-model based approach is a commonly used, efficient method for estimating streamwater constituent load when there is a relationship between streamwater constituent concentration and continuous variables such as streamwater discharge, season, and time. A subsetting experiment using a 30-year dataset of daily suspended sediment observations from the Mississippi River at Thebes, Illinois, was performed to determine optimal sampling frequency, model calibration period length, and regression model methodology, as well as to determine the effect of serial correlation of model residuals on load estimate precision. Two regression-based methods were used to estimate streamwater loads, the Adjusted Maximum Likelihood Estimator (AMLE) and the composite method, a hybrid load estimation approach. While both methods accurately and precisely estimated loads at the model's calibration period time scale, precision was progressively worse at shorter reporting periods, from annually to monthly. Serial correlation in model residuals resulted in observed AMLE precision being significantly worse than the model-calculated standard errors of prediction. The composite method effectively improved upon AMLE loads for shorter reporting periods, but required a sampling interval of 15 days or shorter when the serial correlations in the observed load residuals were greater than 0.15. AMLE precision was better at shorter sampling intervals and when using the shortest model calibration periods, such that the regression models better fit the temporal changes in the concentration-discharge relationship. The models with the largest errors typically had poor high-flow sampling coverage, resulting in unrepresentative models. Increasing sampling frequency and/or targeted high-flow sampling are more efficient approaches for ensuring sufficient sampling and avoiding poorly performing models than increasing calibration period length.

  1. Transferability and Generalizability of Regression Models of Ultrafine Particles in Urban Neighborhoods in the Boston Area

    PubMed Central

    2015-01-01

    Land use regression (LUR) models have been used to assess air pollutant exposure, but limited evidence exists on whether location-specific LUR models are applicable to other locations (transferability) or general models are applicable to smaller areas (generalizability). We tested transferability and generalizability of spatial-temporal LUR models of hourly particle number concentration (PNC) for Boston-area (MA, U.S.A.) urban neighborhoods near Interstate 93. Four neighborhood-specific regression models and one Boston-area model were developed from mobile monitoring measurements (34–46 days/neighborhood over one year each). Transferability was tested by applying each neighborhood-specific model to the other neighborhoods; generalizability was tested by applying the Boston-area model to each neighborhood. Both the transferability and generalizability of models were tested with and without neighborhood-specific calibration. Important PNC predictors (adjusted-R2 = 0.24–0.43) included wind speed and direction, temperature, highway traffic volume, and distance from the highway edge. Direct model transferability was poor (R2 < 0.17). Locally-calibrated transferred models (R2 = 0.19–0.40) and the Boston-area model (adjusted-R2 = 0.26, range: 0.13–0.30) performed similarly to neighborhood-specific models; however, some coefficients of locally calibrated transferred models were uninterpretable. Our results show that transferability of neighborhood-specific LUR models of hourly PNC was limited, but that a general model performed acceptably in multiple areas when calibrated with local data. PMID:25867675

  2. A new approach in regression analysis for modeling adsorption isotherms.

    PubMed

    Marković, Dana D; Lekić, Branislava M; Rajaković-Ognjanović, Vladana N; Onjia, Antonije E; Rajaković, Ljubinka V

    2014-01-01

    Numerous regression approaches to isotherm parameter estimation appear in the literature. Real insight into the proper modeling pattern can be achieved only by testing methods on a very large number of cases. Experimentally, this cannot be done in a reasonable time, so the Monte Carlo simulation method was applied. The objective of this paper is to introduce and compare numerical approaches that involve different levels of knowledge about the noise structure of the analytical method used for initial and equilibrium concentration determination. Six levels of homoscedastic noise and five types of heteroscedastic noise precision models were considered. Performance of the methods was statistically evaluated based on median percentage error and mean absolute relative error in parameter estimates. The present study showed a clear distinction between two cases. When equilibrium experiments are performed only once, the winning error function for the homoscedastic case is ordinary least squares, while for the case of heteroscedastic noise the use of orthogonal distance regression or Marquardt's percent standard deviation is suggested. It was found that when experiments are repeated three times, the simple method of weighted least squares performed as well as the more complicated orthogonal distance regression method.
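
    A minimal scipy sketch of the homoscedastic-versus-heteroscedastic distinction, using the Langmuir isotherm as a stand-in adsorption model: an ordinary least squares fit versus a weighted fit in which curve_fit's sigma argument encodes the assumed noise precision model.

```python
import numpy as np
from scipy.optimize import curve_fit

def langmuir(c, qmax, k):
    """Langmuir isotherm, a common adsorption model."""
    return qmax * k * c / (1.0 + k * c)

rng = np.random.default_rng(9)
c = np.linspace(0.1, 10, 25)
q_true = langmuir(c, 5.0, 0.8)

# heteroscedastic noise: error sd proportional to the signal
q_obs = q_true + rng.normal(0, 0.05 * q_true)

# ordinary least squares (appropriate for homoscedastic noise)
p_ols, _ = curve_fit(langmuir, c, q_obs, p0=[1.0, 1.0])
# weighted fit: sigma tells curve_fit the per-point noise level
p_wls, _ = curve_fit(langmuir, c, q_obs, p0=[1.0, 1.0], sigma=0.05 * q_true)
print(p_ols, p_wls)
```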

  3. A New Approach in Regression Analysis for Modeling Adsorption Isotherms

    PubMed Central

    Onjia, Antonije E.

    2014-01-01

    Numerous regression approaches to isotherm parameter estimation appear in the literature. Real insight into the proper modeling pattern can be achieved only by testing methods on a very large number of cases. Experimentally, this cannot be done in a reasonable time, so the Monte Carlo simulation method was applied. The objective of this paper is to introduce and compare numerical approaches that involve different levels of knowledge about the noise structure of the analytical method used for initial and equilibrium concentration determination. Six levels of homoscedastic noise and five types of heteroscedastic noise precision models were considered. Performance of the methods was statistically evaluated based on median percentage error and mean absolute relative error in parameter estimates. The present study showed a clear distinction between two cases. When equilibrium experiments are performed only once, the winning error function for the homoscedastic case is ordinary least squares, while for the case of heteroscedastic noise the use of orthogonal distance regression or Marquardt's percent standard deviation is suggested. It was found that when experiments are repeated three times, the simple method of weighted least squares performed as well as the more complicated orthogonal distance regression method. PMID:24672394

  4. Comparison of a Bayesian Network with a Logistic Regression Model to Forecast IgA Nephropathy

    PubMed Central

    Ducher, Michel; Kalbacher, Emilie; Combarnous, François; Finaz de Vilaine, Jérome; McGregor, Brigitte; Fouque, Denis; Fauvel, Jean Pierre

    2013-01-01

    Models are increasingly used in clinical practice to improve the accuracy of diagnosis. The aim of our work was to compare a Bayesian network to logistic regression to forecast IgA nephropathy (IgAN) from simple clinical and biological criteria. Retrospectively, we pooled the results of all biopsies (n = 155) performed by nephrologists in a specialist clinical facility between 2002 and 2009. Two groups were constituted at random. The first subgroup was used to determine the parameters of the models adjusted to data by logistic regression or Bayesian network, and the second was used to compare the performances of the models using receiver operating characteristic (ROC) curves. IgAN was found (on pathology) in 44 patients. Areas under the ROC curves provided by both methods were highly significant but not different from each other. Based on the highest Youden indices, sensitivity reached 100% versus 67%, and specificity 73% versus 95%, for the Bayesian network and logistic regression, respectively. A Bayesian network is at least as efficient as logistic regression to estimate the probability of a patient suffering IgAN, using simple clinical and biological data obtained during consultation. PMID:24328031

  5. Comparison of a Bayesian network with a logistic regression model to forecast IgA nephropathy.

    PubMed

    Ducher, Michel; Kalbacher, Emilie; Combarnous, François; Finaz de Vilaine, Jérome; McGregor, Brigitte; Fouque, Denis; Fauvel, Jean Pierre

    2013-01-01

    Models are increasingly used in clinical practice to improve the accuracy of diagnosis. The aim of our work was to compare a Bayesian network to logistic regression to forecast IgA nephropathy (IgAN) from simple clinical and biological criteria. Retrospectively, we pooled the results of all biopsies (n = 155) performed by nephrologists in a specialist clinical facility between 2002 and 2009. Two groups were constituted at random. The first subgroup was used to determine the parameters of the models adjusted to data by logistic regression or Bayesian network, and the second was used to compare the performances of the models using receiver operating characteristic (ROC) curves. IgAN was found (on pathology) in 44 patients. Areas under the ROC curves provided by both methods were highly significant but not different from each other. Based on the highest Youden indices, sensitivity reached 100% versus 67%, and specificity 73% versus 95%, for the Bayesian network and logistic regression, respectively. A Bayesian network is at least as efficient as logistic regression to estimate the probability of a patient suffering IgAN, using simple clinical and biological data obtained during consultation.

  6. Modeling the number of car theft using Poisson regression

    NASA Astrophysics Data System (ADS)

    Zulkifli, Malina; Ling, Agnes Beh Yen; Kasim, Maznah Mat; Ismail, Noriszura

    2016-10-01

    Regression analysis is among the most popular statistical methods used to express the relationship between a response variable and covariates. The aim of this paper is to evaluate the factors that influence the number of car thefts using a Poisson regression model. The paper focuses on car thefts that occurred in districts in Peninsular Malaysia. Two groups of factors are considered, namely district descriptive factors and sociodemographic factors. The results showed that Bumiputera composition, Chinese composition, other ethnic composition, foreign migration, number of residents aged 25 to 64, number of employed persons, and number of unemployed persons are the most influential factors affecting car theft cases. This information is very useful for law enforcement departments, insurance companies, and car owners seeking to reduce and limit car theft cases in Peninsular Malaysia.

  7. Improving regression-model-based streamwater constituent load estimates derived from serially correlated data

    USGS Publications Warehouse

    Aulenbach, Brent T.

    2013-01-01

    A regression-model based approach is a commonly used, efficient method for estimating streamwater constituent load when there is a relationship between streamwater constituent concentration and continuous variables such as streamwater discharge, season, and time. A subsetting experiment using a 30-year dataset of daily suspended sediment observations from the Mississippi River at Thebes, Illinois, was performed to determine optimal sampling frequency, model calibration period length, and regression model methodology, as well as to determine the effect of serial correlation of model residuals on load estimate precision. Two regression-based methods were used to estimate streamwater loads, the Adjusted Maximum Likelihood Estimator (AMLE) and the composite method, a hybrid load estimation approach. While both methods accurately and precisely estimated loads at the model's calibration period time scale, precision was progressively worse at shorter reporting periods, from annually to monthly. Serial correlation in model residuals resulted in observed AMLE precision being significantly worse than the model-calculated standard errors of prediction. The composite method effectively improved upon AMLE loads for shorter reporting periods, but required a sampling interval of 15 days or shorter when the serial correlations in the observed load residuals were greater than 0.15. AMLE precision was better at shorter sampling intervals and when using the shortest model calibration periods, such that the regression models better fit the temporal changes in the concentration-discharge relationship. The models with the largest errors typically had poor high-flow sampling coverage, resulting in unrepresentative models. Increasing sampling frequency and/or targeted high-flow sampling are more efficient approaches for ensuring sufficient sampling and avoiding poorly performing models than increasing calibration period length.

  8. Estimating Regression Parameters in an Extended Proportional Odds Model

    PubMed Central

    Chen, Ying Qing; Hu, Nan; Cheng, Su-Chun; Musoke, Philippa; Zhao, Lue Ping

    2012-01-01

    The proportional odds model may serve as a useful alternative to the Cox proportional hazards model to study association between covariates and their survival functions in medical studies. In this article, we study an extended proportional odds model that incorporates the so-called “external” time-varying covariates. In the extended model, regression parameters have a direct interpretation of comparing survival functions, without specifying the baseline survival odds function. Semiparametric and maximum likelihood estimation procedures are proposed to estimate the extended model. Our methods are demonstrated by Monte-Carlo simulations, and applied to a landmark randomized clinical trial of a short course Nevirapine (NVP) for mother-to-child transmission (MTCT) of human immunodeficiency virus type-1 (HIV-1). Additional application includes analysis of the well-known Veterans Administration (VA) Lung Cancer Trial. PMID:22904583

  9. A recurrent support vector regression model in rainfall forecasting

    NASA Astrophysics Data System (ADS)

    Pai, Ping-Feng; Hong, Wei-Chiang

    2007-03-01

    To minimize potential loss of life and property caused by rainfall during typhoon seasons, precise rainfall forecasts have been one of the key subjects in hydrological research. However, rainfall forecasting is made difficult by some very complicated and unforeseen physical factors associated with rainfall. Recently, support vector regression (SVR) models and recurrent SVR (RSVR) models have been successfully employed to solve time-series problems in some fields. Nevertheless, the use of RSVR models in rainfall forecasting has not been investigated widely. This study attempts to improve the forecasting accuracy of rainfall by taking advantage of the unique strengths of the SVR model, genetic algorithms, and the recurrent network architecture. The performance of genetic algorithms with different mutation rates and crossover rates in SVR parameter selection is examined. Simulation results identify the RSVR with genetic algorithms model as being an effective means of forecasting rainfall amount.

  10. Development and Application of Nonlinear Land-Use Regression Models

    NASA Astrophysics Data System (ADS)

    Champendal, Alexandre; Kanevski, Mikhail; Huguenot, Pierre-Emmanuel

    2014-05-01

    The problem of air pollution modelling in urban zones is of great importance from both scientific and applied points of view. At present there are several fundamental approaches, either based on science-based modelling (air pollution dispersion) or on the application of space-time geostatistical methods (e.g. the family of kriging models or conditional stochastic simulations). Recently, there have been important developments in so-called Land Use Regression (LUR) models. These models take into account geospatial information (e.g. traffic network, sources of pollution, average traffic, population census, land use, etc.) at different scales, for example, using buffering operations. Usually the dimension of the input space (number of independent variables) is within the range of 10-100. It has been shown that LUR models have some potential to model complex and highly variable patterns of air pollution in urban zones. Most LUR models currently used are linear models. In the present research, nonlinear LUR models are developed and applied to the city of Geneva. Two nonlinear data-driven models were elaborated: a multilayer perceptron and a random forest. An important part of the research also deals with a comprehensive exploratory data analysis using statistical, geostatistical and time series tools. Unsupervised self-organizing maps were applied to better understand space-time patterns of the pollution. The real data case study deals with spatial-temporal air pollution data of Geneva (2002-2011). Nitrogen dioxide (NO2) has caught our attention: it affects human health and plants, and contributes to the phenomenon of acid rain. Its negative effects on plants include reduced growth, production, and pesticide resistance; on materials, nitrogen dioxide increases corrosion. The data used for this study consist of a set of 106 NO2 passive sensors. 80 were used to build the models and the remaining 36 have constituted
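
    A hedged scikit-learn sketch of the random forest variant of a nonlinear LUR model, with hypothetical buffered covariates standing in for the Geneva predictors:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# hypothetical LUR-style inputs: buffered geospatial covariates per sensor
rng = np.random.default_rng(10)
n_sensors = 106
X = rng.uniform(size=(n_sensors, 12))     # traffic, land use, population, ...
no2 = 20 + 30 * X[:, 0] + 10 * X[:, 1] ** 2 + rng.normal(0, 2, n_sensors)

X_train, X_test, y_train, y_test = train_test_split(
    X, no2, test_size=0.25, random_state=0)
rf = RandomForestRegressor(n_estimators=500, random_state=0)
rf.fit(X_train, y_train)
print(rf.score(X_test, y_test))           # R^2 on held-out sensors
print(rf.feature_importances_[:3])        # which covariates drive the model
```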

  11. Estimation of reference evapotranspiration using multivariate fractional polynomial, Bayesian regression, and robust regression models in three arid environments

    NASA Astrophysics Data System (ADS)

    Khoshravesh, Mojtaba; Sefidkouhi, Mohammad Ali Gholami; Valipour, Mohammad

    2015-12-01

    The proper evaluation of evapotranspiration is essential in food security investigation, farm management, pollution detection, irrigation scheduling, nutrient flows, carbon balance, and hydrologic modeling, especially in arid environments. To achieve sustainable development and to ensure water supply, especially in arid environments, irrigation experts need tools to estimate reference evapotranspiration on a large scale. In this study, the monthly reference evapotranspiration was estimated by three different regression models, the multivariate fractional polynomial (MFP), robust regression, and Bayesian regression, in Ardestan, Esfahan, and Kashan. The results were compared with the Food and Agriculture Organization (FAO)-Penman-Monteith (FAO-PM) method to select the best model. The results show that at a monthly scale, all models provided close agreement with the calculated FAO-PM values (R2 > 0.95 and RMSE < 12.07 mm month-1). However, the MFP model gives better estimates than the other two models for estimating reference evapotranspiration at all stations.

  12. Imbedding linear regressions in models for factor crossing

    NASA Astrophysics Data System (ADS)

    Santos, Carla; Nunes, Célia; Dias, Cristina; Varadinov, Maria; Mexia, João T.

    2016-12-01

    Given u factors with J_1, …, J_u levels, we are led to test their effects and interactions. For this we consider an orthogonal partition of R^n, with n = J_1 × ⋯ × J_u, into subspaces associated with the sets of factors. The subspace corresponding to a set C of factors will have dimension g(C) = ∏_{l ∈ C} (J_l − 1), so that g({1, …, u}) will be much larger than the other numbers of degrees of freedom when J_l > 2 for l = 1, …, u. This fact may be used to enrich these models by imbedding linear regressions in them.

  13. Comparison of regression models for estimation of isometric wrist joint torques using surface electromyography

    PubMed Central

    2011-01-01

    Background Several regression models have been proposed for estimation of isometric joint torque using surface electromyography (SEMG) signals. Common issues related to torque estimation models are degradation of model accuracy with passage of time, electrode displacement, and alteration of limb posture. This work compares the performance of the most commonly used regression models under these circumstances, in order to assist researchers with identifying the most appropriate model for a specific biomedical application. Methods Eleven healthy volunteers participated in this study. A custom-built rig, equipped with a torque sensor, was used to measure isometric torque as each volunteer flexed and extended his wrist. SEMG signals from eight forearm muscles, along with wrist joint torque data, were gathered during the experiment. Additional data were gathered one hour and twenty-four hours following the completion of the first data gathering session, for the purpose of evaluating the effects of passage of time and electrode displacement on the accuracy of the models. Acquired SEMG signals were filtered, rectified, normalized and then fed to the models for training. Results It was shown that mean adjusted coefficient of determination (Ra²) values decreased by 20%-35% for different models after one hour, while altering arm posture decreased mean Ra² values by 64%-74% for different models. Conclusions Model estimation accuracy drops significantly with passage of time, electrode displacement, and alteration of limb posture. Therefore model retraining is crucial for preserving estimation accuracy. Data resampling can significantly reduce model training time without losing estimation accuracy. Among the models compared, the ordinary least squares linear regression model (OLS) was shown to have high isometric torque estimation accuracy combined with very short training times. PMID:21943179

  14. Estimating leaf photosynthetic pigments information by stepwise multiple linear regression analysis and a leaf optical model

    NASA Astrophysics Data System (ADS)

    Liu, Pudong; Shi, Runhe; Wang, Hong; Bai, Kaixu; Gao, Wei

    2014-10-01

    Leaf pigments are key elements for plant photosynthesis and growth. Traditional manual sampling of these pigments is labor-intensive and costly, and also struggles to capture their temporal and spatial characteristics. The aim of this work is to estimate photosynthetic pigments at large scale by remote sensing. For this purpose, inverse models were proposed with the aid of stepwise multiple linear regression (SMLR) analysis. Furthermore, a leaf radiative transfer model (the PROSPECT model) was employed to simulate leaf reflectance at wavelengths from 400 to 780 nm at 1 nm intervals, and these values were treated as data from remote sensing observations. Simulated chlorophyll concentration (Cab), carotenoid concentration (Car) and their ratio (Cab/Car) were each taken as targets for separate regression models. In this study, a total of 4000 samples were simulated via PROSPECT with different Cab, Car and leaf mesophyll structures, with 70% of the samples used for training and the remaining 30% for model validation. Reflectance (r) and its mathematical transformations (1/r and log(1/r)) were each employed to build regression models. Results showed fair agreement between pigments and simulated reflectance, with all adjusted coefficients of determination (R²) larger than 0.8 when 6 wavebands were selected to build the SMLR model. The largest R² values for Cab, Car and Cab/Car were 0.8845, 0.876 and 0.8765, respectively. Meanwhile, the mathematical transformations of reflectance showed little influence on regression accuracy. We concluded that it is feasible to estimate chlorophyll, carotenoids, and their ratio from leaf reflectance data using a statistical model.
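
    A minimal sketch of the stepwise selection step follows, under the assumption that forward selection of 6 wavebands with a linear model is an acceptable stand-in for SMLR; the reflectance spectra and pigment values are simulated, not PROSPECT output.

        # Sketch of stepwise-style waveband selection: forward selection of 6
        # bands from simulated leaf reflectance (illustrative data only).
        import numpy as np
        from sklearn.feature_selection import SequentialFeatureSelector
        from sklearn.linear_model import LinearRegression

        rng = np.random.default_rng(2)
        wavelengths = np.arange(400, 781)    # 400-780 nm at 1 nm steps
        refl = rng.uniform(0.0, 0.6, size=(400, wavelengths.size))
        cab = 5 + 40 * refl[:, 150] - 30 * refl[:, 300] + rng.normal(0, 1, 400)

        sfs = SequentialFeatureSelector(LinearRegression(), n_features_to_select=6,
                                        direction="forward").fit(refl, cab)
        print("selected wavebands (nm):", wavelengths[sfs.get_support()])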

  15. Variable-Domain Functional Regression for Modeling ICU Data.

    PubMed

    Gellar, Jonathan E; Colantuoni, Elizabeth; Needham, Dale M; Crainiceanu, Ciprian M

    2014-12-01

    We introduce a class of scalar-on-function regression models with subject-specific functional predictor domains. The fundamental idea is to consider a bivariate functional parameter that depends both on the functional argument and on the width of the functional predictor domain. Both parametric and nonparametric models are introduced to fit the functional coefficient. The nonparametric model is theoretically and practically invariant to functional support transformation, or support registration. Methods were motivated by and applied to a study of the association between daily measures of the Intensive Care Unit (ICU) Sequential Organ Failure Assessment (SOFA) score and two outcomes: in-hospital mortality, and physical impairment at hospital discharge among survivors. Methods are generally applicable to a large number of new studies that record continuous variables over unequal domains.

  16. Modeling pan evaporation for Kuwait by multiple linear regression.

    PubMed

    Almedeij, Jaber

    2012-01-01

    Evaporation is an important parameter for many projects related to hydrology and water resources systems. This paper constitutes the first study conducted in Kuwait to obtain empirical relations for the estimation of daily and monthly pan evaporation as functions of available meteorological data of temperature, relative humidity, and wind speed. The data used here for the modeling are daily measurements with substantial continuity of coverage, within a period of 17 years between January 1993 and December 2009, which can be considered representative of the desert climate of the urban zone of the country. A multiple linear regression technique is used with a variable selection procedure for fitting the best model forms. The correlations of evaporation with temperature and relative humidity are also transformed in order to linearize the existing curvilinear patterns in the data, using power and exponential functions, respectively. The evaporation models suggested with the best variable combinations were shown to produce results in reasonable agreement with observed values.
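
    The transform-then-regress step lends itself to a short sketch. The one below uses invented exponents and synthetic weather series purely to illustrate linearizing curvilinear relations before multiple linear regression; it does not reproduce the paper's fitted models.

        # Illustrative sketch: power and exponential transforms of temperature
        # and relative humidity before multiple linear regression (synthetic data).
        import numpy as np
        from sklearn.linear_model import LinearRegression

        rng = np.random.default_rng(3)
        temp = rng.uniform(15, 48, 500)      # deg C
        rh = rng.uniform(5, 95, 500)         # percent
        wind = rng.uniform(0.5, 12, 500)     # m/s
        evap = 0.002 * temp ** 2.2 * np.exp(-0.01 * rh) + 0.3 * wind \
               + rng.normal(0, 0.4, 500)

        X = np.column_stack([temp ** 2.2, np.exp(-0.01 * rh), wind])  # transformed
        model = LinearRegression().fit(X, evap)
        print("R^2 with transformed predictors:", round(model.score(X, evap), 3))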

  17. Quality Reporting of Multivariable Regression Models in Observational Studies: Review of a Representative Sample of Articles Published in Biomedical Journals.

    PubMed

    Real, Jordi; Forné, Carles; Roso-Llorach, Albert; Martínez-Sánchez, Jose M

    2016-05-01

    Controlling for confounders is a crucial step in analytical observational studies, and multivariable models are widely used as statistical adjustment techniques. However, the validation of the assumptions of multivariable regression models (MRMs) should be made clear in scientific reporting. The objective of this study was to review the quality of statistical reporting of the most commonly used MRMs (logistic, linear, and Cox regression) applied in analytical observational studies published between 2003 and 2014 in journals indexed in MEDLINE. We reviewed a representative sample of articles indexed in MEDLINE (n = 428) with observational designs and use of MRMs (logistic, linear, and Cox regression). We assessed the quality of reporting of model assumptions and goodness-of-fit, interactions, sensitivity analysis, crude and adjusted effect estimates, and specification of more than 1 adjusted model. The tests of underlying assumptions or goodness-of-fit of the MRMs used were described in 26.2% (95% CI: 22.0-30.3) of the articles, and 18.5% (95% CI: 14.8-22.1) reported the interaction analysis. Reporting of all items assessed was higher in articles published in journals with a higher impact factor. A low percentage of articles indexed in MEDLINE that used multivariable techniques provided information demonstrating rigorous application of the model selected as an adjustment method. Given the importance of these methods to the final results and conclusions of observational studies, greater rigor is required in reporting the use of MRMs in the scientific literature.

  18. Bias and uncertainty in regression-calibrated models of groundwater flow in heterogeneous media

    USGS Publications Warehouse

    Cooley, R.L.; Christensen, S.

    2006-01-01

    Groundwater models need to account for detailed but generally unknown spatial variability (heterogeneity) of the hydrogeologic model inputs. To address this problem we replace the large, m-dimensional stochastic vector β that reflects both small and large scales of heterogeneity in the inputs by a lumped or smoothed m-dimensional approximation Γθ*, where Γ is an interpolation matrix and θ* is a stochastic vector of parameters. Vector θ* has small enough dimension to allow its estimation with the available data. The consequence of the replacement is that the model function f(Γθ*) written in terms of the approximate inputs is in error with respect to the same model function written in terms of β, f(β), which is assumed to be nearly exact. The difference f(β) − f(Γθ*), termed model error, is spatially correlated, generates prediction biases, and causes standard confidence and prediction intervals to be too small. Model error is accounted for in the weighted nonlinear regression methodology developed to estimate θ* and assess model uncertainties by incorporating the second-moment matrix of the model errors into the weight matrix. Techniques developed by statisticians to analyze classical nonlinear regression methods are extended to analyze the revised method. The analysis develops analytical expressions for bias terms reflecting the interaction of model nonlinearity and model error, for correction factors needed to adjust the sizes of confidence and prediction intervals for this interaction, and for correction factors needed to adjust the sizes of confidence and prediction intervals for possible use of a diagonal weight matrix in place of the correct one. If terms expressing the degree of intrinsic nonlinearity for f(β) and f(Γθ*) are small, then most of the biases are small and the correction factors are reduced in magnitude. Biases, correction factors, and confidence and prediction intervals were obtained for a test problem for which model error is…

  19. Comparison of regression methods for modeling intensive care length of stay.

    PubMed

    Verburg, Ilona W M; de Keizer, Nicolette F; de Jonge, Evert; Peek, Niels

    2014-01-01

    Intensive care units (ICUs) are increasingly interested in assessing and improving their performance. ICU Length of Stay (LoS) could be seen as an indicator of efficiency of care. However, little consensus exists on which prognostic method should be used to adjust ICU LoS for case-mix factors. This study compared the performance of different regression models when predicting ICU LoS. We included data from 32,667 unplanned ICU admissions to ICUs participating in the Dutch National Intensive Care Evaluation (NICE) in the year 2011. We predicted ICU LoS using eight regression models: ordinary least squares regression on untransformed ICU LoS, on LoS truncated at 30 days, and on log-transformed LoS; a generalized linear model with a Gaussian distribution and a logarithmic link function; Poisson regression; negative binomial regression; Gamma regression with a logarithmic link function; and the original and recalibrated APACHE IV models, for all patients together and for survivors and non-survivors separately. We assessed the predictive performance of the models using bootstrapping and the squared Pearson correlation coefficient (R²), root mean squared prediction error (RMSPE), mean absolute prediction error (MAPE) and bias. The distribution of ICU LoS was skewed to the right with a median of 1.7 days (interquartile range 0.8 to 4.0) and a mean of 4.2 days (standard deviation 7.9). The predictive performance of the models was between 0.09 and 0.20 for R², between 7.28 and 8.74 days for RMSPE, between 3.00 and 4.42 days for MAPE and between -2.99 and 1.64 days for bias. The predictive performance was slightly better for survivors than for non-survivors. The predictive performance of all regression models was disappointing, and we conclude that it is difficult to predict the LoS of unplanned ICU admissions using patient characteristics at admission time only.

  20. A flexible count data regression model for risk analysis.

    PubMed

    Guikema, Seth D; Coffelt, Jeremy P

    2008-02-01

    In many cases, risk and reliability analyses involve estimating the probabilities of discrete events such as hardware failures and occurrences of disease or death. There is often additional information in the form of explanatory variables that can be used to help estimate the likelihood of different numbers of events in the future through the use of an appropriate regression model, such as a generalized linear model. However, existing generalized linear models (GLMs) are limited in their ability to handle the types of variance structures often encountered in using count data in risk and reliability analysis. In particular, standard models cannot handle both underdispersed data (variance less than the mean) and overdispersed data (variance greater than the mean) in a single coherent modeling framework. This article presents a new GLM based on a reformulation of the Conway-Maxwell Poisson (COM) distribution that is useful for both underdispersed and overdispersed count data and demonstrates this model by applying it to the assessment of electric power system reliability. The results show that the proposed COM GLM can provide fits to data as good as those of the commonly used existing models for overdispersed data sets while outperforming these commonly used models for underdispersed data sets.
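
    The key ingredient of the proposed GLM is the COM distribution itself, whose extra parameter nu yields underdispersion (nu > 1) or overdispersion (nu < 1). A minimal sketch of its probability mass function follows, with the normalizing constant approximated by a truncated sum; the truncation limit and parameter values are arbitrary choices for illustration.

        # Minimal sketch of the Conway-Maxwell-Poisson (COM) pmf.
        import numpy as np
        from scipy.special import gammaln

        def com_poisson_pmf(y, lam, nu, k_max=200):
            ks = np.arange(k_max + 1)
            log_terms = ks * np.log(lam) - nu * gammaln(ks + 1)
            log_z = np.logaddexp.reduce(log_terms)   # log normalizing constant
            return np.exp(y * np.log(lam) - nu * gammaln(y + 1) - log_z)

        print(com_poisson_pmf(3, lam=2.0, nu=1.0))   # ~0.180, matches Poisson(2)
        print(com_poisson_pmf(3, lam=2.0, nu=0.5))   # overdispersed variant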

  1. Partial Correlation Estimation by Joint Sparse Regression Models

    PubMed Central

    Peng, Jie; Wang, Pei; Zhou, Nengfeng; Zhu, Ji

    2009-01-01

    In this paper, we propose a computationally efficient approach, space (Sparse PArtial Correlation Estimation), for selecting non-zero partial correlations under the high-dimension-low-sample-size setting. This method assumes the overall sparsity of the partial correlation matrix and employs sparse regression techniques for model fitting. We illustrate the performance of space by extensive simulation studies. It is shown that space performs well in both non-zero partial correlation selection and the identification of hub variables, and also outperforms two existing methods. We then apply space to a microarray breast cancer data set and identify a set of hub genes which may provide important insights into genetic regulatory networks. Finally, we prove that, under a set of suitable assumptions, the proposed procedure is asymptotically consistent in terms of model selection and parameter estimation. PMID:19881892
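
    A hedged sketch in the spirit of sparse-regression-based partial correlation selection is given below; it uses lasso "neighborhood selection" rather than the space procedure itself, and the data and planted dependency are synthetic.

        # Lasso neighborhood selection: regress each variable on all others and
        # keep nonzero coefficients as candidate partial-correlation edges.
        import numpy as np
        from sklearn.linear_model import LassoCV

        rng = np.random.default_rng(4)
        p, n = 30, 50                        # high dimension, low sample size
        X = rng.normal(size=(n, p))
        X[:, 1] += 0.8 * X[:, 0]             # plant one dependency

        edges = set()
        for j in range(p):
            others = np.delete(np.arange(p), j)
            coef = LassoCV(cv=5).fit(X[:, others], X[:, j]).coef_
            edges.update((min(j, k), max(j, k)) for k in others[coef != 0])
        print("candidate edges:", sorted(edges)[:10])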

  2. Complex Environmental Data Modelling Using Adaptive General Regression Neural Networks

    NASA Astrophysics Data System (ADS)

    Kanevski, Mikhail

    2015-04-01

    The research deals with an adaptation and application of Adaptive General Regression Neural Networks (GRNN) to high dimensional environmental data. GRNN [1,2,3] are efficient modelling tools both for spatial and temporal data and are based on nonparametric kernel methods closely related to the classical Nadaraya-Watson estimator. Adaptive GRNN, using anisotropic kernels, can also be applied to feature selection tasks when working with high dimensional data [1,3]. In the present research, Adaptive GRNN are used to study geospatial data predictability and relevant feature selection using both simulated and real data case studies. The original raw data were either three-dimensional monthly precipitation data or monthly wind speeds embedded into a 13-dimensional space constructed from geographical coordinates and geo-features calculated from a digital elevation model. GRNN were applied in two different ways: 1) adaptive GRNN with the resulting list of features ordered according to their relevancy; and 2) adaptive GRNN applied to evaluate all N possible models [in the case of wind fields, N = 2^13 − 1 = 8191] and rank them according to the cross-validation error. In both cases training was carried out using a leave-one-out procedure. An important result of the study is that the set of the most relevant features depends on the month (strong seasonal effect) and year. The predictabilities of precipitation and wind field patterns, estimated using the cross-validation and testing errors of raw and shuffled data, were studied in detail. The results of both approaches were qualitatively and quantitatively compared. In conclusion, Adaptive GRNN, with their ability to select features and to model complex high-dimensional data efficiently, can be widely used in automatic/on-line mapping and as an integrated part of environmental decision support systems. 1. Kanevski M., Pozdnoukhov A., Timonin V. Machine Learning for Spatial Environmental Data. Theory, applications and software. EPFL Press
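
    Since GRNN is essentially the Nadaraya-Watson kernel regression estimator, a compact sketch is easy to give. The version below uses a single isotropic bandwidth; the adaptive, anisotropic kernels of the paper would replace the scalar sigma by a per-feature vector. All data are simulated.

        # Minimal GRNN sketch: Nadaraya-Watson kernel regression with one
        # shared Gaussian bandwidth (illustrative data only).
        import numpy as np

        def grnn_predict(X_train, y_train, X_query, sigma=0.3):
            # Squared distances between every query point and training point
            d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
            w = np.exp(-d2 / (2 * sigma ** 2))      # Gaussian kernel weights
            return (w @ y_train) / w.sum(axis=1)    # weighted average of targets

        rng = np.random.default_rng(5)
        X = rng.uniform(size=(200, 2))              # e.g. scaled coordinates
        y = np.sin(4 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(0, 0.05, 200)
        Xq = rng.uniform(size=(5, 2))
        print(grnn_predict(X, y, Xq))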

  3. THE REGRESSION MODEL OF IRAN LIBRARIES ORGANIZATIONAL CLIMATE

    PubMed Central

    Jahani, Mohammad Ali; Yaminfirooz, Mousa; Siamian, Hasan

    2015-01-01

    Background: The purpose of this study was to draw a regression model of the organizational climate of the central libraries of Iran’s universities. Methods: This study is an applied research study. The statistical population consisted of 96 employees of the central libraries of Iran’s public universities, selected by stratified sampling from the 117 universities affiliated with the Ministry of Health (510 people in total). A localized ClimateQual questionnaire was used as the research tool. Multivariate linear regression and a path diagram were used to predict the organizational climate pattern of the libraries. Results: Of the 9 variables affecting organizational climate, the 5 variables of innovation, teamwork, customer service, psychological safety and deep diversity play a major role in predicting the organizational climate of Iran’s libraries. The results also indicate that each of these variables, with different coefficients, has the power to predict organizational climate, but the climate score for psychological safety (0.94) plays a particularly crucial role. The path diagram showed that the five variables of teamwork, customer service, psychological safety, deep diversity and innovation directly affect the organizational climate variable, with teamwork contributing more to this influence than any other variable. Conclusions: Among the ClimateQual indicators of organizational climate, teamwork contributes more than any other variable, so reinforcing teamwork in academic libraries can be especially effective in improving the organizational climate of such libraries. PMID:26622203

  4. Functional Regression Models for Epistasis Analysis of Multiple Quantitative Traits.

    PubMed

    Zhang, Futao; Xie, Dan; Liang, Meimei; Xiong, Momiao

    2016-04-01

    To date, most genetic analyses of phenotypes have focused on analyzing single traits or analyzing each phenotype independently. However, joint epistasis analysis of multiple complementary traits will increase statistical power and improve our understanding of the complicated genetic structure of complex diseases. Despite their importance in uncovering the genetic structure of complex traits, statistical methods for identifying epistasis in multiple phenotypes remain fundamentally unexplored. To fill this gap, we formulate a test for interaction between two genes in multiple quantitative trait analysis as a multiple functional regression (MFRG), in which the genotype functions (genetic variant profiles) are defined as a function of the genomic position of the genetic variants. We use large-scale simulations to calculate Type I error rates for testing interaction between two genes with multiple phenotypes and to compare the power with multivariate pairwise interaction analysis and single trait interaction analysis by a univariate functional regression model. To further evaluate performance, the MFRG for epistasis analysis is applied to five phenotypes of exome sequence data from the NHLBI's Exome Sequencing Project (ESP) to detect pleiotropic epistasis. A total of 267 pairs of genes that formed a genetic interaction network showed significant evidence of epistasis influencing five traits. The results demonstrate that joint interaction analysis of multiple phenotypes has much higher power to detect interaction than interaction analysis of a single trait and may open a new direction to fully uncovering the genetic structure of multiple phenotypes.

  5. The Application of the Cumulative Logistic Regression Model to Automated Essay Scoring

    ERIC Educational Resources Information Center

    Haberman, Shelby J.; Sinharay, Sandip

    2010-01-01

    Most automated essay scoring programs use a linear regression model to predict an essay score from several essay features. This article applied a cumulative logit model instead of the linear regression model to automated essay scoring. Comparison of the performances of the linear regression model and the cumulative logit model was performed on a…
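
    A hedged sketch of a cumulative logit fit is shown below using statsmodels' OrderedModel; the essay features, their effects, and the score thresholds are simulated stand-ins, not any scoring engine's actual features.

        # Cumulative logit (proportional odds) model for ordinal essay scores.
        import numpy as np
        from statsmodels.miscmodels.ordinal_model import OrderedModel

        rng = np.random.default_rng(6)
        features = rng.normal(size=(300, 3))  # e.g. length, vocabulary, grammar indices
        latent = features @ np.array([1.0, 0.6, 0.4]) + rng.logistic(size=300)
        scores = np.digitize(latent, bins=[-1.5, 0.0, 1.5])  # ordinal scores 0..3

        fit = OrderedModel(scores, features, distr="logit").fit(method="bfgs",
                                                                disp=False)
        print(fit.params)                     # slopes followed by threshold terms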

  6. Regression models of sprint, vertical jump, and change of direction performance.

    PubMed

    Swinton, Paul A; Lloyd, Ray; Keogh, Justin W L; Agouris, Ioannis; Stewart, Arthur D

    2014-07-01

    The aim of the present study was to expand on previous correlation analyses that have attempted to identify factors that influence performance in jumping, sprinting, and changing direction. This was achieved by using a regression approach to obtain linear models that combined anthropometric, strength, and other biomechanical variables. Thirty rugby union players participated in the study (age: 24.2 ± 3.9 years; stature: 181.2 ± 6.6 cm; mass: 94.2 ± 11.1 kg). The athletes' ability to sprint, jump, and change direction was assessed using a 30-m sprint, vertical jump, and 505 agility test, respectively. Regression variables were collected during maximum strength tests (1 repetition maximum [1RM] deadlift and squat) and performance of fast velocity resistance exercises (deadlift and jump squat) using submaximum loads (10-70% 1RM). Force, velocity, power, and rate of force development (RFD) values were measured during the fast velocity exercises, with the greatest values produced across loads selected for further analysis. Anthropometric data, including lengths, widths, and girths, were collected using a 3-dimensional body scanner. Potential regression variables were first identified using correlation analyses. Suitable variables were then regressed using a best subsets approach. Three-factor models generally provided the most appropriate balance between explained variance and model complexity. Adjusted R² values of 0.86, 0.82, and 0.67 were obtained for sprint, jump, and change of direction performance, respectively. Anthropometric measurements did not feature in any of the top models because of their strong association with body mass. For each performance measure, variance was best explained by relative maximum strength. Improvements in the models were then obtained by including velocity and power values for jumping and sprinting performance, and by including RFD values for change of direction performance.

  7. Robust cross-validation of linear regression QSAR models.

    PubMed

    Konovalov, Dmitry A; Llewellyn, Lyndon E; Vander Heyden, Yvan; Coomans, Danny

    2008-10-01

    A quantitative structure-activity relationship (QSAR) model is typically developed to predict the biochemical activity of untested compounds from the compounds' molecular structures. "The gold standard" of model validation is blindfold prediction, in which the model's predictive power is assessed by how well the model predicts the activity values of compounds that were not considered in any way during the model development/calibration. However, during the development of a QSAR model, it is necessary to obtain some indication of the model's predictive power. This is often done by some form of cross-validation (CV). In this study, the concepts of the predictive power and fitting ability of a multiple linear regression (MLR) QSAR model were examined in the CV context, allowing for the presence of outliers. Commonly used predictive power and fitting ability statistics were assessed via Monte Carlo cross-validation when applied to percent human intestinal absorption, blood-brain partition coefficient, and toxicity values of saxitoxin QSAR data sets, as well as three benchmark data sets with known outlier contamination. It was found that (1) a robust version of MLR should always be preferred over the ordinary-least-squares MLR, regardless of the degree of outlier contamination, and that (2) the model's predictive power should only be assessed via robust statistics. The MATLAB and Java source code used in this study is freely available from the QSAR-BENCH section of www.dmitrykonovalov.org for academic use. The Web site also contains the Java-based QSAR-BENCH program, which can be run online via Java Web Start technology (supporting Windows, Mac OSX, Linux/Unix) to reproduce most of the reported results or apply the reported procedures to other data sets.
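
    A minimal sketch of Monte Carlo cross-validation with a robust versus an ordinary MLR follows; the data, planted outliers, and choice of HuberRegressor are illustrative assumptions, not the study's QSAR sets or its specific robust estimator.

        # Monte Carlo CV comparing ordinary and robust MLR under contamination.
        import numpy as np
        from sklearn.linear_model import HuberRegressor, LinearRegression
        from sklearn.model_selection import ShuffleSplit
        from sklearn.metrics import median_absolute_error

        rng = np.random.default_rng(7)
        X = rng.normal(size=(150, 5))
        y = X @ np.array([2.0, -1.0, 0.5, 0.0, 1.5]) + rng.normal(0, 0.5, 150)
        y[:8] += 15                          # outlier contamination

        for est in (LinearRegression(), HuberRegressor()):
            errs = []
            for tr, te in ShuffleSplit(n_splits=100, test_size=0.3,
                                       random_state=0).split(X):
                pred = est.fit(X[tr], y[tr]).predict(X[te])
                errs.append(median_absolute_error(y[te], pred))  # robust statistic
            print(type(est).__name__, "median MAE:", round(np.median(errs), 3))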

  8. Adaptive Modeling: An Approach for Incorporating Nonlinearity in Regression Analyses.

    PubMed

    Knafl, George J; Barakat, Lamia P; Hanlon, Alexandra L; Hardie, Thomas; Knafl, Kathleen A; Li, Yimei; Deatrick, Janet A

    2017-02-01

    Although regression relationships commonly are treated as linear, this often is not the case. An adaptive approach is described for identifying nonlinear relationships based on power transforms of predictor (or independent) variables and for assessing whether or not relationships are distinctly nonlinear. It is also possible to model adaptively both means and variances of continuous outcome (or dependent) variables and to adaptively power transform positive-valued continuous outcomes, along with their predictors. Example analyses are provided of data from parents in a nursing study on emotional-health-related quality of life for childhood brain tumor survivors as a function of the effort to manage the survivors' condition. These analyses demonstrate that relationships, including moderation relationships, can be distinctly nonlinear, that conclusions about means can be affected by accounting for non-constant variances, and that outcome transformation along with predictor transformation can provide distinct improvements and can resolve skewness problems.

  9. A nonlinear regression model-based predictive control algorithm.

    PubMed

    Dubay, R; Abu-Ayyad, M; Hernandez, J M

    2009-04-01

    This paper presents a unique approach for designing a nonlinear regression model-based predictive controller (NRPC) for single-input-single-output (SISO) and multi-input-multi-output (MIMO) processes that are common in industrial applications. The innovation of this strategy is that the controller structure allows nonlinear open-loop modeling to be conducted while closed-loop control is executed every sampling instant. Consequently, the system matrix is regenerated every sampling instant using a continuous function, providing a more accurate prediction of the plant. Computer simulations are carried out on nonlinear plants, demonstrating that the new approach is easily implemented and provides tight control. The proposed algorithm is also implemented on two real-time SISO applications, a DC motor and a plastic injection molding machine, and on a nonlinear MIMO thermal system comprising three temperature zones to be controlled with interacting effects. The experimental closed-loop responses of the proposed algorithm were compared to those of a multi-model dynamic matrix controller (MPC), with improved results for various set point trajectories. Good disturbance rejection was attained, resulting in improved tracking of multi-set point profiles in comparison to multi-model MPC.

  10. Modeling Information Content Via Dirichlet-Multinomial Regression Analysis.

    PubMed

    Ferrari, Alberto

    2017-02-16

    Shannon entropy is being increasingly used in biomedical research as an index of complexity and information content in sequences of symbols, e.g. languages, amino acid sequences, DNA methylation patterns and animal vocalizations. Yet, the distributional properties of information entropy as a random variable have seldom been the object of study, leading researchers mainly to use linear models or simulation-based analytical approaches to assess differences in information content when entropy is measured repeatedly in different experimental conditions. Here a method to perform inference on entropy in such conditions is proposed. Building on results from studies in the field of Bayesian entropy estimation, a symmetric Dirichlet-multinomial regression model, able to deal efficiently with the issue of mean entropy estimation, is formulated. Through a simulation study the model is shown to outperform linear modeling in a vast range of scenarios and to have promising statistical properties. As a practical example, the method is applied to a data set from a real experiment on animal communication.

  11. Forecasting Groundwater Temperature with Linear Regression Models Using Historical Data.

    PubMed

    Figura, Simon; Livingstone, David M; Kipfer, Rolf

    2015-01-01

    Although temperature is an important determinant of many biogeochemical processes in groundwater, very few studies have attempted to forecast the response of groundwater temperature to future climate warming. Using a composite linear regression model based on the lagged relationship between historical groundwater and regional air temperature data, empirical forecasts were made of groundwater temperature in several aquifers in Switzerland up to the end of the current century. The model was fed with regional air temperature projections calculated for greenhouse-gas emissions scenarios A2, A1B, and RCP3PD. Model evaluation revealed that the approach taken is adequate only when the calibration records are sufficiently long and contain sufficient variability. These conditions were satisfied for three aquifers, all fed by riverbank infiltration. The forecasts suggest that, with respect to the reference period 1980 to 2009, groundwater temperature in these aquifers will most likely increase by 1.1 to 3.8 K by the end of the current century, depending on the greenhouse-gas emissions scenario employed.

  12. Advantages of geographically weighted regression for modeling benthic substrate in two Greater Yellowstone Ecosystem streams

    USGS Publications Warehouse

    Sheehan, Kenneth R.; Strager, Michael P.; Welsh, Stuart

    2013-01-01

    Stream habitat assessments are commonplace in fish management, and often involve nonspatial analysis methods for quantifying or predicting habitat, such as ordinary least squares regression (OLS). Spatial relationships, however, often exist among stream habitat variables. For example, water depth, water velocity, and benthic substrate sizes within streams are often spatially correlated and may exhibit spatial nonstationarity or inconsistency in geographic space. Thus, analysis methods should address spatial relationships within habitat datasets. In this study, OLS and a recently developed method, geographically weighted regression (GWR), were used to model benthic substrate from water depth and water velocity data at two stream sites within the Greater Yellowstone Ecosystem. For data collection, each site was represented by a grid of 0.1 m² cells, where actual values of water depth, water velocity, and benthic substrate class were measured for each cell. Accuracies of regressed substrate class data by OLS and GWR methods were calculated by comparing maps, parameter estimates, and the coefficient of determination r². For analysis of data from both sites, Akaike’s Information Criterion corrected for sample size indicated the best approximating model for the data resulted from GWR and not from OLS. Adjusted r² values also supported GWR as a better approach than OLS for prediction of substrate. This study supports GWR (a spatial analysis approach) over nonspatial OLS methods for prediction of habitat for stream habitat assessments.
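
    A hedged sketch of a GWR fit is given below using the third-party mgwr package (assumed installed via pip install mgwr); the coordinates, depth, velocity, and substrate values are simulated stand-ins for the 0.1 m² grid-cell data.

        # GWR with the mgwr package on simulated grid-cell data.
        import numpy as np
        from mgwr.gwr import GWR
        from mgwr.sel_bw import Sel_BW

        rng = np.random.default_rng(8)
        coords = rng.uniform(0, 100, size=(200, 2))   # cell centroids
        depth = rng.uniform(0.1, 1.5, size=(200, 1))
        velocity = rng.uniform(0.0, 2.0, size=(200, 1))
        substrate = (2 + 0.02 * coords[:, :1] * depth + velocity
                     + rng.normal(0, 0.3, (200, 1)))  # spatially varying effect

        X = np.hstack([depth, velocity])
        bw = Sel_BW(coords, substrate, X).search()    # bandwidth by AICc
        results = GWR(coords, substrate, X, bw).fit()
        print("local R^2 range:", results.localR2.min(), results.localR2.max())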

  13. Modeling Pan Evaporation for Kuwait by Multiple Linear Regression

    PubMed Central

    Almedeij, Jaber

    2012-01-01

    Evaporation is an important parameter for many projects related to hydrology and water resources systems. This paper constitutes the first study conducted in Kuwait to obtain empirical relations for the estimation of daily and monthly pan evaporation as functions of available meteorological data of temperature, relative humidity, and wind speed. The data used here for the modeling are daily measurements with substantial continuity of coverage, within a period of 17 years between January 1993 and December 2009, which can be considered representative of the desert climate of the urban zone of the country. A multiple linear regression technique is used with a variable selection procedure for fitting the best model forms. The correlations of evaporation with temperature and relative humidity are also transformed in order to linearize the existing curvilinear patterns in the data, using power and exponential functions, respectively. The evaporation models suggested with the best variable combinations were shown to produce results in reasonable agreement with observed values. PMID:23226984

  14. Shape adjustment of cable mesh reflector antennas considering modeling uncertainties

    NASA Astrophysics Data System (ADS)

    Du, Jingli; Bao, Hong; Cui, Chuanzhen

    2014-04-01

    Cable mesh antennas are currently the most important means of constructing large space antennas. The reflector surface of a cable mesh antenna has to be carefully adjusted to achieve the required accuracy, which is an effective way to compensate for manufacturing and assembly errors or other imperfections. In this paper, shape adjustment of cable mesh antennas is addressed. The required displacement of the reflector surface is determined with respect to a modified paraboloid whose axial vertex offset is also considered as a variable. The adjustment problem is then solved by minimizing the RMS error with respect to the desired paraboloid using the minimal-norm least-squares method. To deal with modeling uncertainties, the adjustment is achieved by solving a simple worst-case optimization problem instead of directly using the least-squares method. A numerical example demonstrates that the worst-case method converges well, is accurate, and is robust to perturbations.

  15. Storm Water Management Model Climate Adjustment Tool (SWMM-CAT)

    EPA Science Inventory

    The US EPA’s newest tool, the Stormwater Management Model (SWMM) – Climate Adjustment Tool (CAT) is meant to help municipal stormwater utilities better address potential climate change impacts affecting their operations. SWMM, first released in 1971, models hydrology and hydrauli...

  16. Estimation of count data using mixed Poisson, generalized Poisson and finite Poisson mixture regression models

    NASA Astrophysics Data System (ADS)

    Zamani, Hossein; Faroughi, Pouya; Ismail, Noriszura

    2014-06-01

    This study relates the Poisson, mixed Poisson (MP), generalized Poisson (GP) and finite Poisson mixture (FPM) regression models through their mean-variance relationships, and suggests the application of these models to overdispersed count data. As an illustration, the regression models are fitted to US skin care count data. The results indicate that the FPM regression model is the best model, since it provides the largest log likelihood and the smallest AIC, followed by the Poisson-inverse Gaussian (PIG), GP and negative binomial (NB) regression models. The results also show that the NB, PIG and GP regression models provide similar results.
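
    As a small illustration of ranking count models by AIC, the sketch below fits Poisson and negative binomial GLMs with statsmodels to simulated overdispersed counts; NB stands in for the GP and FPM variants, which statsmodels does not provide out of the box.

        # Comparing count regressions by AIC on simulated overdispersed data.
        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(9)
        X = sm.add_constant(rng.normal(size=(500, 2)))
        mu = np.exp(X @ np.array([0.5, 0.8, -0.4]))
        y = rng.negative_binomial(n=2, p=2 / (2 + mu))  # overdispersed counts

        for name, family in [("Poisson", sm.families.Poisson()),
                             ("NegBin", sm.families.NegativeBinomial(alpha=0.5))]:
            fit = sm.GLM(y, X, family=family).fit()
            print(name, "AIC:", round(fit.aic, 1))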

  17. Risk prediction for myocardial infarction via generalized functional regression models.

    PubMed

    Ieva, Francesca; Paganoni, Anna M

    2016-08-01

    In this paper, we propose a generalized functional linear regression model for a binary outcome indicating the presence/absence of a cardiac disease with multivariate functional data among the relevant predictors. In particular, the motivating aim is the analysis of electrocardiographic traces of patients whose pre-hospital electrocardiogram (ECG) has been sent to the 118 Dispatch Center of Milan (the Italian toll-free number for emergencies) by life support personnel of the basic rescue units. The statistical analysis starts with a preprocessing of ECGs treated as multivariate functional data. The signals are reconstructed from noisy observations. The biological variability is then removed by a nonlinear registration procedure based on landmarks. Then, in order to perform a data-driven dimensional reduction, a multivariate functional principal component analysis is carried out on the variance-covariance matrix of the reconstructed and registered ECGs and their first derivatives. We use the scores of the principal component decomposition as covariates in a generalized linear model to predict the presence of the disease in a new patient. Hence, a new semi-automatic diagnostic procedure is proposed to estimate the risk of infarction (in the case of interest, the probability of being affected by Left Bundle Branch Block). The performance of this classification method is evaluated and compared with other methods proposed in the literature. Finally, the robustness of the procedure is checked via leave-j-out techniques.

  18. Heterogeneous Breast Phantom Development for Microwave Imaging Using Regression Models

    PubMed Central

    Hahn, Camerin; Noghanian, Sima

    2012-01-01

    As new algorithms for microwave imaging emerge, it is important to have standard, accurate benchmarking tests. Currently, most researchers use homogeneous phantoms for testing new algorithms. These simple structures lack the heterogeneity of the dielectric properties of human tissue and are inadequate for testing these algorithms for medical imaging. To adequately test breast microwave imaging algorithms, the phantom has to resemble different breast tissues physically and in terms of dielectric properties. We propose a systematic approach to designing phantoms that not only have dielectric properties close to those of breast tissues but can also be easily shaped into realistic physical models. The approach is based on a regression model that matches the phantom's dielectric properties to the breast tissue dielectric properties reported by Lazebnik et al. (2007). However, the methodology proposed here can be used to create phantoms for any tissue type as long as ex vivo, in vitro, or in vivo tissue dielectric properties are measured and available. Therefore, using this method, accurate benchmarking phantoms for testing emerging microwave imaging algorithms can be developed. PMID:22550473

  19. Regressive logistic models for familial diseases: a formulation assuming an underlying liability model.

    PubMed Central

    Demenais, F M

    1991-01-01

    Statistical models have been developed to delineate the major-gene and non-major-gene factors accounting for the familial aggregation of complex diseases. The mixed model assumes an underlying liability to the disease, to which a major gene, a multifactorial component, and random environment contribute independently. Affection is defined by a threshold on the liability scale. The regressive logistic models assume that the logarithm of the odds of being affected is a linear function of major genotype, phenotypes of antecedents and other covariates. An equivalence between these two approaches cannot be derived analytically. I propose a formulation of the regressive logistic models on the supposition of an underlying liability model of disease. Relatives are assumed to have correlated liabilities to the disease; affected persons have liabilities exceeding an estimable threshold. Under the assumption that the correlation structure of the relatives' liabilities follows a regressive model, the regression coefficients on antecedents are expressed in terms of the relevant familial correlations. A parsimonious parameterization is a consequence of the assumed liability model, and a one-to-one correspondence with the parameters of the mixed model can be established. The logits, derived under the class A regressive model and under the class D regressive model, can be extended to include a large variety of patterns of family dependence, as well as gene-environment interactions. PMID:1897524

  20. Tutorial on Using Regression Models with Count Outcomes Using R

    ERIC Educational Resources Information Center

    Beaujean, A. Alexander; Morgan, Grant B.

    2016-01-01

    Education researchers often study count variables, such as times a student reached a goal, discipline referrals, and absences. Most researchers who study these variables use typical regression methods (i.e., ordinary least-squares) either with or without transforming the count variables. In either case, using typical regression for count data can…

  1. Profile local linear estimation of generalized semiparametric regression model for longitudinal data

    PubMed Central

    Sun, Liuquan; Zhou, Jie

    2013-01-01

    This paper studies the generalized semiparametric regression model for longitudinal data where the covariate effects are constant for some covariates and time-varying for others. Different link functions can be used to allow more flexible modelling of longitudinal data. The nonparametric components of the model are estimated using a local linear estimating equation and the parametric components are estimated through a profile estimating function. The method automatically adjusts for heterogeneity of sampling times, allowing the sampling strategy to depend on the past sampling history as well as possibly time-dependent covariates, without specifically modelling such dependence. A K-fold cross-validation bandwidth selection is proposed as a working tool for locating an appropriate bandwidth. A criterion for selecting the link function is proposed to provide a better fit to the data. Large sample properties of the proposed estimators are investigated. Large sample pointwise and simultaneous confidence intervals for the regression coefficients are constructed. Formal hypothesis testing procedures are proposed to check for the covariate effects and whether the effects are time-varying. A simulation study is conducted to examine the finite sample performance of the proposed estimation and hypothesis testing procedures. The methods are illustrated with a data example. PMID:23471814

  2. Robustness of Land-Use Regression Models Developed from Mobile Air Pollutant Measurements.

    PubMed

    Hatzopoulou, Marianne; Valois, Marie-France; Levy, Ilan; Mihele, Cristian; Lu, Gang; Bagg, Scott; Minet, Laura; Brook, Jeffrey Robert

    2017-02-27

    Land-Use Regression (LUR) models are useful for resolving fine-scale spatial variations in average air pollutant concentrations across urban areas. With the rise of mobile air pollution campaigns, characterized by short-term monitoring and large spatial extents, it is important to investigate the effects of sampling protocols on the resulting LUR. In this study a mobile lab was used to repeatedly visit a large number of locations (~1800), defined by road segments, to derive average concentrations across the city of Montreal, Canada. We hypothesize that the robustness of the LUR from these data depends upon how many independent, random times each location is visited (Nvis) and the number of locations (Nloc) used in model development, and that these parameters can be optimized. By performing multiple LURs on random sets of locations, we assessed the robustness of the LUR through the consistency of the adjusted R² (quantified by its coefficient of variation, CV) and of the regression coefficients among different models. As Nloc increased, the adjusted R² became less variable; for Nloc = 100 vs. Nloc = 300, the CV in the adjusted R² decreased from 0.088 to 0.029 for ultrafine particles and from 0.115 to 0.076 for NO2. The CV in the adjusted R² also decreased as Nvis increased from 6 to 16, from 0.090 to 0.014 for ultrafine particles. As Nloc and Nvis increased, the variability in the coefficient sizes across the different model realizations was also seen to decrease.

  3. Measures of Residual Risk with Connections to Regression, Risk Tracking, Surrogate Models, and Ambiguity

    DTIC Science & Technology

    2015-01-07

    Measures of Residual Risk with Connections to Regression, Risk Tracking, Surrogate Models, and Ambiguity. R. Tyrrell Rockafellar and Johannes O. Royset. … insights and a new class of distributionally robust optimization models. Keywords: risk measures, residual risk, generalized regression, surrogate …

  4. Analyzing Student Learning Outcomes: Usefulness of Logistic and Cox Regression Models. IR Applications, Volume 5

    ERIC Educational Resources Information Center

    Chen, Chau-Kuang

    2005-01-01

    Logistic and Cox regression methods are practical tools used to model the relationships between certain student learning outcomes and their relevant explanatory variables. The logistic regression model fits an S-shaped curve into a binary outcome with data points of zero and one. The Cox regression model allows investigators to study the duration…

  5. Predicting recycling behaviour: Comparison of a linear regression model and a fuzzy logic model.

    PubMed

    Vesely, Stepan; Klöckner, Christian A; Dohnal, Mirko

    2016-03-01

    In this paper we demonstrate that fuzzy logic can provide a better tool for predicting recycling behaviour than the customarily used linear regression. To show this, we take a set of empirical data on recycling behaviour (N=664), which we randomly divide into two halves. The first half is used to estimate a linear regression model of recycling behaviour, and to develop a fuzzy logic model of recycling behaviour. As the first comparison, the fit of both models to the data included in estimation of the models (N=332) is evaluated. As the second comparison, predictive accuracy of both models for "new" cases (hold-out data not included in building the models, N=332) is assessed. In both cases, the fuzzy logic model significantly outperforms the regression model in terms of fit. To conclude, when accurate predictions of recycling and possibly other environmental behaviours are needed, fuzzy logic modelling seems to be a promising technique.

  6. Waste generated in high-rise buildings construction: a quantification model based on statistical multiple regression.

    PubMed

    Parisi Kern, Andrea; Ferreira Dias, Michele; Piva Kulakowski, Marlova; Paulo Gomes, Luciana

    2015-05-01

    Reducing construction waste is becoming a key environmental issue in the construction industry. The quantification of waste generation rates in the construction sector is an invaluable management tool in supporting mitigation actions. However, the quantification of waste can be a difficult process because of the specific characteristics and the wide range of materials used in different construction projects. Large variations are observed in the methods used to predict the amount of waste generated because of the range of variables involved in construction processes and the different contexts in which these methods are employed. This paper proposes a statistical model to determine the amount of waste generated in the construction of high-rise buildings by assessing the influence of the design process and production system, often mentioned as the major culprits behind the generation of waste in construction. Multiple regression was used to conduct a case study based on multiple sources of data from eighteen residential buildings. The resulting statistical model related the dependent variable (amount of waste generated) to independent variables associated with the design and the production system used. The best regression model obtained from the sample data resulted in an adjusted R² value of 0.694, which means that it explains approximately 69% of the variance in waste generation in similar constructions. Most independent variables showed a low coefficient of determination when assessed in isolation, which emphasizes the importance of assessing their joint influence on the response (dependent) variable.

  7. MODELING SNAKE MICROHABITAT FROM RADIOTELEMETRY STUDIES USING POLYTOMOUS LOGISTIC REGRESSION

    EPA Science Inventory

    Multivariate analysis of snake microhabitat has historically used techniques that were derived under assumptions of normality and common covariance structure (e.g., discriminant function analysis, MANOVA). In this study, polytomous logistic regression (PLR), which does not require ...

  8. An Evaluation of Title I Model C1: The Special Regression Model.

    ERIC Educational Resources Information Center

    Mandeville, Garrett K.

    The RMC Research Corporation evaluation model C1--the special regression model (SRM)--was evaluated through a series of computer simulations and compared with an alternative model, the norm referenced model (NRM). Using local data and national norm data to determine reasonable values for sample size and pretest posttest correlation parameters, the…

  9. Modelling the rate of change in a longitudinal study with missing data, adjusting for contact attempts.

    PubMed

    Akacha, Mouna; Hutton, Jane L

    2011-05-10

    The Collaborative Ankle Support Trial (CAST) is a longitudinal trial of treatments for severe ankle sprains in which interest lies in the rate of improvement, the effectiveness of reminders and potentially informative missingness. A model is proposed for continuous longitudinal data with non-ignorable or informative missingness, taking into account the nature of attempts made to contact initial non-responders. The model combines a non-linear mixed model for the outcome with logistic regression models for the reminder processes. A sensitivity analysis is used to contrast this model with the traditional selection model, where we adjust for missingness by modelling the missingness process. The conclusions that recovery is slower and less satisfactory with age, and more rapid with a below-knee cast than with a tubular bandage, do not alter materially across all models investigated. The results also suggest that phone calls are most effective in retrieving questionnaires.

  10. An Explanation of the Effectiveness of Latent Semantic Indexing by Means of a Bayesian Regression Model.

    ERIC Educational Resources Information Center

    Story, Roger E.

    1996-01-01

    Discussion of the use of Latent Semantic Indexing to determine relevancy in information retrieval focuses on statistical regression and Bayesian methods. Topics include keyword searching; a multiple regression model; how the regression model can aid search methods; and limitations of this approach, including complexity, linearity, and…

  11. A Computationally Efficient State Space Approach to Estimating Multilevel Regression Models and Multilevel Confirmatory Factor Models.

    PubMed

    Gu, Fei; Preacher, Kristopher J; Wu, Wei; Yung, Yiu-Fai

    2014-01-01

    Although the state space approach for estimating multilevel regression models has been well established for decades in the time series literature, it does not receive much attention from educational and psychological researchers. In this article, we (a) introduce the state space approach for estimating multilevel regression models and (b) extend the state space approach for estimating multilevel factor models. A brief outline of the state space formulation is provided and then state space forms for univariate and multivariate multilevel regression models, and a multilevel confirmatory factor model, are illustrated. The utility of the state space approach is demonstrated with either a simulated or real example for each multilevel model. It is concluded that the results from the state space approach are essentially identical to those from specialized multilevel regression modeling and structural equation modeling software. More importantly, the state space approach offers researchers a computationally more efficient alternative to fit multilevel regression models with a large number of Level 1 units within each Level 2 unit or a large number of observations on each subject in a longitudinal study.

  12. Principal Component Regression and Linear Mixed Model in Association Analysis of Structured Samples: Competitors or Complements?

    PubMed Central

    Zhang, Yiwei; Pan, Wei

    2014-01-01

    Genome-wide association studies (GWAS) have been established as a major tool to identify genetic variants associated with complex traits, such as common diseases. However, GWAS may suffer from false positives and false negatives due to confounding population structures, including known or unknown relatedness. Another important issue is unmeasured environmental risk factors. Among many methods for adjusting for population structures, two approaches stand out: one is principal component regression (PCR) based on principal component analysis (PCA), which is perhaps most popular due to its early appearance, simplicity and general effectiveness; the other is based on a linear mixed model (LMM) that has emerged recently as perhaps the most flexible and effective, especially for samples with complex structures as in model organisms. As shown previously, the PCR approach can be regarded as an approximation to a LMM; such an approximation depends on the number of the top principal components (PCs) used, the choice of which is often difficult in practice. Hence, in the presence of population structure, the LMM appears to outperform the PCR method. However, due to the different treatments of fixed versus random effects in the two approaches, we show an advantage of PCR over LMM: in the presence of an unknown but spatially confined environmental confounder (e.g. environmental pollution or life style), the PCs may be able to implicitly and effectively adjust for the confounder while the LMM cannot. Accordingly, to adjust for both population structures and non-genetic confounders, we propose a hybrid method combining the use and thus strengths of PCR and LMM. We use real genotype data and simulated phenotypes to confirm the above points, and establish the superior performance of the hybrid method across all scenarios. PMID:25536929
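
    The PCR adjustment described above reduces to an ordinary regression with top PCs as covariates. The following sketch, on simulated genotypes, is a minimal illustration of that step; a full LMM comparison would require specialized software (e.g. GCTA or limix) and is not shown.

        # Principal component regression adjustment for population structure.
        import numpy as np
        import statsmodels.api as sm
        from sklearn.decomposition import PCA

        rng = np.random.default_rng(10)
        G = rng.binomial(2, 0.3, size=(400, 500)).astype(float)  # 400 x 500 SNPs
        pcs = PCA(n_components=5).fit_transform(G)   # top PCs capture structure
        trait = 0.4 * G[:, 0] + pcs[:, 0] + rng.normal(size=400)  # confounded

        X = sm.add_constant(np.column_stack([G[:, 0], pcs]))
        fit = sm.OLS(trait, X).fit()
        print("variant effect, PC-adjusted:", round(fit.params[1], 3))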

  13. A Bayesian Nonparametric Causal Model for Regression Discontinuity Designs

    ERIC Educational Resources Information Center

    Karabatsos, George; Walker, Stephen G.

    2013-01-01

    The regression discontinuity (RD) design (Thistlethwaite & Campbell, 1960; Cook, 2008) provides a framework to identify and estimate causal effects from a non-randomized design. Each subject of an RD design is assigned to the treatment (versus assignment to a non-treatment) whenever her/his observed value of the assignment variable equals or…

  14. Exact Analysis of Squared Cross-Validity Coefficient in Predictive Regression Models

    ERIC Educational Resources Information Center

    Shieh, Gwowen

    2009-01-01

    In regression analysis, the notion of population validity is of theoretical interest for describing the usefulness of the underlying regression model, whereas the presumably more important concept of population cross-validity represents the predictive effectiveness for the regression equation in future research. It appears that the inference…

  15. Growth curve by Gompertz nonlinear regression model in females and males of tambaqui (Colossoma macropomum).

    PubMed

    De Mello, Fernanda; Oliveira, Carlos A L; Ribeiro, Ricardo P; Resende, Emiko K; Povh, Jayme A; Fornari, Darci C; Barreto, Rogério V; McManus, Concepta; Streit, Danilo

    2015-01-01

    The growth patterns of female and male tambaqui were evaluated using the Gompertz nonlinear regression model. Five traits of economic importance were measured on 145 animals over three years, totaling 981 morphometric data points analyzed. Different curves were fitted for males and females for body weight, height, and head length, and a single curve was fitted for width and body length. The asymptotic weight (a) and relative growth rate to maturity (k) differed between sexes in animals of about 5 kg, the slaughter weight practiced by a specific and very profitable niche market. However, there was no difference between males and females up to about 2 kg, the slaughter weight established to supply the larger consumer market. Females reached greater weight than males (about 280 g more) and are therefore more suitable for farming aimed at the niche market for larger animals. In general, males had a lower maximum growth rate (8.66 g/day) than females (9.34 g/day) but reached it earlier, at 476 days versus 486 days. Body height and length were the traits that contributed most to weight at 516 days (P < 0.001).
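    A minimal sketch of fitting a Gompertz curve of the form w(t) = a·exp(-b·exp(-kt)) with SciPy, using made-up age/weight pairs rather than the study's data; under this parameterization the maximum growth rate a·k/e occurs at age ln(b)/k.

      import numpy as np
      from scipy.optimize import curve_fit

      def gompertz(t, a, b, k):
          # a = asymptotic weight, k = relative growth rate to maturity
          return a * np.exp(-b * np.exp(-k * t))

      # Made-up age (days) and body weight (g) observations
      t = np.array([100, 200, 300, 400, 500, 600, 700, 800], dtype=float)
      w = np.array([150, 480, 1050, 1750, 2500, 3150, 3650, 4000], dtype=float)

      (a, b, k), _ = curve_fit(gompertz, t, w, p0=(5000, 5, 0.005))
      t_infl = np.log(b) / k     # age at the inflection point
      g_max = a * k / np.e       # maximum growth rate (g/day)
      print(a, k, t_infl, g_max)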

  16. A regressive model analysis of congenital sensorineural deafness in German Dalmatian dogs.

    PubMed

    Juraschko, Kathrin; Meyer-Lindenberg, Andrea; Nolte, Ingo; Distl, Ottmar

    2003-08-01

    The objective of the present study was to analyze the mode of inheritance for congenital sensorineural deafness (CSD) in German Dalmatian dogs by consideration of association between phenotypic breed characteristics and CSD. Segregation analysis with regressive logistic models was employed to test for different mechanisms of genetic transmission. Data were obtained from all three Dalmatian kennel clubs associated with the German Association for Dog Breeding and Husbandry (VDH). CSD was tested by veterinary practitioners using standardized protocols for Brainstem Auditory-Evoked Response (BAER). The sample included 1899 Dalmatian dogs from 354 litters in 169 different kennels. BAER testing results were from the years 1986 to 1999. Pedigree information was available for up to seven generations. The segregation analysis showed that a mixed monogenic-polygenic model including eye color as covariate among all other tested models best explained the segregation of affected animals in the pedigrees. The recessive major gene segregated in dogs with blue and brown eye color as well as in dogs with and without pigmented coat patches. Models which took into account the occurrence of patches, percentage of puppies tested per litter, or inbreeding coefficient gave no better adjustment to the most general (saturated) model. A procedure for the simultaneous prediction of breeding values and the estimation of genotype probabilities for CSD is expected to improve breeding programs significantly.

  17. Testing approaches for overdispersion in poisson regression versus the generalized poisson model.

    PubMed

    Yang, Zhao; Hardin, James W; Addy, Cheryl L; Vuong, Quang H

    2007-08-01

    Overdispersion is a common phenomenon in Poisson modeling, and the negative binomial (NB) model is frequently used to account for overdispersion. Testing approaches (Wald test, likelihood ratio test (LRT), and score test) for overdispersion in the Poisson regression versus the NB model are available. Because the generalized Poisson (GP) model is similar to the NB model, we consider the former as an alternate model for overdispersed count data. The score test has an advantage over the LRT and the Wald test in that the score test only requires that the parameter of interest be estimated under the null hypothesis. This paper proposes a score test for overdispersion based on the GP model and compares its power with the LRT and Wald tests. A simulation study indicates that the score test based on the asymptotic standard normal distribution is more appropriate in practical applications because of its higher empirical power; however, it underestimates the nominal significance level, especially in small samples. Examples illustrate the results of comparing the candidate tests between the Poisson and GP models. A bootstrap test is also proposed to adjust for the underestimation of the nominal level in the score statistic when the sample size is small. The simulation study indicates that the bootstrap test has a significance level closer to the nominal size and uniformly greater power than the asymptotic score test. From a practical perspective, we suggest that if the score test gives even a weak indication that the Poisson model is inappropriate, say at the 0.10 significance level, the more accurate bootstrap procedure should be used to test whether the GP model is more appropriate than the Poisson model. Finally, the Vuong test is illustrated as a way to choose between the GP and NB2 models for the same dataset.
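    As a quick practical companion (distinct from the GP-based score test proposed in the paper), the Cameron-Trivedi auxiliary regression is a standard check for Poisson overdispersion; a sketch on simulated overdispersed counts, with all names hypothetical:

      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(1)
      n = 200
      x = rng.normal(size=n)
      mu = np.exp(0.3 + 0.5 * x)
      y = rng.negative_binomial(5, 5 / (5 + mu))   # overdispersed counts, mean mu

      X = sm.add_constant(x)
      pois = sm.GLM(y, X, family=sm.families.Poisson()).fit()
      mu_hat = pois.fittedvalues

      # Auxiliary regression of ((y - mu)^2 - y)/mu on mu, without intercept;
      # a significantly positive slope indicates overdispersion
      z = ((y - mu_hat) ** 2 - y) / mu_hat
      aux = sm.OLS(z, mu_hat).fit()
      print(aux.tvalues[0])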

  18. Investigating the Performance of Alternate Regression Weights by Studying All Possible Criteria in Regression Models with a Fixed Set of Predictors

    ERIC Educational Resources Information Center

    Waller, Niels; Jones, Jeff

    2011-01-01

    We describe methods for assessing all possible criteria (i.e., dependent variables) and subsets of criteria for regression models with a fixed set of predictors, x (where x is an n x 1 vector of independent variables). Our methods build upon the geometry of regression coefficients (hereafter called regression weights) in n-dimensional space. For a…

  19. Modeling Polytomous Item Responses Using Simultaneously Estimated Multinomial Logistic Regression Models

    ERIC Educational Resources Information Center

    Anderson, Carolyn J.; Verkuilen, Jay; Peyton, Buddy L.

    2010-01-01

    Survey items with multiple response categories and multiple-choice test questions are ubiquitous in psychological and educational research. We illustrate the use of log-multiplicative association (LMA) models that are extensions of the well-known multinomial logistic regression model for multiple dependent outcome variables to reanalyze a set of…

  20. Watershed Regressions for Pesticides (WARP) models for predicting stream concentrations of multiple pesticides

    USGS Publications Warehouse

    Stone, Wesley W.; Crawford, Charles G.; Gilliom, Robert J.

    2013-01-01

    Watershed Regressions for Pesticides for multiple pesticides (WARP-MP) are statistical models developed to predict concentration statistics for a wide range of pesticides in unmonitored streams. The WARP-MP models use the national atrazine WARP models in conjunction with an adjustment factor for each additional pesticide. The WARP-MP models perform best for pesticides with application timing and methods similar to those used with atrazine. For other pesticides, WARP-MP models tend to overpredict concentration statistics for the model development sites. For WARP and WARP-MP, the less-than-ideal sampling frequency for the model development sites leads to underestimation of the shorter-duration concentration; hence, the WARP models tend to underpredict 4- and 21-d maximum moving-average concentrations, with median errors ranging from 9 to 38%. As a result of this sampling bias, pesticides that performed well with the model development sites are expected to have predictions that are biased low for these shorter-duration concentration statistics. The overprediction by WARP-MP apparent for some of the pesticides is variably offset by underestimation of the model development concentration statistics. Of the 112 pesticides used in the WARP-MP application to stream segments nationwide, 25 were predicted to have concentration statistics with a 50% or greater probability of exceeding one or more aquatic life benchmarks in one or more stream segments. Geographically, many of the modeled streams in the Corn Belt Region were predicted to have one or more pesticides that exceeded an aquatic life benchmark during 2009, indicating the potential vulnerability of streams in this region.

  1. Watershed Regressions for Pesticides (WARP) Models for Predicting Stream Concentrations of Multiple Pesticides.

    PubMed

    Stone, Wesley W; Crawford, Charles G; Gilliom, Robert J

    2013-11-01

    Watershed Regressions for Pesticides for multiple pesticides (WARP-MP) are statistical models developed to predict concentration statistics for a wide range of pesticides in unmonitored streams. The WARP-MP models use the national atrazine WARP models in conjunction with an adjustment factor for each additional pesticide. The WARP-MP models perform best for pesticides with application timing and methods similar to those used with atrazine. For other pesticides, WARP-MP models tend to overpredict concentration statistics for the model development sites. For WARP and WARP-MP, the less-than-ideal sampling frequency for the model development sites leads to underestimation of the shorter-duration concentration; hence, the WARP models tend to underpredict 4- and 21-d maximum moving-average concentrations, with median errors ranging from 9 to 38%. As a result of this sampling bias, pesticides that performed well with the model development sites are expected to have predictions that are biased low for these shorter-duration concentration statistics. The overprediction by WARP-MP apparent for some of the pesticides is variably offset by underestimation of the model development concentration statistics. Of the 112 pesticides used in the WARP-MP application to stream segments nationwide, 25 were predicted to have concentration statistics with a 50% or greater probability of exceeding one or more aquatic life benchmarks in one or more stream segments. Geographically, many of the modeled streams in the Corn Belt Region were predicted to have one or more pesticides that exceeded an aquatic life benchmark during 2009, indicating the potential vulnerability of streams in this region.

  2. Spatial Double Generalized Beta Regression Models: Extensions and Application to Study Quality of Education in Colombia

    ERIC Educational Resources Information Center

    Cepeda-Cuervo, Edilberto; Núñez-Antón, Vicente

    2013-01-01

    In this article, a proposed Bayesian extension of the generalized beta spatial regression models is applied to the analysis of the quality of education in Colombia. We briefly revise the beta distribution and describe the joint modeling approach for the mean and dispersion parameters in the spatial regression models' setting. Finally, we motivate…

  3. Finite Mixture Dynamic Regression Modeling of Panel Data with Implications for Dynamic Response Analysis

    ERIC Educational Resources Information Center

    Kaplan, David

    2005-01-01

    This article considers the problem of estimating dynamic linear regression models when the data are generated from finite mixture probability density function where the mixture components are characterized by different dynamic regression model parameters. Specifically, conventional linear models assume that the data are generated by a single…

  4. Embedding IRT in Structural Equation Models: A Comparison with Regression Based on IRT Scores

    ERIC Educational Resources Information Center

    Lu, Irene R. R.; Thomas, D. Roland; Zumbo, Bruno D.

    2005-01-01

    This article reviews the problems associated with using item response theory (IRT)-based latent variable scores for analytical modeling, discusses the connection between IRT and structural equation modeling (SEM)-based latent regression modeling for discrete data, and compares regression parameter estimates obtained using predicted IRT scores and…

  5. Assessing Fit of Latent Regression Models. Research Report. ETS RR-09-50

    ERIC Educational Resources Information Center

    Sinharay, Sandip; Guo, Zhumei; von Davier, Matthias; Veldkamp, Bernard P.

    2009-01-01

    The reporting methods used in large-scale educational assessments such as the National Assessment of Educational Progress (NAEP) rely on a "latent regression model". There is a lack of research on the assessment of fit of latent regression models. This paper suggests a simulation-based model-fit technique to assess the fit of such…

  6. Stochastic Approximation Methods for Latent Regression Item Response Models. Research Report. ETS RR-09-09

    ERIC Educational Resources Information Center

    von Davier, Matthias; Sinharay, Sandip

    2009-01-01

    This paper presents an application of a stochastic approximation EM-algorithm using a Metropolis-Hastings sampler to estimate the parameters of an item response latent regression model. Latent regression models are extensions of item response theory (IRT) to a 2-level latent variable model in which covariates serve as predictors of the…

  7. Regression equations for estimation of annual peak-streamflow frequency for undeveloped watersheds in Texas using an L-moment-based, PRESS-minimized, residual-adjusted approach

    USGS Publications Warehouse

    Asquith, William H.; Roussel, Meghan C.

    2009-01-01

    Annual peak-streamflow frequency estimates are needed for flood-plain management; for objective assessment of flood risk; for cost-effective design of dams, levees, and other flood-control structures; and for design of roads, bridges, and culverts. Annual peak-streamflow frequency represents the peak streamflow for nine recurrence intervals of 2, 5, 10, 25, 50, 100, 200, 250, and 500 years. Common methods for estimation of peak-streamflow frequency for ungaged or unmonitored watersheds are regression equations for each recurrence interval developed for one or more regions; such regional equations are the subject of this report. The method is based on analysis of annual peak-streamflow data from U.S. Geological Survey streamflow-gaging stations (stations). Beginning in 2007, the U.S. Geological Survey, in cooperation with the Texas Department of Transportation and in partnership with Texas Tech University, began a 3-year investigation concerning the development of regional equations to estimate annual peak-streamflow frequency for undeveloped watersheds in Texas. The investigation focuses primarily on 638 stations with 8 or more years of data from undeveloped watersheds and meeting other criteria. The general approach is explicitly limited to the use of L-moment statistics, which are used in conjunction with a technique of multi-linear regression referred to as PRESS minimization. The approach used to develop the regional equations, which was refined during the investigation, is referred to as the 'L-moment-based, PRESS-minimized, residual-adjusted approach'. For the approach, seven unique distributions are fit to the sample L-moments of the data for each of 638 stations, and trimmed means of the seven results of the distributions for each recurrence interval are used to define the station-specific peak-streamflow frequency. As a first iteration of regression, nine weighted-least-squares, PRESS-minimized, multi-linear regression equations are computed using the watershed
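    The PRESS statistic minimized in this approach is the sum of squared leave-one-out prediction errors, which for linear regression can be computed without refitting by using hat-matrix leverages; a minimal sketch on synthetic data (not the report's code):

      import numpy as np

      def press(X, y):
          # Sum of squared leave-one-out prediction errors, obtained from
          # ordinary residuals and hat-matrix leverages without refitting
          h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
          beta = np.linalg.lstsq(X, y, rcond=None)[0]
          e = y - X @ beta
          return np.sum((e / (1.0 - h)) ** 2)

      rng = np.random.default_rng(2)
      X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
      y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(scale=0.5, size=50)
      print(press(X, y))   # smaller PRESS = better predictive performance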

  8. Hybrid hotspot detection using regression model and lithography simulation

    NASA Astrophysics Data System (ADS)

    Kimura, Taiki; Matsunawa, Tetsuaki; Nojima, Shigeki; Pan, David Z.

    2016-03-01

    As minimum feature sizes shrink, unexpected hotspots appear on wafers. Therefore, it is important to detect and fix these hotspots at the design stage to reduce development time and manufacturing cost. Currently, as the most accurate approach, lithography simulation is widely used to detect such hotspots; however, it is known to be time-consuming. This paper proposes a novel aerial image synthesizing method that uses regression and minimal lithography simulation, intended solely for hotspot detection. Experimental results show that hotspot detection with the proposed method is equivalent to the conventional method, which relies on lithography simulation alone, at much lower computational cost.

  9. On the usage of linear regression models to reconstruct limb kinematics from low frequency EEG signals.

    PubMed

    Antelis, Javier M; Montesano, Luis; Ramos-Murguialday, Ander; Birbaumer, Niels; Minguez, Javier

    2013-01-01

    Several works have reported on the reconstruction of 2D/3D limb kinematics from low-frequency EEG signals using linear regression models based on positive correlation values between the recorded and the reconstructed trajectories. This paper describes the mathematical properties of the linear model and the correlation evaluation metric that may lead to a misinterpretation of the results of this type of decoder. Firstly, the use of a linear regression model to adjust the two temporal signals (EEG and velocity profiles) implies that the relevant component of the signal used for decoding (EEG) has to be in the same frequency range as the signal to be decoded (velocity profiles). Secondly, the use of a correlation to evaluate the fitting of two trajectories could lead to overly optimistic results as this metric is invariant to scale. Also, the correlation has a non-linear nature that leads to higher values for sine/cosine-like signals at low frequencies. Analysis of these properties on the reconstruction results was carried out through an experiment performed in line with previous studies, where healthy participants executed predefined reaching movements of the hand in 3D space. While the correlations of limb velocity profiles reconstructed from low-frequency EEG were comparable to studies in this domain, a systematic statistical analysis revealed that these results were not above the chance level. The empirical chance level was estimated using random assignments of recorded velocity profiles and EEG signals, as well as combinations of randomly generated synthetic EEG with recorded velocity profiles and recorded EEG with randomly generated synthetic velocity profiles. The analysis shows that the positive correlation results in this experiment cannot be used as an indicator of successful trajectory reconstruction based on a neural correlate. Several directions are herein discussed to address the misinterpretation of results as well as the implications on previous
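    The chance-level idea described above can be sketched as a trial-shuffling test: correlations are recomputed after randomly re-pairing reconstructed and recorded trajectories, and the observed mean correlation is compared with that null distribution. A minimal version (hypothetical names, and only one of the several shuffling schemes the paper describes):

      import numpy as np

      def chance_level(pred_trials, vel_trials, n_perm=1000, seed=0):
          # Observed mean correlation across trials versus a null built by
          # randomly re-pairing reconstructed and recorded trajectories
          rng = np.random.default_rng(seed)
          obs = np.mean([np.corrcoef(p, v)[0, 1]
                         for p, v in zip(pred_trials, vel_trials)])
          n = len(vel_trials)
          null = np.empty(n_perm)
          for i in range(n_perm):
              idx = rng.permutation(n)
              null[i] = np.mean([np.corrcoef(pred_trials[j], vel_trials[idx[j]])[0, 1]
                                 for j in range(n)])
          p_val = (np.sum(null >= obs) + 1) / (n_perm + 1)   # one-sided
          return obs, p_val

      rng = np.random.default_rng(7)
      vel = [np.sin(np.linspace(0, 2 * np.pi, 100)) + 0.1 * rng.normal(size=100)
             for _ in range(20)]
      pred = [v + 0.5 * rng.normal(size=100) for v in vel]
      print(chance_level(pred, vel))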

  10. Predictors of psychiatric disorders in liver transplantation candidates: logistic regression models.

    PubMed

    Rocca, Paola; Cocuzza, Elena; Rasetti, Roberta; Rocca, Giuseppe; Zanalda, Enrico; Bogetto, Filippo

    2003-07-01

    This study has two goals. The first goal is to assess the prevalence of psychiatric disorders in orthotopic liver transplantation (OLT) candidates by means of standardized procedures because there has been little research concerning psychiatric problems of potential OLT candidates using standardized instruments. The second goal focuses on identifying predictors of these psychiatric disorders. One hundred sixty-five elective OLT candidates were assessed by our unit. Psychiatric diagnoses were based on the Structured Clinical Interview for the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition. Patients also were assessed using the Hamilton Depression Rating Scale (HDRS) and the Spielberger Anxiety Index, State and Trait forms (STAI-X1 and STAI-X2). Severity of cirrhosis was assessed by applying Child-Pugh score criteria. Chi-squared and general linear model analysis of variance were used to test the univariate association between patient characteristics and both clinical psychiatric diagnoses and severity of psychiatric diseases. Variables with P less than .10 in univariate analyses were included in multiple regression models. Forty-three percent of patients presented at least one psychiatric diagnosis. Child-Pugh score and previous psychiatric diagnoses were independent significant predictors of depressive disorders. Severity of psychiatric symptoms measured by psychometric scales (HDRS, STAI-X1, and STAI-X2) was associated with Child-Pugh score in the multiple regression model. Our data suggest a high rate of psychiatric disorders, particularly adjustment disorders, in our sample of OLT candidates. Severity of liver disease emerges as the most important variable in predicting severity of psychiatric disorders in these patients.

  11. Modelling air pollution for epidemiologic research--part II: predicting temporal variation through land use regression.

    PubMed

    Mölter, A; Lindley, S; de Vocht, F; Simpson, A; Agius, R

    2010-12-01

    Over recent years land use regression (LUR) has become a frequently used method in air pollution exposure studies, as it can model intra-urban variation in pollutant concentrations at a fine spatial scale. However, very few studies have used the LUR methodology to also model the temporal variation in air pollution exposure. The aim of this study is to estimate annual mean NO(2) and PM(10) concentrations from 1996 to 2008 for Greater Manchester using land use regression models. The results from these models will be used in the Manchester Asthma and Allergy Study (MAAS) birth cohort to determine health effects of air pollution exposure. The Greater Manchester LUR model for 2005 was recalibrated using interpolated and adjusted NO(2) and PM(10) concentrations as dependent variables for 1996-2008. In addition, temporally resolved variables were available for traffic intensity and PM(10) emissions. To validate the resulting LUR models, they were applied to the locations of automatic monitoring stations and the estimated concentrations were compared against measured concentrations. The 2005 LUR models were successfully recalibrated, providing individual models for each year from 1996 to 2008. When applied to the monitoring stations the mean prediction error (MPE) for NO(2) concentrations for all stations and years was -0.8μg/m³ and the root mean squared error (RMSE) was 6.7μg/m³. For PM(10) concentrations the MPE was 0.8μg/m³ and the RMSE was 3.4μg/m³. These results indicate that it is possible to model temporal variation in air pollution through LUR with relatively small prediction errors. It is likely that most previous LUR studies did not include temporal variation, because they were based on short term monitoring campaigns and did not have historic pollution data. The advantage of this study is that it uses data from an air dispersion model, which provided concentrations for 2005 and 2010, and therefore allowed extrapolation over a longer time period.

  12. [Clinical research XX. From clinical judgment to multiple logistic regression model].

    PubMed

    Berea-Baltierra, Ricardo; Rivas-Ruiz, Rodolfo; Pérez-Rodríguez, Marcela; Palacios-Cruz, Lino; Moreno, Jorge; Talavera, Juan O

    2014-01-01

    The complexity of the causality phenomenon in clinical practice implies that the result of a maneuver is not solely caused by the maneuver, but by the interaction among the maneuver and other baseline factors or variables occurring during the maneuver. This requires methodological designs that allow the evaluation of these variables. When the outcome is a binary variable, we use the multiple logistic regression model (MLRM). This multivariate model is useful when we want to predict or explain, adjusting for the effect of several risk factors, the effect of a maneuver or exposure on the outcome. In order to perform an MLRM, the outcome or dependent variable must be binary, with mutually exclusive categories (e.g., alive/dead, healthy/ill); independent variables or risk factors may be either qualitative or quantitative. The effect measure obtained from this model is the odds ratio (OR) with 95% confidence intervals (CI), from which we can estimate the proportion of the outcome's variability explained by the risk factors. For these reasons, the MLRM is used in clinical research, since one of the main objectives in clinical practice is the ability to predict or explain an event when different risk or prognostic factors are taken into account.
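    A minimal sketch of an MLRM as described above, fitted with statsmodels on simulated data: the exponentiated coefficients give the adjusted OR with its 95% CI (all variable names and values are hypothetical):

      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(3)
      n = 300
      exposure = rng.binomial(1, 0.4, n)        # maneuver/exposure of interest
      age = rng.normal(50, 10, n)               # baseline covariate to adjust for
      logit_p = -3 + 1.0 * exposure + 0.04 * age
      y = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))   # binary outcome

      X = sm.add_constant(np.column_stack([exposure, age]))
      fit = sm.Logit(y, X).fit(disp=0)
      odds_ratios = np.exp(fit.params)          # exponentiated coefficients
      ci_95 = np.exp(fit.conf_int())            # 95% CI on the OR scale
      print(odds_ratios[1], ci_95[1])           # adjusted OR for the exposure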

  13. Regression-Based Norms for a Bi-factor Model for Scoring the Brief Test of Adult Cognition by Telephone (BTACT)

    PubMed Central

    Gurnani, Ashita S.; John, Samantha E.; Gavett, Brandon E.

    2015-01-01

    The current study developed regression-based normative adjustments for a bi-factor model of the Brief Test of Adult Cognition by Telephone (BTACT). Archival data from the Midlife Development in the United States-II Cognitive Project were used to develop eight separate linear regression models that predicted bi-factor BTACT scores, accounting for age, education, gender, and occupation, alone and in various combinations. All regression models provided statistically significant fit to the data. A three-predictor regression model fit best and accounted for 32.8% of the variance in the global bi-factor BTACT score. The fit of the regression models was not improved by gender. Eight different regression models are presented to allow the user flexibility in applying demographic corrections to the bi-factor BTACT scores. Occupation corrections, while not widely used, may provide useful demographic adjustments for adult populations or for those individuals who have attained an occupational status not commensurate with expected educational attainment. PMID:25724515
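    Regression-based norms of this kind are applied by comparing an observed score with its demographic expectation and scaling by the residual SD; a minimal sketch with entirely made-up coefficients (not the published BTACT equations):

      def demographically_adjusted_z(raw, age, edu, coef, sd_resid):
          # Regression-based norm: z-score of the observed score relative to
          # the demographic expectation from the normative regression
          expected = coef[0] + coef[1] * age + coef[2] * edu
          return (raw - expected) / sd_resid

      # Hypothetical normative model: scores decline with age, rise with education
      z = demographically_adjusted_z(raw=0.2, age=64, edu=16,
                                     coef=(0.5, -0.015, 0.04), sd_resid=0.8)
      print(z)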

  14. Improving the global applicability of the RUSLE model - adjustment of the topographical and rainfall erosivity factors

    NASA Astrophysics Data System (ADS)

    Naipal, V.; Reick, C.; Pongratz, J.; Van Oost, K.

    2015-09-01

    Large uncertainties exist in estimated rates and the extent of soil erosion by surface runoff on a global scale. This limits our understanding of the global impact that soil erosion might have on agriculture and climate. The Revised Universal Soil Loss Equation (RUSLE) model is, due to its simple structure and empirical basis, a frequently used tool in estimating average annual soil erosion rates at regional to global scales. However, large spatial-scale applications often rely on coarse data input, which is not compatible with the local scale on which the model is parameterized. Our study aims at providing the first steps in improving the global applicability of the RUSLE model in order to derive more accurate global soil erosion rates. We adjusted the topographical and rainfall erosivity factors of the RUSLE model and compared the resulting erosion rates to extensive empirical databases from the USA and Europe. By scaling the slope according to the fractal method to adjust the topographical factor, we managed to improve the topographical detail in a coarse resolution global digital elevation model. Applying the linear multiple regression method to adjust rainfall erosivity for various climate zones resulted in values that compared well to high resolution erosivity data for different regions. However, this method needs to be extended to tropical climates, for which erosivity is biased due to the lack of high resolution erosivity data. After applying the adjusted and the unadjusted versions of the RUSLE model on a global scale we find that the adjusted version shows a global higher mean erosion rate and more variability in the erosion rates. Comparison to empirical data sets of the USA and Europe shows that the adjusted RUSLE model is able to decrease the very high erosion rates in hilly regions that are observed in the unadjusted RUSLE model results. Although there are still some regional differences with the empirical databases, the results indicate that the

  15. Applying land use regression model to estimate spatial variation of PM₂.₅ in Beijing, China.

    PubMed

    Wu, Jiansheng; Li, Jiacheng; Peng, Jian; Li, Weifeng; Xu, Guang; Dong, Chengcheng

    2015-05-01

    Fine particulate matter (PM2.5) is the major air pollutant in Beijing, posing serious threats to human health. Land use regression (LUR) has been widely used in predicting spatiotemporal variation of ambient air-pollutant concentrations, though restricted to the European and North American context. We aimed to estimate spatiotemporal variations of PM2.5 by building separate LUR models in Beijing. Hourly routine PM2.5 measurements were collected at 35 sites from 4th March 2013 to 5th March 2014. Seventy-seven predictor variables were generated in GIS, including street network, land cover, population density, catering services distribution, bus stop density, intersection density, and others. Eight LUR models were developed on annual, seasonal, peak/non-peak, and incremental concentration subsets. The annual mean concentration across all sites is 90.7 μg/m³ (SD = 13.7). PM2.5 shows more temporal variation than spatial variation, indicating the necessity of building different models to capture spatiotemporal trends. The adjusted R2 of these models ranges between 0.43 and 0.65. Most LUR models are driven by significant predictors including major road length, vegetation, and water land use. Annual outdoor exposure in Beijing is as high as 96.5 μg/m³. This is among the first LUR studies implemented in a seriously air-polluted Chinese context, generally producing acceptable results and reliable spatial air-pollution maps. Apart from the models for winter and incremental concentration, LUR models are driven by similar variables, suggesting that the spatial variations of PM2.5 remain steady for most of the time. Temporal variations are explained by the intercepts, and spatial variations in the measurements determine the strength of variable coefficients in our models.
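    For reference, the adjusted R2 reported above penalizes the ordinary R2 for the number of retained predictors:

      def adjusted_r2(r2, n, p):
          # Adjusted R2 for n observations and p predictors
          return 1 - (1 - r2) * (n - 1) / (n - p - 1)

      # e.g., a hypothetical LUR model with R2 = 0.70 fitted at 35 sites
      # with 5 retained predictors
      print(adjusted_r2(0.70, 35, 5))   # about 0.65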

  16. Parental Vaccine Acceptance: A Logistic Regression Model Using Previsit Decisions.

    PubMed

    Lee, Sara; Riley-Behringer, Maureen; Rose, Jeanmarie C; Meropol, Sharon B; Lazebnik, Rina

    2016-10-26

    This study explores how parents' intentions regarding vaccination prior to their children's visit were associated with actual vaccine acceptance. A convenience sample of parents accompanying 6-week-old to 17-year-old children completed a written survey at 2 pediatric practices. Using hierarchical logistic regression, for hospital-based participants (n = 216), vaccine refusal history (P < .01) and vaccine decision made before the visit (P < .05) explained 87% of vaccine refusals. In community-based participants (n = 100), vaccine refusal history (P < .01) explained 81% of refusals. Over 1 in 5 parents changed their minds about vaccination during the visit. Thirty parents who were previous vaccine refusers accepted current vaccines, and 37 who had intended not to vaccinate chose vaccination. Twenty-nine parents without a refusal history declined vaccines, and 32 who did not intend to refuse before the visit declined vaccination. Future research should identify key factors to nudge parent decision making in favor of vaccination.

  17. Beta Regression Finite Mixture Models of Polarization and Priming

    ERIC Educational Resources Information Center

    Smithson, Michael; Merkle, Edgar C.; Verkuilen, Jay

    2011-01-01

    This paper describes the application of finite-mixture general linear models based on the beta distribution to modeling response styles, polarization, anchoring, and priming effects in probability judgments. These models, in turn, enhance our capacity for explicitly testing models and theories regarding the aforementioned phenomena. The mixture…

  18. Development and Evaluation of Land-Use Regression Models Using Modeled Air Quality Concentrations

    EPA Science Inventory

    Abstract Land-use regression (LUR) models have emerged as a preferred methodology for estimating individual exposure to ambient air pollution in epidemiologic studies in absence of subject-specific measurements. Although there is a growing literature focused on LUR evaluation, fu...

  19. An epidemiological survey on road traffic crashes in Iran: application of the two logistic regression models.

    PubMed

    Bakhtiyari, Mahmood; Mehmandar, Mohammad Reza; Mirbagheri, Babak; Hariri, Gholam Reza; Delpisheh, Ali; Soori, Hamid

    2014-01-01

    Risk factors of human-related traffic crashes are among the most important and preventable challenges for community health, due to their noteworthy burden in developing countries in particular. The present study aims to investigate the role of human risk factors of road traffic crashes in Iran. Through a cross-sectional study using the COM 114 data collection forms, the police records of almost 600,000 crashes that occurred in 2010 are investigated. The binary logistic regression and proportional odds regression models are used. The odds ratio for each risk factor is calculated. These models are adjusted for known confounding factors including age, sex and driving time. The traffic crash reports of 537,688 men (90.8%) and 54,480 women (9.2%) are analysed. The mean age is 34.1 ± 14 years. Not maintaining eyes on the road (53.7%) and losing control of the vehicle (21.4%) are the main causes of drivers' deaths in traffic crashes within cities. Not maintaining eyes on the road is also the most frequent human risk factor for road traffic crashes outside cities. Sudden lane excursion (OR = 9.9, 95% CI: 8.2-11.9), seat belt non-compliance (OR = 8.7, CI: 6.7-10.1), exceeding authorised speed (OR = 17.9, CI: 12.7-25.1) and exceeding safe speed (OR = 9.7, CI: 7.2-13.2) are the most significant human risk factors for traffic crashes in Iran. The high mortality rate of 39 people for every 100,000 population emphasises the importance of traffic crashes in Iran. Considering the important role of human risk factors in traffic crashes, sustained efforts are required to control dangerous driving behaviours such as exceeding speed, illegal overtaking and not maintaining eyes on the road.

  20. Regression mixture models: Does modeling the covariance between independent variables and latent classes improve the results?

    PubMed Central

    Lamont, Andrea E.; Vermunt, Jeroen K.; Van Horn, M. Lee

    2016-01-01

    Regression mixture models are increasingly used as an exploratory approach to identify heterogeneity in the effects of a predictor on an outcome. In this simulation study, we test the effects of violating an implicit assumption often made in these models; that is, independent variables in the model are not directly related to latent classes. Results indicated that the major risk of failing to model the relationship between predictor and latent class was an increase in the probability of selecting additional latent classes and biased class proportions. Additionally, this study tests whether regression mixture models can detect a piecewise relationship between a predictor and outcome. Results suggest that these models are able to detect piecewise relations, but only when the relationship between the latent class and the predictor is included in model estimation. We illustrate the implications of making this assumption through a re-analysis of applied data examining heterogeneity in the effects of family resources on academic achievement. We compare previous results (which assumed no relation between independent variables and latent class) to the model where this assumption is lifted. Implications and analytic suggestions for conducting regression mixture models based on these findings are noted. PMID:26881956

  1. Regression Mixture Models: Does Modeling the Covariance Between Independent Variables and Latent Classes Improve the Results?

    PubMed

    Lamont, Andrea E; Vermunt, Jeroen K; Van Horn, M Lee

    2016-01-01

    Regression mixture models are increasingly used as an exploratory approach to identify heterogeneity in the effects of a predictor on an outcome. In this simulation study, we tested the effects of violating an implicit assumption often made in these models; that is, independent variables in the model are not directly related to latent classes. Results indicate that the major risk of failing to model the relationship between predictor and latent class was an increase in the probability of selecting additional latent classes and biased class proportions. In addition, we tested whether regression mixture models can detect a piecewise relationship between a predictor and outcome. Results suggest that these models are able to detect piecewise relations but only when the relationship between the latent class and the predictor is included in model estimation. We illustrate the implications of making this assumption through a reanalysis of applied data examining heterogeneity in the effects of family resources on academic achievement. We compare previous results (which assumed no relation between independent variables and latent class) to the model where this assumption is lifted. Implications and analytic suggestions for conducting regression mixture models based on these findings are noted.

  2. A Multiple Linear Regression Model For Estimation of Flood Peaks In Baden-Wuerttemberg/Germany

    NASA Astrophysics Data System (ADS)

    Casper, M.; Krieger, S.; Ihringer, J.

    In water resources planning, good estimates of flood peaks are necessary for construction planning, for the assessment of the existing risk potential, and for the validation of rainfall-runoff models. Generally these indexes are only available through statistical analysis for gauged sites. Furthermore, the reliability of the underlying time series can often not be proven because they are too short or of bad quality. Therefore a spatial adjustment of all gauge indexes was conducted before a linear multiple regression model was applied. It now enables us to estimate flood peaks for almost any ungauged site of the study area. The model is based on 8 parameters describing the catchment properties. Seven parameters can be derived directly from digital data including a digital elevation model (catchment size, maximum flow length, center flow length, weighted slope, annual rainfall, portion of urban resp. forested area). The last parameter is an empirical landscape factor, which allows consideration of the regional differences in flood generation. The spatial distribution of this factor has been linked in a first approach to the hydro-geological map of Baden-Wuerttemberg. The overall performance of the model is very good, but for some areas the determination of the landscape factor is difficult. Further investigations indicated that a more process-based approach allows improvement of the fit of this landscape factor and also of the quality of the regionalisation model. By integrating detailed soil information (which is available area-wide), some hydro-geological classes could be subdivided into subclasses. By replacing the parameter "weighted slope" with a parameter that better describes the driving forces of flood generation, the model performance could be improved significantly.

  3. Modelling long-term fire occurrence factors in Spain by accounting for local variations with geographically weighted regression

    NASA Astrophysics Data System (ADS)

    Martínez-Fernández, J.; Chuvieco, E.; Koutsias, N.

    2013-02-01

    Humans are responsible for most forest fires in Europe, but anthropogenic factors behind these events are still poorly understood. We tried to identify the driving factors of human-caused fire occurrence in Spain by applying two different statistical approaches. Firstly, assuming stationary processes for the whole country, we created models based on multiple linear regression and binary logistic regression to find factors associated with fire density and fire presence, respectively. Secondly, we used geographically weighted regression (GWR) to better understand and explore the local and regional variations of those factors behind human-caused fire occurrence. The number of human-caused fires occurring within a 25-yr period (1983-2007) was computed for each of the 7638 Spanish mainland municipalities, creating a binary variable (fire/no fire) to develop logistic models, and a continuous variable (fire density) to build standard linear regression models. A total of 383 657 fires were registered in the study dataset. The binary logistic model, which estimates the probability of having/not having a fire, successfully classified 76.4% of the total observations, while the ordinary least squares (OLS) regression model explained 53% of the variation of the fire density patterns (adjusted R2 = 0.53). Both approaches confirmed, in addition to forest and climatic variables, the importance of variables related with agrarian activities, land abandonment, rural population exodus and developmental processes as underlying factors of fire occurrence. For the GWR approach, the explanatory power of the GW linear model for fire density using an adaptive bandwidth increased from 53% to 67%, while for the GW logistic model the correctly classified observations improved only slightly, from 76.4% to 78.4%, but significantly according to the corrected Akaike Information Criterion (AICc), from 3451.19 to 3321.19. The results from GWR indicated a significant spatial variation in the local

  4. Direct modeling of regression effects for transition probabilities in the progressive illness-death model.

    PubMed

    Azarang, Leyla; Scheike, Thomas; de Uña-Álvarez, Jacobo

    2017-02-26

    In this work, we present direct regression analysis for the transition probabilities in the possibly non-Markov progressive illness-death model. The method is based on binomial regression, where the response is the indicator of the occupancy for the given state along time. Randomly weighted score equations that are able to remove the bias due to censoring are introduced. By solving these equations, one can estimate the possibly time-varying regression coefficients, which have an immediate interpretation as covariate effects on the transition probabilities. The performance of the proposed estimator is investigated through simulations. We apply the method to data from the Registry of Systematic Lupus Erythematosus RELESSER, a multicenter registry created by the Spanish Society of Rheumatology. Specifically, we investigate the effect of age at Lupus diagnosis, sex, and ethnicity on the probability of damage and death along time. Copyright © 2017 John Wiley & Sons, Ltd.

  5. Artificial neural network and regression models for flow velocity at sediment incipient deposition

    NASA Astrophysics Data System (ADS)

    Safari, Mir-Jafar-Sadegh; Aksoy, Hafzullah; Mohammadi, Mirali

    2016-10-01

    A set of experiments for the determination of flow characteristics at sediment incipient deposition has been carried out in a trapezoidal cross-section channel. Using experimental data, a regression model is developed for computing velocity of flow in a trapezoidal cross-section channel at the incipient deposition condition and is presented together with already available regression models of rectangular, circular, and U-shape channels. A generalized regression model is also provided by combining the available data of any cross-section. For comparison of the models, a powerful tool, the artificial neural network (ANN), is used for modelling incipient deposition of sediment in rigid boundary channels. Three different ANN techniques, namely, the feed-forward back propagation (FFBP), generalized regression (GR), and radial basis function (RBF), are applied using six input variables: flow discharge, flow depth, channel bed slope, hydraulic radius, relative specific mass of sediment, and median size of sediment particles, all taken from laboratory experiments. Hydrodynamic forces acting on sediment particles in the flow are considered in the regression models indirectly for deriving particle Froude number and relative particle size, both being dimensionless. The accuracy of the models is studied by the root mean square error (RMSE), the mean absolute percentage error (MAPE), the discrepancy ratio (Dr) and the concordance coefficient (CC). Evaluation of the models finds the ANN models superior, with some regression models also showing acceptable performance. Therefore, it is concluded that appropriately constructed ANN and regression models can be developed and used for rigid boundary channel design.

  6. Two levels ARIMAX and regression models for forecasting time series data with calendar variation effects

    NASA Astrophysics Data System (ADS)

    Suhartono; Lee, Muhammad Hisyam; Prastyo, Dedy Dwi

    2015-12-01

    The aim of this research is to develop a calendar variation model for forecasting retail sales data with the Eid ul-Fitr effect. The proposed model is based on two methods, namely two-level ARIMAX and regression methods, built by using ARIMAX for the first level and regression for the second level. Monthly men's jeans and women's trousers sales in a retail company for the period January 2002 to September 2009 are used as a case study. In general, the two-level calendar variation model yields two component models: the first reconstructs the sales pattern that has already occurred, and the second forecasts the increase in sales due to Eid ul-Fitr that affects sales in the same and the previous months. The results show that the proposed two-level calendar variation model based on ARIMAX and regression methods yields better forecasts than the seasonal ARIMA model and neural networks.

  7. Daily diaries and minority adolescents: random coefficient regression modeling of attributional style, coping, and affect.

    PubMed

    Roesch, Scott C; Vaughn, Allison A; Aldridge, Arianna A; Villodas, Feion

    2009-10-01

    Many researchers underscore the importance of coping in the daily lives of adolescents, yet very few studies measure this and related constructs at this level. Using a daily diary approach to stress and coping, the current study evaluated a series of mediational coping models in a sample of low-income minority adolescents (N = 89). Specifically, coping was hypothesized to mediate the relationship between attributional style (and dimensions) and daily affect. Using random coefficient regression modeling, the relationship between (a) the locus of causality dimension and positive affect was completely mediated by the use of acceptance and humor as coping strategies; (b) the stability dimension and positive affect was completely mediated by the use of both problem-solving and positive thinking; and (c) the stability dimension and negative affect was partially mediated by the use of religious coping. In addition, the locus of causality and stability (but not globality) dimensions were also directly related to affect. However, the relationship between pessimistic explanatory style and affect was not mediated by coping. Consistent with previous research, these findings suggest that attributions are both directly and indirectly related to indices of affect or adjustment. Thus, attributions may not only influence the type of coping strategy employed, but may also serve as coping strategies themselves.

  8. Effect of air pollution on lung cancer: A poisson regression model based on vital statistics

    SciTech Connect

    Tango, Toshiro

    1994-11-01

    This article describes a Poisson regression model for time trends of mortality to detect the long-term effects of common levels of air pollution on lung cancer, in which the adjustment for cigarette smoking is not always necessary. The main hypothesis to be tested in the model is that if the long-term and common-level air pollution had an effect on lung cancer, the death rate from lung cancer could be expected to increase gradually at a higher rate in the region with relatively high levels of air pollution than in the region with low levels, and that this trend would not be expected for other control diseases in which cigarette smoking is a risk factor. Using this approach, we analyzed the trend of mortality in females aged 40 to 79, from lung cancer and two control diseases, ischemic heart disease and cerebrovascular disease, based on vital statistics in 23 wards of the Tokyo metropolitan area for 1972 to 1988. Ward-specific mean levels per day of SO₂ and NO₂ from 1974 through 1976 estimated by Makino (1978) were used as the ward-specific exposure measure of air pollution. No data on tobacco consumption in each ward is available. Our analysis supported the existence of long-term effects of air pollution on lung cancer. 14 refs., 5 figs., 2 tabs.
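    The trend comparison described above corresponds to a Poisson rate model with a log-population offset and a region-by-time interaction; a minimal sketch on simulated data (not the article's data or code):

      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(4)
      years = np.tile(np.arange(17), 2)        # 1972-1988 coded 0..16
      high = np.repeat([0, 1], 17)             # low- vs high-pollution region
      pop = rng.uniform(4e5, 6e5, 34)          # person-years at risk

      # Rates diverge over time in the high-pollution region (interaction)
      log_rate = -9.0 + 0.01 * years + 0.02 * high * years
      deaths = rng.poisson(np.exp(log_rate) * pop)

      X = sm.add_constant(np.column_stack([years, high, high * years]))
      fit = sm.GLM(deaths, X, family=sm.families.Poisson(),
                   offset=np.log(pop)).fit()
      print(fit.params[3], fit.pvalues[3])     # region-by-time trend effect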

  9. A Spectral Graph Regression Model for Learning Brain Connectivity of Alzheimer’s Disease

    PubMed Central

    Hu, Chenhui; Cheng, Lin; Sepulcre, Jorge; Johnson, Keith A.; Fakhri, Georges E.; Lu, Yue M.; Li, Quanzheng

    2015-01-01

    Understanding network features of brain pathology is essential to reveal underpinnings of neurodegenerative diseases. In this paper, we introduce a novel graph regression model (GRM) for learning structural brain connectivity of Alzheimer's disease (AD) measured by amyloid-β deposits. The proposed GRM regards 11C-labeled Pittsburgh Compound-B (PiB) positron emission tomography (PET) imaging data as smooth signals defined on an unknown graph. This graph is then estimated through an optimization framework, which fits the graph to the data with an adjustable level of uniformity of the connection weights. Under the assumed data model, results based on simulated data illustrate that our approach can accurately reconstruct the underlying network, often with better reconstruction than those obtained by both sample correlation and ℓ1-regularized partial correlation estimation. Evaluations performed upon PiB-PET imaging data of 30 AD and 40 elderly normal control (NC) subjects demonstrate that the connectivity patterns revealed by the GRM are easy to interpret and consistent with known pathology. Moreover, the hubs of the reconstructed networks match the cortical hubs given by functional MRI. The discriminative network features including both global connectivity measurements and degree statistics of specific nodes discovered from the AD and NC amyloid-beta networks provide new potential biomarkers for preclinical and clinical AD. PMID:26024224

  10. Predicting Antitumor Activity of Peptides by Consensus of Regression Models Trained on a Small Data Sample

    PubMed Central

    Radman, Andreja; Gredičak, Matija; Kopriva, Ivica; Jerić, Ivanka

    2011-01-01

    Predicting the antitumor activity of compounds using regression models trained on a small number of compounds with measured biological activity is an ill-posed inverse problem. Yet, it occurs very often within the academic community. To counteract, to some extent, the overfitting caused by a small training set, we propose to use a consensus of six regression models for prediction of the biological activity of a virtual library of compounds. The QSAR descriptors of 22 compounds related to the opioid growth factor (OGF, Tyr-Gly-Gly-Phe-Met) with known antitumor activity were used to train the regression models: the feed-forward artificial neural network, the k-nearest neighbor, sparseness constrained linear regression, and the linear and nonlinear (with polynomial and Gaussian kernel) support vector machine. The regression models were applied to a virtual library of 429 compounds, which resulted in six lists of candidate compounds ranked by predicted antitumor activity. The highly ranked candidate compounds were synthesized, characterized and tested for antiproliferative activity. Some of the prepared peptides showed more pronounced activity than the native OGF; however, they were less active than the highly ranked compounds selected previously by the radial basis function support vector machine (RBF SVM) regression model. The ill-posedness of the related inverse problem causes unstable behavior of the trained regression models on test data. These results point to the high complexity of prediction based on regression models trained on a small data sample. PMID:22272081

  11. Model Averaging Methods for Weight Trimming in Generalized Linear Regression Models.

    PubMed

    Elliott, Michael R

    2009-03-01

    In sample surveys where units have unequal probabilities of inclusion, associations between the inclusion probability and the statistic of interest can induce bias in unweighted estimates. This is true even in regression models, where the estimates of the population slope may be biased if the underlying mean model is misspecified or the sampling is nonignorable. Weights equal to the inverse of the probability of inclusion are often used to counteract this bias. Highly disproportional sample designs have highly variable weights; weight trimming reduces large weights to a maximum value, reducing variability but introducing bias. Most standard approaches are ad hoc in that they do not use the data to optimize bias-variance trade-offs. This article uses Bayesian model averaging to create "data driven" weight trimming estimators. We extend previous results for linear regression models (Elliott 2008) to generalized linear regression models, developing robust models that approximate fully-weighted estimators when bias correction is of greatest importance, and approximate unweighted estimators when variance reduction is critical.
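    A bare-bones illustration of what weight trimming does (the ad hoc version the article improves upon, not its Bayesian model-averaging estimator): large weights are capped and the estimate recomputed, trading bias for variance. All values below are synthetic.

      import numpy as np

      def trimmed_weighted_mean(y, w, cap):
          # Cap large weights, rescale so the total weight is preserved,
          # and recompute the weighted mean
          wt = np.minimum(w, cap)
          wt = wt * (w.sum() / wt.sum())
          return np.sum(wt * y) / np.sum(wt)

      rng = np.random.default_rng(5)
      w = rng.pareto(1.5, 1000) + 1            # highly variable weights
      y = rng.normal(10, 2, 1000) + 0.1 * w    # outcome related to the weights
      for cap in (np.inf, 20, 5):
          # Variance falls but bias grows as the cap shrinks
          print(cap, trimmed_weighted_mean(y, w, cap))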

  12. Tests for Regression Parameters in Power Transformation Models.

    DTIC Science & Technology

    1980-01-01

    of estimating the correct scale and then performing the usual linear model F-test in this estimated scale. We explore situations in which this...transformation model. In this model, a simple test consists of estimating the correct scale and then performing the usual linear model F-test in this estimated scale...the least squares estimates computed in the estimated scale and the least squares estimates calculated in the true but

  13. Measures of Residual Risk with Connections to Regression, Risk Tracking, Surrogate Models, and Ambiguity

    DTIC Science & Technology

    2015-04-15

    tracking might fail if the conditional statistic is not captured by the family of regression functions under consideration and even other times too as...shown in [21]. The next result deals with a standard model in regression analysis, under which statistic tracking is achieved for regular error...previous subsection we established conditions under which generalized regression using a specific measure of error tracks the corresponding statistic

  14. Incremental logistic regression for customizing automatic diagnostic models.

    PubMed

    Tortajada, Salvador; Robles, Montserrat; García-Gómez, Juan Miguel

    2015-01-01

    In the last decades, and following the new trends in medicine, statistical learning techniques have been used for developing automatic diagnostic models for aiding clinical experts through the use of Clinical Decision Support Systems. The development of these models requires a large, representative amount of data, which is commonly obtained from one hospital or a group of hospitals after an expensive and time-consuming gathering, preprocessing, and validation of cases. After development, a model has to pass an external validation that is often carried out in a different hospital or health center; the experience is that models underperform expectations. Furthermore, patient data need ethical approval and patient consent to be sent and stored. For these reasons, we introduce an incremental learning algorithm based on the Bayesian inference approach that may allow us to build an initial model with a smaller number of cases and update it incrementally when new data are collected, or even to recalibrate a model from a different center using a reduced number of cases. The performance of our algorithm is demonstrated on several benchmark datasets and a real brain tumor dataset, and we compare its performance to a previous incremental algorithm and a non-incremental Bayesian model, showing that the algorithm is independent of the data model, iterative, and has good convergence.

  15. A Noncentral "t" Regression Model for Meta-Analysis

    ERIC Educational Resources Information Center

    Camilli, Gregory; de la Torre, Jimmy; Chiu, Chia-Yi

    2010-01-01

    In this article, three multilevel models for meta-analysis are examined. Hedges and Olkin suggested that effect sizes follow a noncentral "t" distribution and proposed several approximate methods. Raudenbush and Bryk further refined this model; however, this procedure is based on a normal approximation. In the current research literature, this…

  16. A Negative Binomial Regression Model for Accuracy Tests

    ERIC Educational Resources Information Center

    Hung, Lai-Fa

    2012-01-01

    Rasch used a Poisson model to analyze errors and speed in reading tests. An important property of the Poisson distribution is that the mean and variance are equal. However, in social science research, it is very common for the variance to be greater than the mean (i.e., the data are overdispersed). This study embeds the Rasch model within an…

  17. Comparative analysis of regression and artificial neural network models for wind speed prediction

    NASA Astrophysics Data System (ADS)

    Bilgili, Mehmet; Sahin, Besir

    2010-11-01

    In this study, wind speed was modeled by linear regression (LR), nonlinear regression (NLR) and artificial neural network (ANN) methods. A three-layer feedforward artificial neural network structure was constructed and a backpropagation algorithm was used for the training of the ANNs. To obtain a successful simulation, the correlation coefficients between all of the meteorological variables (wind speed, ambient temperature, atmospheric pressure, relative humidity and rainfall) were first calculated, taking two variables in turn for each calculation. All independent variables were then added to the simple regression model, and stepwise multiple regression was applied to select the "best" regression equation (model). The best independent variables thus selected were used for the LR and NLR models and also in the input layer of the ANN. The results obtained by all methods were compared, and the ANN method was found to provide better performance than the LR and NLR methods.
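
    The screening-and-selection workflow described above is generic enough to sketch. A hedged Python fragment (synthetic data and variable names are assumptions, not the study's records) showing pairwise correlation screening followed by forward stepwise selection on adjusted R²:

        import numpy as np
        import pandas as pd
        import statsmodels.api as sm

        def forward_stepwise(df, target, candidates):
            """Greedy forward selection maximizing adjusted R-squared."""
            selected, best_adj_r2 = [], -np.inf
            improved = True
            while improved and candidates:
                improved = False
                for var in list(candidates):
                    X = sm.add_constant(df[selected + [var]])
                    fit = sm.OLS(df[target], X).fit()
                    if fit.rsquared_adj > best_adj_r2:
                        best_adj_r2, best_var, improved = fit.rsquared_adj, var, True
                if improved:
                    selected.append(best_var)
                    candidates.remove(best_var)
            return selected, best_adj_r2

        rng = np.random.default_rng(0)
        df = pd.DataFrame(rng.normal(size=(200, 4)),
                          columns=["temp", "pressure", "humidity", "rainfall"])
        df["wind"] = 2.0 * df["temp"] - df["pressure"] + rng.normal(size=200)
        print(df.corr()["wind"])  # pairwise correlation screening
        print(forward_stepwise(df, "wind", ["temp", "pressure", "humidity", "rainfall"]))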

  18. Using multiple linear regression model to estimate thunderstorm activity

    NASA Astrophysics Data System (ADS)

    Suparta, W.; Putro, W. S.

    2017-03-01

    This paper aims to develop a numerical model, using a nonlinear model, to estimate thunderstorm activity. Meteorological data such as pressure (P), temperature (T), relative humidity (H), cloud (C), precipitable water vapor (PWV), and precipitation on a daily basis were used in the proposed method. The model was constructed with six configurations of input and one target output. The output tested in this work is the thunderstorm event, using one year of data. Results showed that the model works well in estimating thunderstorm activities, with the maximum epoch reaching 1000 iterations and a percent error below 50%. The model also found that thunderstorm activities in May and October are higher than in the other months due to the inter-monsoon season.

  19. Weighted hurdle regression method for joint modeling of cardiovascular events likelihood and rate in the US dialysis population.

    PubMed

    Sentürk, Damla; Dalrymple, Lorien S; Mu, Yi; Nguyen, Danh V

    2014-11-10

    We propose a new weighted hurdle regression method for modeling count data, with particular interest in modeling cardiovascular events in patients on dialysis. Cardiovascular disease remains one of the leading causes of hospitalization and death in this population. Our aim is to jointly model the relationship between covariates and (i) the probability of cardiovascular events, a binary process, and (ii) the rate of events once the realization is positive (when the 'hurdle' is crossed), using a zero-truncated Poisson distribution. When the observation period or follow-up time, from the start of dialysis, varies among individuals, the estimated probability of positive cardiovascular events during the study period will be biased. Furthermore, when the model contains covariates, the estimated relationship between the covariates and the probability of cardiovascular events will also be biased. These challenges are addressed with the proposed weighted hurdle regression method. Estimation for the weighted hurdle regression model is a weighted likelihood approach, for which standard maximum likelihood estimation can be utilized. The method is illustrated with data from the United States Renal Data System. Simulation studies show the ability of the proposed method to successfully adjust for differential follow-up times and incorporate the effects of covariates in the weighting.
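
    As a sketch of the positive-count half of such a hurdle model (the binary half is an ordinary weighted logistic regression), the weighted zero-truncated Poisson log-likelihood can be maximized directly. The data and weights below are synthetic assumptions, not the authors' dialysis-cohort estimator.

        import numpy as np
        from scipy.optimize import minimize
        from scipy.special import gammaln

        def neg_wll_ztpoisson(beta, X, y, w):
            """Weighted negative log-likelihood of the zero-truncated Poisson:
            P(Y=y | Y>0) = exp(-mu) mu^y / (y! (1 - exp(-mu))), mu = exp(X beta)."""
            mu = np.exp(X @ beta)
            ll = y * np.log(mu) - mu - gammaln(y + 1) - np.log1p(-np.exp(-mu))
            return -(w * ll).sum()

        rng = np.random.default_rng(1)
        X = np.column_stack([np.ones(500), rng.normal(size=500)])
        mu = np.exp(0.5 + 0.3 * X[:, 1])
        y = rng.poisson(mu)
        while np.any(y == 0):                 # resample zeros -> zero-truncated draws
            y[y == 0] = rng.poisson(mu[y == 0])
        w = rng.uniform(0.5, 1.5, size=500)   # stand-in for follow-up-time weights
        fit = minimize(neg_wll_ztpoisson, x0=np.zeros(2), args=(X, y, w))
        print(fit.x)                          # roughly (0.5, 0.3)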

  20. A regression model for calculating the boiling point isobars of tetrachloromethane-based binary solutions

    NASA Astrophysics Data System (ADS)

    Preobrazhenskii, M. P.; Rudakov, O. B.

    2016-01-01

    A regression model for calculating the boiling point isobars of tetrachloromethane-organic solvent binary homogeneous systems is proposed. The parameters of the model proposed were calculated for a series of solutions. The correlation between the nonadditivity parameter of the regression model and the hydrophobicity criterion of the organic solvent is established. The parameter value of the proposed model is shown to allow prediction of the potential formation of azeotropic mixtures of solvents with tetrachloromethane.

  1. A regressive storm model for extreme space weather

    NASA Astrophysics Data System (ADS)

    Terkildsen, Michael; Steward, Graham; Neudegg, Dave; Marshall, Richard

    2012-07-01

    Extreme space weather events, while rare, pose significant risk to society in the form of impacts on critical infrastructure such as power grids, and the disruption of high end technological systems such as satellites and precision navigation and timing systems. There has been an increased focus on modelling the effects of extreme space weather, as well as improving the ability of space weather forecast centres to identify, with sufficient lead time, solar activity with the potential to produce extreme events. This paper describes the development of a data-based model for predicting the occurrence of extreme space weather events from solar observation. The motivation for this work was to develop a tool to assist space weather forecasters in early identification of solar activity conditions with the potential to produce extreme space weather, and with sufficient lead time to notify relevant customer groups. Data-based modelling techniques were used to construct the model, and an extensive archive of solar observation data used to train, optimise and test the model. The optimisation of the base model aimed to eliminate false negatives (missed events) at the expense of a tolerable increase in false positives, under the assumption of an iterative improvement in forecast accuracy during progression of the solar disturbance, as subsequent data becomes available.

  2. A Connection between Item/Subtest Regression and the Rasch Model. Research Report 89-1.

    ERIC Educational Resources Information Center

    Engelen, Ronald J. H.; Jannarone, Robert J.

    The purpose of this paper is to link empirical Bayes methods with two specific topics in item response theory--item/subtest regression, and testing the goodness of fit of the Rasch model--under the assumptions of local independence and sufficiency. It is shown that item/subtest regression results in empirical Bayes estimates only if the Rasch…

  3. Mechanisms of Developmental Regression in Autism and the Broader Phenotype: A Neural Network Modeling Approach

    ERIC Educational Resources Information Center

    Thomas, Michael S. C.; Knowland, Victoria C. P.; Karmiloff-Smith, Annette

    2011-01-01

    Loss of previously established behaviors in early childhood constitutes a markedly atypical developmental trajectory. It is found almost uniquely in autism and its cause is currently unknown (Baird et al., 2008). We present an artificial neural network model of developmental regression, exploring the hypothesis that regression is caused by…

  4. Analyzing Multilevel Data: Comparing Findings from Hierarchical Linear Modeling and Ordinary Least Squares Regression

    ERIC Educational Resources Information Center

    Rocconi, Louis M.

    2013-01-01

    This study examined the differing conclusions one may come to depending upon the type of analysis chosen, hierarchical linear modeling or ordinary least squares (OLS) regression. To illustrate this point, this study examined the influences of seniors' self-reported critical thinking abilities three ways: (1) an OLS regression with the student…

  5. Linear Multivariable Regression Models for Prediction of Eddy Dissipation Rate from Available Meteorological Data

    NASA Technical Reports Server (NTRS)

    McKissick, Burnell T. (Technical Monitor); Plassman, Gerald E.; Mall, Gerald H.; Quagliano, John R.

    2005-01-01

    Linear multivariable regression models for predicting day and night Eddy Dissipation Rate (EDR) from available meteorological data sources are defined and validated. Model definition is based on a combination of 1997-2000 Dallas/Fort Worth (DFW) data sources, EDR from Aircraft Vortex Spacing System (AVOSS) deployment data, and regression variables primarily from corresponding Automated Surface Observation System (ASOS) data. Model validation is accomplished through EDR predictions on a similar combination of 1994-1995 Memphis (MEM) AVOSS and ASOS data. Model forms include an intercept plus a single term of fixed optimal power for each of the regression variables: 30-minute forward averaged mean and variance of near-surface wind speed and temperature, variance of wind direction, and a discrete cloud cover metric. Distinct day and night models, regressing on EDR and the natural log of EDR respectively, yield the best performance and avoid model discontinuity over day/night data boundaries.

  6. Penalized Likelihood for General Semi-Parametric Regression Models.

    DTIC Science & Technology

    1985-05-01

    should be stressed that q, while it may be somewhat less than n, will still be 'large', and parametric estimation of θ will not be appropriate...Partial spline models for the semi-parametric estimation of functions of several variables, in Statistical Analysis of Time Series, Tokyo: Institute of

  7. Genomic Prediction of Genotype × Environment Interaction Kernel Regression Models.

    PubMed

    Cuevas, Jaime; Crossa, José; Soberanis, Víctor; Pérez-Elizalde, Sergio; Pérez-Rodríguez, Paulino; Campos, Gustavo de Los; Montesinos-López, O A; Burgueño, Juan

    2016-11-01

    In genomic selection (GS), genotype × environment interaction (G × E) can be modeled by a marker × environment interaction (M × E). The G × E may be modeled through a linear kernel or a nonlinear (Gaussian) kernel. In this study, we propose using two nonlinear Gaussian kernels: the reproducing kernel Hilbert space with kernel averaging (RKHS KA) and the Gaussian kernel with the bandwidth estimated through an empirical Bayesian method (RKHS EB). We performed single-environment analyses and extended them to account for G × E interaction (GBLUP-G × E, RKHS KA-G × E and RKHS EB-G × E) in wheat and maize data sets. For single-environment analyses of the wheat and maize data sets, RKHS EB and RKHS KA had higher prediction accuracy than GBLUP for all environments. For the wheat data, the RKHS KA-G × E and RKHS EB-G × E models showed up to 60 to 68% superiority over the corresponding single-environment models for pairs of environments with positive correlations. For the wheat data set, the models with Gaussian kernels had accuracies up to 17% higher than that of GBLUP-G × E. For the maize data set, the prediction accuracy of RKHS EB-G × E and RKHS KA-G × E was, on average, 5 to 6% higher than that of GBLUP-G × E. The superiority of the Gaussian kernel models over the linear kernel is due to more flexible kernels that account for small, more complex marker main effects and marker-specific interaction effects.

  8. Adjusted adaptive Lasso for covariate model-building in nonlinear mixed-effect pharmacokinetic models.

    PubMed

    Haem, Elham; Harling, Kajsa; Ayatollahi, Seyyed Mohammad Taghi; Zare, Najaf; Karlsson, Mats O

    2017-02-01

    One important aim in population pharmacokinetics (PK) and pharmacodynamics is the identification and quantification of the relationships between the parameters and covariates. Lasso has been suggested as a technique for simultaneous estimation and covariate selection. In linear regression, it has been shown that Lasso does not possess the oracle property, that is, it does not asymptotically perform as though the true underlying model had been given in advance. Adaptive Lasso (ALasso) with appropriate initial weights is claimed to possess the oracle property; however, it can lead to poor predictive performance when there is multicollinearity between covariates. This simulation study implemented a new version of ALasso, called adjusted ALasso (AALasso), which takes the ratio of the standard error of the maximum likelihood (ML) estimator to the ML coefficient as the initial weight in ALasso, to deal with multicollinearity in non-linear mixed-effect models. The performance of AALasso was compared with that of ALasso and Lasso. PK data were simulated in four set-ups from a one-compartment bolus input model. Covariates were created by sampling from a multivariate standard normal distribution with no, low (0.2), moderate (0.5) or high (0.7) correlation. The true covariates influenced only clearance, at different magnitudes. AALasso, ALasso and Lasso were compared in terms of mean absolute prediction error and error of the estimated covariate coefficient. The results show that AALasso performed better in small data sets, even in those in which a high correlation existed between covariates. This makes AALasso a promising method for covariate selection in nonlinear mixed-effect models.
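
    For intuition only, a linear-regression analogue of the AALasso weighting scheme can be sketched as below; the paper's actual setting is nonlinear mixed-effect PK models, which this fragment does not reproduce. Adaptive-lasso penalty weights are implemented by rescaling columns.

        import numpy as np
        import statsmodels.api as sm
        from sklearn.linear_model import Lasso

        rng = np.random.default_rng(2)
        X = rng.normal(size=(300, 5))
        y = X @ np.array([1.0, 0.0, 0.5, 0.0, 0.0]) + rng.normal(size=300)

        ml = sm.OLS(y, sm.add_constant(X)).fit()
        w = ml.bse[1:] / np.abs(ml.params[1:])  # AALasso-style weights: SE / |beta_ML|
        lasso = Lasso(alpha=0.05).fit(X / w, y) # scaling column j penalizes it by w_j
        beta = lasso.coef_ / w                  # map back to the original scale
        print(np.round(beta, 3))                # roughly (1, 0, 0.5, 0, 0), shrunken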

  9. Comparing Methodologies for Developing an Early Warning System: Classification and Regression Tree Model versus Logistic Regression. REL 2015-077

    ERIC Educational Resources Information Center

    Koon, Sharon; Petscher, Yaacov

    2015-01-01

    The purpose of this report was to explicate the use of logistic regression and classification and regression tree (CART) analysis in the development of early warning systems. It was motivated by state education leaders' interest in maintaining high classification accuracy while simultaneously improving practitioner understanding of the rules by…

  10. Nationwide regression models for predicting urban runoff water quality at unmonitored sites

    USGS Publications Warehouse

    Tasker, Gary D.; Driver, N.E.

    1988-01-01

    Regression models are presented that can be used to estimate mean loads for chemical oxygen demand, suspended solids, dissolved solids, total nitrogen, total ammonia plus nitrogen, total phosphorus, dissolved phosphorus, total copper, total lead, and total zinc at unmonitored sites in urban areas. Explanatory variables include drainage area, imperviousness of the drainage basin to infiltration, mean annual rainfall, a land-use indicator variable, and mean minimum January temperature. Model parameters are estimated by a generalized-least-squares regression method that accounts for cross correlation and differences in reliability of sample estimates between sites. The regression models account for 20 to 65 percent of the total variation in observed loads.

  11. Regression Models for Demand Reduction based on Cluster Analysis of Load Profiles

    SciTech Connect

    Yamaguchi, Nobuyuki; Han, Junqiao; Ghatikar, Girish; Piette, Mary Ann; Asano, Hiroshi; Kiliccote, Sila

    2009-06-28

    This paper provides new regression models for the demand reduction of Demand Response programs, for the purposes of ex ante evaluation of the programs and screening of customers for enrollment. The proposed regression models employ load sensitivity to outside air temperature and representative load patterns derived from cluster analysis of customer baseline load as explanatory variables. The models' performance was examined in terms of the validity of the explanatory variables and goodness of fit, using actual load profile data from Pacific Gas and Electric Company's commercial and industrial customers who participated in the 2008 Critical Peak Pricing program, including Manual and Automated Demand Response.

  12. Aboveground biomass and carbon stocks modelling using non-linear regression model

    NASA Astrophysics Data System (ADS)

    Ain Mohd Zaki, Nurul; Abd Latif, Zulkiflee; Nazip Suratman, Mohd; Zainee Zainal, Mohd

    2016-06-01

    Aboveground biomass (AGB) is an important source of uncertainty in carbon estimation for tropical forests due to the varied biodiversity of species and the complex structure of the tropical rain forest. Nevertheless, the tropical rainforest holds the most extensive forests in the world, with a vast diversity of trees in layered canopies. Integrating optical sensor data with empirical models is a common way to assess AGB, and regression can link remotely sensed data with biophysical parameters of the forest. Therefore, this paper examines the accuracy of a non-linear regression equation with a quadratic function for estimating AGB and carbon stocks in the tropical lowland Dipterocarp forest of the Ayer Hitam forest reserve, Selangor. The main aim of this investigation is to obtain the relationship between biophysical parameters from field plots and remotely sensed data using a nonlinear regression model. The results showed a good relationship between crown projection area (CPA) and carbon stocks (CS) (Pearson correlation, p < 0.01), with a correlation coefficient (r) of 0.671. The study concluded that integrating Worldview-3 imagery with a LiDAR-based canopy height model (CHM) raster is useful for quantifying AGB and carbon stocks over larger sample areas of lowland Dipterocarp forest.

  13. Regression models tolerant to massively missing data: a case study in solar-radiation nowcasting

    NASA Astrophysics Data System (ADS)

    Žliobaitė, I.; Hollmén, J.; Junninen, H.

    2014-12-01

    Statistical models for environmental monitoring rely strongly on automatic data acquisition systems that use various physical sensors. Often, sensor readings are missing for extended periods of time, while model outputs need to be continuously available in real time. With a case study in solar-radiation nowcasting, we investigate how to deal with massively missing data (around 50% of the time some data are unavailable). Our goal is to analyze the characteristics of the missing data and recommend a strategy for deploying regression models that remains robust when data are massively missing. We are after one model that performs well at all times, with and without data gaps. Because outputs must be produced instantaneously, with minimum energy consumption for computing in the data streaming setting, we dismiss computationally demanding data imputation methods and resort to mean replacement, accompanied by a robust regression model. We use an established strategy for assessing different regression models and for determining how many missing sensor readings can be tolerated before model outputs become obsolete. We experimentally analyze the accuracy and robustness to missing data of seven linear regression models. We recommend regularized PCA regression, together with our guideline for training regression models that are themselves robust to missing data.
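
    A minimal sketch of the recommended deployment strategy, mean replacement followed by a regularized regression on principal components; the component count, ridge penalty and data are assumptions, not the paper's exact configuration.

        import numpy as np
        from sklearn.pipeline import make_pipeline
        from sklearn.impute import SimpleImputer
        from sklearn.decomposition import PCA
        from sklearn.linear_model import Ridge

        rng = np.random.default_rng(3)
        X = rng.normal(size=(1000, 12))               # readings from 12 sensors
        y = X[:, :3].sum(axis=1) + rng.normal(size=1000)
        X[rng.random(X.shape) < 0.5] = np.nan         # ~50% missing, as in the study

        model = make_pipeline(SimpleImputer(strategy="mean"),
                              PCA(n_components=5),
                              Ridge(alpha=1.0))
        model.fit(X, y)
        print(round(model.score(X, y), 2))            # R^2 despite massive missingness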

  14. Estimation of Standard Error of Regression Effects in Latent Regression Models Using Binder's Linearization. Research Report. ETS RR-07-09

    ERIC Educational Resources Information Center

    Li, Deping; Oranje, Andreas

    2007-01-01

    Two versions of a general method for approximating standard error of regression effect estimates within an IRT-based latent regression model are compared. The general method is based on Binder's (1983) approach, accounting for complex samples and finite populations by Taylor series linearization. In contrast, the current National Assessment of…

  15. Hidden Connections between Regression Models of Strain-Gage Balance Calibration Data

    NASA Technical Reports Server (NTRS)

    Ulbrich, Norbert

    2013-01-01

    Hidden connections between regression models of wind tunnel strain-gage balance calibration data are investigated. These connections become visible whenever balance calibration data is supplied in its design format and both the Iterative and Non-Iterative Method are used to process the data. First, it is shown how the regression coefficients of the fitted balance loads of a force balance can be approximated by using the corresponding regression coefficients of the fitted strain-gage outputs. Then, data from the manual calibration of the Ames MK40 six-component force balance is chosen to illustrate how estimates of the regression coefficients of the fitted balance loads can be obtained from the regression coefficients of the fitted strain-gage outputs. The study illustrates that load predictions obtained by applying the Iterative or the Non-Iterative Method originate from two related regression solutions of the balance calibration data as long as balance loads are given in the design format of the balance, gage outputs behave highly linear, strict statistical quality metrics are used to assess regression models of the data, and regression model term combinations of the fitted loads and gage outputs can be obtained by a simple variable exchange.

  16. Land use regression modeling of ultrafine particles, ozone, nitrogen oxides and markers of particulate matter pollution in Augsburg, Germany.

    PubMed

    Wolf, Kathrin; Cyrys, Josef; Harciníková, Tatiana; Gu, Jianwei; Kusch, Thomas; Hampel, Regina; Schneider, Alexandra; Peters, Annette

    2017-02-01

    Important health relevance has been suggested for ultrafine particles (UFP) and ozone, but studies on long-term effects are scarce, mainly due to the lack of appropriate spatial exposure models. We designed a measurement campaign to develop land use regression (LUR) models to predict the spatial variability of particle number concentration (PNC, as an indicator for UFP), ozone and several other air pollutants in the Augsburg region, Southern Germany. Three bi-weekly measurements of PNC, ozone, particulate matter (PM10, PM2.5), soot (PM2.5abs) and nitrogen oxides (NOx, NO2) were performed at 20 sites in 2014/15. Annual average concentrations were calculated and temporally adjusted using measurements from a continuous background station. As geographic predictors we offered several traffic and land use variables, altitude, population and building density. Models were validated using leave-one-out cross-validation. Adjusted explained variance (R²) was high for PNC and ozone (0.89 and 0.88). Cross-validation adjusted R² was slightly lower (0.82 and 0.81) but still indicated a very good fit. LUR models for other pollutants performed well, with adjusted R² between 0.68 (PMcoarse) and 0.94 (NO2). Contrary to previous studies, ozone showed a moderate correlation with NO2 (Pearson's r = -0.26). PNC was moderately correlated with ozone and PM2.5, but highly correlated with NOx (r = 0.91). For PNC and NOx, the LUR models comprised similar predictors, and future epidemiological analyses evaluating health effects need to consider these similarities.

  17. [Evaluation of chemotherapy for stage IV non-small cell lung cancer employing a regression tree type method for quality-adjusted survival analysis to determine prognostic factors].

    PubMed

    Fujita, A; Takabatake, H; Tagaki, S; Sohda, T; Sekine, K

    1996-03-01

    To evaluate the effect of chemotherapy on QOL, the survival period was divided into 3 intervals: time in the hospital for chemotherapy (TOX), time on an outpatient basis (TWiST, Time without Symptoms and Toxicity), and time in the hospital for conservative therapy (REL). Coefficients expressing the QOL level of each interval were denoted ut, uw and ur. With uw set to 1 and ut and ur at less than 1, ut·TOX + uw·TWiST + ur·REL gives a quality-adjusted value relative to TWiST (Q-TWiST). One hundred five patients with stage IV non-small cell lung cancer were included. Sixty-five were given chemotherapy, and the other 40 were not. The observation period was 2 years. Q-TWiST values for age, sex, PS, histology and chemotherapy were calculated, and their quantification was performed employing a regression tree type method. Chemotherapy contributed to Q-TWiST when ut approached 1 (i.e., when no side effects were assumed). When ut was less than 0.5, PS and sex played an appreciable role.
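
    A worked example of this quality adjustment with hypothetical utility coefficients and interval durations (in months):

        u_t, u_w, u_r = 0.5, 1.0, 0.3      # utilities for TOX, TWiST and REL
        TOX, TWiST, REL = 2.0, 10.0, 4.0   # months spent in each state
        q_twist = u_t * TOX + u_w * TWiST + u_r * REL
        print(q_twist)                     # 12.2 quality-adjusted months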

  18. Poisson regression for modeling count and frequency outcomes in trauma research.

    PubMed

    Gagnon, David R; Doron-LaMarca, Susan; Bell, Margret; O'Farrell, Timothy J; Taft, Casey T

    2008-10-01

    The authors describe how the Poisson regression method for analyzing count or frequency outcome variables can be applied in trauma studies. The outcome of interest in trauma research may represent a count of the number of incidents of behavior occurring in a given time interval, such as acts of physical aggression or substance abuse. Traditional linear regression approaches assume a normally distributed outcome variable with equal variances over the range of predictor variables, and may not be optimal for modeling count outcomes. An application of Poisson regression is presented using data from a study of intimate partner aggression among male patients in an alcohol treatment program and their female partners. Results of Poisson regression and linear regression models are compared.
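
    A generic Poisson regression of the kind described can be sketched as follows; the predictors and data are hypothetical, not the study's partner-aggression records.

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(4)
        treatment = rng.integers(0, 2, size=200)   # e.g., treatment indicator
        severity = rng.normal(size=200)            # e.g., baseline severity
        counts = rng.poisson(np.exp(0.2 - 0.5 * treatment + 0.4 * severity))

        X = sm.add_constant(np.column_stack([treatment, severity]))
        fit = sm.GLM(counts, X, family=sm.families.Poisson()).fit()
        print(fit.params)   # log rate ratios; exponentiate to interpret as rate ratios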

  19. Modeling absolute differences in life expectancy with a censored skew-normal regression approach.

    PubMed

    Moser, André; Clough-Gorr, Kerri; Zwahlen, Marcel

    2015-01-01

    Parameter estimates from commonly used multivariable parametric survival regression models do not directly quantify differences in years of life expectancy. Gaussian linear regression models give results in terms of absolute mean differences, but are not appropriate for modeling life expectancy, because in many situations time to death has a negatively skewed distribution. A regression approach using a skew-normal distribution is an alternative to parametric survival models for modeling life expectancy, because parameter estimates can be interpreted in terms of survival time differences while allowing for skewness of the distribution. In this paper we show how to use skew-normal regression so that censored and left-truncated observations are accounted for. With this we model differences in life expectancy using data from the Swiss National Cohort Study and from official life expectancy estimates, and compare the results with those derived from commonly used survival regression models. We conclude that a censored skew-normal survival regression approach for left-truncated observations can be used to model differences in life expectancy across covariates of interest.

  20. Moment Reconstruction and Moment-Adjusted Imputation When Exposure is Generated by a Complex, Nonlinear Random Effects Modeling Process

    PubMed Central

    Potgieter, Cornelis J.; Wei, Rubin; Kipnis, Victor; Freedman, Laurence S.; Carroll, Raymond J.

    2016-01-01

    For the classical, homoscedastic measurement error model, moment reconstruction (Freedman et al., 2004, 2008) and moment-adjusted imputation (Thomas et al., 2011) are appealing, computationally simple imputation-like methods for general model fitting. Like classical regression calibration, the idea is to replace the unobserved variable subject to measurement error with a proxy that can be used in a variety of analyses. Moment reconstruction and moment-adjusted imputation differ from regression calibration in that they attempt to match multiple features of the latent variable, and also to match some of the latent variable’s relationships with the response and additional covariates. In this note, we consider a problem where true exposure is generated by a complex, nonlinear random effects modeling process, and develop analogues of moment reconstruction and moment-adjusted imputation for this case. This general model includes classical measurement errors, Berkson measurement errors, mixtures of Berkson and classical errors and problems that are not measurement error problems, but also cases where the data generating process for true exposure is a complex, nonlinear random effects modeling process. The methods are illustrated using the National Institutes of Health-AARP Diet and Health Study where the latent variable is a dietary pattern score called the Healthy Eating Index - 2005. We also show how our general model includes methods used in radiation epidemiology as a special case. Simulations are used to illustrate the methods. PMID:27061196

  1. Evaluation of Land use Regression Models for NO2 in El Paso, Texas, USA

    EPA Science Inventory

    Developing suitable exposure estimates for air pollution health studies is problematic due to spatial and temporal variation in concentrations and often limited monitoring data. Though land use regression models (LURs) are often used for this purpose, their applicability to later...

  2. Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition

    EPA Science Inventory

    Boosted regression tree (BRT) models were developed to quantify the nonlinear relationships between landscape variables and nutrient concentrations in a mesoscale mixed land cover watershed during base-flow conditions. Factors that affect instream biological components, based on ...

  3. MULTIPLE REGRESSION MODELS FOR HINDCASTING AND FORECASTING MIDSUMMER HYPOXIA IN THE GULF OF MEXICO

    EPA Science Inventory

    A new suite of multiple regression models were developed that describe the relationship between the area of bottom water hypoxia along the northern Gulf of Mexico and Mississippi-Atchafalaya River nitrate concentration, total phosphorus (TP) concentration, and discharge. Variabil...

  4. Multivariate adaptive regression splines models for the prediction of energy expenditure in children and adolescents

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Advanced mathematical models have the potential to capture the complex metabolic and physiological processes that result in heat production, or energy expenditure (EE). Multivariate adaptive regression splines (MARS), is a nonparametric method that estimates complex nonlinear relationships by a seri...

  5. SOME STATISTICAL ISSUES RELATED TO MULTIPLE LINEAR REGRESSION MODELING OF BEACH BACTERIA CONCENTRATIONS

    EPA Science Inventory

    As a fast and effective technique, the multiple linear regression (MLR) method has been widely used in modeling and prediction of beach bacteria concentrations. Among previous works on this subject, however, several issues were insufficiently or inconsistently addressed. Those is...

  6. Estimation of pyrethroid pesticide intake using regression modeling of food groups based on composite dietary samples

    EPA Science Inventory

    Population-based estimates of pesticide intake are needed to characterize exposure for particular demographic groups based on their dietary behaviors. Regression modeling performed on measurements of selected pesticides in composited duplicate diet samples allowed (1) estimation ...

  7. Multiple regression modelling of mineral base oil biodegradability based on their physical properties and overall chemical composition.

    PubMed

    Haus, Frédérique; Boissel, Olivier; Junter, Guy Alain

    2003-02-01

    A set of 38 mineral base oils was characterized by a number of chemical (i.e., overall chemical composition) and physical parameters used routinely in industry. Their primary biodegradability was evaluated using the CEC L-33-A-93 test. Multiple (stepwise) linear regression (MLR) analyses were performed to describe the relationships between the biodegradability values and the chemical or physical properties of the oils. Chemical, physical, and both types of parameters were successively used as independent variables. Using chemical descriptors as variables, a four-variable model equation was obtained that explained only 68.2% of the variability in biodegradability (adjusted R-squared statistic). The fit was improved by using either the physical parameters or the whole set of parameters as variables. MLR analyses led to three-descriptor model equations involving kinematic viscosity (as log), Noack volatility (as log) and either the viscosity index (pure physical model) or the paraffinic carbon percentage (mixed chemical-physical model). These two models displayed very similar adjusted R-squared statistics of approximately 91%. Their predictive ability was verified using 25 additional base oils or oil blends. For 80% of a total of 63 oils, the absolute percentage error in the biodegradability predicted by either model was lower than 20%. Kinematic viscosity was by far the most influential parameter in the two models.

  8. Improving sub-pixel imperviousness change prediction by ensembling heterogeneous non-linear regression models

    NASA Astrophysics Data System (ADS)

    Drzewiecki, Wojciech

    2016-12-01

    In this work nine non-linear regression models were compared for sub-pixel impervious surface area mapping from Landsat images. The comparison was done in three study areas, both for the accuracy of imperviousness coverage evaluation at individual points in time and for the accuracy of imperviousness change assessment. The performance of individual machine learning algorithms (Cubist, Random Forest, stochastic gradient boosting of regression trees, k-nearest neighbors regression, random k-nearest neighbors regression, Multivariate Adaptive Regression Splines, averaged neural networks, and support vector machines with polynomial and radial kernels) was also compared with the performance of heterogeneous model ensembles constructed from the best models trained using the particular techniques. The results showed that, in the case of sub-pixel evaluation, the most accurate prediction of change may not necessarily be based on the most accurate individual assessments. When single methods are considered, the results suggest the Cubist algorithm for Landsat-based mapping of imperviousness at single dates. However, Random Forest may be endorsed when the most reliable evaluation of imperviousness change is the primary goal: it gave lower accuracies for individual assessments, but better prediction of change, due to more correlated errors of the individual predictions. Heterogeneous model ensembles performed at least as well as the best individual models for individual time points, and for imperviousness change assessment the ensembles always outperformed single-model approaches. This means that it is possible to improve the accuracy of sub-pixel imperviousness change assessment using ensembles of heterogeneous non-linear regression models.
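
    In the same spirit, a heterogeneous regression ensemble can be sketched as follows; the member learners and simple averaging are illustrative choices, not the paper's nine-model configuration.

        from sklearn.datasets import make_regression
        from sklearn.ensemble import RandomForestRegressor, VotingRegressor
        from sklearn.neighbors import KNeighborsRegressor
        from sklearn.svm import SVR

        X, y = make_regression(n_samples=400, n_features=8, noise=5.0, random_state=0)
        ensemble = VotingRegressor([              # averages member predictions
            ("rf", RandomForestRegressor(n_estimators=200, random_state=0)),
            ("knn", KNeighborsRegressor(n_neighbors=7)),
            ("svr", SVR(kernel="rbf", C=10.0)),
        ])
        print(ensemble.fit(X, y).score(X, y))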

  9. Evaluation of accuracy of linear regression models in predicting urban stormwater discharge characteristics.

    PubMed

    Madarang, Krish J; Kang, Joo-Hyon

    2014-06-01

    Stormwater runoff has been identified as a source of pollution for the environment, especially for receiving waters. In order to quantify and manage the impacts of stormwater runoff on the environment, predictive and mathematical models have been developed. Predictive tools such as regression models have been widely used to predict stormwater discharge characteristics. Storm event characteristics, such as antecedent dry days (ADD), have been related to response variables such as pollutant loads and concentrations. However, whether ADD is an important variable in predicting stormwater discharge characteristics has been a controversial issue among studies. In this study, we examined the accuracy of general linear regression models in predicting the discharge characteristics of roadway runoff. A total of 17 storm events were monitored in two highway segments located in Gwangju, Korea. Data from the monitoring were used to calibrate the United States Environmental Protection Agency's Storm Water Management Model (SWMM). The calibrated SWMM was run for 55 storm events, and the resulting total suspended solids (TSS) discharge loads and event mean concentrations (EMC) were extracted. From these data, linear regression models were developed. R² and p-values of the regression of ADD for both TSS loads and EMCs were investigated. Results showed that pollutant loads were better predicted than pollutant EMCs in the multiple regression models. Regression may not capture the true effect of site-specific characteristics, due to uncertainty in the data.

  10. Regression modeling of streamflow, baseflow, and runoff using geographic information systems.

    PubMed

    Zhu, Yuanhong; Day, Rick L

    2009-02-01

    Regression models for predicting total streamflow (TSF), baseflow (TBF), and storm runoff (TRO) are needed for water resource planning and management. This study used 54 streams with >20 years of streamflow gaging station records during the period October 1971 to September 2001 in Pennsylvania and partitioned TSF into TBF and TRO. TBF was considered a surrogate of groundwater recharge for basins. Regression models for predicting basin-wide TSF, TBF, and TRO were developed under three scenarios that varied in the regression variables used for model development. Regression variables representing basin geomorphological, geological, soil, and climatic characteristics were estimated using geographic information systems. All regression models for TSF, TBF, and TRO had R² values >0.94 and reasonable prediction errors. The two best TSF models, developed under scenarios 1 and 2, had similar absolute prediction errors, as did the two best TBF models. Therefore, either of the two best TSF and TBF models could be used for the respective flow prediction, depending on variable availability. The TRO model developed under scenario 1 had smaller absolute prediction errors than that developed under scenario 2. The simplified area-alone models developed under scenario 3 might be used when the variables required by the best models are not available, but they had lower R² values and higher or more variable prediction errors than the best models.

  11. Parametric modeling of quantile regression coefficient functions with censored and truncated data.

    PubMed

    Frumento, Paolo; Bottai, Matteo

    2017-02-09

    Quantile regression coefficient functions describe how the coefficients of a quantile regression model depend on the order of the quantile. A method for parametric modeling of quantile regression coefficient functions was discussed in a recent article. The aim of the present work is to extend the existing framework to censored and truncated data. We propose an estimator and derive its asymptotic properties. We discuss goodness-of-fit measures, present simulation results, and analyze the data that motivated this article. The described estimator has been implemented in the R package qrcm.

  12. Adjusting power for a baseline covariate in linear models

    PubMed Central

    Glueck, Deborah H.; Muller, Keith E.

    2009-01-01

    SUMMARY The analysis of covariance provides a common approach to adjusting for a baseline covariate in medical research. With Gaussian errors, adding random covariates does not change either the theory or the computations of general linear model data analysis. However, adding random covariates does change the theory and computation of power analysis. Many data analysts fail to fully account for this complication in planning a study. We present our results in five parts. (i) A review of published results helps document the importance of the problem and the limitations of available methods. (ii) A taxonomy for general linear multivariate models and hypotheses allows identifying a particular problem. (iii) We describe how random covariates introduce the need to consider quantiles and conditional values of power. (iv) We provide new exact and approximate methods for power analysis of a range of multivariate models with a Gaussian baseline covariate, for both small and large samples. The new results apply to the Hotelling-Lawley test and the four tests in the “univariate” approach to repeated measures (unadjusted, Huynh-Feldt, Geisser-Greenhouse, Box). The techniques allow rapid calculation and an interactive, graphical approach to sample size choice. (v) Calculating power for a clinical trial of a treatment for increasing bone density illustrates the new methods. We particularly recommend using quantile power with a new Satterthwaite-style approximation. PMID:12898543

  13. Analysis of Covariance with Linear Regression Error Model on Antenna Control Unit Tracking

    DTIC Science & Technology

    2015-10-20

    412TW-PA-15238: Analysis of Covariance with Linear Regression Error Model on Antenna Control Unit Tracking, Daniel T. Laird, Air... (period covered: 20 Oct 15 – 23 Oct 15) ...analysis of variance (ANOVA) to decide between the null and alternative hypotheses regarding a telemetry antenna control unit's (ACU) ability to track on C-band

  14. Developing and testing a global-scale regression model to quantify mean annual streamflow

    NASA Astrophysics Data System (ADS)

    Barbarossa, Valerio; Huijbregts, Mark A. J.; Hendriks, A. Jan; Beusen, Arthur H. W.; Clavreul, Julie; King, Henry; Schipper, Aafke M.

    2017-01-01

    Quantifying mean annual flow of rivers (MAF) at ungauged sites is essential for assessments of global water supply, ecosystem integrity and water footprints. MAF can be quantified with spatially explicit process-based models, which may be overly time-consuming and data-intensive for this purpose, or with empirical regression models that predict MAF based on climate and catchment characteristics. Yet regression models have mostly been developed at a regional scale, and the extent to which they can be extrapolated to other regions is not known. In this study, we developed a global-scale regression model for MAF based on a dataset unprecedented in size, using observations of discharge and catchment characteristics from 1885 catchments worldwide, ranging between 2 and 10⁶ km² in area. In addition, we compared the performance of the regression model with the predictive ability of the spatially explicit global hydrological model PCR-GLOBWB by comparing the results of both models to independent measurements. We obtained a regression model explaining 89% of the variance in MAF based on catchment area and catchment-averaged mean annual precipitation and air temperature, slope and elevation. The regression model performed better than PCR-GLOBWB for the prediction of MAF, as root-mean-square error (RMSE) values were lower (0.29-0.38 compared to 0.49-0.57) and the modified index of agreement (d) was higher (0.80-0.83 compared to 0.72-0.75). Our regression model can be applied globally to estimate MAF at any point of the river network, thus providing a feasible alternative to spatially explicit process-based global hydrological models.

  15. OPLS statistical model versus linear regression to assess sonographic predictors of stroke prognosis

    PubMed Central

    Vajargah, Kianoush Fathi; Sadeghi-Bazargani, Homayoun; Mehdizadeh-Esfanjani, Robab; Savadi-Oskouei, Daryoush; Farhoudi, Mehdi

    2012-01-01

    The objective of the present study was to assess the applicability of the orthogonal projections to latent structures (OPLS) statistical model versus traditional linear regression in investigating the role of transcranial Doppler (TCD) sonography in predicting ischemic stroke prognosis. The study was conducted on 116 ischemic stroke patients admitted to a specialty neurology ward. The Unified Neurological Stroke Scale was used once for clinical evaluation during the first week of admission and again six months later. All data were first analyzed using simple linear regression and then considered for multivariate analysis using PLS/OPLS models through the SIMCA P+12 statistical software package. The linear regression results used for the identification of TCD predictors of stroke prognosis were confirmed through the OPLS modeling technique. Moreover, in comparison to linear regression, the OPLS model appeared to have higher sensitivity in detecting the predictors of ischemic stroke prognosis and detected several more predictors. Applying the OPLS model made it possible to use both single TCD measures/indicators and arbitrarily dichotomized measures of TCD single-vessel involvement, as well as the overall TCD result. In conclusion, the authors recommend PLS/OPLS methods as complementary rather than alternative to the available classical regression models such as linear regression. PMID:22973104

  16. Regression Model Term Selection for the Analysis of Strain-Gage Balance Calibration Data

    NASA Technical Reports Server (NTRS)

    Ulbrich, Norbert Manfred; Volden, Thomas R.

    2010-01-01

    The paper discusses the selection of regression model terms for the analysis of wind tunnel strain-gage balance calibration data. Different function class combinations are presented that may be used to analyze calibration data using either a non-iterative or an iterative method. The role of the intercept term in a regression model of calibration data is reviewed. In addition, useful algorithms and metrics originating from linear algebra and statistics are recommended that will help an analyst (i) to identify and avoid both linear and near-linear dependencies between regression model terms and (ii) to make sure that the selected regression model of the calibration data uses only statistically significant terms. Three different tests are suggested that may be used to objectively assess the predictive capability of the final regression model of the calibration data. These tests use both the original data points and regression model independent confirmation points. Finally, data from a simplified manual calibration of the Ames MK40 balance is used to illustrate the application of some of the metrics and tests to a realistic calibration data set.
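
    One of the recommended linear-algebra checks, flagging near-linear dependencies between candidate model terms, can be sketched with variance inflation factors (synthetic data, not balance calibration data):

        import numpy as np
        from statsmodels.stats.outliers_influence import variance_inflation_factor

        rng = np.random.default_rng(5)
        x1 = rng.normal(size=100)
        x2 = rng.normal(size=100)
        x3 = x1 + 0.01 * rng.normal(size=100)         # nearly dependent on x1
        X = np.column_stack([np.ones(100), x1, x2, x3])
        for j in range(1, X.shape[1]):
            print(j, variance_inflation_factor(X, j)) # very large VIF flags x1 and x3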

  17. Adjusting the Adjusted X²/df Ratio Statistic for Dichotomous Item Response Theory Analyses: Does the Model Fit?

    ERIC Educational Resources Information Center

    Tay, Louis; Drasgow, Fritz

    2012-01-01

    Two Monte Carlo simulation studies investigated the effectiveness of the mean adjusted X²/df statistic proposed by Drasgow and colleagues and, because of problems with the method, a new approach for assessing the goodness of fit of an item response theory model was developed. It has been previously recommended that mean adjusted…

  18. Disaster Hits Home: A Model of Displaced Family Adjustment after Hurricane Katrina

    ERIC Educational Resources Information Center

    Peek, Lori; Morrissey, Bridget; Marlatt, Holly

    2011-01-01

    The authors explored individual and family adjustment processes among parents (n = 30) and children (n = 55) who were displaced to Colorado after Hurricane Katrina. Drawing on in-depth interviews with 23 families, this article offers an inductive model of displaced family adjustment. Four stages of family adjustment are presented in the model: (a)…

  19. Predicting Air Permeability of Handloom Fabrics: A Comparative Analysis of Regression and Artificial Neural Network Models

    NASA Astrophysics Data System (ADS)

    Mitra, Ashis; Majumdar, Prabal Kumar; Bannerjee, Debamalya

    2013-03-01

    This paper presents a comparative analysis of two modeling methodologies for predicting the air permeability of plain woven handloom cotton fabrics. Four basic fabric constructional parameters, namely ends per inch, picks per inch, warp count and weft count, were used as inputs for the artificial neural network (ANN) and regression models. Of the four regression models tried, the interaction model showed very good prediction performance, with a mean absolute error of only 2.017%. However, the ANN models demonstrated superiority over the regression models in terms of both correlation coefficient and mean absolute error. The ANN model with 10 nodes in its single hidden layer showed very good correlation coefficients of 0.982 and 0.929 and mean absolute errors of only 0.923% and 2.043% for the training and testing data, respectively.

  1. Mechanisms of developmental regression in autism and the broader phenotype: a neural network modeling approach.

    PubMed

    Thomas, Michael S C; Knowland, Victoria C P; Karmiloff-Smith, Annette

    2011-10-01

    Loss of previously established behaviors in early childhood constitutes a markedly atypical developmental trajectory. It is found almost uniquely in autism and its cause is currently unknown (Baird et al., 2008). We present an artificial neural network model of developmental regression, exploring the hypothesis that regression is caused by overaggressive synaptic pruning and identifying the mechanisms involved. We used a novel population-modeling technique to investigate developmental deficits, in which both neurocomputational parameters and the learning environment were varied across a large number of simulated individuals. Regression was generated by the atypical setting of a single pruning-related parameter. We observed a probabilistic relationship between the atypical pruning parameter and the presence of regression, as well as variability in the onset, severity, behavioral specificity, and recovery from regression. Other neurocomputational parameters that varied across the population modulated the risk that an individual would show regression. We considered a further hypothesis that behavioral regression may index an underlying anomaly characterizing the broader autism phenotype. If this is the case, we show how the model also accounts for several additional findings: shared gene variants between autism and language impairment (Vernes et al., 2008); larger brain size in autism but only in early development (Redcay & Courchesne, 2005); and the possibility of quasi-autism, caused by extreme environmental deprivation (Rutter et al., 1999). We make a novel prediction that the earliest developmental symptoms in the emergence of autism should be sensory and motor rather than social and review empirical data offering preliminary support for this prediction.

  2. The Norwegian Healthier Goats program--modeling lactation curves using a multilevel cubic spline regression model.

    PubMed

    Nagel-Alne, G E; Krontveit, R; Bohlin, J; Valle, P S; Skjerve, E; Sølverød, L S

    2014-07-01

    In 2001, the Norwegian Goat Health Service initiated the Healthier Goats program (HG), with the aim of eradicating caprine arthritis encephalitis, caseous lymphadenitis, and Johne's disease (caprine paratuberculosis) in Norwegian goat herds. The aim of the present study was to explore how control and eradication of the above-mentioned diseases by enrolling in HG affected milk yield by comparison with herds not enrolled in HG. Lactation curves were modeled using a multilevel cubic spline regression model where farm, goat, and lactation were included as random effect parameters. The data material contained 135,446 registrations of daily milk yield from 28,829 lactations in 43 herds. The multilevel cubic spline regression model was applied to 4 categories of data: enrolled early, control early, enrolled late, and control late. For enrolled herds, the early and late notations refer to the situation before and after enrolling in HG; for nonenrolled herds (controls), they refer to development over time, independent of HG. Total milk yield increased in the enrolled herds after eradication: the total milk yields in the fourth lactation were 634.2 and 873.3 kg in enrolled early and enrolled late herds, respectively, and 613.2 and 701.4 kg in the control early and control late herds, respectively. Day of peak yield differed between enrolled and control herds. The day of peak yield came on d 6 of lactation for the control early category for parities 2, 3, and 4, indicating an inability of the goats to further increase their milk yield from the initial level. For enrolled herds, on the other hand, peak yield came between d 49 and 56, indicating a gradual increase in milk yield after kidding. Our results indicate that enrollment in the HG disease eradication program improved the milk yield of dairy goats considerably, and that the multilevel cubic spline regression was a suitable model for exploring effects of disease control and eradication on milk yield.
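
    A simplified, fixed-effects-only sketch of a cubic-spline lactation curve; the study's random effects for farm, goat and lactation (e.g., via a mixed model) are omitted and the data are synthetic.

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(6)
        dim = rng.integers(1, 280, size=2000)                # days in milk
        milk = 2.5 * np.log(dim) - 0.02 * dim + rng.normal(0, 0.3, size=2000)
        df = pd.DataFrame({"dim": dim, "milk": milk})

        # cr() builds a natural cubic regression spline basis (patsy)
        fit = smf.ols("milk ~ cr(dim, df=5)", data=df).fit()
        curve = df.assign(pred=fit.fittedvalues).groupby("dim")["pred"].mean()
        print(curve.idxmax())                                # estimated day of peak yield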

  3. Genetic analysis of body weights of individually fed beef bulls in South Africa using random regression models.

    PubMed

    Selapa, N W; Nephawe, K A; Maiwashe, A; Norris, D

    2012-02-08

    The aim of this study was to estimate genetic parameters for the body weights of individually fed beef bulls measured at centralized testing stations in South Africa using random regression models. Weekly body weights of Bonsmara bulls (N = 2919) tested between 1999 and 2003 were available for the analyses. The model included a fixed regression of the body weights on fourth-order orthogonal Legendre polynomials of the actual days on test (7, 14, 21, 28, 35, 42, 49, 56, 63, 70, 77, and 84), adjusted for starting age and contemporary group effects. Random regressions on fourth-order orthogonal Legendre polynomials of the actual days on test were included for additive genetic effects and additional uncorrelated random effects of the weaning-herd-year and the permanent environment of the animal. Residual effects were assumed to be independently distributed with heterogeneous variance for each test day. Variance ratios for additive genetic, permanent environment and weaning-herd-year effects for weekly body weights at different test days ranged from 0.26 to 0.29, 0.37 to 0.44 and 0.26 to 0.34, respectively. The weaning-herd-year was found to have a significant effect on the variation of body weights of bulls despite a 28-day adjustment period. Genetic correlations among body weights at different test days were high, ranging from 0.89 to 1.00. Heritability estimates were comparable to the literature using multivariate models. Therefore, random regression models could be applied in the genetic evaluation of body weight of individually fed beef bulls in South Africa.
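
    The fixed-regression design can be illustrated by evaluating a fourth-order orthogonal Legendre basis at the standardized test days; the genetic and permanent environmental random regressions are not shown.

        import numpy as np
        from numpy.polynomial import legendre

        days = np.array([7, 14, 21, 28, 35, 42, 49, 56, 63, 70, 77, 84])
        t = 2 * (days - days.min()) / (days.max() - days.min()) - 1  # map to [-1, 1]
        Phi = legendre.legvander(t, 4)   # columns P0..P4: the fixed design matrix
        print(Phi.shape)                 # (12, 5)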

  4. Intuitionistic Fuzzy Weighted Linear Regression Model with Fuzzy Entropy under Linear Restrictions.

    PubMed

    Kumar, Gaurav; Bajaj, Rakesh Kumar

    2014-01-01

    In fuzzy set theory, it is well known that a triangular fuzzy number can be uniquely determined through its position and entropies. In the present communication, we extend this concept to the triangular intuitionistic fuzzy number, establishing its one-to-one correspondence with its position and entropies. Using the concept of fuzzy entropy, the estimators of the intuitionistic fuzzy regression coefficients have been estimated in the unrestricted regression model. An intuitionistic fuzzy weighted linear regression (IFWLR) model with some restrictions in the form of prior information has been considered. Further, the estimators of regression coefficients have been obtained with the help of fuzzy entropy for the restricted/unrestricted IFWLR model by assigning some weights in the distance function.

  5. Random regression models using different functions to model milk flow in dairy cows.

    PubMed

    Laureano, M M M; Bignardi, A B; El Faro, L; Cardoso, V L; Tonhati, H; Albuquerque, L G

    2014-09-12

    We analyzed 75,555 test-day milk flow records from 2175 primiparous Holstein cows that calved between 1997 and 2005. Milk flow was obtained by dividing the mean milk yield (kg) of the 3 daily milkings by the total milking time (min) and was expressed as kg/min. Milk flow was grouped into 43 weekly classes. The analyses were performed using a single-trait random regression model that included direct additive genetic, permanent environmental, and residual random effects. In addition, the contemporary group and linear and quadratic effects of cow age at calving were included as fixed effects. A fourth-order orthogonal Legendre polynomial of days in milk was used to model the mean trend in milk flow. The additive genetic and permanent environmental covariance functions were estimated using random regression on Legendre polynomials and B-spline functions of days in milk. The model using a third-order Legendre polynomial for additive genetic effects and a sixth-order polynomial for permanent environmental effects, which contained 7 residual classes, proved to be the most adequate to describe variations in milk flow, and was also the most parsimonious. Heritability estimates for milk flow from the most parsimonious model were of moderate to high magnitude.

  6. Predicting dissolved oxygen concentration using kernel regression modeling approaches with nonlinear hydro-chemical data.

    PubMed

    Singh, Kunwar P; Gupta, Shikha; Rai, Premanjali

    2014-05-01

    Kernel function-based regression models were constructed and applied to a nonlinear hydro-chemical dataset pertaining to surface water for predicting dissolved oxygen levels. Initial features were selected using a nonlinear approach. Nonlinearity in the data was tested using the BDS statistic, which revealed that the data have a nonlinear structure. Kernel ridge regression, kernel principal component regression, kernel partial least squares regression, and support vector regression models were developed using the Gaussian kernel function, and their generalization and predictive abilities were compared in terms of several statistical parameters. Model parameters were optimized using a cross-validation procedure. The proposed kernel regression methods successfully captured the nonlinear features of the original data by transforming it to a high-dimensional feature space using the kernel function. Performance of all the kernel-based modeling methods was comparable in terms of both predictive and generalization abilities. Values of the performance criteria indicated that the constructed models fit the nonlinear data adequately and have good predictive capabilities.
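
    A minimal sketch of two of the kernel methods named above (kernel ridge regression and support vector regression, both with the Gaussian/RBF kernel), with parameters optimized by cross-validation as in the study. The hydro-chemical data are replaced by a synthetic nonlinear dataset, so the features and grid values are assumptions.

    ```python
    import numpy as np
    from sklearn.kernel_ridge import KernelRidge
    from sklearn.svm import SVR
    from sklearn.model_selection import GridSearchCV

    rng = np.random.default_rng(0)
    X = rng.uniform(0.0, 1.0, size=(200, 5))     # stand-in water-quality features
    y = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(0.0, 0.1, 200)

    # Gaussian-kernel ridge regression, tuning the penalty and kernel width by CV.
    krr = GridSearchCV(KernelRidge(kernel="rbf"),
                       {"alpha": [0.01, 0.1, 1.0], "gamma": [0.1, 1.0, 10.0]}, cv=5)
    krr.fit(X, y)

    # Support vector regression with the same kernel, tuned the same way.
    svr = GridSearchCV(SVR(kernel="rbf"),
                       {"C": [1.0, 10.0], "gamma": [0.1, 1.0]}, cv=5)
    svr.fit(X, y)

    print(krr.best_score_, svr.best_score_)      # cross-validated R^2 of each model
    ```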

  7. Mixed-effects Gaussian process functional regression models with application to dose-response curve prediction.

    PubMed

    Shi, J Q; Wang, B; Will, E J; West, R M

    2012-11-20

    We propose a new semiparametric model for functional regression analysis, combining a parametric mixed-effects model with a nonparametric Gaussian process regression model, namely a mixed-effects Gaussian process functional regression model. The parametric component can provide explanatory information between the response and the covariates, whereas the nonparametric component can add nonlinearity. We can model the mean and covariance structures simultaneously, combining the information borrowed from other subjects with the information collected from each individual subject. We apply the model to dose-response curves that describe changes in the responses of subjects for differing levels of the dose of a drug or agent and have a wide application in many areas. We illustrate the method for the management of renal anaemia. An individual dose-response curve is improved when more information is included by this mechanism from the subject/patient over time, enabling a patient-specific treatment regime.

  8. Regression models tolerant to massively missing data: a case study in solar radiation nowcasting

    NASA Astrophysics Data System (ADS)

    Žliobaitė, I.; Hollmén, J.; Junninen, H.

    2014-07-01

    Statistical models for environmental monitoring strongly rely on automatic data acquisition systems, using various physical sensors. Often, sensor readings are missing for extended periods of time while model outputs need to be continuously available in real time. With a case study in solar radiation nowcasting, we investigate how to deal with massively missing data (around 50% of the time some data are unavailable) in such situations. Our goal is to analyze the characteristics of missing data and recommend a strategy for deploying regression models, which would be robust to missing data in situations, where data are massively missing. We are after one model that performs well at all times, with and without data gaps. Due to the need to provide instantaneous outputs with minimum energy consumption for computing in the data streaming setting, we dismiss computationally demanding data imputation methods, and resort to a simple mean replacement. We use an established strategy for comparing different regression models, with the possibility of determining how many missing sensor readings can be tolerated before model outputs become obsolete. We experimentally analyze accuracies and robustness to missing data of seven linear regression models and recommend using regularized PCA regression. We recommend using our established guideline in training regression models, which themselves are robust to missing data.
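
    The deployment strategy the abstract describes (cheap mean replacement for missing readings, followed by a regularized regression on principal components) can be sketched as a single pipeline. The sensor data below are synthetic, with roughly half the readings knocked out, and "regularized PCA regression" is rendered here as PCA followed by ridge regression, which is an assumption about the authors' exact formulation.

    ```python
    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.impute import SimpleImputer
    from sklearn.decomposition import PCA
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 12))                        # stand-in sensor readings
    y = X[:, :3].sum(axis=1) + rng.normal(0.0, 0.5, 500)  # stand-in radiation target
    X[rng.uniform(size=X.shape) < 0.5] = np.nan           # ~50% of readings missing

    model = make_pipeline(
        SimpleImputer(strategy="mean"),  # simple mean replacement, as in the paper
        PCA(n_components=5),             # compress the correlated sensors
        Ridge(alpha=1.0),                # regularized regression on the components
    )
    model.fit(X, y)
    ```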

  9. The Development of Salary Goal Modeling: From Regression Analysis to a Value-Based Prescriptive Approach.

    ERIC Educational Resources Information Center

    Stewart, Kenneth D.; And Others

    1996-01-01

    Development of a comprehensive, pro-active, value-centered model for reviewing college faculty salaries is described. The model, used at Frostburg State University (Maryland), draws on multiple-regression applications to salary equity issues. Applications of the model in evaluating, redressing, and preventing salary equity problems are presented.…

  10. Comparing Regression Coefficients between Nested Linear Models for Clustered Data with Generalized Estimating Equations

    ERIC Educational Resources Information Center

    Yan, Jun; Aseltine, Robert H., Jr.; Harel, Ofer

    2013-01-01

    Comparing regression coefficients between models when one model is nested within another is of great practical interest when two explanations of a given phenomenon are specified as linear models. The statistical problem is whether the coefficients associated with a given set of covariates change significantly when other covariates are added into…

  11. Evaluation of Regression Models of Balance Calibration Data Using an Empirical Criterion

    NASA Technical Reports Server (NTRS)

    Ulbrich, Norbert; Volden, Thomas R.

    2012-01-01

    An empirical criterion for assessing the significance of individual terms of regression models of wind tunnel strain gage balance outputs is evaluated. The criterion is based on the percent contribution of a regression model term. It considers a term to be significant if its percent contribution exceeds the empirical threshold of 0.05%. The criterion has the advantage that it can easily be computed using the regression coefficients of the gage outputs and the load capacities of the balance. First, a definition of the empirical criterion is provided. Then, it is compared with an alternate statistical criterion that is widely used in regression analysis. Finally, calibration data sets from a variety of balances are used to illustrate the connection between the empirical and the statistical criterion. A review of these results indicated that the empirical criterion is suitable only for a crude assessment of the significance of a regression model term, because the boundary between a significant and an insignificant term cannot be defined very well. Therefore, regression model term reduction should only be performed by using the more universally applicable statistical criterion.
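
    The percent-contribution idea can be sketched as follows. The paper's exact formula is not reproduced here, so the definition below (magnitude of each term at capacity load, as a percentage of the full-scale gage output) and all numbers are illustrative assumptions.

    ```python
    import numpy as np

    def percent_contribution(coef, term_at_capacity, full_scale_output):
        """Percent contribution of each regression term at capacity load.

        Illustrative definition: |coefficient * term value| as a percentage of
        the full-scale gage output; the paper's exact formula may differ.
        """
        return 100.0 * np.abs(coef * term_at_capacity) / full_scale_output

    capacity = 3000.0                           # hypothetical load capacity
    coef = np.array([1.2e-3, 4.0e-8, 9.0e-15])  # linear, quadratic, cubic terms
    terms = np.array([capacity, capacity**2, capacity**3])

    contrib = percent_contribution(coef, terms, full_scale_output=4.0)
    significant = contrib > 0.05    # the paper's empirical 0.05% threshold
    print(contrib, significant)     # here the cubic term falls below the threshold
    ```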

  12. Comparison Between Linear and Non-parametric Regression Models for Genome-Enabled Prediction in Wheat

    PubMed Central

    Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne

    2012-01-01

    In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models. PMID:23275882

  13. Analysis of Multivariate Experimental Data Using A Simplified Regression Model Search Algorithm

    NASA Technical Reports Server (NTRS)

    Ulbrich, Norbert M.

    2013-01-01

    A new regression model search algorithm was developed that may be applied to both general multivariate experimental data sets and wind tunnel strain-gage balance calibration data. The algorithm is a simplified version of a more complex algorithm that was originally developed for the NASA Ames Balance Calibration Laboratory. The new algorithm performs regression model term reduction to prevent overfitting of data. It has the advantage that it needs only about one tenth of the original algorithm's CPU time for the completion of a regression model search. In addition, extensive testing showed that the prediction accuracy of math models obtained from the simplified algorithm is similar to the prediction accuracy of math models obtained from the original algorithm. The simplified algorithm, however, cannot guarantee that search constraints related to a set of statistical quality requirements are always satisfied in the optimized regression model. Therefore, the simplified algorithm is not intended to replace the original algorithm. Instead, it may be used to generate an alternate optimized regression model of experimental data whenever the application of the original search algorithm fails or requires too much CPU time. Data from a machine calibration of NASA's MK40 force balance is used to illustrate the application of the new search algorithm.

  14. Modeling of retardance in ferrofluid with Taguchi-based multiple regression analysis

    NASA Astrophysics Data System (ADS)

    Lin, Jing-Fung; Wu, Jyh-Shyang; Sheu, Jer-Jia

    2015-03-01

    Citric acid (CA) coated Fe3O4 ferrofluids are prepared by a co-precipitation method, and the magneto-optical retardance property is measured with a Stokes polarimeter. Optimization and multiple regression of retardance in the ferrofluids are carried out by combining the Taguchi method with Excel. From the nine trials over four parameters (pH of suspension, molar ratio of CA to Fe3O4, volume of CA, and coating temperature), the order of influence of the factors and the optimal parameter combination are found. Multiple regression analysis and an F-test on the significance of the regression equation are performed. The model F value is much larger than the critical value, with significance level P < 0.0001, so it can be concluded that the regression model has statistically significant predictive ability. Substituting the optimal parameter combination into the equation gives a retardance of 32.703°, 11.4% higher than the highest value observed in the trials.

  15. Measures of Residual Risk with Connections to Regression, Risk Tracking, Surrogate Models, and Ambiguity

    DTIC Science & Technology

    2015-05-20

    Rockafellar, R. Tyrrell; Royset, Johannes O.

    Measures of residual risk are developed as an extension of measures of risk: a random variable of interest is viewed in concert with an auxiliary random variable, with connections to forecasting and generalized regression. The fundamental properties of measures of residual risk are established in this framework.

  16. Modelling of binary logistic regression for obesity among secondary students in a rural area of Kedah

    NASA Astrophysics Data System (ADS)

    Kamaruddin, Ainur Amira; Ali, Zalila; Noor, Norlida Mohd.; Baharum, Adam; Ahmad, Wan Muhamad Amir W.

    2014-07-01

    Logistic regression analysis examines the influence of various factors on a dichotomous outcome by estimating the probability of the event's occurrence. Logistic regression, also called a logit model, is a statistical procedure used to model dichotomous outcomes. In the logit model the log odds of the dichotomous outcome is modeled as a linear combination of the predictor variables. The log odds ratio in logistic regression provides a description of the probabilistic relationship of the variables and the outcome. In conducting logistic regression, selection procedures are used in selecting important predictor variables, diagnostics are used to check that assumptions are valid which include independence of errors, linearity in the logit for continuous variables, absence of multicollinearity, and lack of strongly influential outliers and a test statistic is calculated to determine the aptness of the model. This study used the binary logistic regression model to investigate overweight and obesity among rural secondary school students on the basis of their demographics profile, medical history, diet and lifestyle. The results indicate that overweight and obesity of students are influenced by obesity in family and the interaction between a student's ethnicity and routine meals intake. The odds of a student being overweight and obese are higher for a student having a family history of obesity and for a non-Malay student who frequently takes routine meals as compared to a Malay student.
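
    A minimal sketch of the kind of binary logit model described, with a family-obesity main effect and an ethnicity-by-routine-meals interaction. The data are simulated and the variable names are stand-ins for the study's questionnaire items.

    ```python
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    n = 400
    df = pd.DataFrame({
        "family_obesity": rng.integers(0, 2, n),  # 1 = obesity in the family
        "non_malay":      rng.integers(0, 2, n),  # simplified ethnicity indicator
        "routine_meals":  rng.integers(0, 2, n),  # 1 = frequently takes routine meals
    })
    logit_p = -1.5 + 0.9 * df.family_obesity + 0.8 * df.non_malay * df.routine_meals
    df["obese"] = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit_p)))

    # Log odds of overweight/obesity as a linear combination of the predictors,
    # including the ethnicity-by-routine-meals interaction reported in the study.
    fit = smf.logit("obese ~ family_obesity + non_malay * routine_meals", data=df).fit()
    print(np.exp(fit.params))   # odds ratios
    ```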

  17. Regression model estimation of early season crop proportions: North Dakota, some preliminary results

    NASA Technical Reports Server (NTRS)

    Lin, K. K. (Principal Investigator)

    1982-01-01

    To estimate crop proportions early in the season, an approach is proposed based on: use of a regression-based prediction equation to obtain an a priori estimate for specific major crop groups; modification of this estimate using current-year LANDSAT and weather data; and a breakdown of the major crop groups into specific crops by regression models. Results from the development and evaluation of appropriate regression models for the first portion of the proposed approach are presented. The results show that the model predicts 1980 crop proportions very well at both county and crop reporting district levels. In terms of planted acreage, the model underpredicted 9.1 percent of the 1980 published data on planted acreage at the county level. It predicted almost exactly the 1980 published data on planted acreage at the crop reporting district level and overpredicted the planted acreage by just 0.92 percent.

  18. REGRESSION APPROXIMATIONS FOR TRANSPORT MODEL CONSTRAINT SETS IN COMBINED AQUIFER SIMULATION-OPTIMIZATION STUDIES.

    USGS Publications Warehouse

    Alley, William M.

    1986-01-01

    Problems involving the combined use of contaminant transport models and nonlinear optimization schemes can be very expensive to solve. This paper explores the use of transport models with ordinary regression and regression on ranks to develop approximate response functions of concentrations at critical locations as a function of pumping and recharge at decision wells. These response functions combined with other constraints can often be solved very easily and may suggest reasonable starting points for combined simulation-management modeling or even relatively efficient operating schemes in themselves.

  19. Development of LACIE CCEA-1 weather/wheat yield models. [regression analysis

    NASA Technical Reports Server (NTRS)

    Strommen, N. D.; Sakamoto, C. M.; Leduc, S. K.; Umberger, D. E. (Principal Investigator)

    1979-01-01

    The advantages and disadvantages of the causal (phenological, dynamic, physiological), statistical regression, and analog approaches to modeling for grain yield are examined. Given LACIE's primary goal of estimating wheat production for the large areas of eight major wheat-growing regions, the statistical regression approach of correlating historical yield and climate data offered the Center for Climatic and Environmental Assessment the greatest potential return within the constraints of time and data sources. The basic equation for the first generation wheat-yield model is given. Topics discussed include truncation, trend variable, selection of weather variables, episodic events, strata selection, operational data flow, weighting, and model results.

  20. Deep ensemble learning of sparse regression models for brain disease diagnosis.

    PubMed

    Suk, Heung-Il; Lee, Seong-Whan; Shen, Dinggang

    2017-04-01

    Recent studies on brain imaging analysis witnessed the core roles of machine learning techniques in computer-assisted intervention for brain disease diagnosis. Of various machine-learning techniques, sparse regression models have proved their effectiveness in handling high-dimensional data but with a small number of training samples, especially in medical problems. In the meantime, deep learning methods have been making great successes by outperforming the state-of-the-art performances in various applications. In this paper, we propose a novel framework that combines the two conceptually different methods of sparse regression and deep learning for Alzheimer's disease/mild cognitive impairment diagnosis and prognosis. Specifically, we first train multiple sparse regression models, each of which is trained with different values of a regularization control parameter. Thus, our multiple sparse regression models potentially select different feature subsets from the original feature set; thereby they have different powers to predict the response values, i.e., clinical label and clinical scores in our work. By regarding the response values from our sparse regression models as target-level representations, we then build a deep convolutional neural network for clinical decision making, which thus we call 'Deep Ensemble Sparse Regression Network.' To our best knowledge, this is the first work that combines sparse regression models with deep neural network. In our experiments with the ADNI cohort, we validated the effectiveness of the proposed method by achieving the highest diagnostic accuracies in three classification tasks. We also rigorously analyzed our results and compared with the previous studies on the ADNI cohort in the literature.

  1. Restricted spatial regression in practice: Geostatistical models, confounding, and robustness under model misspecification

    USGS Publications Warehouse

    Hanks, Ephraim M.; Schliep, Erin M.; Hooten, Mevin B.; Hoeting, Jennifer A.

    2015-01-01

    In spatial generalized linear mixed models (SGLMMs), covariates that are spatially smooth are often collinear with spatially smooth random effects. This phenomenon is known as spatial confounding and has been studied primarily in the case where the spatial support of the process being studied is discrete (e.g., areal spatial data). In this case, the most common approach suggested is restricted spatial regression (RSR) in which the spatial random effects are constrained to be orthogonal to the fixed effects. We consider spatial confounding and RSR in the geostatistical (continuous spatial support) setting. We show that RSR provides computational benefits relative to the confounded SGLMM, but that Bayesian credible intervals under RSR can be inappropriately narrow under model misspecification. We propose a posterior predictive approach to alleviating this potential problem and discuss the appropriateness of RSR in a variety of situations. We illustrate RSR and SGLMM approaches through simulation studies and an analysis of malaria frequencies in The Gambia, Africa.
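
    The restriction at the heart of RSR is a projection of the spatial random effect onto the orthogonal complement of the fixed-effect design matrix. The sketch below shows only that projection step, on synthetic matrices; the full SGLMM fitting machinery is omitted.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n = 100
    X = rng.normal(size=(n, 3))            # stand-in spatially smooth covariates

    # Projection onto the orthogonal complement of col(X): P = I - X (X'X)^-1 X'.
    P_perp = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

    w = rng.normal(size=n)                 # stand-in spatial random effect
    w_rsr = P_perp @ w                     # restricted effect, orthogonal to X

    # Verify orthogonality to the fixed effects (~0 up to rounding error).
    print(np.abs(X.T @ w_rsr).max())
    ```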

  2. Ranking contributing areas of salt and selenium in the Lower Gunnison River Basin, Colorado, using multiple linear regression models

    USGS Publications Warehouse

    Linard, Joshua I.

    2013-01-01

    Mitigating the effects of salt and selenium on water quality in the Grand Valley and lower Gunnison River Basin in western Colorado is a major concern for land managers. Previous modeling indicated means to improve the models by including more detailed geospatial data and a more rigorous method for developing the models. After evaluating all possible combinations of geospatial variables, four multiple linear regression models resulted that could estimate irrigation-season salt yield, nonirrigation-season salt yield, irrigation-season selenium yield, and nonirrigation-season selenium yield. The adjusted r-squared and the residual standard error (in units of log-transformed yield) of the models were, respectively, 0.87 and 2.03 for the irrigation-season salt model, 0.90 and 1.25 for the nonirrigation-season salt model, 0.85 and 2.94 for the irrigation-season selenium model, and 0.93 and 1.75 for the nonirrigation-season selenium model. The four models were used to estimate yields and loads from contributing areas corresponding to 12-digit hydrologic unit codes in the lower Gunnison River Basin study area. Each of the 175 contributing areas was ranked according to its estimated mean seasonal yield of salt and selenium.

  3. Experimental evaluation of regression model-based walking speed estimation using lower body-mounted IMU.

    PubMed

    Zihajehzadeh, Shaghayegh; Park, Edward J

    2016-08-01

    This study provides a concurrent comparison of regression model-based walking speed estimation accuracy using lower body mounted inertial sensors. The comparison is based on different sets of variables, features, mounting locations and regression methods. An experimental evaluation was performed on 15 healthy subjects during free walking trials. Our results show better accuracy of Gaussian process regression compared to least square regression using Lasso. Among the variables, external acceleration tends to provide improved accuracy. By using both time-domain and frequency-domain features, waist and ankle-mounted sensors result in similar accuracies: 4.5% for the waist and 4.9% for the ankle. When using only frequency-domain features, estimation accuracy based on a waist-mounted sensor suffers more compared to the one from ankle.

  4. Multi-parameter regression survival modeling: An alternative to proportional hazards.

    PubMed

    Burke, K; MacKenzie, G

    2016-11-28

    It is standard practice for covariates to enter a parametric model through a single distributional parameter of interest, for example, the scale parameter in many standard survival models. Indeed, the well-known proportional hazards model is of this kind. In this article, we discuss a more general approach whereby covariates enter the model through more than one distributional parameter simultaneously (e.g., scale and shape parameters). We refer to this practice as "multi-parameter regression" (MPR) modeling and explore its use in a survival analysis context. We find that multi-parameter regression leads to more flexible models which can offer greater insight into the underlying data generating process. To illustrate the concept, we consider the two-parameter Weibull model which leads to time-dependent hazard ratios, thus relaxing the typical proportional hazards assumption and motivating a new test of proportionality. A novel variable selection strategy is introduced for such multi-parameter regression models. It accounts for the correlation arising between the estimated regression coefficients in two or more linear predictors-a feature which has not been considered by other authors in similar settings. The methods discussed have been implemented in the mpr package in R.
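
    A sketch of the MPR idea for the two-parameter Weibull model named in the abstract, with a single covariate entering both the scale and the shape linear predictors through log links. The parameterization h(t) = lam * gam * t**(gam - 1) and the simulated data are assumptions for illustration. Because the shape also depends on the covariate, the implied hazard ratio varies with time, which is exactly the relaxation of proportional hazards the authors describe.

    ```python
    import numpy as np
    from scipy.optimize import minimize

    def neg_loglik(beta, t, x):
        """Weibull negative log-likelihood with covariate x in scale AND shape."""
        b0, b1, g0, g1 = beta
        lam = np.exp(b0 + b1 * x)   # scale linear predictor (log link)
        gam = np.exp(g0 + g1 * x)   # shape linear predictor (log link)
        # log f(t) = log h(t) - H(t), with h(t) = lam*gam*t**(gam-1), H(t) = lam*t**gam
        return -np.sum(np.log(lam * gam) + (gam - 1.0) * np.log(t) - lam * t**gam)

    rng = np.random.default_rng(0)
    x = rng.integers(0, 2, 300).astype(float)   # binary covariate (e.g., treatment)
    u = rng.uniform(size=300)
    t = (-np.log(1.0 - u) / np.exp(0.2 + 0.5 * x)) ** (1.0 / np.exp(0.1 - 0.3 * x))

    res = minimize(neg_loglik, x0=np.zeros(4), args=(t, x), method="Nelder-Mead")
    print(res.x)   # estimates of (b0, b1, g0, g1)
    ```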

  5. Regression Methods for Categorical Dependent Variables: Effects on a Model of Student College Choice

    ERIC Educational Resources Information Center

    Rapp, Kelly E.

    2012-01-01

    The use of categorical dependent variables with the classical linear regression model (CLRM) violates many of the model's assumptions and may result in biased estimates (Long, 1997; O'Connell, Goldstein, Rogers, & Peng, 2008). Many dependent variables of interest to educational researchers (e.g., professorial rank, educational attainment) are…

  6. An Additional Measure of Overall Effect Size for Logistic Regression Models

    ERIC Educational Resources Information Center

    Allen, Jeff; Le, Huy

    2008-01-01

    Users of logistic regression models often need to describe the overall predictive strength, or effect size, of the model's predictors. Analogs of R[superscript 2] have been developed, but none of these measures are interpretable on the same scale as effects of individual predictors. Furthermore, R[superscript 2] analogs are not invariant to the…

  7. Computational Tools for Probing Interactions in Multiple Linear Regression, Multilevel Modeling, and Latent Curve Analysis

    ERIC Educational Resources Information Center

    Preacher, Kristopher J.; Curran, Patrick J.; Bauer, Daniel J.

    2006-01-01

    Simple slopes, regions of significance, and confidence bands are commonly used to evaluate interactions in multiple linear regression (MLR) models, and the use of these techniques has recently been extended to multilevel or hierarchical linear modeling (HLM) and latent curve analysis (LCA). However, conducting these tests and plotting the…

  8. Getting Straight: Everything You Always Wanted to Know about the Title I Regression Model and Curvilinearity.

    ERIC Educational Resources Information Center

    Echternacht, Gary; Swinton, Spencer

    Title I evaluations using the RMC Model C design depend for their interpretation on the assumption that the regression of posttest on pretest is linear across the cut score level when there is no treatment; but there are many instances where nonlinearities may occur. If one applies the analysis of covariance, or model C analysis, large errors may…

  9. INTRODUCTION TO A COMBINED MULTIPLE LINEAR REGRESSION AND ARMA MODELING APPROACH FOR BEACH BACTERIA PREDICTION

    EPA Science Inventory

    Due to the complexity of the processes contributing to beach bacteria concentrations, many researchers rely on statistical modeling, among which multiple linear regression (MLR) modeling is most widely used. Despite its ease of use and interpretation, there may be time dependence...
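
    The abstract is truncated, but the combined approach it names can be sketched: fit a multiple linear regression for the deterministic part, then capture the remaining time dependence by fitting an ARMA model to the residuals. All data below are simulated, and the predictor names are placeholders.

    ```python
    import numpy as np
    import statsmodels.api as sm
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(0)
    n = 300
    X = rng.normal(size=(n, 2))            # stand-ins for rainfall, turbidity, etc.
    eps = np.zeros(n)
    for i in range(1, n):                  # AR(1) errors mimic day-to-day persistence
        eps[i] = 0.6 * eps[i - 1] + rng.normal(0.0, 0.3)
    y = 1.0 + X @ np.array([0.8, -0.5]) + eps   # e.g., log bacteria concentration

    mlr = sm.OLS(y, sm.add_constant(X)).fit()       # multiple linear regression stage
    arma = ARIMA(mlr.resid, order=(1, 0, 1)).fit()  # ARMA(1,1) on the MLR residuals
    print(arma.params)
    ```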

  10. Sample Size Determination for Regression Models Using Monte Carlo Methods in R

    ERIC Educational Resources Information Center

    Beaujean, A. Alexander

    2014-01-01

    A common question asked by researchers using regression models is, What sample size is needed for my study? While there are formulae to estimate sample sizes, their assumptions are often not met in the collected data. A more realistic approach to sample size determination requires more information such as the model of interest, strength of the…
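
    The article works in R; the same Monte Carlo logic is sketched below in Python: simulate data from the model of interest at a candidate sample size, refit repeatedly, and record how often the target effect is detected. The effect size, significance level, and grid of sample sizes are illustrative choices.

    ```python
    import numpy as np
    import statsmodels.api as sm

    def simulated_power(n, slope=0.3, sims=1000, alpha=0.05, seed=0):
        """Monte Carlo power to detect `slope` in a simple regression at size n."""
        rng = np.random.default_rng(seed)
        hits = 0
        for _ in range(sims):
            x = rng.normal(size=n)
            y = slope * x + rng.normal(size=n)
            fit = sm.OLS(y, sm.add_constant(x)).fit()
            hits += fit.pvalues[1] < alpha
        return hits / sims

    # Walk n upward until the simulated power clears the conventional 0.80 target.
    for n in (50, 80, 120, 200):
        print(n, simulated_power(n))
    ```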

  11. Time series modeling by a regression approach based on a latent process.

    PubMed

    Chamroukhi, Faicel; Samé, Allou; Govaert, Gérard; Aknin, Patrice

    2009-01-01

    Time series are used in many domains, including finance, engineering, economics, and bioinformatics, generally to represent the change of a measurement over time. Modeling techniques may then be used to give a synthetic representation of such data. A new approach for time series modeling is proposed in this paper. It consists of a regression model incorporating a discrete hidden logistic process that allows different polynomial regression models to be activated smoothly or abruptly. The model parameters are estimated by maximum likelihood via a dedicated Expectation Maximization (EM) algorithm. The M step of the EM algorithm uses a multi-class Iterative Reweighted Least-Squares (IRLS) algorithm to estimate the hidden process parameters. To evaluate the proposed approach, an experimental study on simulated data and real-world data was performed against two alternative approaches: a heteroskedastic piecewise regression model using a global optimization algorithm based on dynamic programming, and a hidden Markov regression model whose parameters are estimated by the Baum-Welch algorithm. Finally, in the context of the remote monitoring of components of the French railway infrastructure, and more particularly the switch mechanism, the proposed approach has been applied to modeling and classifying time series representing the condition measurements acquired during switch operations.

  12. Predicting recovery of cognitive function soon after stroke: differential modeling of logarithmic and linear regression.

    PubMed

    Suzuki, Makoto; Sugimura, Yuko; Yamada, Sumio; Omori, Yoshitsugu; Miyamoto, Masaaki; Yamamoto, Jun-ichi

    2013-01-01

    Cognitive disorders in the acute stage of stroke are common and are important independent predictors of adverse outcome in the long term. Despite the impact of cognitive disorders on both patients and their families, it is still difficult to predict the extent or duration of cognitive impairments. The objective of the present study was, therefore, to provide data on predicting the recovery of cognitive function soon after stroke by differential modeling with logarithmic and linear regression. This study included two rounds of data collection comprising 57 stroke patients enrolled in the first round for the purpose of identifying the time course of cognitive recovery in the early-phase group data, and 43 stroke patients in the second round for the purpose of ensuring that the correlation of the early-phase group data applied to the prediction of each individual's degree of cognitive recovery. In the first round, Mini-Mental State Examination (MMSE) scores were assessed 3 times during hospitalization, and the scores were regressed on the logarithm and linear of time. In the second round, calculations of MMSE scores were made for the first two scoring times after admission to tailor the structures of logarithmic and linear regression formulae to fit an individual's degree of functional recovery. The time course of early-phase recovery for cognitive functions resembled both logarithmic and linear functions. However, MMSE scores sampled at two baseline points based on logarithmic regression modeling could estimate prediction of cognitive recovery more accurately than could linear regression modeling (logarithmic modeling, R(2) = 0.676, P<0.0001; linear regression modeling, R(2) = 0.598, P<0.0001). Logarithmic modeling based on MMSE scores could accurately predict the recovery of cognitive function soon after the occurrence of stroke. This logarithmic modeling with mathematical procedures is simple enough to be adopted in daily clinical practice.
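
    The differential-modeling step amounts to regressing MMSE scores on time and on log-time and comparing the fits. A sketch with made-up scores follows; the time points and values are not from the study.

    ```python
    import numpy as np

    t = np.array([3.0, 7.0, 14.0, 21.0, 28.0, 42.0])       # days after stroke (hypothetical)
    mmse = np.array([14.0, 18.0, 21.0, 23.0, 24.0, 25.0])  # hypothetical MMSE scores

    def r_squared(y, yhat):
        return 1.0 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

    lin = np.polyfit(t, mmse, 1)            # MMSE = a + b * t
    log = np.polyfit(np.log(t), mmse, 1)    # MMSE = a + b * log(t)

    print(r_squared(mmse, np.polyval(lin, t)))          # linear model R^2
    print(r_squared(mmse, np.polyval(log, np.log(t))))  # logarithmic model R^2
    ```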

  13. Prediction of soil temperature using regression and artificial neural network models

    NASA Astrophysics Data System (ADS)

    Bilgili, Mehmet

    2010-12-01

    In this study, monthly soil temperature was modeled by linear regression (LR), nonlinear regression (NLR) and artificial neural network (ANN) methods. The soil temperature and other meteorological parameters, taken from the Adana meteorological station, were observed between the years 2000 and 2007 by the Turkish State Meteorological Service (TSMS). The soil temperatures were measured at depths of 5, 10, 20, 50 and 100 cm below ground level. A three-layer feed-forward ANN structure was constructed and a back-propagation algorithm was used for the training of the ANNs. In order to obtain a successful simulation, the correlation coefficients between all of the meteorological variables (soil temperature, atmospheric temperature, atmospheric pressure, relative humidity, wind speed, rainfall, global solar radiation and sunshine duration) were calculated pairwise. First, all independent variables were split into two time periods, the cold and warm seasons, and entered into the regression model. Then, stepwise multiple regression was applied to select the "best" regression equation (model). Thus, the best independent variables were selected for the LR and NLR models, and they were also used in the input layer of the ANN method. The results of these methods were compared to each other. Finally, the ANN method was found to provide better performance than the LR and NLR methods.

  14. Spatial modeling of the schistosomiasis mansoni in Minas Gerais State, Brazil using spatial regression.

    PubMed

    Fonseca, F; Freitas, C; Dutra, L; Guimarães, R; Carvalho, O

    2014-05-01

    Schistosomiasis is a transmissible parasitic disease caused by the etiologic agent Schistosoma mansoni, whose intermediate hosts are snails of the genus Biomphalaria. The main goal of this paper is to estimate the prevalence of schistosomiasis in Minas Gerais State in Brazil using spatial disease information derived from the state transportation network of roads and rivers. The spatial information was incorporated in two ways: by introducing new variables that carry spatial neighborhood information and by using spatial regression models. Climate, socioeconomic and environmental variables were also used as co-variables to build models and use them to estimate a risk map for the whole state of Minas Gerais. The results show that the models constructed from the spatial regression produced a better fit, providing smaller root mean square error (RMSE) values. When no spatial information was used, the RMSE for the whole state of Minas Gerais reached 9.5%; with spatial regression, the RMSE reaches 8.8% (when the new variables are added to the model) and 8.5% (with the use of spatial regression). Variables representing vegetation, temperature, precipitation, topography, sanitation and human development indexes were important in explaining the spread of disease and identified certain conditions that are favorable for disease development. The use of spatial regression for the network of roads and rivers produced meaningful results for health management procedures and directing activities, enabling better detection of disease risk areas.

  15. The role of a murine transplantation model of atherosclerosis regression in drug discovery.

    PubMed

    Feig, Jonathan E; Quick, John S; Fisher, Edward A

    2009-03-01

    Atherosclerosis is the leading cause of death worldwide. To date, the use of statins to lower LDL levels has been the major intervention used to delay or halt disease progression. These drugs have an incomplete impact on plaque burden and risk, however, as evidenced by the substantial rates of myocardial infarctions that occur in large-scale clinical trials of statins. Thus, it is hoped that by understanding the factors that lead to plaque regression, better approaches to treating atherosclerosis may be developed. A transplantation-based mouse model of atherosclerosis regression has been developed by allowing plaques to form in a model of human atherosclerosis, the apoE-deficient mouse, and then placing these plaques into recipient mice with a normolipidemic plasma environment. Under these conditions, the depletion of foam cells occurs. Interestingly, the disappearance of foam cells was primarily due to migration in a CCR7-dependent manner to regional and systemic lymph nodes after 3 days in the normolipidemic (regression) environment. Further studies using this transplant model demonstrated that liver X receptor and HDL are other factors likely to be involved in plaque regression. In conclusion, through the use of this transplant model, the process of uncovering the pathways regulating atherosclerosis regression has begun, which will ultimately lead to the identification of new therapeutic targets.

  16. Modeling of crossbred cattle growth, comparison between cubic and piecewise random regression models.

    PubMed

    Mirzaei, H R; Pitchford, W S; Verbyla, A P

    2011-09-27

    Two analyses, cubic and piecewise random regression, were conducted to model growth of crossbred cattle from birth to about two years of age, investigating the ability of a piecewise procedure to fit growth traits without the complications of the cubic model. During a four-year period (1994-1997) of the Australian "Southern Crossbreeding Project", mature Hereford cows (N = 581) were mated to 97 sires of Angus, Belgian Blue, Hereford, Jersey, Limousin, South Devon, and Wagyu breeds, resulting in 1141 steers and heifers born over four years. Data included 13 (for steers) and eight (for heifers) live body weight measurements, made approximately every 50 days from birth until slaughter. The mixed model included fixed effects of sex, sire breed, age (linear, quadratic and cubic), and their interactions between sex and sire breed with age. Random effects were sire, dam, management (birth location, year, post-weaning groups), and permanent environmental effects and for each of these when possible, their interactions with linear, quadratic and cubic growth. In both models, body weights of all breeds increased over pre-weaning period, held fairly steady (slightly flattening) over the dry season then increased again towards the end of the feedlot period. The number of estimated parameters for the cubic model was 22 while for the piecewise model it was 32. It was concluded that the piecewise model was very similar to the cubic model in the fit to the data; with the piecewise model being marginally better. The piecewise model seems to fit the data better at the end of the growth period.

  17. Effects of model sensitivity and nonlinearity on nonlinear regression of ground water flow

    USGS Publications Warehouse

    Yager, R.M.

    2004-01-01

    Nonlinear regression is increasingly applied to the calibration of hydrologic models through the use of perturbation methods to compute the Jacobian or sensitivity matrix required by the Gauss-Newton optimization method. Sensitivities obtained by perturbation methods can be less accurate than those obtained by direct differentiation, however, and concern has arisen that the optimal parameter values and the associated parameter covariance matrix computed by perturbation could also be less accurate. Sensitivities computed by both perturbation and direct differentiation were applied in nonlinear regression calibration of seven ground water flow models. The two methods gave virtually identical optimum parameter values and covariances for the three models that were relatively linear and two of the models that were relatively nonlinear, but gave widely differing results for two other nonlinear models. The perturbation method performed better than direct differentiation in some regressions with the nonlinear models, apparently because approximate sensitivities computed for an interval yielded better search directions than did more accurately computed sensitivities for a point. The method selected to avoid overshooting minima on the error surface when updating parameter values with the Gauss-Newton procedure appears for nonlinear models to be more important than the method of sensitivity calculation in controlling regression convergence.
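
    A perturbation (forward-difference) sensitivity matrix of the kind compared in the paper can be sketched generically. The "flow model" here is a toy nonlinear function of two parameters, not an actual ground water code, and the step-size rule is a common but assumed choice.

    ```python
    import numpy as np

    def jacobian_fd(f, p, h=1e-6):
        """Forward-difference (perturbation) Jacobian of model outputs wrt parameters."""
        f0 = np.asarray(f(p))
        J = np.empty((f0.size, p.size))
        for j in range(p.size):
            dp = np.zeros_like(p)
            dp[j] = h * max(1.0, abs(p[j]))   # scale the step to the parameter size
            J[:, j] = (np.asarray(f(p + dp)) - f0) / dp[j]
        return J

    # Toy "flow model": three simulated heads as nonlinear functions of two parameters.
    model = lambda p: np.array([p[0] * np.exp(-p[1]),
                                p[0] + p[1] ** 2,
                                np.sin(p[0] * p[1])])
    J = jacobian_fd(model, np.array([1.0, 0.5]))
    print(J)   # sensitivity matrix of the kind used by Gauss-Newton regression
    ```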

  18. Graphical assessment of internal and external calibration of logistic regression models by using loess smoothers.

    PubMed

    Austin, Peter C; Steyerberg, Ewout W

    2014-02-10

    Predicting the probability of the occurrence of a binary outcome or condition is important in biomedical research. While assessing discrimination is an essential issue in developing and validating binary prediction models, less attention has been paid to methods for assessing model calibration. Calibration refers to the degree of agreement between observed and predicted probabilities and is often assessed by testing for lack-of-fit. The objective of our study was to examine the ability of graphical methods to assess the calibration of logistic regression models. We examined lack of internal calibration, which was related to misspecification of the logistic regression model, and external calibration, which was related to an overfit model or to shrinkage of the linear predictor. We conducted an extensive set of Monte Carlo simulations with a locally weighted least squares regression smoother (i.e., the loess algorithm) to examine the ability of graphical methods to assess model calibration. We found that loess-based methods were able to provide evidence of moderate departures from linearity and indicate omission of a moderately strong interaction. Misspecification of the link function was harder to detect. Visual patterns were clearer with higher sample sizes, higher incidence of the outcome, or higher discrimination. Loess-based methods were also able to identify the lack of calibration in external validation samples when an overfit regression model had been used. In conclusion, loess-based smoothing methods are adequate tools to graphically assess calibration and merit wider application.
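
    A minimal version of the graphical check: fit a logistic model, then loess-smooth the observed outcomes against the predicted probabilities; calibration is good where the smoothed curve tracks the 45-degree line. The data are simulated and the smoothing fraction is an arbitrary choice.

    ```python
    import numpy as np
    import statsmodels.api as sm
    from statsmodels.nonparametric.smoothers_lowess import lowess

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 2))
    p_true = 1.0 / (1.0 + np.exp(-(0.5 + X[:, 0] - 0.8 * X[:, 1])))
    y = rng.binomial(1, p_true)

    fit = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
    p_hat = fit.predict(sm.add_constant(X))

    # Loess smooth of observed outcome vs. predicted probability; deviations of
    # the smoothed curve from the identity line indicate miscalibration.
    smooth = lowess(y, p_hat, frac=0.6)   # columns: sorted p_hat, smoothed outcome
    print(smooth[:5])
    ```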

  19. A general framework for the use of logistic regression models in meta-analysis.

    PubMed

    Simmonds, Mark C; Higgins, Julian Pt

    2016-12-01

    Where individual participant data are available for every randomised trial in a meta-analysis of dichotomous event outcomes, "one-stage" random-effects logistic regression models have been proposed as a way to analyse these data. Such models can also be used even when individual participant data are not available and we have only summary contingency table data. One benefit of this one-stage regression model over conventional meta-analysis methods is that it maximises the correct binomial likelihood for the data and so does not require the common assumption that effect estimates are normally distributed. A second benefit of using this model is that it may be applied, with only minor modification, in a range of meta-analytic scenarios, including meta-regression, network meta-analyses and meta-analyses of diagnostic test accuracy. This single model can potentially replace the variety of often complex methods used in these areas. This paper considers, with a range of meta-analysis examples, how random-effects logistic regression models may be used in a number of different types of meta-analyses. This one-stage approach is compared with widely used meta-analysis methods including Bayesian network meta-analysis and the bivariate and hierarchical summary receiver operating characteristic (ROC) models for meta-analyses of diagnostic test accuracy.

  20. Logistic Regression

    NASA Astrophysics Data System (ADS)

    Grégoire, G.

    2014-12-01

    Logistic regression is originally intended to explain the relationship between the probability of an event and a set of covariables. The model's coefficients can be interpreted via odds and odds ratios, which are presented in the introduction of the chapter. When the observations are obtained individually, we speak of binary logistic regression; when they are grouped, the logistic regression is said to be binomial. Our presentation mainly focuses on the binary case. For statistical inference, the main tool is maximum likelihood methodology: we present the Wald, Rao, and likelihood ratio results and their use in comparing nested models. The problems we deal with are essentially the same as in multiple linear regression: testing a global effect, testing individual effects, selecting variables to build a model, measuring the fit of the model, and predicting new values. The methods are demonstrated on data sets using R. Finally, we briefly consider the binomial case and the situation where we are interested in several events, that is, polytomous (multinomial) logistic regression and the particular case of ordinal logistic regression.

  1. Adjustment of regional climate model output for modeling the climatic mass balance of all glaciers on Svalbard.

    PubMed

    Möller, Marco; Obleitner, Friedrich; Reijmer, Carleen H; Pohjola, Veijo A; Głowacki, Piotr; Kohler, Jack

    2016-05-27

    Large-scale modeling of glacier mass balance often relies on the output from regional climate models (RCMs). However, the limited accuracy and spatial resolution of RCM output pose limitations on mass balance simulations at subregional or local scales. Moreover, RCM output is still rarely available over larger regions or for longer time periods. This study evaluates the extent to which it is possible to derive reliable region-wide glacier mass balance estimates, using coarse resolution (10 km) RCM output for model forcing. Our data cover the entire Svalbard archipelago over one decade. To calculate mass balance, we use an index-based model. Model parameters are not calibrated, but the RCM air temperature and precipitation fields are adjusted using in situ mass balance measurements as reference. We compare two different calibration methods: root mean square error minimization and regression optimization. The obtained air temperature shifts (+1.43°C versus +2.22°C) and precipitation scaling factors (1.23 versus 1.86) differ considerably between the two methods, which we attribute to inhomogeneities in the spatiotemporal distribution of the reference data. Our modeling suggests a mean annual climatic mass balance of -0.05 ± 0.40 m w.e. a^-1 for Svalbard over 2000-2011 and a mean equilibrium line altitude of 452 ± 200 m above sea level. We find that the limited spatial resolution of the RCM forcing with respect to real surface topography and the usage of spatially homogeneous RCM output adjustments and mass balance model parameters are responsible for much of the modeling uncertainty. Sensitivity of the results to model parameter uncertainty is comparably small and of minor importance.
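
    The RMSE-minimization variant of the adjustment can be sketched as a two-parameter optimization over a temperature shift and a precipitation scaling factor. The mass balance "model" below is a toy degree-day-style stand-in, and all coefficients and data are synthetic.

    ```python
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    t_rcm = rng.normal(-5.0, 3.0, 120)     # stand-in RCM air temperatures (deg C)
    p_rcm = rng.gamma(2.0, 1.0, 120)       # stand-in RCM precipitation

    def mass_balance(dT, s):
        """Toy balance: scaled accumulation minus degree-day melt (illustrative)."""
        melt = 0.004 * np.clip(t_rcm + dT, 0.0, None)
        return 0.001 * s * p_rcm - melt

    # Synthetic "in situ" reference measurements with known true adjustments.
    obs = mass_balance(1.5, 1.3) + rng.normal(0.0, 0.02, 120)

    rmse = lambda p: np.sqrt(np.mean((mass_balance(p[0], p[1]) - obs) ** 2))
    dT_opt, s_opt = minimize(rmse, x0=[0.0, 1.0], method="Nelder-Mead").x
    print(dT_opt, s_opt)   # recovered temperature shift and precipitation scaling
    ```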

  2. Establishment of In Silico Prediction Models for CYP3A4 and CYP2B6 Induction in Human Hepatocytes by Multiple Regression Analysis Using Azole Compounds.

    PubMed

    Nagai, Mika; Konno, Yoshihiro; Satsukawa, Masahiro; Yamashita, Shinji; Yoshinari, Kouichi

    2016-08-01

    Drug-drug interactions (DDIs) via cytochrome P450 (P450) induction are one clinical problem leading to increased risk of adverse effects and the need for dosage adjustments and additional therapeutic monitoring. In silico models for predicting P450 induction are useful for avoiding DDI risk. In this study, we have established regression models for CYP3A4 and CYP2B6 induction in human hepatocytes using several physicochemical parameters for a set of azole compounds with different P450 induction as characteristics as model compounds. To obtain a well-correlated regression model, the compounds for CYP3A4 or CYP2B6 induction were independently selected from the tested azole compounds using principal component analysis with fold-induction data. Both of the multiple linear regression models obtained for CYP3A4 and CYP2B6 induction are represented by different sets of physicochemical parameters. The adjusted coefficients of determination for these models were of 0.8 and 0.9, respectively. The fold-induction of the validation compounds, another set of 12 azole-containing compounds, were predicted within twofold limits for both CYP3A4 and CYP2B6. The concordance for the prediction of CYP3A4 induction was 87% with another validation set, 23 marketed drugs. However, the prediction of CYP2B6 induction tended to be overestimated for these marketed drugs. The regression models show that lipophilicity mostly contributes to CYP3A4 induction, whereas not only the lipophilicity but also the molecular polarity is important for CYP2B6 induction. Our regression models, especially that for CYP3A4 induction, might provide useful methods to avoid potent CYP3A4 or CYP2B6 inducers during the lead optimization stage without performing induction assays in human hepatocytes.

  3. A Linear Regression and Markov Chain Model for the Arabian Horse Registry

    DTIC Science & Technology

    1993-04-01

    A linear regression and Markov chain model was developed for the Arabian Horse Registry, which needed to forecast its future registration of purebred Arabian horses. A linear regression model was utilized to project future registrations; the supporting data included an owner questionnaire that covered, among other items, the tax treatment of horse activities.

  4. Good practice guidelines for the use of statistical regression models in economic evaluations.

    PubMed

    Kearns, Ben; Ara, Roberta; Wailoo, Allan; Manca, Andrea; Alava, Monica Hernández; Abrams, Keith; Campbell, Mike

    2013-08-01

    Decision-analytic models (DAMs) used to evaluate the cost effectiveness of interventions are pivotal sources of evidence used in economic evaluations. Parameter estimates used in the DAMs are often based on the results of a regression analysis, but there is little guidance relating to these. This study had two objectives. The first was to identify the frequency of use of regression models in economic evaluations, the parameters they inform, and the amount of information reported to describe and support the analyses. The second objective was to provide guidance to improve practice in this area, based on the review. The review concentrated on a random sample of economic evaluations submitted to the UK National Institute for Health and Clinical Excellence (NICE) as part of its technology appraisal process. Based on these findings, recommendations for good practice were drafted, together with a checklist for critiquing reporting standards in this area. Based on the results of this review, statistical regression models are in widespread use in DAMs used to support economic evaluations, yet reporting of basic information, such as the sample size used and measures of uncertainty, is limited. Recommendations were formed about how reporting standards could be improved to better meet the needs of decision makers. These recommendations are summarised in a checklist, which may be used by both those conducting regression analyses and those critiquing them, to identify what should be reported when using the results of a regression analysis within a DAM.

  5. Analysis of Multivariate Experimental Data Using A Simplified Regression Model Search Algorithm

    NASA Technical Reports Server (NTRS)

    Ulbrich, Norbert Manfred

    2013-01-01

    A new regression model search algorithm was developed in 2011 that may be used to analyze both general multivariate experimental data sets and wind tunnel strain-gage balance calibration data. The new algorithm is a simplified version of a more complex search algorithm that was originally developed at the NASA Ames Balance Calibration Laboratory. The new algorithm has the advantage that it needs only about one tenth of the original algorithm's CPU time for the completion of a search. In addition, extensive testing showed that the prediction accuracy of math models obtained from the simplified algorithm is similar to the prediction accuracy of math models obtained from the original algorithm. The simplified algorithm, however, cannot guarantee that search constraints related to a set of statistical quality requirements are always satisfied in the optimized regression models. Therefore, the simplified search algorithm is not intended to replace the original search algorithm. Instead, it may be used to generate an alternate optimized regression model of experimental data whenever the application of the original search algorithm either fails or requires too much CPU time. Data from a machine calibration of NASA's MK40 force balance is used to illustrate the application of the new regression model search algorithm.

  6. Multiple regression analysis in modelling of carbon dioxide emissions by energy consumption use in Malaysia

    NASA Astrophysics Data System (ADS)

    Keat, Sim Chong; Chun, Beh Boon; San, Lim Hwee; Jafri, Mohd Zubir Mat

    2015-04-01

    Climate change due to carbon dioxide (CO2) emissions is one of the most complex challenges threatening our planet. The issue is of great international concern and is primarily attributed to the burning of fossil fuels. In this paper, a regression model is used to analyze the relationship between CO2 emissions and energy consumption in Malaysia, using time series data for the period 1980-2010. The equations were developed with a regression model based on the eight major sources that contribute to CO2 emissions: non-energy use, Liquefied Petroleum Gas (LPG), diesel, kerosene, refinery gas, Aviation Turbine Fuel (ATF) and Aviation Gasoline (AV Gas), fuel oil, and motor petrol. Part of the data (1980-2000) was used to fit the regression model and the remainder (2001-2010) to validate it. Comparison of the model predictions with the measured data showed a high coefficient of determination (R2 = 0.9544), indicating the model's accuracy and efficiency. Such a model can support early warning so that the population can comply with air quality standards.
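
    The fit-then-validate split described can be sketched as below; the eight fuel series are simulated stand-ins for the Malaysian consumption data, and the coefficients are arbitrary.

    ```python
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import r2_score

    years = np.arange(1980, 2011)
    rng = np.random.default_rng(0)
    X = rng.uniform(size=(years.size, 8))   # stand-ins for the eight fuel-use series
    true_coef = np.array([3.0, 2.0, 1.5, 1.0, 0.8, 0.5, 0.4, 0.2])
    co2 = X @ true_coef + rng.normal(0.0, 0.3, years.size)

    train = years <= 2000                   # 1980-2000 to fit, 2001-2010 to validate
    reg = LinearRegression().fit(X[train], co2[train])
    print(r2_score(co2[~train], reg.predict(X[~train])))   # out-of-sample R^2
    ```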

  7. Comparing regression methods for the two-stage clonal expansion model of carcinogenesis.

    PubMed

    Kaiser, J C; Heidenreich, W F

    2004-11-15

    In the statistical analysis of cohort data with risk estimation models, both Poisson and individual likelihood regressions are widely used methods of parameter estimation. In this paper, their performance has been tested with the biologically motivated two-stage clonal expansion (TSCE) model of carcinogenesis. To exclude the inevitable uncertainties of existing data, cohorts with simple individual exposure histories have been created by Monte Carlo simulation. To mimic properties of atomic bomb survivors and radon-exposed mine workers, both acute and protracted exposure patterns have been generated. The two regression methods have then been compared on their capacity to retrieve a priori known model parameters from the simulated cohort data. For simple models with smooth hazard functions, the parameter estimates from both methods come close to their true values. However, for models with strongly discontinuous functions which are generated by the cell mutation process of transformation, the Poisson regression method fails to produce reliable estimates. This behaviour is explained by the construction of class averages during data stratification, whereby some indispensable information on the individual exposure history is destroyed. It could not be repaired by countermeasures such as the refinement of Poisson classes or a more adequate choice of Poisson groups. Although this choice might still exist, we were unable to discover it. In contrast, the individual likelihood regression technique was found to work reliably for all considered versions of the TSCE model.

  8. Parameter Estimation for Differential Equation Models Using a Framework of Measurement Error in Regression Models.

    PubMed

    Liang, Hua; Wu, Hulin

    2008-12-01

    Differential equation (DE) models are widely used in many scientific fields that include engineering, physics and biomedical sciences. The so-called "forward problem", the problem of simulations and predictions of state variables for given parameter values in the DE models, has been extensively studied by mathematicians, physicists, engineers and other scientists. However, the "inverse problem", the problem of parameter estimation based on the measurements of output variables, has not been well explored using modern statistical methods, although some least squares-based approaches have been proposed and studied. In this paper, we propose parameter estimation methods for ordinary differential equation models (ODE) based on the local smoothing approach and a pseudo-least squares (PsLS) principle under a framework of measurement error in regression models. The asymptotic properties of the proposed PsLS estimator are established. We also compare the PsLS method to the corresponding SIMEX method and evaluate their finite sample performances via simulation studies. We illustrate the proposed approach using an application example from an HIV dynamic study.

  9. Accounting for spatial effects in land use regression for urban air pollution modeling.

    PubMed

    Bertazzon, Stefania; Johnson, Markey; Eccles, Kristin; Kaplan, Gilaad G

    2015-01-01

    In order to accurately assess air pollution risks, health studies require spatially resolved pollution concentrations. Land-use regression (LUR) models estimate ambient concentrations at a fine spatial scale. However, spatial effects such as spatial non-stationarity and spatial autocorrelation can reduce the accuracy of LUR estimates by increasing regression errors and uncertainty; and statistical methods for resolving these effects--e.g., spatially autoregressive (SAR) and geographically weighted regression (GWR) models--may be difficult to apply simultaneously. We used an alternate approach to address spatial non-stationarity and spatial autocorrelation in LUR models for nitrogen dioxide. Traditional models were re-specified to include a variable capturing wind speed and direction, and re-fit as GWR models. Mean R2 values for the resulting GWR-wind models (summer: 0.86, winter: 0.73) showed a 10-20% improvement over traditional LUR models. GWR-wind models effectively addressed both spatial effects and produced meaningful predictive models. These results suggest a useful method for improving spatially explicit models.

  10. Analysis of the influence of quantile regression model on mainland tourists' service satisfaction performance.

    PubMed

    Wang, Wen-Cheng; Cho, Wen-Chien; Chen, Yin-Jen

    2014-01-01

    It is estimated that mainland Chinese tourists travelling to Taiwan can bring annual revenues of 400 billion NTD to the Taiwan economy. Thus, how the Taiwanese Government formulates relevant measures to satisfy both sides is the focus of most concern. Taiwan must improve the facilities and service quality of its tourism industry so as to attract more mainland tourists. This paper conducted a questionnaire survey of mainland tourists and used grey relational analysis in grey mathematics to analyze the satisfaction performance of all satisfaction question items. The first eight satisfaction items were used as independent variables, and the overall satisfaction performance was used as a dependent variable for quantile regression model analysis to discuss the relationship between the dependent variable under different quantiles and independent variables. Finally, this study further discussed the predictive accuracy of the least mean regression model and each quantile regression model, as a reference for research personnel. The analysis results showed that other variables could also affect the overall satisfaction performance of mainland tourists, in addition to occupation and age. The overall predictive accuracy of quantile regression model Q0.25 was higher than that of the other three models.

  11. Analysis of the Influence of Quantile Regression Model on Mainland Tourists' Service Satisfaction Performance

    PubMed Central

    Wang, Wen-Cheng; Cho, Wen-Chien; Chen, Yin-Jen

    2014-01-01

    It is estimated that mainland Chinese tourists travelling to Taiwan can bring annual revenues of 400 billion NTD to the Taiwan economy. Thus, how the Taiwanese Government formulates relevant measures to satisfy both sides is the focus of most concern. Taiwan must improve the facilities and service quality of its tourism industry so as to attract more mainland tourists. This paper conducted a questionnaire survey of mainland tourists and used grey relational analysis in grey mathematics to analyze the satisfaction performance of all satisfaction question items. The first eight satisfaction items were used as independent variables, and the overall satisfaction performance was used as a dependent variable for quantile regression model analysis to discuss the relationship between the dependent variable under different quantiles and independent variables. Finally, this study further discussed the predictive accuracy of the least mean regression model and each quantile regression model, as a reference for research personnel. The analysis results showed that other variables could also affect the overall satisfaction performance of mainland tourists, in addition to occupation and age. The overall predictive accuracy of quantile regression model Q0.25 was higher than that of the other three models. PMID:24574916
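
    A sketch of the model comparison described above (OLS against quantile regression at Q0.25) with statsmodels; the satisfaction items and coefficients are simulated stand-ins for the survey data.

    ```python
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    n = 200
    items = rng.uniform(1, 5, size=(n, 3))      # satisfaction question items
    X = sm.add_constant(items)
    overall = X @ np.array([0.5, 0.8, 0.3, 0.4]) + rng.normal(0, 0.5, n)

    ols = sm.OLS(overall, X).fit()              # least mean regression
    q25 = sm.QuantReg(overall, X).fit(q=0.25)   # quantile regression at Q0.25
    print("OLS:  ", ols.params.round(2))
    print("Q0.25:", q25.params.round(2))
    ```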

  12. A novel strategy for forensic age prediction by DNA methylation and support vector regression model

    PubMed Central

    Xu, Cheng; Qu, Hongzhu; Wang, Guangyu; Xie, Bingbing; Shi, Yi; Yang, Yaran; Zhao, Zhao; Hu, Lan; Fang, Xiangdong; Yan, Jiangwei; Feng, Lei

    2015-01-01

    High deviations arising from the prediction model and from gender and population differences have limited the application of DNA methylation markers to age estimation. Here we identified 2,957 novel age-associated DNA methylation sites (P < 0.01 and R2 > 0.5) in blood of eight pairs of Chinese Han female monozygotic twins. Among them, nine novel sites (false discovery rate < 0.01), along with three other reported sites, were further validated in 49 unrelated female volunteers with ages of 20–80 years by Sequenom Massarray. A total of 95 CpGs were covered in the PCR products, and 11 of them were used to build the age prediction models. After comparing four different models (multivariate linear regression, multivariate nonlinear regression, back-propagation neural network and support vector regression), SVR was identified as the most robust model, with the least mean absolute deviation from real chronological age (2.8 years) and an average accuracy of 4.7 years when predicting with only six of the 11 loci, as well as a smaller cross-validated error compared with the linear regression model. Our novel strategy provides an accurate measurement that is highly useful in estimating individual age in forensic practice as well as in tracking the aging process in other related applications. PMID:26635134
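
    A minimal sketch of the winning model class (support vector regression on a handful of CpG sites), with simulated methylation values in place of the Sequenom data; the drift rate and noise level are illustrative assumptions.

    ```python
    import numpy as np
    from sklearn.svm import SVR
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(3)
    age = rng.uniform(20, 80, 100)
    # Six hypothetical CpG sites whose methylation level drifts with age.
    meth = 0.3 + 0.006 * age[:, None] + rng.normal(0, 0.03, (100, 6))

    svr = SVR(kernel="rbf", C=10.0)
    mae = -cross_val_score(svr, meth, age, cv=5,
                           scoring="neg_mean_absolute_error").mean()
    print(f"cross-validated mean absolute deviation: {mae:.1f} years")
    ```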

  13. Regression based modeling of vegetation and climate variables for the Amazon rainforests

    NASA Astrophysics Data System (ADS)

    Kodali, A.; Khandelwal, A.; Ganguly, S.; Bongard, J.; Das, K.

    2015-12-01

    Both short-term (weather) and long-term (climate) variations in the atmosphere directly impact various ecosystems on earth. Forest ecosystems, especially tropical forests, are crucial as they are the largest reserves of terrestrial carbon sink. For example, the Amazon forests are a critical component of the global carbon cycle, storing about 100 billion tons of carbon in their woody biomass. There is a growing concern that these forests could succumb to precipitation reduction in a progressively warming climate, leading to the release of a significant amount of carbon into the atmosphere. Therefore, there is a need to accurately quantify the dependence of vegetation growth on different climate variables and obtain better estimates of drought-induced changes to atmospheric CO2. The availability of globally consistent climate and earth observation datasets has allowed global scale monitoring of various climate and vegetation variables such as precipitation, radiation, surface greenness, etc. Using these diverse datasets, we aim to quantify the magnitude and extent of ecosystem exposure, sensitivity and resilience to droughts in forests. The Amazon rainforests have undergone severe droughts twice in the last decade (2005 and 2010), which makes them an ideal candidate for regional scale analysis. Current studies on vegetation and climate relationships have mostly explored linear dependence due to computational and domain knowledge constraints. We explore a modeling technique called symbolic regression, based on evolutionary computation, that allows discovery of the dependency structure without any prior assumptions. In symbolic regression the population of possible solutions is defined via tree structures. Each tree represents a mathematical expression that includes pre-defined functions (mathematical operators) and terminal sets (independent variables from data). Selection of these sets is critical to computational efficiency and model accuracy. In this work we investigate

  14. Building factorial regression models to explain and predict nitrate concentrations in groundwater under agricultural land

    NASA Astrophysics Data System (ADS)

    Stigter, T. Y.; Ribeiro, L.; Dill, A. M. M. Carvalho

    2008-07-01

    Factorial regression models, based on correspondence analysis, are built to explain the high nitrate concentrations in groundwater beneath an agricultural area in the south of Portugal, exceeding 300 mg/l, as a function of chemical variables, electrical conductivity (EC), land use and hydrogeological setting. Two important advantages of the proposed methodology are that qualitative parameters can be involved in the regression analysis and that multicollinearity is avoided. Regression is performed on eigenvectors extracted from the data similarity matrix, the first of which clearly reveals the impact of agricultural practices and hydrogeological setting on the groundwater chemistry of the study area. Significant correlation exists between the response variable NO3- and the explanatory variables Ca2+, Cl-, SO42-, depth to water, aquifer media and land use. Substituting Cl- by EC results in the most accurate regression model for nitrate, when disregarding the four largest outliers (model A). When built solely on land use and hydrogeological setting, the regression model (model B) is less accurate but more interesting from a practical viewpoint, as it is based on easily obtainable data and can be used to predict nitrate concentrations in groundwater in other areas with similar conditions. This is particularly useful for conservative contaminants, where risk and vulnerability assessment methods, based on assumed rather than established correlations, generally produce erroneous results. Another purpose of the models can be to predict the future evolution of nitrate concentrations under the influence of changes in land use or fertilization practices, which occur in compliance with policies such as the Nitrates Directive. Model B predicts a 40% decrease in nitrate concentrations in groundwater of the study area when horticulture is replaced by other land use with much lower fertilization and irrigation rates.

  15. A vector auto-regressive model for onshore and offshore wind synthesis incorporating meteorological model information

    NASA Astrophysics Data System (ADS)

    Hill, D.; Bell, K. R. W.; McMillan, D.; Infield, D.

    2014-05-01

    Wind power production is growing rapidly within the electricity portfolio, driven by ambitious targets set, for example by the EU, to reduce greenhouse gas emissions by 20% by 2020. Huge investments are now being made in new offshore wind farms around UK coastal waters that will have a major impact on the GB electrical supply. Synthesized representations of the UK wind field that capture the inherent structure and correlations between different locations, including offshore sites, are required. Here, Vector Auto-Regressive (VAR) models are presented and extended in a novel way to incorporate offshore time series from a pan-European meteorological model called COSMO, with onshore wind speeds from the MIDAS dataset provided by the British Atmospheric Data Centre. Forecasting ability onshore is shown to be improved with the inclusion of the offshore sites, with improvements of up to 25% in RMS error at 6 h ahead. In addition, the VAR model is used to synthesise time series of wind at each offshore site, which are then used to estimate wind farm capacity factors at the sites in question. These are then compared with estimates of capacity factors derived from the work of Hawkins et al. (2011). A good degree of agreement is established, indicating that this synthesis tool should be useful in power system impact studies.
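
    The VAR machinery itself is standard; a minimal sketch with statsmodels, using two cross-correlated synthetic wind series in place of the COSMO and MIDAS data.

    ```python
    import numpy as np
    import pandas as pd
    from statsmodels.tsa.api import VAR

    rng = np.random.default_rng(4)
    n = 500
    # Synthetic onshore/offshore wind speeds with cross-correlated noise.
    e = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], n)
    wind = np.zeros((n, 2))
    for t in range(1, n):
        wind[t] = 0.8 * wind[t - 1] + e[t]

    df = pd.DataFrame(wind, columns=["onshore", "offshore"])
    fit = VAR(df).fit(maxlags=4, ic="aic")               # order chosen by AIC
    print(fit.forecast(df.values[-fit.k_ar:], steps=6))  # 6-step-ahead forecast
    ```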

  16. A land use regression model for ambient ultrafine particles in Montreal, Canada: A comparison of linear regression and a machine learning approach.

    PubMed

    Weichenthal, Scott; Ryswyk, Keith Van; Goldstein, Alon; Bagg, Scott; Shekkarizfard, Maryam; Hatzopoulou, Marianne

    2016-04-01

    Existing evidence suggests that ambient ultrafine particles (UFPs) (<0.1 µm) may contribute to acute cardiorespiratory morbidity. However, few studies have examined the long-term health effects of these pollutants owing in part to a need for exposure surfaces that can be applied in large population-based studies. To address this need, we developed a land use regression model for UFPs in Montreal, Canada using mobile monitoring data collected from 414 road segments during the summer and winter months between 2011 and 2012. Two different approaches were examined for model development including standard multivariable linear regression and a machine learning approach (kernel-based regularized least squares (KRLS)) that learns the functional form of covariate impacts on ambient UFP concentrations from the data. The final models included parameters for population density, ambient temperature and wind speed, land use parameters (park space and open space), length of local roads and rail, and estimated annual average NOx emissions from traffic. The final multivariable linear regression model explained 62% of the spatial variation in ambient UFP concentrations whereas the KRLS model explained 79% of the variance. The KRLS model performed slightly better than the linear regression model when evaluated using an external dataset (R2=0.58 vs. 0.55) or a cross-validation procedure (R2=0.67 vs. 0.60). In general, our findings suggest that the KRLS approach may offer modest improvements in predictive performance compared to standard multivariable linear regression models used to estimate spatial variations in ambient UFPs. However, differences in predictive performance were not statistically significant when evaluated using the cross-validation procedure.
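
    KRLS is closely related to kernel ridge regression, so a rough sense of the linear-versus-kernel comparison can be sketched with scikit-learn; the covariates and response below are simulated, and KernelRidge is a stand-in rather than the study's exact estimator.

    ```python
    import numpy as np
    from sklearn.kernel_ridge import KernelRidge
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(5)
    n = 414  # number of monitored road segments in the study
    X = rng.normal(size=(n, 5))  # density, temperature, wind, land use, NOx
    ufp = (np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2          # non-linear terms
           + X @ rng.normal(0.3, 0.1, 5) + rng.normal(0, 0.3, n))

    for name, est in [("linear", LinearRegression()),
                      ("kernel ridge", KernelRidge(kernel="rbf", alpha=1.0))]:
        r2 = cross_val_score(est, X, ufp, cv=10).mean()
        print(f"{name}: cross-validated R2 = {r2:.2f}")
    ```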

  17. The prediction of intelligence in preschool children using alternative models to regression.

    PubMed

    Finch, W Holmes; Chang, Mei; Davis, Andrew S; Holden, Jocelyn E; Rothlisberg, Barbara A; McIntosh, David E

    2011-12-01

    Statistical prediction of an outcome variable using multiple independent variables is a common practice in the social and behavioral sciences. For example, neuropsychologists are sometimes called upon to provide predictions of preinjury cognitive functioning for individuals who have suffered a traumatic brain injury. Typically, these predictions are made using standard multiple linear regression models with several demographic variables (e.g., gender, ethnicity, education level) as predictors. Prior research has shown conflicting evidence regarding the ability of such models to provide accurate predictions of outcome variables such as full-scale intelligence (FSIQ) test scores. The present study had two goals: (1) to demonstrate the utility of a set of alternative prediction methods that have been applied extensively in the natural sciences and business but have not been frequently explored in the social sciences and (2) to develop models that can be used to predict premorbid cognitive functioning in preschool children. Prediction of Stanford-Binet 5 FSIQ scores for preschool-aged children is used to compare the performance of a multiple regression model with several of these alternative methods. Results demonstrate that classification and regression trees provided more accurate predictions of FSIQ scores than did the more traditional regression approach. Implications of these results are discussed.
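
    A sketch of the comparison on simulated data: a regression tree against multiple linear regression, scored by cross-validated prediction error. The demographic predictors and their effects are hypothetical.

    ```python
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(6)
    n = 300
    # Hypothetical predictors: education level, parental IQ, age in years.
    X = np.column_stack([rng.integers(1, 5, n), rng.normal(100, 15, n),
                         rng.uniform(3, 6, n)])
    fsiq = 40 + 5 * X[:, 0] + 0.4 * X[:, 1] + rng.normal(0, 8, n)

    for name, est in [("regression", LinearRegression()),
                      ("tree", DecisionTreeRegressor(max_depth=4))]:
        mae = -cross_val_score(est, X, fsiq, cv=5,
                               scoring="neg_mean_absolute_error").mean()
        print(f"{name}: MAE = {mae:.1f} FSIQ points")
    ```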

  18. On the Latent Regression Model of Item Response Theory. Research Report. ETS RR-07-12

    ERIC Educational Resources Information Center

    Antal, Tamás

    2007-01-01

    Full account of the latent regression model for the National Assessment of Educational Progress is given. The treatment includes derivation of the EM algorithm, Newton-Raphson method, and the asymptotic standard errors. The paper also features the use of the adaptive Gauss-Hermite numerical integration method as a basic tool to evaluate…
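
    Gauss-Hermite quadrature is the basic numerical tool named above. A minimal non-adaptive version, integrating a hypothetical item response function against a normal latent ability density:

    ```python
    import numpy as np

    # E[f(theta)] for theta ~ N(mu, sigma^2) via Gauss-Hermite quadrature:
    # substituting theta = mu + sqrt(2)*sigma*x gives
    # E[f] ~= (1/sqrt(pi)) * sum_k w_k * f(mu + sqrt(2)*sigma*x_k).
    nodes, weights = np.polynomial.hermite.hermgauss(15)

    def gh_expectation(f, mu, sigma):
        theta = mu + np.sqrt(2.0) * sigma * nodes
        return np.sum(weights * f(theta)) / np.sqrt(np.pi)

    # Example: expected probability of a correct response for a 2PL item.
    irf = lambda th: 1.0 / (1.0 + np.exp(-1.2 * (th - 0.5)))
    print(gh_expectation(irf, mu=0.0, sigma=1.0))
    ```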

  19. Not Quite Normal: Consequences of Violating the Assumption of Normality in Regression Mixture Models

    ERIC Educational Resources Information Center

    Van Horn, M. Lee; Smith, Jessalyn; Fagan, Abigail A.; Jaki, Thomas; Feaster, Daniel J.; Masyn, Katherine; Hawkins, J. David; Howe, George

    2012-01-01

    Regression mixture models, which have only recently begun to be used in applied research, are a new approach for finding differential effects. This approach comes at the cost of the assumption that error terms are normally distributed within classes. This study uses Monte Carlo simulations to explore the effects of relatively minor violations of…

  20. What Is Wrong with ANOVA and Multiple Regression? Analyzing Sentence Reading Times with Hierarchical Linear Models

    ERIC Educational Resources Information Center

    Richter, Tobias

    2006-01-01

    Most reading time studies using naturalistic texts yield data sets characterized by a multilevel structure: Sentences (sentence level) are nested within persons (person level). In contrast to analysis of variance and multiple regression techniques, hierarchical linear models take the multilevel structure of reading time data into account. They…

  1. Strategies for Testing Statistical and Practical Significance in Detecting DIF with Logistic Regression Models

    ERIC Educational Resources Information Center

    Fidalgo, Angel M.; Alavi, Seyed Mohammad; Amirian, Seyed Mohammad Reza

    2014-01-01

    This study examines three controversial aspects in differential item functioning (DIF) detection by logistic regression (LR) models: first, the relative effectiveness of different analytical strategies for detecting DIF; second, the suitability of the Wald statistic for determining the statistical significance of the parameters of interest; and…

  2. A comparative study between nonlinear regression and nonparametric approaches for modelling Phalaris paradoxa seedling emergence

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Parametric non-linear regression (PNR) techniques commonly are used to develop weed seedling emergence models. Such techniques, however, require statistical assumptions that are difficult to meet. To examine and overcome these limitations, we compared PNR with a nonparametric estimation technique. F...

  3. Temporal Synchronization Analysis for Improving Regression Modeling of Fecal Indicator Bacteria Levels

    EPA Science Inventory

    Multiple linear regression models are often used to predict levels of fecal indicator bacteria (FIB) in recreational swimming waters based on independent variables (IVs) such as meteorologic, hydrodynamic, and water-quality measures. The IVs used for these analyses are traditiona...

  4. Regression shrinkage and neural models in predicting the results of 400-metres hurdles races

    PubMed Central

    Iskra, J; Maszczyk, A; Nawrocka, M

    2016-01-01

    This study presents the application of regression shrinkage and artificial neural networks in predicting the results of 400-metres hurdles races. The regression models predict the results for suggested training loads in the selected three-month training period. The material of the research was based on training data of 21 Polish hurdlers from the Polish National Athletics Team Association. The athletes were characterized by a high level of performance. To assess the predictive ability of the constructed models, a method of leave-one-out cross-validation was used. The analysis showed that the method generating the smallest prediction error was LASSO regression extended by quadratic terms. The optimal model generated a prediction error of 0.59 s. In addition, the optimal set of input variables was identified (8 of the 27 predictors were eliminated). The results obtained justify the use of regression shrinkage in predicting sports outcomes. The resulting model can be used as a tool to assist the coach in planning training loads in a selected training period. PMID:28090147
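
    A sketch of the core procedure (LASSO with leave-one-out cross-validation) on simulated training-load data; the penalty strength is arbitrary here, and the quadratic-term expansion used in the paper is omitted for brevity.

    ```python
    import numpy as np
    from sklearn.linear_model import Lasso
    from sklearn.model_selection import LeaveOneOut, cross_val_predict

    rng = np.random.default_rng(7)
    n, p = 21, 27   # 21 hurdlers, 27 training-load predictors
    X = rng.normal(size=(n, p))
    result = 52 + X[:, :5] @ rng.normal(0.4, 0.1, 5) + rng.normal(0, 0.4, n)

    lasso = Lasso(alpha=0.1)
    pred = cross_val_predict(lasso, X, result, cv=LeaveOneOut())
    rmse = np.sqrt(np.mean((pred - result) ** 2))
    kept = np.sum(lasso.fit(X, result).coef_ != 0)
    print(f"LOO prediction error: {rmse:.2f} s, predictors kept: {kept}")
    ```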

  5. Testing Mediation Using Multiple Regression and Structural Equation Modeling Analyses in Secondary Data

    ERIC Educational Resources Information Center

    Li, Spencer D.

    2011-01-01

    Mediation analysis in child and adolescent development research is possible using large secondary data sets. This article provides an overview of two statistical methods commonly used to test mediated effects in secondary analysis: multiple regression and structural equation modeling (SEM). Two empirical studies are presented to illustrate the…

  6. The Performance of the Full Information Maximum Likelihood Estimator in Multiple Regression Models with Missing Data.

    ERIC Educational Resources Information Center

    Enders, Craig K.

    2001-01-01

    Examined the performance of a recently available full information maximum likelihood (FIML) estimator in a multiple regression model with missing data using Monte Carlo simulation and considering the effects of four independent variables. Results indicate that FIML estimation was superior to that of three ad hoc techniques, with less bias and less…

  7. Power and sample size calculations for generalized regression models with covariate measurement error.

    PubMed

    Tosteson, Tor D; Buzas, Jeffrey S; Demidenko, Eugene; Karagas, Margaret

    2003-04-15

    Covariate measurement error is often a feature of scientific data used for regression modelling. The consequences of such errors include a loss of power of tests of significance for the regression parameters corresponding to the true covariates. Power and sample size calculations that ignore covariate measurement error tend to overestimate power and underestimate the actual sample size required to achieve a desired power. In this paper we derive a novel measurement error corrected power function for generalized linear models using a generalized score test based on quasi-likelihood methods. Our power function is flexible in that it is adaptable to designs with a discrete or continuous scalar covariate (exposure) that can be measured with or without error, allows for additional confounding variables and applies to a broad class of generalized regression and measurement error models. A program is described that provides sample size or power for a continuous exposure with a normal measurement error model and a single normal confounder variable in logistic regression. We demonstrate the improved properties of our power calculations with simulations and numerical studies. An example is given from an ongoing study of cancer and exposure to arsenic as measured by toenail concentrations and tap water samples.

  8. Goodness-of-fit tests and model diagnostics for negative binomial regression of RNA sequencing data.

    PubMed

    Mi, Gu; Di, Yanming; Schafer, Daniel W

    2015-01-01

    This work is about assessing model adequacy for negative binomial (NB) regression, particularly (1) assessing the adequacy of the NB assumption, and (2) assessing the appropriateness of models for NB dispersion parameters. Tools for the first are appropriate for NB regression generally; those for the second are primarily intended for RNA sequencing (RNA-Seq) data analysis. The typically small number of biological samples and large number of genes in RNA-Seq analysis motivate us to address the trade-offs between robustness and statistical power using NB regression models. One widely-used power-saving strategy, for example, is to assume some commonalities of NB dispersion parameters across genes via simple models relating them to mean expression rates, and many such models have been proposed. As RNA-Seq analysis is becoming ever more popular, it is appropriate to make more thorough investigations into power and robustness of the resulting methods, and into practical tools for model assessment. In this article, we propose simulation-based statistical tests and diagnostic graphics to address model adequacy. We provide simulated and real data examples to illustrate that our proposed methods are effective for detecting the misspecification of the NB mean-variance relationship as well as judging the adequacy of fit of several NB dispersion models.
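
    One simple simulation-based check in the spirit described above is a parametric bootstrap of the GLM deviance; the sketch below treats the dispersion parameter as known for brevity, which the paper's methods do not require.

    ```python
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(8)
    n, alpha = 200, 0.4                      # alpha: NB2 dispersion
    X = sm.add_constant(rng.normal(size=n))
    mu = np.exp(X @ np.array([1.0, 0.5]))
    y = rng.negative_binomial(1 / alpha, 1 / (1 + alpha * mu))

    fam = sm.families.NegativeBinomial(alpha=alpha)
    fit = sm.GLM(y, X, family=fam).fit()

    # Parametric bootstrap: simulate from the fitted model and compare the
    # observed deviance with the simulated reference distribution.
    boot = [sm.GLM(rng.negative_binomial(1 / alpha, 1 / (1 + alpha * fit.mu)),
                   X, family=fam).fit().deviance for _ in range(200)]
    print("bootstrap GOF p-value:", np.mean(np.array(boot) >= fit.deviance))
    ```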

  9. Evaluation and application of regional turbidity-sediment regression models in Virginia

    USGS Publications Warehouse

    Hyer, Kenneth; Jastram, John D.; Moyer, Douglas; Webber, James S.; Chanat, Jeffrey G.

    2015-01-01

    Conventional thinking has long held that turbidity-sediment surrogate-regression equations are site specific and that regression equations developed at a single monitoring station should not be applied to another station; however, few studies have evaluated this issue in a rigorous manner. If robust regional turbidity-sediment models can be developed successfully, their applications could greatly expand the usage of these methods. Suspended sediment load estimation could occur as soon as flow and turbidity monitoring commence at a site, suspended sediment sampling frequencies for various projects potentially could be reduced, and special-project applications (sediment monitoring following dam removal, for example) could be significantly enhanced. The objective of this effort was to investigate the turbidity-suspended sediment concentration (SSC) relations at all available USGS monitoring sites within Virginia to determine whether meaningful turbidity-sediment regression models can be developed by combining the data from multiple monitoring stations into a single model, known as a “regional” model. Following the development of the regional model, additional objectives included a comparison of predicted SSCs between the regional model and commonly used site-specific models, as well as an evaluation of why specific monitoring stations did not fit the regional model.

  10. Efficient least angle regression for identification of linear-in-the-parameters models.

    PubMed

    Zhao, Wanqing; Beach, Thomas H; Rezgui, Yacine

    2017-02-01

    Least angle regression, as a promising model selection method, differentiates itself from conventional stepwise and stagewise methods in that it is neither too greedy nor too slow. It is closely related to L1 norm optimization, which has the advantage of low prediction variance, sacrificing some model bias in order to enhance model generalization capability. In this paper, we propose an efficient least angle regression algorithm for model selection for a large class of linear-in-the-parameters models with the purpose of accelerating the model selection process. The entire algorithm works completely in a recursive manner, where the correlations between model terms and residuals, the evolving directions and other pertinent variables are derived explicitly and updated successively at every subset selection step. The model coefficients are only computed when the algorithm finishes, so direct matrix inversions are avoided. A detailed computational complexity analysis indicates that the proposed algorithm possesses significant computational efficiency, compared with the original approach where the well-known efficient Cholesky decomposition is involved in solving least angle regression. Three artificial and real-world examples are employed to demonstrate the effectiveness, efficiency and numerical stability of the proposed algorithm.
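
    scikit-learn exposes the standard least angle regression path, which shows the order in which model terms enter; the recursive efficiency gains proposed in the paper are not reproduced here.

    ```python
    import numpy as np
    from sklearn.linear_model import Lars, lars_path

    rng = np.random.default_rng(9)
    X = rng.normal(size=(100, 20))
    y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(0, 0.5, 100)

    # The full LARS path records which terms enter, and in what order.
    alphas, active, coefs = lars_path(X, y)
    print("selection order of model terms:", active[:5])

    model = Lars(n_nonzero_coefs=4).fit(X, y)  # stop after four terms
    print("terms in the final model:", np.flatnonzero(model.coef_))
    ```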

  11. Coupled variable selection for regression modeling of complex treatment patterns in a clinical cancer registry.

    PubMed

    Schmidtmann, I; Elsäßer, A; Weinmann, A; Binder, H

    2014-12-30

    For determining a manageable set of covariates potentially influential with respect to a time-to-event endpoint, Cox proportional hazards models can be combined with variable selection techniques, such as stepwise forward selection or backward elimination based on p-values, or regularized regression techniques such as component-wise boosting. Cox regression models have also been adapted for dealing with more complex event patterns, for example, for competing risks settings with separate, cause-specific hazard models for each event type, or for determining the prognostic effect pattern of a variable over different landmark times, with one conditional survival model for each landmark. Motivated by a clinical cancer registry application, where complex event patterns have to be dealt with and variable selection is needed at the same time, we propose a general approach for linking variable selection between several Cox models. Specifically, we combine score statistics for each covariate across models by Fisher's method as a basis for variable selection. This principle is implemented for a stepwise forward selection approach as well as for a regularized regression technique. In an application to data from hepatocellular carcinoma patients, the coupled stepwise approach is seen to facilitate joint interpretation of the different cause-specific Cox models. In conditional survival models at landmark times, which address updates of prediction as time progresses and both treatment and other potential explanatory variables may change, the coupled regularized regression approach identifies potentially important, stably selected covariates together with their effect time pattern, despite having only a small number of events. These results highlight the promise of the proposed approach for coupling variable selection between Cox models, which is particularly relevant for modeling for clinical cancer registries with their complex event patterns.
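
    The combining step is classical Fisher's method; a minimal sketch with hypothetical score-test p-values for one covariate across three cause-specific models.

    ```python
    import numpy as np
    from scipy import stats

    # Hypothetical score-test p-values for one covariate in three Cox models.
    pvals = [0.04, 0.20, 0.01]

    # Fisher's method: -2 * sum(log p) ~ chi-square with 2k df under H0.
    chi2_stat = -2.0 * np.sum(np.log(pvals))
    p_combined = stats.chi2.sf(chi2_stat, df=2 * len(pvals))
    print(f"combined p-value: {p_combined:.4f}")

    # scipy provides the same computation directly:
    print(stats.combine_pvalues(pvals, method="fisher"))
    ```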

  12. Regressions by leaps and bounds and biased estimation techniques in yield modeling

    NASA Technical Reports Server (NTRS)

    Marquina, N. E. (Principal Investigator)

    1979-01-01

    The author has identified the following significant results. OLS was observed to be inadequate as an estimation procedure when the independent (regressor) variables were involved in multicollinearities, which was shown to manifest as small eigenvalues of the extended correlation matrix A'A. It was demonstrated that biased estimation techniques and all-possible-subset regression could help in finding a suitable model for predicting yield. Latent root regression proved an excellent tool for determining how many predictive and nonpredictive multicollinearities were present.

  13. Women's Work Conditions and Marital Adjustment in Two-Earner Couples: A Structural Model.

    ERIC Educational Resources Information Center

    Sears, Heather A.; Galambos, Nancy L.

    1992-01-01

    Evaluated structural model of women's work conditions, women's stress, and marital adjustment using path analysis. Findings from 86 2-earner couples with adolescents indicated support for spillover model in which women's work stress and global stress mediated link between their work conditions and their perceptions of marital adjustment.…

  14. Fatigue design of a cellular phone folder using regression model-based multi-objective optimization

    NASA Astrophysics Data System (ADS)

    Kim, Young Gyun; Lee, Jongsoo

    2016-08-01

    In a folding cellular phone, the folding device is repeatedly opened and closed by the user, which eventually results in fatigue damage, particularly to the front of the folder. Hence, it is important to improve the safety and endurance of the folder while also reducing its weight. This article presents an optimal design for the folder front that maximizes its fatigue endurance while minimizing its thickness. Design data for analysis and optimization were obtained experimentally using a test jig. Multi-objective optimization was carried out using a nonlinear regression model. Three regression methods were employed: back-propagation neural networks, logistic regression and support vector machines. The AdaBoost ensemble technique was also used to improve the approximation. Two-objective Pareto-optimal solutions were identified using the non-dominated sorting genetic algorithm (NSGA-II). Finally, a numerically optimized solution was validated against experimental product data, in terms of both fatigue endurance and thickness index.

  15. A Multilevel Regression Model for Geographical Studies in Sets of Non-Adjacent Cities.

    PubMed

    Marí-Dell'Olmo, Marc; Martínez-Beneito, Miguel Ángel

    2015-01-01

    In recent years, small-area-based ecological regression analyses have been published that study the association between a health outcome and a covariate in several cities. These analyses have usually been performed independently for each city and have therefore yielded unrelated estimates for the cities considered, even though the same process has been studied in all of them. In this study, we propose a joint ecological regression model for multiple cities that accounts for spatial structure both within and between cities and explore the advantages of this model. The proposed model merges both disease mapping and geostatistical ideas. Our proposal is compared with two alternatives, one that models the association for each city as fixed effects and another that treats them as independent and identically distributed random effects. The proposed model allows us to estimate the association (and assess its significance) at locations with no available data. Our proposal is illustrated by an example of the association between unemployment (as a deprivation surrogate) and lung cancer mortality among men in 31 Spanish cities. In this example, the associations found were far more accurate for the proposed model than those from the fixed effects model. Our main conclusion is that ecological regression analyses can be markedly improved by performing joint analyses at several locations that share information among them. This finding should be taken into consideration in the design of future epidemiological studies.

  16. Non-linear modeling using fuzzy principal component regression for Vidyaranyapuram sewage treatment plant, Mysore - India.

    PubMed

    Sulthana, Ayesha; Latha, K C; Imran, Mohammad; Rathan, Ramya; Sridhar, R; Balasubramanian, S

    2014-01-01

    Fuzzy principal component regression (FPCR) is proposed to model the non-linear process of sewage treatment plant (STP) data matrix. The dimension reduction of voluminous data was done by principal component analysis (PCA). The PCA score values were partitioned by fuzzy-c-means (FCM) clustering, and a Takagi-Sugeno-Kang (TSK) fuzzy model was built based on the FCM functions. The FPCR approach was used to predict the reduction in chemical oxygen demand (COD) and biological oxygen demand (BOD) of treated wastewater of Vidyaranyapuram STP with respect to the relations modeled between fuzzy partitioned PCA scores and target output. The designed FPCR model showed the ability to capture the behavior of non-linear processes of STP. The predicted values of reduction in COD and BOD were analyzed by performing the linear regression analysis. The predicted values for COD and BOD reduction showed positive correlation with the observed data.

  17. A Regression Framework for Effect Size Assessments in Longitudinal Modeling of Group Differences.

    PubMed

    Feingold, Alan

    2013-03-01

    The use of growth modeling analysis (GMA)--particularly multilevel analysis and latent growth modeling--to test the significance of intervention effects has increased exponentially in prevention science, clinical psychology, and psychiatry over the past 15 years. Model-based effect sizes for differences in means between two independent groups in GMA can be expressed in the same metric (Cohen's d) commonly used in classical analysis and meta-analysis. This article first reviews conceptual issues regarding calculation of d for findings from GMA and then introduces an integrative framework for effect size assessments that subsumes GMA. The new approach uses the structure of the linear regression model, from which effect sizes for findings from diverse cross-sectional and longitudinal analyses can be calculated with familiar statistics, such as the regression coefficient, the standard deviation of the dependent measure, and study duration.
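
    In a linear growth model, one common form of this effect size takes the group-by-time coefficient, multiplies it by the study duration to get the model-implied end-of-study group difference, and standardizes by the raw-score standard deviation. A worked example with hypothetical values:

    ```python
    # Model-based Cohen's d from growth modeling analysis (hypothetical values).
    b = 0.25         # group-by-time regression coefficient (per month)
    duration = 12    # study length in months
    sd_raw = 2.0     # standard deviation of the dependent measure

    # b * duration is the model-implied group difference at end of study.
    d = (b * duration) / sd_raw
    print(f"model-based d = {d:.2f}")   # 1.50
    ```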

  18. Regression-based model of skin diffuse reflectance for skin color analysis

    NASA Astrophysics Data System (ADS)

    Tsumura, Norimichi; Kawazoe, Daisuke; Nakaguchi, Toshiya; Ojima, Nobutoshi; Miyake, Yoichi

    2008-11-01

    A simple regression-based model of skin diffuse reflectance is developed based on reflectance samples calculated by Monte Carlo simulation of light transport in a two-layered skin model. This reflectance model includes the values of spectral reflectance in the visible spectra for Japanese women. The modified Lambert Beer law holds in the proposed model with a modified mean free path length in non-linear density space. The averaged RMS and maximum errors of the proposed model were 1.1 and 3.1%, respectively, in the above range.

  19. A wild-type mouse-based model for the regression of inflammation in atherosclerosis

    PubMed Central

    Weinstock, Ada; Barrett, Tessa J.; Zhou, Felix; Quezada, Alexandra; Fisher, Edward A.

    2017-01-01

    Atherosclerosis can be induced by the injection of a gain-of-function mutant of proprotein convertase subtilisin/kexin type 9 (PCSK9)–encoding adeno-associated viral vector (AAVmPCSK9), avoiding the need for knockout mice models, such as low-density lipoprotein receptor deficient mice. As regression of atherosclerosis is a crucial therapeutic goal, we aimed to establish a regression model based on AAVmPCSK9, which will eliminate the need for germ-line genetic modifications. C57BL6/J mice were injected with AAVmPCSK9 and were fed with Western diet for 16 weeks, followed by reversal of hyperlipidemia by a diet switch to chow and treatment with a microsomal triglyceride transfer protein inhibitor (MTPi). Sixteen weeks following AAVmPCSK9 injection, mice had advanced atherosclerotic lesions in the aortic root. Surprisingly, diet switch to chow alone reversed hyperlipidemia to near normal levels, and the addition of MTPi completely normalized hyperlipidemia. A six week reversal of hyperlipidemia, either by diet switch alone or by diet switch and MTPi treatment, was accompanied by regression of atherosclerosis as defined by a significant decrease of macrophages in the atherosclerotic plaques, compared to baseline. Thus, we have established an atherosclerosis regression model that is independent of the genetic background. PMID:28291840

  20. Selection on plasticity of seasonal life-history traits using random regression mixed model analysis

    PubMed Central

    Brommer, Jon E; Kontiainen, Pekka; Pietiäinen, Hannu

    2012-01-01

    Theory considers the covariation of seasonal life-history traits as an optimal reaction norm, implying that deviating from this reaction norm reduces fitness. However, the estimation of reaction-norm properties (i.e., elevation, linear slope, and higher order slope terms) and the selection on these is statistically challenging. We here advocate the use of random regression mixed models to estimate reaction-norm properties and the use of bivariate random regression to estimate selection on these properties within a single model. We illustrate the approach by random regression mixed models on 1115 observations of clutch sizes and laying dates of 361 female Ural owls (Strix uralensis) collected over 31 years to show that (1) there is variation across individuals in the slope of their clutch size–laying date relationship, and that (2) there is selection on the slope of the reaction norm between these two traits. Hence, natural selection potentially drives the negative covariance in clutch size and laying date in this species. The random-regression approach is hampered by an inability to estimate nonlinear selection, but avoids a number of disadvantages (stats-on-stats, connecting reaction-norm properties to fitness). The approach is of value in describing and studying selection on behavioral reaction norms (behavioral syndromes) or life-history reaction norms. The approach can also be extended to consider the genetic underpinning of reaction-norm properties. PMID:22837818
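
    A random intercept plus random slope ("random regression") model of clutch size on laying date can be sketched with statsmodels; the data are simulated, and the bivariate extension linking reaction-norm properties to fitness is beyond this sketch.

    ```python
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(10)
    n_birds, n_obs = 50, 6
    bird = np.repeat(np.arange(n_birds), n_obs)
    laydate = rng.normal(0, 1, bird.size)          # centred laying date
    slope = rng.normal(-0.5, 0.3, n_birds)         # individual reaction norms
    clutch = 4 + slope[bird] * laydate + rng.normal(0, 0.5, bird.size)
    df = pd.DataFrame({"bird": bird, "laydate": laydate, "clutch": clutch})

    # Random intercept and random slope of clutch size on laying date.
    fit = smf.mixedlm("clutch ~ laydate", df, groups=df["bird"],
                      re_formula="~laydate").fit()
    print(fit.summary())
    ```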

  1. Multiple Regression (MR) and Artificial Neural Network (ANN) models for prediction of soil suction

    NASA Astrophysics Data System (ADS)

    Erzin, Yusuf; Yilmaz, Isik

    2010-05-01

    This article presents a comparison of multiple regression (MR) and artificial neural network (ANN) models for prediction of the soil suction of clayey soils. The results of soil suction tests utilizing thermocouple psychrometers on statically compacted specimens of Bentonite-Kaolinite clay mixtures with varying soil properties were used to develop the models. The results obtained from both models were then compared with the experimental results. Performance indices such as the coefficient of determination (R2), root mean square error (RMSE), mean absolute error (MAE), and variance accounted for (VAF) were used to assess the prediction capacity of the models developed in this study. The ANN model showed higher prediction performance than the regression model according to the performance indices, indicating that ANN models provide significant improvements in prediction accuracy over statistical models. The potential benefits of soft computing models extend beyond high computation rates: their higher performance stems from a greater degree of robustness and fault tolerance than traditional statistical models, because there are many more processing neurons, each with primarily local connections. It thus appears possible to estimate soil suction using the proposed empirical relationships and soft computing models. The data set analyzed in this study is relatively limited; within that limitation, the proposed equations and models can be used in practice with acceptable accuracy.
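
    A sketch of the MR-versus-ANN comparison with scikit-learn; the soil properties, the suction response, and the network size are all illustrative assumptions.

    ```python
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.neural_network import MLPRegressor
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(11)
    n = 120
    # Hypothetical inputs: water content (%), clay fraction, dry density.
    X = np.column_stack([rng.uniform(10, 40, n), rng.uniform(0.2, 0.9, n),
                         rng.uniform(1.2, 1.8, n)])
    suction = np.exp(2 - 0.05 * X[:, 0]) * (1 + X[:, 1]) + rng.normal(0, 0.1, n)

    ann = make_pipeline(StandardScaler(),
                        MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000,
                                     random_state=0))
    for name, est in [("MR", LinearRegression()), ("ANN", ann)]:
        r2 = cross_val_score(est, X, suction, cv=5).mean()
        print(f"{name}: R2 = {r2:.2f}")
    ```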

  2. A Heterogeneous Bayesian Regression Model for Cross-Sectional Data Involving a Single Observation per Response Unit

    ERIC Educational Resources Information Center

    Fong, Duncan K. H.; Ebbes, Peter; DeSarbo, Wayne S.

    2012-01-01

    Multiple regression is frequently used across the various social sciences to analyze cross-sectional data. However, it can often times be challenging to justify the assumption of common regression coefficients across all respondents. This manuscript presents a heterogeneous Bayesian regression model that enables the estimation of…

  3. Modeling data for pancreatitis in presence of a duodenal diverticula using logistic regression

    NASA Astrophysics Data System (ADS)

    Dineva, S.; Prodanova, K.; Mlachkova, D.

    2013-12-01

    The presence of a periampullary duodenal diverticulum (PDD) is often observed during upper digestive tract barium meal studies and endoscopic retrograde cholangiopancreatography (ERCP). A few papers have reported that the diverticulum is associated with the incidence of pancreatitis. The aim of this study is to investigate whether the presence of a duodenal diverticulum predisposes to the development of pancreatic disease. A total of 3966 patients who had undergone ERCP were studied retrospectively. They were divided into two groups: with and without PDD. Patients with a duodenal diverticulum had a higher rate of acute pancreatitis, and the duodenal diverticulum is a risk factor for acute idiopathic pancreatitis. Multiple logistic regression was performed to obtain adjusted odds estimates and to identify whether a PDD is a predictor of acute or chronic pancreatitis. The software package STATISTICA 10.0 was used for analyzing the real data.
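
    A sketch of the adjusted-odds computation with statsmodels; the PDD prevalence, covariates, and effect sizes are simulated, not the study's data.

    ```python
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(12)
    n = 3966
    df = pd.DataFrame({"pdd": rng.binomial(1, 0.12, n),  # diverticulum present
                       "age": rng.normal(60, 12, n)})
    logit_p = -2.5 + 0.9 * df["pdd"] + 0.01 * (df["age"] - 60)
    df["pancreatitis"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

    fit = smf.logit("pancreatitis ~ pdd + age", df).fit(disp=False)
    print(np.exp(fit.params))        # adjusted odds ratios
    print(np.exp(fit.conf_int()))    # 95% confidence intervals
    ```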

  4. Cross-validation pitfalls when selecting and assessing regression and classification models

    PubMed Central

    2014-01-01

    Background: We address the problem of selecting and assessing classification and regression models using cross-validation. Current state-of-the-art methods can yield models with high variance, rendering them unsuitable for a number of practical applications including QSAR. In this paper we describe and evaluate best practices which improve reliability and increase confidence in selected models. A key operational component of the proposed methods is cloud computing, which enables routine use of previously infeasible approaches. Methods: We describe in detail an algorithm for repeated grid-search V-fold cross-validation for parameter tuning in classification and regression, and we define a repeated nested cross-validation algorithm for model assessment. As regards variable selection and parameter tuning we define two algorithms (repeated grid-search cross-validation and double cross-validation), and provide arguments for using the repeated grid-search in the general case. Results: We show results of our algorithms on seven QSAR datasets. The variation of the prediction performance, which is the result of choosing different splits of the dataset in V-fold cross-validation, needs to be taken into account when selecting and assessing classification and regression models. Conclusions: We demonstrate the importance of repeating cross-validation when selecting an optimal model, as well as the importance of repeating nested cross-validation when assessing a prediction error. PMID:24678909
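
    The repeated nested cross-validation idea can be sketched compactly with scikit-learn: an inner loop tunes parameters, an outer loop estimates prediction error, and the whole procedure is repeated over different random splits. The estimator and grid below are arbitrary placeholders.

    ```python
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
    from sklearn.svm import SVR

    X, y = make_regression(n_samples=120, n_features=20, noise=10,
                           random_state=0)
    grid = {"C": [0.1, 1, 10], "epsilon": [0.01, 0.1]}

    scores = []
    for rep in range(5):                  # repeat over different splits
        inner = KFold(5, shuffle=True, random_state=rep)        # tuning
        outer = KFold(5, shuffle=True, random_state=100 + rep)  # assessment
        tuner = GridSearchCV(SVR(), grid, cv=inner)
        scores.append(cross_val_score(tuner, X, y, cv=outer).mean())

    print("nested-CV R2: %.2f +/- %.2f" % (np.mean(scores), np.std(scores)))
    ```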

  5. Bayesian analysis of a multivariate null intercept errors-in-variables regression model.

    PubMed

    Aoki, Reiko; Bolfarine, Heleno; Achcar, Jorge A; Dorival, Leão P Júnior

    2003-11-01

    Longitudinal data are of great interest in the analysis of clinical trials. In many practical situations the covariate cannot be measured precisely, and a natural alternative is the errors-in-variables regression model. In this paper we study a null intercept errors-in-variables regression model with a structure of dependency between the response variables within the same group. We apply the model to real data presented in Hadgu and Koch (Hadgu, A., Koch, G. (1999). Application of generalized estimating equations to a dental randomized clinical trial. J. Biopharmaceutical Statistics 9(1):161-178). In that study, volunteers with preexisting dental plaque were randomized with double blinding to two experimental mouth rinses (A and B) or a control mouth rinse. The dental plaque index was measured for each subject at the beginning of the study and at two follow-up times, which leads to the presence of an interclass correlation. We propose the use of a Bayesian approach to fit a multivariate null intercept errors-in-variables regression model to the longitudinal data. The proposed Bayesian approach accommodates the correlated measurements and incorporates the restriction that the slopes must lie in the (0, 1) interval. A Gibbs sampler is used to perform the computations.

  6. Robust colour calibration of an imaging system using a colour space transform and advanced regression modelling.

    PubMed

    Jackman, Patrick; Sun, Da-Wen; Elmasry, Gamal

    2012-08-01

    A new algorithm for the conversion of device dependent RGB colour data into device independent L*a*b* colour data without introducing noticeable error has been developed. By combining a linear colour space transform and advanced multiple regression methodologies it was possible to predict L*a*b* colour data with less than 2.2 colour units of error (CIE 1976). By transforming the red, green and blue colour components into new variables that better reflect the structure of the L*a*b* colour space, a low colour calibration error was immediately achieved (ΔE(CAL) = 14.1). Application of a range of regression models on the data further reduced the colour calibration error substantially (multilinear regression ΔE(CAL) = 5.4; response surface ΔE(CAL) = 2.9; PLSR ΔE(CAL) = 2.6; LASSO regression ΔE(CAL) = 2.1). Only the PLSR models deteriorated substantially under cross validation. The algorithm is adaptable and can be easily recalibrated to any working computer vision system. The algorithm was tested on a typical working laboratory computer vision system and delivered only a very marginal loss of colour information ΔE(CAL) = 2.35. Colour features derived on this system were able to safely discriminate between three classes of ham with 100% correct classification whereas colour features measured on a conventional colourimeter were not.
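
    A sketch of the PLSR step with scikit-learn; simple squared and interaction terms stand in for the paper's colour space transform, and the RGB-to-L*a*b* mapping below is a synthetic placeholder.

    ```python
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import cross_val_predict

    rng = np.random.default_rng(13)
    n = 150
    rgb = rng.uniform(0, 255, size=(n, 3))
    # Synthetic stand-in for measured L*a*b* values (a nonlinear map of RGB).
    lab = np.column_stack([0.3 * rgb.sum(axis=1) ** 0.9,
                           0.5 * (rgb[:, 0] - rgb[:, 1]),
                           0.5 * (rgb[:, 1] - rgb[:, 2])])

    # Augment RGB with squared and interaction terms before PLSR.
    X = np.column_stack([rgb, rgb ** 2, rgb[:, [0]] * rgb[:, [1]]])
    pred = cross_val_predict(PLSRegression(n_components=5), X, lab, cv=10)
    delta_e = np.sqrt(((pred - lab) ** 2).sum(axis=1))   # CIE76 colour error
    print("mean Delta E:", delta_e.mean().round(2))
    ```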

  7. Model-wise and point-wise random sample consensus for robust regression and outlier detection.

    PubMed

    El-Melegy, Moumen T

    2014-11-01

    Popular regression techniques often suffer in the presence of data outliers. Most previous efforts to solve this problem have focused on using an estimation algorithm that minimizes a robust M-estimator based error criterion instead of the usual non-robust mean squared error. However, the robustness gained from M-estimators is still low. This paper addresses robust regression and outlier detection in a random sample consensus (RANSAC) framework. It studies the classical RANSAC framework and highlights its model-wise nature for processing the data. Furthermore, it introduces for the first time a point-wise strategy for RANSAC. New estimation algorithms are developed following both the model-wise and point-wise RANSAC concepts. The proposed algorithms' theoretical robustness and breakdown points are investigated in a novel probabilistic setting. While the proposed concepts and algorithms are generic and general enough to adopt many regression machineries, the paper focuses on multilayered feed-forward neural networks in solving regression problems. The algorithms are evaluated on synthetic and real data, contaminated with high degrees of outliers, and compared to existing neural network training algorithms. Furthermore, to improve the time performance, parallel implementations of the two algorithms are developed and assessed to utilize the multiple CPU cores available on modern computers.
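
    scikit-learn ships the classical (model-wise) RANSAC estimator, which is enough to illustrate the robustness contrast the paper starts from; the paper's point-wise variant and neural-network regressors are not reproduced here.

    ```python
    import numpy as np
    from sklearn.linear_model import LinearRegression, RANSACRegressor

    rng = np.random.default_rng(14)
    X = rng.uniform(-5, 5, size=(200, 1))
    y = 3 * X.ravel() + 1 + rng.normal(0, 0.3, 200)
    y[:40] += rng.uniform(10, 30, 40)    # contaminate 20% with gross outliers

    ols = LinearRegression().fit(X, y)
    ransac = RANSACRegressor(LinearRegression(),
                             residual_threshold=1.0).fit(X, y)
    print("OLS slope:   ", ols.coef_[0].round(2))        # pulled by outliers
    print("RANSAC slope:", ransac.estimator_.coef_[0].round(2))  # close to 3
    ```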

  8. Detection of outliers in the response and explanatory variables of the simple circular regression model

    NASA Astrophysics Data System (ADS)

    Mahmood, Ehab A.; Rana, Sohel; Hussin, Abdul Ghapor; Midi, Habshah

    2016-06-01

    The circular regression model may contain one or more data points which appear to be peculiar or inconsistent with the main part of the model. This may occur due to recording errors, sudden short events, sampling under abnormal conditions, etc. The existence of these data points, "outliers", in the data set causes serious problems for research results and conclusions. Therefore, we should identify them before applying statistical analysis. In this article, we propose a statistic to identify outliers in both the response and explanatory variables of the simple circular regression model. Our proposed statistic is the robust circular distance RCDxy, and it is assessed by three robustness measures: the proportion of detected outliers and the masking and swamping rates.

  9. EXpectation Propagation LOgistic REgRession (EXPLORER): Distributed Privacy-Preserving Online Model Learning

    PubMed Central

    Wang, Shuang; Jiang, Xiaoqian; Wu, Yuan; Cui, Lijuan; Cheng, Samuel; Ohno-Machado, Lucila

    2013-01-01

    We developed an EXpectation Propagation LOgistic REgRession (EXPLORER) model for distributed privacy-preserving online learning. The proposed framework provides a high level guarantee for protecting sensitive information, since the information exchanged between the server and the client is the encrypted posterior distribution of coefficients. Through experimental results, EXPLORER shows the same performance (e.g., discrimination, calibration, feature selection etc.) as the traditional frequentist Logistic Regression model, but provides more flexibility in model updating. That is, EXPLORER can be updated one point at a time rather than having to retrain the entire data set when new observations are recorded. The proposed EXPLORER supports asynchronized communication, which relieves the participants from coordinating with one another, and prevents service breakdown from the absence of participants or interrupted communications. PMID:23562651

  10. Ordinal regression models to describe tourist satisfaction with Sintra's world heritage

    NASA Astrophysics Data System (ADS)

    Mouriño, Helena

    2013-10-01

    In tourism research, ordinal regression models are becoming a very powerful tool for modelling the relationship between an ordinal response variable and a set of explanatory variables. In August and September 2010, we conducted a pioneering tourist survey in Sintra, Portugal. The data were obtained by face-to-face interviews at the entrances of the Palaces and Parks of Sintra. The work developed in this paper focuses on two main points: tourists' perception of the entrance fees and their overall level of satisfaction with this heritage site. To attain these goals, ordinal regression models were developed. We concluded that tourists' nationality was the only significant variable for describing the perception of the admission fees. Also, Sintra's image among tourists depends not only on their nationality, but also on previous knowledge about Sintra's World Heritage status.

  11. A logistic regression model for predicting axillary lymph node metastases in early breast carcinoma patients.

    PubMed

    Xie, Fei; Yang, Houpu; Wang, Shu; Zhou, Bo; Tong, Fuzhong; Yang, Deqi; Zhang, Jiaqing

    2012-01-01

    Nodal staging in breast cancer is a key predictor of prognosis. This paper presents the results of potential clinicopathological predictors of axillary lymph node involvement and develops an efficient prediction model to assist in predicting axillary lymph node metastases. Seventy patients with primary early breast cancer who underwent axillary dissection were evaluated. Univariate and multivariate logistic regression were performed to evaluate the association between clinicopathological factors and lymph node metastatic status. A logistic regression predictive model was built from 50 randomly selected patients; the model was also applied to the remaining 20 patients to assess its validity. Univariate analysis showed a significant relationship between lymph node involvement and absence of nm-23 (p = 0.010) and Kiss-1 (p = 0.001) expression. Absence of Kiss-1 remained significantly associated with positive axillary node status in the multivariate analysis (p = 0.018). Seven clinicopathological factors were involved in the multivariate logistic regression model: menopausal status, tumor size, ER, PR, HER2, nm-23 and Kiss-1. The model was accurate and discriminating, with an area under the receiver operating characteristic curve of 0.702 when applied to the validation group. Moreover, there is a need to discover more specific candidate proteins and molecular biology tools to select more variables, which should improve predictive accuracy.

  12. A Logistic Regression Model for Predicting Axillary Lymph Node Metastases in Early Breast Carcinoma Patients

    PubMed Central

    Xie, Fei; Yang, Houpu; Wang, Shu; Zhou, Bo; Tong, Fuzhong; Yang, Deqi; Zhang, Jiaqing

    2012-01-01

    Nodal staging in breast cancer is a key predictor of prognosis. This paper presents the results of potential clinicopathological predictors of axillary lymph node involvement and develops an efficient prediction model to assist in predicting axillary lymph node metastases. Seventy patients with primary early breast cancer who underwent axillary dissection were evaluated. Univariate and multivariate logistic regression were performed to evaluate the association between clinicopathological factors and lymph node metastatic status. A logistic regression predictive model was built from 50 randomly selected patients; the model was also applied to the remaining 20 patients to assess its validity. Univariate analysis showed a significant relationship between lymph node involvement and absence of nm-23 (p = 0.010) and Kiss-1 (p = 0.001) expression. Absence of Kiss-1 remained significantly associated with positive axillary node status in the multivariate analysis (p = 0.018). Seven clinicopathological factors were involved in the multivariate logistic regression model: menopausal status, tumor size, ER, PR, HER2, nm-23 and Kiss-1. The model was accurate and discriminating, with an area under the receiver operating characteristic curve of 0.702 when applied to the validation group. Moreover, there is a need to discover more specific candidate proteins and molecular biology tools to select more variables, which should improve predictive accuracy. PMID:23012578

  13. Comprehensible Predictive Modeling Using Regularized Logistic Regression and Comorbidity Based Features

    PubMed Central

    Stiglic, Gregor; Povalej Brzan, Petra; Fijacko, Nino; Wang, Fei; Delibasic, Boris; Kalousis, Alexandros; Obradovic, Zoran

    2015-01-01

    Different studies have demonstrated the importance of comorbidities to better understand the origin and evolution of medical complications. This study focuses on improvement of the predictive model interpretability based on simple logical features representing comorbidities. We use group lasso based feature interaction discovery followed by a post-processing step, where simple logic terms are added. In the final step, we reduce the feature set by applying lasso logistic regression to obtain a compact set of non-zero coefficients that represent a more comprehensible predictive model. The effectiveness of the proposed approach was demonstrated on a pediatric hospital discharge dataset that was used to build a readmission risk estimation model. The evaluation of the proposed method demonstrates a reduction of the initial set of features in a regression model by 72%, with a slight improvement in the Area Under the ROC Curve metric from 0.763 (95% CI: 0.755–0.771) to 0.769 (95% CI: 0.761–0.777). Additionally, our results show improvement in comprehensibility of the final predictive model using simple comorbidity based terms for logistic regression. PMID:26645087
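
    The final step described above, shrinking a large comorbidity feature set with lasso-penalized logistic regression, can be sketched as follows; this is an illustrative reconstruction with synthetic data, not the study's pipeline or dataset.

```python
# Lasso (L1) logistic regression shrinking a large set of binary comorbidity
# flags to a compact set of non-zero coefficients. All data are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, p = 2000, 60
X = rng.binomial(1, 0.15, size=(n, p)).astype(float)   # binary comorbidity flags
beta = np.zeros(p)
beta[:6] = [1.2, 0.9, 0.8, -0.7, 0.6, 0.5]             # only a few true effects
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ beta - 2.0))))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X_tr, y_tr)

kept = np.flatnonzero(lasso.coef_[0])                  # surviving features
print(f"non-zero coefficients: {kept.size}/{p}")
print("AUC:", round(roc_auc_score(y_te, lasso.predict_proba(X_te)[:, 1]), 3))
```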

  14. ATLS Hypovolemic Shock Classification by Prediction of Blood Loss in Rats Using Regression Models.

    PubMed

    Choi, Soo Beom; Choi, Joon Yul; Park, Jee Soo; Kim, Deok Won

    2016-07-01

    In our previous study, our input data set consisted of 78 rats, the blood loss in percent as a dependent variable, and 11 independent variables (heart rate, systolic blood pressure, diastolic blood pressure, mean arterial pressure, pulse pressure, respiration rate, temperature, perfusion index, lactate concentration, shock index, and a new index (lactate concentration/perfusion)). Machine learning methods for multicategory classification were applied to a rat model of acute hemorrhage to predict the four Advanced Trauma Life Support (ATLS) hypovolemic shock classes for triage. However, multicategory classification is much more difficult and complicated than binary classification. Here we introduce a simple approach to classifying ATLS hypovolemic shock class by predicting blood loss in percent using support vector regression and multivariate linear regression (MLR). We also compared the performance of the classification models using absolute and relative vital signs. The accuracies of the support vector regression and MLR models with relative values, obtained by predicting blood loss in percent, were 88.5% and 84.6%, respectively. These were better than the best accuracy of 80.8% achieved by direct multicategory classification using the support vector machine one-versus-one model in our previous study on the same validation data set. Moreover, the simple MLR models with both absolute and relative values suggest the possibility of a future clinical decision support system for ATLS classification. The perfusion index and the new index were more informative as relative changes than as absolute values.
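
    The two-stage idea above (regress blood loss, then classify) can be illustrated as follows. The sketch assumes the standard ATLS cutoffs (class I <15%, II 15-30%, III 30-40%, IV >40%) and uses simulated stand-ins for the rat vital-sign data.

```python
# Regress blood loss (%) from vital signs with SVR, then map the prediction
# to an ATLS class by the standard cutoffs. Data are simulated placeholders.
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
n = 78
vitals = rng.normal(size=(n, 11))                      # 11 predictors as in the study
blood_loss = 25 + 10 * vitals[:, 0] - 6 * vitals[:, 1] + rng.normal(0, 3, n)

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=1.0))
model.fit(vitals, blood_loss)

def atls_class(loss_pct):
    """Map predicted blood loss (%) to ATLS hypovolemic shock class 1-4."""
    return 1 + int(np.searchsorted([15.0, 30.0, 40.0], loss_pct))

pred = model.predict(vitals[:5])
print([(round(p, 1), atls_class(p)) for p in pred])
```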

  15. High dimensional linear regression models under long memory dependence and measurement error

    NASA Astrophysics Data System (ADS)

    Kaul, Abhishek

    This dissertation consists of three chapters. The first chapter introduces the models under consideration and motivates problems of interest. A brief literature review is also provided in this chapter. The second chapter investigates the properties of Lasso under long range dependent model errors. Lasso is a computationally efficient approach to model selection and estimation, and its properties are well studied when the regression errors are independent and identically distributed. We study the case where the regression errors form a long memory moving average process. We establish a finite sample oracle inequality for the Lasso solution. We then show the asymptotic sign consistency in this setup. These results are established in the high dimensional setup (p > n) where p can increase exponentially with n. Finally, we show the consistency, namely the n^(1/2 - d)-consistency, of Lasso, along with the oracle property of adaptive Lasso, in the case where p is fixed. Here d is the memory parameter of the stationary error sequence. The performance of Lasso is also analysed in the present setup with a simulation study. The third chapter proposes and investigates the properties of a penalized quantile based estimator for measurement error models. Standard formulations of prediction problems in high dimension regression models assume the availability of fully observed covariates and sub-Gaussian and homogeneous model errors. This makes these methods inapplicable to measurement error models where covariates are unobservable and observations are possibly non sub-Gaussian and heterogeneous. We propose weighted penalized corrected quantile estimators for the regression parameter vector in linear regression models with additive measurement errors, where unobservable covariates are nonrandom. The proposed estimators forgo the need for the above mentioned model assumptions. We study these estimators in both the fixed dimension and high dimensional sparse setups, in the latter setup, the

  16. Nonlinear regression modeling of nutrient loads in streams: A Bayesian approach

    USGS Publications Warehouse

    Qian, S.S.; Reckhow, K.H.; Zhai, J.; McMahon, G.

    2005-01-01

    A Bayesian nonlinear regression modeling method is introduced and compared with the least squares method for modeling nutrient loads in stream networks. The objective of the study is to better model spatial correlation in river basin hydrology and land use for improving the model as a forecasting tool. The Bayesian modeling approach is introduced in three steps, each with a more complicated model and data error structure. The approach is illustrated using a data set from three large river basins in eastern North Carolina. Results indicate that the Bayesian model better accounts for model and data uncertainties than does the conventional least squares approach. Applications of the Bayesian models for ambient water quality standards compliance and TMDL assessment are discussed. Copyright 2005 by the American Geophysical Union.
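
    As a rough illustration of Bayesian nonlinear regression for load estimation, the sketch below fits a power-law load-discharge relation with the PyMC library; the model form, priors and data are assumptions for illustration, not the paper's river-basin model.

```python
# Bayesian nonlinear regression sketch: nutrient load as a power law of
# streamflow with lognormal error, fit by MCMC (assuming the PyMC library).
import numpy as np
import pymc as pm

rng = np.random.default_rng(7)
flow = rng.lognormal(mean=2.0, sigma=0.8, size=120)          # streamflow
load = 0.5 * flow**1.3 * rng.lognormal(0.0, 0.2, size=120)   # nutrient load

with pm.Model():
    a = pm.HalfNormal("a", sigma=5.0)          # scale of the power law
    b = pm.Normal("b", mu=1.0, sigma=1.0)      # exponent
    sigma = pm.HalfNormal("sigma", sigma=1.0)  # log-scale error sd
    mu = pm.math.log(a) + b * pm.math.log(flow)
    pm.Normal("log_load", mu=mu, sigma=sigma, observed=np.log(load))
    idata = pm.sample(1000, tune=1000, chains=2, random_seed=7)

print(idata.posterior["b"].mean().item())      # should recover roughly 1.3
```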

  17. Regression Model for MODTRAN with Applications to Inactivation of Microbes Suspended in the Atmosphere by Solar Ultraviolet Radiation

    DTIC Science & Technology

    2012-05-01

    Report documentation page residue (period of performance Jan 2008-Mar 2011); the recoverable abstract fragment encourages other researchers to seek simple regression models to complement the more laborious MODTRAN computations in a variety of applications.

  18. Revisiting Gaussian Process Regression Modeling for Localization in Wireless Sensor Networks.

    PubMed

    Richter, Philipp; Toledano-Ayala, Manuel

    2015-09-08

    Signal strength-based positioning in wireless sensor networks is a key technology for seamless, ubiquitous localization, especially in areas where Global Navigation Satellite System (GNSS) signals propagate poorly. To enable wireless local area network (WLAN) location fingerprinting in larger areas while maintaining accuracy, methods to reduce the effort of radio map creation must be consolidated and automated. Gaussian process regression has been applied to overcome this issue, with promising results, but the fit of the model was never thoroughly assessed. Instead, most studies trained a readily available model, relying on the zero mean and squared exponential covariance function, without further scrutiny. This paper studies Gaussian process regression model selection for WLAN fingerprinting in indoor and outdoor environments. We train several models for indoor, outdoor and combined areas; we evaluate them quantitatively and compare them by means of adequate model measures, hence assessing the fit of these models directly. To illuminate the quality of the model fit, the residuals of the proposed model are investigated as well. Comparative experiments on positioning performance verify and conclude the model selection. In this way, we show that the standard model is not the most appropriate, discuss alternatives and present our best candidate.
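
    The paper's central point, that the covariance function should be selected rather than defaulted, can be illustrated by comparing kernels on simulated signal-strength data; this sketch uses scikit-learn and compares fits by log marginal likelihood.

```python
# Compare GP regression kernels on simulated RSS-vs-distance data instead of
# defaulting to the squared-exponential (RBF) kernel. Data are placeholders.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern, WhiteKernel

rng = np.random.default_rng(3)
d = np.sort(rng.uniform(1, 60, 80))[:, None]               # distance to AP (m)
rss = -40 - 20 * np.log10(d[:, 0]) + rng.normal(0, 2, 80)  # log-distance path loss

for name, k in [("RBF", RBF()), ("Matern 3/2", Matern(nu=1.5))]:
    gp = GaussianProcessRegressor(kernel=k + WhiteKernel(), normalize_y=True)
    gp.fit(d, rss)
    print(f"{name}: log marginal likelihood = "
          f"{gp.log_marginal_likelihood(gp.kernel_.theta):.1f}")
```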

  20. PM10 modeling in the Oviedo urban area (Northern Spain) by using multivariate adaptive regression splines

    NASA Astrophysics Data System (ADS)

    Nieto, Paulino José García; Antón, Juan Carlos Álvarez; Vilán, José Antonio Vilán; García-Gonzalo, Esperanza

    2014-10-01

    The aim of this research work is to build a regression model of particulate matter up to 10 micrometers in size (PM10) by using the multivariate adaptive regression splines (MARS) technique in the Oviedo urban area (Northern Spain) at a local scale. MARS is a nonparametric regression algorithm with the ability to approximate the relationship between the inputs and outputs and to express that relationship mathematically. In this context, hazardous air pollutants or toxic air contaminants refer to any substance that may cause or contribute to an increase in mortality or serious illness, or that may pose a present or potential hazard to human health. To accomplish the objective of this study, an experimental dataset of nitrogen oxides (NOx), carbon monoxide (CO), sulfur dioxide (SO2), ozone (O3) and dust (PM10) was collected over 3 years (2006-2008) and used to create a highly nonlinear model of PM10 in the Oviedo urban nucleus based on the MARS technique. One main objective of this model is to obtain a preliminary estimate of the dependence between PM10 and the other main pollutants in the Oviedo urban area at a local scale. A second aim is to determine the factors with the greatest bearing on air quality with a view to proposing health and lifestyle improvements. The United States National Ambient Air Quality Standards (NAAQS) establish the limit values of the main pollutants in the atmosphere in order to safeguard public health. Firstly, this MARS regression model captures the main insight of statistical learning theory in order to obtain a good prediction of the dependence among the main pollutants in the Oviedo urban area. Secondly, the main advantages of MARS are its capacity to produce simple, easy-to-interpret models, its ability to estimate the contributions of the input variables, and its computational efficiency. Finally, on the basis of
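
    A minimal MARS sketch follows, assuming the third-party py-earth package; the pollutant inputs are simulated placeholders for the NOx/CO/SO2/O3 predictors of PM10 described above.

```python
# MARS fit via the Earth estimator from py-earth (assumed installed as
# sklearn-contrib-py-earth); inputs and response are simulated placeholders.
import numpy as np
from pyearth import Earth

rng = np.random.default_rng(5)
X = rng.uniform(0, 1, size=(500, 4))     # NOx, CO, SO2, O3 (scaled)
pm10 = (20 + 30 * np.maximum(0, X[:, 0] - 0.4)
        + 10 * X[:, 1] * X[:, 2] + rng.normal(0, 2, 500))

mars = Earth(max_degree=2)   # allow hinge functions and pairwise interactions
mars.fit(X, pm10)
print(mars.summary())        # lists the selected basis functions
```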

  1. Rank regression: an alternative regression approach for data with outliers.

    PubMed

    Chen, Tian; Tang, Wan; Lu, Ying; Tu, Xin

    2014-10-01

    Linear regression models are widely used in mental health and related health services research. However, classical linear regression analysis assumes that the data are normally distributed, an assumption that is not met by the data obtained in many studies. One method of dealing with this problem is to use semi-parametric models, which do not require that the data be normally distributed. But semi-parametric models are quite sensitive to outlying observations, so the resulting estimates are unreliable when the study data include outliers. In this situation, some researchers trim the extreme values prior to conducting the analysis, but the ad hoc rules used for data trimming are based on subjective criteria, so different methods of adjustment can yield different results. Rank regression provides a more objective approach to dealing with non-normal data that include outliers. This paper uses simulated and real data to illustrate this useful regression approach for dealing with outliers and compares it to the results generated using classical regression models and semi-parametric regression models.
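
    The abstract does not specify the estimator, so the sketch below uses one standard rank regression recipe: minimizing Jaeckel's dispersion with Wilcoxon scores, which stays stable under the gross outliers that distort ordinary least squares.

```python
# Rank-based (Wilcoxon score) regression: estimate slopes by minimizing
# Jaeckel's dispersion of residuals, then recover the intercept as the
# median residual. One common recipe, not necessarily the paper's exact one.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import rankdata

def jaeckel_dispersion(beta, X, y):
    e = y - X @ beta
    u = rankdata(e) / (len(e) + 1)
    return np.sum(np.sqrt(12) * (u - 0.5) * e)   # Wilcoxon scores

rng = np.random.default_rng(2)
n = 200
X = rng.normal(size=(n, 2))
y = 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.standard_t(df=2, size=n)  # heavy tails
y[:5] += 40                                                       # gross outliers

ols = np.linalg.lstsq(np.c_[np.ones(n), X], y, rcond=None)[0]
res = minimize(jaeckel_dispersion, x0=np.zeros(2), args=(X, y),
               method="Nelder-Mead")
intercept = np.median(y - X @ res.x)
print("OLS :", np.round(ols, 2))                       # distorted by outliers
print("Rank:", np.round(np.r_[intercept, res.x], 2))   # near (0, 1.5, -0.8)
```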

  2. A methodology for the design of experiments in computational intelligence with multiple regression models.

    PubMed

    Fernandez-Lozano, Carlos; Gestal, Marcos; Munteanu, Cristian R; Dorado, Julian; Pazos, Alejandro

    2016-01-01

    The design of experiments and the validation of the results achieved with them are vital in any research study. This paper focuses on the use of different machine learning approaches for regression tasks in the field of Computational Intelligence, and especially on a correct comparison between the results provided by different methods, as those techniques are complex systems that require further study to be fully understood. A methodology commonly accepted in Computational Intelligence is implemented in an R package called RRegrs. This package includes ten simple and complex regression models to carry out predictive modeling using machine learning and well-known regression algorithms. The framework for experimental design presented herein is evaluated and validated against RRegrs. Our results differ for three out of five state-of-the-art simple datasets, and the selection of the best model according to our proposal is statistically significant and relevant. It is important to use a statistical approach to indicate whether the differences between these kinds of algorithms are statistically significant. Furthermore, our results with three real complex datasets report different best models than the previously published methodology. Our final goal is to provide a complete methodology for the use of different steps in order to compare results obtained in Computational Intelligence problems, as well as in other fields, such as bioinformatics, cheminformatics, etc., given that our proposal is open and modifiable.

  4. Multiple linear regression modeling of disinfection by-products formation in Istanbul drinking water reservoirs.

    PubMed

    Uyak, Vedat; Ozdemir, Kadir; Toroz, Ismail

    2007-06-01

    Oxidation of raw water with chlorine results in the formation of trihalomethanes (THM) and haloacetic acids (HAA). Factors affecting their concentrations have been found to be organic matter type and concentration, pH, temperature, chlorine dose, contact time and bromide concentration, but the mechanisms of their formation are still under investigation. Within this scope, chlorination experiments were conducted with water from the Terkos, Buyukcekmece and Omerli reservoirs (Istanbul), which differ in water quality with respect to bromide concentration and organic matter content. The factors studied were pH, contact time, chlorine dose, and specific ultraviolet absorbance (SUVA). The determination of disinfection by-products (DBP) was carried out by gas chromatography techniques. Statistical analysis of the results focused on the development of multiple regression models for predicting the concentrations of total THM and total HAA from pH, contact time, chlorine dose, and SUVA. The developed models provided satisfactory estimates of DBP concentrations; the model regression coefficients for THM and HAA are 0.88 and 0.61, respectively. Further, the Durbin-Watson values confirm the reliability of the two models. The results indicate that, under these experimental conditions spanning variations in pH, chlorine dosage, contact time, and SUVA values, the formation of THM and HAA in water can be described by the multiple linear regression technique.
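
    A hedged sketch of this kind of multiple linear regression, using statsmodels on simulated water-quality data (the four predictor names mirror the study's factors, not its measurements), including the Durbin-Watson check mentioned above:

```python
# Multiple linear regression of total THM on pH, contact time, chlorine dose
# and SUVA, with a Durbin-Watson residual check. All data are simulated.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(4)
n = 120
df = pd.DataFrame({
    "pH": rng.uniform(6.5, 8.5, n),
    "time_h": rng.uniform(1, 48, n),      # contact time (hours)
    "cl2_mg_L": rng.uniform(1, 10, n),    # chlorine dose (mg/L)
    "SUVA": rng.uniform(1, 5, n),         # specific UV absorbance, L/(mg*m)
})
df["TTHM_ug_L"] = (-80 + 12 * df.pH + 0.8 * df.time_h + 4 * df.cl2_mg_L
                   + 9 * df.SUVA + rng.normal(0, 8, n))

fit = smf.ols("TTHM_ug_L ~ pH + time_h + cl2_mg_L + SUVA", data=df).fit()
print("R2 =", round(fit.rsquared, 2), "| DW =", round(durbin_watson(fit.resid), 2))
```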

  5. Application of likelihood ratio and logistic regression models to landslide susceptibility mapping using GIS.

    PubMed

    Lee, Saro

    2004-08-01

    For landslide susceptibility mapping, this study applied and verified a likelihood ratio (Bayesian probability) model and a logistic regression model in Janghung, Korea, using a Geographic Information System (GIS). Landslide locations were identified in the study area from interpretation of IRS satellite imagery and field surveys, and a spatial database was constructed from topographic maps, soil type, forest cover, geology and land cover. The factors that influence landslide occurrence, such as slope gradient, slope aspect, and curvature of topography, were calculated from the topographic database. Soil texture, material, drainage, and effective depth were extracted from the soil database, while forest type, diameter, and density were extracted from the forest database. Land cover was classified from Landsat TM satellite imagery using unsupervised classification. The likelihood ratio and logistic regression coefficients were overlaid to determine each factor's rating for landslide susceptibility mapping. The landslide susceptibility map was then verified against known landslide locations. The logistic regression model had higher prediction accuracy than the likelihood ratio model. The method can be used to reduce hazards associated with landslides and to inform land-cover planning.

  6. Measurement error in epidemiologic studies of air pollution based on land-use regression models.

    PubMed

    Basagaña, Xavier; Aguilera, Inmaculada; Rivera, Marcela; Agis, David; Foraster, Maria; Marrugat, Jaume; Elosua, Roberto; Künzli, Nino

    2013-10-15

    Land-use regression (LUR) models are increasingly used to estimate air pollution exposure in epidemiologic studies. These models use air pollution measurements taken at a small set of locations and modeling based on geographical covariates for which data are available at all study participant locations. The process of LUR model development commonly includes a variable selection procedure. When LUR model predictions are used as explanatory variables in a model for a health outcome, measurement error can lead to bias of the regression coefficients and to inflation of their variance. In previous studies dealing with spatial predictions of air pollution, bias was shown to be small while most of the effect of measurement error was on the variance. In this study, we show that in realistic cases where LUR models are applied to health data, bias in health-effect estimates can be substantial. This bias depends on the number of air pollution measurement sites, the number of available predictors for model selection, and the amount of explainable variability in the true exposure. These results should be taken into account when interpreting health effects from studies that used LUR models.

  7. Estimation of pyrethroid pesticide intake using regression modeling of food groups based on composite dietary samples.

    PubMed

    Michael, Larry C; Brown, G Gordon; Melnyk, Lisa Jo

    2016-11-01

    Population-based estimates of pesticide intake are needed to characterize exposure for particular demographic groups based on their dietary behaviors. Regression modeling performed on measurements of selected pesticides in composited duplicate diet samples allowed (1) estimation of pesticide intakes for a defined demographic community, and (2) comparison of dietary pesticide intakes between the composite and individual samples. Extant databases were useful for assigning individual samples to composites, but they could not provide the breadth of information needed to facilitate measurable levels in every composite. Composite sample measurements were found to be good predictors of pyrethroid pesticide levels in their individual sample constituents where sufficient measurements are available above the method detection limit. Statistical inference shows little evidence of differences between individual and composite measurements and suggests that regression modeling of food groups based on composite dietary samples may provide an effective tool for estimating dietary pesticide intake for a defined population.

  8. Examining Competing Models of the Associations among Peer Victimization, Adjustment Problems, and School Connectedness

    ERIC Educational Resources Information Center

    Loukas, Alexandra; Ripperger-Suhler, Ken G.; Herrera, Denise E.

    2012-01-01

    The present study tested two competing models to assess whether psychosocial adjustment problems mediate the associations between peer victimization and school connectedness one year later, or if peer victimization mediates the associations between psychosocial adjustment problems and school connectedness. Participants were 500 10- to 14-year-old…

  9. Parental Support, Coping Strategies, and Psychological Adjustment: An Integrative Model with Late Adolescents.

    ERIC Educational Resources Information Center

    Holahan, Charles J.; And Others

    1995-01-01

    An integrative predictive model was applied to responses of 241 college freshmen to examine interrelationships among parental support, adaptive coping strategies, and psychological adjustment. Social support from both parents and a nonconflictual parental relationship were positively associated with adolescents' psychological adjustment. (SLD)

  10. Forecasting Model for IPTV Service in Korea Using Bootstrap Ridge Regression Analysis

    NASA Astrophysics Data System (ADS)

    Lee, Byoung Chul; Kee, Seho; Kim, Jae Bum; Kim, Yun Bae

    Telecom firms in Korea are taking new steps to prepare for the next generation of convergence services, IPTV. In this paper we describe our analysis of an effective method for demand forecasting for IPTV broadcasting. We tested three scenarios based on different aspects of the potential IPTV market and compared the results. The forecasting method used in this paper is a multi-generation substitution model with bootstrap ridge regression analysis.
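
    Bootstrap ridge regression can be sketched as follows: refit a ridge model on resampled data to obtain interval estimates for the coefficients. The demand data here are simulated placeholders, not the IPTV market scenarios.

```python
# Ridge regression with bootstrap resampling for coefficient intervals.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.utils import resample

rng = np.random.default_rng(6)
n = 60
X = rng.normal(size=(n, 3))                      # e.g. price, marketing, season
y = 5 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(0, 1, n)

coefs = []
for b in range(1000):
    Xb, yb = resample(X, y, random_state=b)      # bootstrap sample
    coefs.append(Ridge(alpha=1.0).fit(Xb, yb).coef_)
coefs = np.array(coefs)

lo, hi = np.percentile(coefs, [2.5, 97.5], axis=0)
print("95% bootstrap CIs:", list(zip(lo.round(2), hi.round(2))))
```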

  11. Note: Multivariate system spectroscopic model using Lorentz oscillators and partial least squares regression analysis

    NASA Astrophysics Data System (ADS)

    Gad, R. S.; Parab, J. S.; Naik, G. M.

    2010-11-01

    A multivariate system spectroscopic model plays an important role in understanding the chemometrics of the ensemble under study. In this manuscript we discuss various approaches to modeling a spectroscopic system and demonstrate how Lorentz oscillators can be used to model any general spectroscopic system. Chemometric studies require customized template designs for the corresponding variants participating in the ensemble, which generate the characteristic matrix of the ensemble under study. A typical biological system that resembles human blood tissue, consisting of five major constituents, i.e., alanine, urea, lactate, glucose, and ascorbate, was tested on the model. The model was validated using three approaches, namely, root mean square error (RMSE) analysis in the range of a ±5% confidence interval, the Clarke error grid plot, and an RMSE versus percent noise level study. The model was also tested across various template sizes (consisting of samples ranging from 10 up to 1000) to ascertain the validity of partial least squares regression. The model has potential for understanding the chemometrics of proteomics pathways.
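
    The ingredients above, Lorentzian bands combined into synthetic spectra and inverted with partial least squares, can be sketched with scikit-learn; all band positions and concentrations are illustrative assumptions.

```python
# Synthetic absorbance spectra built from overlapping Lorentzian bands of
# five constituents; PLS recovers one constituent's concentration.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(8)
wavel = np.linspace(0, 1, 200)                     # normalized wavelength axis

def lorentz(center, width):
    return 1.0 / (1.0 + ((wavel - center) / width) ** 2)

bands = np.stack([lorentz(c, 0.05) for c in (0.2, 0.35, 0.5, 0.65, 0.8)])
conc = rng.uniform(0, 1, size=(100, 5))            # 5 constituents, 100 samples
spectra = conc @ bands + rng.normal(0, 0.01, (100, 200))

pls = PLSRegression(n_components=5).fit(spectra, conc[:, 3])  # one channel
pred = pls.predict(spectra)[:, 0]
print(f"RMSE = {np.sqrt(np.mean((pred - conc[:, 3]) ** 2)):.4f}")
```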

  12. Detection of Cutting Tool Wear using Statistical Analysis and Regression Model

    NASA Astrophysics Data System (ADS)

    Ghani, Jaharah A.; Rizal, Muhammad; Nuawi, Mohd Zaki; Haron, Che Hassan Che; Ramli, Rizauddin

    2010-10-01

    This study presents a new method for detecting cutting tool wear based on measured cutting force signals. A statistical method, the Integrated Kurtosis-based Algorithm for Z-Filter technique (I-kaz), was used to develop a regression model and a 3D graphic presentation of the I-kaz 3D coefficient during the machining process. The machining tests were carried out on a Colchester Master Tornado T4 CNC turning machine in dry cutting conditions. A Kistler 9255B dynamometer was used to measure the cutting force signals, which were transmitted, analyzed, and displayed in the DasyLab software. Various force signals from the machining operation were analyzed, each with its own I-kaz 3D coefficient. This coefficient was examined and its relationship with flank wear land (VB) was determined. A regression model was developed from this relationship, and its results show that the I-kaz 3D coefficient value decreases as tool wear increases. The result is then used for real-time tool wear monitoring.

  13. Air quality modeling in the Oviedo urban area (NW Spain) by using multivariate adaptive regression splines.

    PubMed

    Nieto, P J García; Antón, J C Álvarez; Vilán, J A Vilán; García-Gonzalo, E

    2015-05-01

    The aim of this research work is to build a regression model of air quality by using the multivariate adaptive regression splines (MARS) technique in the Oviedo urban area (northern Spain) at a local scale. To accomplish the objective of this study, an experimental data set made up of nitrogen oxides (NOx), carbon monoxide (CO), sulfur dioxide (SO2), ozone (O3), and dust (PM10) was collected over 3 years (2006-2008). The US National Ambient Air Quality Standards (NAAQS) establish the limit values of the main pollutants in the atmosphere in order to safeguard public health. Firstly, this MARS regression model captures the main insight of statistical learning theory in order to obtain a good prediction of the dependence among the main pollutants in the Oviedo urban area. Secondly, the main advantages of MARS are its capacity to produce simple, easy-to-interpret models, its ability to estimate the contributions of the input variables, and its computational efficiency. Finally, on the basis of these numerical calculations, using the MARS technique, conclusions of this research work are presented.

  14. Tweaking Model Parameters: Manual Adjustment and Self Calibration

    NASA Astrophysics Data System (ADS)

    Schulz, B.; Tuffs, R. J.; Laureijs, R. J.; Lu, N.; Peschke, S. B.; Gabriel, C.; Khan, I.

    2002-12-01

    The reduction of P32 data is not always straightforward, and the application of the transient model needs tight control by the user. This paper describes how to access the model parameters within the P32Tools software and how to work with the "Inspect signals per pixel" panel, in order to explore the parameter space and improve the model fit.

  15. Developing Street-Level PM2.5 and PM10 Land Use Regression Models in High-Density Hong Kong with Urban Morphological Factors.

    PubMed

    Shi, Yuan; Lau, Kevin Ka-Lun; Ng, Edward

    2016-08-02

    Monitoring street-level particulates is essential to air quality management but challenging in high-density Hong Kong due to limitations of the local monitoring network and the complexities of the street environment. By employing vehicle-based mobile measurements, land use regression (LUR) models were developed to estimate the spatial variation of PM2.5 and PM10 in the downtown area of Hong Kong. Sampling runs were conducted along routes measuring a total of 30 km during a selected measurement period of 14 days in total. In total, 321 independent variables were examined to develop LUR models by using stepwise regression with PM2.5 and PM10 as dependent variables. Increases of approximately 10% in model adjusted R2 were achieved by integrating urban/building morphology as independent variables into the LUR models. The resultant LUR models show that the most decisive factors for street-level air quality in Hong Kong are the frontal area index (an urban/building morphological parameter) and road network line density and traffic volume (two parameters of road traffic). The adjusted R2 of the final LUR models of PM2.5 and PM10 are 0.633 and 0.707, respectively. These results indicate that urban morphology is more decisive for street-level air quality in high-density cities than in other cities. Air pollution hotspots were also identified based on the LUR mapping.
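
    LUR development of the kind described, screening many GIS-derived candidates with stepwise regression, can be sketched with a simple forward-selection loop scored by adjusted R2; dataset and variable names below are placeholders.

```python
# Generic forward-stepwise predictor selection by adjusted R^2, of the kind
# used to build LUR models from many candidate GIS variables. Synthetic data.
import numpy as np
import pandas as pd
import statsmodels.api as sm

def forward_stepwise(y, X, min_gain=0.01):
    """Greedily add the predictor that most improves adjusted R^2."""
    selected, remaining, best = [], list(X.columns), -np.inf
    while remaining:
        scores = {c: sm.OLS(y, sm.add_constant(X[selected + [c]])).fit().rsquared_adj
                  for c in remaining}
        cand = max(scores, key=scores.get)
        if scores[cand] - best < min_gain:
            break
        best = scores[cand]
        selected.append(cand)
        remaining.remove(cand)
    return selected, best

rng = np.random.default_rng(9)
X = pd.DataFrame(rng.normal(size=(150, 8)),
                 columns=[f"gis_var{i}" for i in range(8)])
y = 30 + 6 * X.gis_var0 + 3 * X.gis_var3 + rng.normal(0, 2, 150)
print(forward_stepwise(y, X))   # expected: ['gis_var0', 'gis_var3']
```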

  16. Incorporating single-step strategy into random regression model to enhance genomic prediction of longitudinal trait.

    PubMed

    Kang, H; Zhou, L; Mrode, R; Zhang, Q; Liu, J-F

    2016-12-28

    In the prediction of genomic values, the single-step method has been demonstrated to outperform multi-step methods. In statistical analyses of longitudinal traits, the random regression test-day model (RR-TDM) has clear advantages over other models. Our goal in this study was to evaluate the performance of a model integrating both single-step and RR-TDM prediction methods, called the single-step random regression test-day model (SS RR-TDM), in comparison with the pedigree-based RR-TDM and the genomic best linear unbiased prediction (GBLUP) model. We performed extensive simulations to exploit potential advantages of SS RR-TDM over the other two models under various scenarios with different levels of heritability, numbers of quantitative trait loci, and selection schemes. SS RR-TDM was found to achieve the highest accuracy and unbiasedness under all scenarios, exhibiting robust prediction ability in longitudinal trait analyses. Moreover, SS RR-TDM showed better persistency of accuracy over generations than the GBLUP model. In addition, we also found that SS RR-TDM had advantages over RR-TDM and GBLUP on a real human data set contributed by the Genetic Analysis Workshop 18. The findings in our study provide the first proof of the feasibility and advantages of SS RR-TDM, and further enhance strategies for the genomic prediction of longitudinal traits in the future (Heredity advance online publication, 28 December 2016; doi:10.1038/hdy.2016.91).

  17. A marginalized zero-inflated Poisson regression model with overall exposure effects.

    PubMed

    Long, D Leann; Preisser, John S; Herring, Amy H; Golin, Carol E

    2014-12-20

    The zero-inflated Poisson (ZIP) regression model is often employed in public health research to examine the relationships between exposures of interest and a count outcome exhibiting many zeros, in excess of the amount expected under sampling from a Poisson distribution. The regression coefficients of the ZIP model have latent class interpretations, which correspond to a susceptible subpopulation at risk for the condition with counts generated from a Poisson distribution and a non-susceptible subpopulation that provides the extra or excess zeros. The ZIP model parameters, however, are not well suited for inference targeted at marginal means, specifically, in quantifying the effect of an explanatory variable in the overall mixture population. We develop a marginalized ZIP model approach for independent responses to model the population mean count directly, allowing straightforward inference for overall exposure effects and empirical robust variance estimation for overall log-incidence density ratios. Through simulation studies, the performance of maximum likelihood estimation of the marginalized ZIP model is assessed and compared with other methods of estimating overall exposure effects. The marginalized ZIP model is applied to a recent study of a motivational interviewing-based safer sex counseling intervention, designed to reduce unprotected sexual act counts.
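
    For orientation, a standard (non-marginalized) ZIP regression can be fit with statsmodels as below; the marginalized variant proposed in the paper is not, to our knowledge, available off the shelf, so this sketch only illustrates the latent-class ZIP formulation it builds on.

```python
# Standard ZIP regression fit: a structural-zero subpopulation plus Poisson
# counts for the susceptible group. Simulated data, intercept-only inflation.
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedPoisson

rng = np.random.default_rng(10)
n = 1000
x = rng.normal(size=n)
X = sm.add_constant(x)
susceptible = rng.random(n) > 0.3                      # 30% structural zeros
counts = rng.poisson(np.exp(0.5 + 0.8 * x)) * susceptible

zip_fit = ZeroInflatedPoisson(counts, X, exog_infl=np.ones((n, 1)),
                              inflation="logit").fit(disp=False)
print(zip_fit.params)   # inflation intercept + Poisson intercept and slope
```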

  18. Diagnostic tools to determine the quality of "transparent" regression-based QSARs: the "modelling power" plot.

    PubMed

    Sagrado, Salvador; Cronin, Mark T D

    2006-01-01

    A bivariate plot is presented for comparing two or more QSAR models. It is based on two new statistics associated with a regression model: the "descriptive power" (Dp), which is estimated through the relative uncertainty of the model coefficients, and the "predictive power" (Pp), which is estimated through both the fitted and cross-validated explained variance of the response variable (i.e., biological activity). An algorithm was developed for performing equivalent multiple linear regression and partial-least-squares calculations. The results were validated by comparison with widely accepted commercial software. The Dp and Pp statistics are defined to vary from 0 to 100%, so the modeler has an intuitive impression of the descriptive power (i.e., the global importance of the selected descriptors) and the predictive power (i.e., the possibility of performing QSAR or just SAR estimations). These statistics represent a point in the Dp versus Pp "modelling power" plot, which facilitates visual comparison of multiple models; they could also be used in place of classical statistics, and could even be combined into a single parameter to define (or compare) model quality.

  19. Dynamic regression modeling of daily nitrate-nitrogen concentrations in a large agricultural watershed.

    PubMed

    Feng, Zhujing; Schilling, Keith E; Chan, Kung-Sik

    2013-06-01

    Nitrate-nitrogen concentrations in rivers represent challenges for water supplies that use surface water sources. Nitrate concentrations are often modeled using time-series approaches, but previous efforts have typically relied on monthly time steps. In this study, we developed a dynamic regression model of daily nitrate concentrations in the Raccoon River, Iowa, that incorporated contemporaneous and lagged values of precipitation and discharge occurring at several locations around the basin. Results suggested that 95% of the variation in daily nitrate concentrations measured at the outlet of a large agricultural watershed can be explained by time-series patterns of precipitation and discharge occurring in the basin. Discharge was found to be a more important regression variable than precipitation in our model, but both predictors were strongly correlated with nitrate concentrations. The time-series model was consistent with known patterns of nitrate behavior in the watershed, successfully identifying contemporaneous dilution mechanisms from higher-relief and urban areas of the basin while incorporating the delayed contribution of nitrate from tile-drained regions in a lagged response. The first difference of the model errors was modeled as an AR(16) process, suggesting that daily nitrate concentration changes remain temporally correlated for more than 2 weeks, although the temporal correlation was stronger in the first few days before tapering off. Consequently, daily nitrate concentrations are non-stationary, i.e., they exhibit strong memory. Using time-series models to reliably forecast daily nitrate concentrations in a river based on patterns of precipitation and discharge occurring in its basin may be of great interest to water suppliers.
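
    A minimal dynamic-regression sketch follows: daily nitrate regressed on contemporaneous and lagged discharge and precipitation built with shifted columns. Lag orders and data are illustrative, not the Raccoon River series.

```python
# Dynamic regression with contemporaneous and lagged predictors via shifted
# DataFrame columns; the two-year daily series is simulated.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(11)
n = 730
df = pd.DataFrame({
    "precip": rng.gamma(0.5, 4.0, n),            # daily precipitation
    "discharge": rng.lognormal(3.0, 0.5, n),     # daily discharge
})
df["nitrate"] = (2 + 0.010 * df.discharge + 0.006 * df.discharge.shift(3)
                 - 0.05 * df.precip + rng.normal(0, 0.3, n))

for lag in (1, 2, 3):                            # add lagged terms
    df[f"discharge_l{lag}"] = df.discharge.shift(lag)
    df[f"precip_l{lag}"] = df.precip.shift(lag)

fit = smf.ols("nitrate ~ discharge + precip + "
              + " + ".join(f"discharge_l{l} + precip_l{l}" for l in (1, 2, 3)),
              data=df.dropna()).fit()
print(round(fit.rsquared_adj, 3))
```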

  20. Covariate-Adjusted Linear Mixed Effects Model with an Application to Longitudinal Data

    PubMed Central

    Nguyen, Danh V.; Şentürk, Damla; Carroll, Raymond J.

    2009-01-01

    Linear mixed effects (LME) models are useful for longitudinal data/repeated measurements. We propose a new class of covariate-adjusted LME models for longitudinal data that nonparametrically adjusts for a normalizing covariate. The proposed approach involves fitting a parametric LME model to the data after adjusting for the nonparametric effects of a baseline confounding covariate. In particular, the effect of the observable covariate on the response and predictors of the LME model is modeled nonparametrically via smooth unknown functions. In addition to covariate-adjusted estimation of fixed/population parameters and random effects, an estimation procedure for the variance components is also developed. Numerical properties of the proposed estimators are investigated with simulation studies. The consistency and convergence rates of the proposed estimators are also established. An application to a longitudinal data set on calcium absorption, accounting for baseline distortion from body mass index, illustrates the proposed methodology. PMID:19266053

  1. The relationship of values to adjustment in illness: a model for nursing practice.

    PubMed

    Harvey, R M

    1992-04-01

    This paper proposes a model of the relationship between values, in particular health value, and adjustment to illness. The importance of values as well as the need for value change are described in the literature related to adjustment to physical disability and chronic illness. An empirical model, however, that explains the relationship of values to adjustment or adaptation has not been found by this researcher. Balance theory and its application to the abstract and perceived cognitions of health value and health perception are described here to explain the relationship of values like health value to outcomes associated with adjustment or adaptation to illness. The proposed model is based on the balance theories of Heider, Festinger and Feather. Hypotheses based on the model were tested and supported in a study of 100 adults with visible and invisible chronic illness. Nursing interventions based on the model are described and suggestions for further research discussed.

  2. Applying quantile regression for modeling equivalent property damage only crashes to identify accident blackspots.

    PubMed

    Washington, Simon; Haque, Md Mazharul; Oh, Jutaek; Lee, Dongmin

    2014-05-01

    Hot spot identification (HSID) aims to identify potential sites (roadway segments, intersections, crosswalks, interchanges, ramps, etc.) with disproportionately high crash risk relative to similar sites. An inefficient HSID methodology might result in either identifying a safe site as high risk (false positive) or a high-risk site as safe (false negative), and consequently lead to the misuse of available public funds, to poor investment decisions, and to inefficient risk management practice. Current HSID methods suffer from issues such as underreporting of minor injury and property damage only (PDO) crashes, challenges of accounting for crash severity in the methodology, and selection of a proper safety performance function to model crash data that are often heavily skewed by a preponderance of zeros. Addressing these challenges, this paper proposes a combination of a PDO equivalency calculation and a quantile regression technique to identify hot spots in a transportation network. In particular, issues related to underreporting and crash severity are tackled by incorporating equivalent PDO crashes, whilst the concerns related to the non-count nature of equivalent PDO crashes and the skewness of crash data are addressed by the non-parametric quantile regression technique. The proposed method identifies covariate effects on various quantiles of a population, rather than the population mean like most methods in practice, which more closely corresponds with how black spots are identified in practice. The proposed methodology is illustrated using rural road segment data from Korea and compared against the traditional EB method with negative binomial regression. Application of a quantile regression model to equivalent PDO crashes enables identification of a set of high-risk sites that reflect the true safety costs to society, simultaneously reduces the influence of under-reported PDO and minor injury crashes, and overcomes the limitation of the traditional NB model in dealing
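
    Quantile regression of the kind proposed can be sketched with statsmodels: fit an upper quantile of a skewed (equivalent-PDO-like) crash score and flag sites exceeding it. Variables and thresholds are illustrative assumptions.

```python
# Quantile regression on a skewed crash score: model the 90th percentile and
# flag sites above the fitted quantile. Data are simulated placeholders.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(12)
n = 400
df = pd.DataFrame({"aadt": rng.lognormal(8, 0.6, n),     # traffic volume
                   "length_km": rng.uniform(0.2, 5, n)})
df["epdo"] = (0.002 * df.aadt * df.length_km
              * rng.lognormal(0, 0.8, n))                # skewed crash score

q90 = smf.quantreg("epdo ~ aadt + length_km", df).fit(q=0.90)
df["hotspot"] = df.epdo > q90.predict(df)                # above the 90th pct
print(q90.params, "\nflagged sites:", int(df.hotspot.sum()))
```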

  3. Noninvasive glucometer model using partial least square regression technique for human blood matrix

    NASA Astrophysics Data System (ADS)

    Parab, J. S.; Gad, R. S.; Naik, G. M.

    2010-05-01

    In this article, we highlight a partial least squares regression (PLSR) model to predict the glucose level in human blood by considering only five variants. The PLSR model is experimentally validated for 13 template samples. The root mean square error analysis of the design model and the experimental samples is found to be satisfactory, with values of 3.459 and 5.543, respectively. In PLSR, template design is a critical issue for the number of variants participating in the model. An ensemble consisting of five major variants is simulated to replicate the signatures of these constituents in human blood, i.e., alanine, urea, lactate, glucose, and ascorbate. A multivariate system using PLSR plays an important role in understanding the chemometrics of such an ensemble. The resultant spectra of all these constituents are generated to create templates for the PLSR model. This model has potential scope in designing a near-infrared spectroscopy based noninvasive glucometer.

  4. Nitrogen dioxide concentrations in neighborhoods adjacent to a commercial airport: a land use regression modeling study

    PubMed Central

    2010-01-01

    Background There is growing concern in communities surrounding airports regarding the contribution of various emission sources (such as aircraft and ground support equipment) to nearby ambient concentrations. We used extensive monitoring of nitrogen dioxide (NO2) in neighborhoods surrounding T.F. Green Airport in Warwick, RI, and land-use regression (LUR) modeling techniques to determine the impact of proximity to the airport and local traffic on these concentrations. Methods Palmes diffusion tube samplers were deployed along the airport's fence line and within surrounding neighborhoods for one to two weeks. In total, 644 measurements were collected over three sampling campaigns (October 2007, March 2008 and June 2008) and each sampling location was geocoded. GIS-based variables were created as proxies for local traffic and airport activity. A forward stepwise regression methodology was employed to create general linear models (GLMs) of NO2 variability near the airport. The effect of local meteorology on associations with GIS-based variables was also explored. Results Higher concentrations of NO2 were seen near the airport terminal, entrance roads to the terminal, and near major roads, with qualitatively consistent spatial patterns between seasons. In our final multivariate model (R2 = 0.32), the local influences of highways and arterial/collector roads were statistically significant, as were local traffic density and distance to the airport terminal (all p < 0.001). Local meteorology did not significantly affect associations with principal GIS variables, and the regression model structure was robust to various model-building approaches. Conclusion Our study has shown that there are clear local variations in NO2 in the neighborhoods that surround an urban airport, which are spatially consistent across seasons. LUR modeling demonstrated a strong influence of local traffic, except for the smallest roads that predominate in residential areas, as well as proximity to the

  5. Short-term Forecasting of the Prevalence of Trachoma: Expert Opinion, Statistical Regression, versus Transmission Models

    PubMed Central

    Liu, Fengchen; Porco, Travis C.; Amza, Abdou; Kadri, Boubacar; Nassirou, Baido; West, Sheila K.; Bailey, Robin L.; Keenan, Jeremy D.; Solomon, Anthony W.; Emerson, Paul M.; Gambhir, Manoj; Lietman, Thomas M.

    2015-01-01

    Background Trachoma programs rely on guidelines made in large part using expert opinion of what will happen with and without intervention. Large community-randomized trials offer an opportunity to actually compare forecasting methods in a masked fashion. Methods The Program for the Rapid Elimination of Trachoma trials estimated longitudinal prevalence of ocular chlamydial infection from 24 communities treated annually with mass azithromycin. Given antibiotic coverage and biannual assessments from baseline through 30 months, forecasts of the prevalence of infection in each of the 24 communities at 36 months were made by three methods: the sum of 15 experts’ opinion, statistical regression of the square-root-transformed prevalence, and a stochastic hidden Markov model of infection transmission (Susceptible-Infectious-Susceptible, or SIS model). All forecasters were masked to the 36-month results and to the other forecasts. Forecasts of the 24 communities were scored by the likelihood of the observed results and compared using Wilcoxon’s signed-rank statistic. Findings Regression and SIS hidden Markov models had significantly better likelihood than community expert opinion (p = 0.004 and p = 0.01, respectively). All forecasts scored better when perturbed to decrease Fisher’s information. Each individual expert’s forecast was poorer than the sum of experts. Interpretation Regression and SIS models performed significantly better than expert opinion, although all forecasts were overly confident. Further model refinements may score better, although would need to be tested and compared in new masked studies. Construction of guidelines that rely on forecasting future prevalence could consider use of mathematical and statistical models. PMID:26302380

  6. An hourly regression model for ultrafine particles in a near-highway urban area

    PubMed Central

    Patton, Allison P.; Collins, Caitlin; Naumova, Elena N.; Zamore, Wig; Brugge, Doug; Durant, John L.

    2015-01-01

    Estimating ultrafine particle number concentrations (PNC) near highways for exposure assessment in chronic health studies requires models capable of capturing PNC spatial and temporal variations over the course of a full year. The objectives of this work were to describe the relationship between near-highway PNC and potential predictors, and to build and validate hourly log-linear regression models. PNC was measured near Interstate 93 (I-93) in Somerville, MA (USA) using a mobile monitoring platform driven for 234 hours on 43 days between August 2009 and September 2010. Compared to urban background, PNC levels were consistently elevated within 100–200 m of I-93, with gradients impacted by meteorological and traffic conditions. Temporal and spatial variables including wind speed and direction, temperature, highway traffic, and distance to I-93 and major roads contributed significantly to the full regression model. Cross-validated model R2 values ranged from 0.38–0.47, with higher values achieved (0.43–0.53) when short-duration PNC spikes were removed. The model predicts highest PNC near major roads and on cold days with low wind speeds. The model allows estimation of hourly ambient PNC at 20-m resolution in a near-highway neighborhood. PMID:24559198

  7. Predicting the occurrence of wildfires with binary structured additive regression models.

    PubMed

    Ríos-Pena, Laura; Kneib, Thomas; Cadarso-Suárez, Carmen; Marey-Pérez, Manuel

    2017-02-01

    Wildfires are one of the main environmental problems facing societies today, and in the case of Galicia (north-west Spain), they are the main cause of forest destruction. This paper used binary structured additive regression (STAR) for modelling the occurrence of wildfires in Galicia. Binary STAR models are a recent contribution to the classical logistic regression and binary generalized additive models. Their main advantage lies in their flexibility for modelling non-linear effects, while simultaneously incorporating spatial and temporal variables directly, thereby making it possible to reveal possible relationships among the variables considered. The results showed that the occurrence of wildfires depends on many covariates which display variable behaviour across space and time, and which largely determine the likelihood of ignition of a fire. The joint possibility of working on spatial scales with a resolution of 1 × 1 km cells and mapping predictions in a colour range makes STAR models a useful tool for plotting and predicting wildfire occurrence. Lastly, it will facilitate the development of fire behaviour models, which can be invaluable when it comes to drawing up fire-prevention and firefighting plans.
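
    A binary STAR-style fit can be approximated with a logistic GAM; the sketch below assumes the third-party pyGAM package, using smooth covariate effects plus a tensor-product spatial smooth as a stand-in for the paper's spatio-temporal terms. Data are simulated, not the Galician records.

```python
# Logistic GAM as a stand-in for binary STAR: smooth non-linear covariate
# effects (s) plus a tensor-product spatial smooth (te) over coordinates.
import numpy as np
from pygam import LogisticGAM, s, te   # pip install pygam (assumed)

rng = np.random.default_rng(13)
n = 3000
X = np.column_stack([
    rng.uniform(0, 40, n),    # slope (degrees)
    rng.uniform(0, 1, n),     # shrub cover fraction
    rng.uniform(0, 100, n),   # x coordinate (km)
    rng.uniform(0, 100, n),   # y coordinate (km)
])
logit = -3 + 0.08 * X[:, 0] + 2.5 * X[:, 1] + 0.6 * np.sin(X[:, 2] / 15)
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

gam = LogisticGAM(s(0) + s(1) + te(2, 3)).fit(X, y)  # te(2,3): spatial effect
gam.summary()
```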

  8. A national fine spatial scale land-use regression model for ozone.

    PubMed

    Kerckhoffs, Jules; Wang, Meng; Meliefste, Kees; Malmqvist, Ebba; Fischer, Paul; Janssen, Nicole A H; Beelen, Rob; Hoek, Gerard

    2015-07-01

    Uncertainty about the health effects of long-term ozone exposure remains. Land use regression (LUR) models have been used successfully for modeling fine-scale spatial variation of primary pollutants, but only to a very limited extent for ozone. Our objective was to assess the feasibility of developing a national LUR model for ozone at a fine spatial scale. Ozone concentrations were measured with passive samplers at 90 locations across the Netherlands (19 regional background, 36 urban background, 35 traffic). All sites were measured simultaneously during four 2-weekly campaigns spread over the seasons. LUR models were developed for the summer average as the primary exposure and for the annual average, using predictor variables obtained with Geographic Information Systems. Summer average ozone concentrations varied between 32 and 61 µg/m3. Ozone concentrations at traffic sites were on average 9 µg/m3 lower than at regional background sites. Ozone correlated highly negatively with nitrogen dioxide and moderately with fine particles. A LUR model including small-scale traffic, large-scale address density, urban green and a region indicator explained 71% of the spatial variation in summer average ozone concentrations. Land use regression modeling is a promising method to assess ozone spatial variation, but the high correlation with NO2 limits application in epidemiology.

  9. Bayesian regression models for the estimation of net cost of disease using aggregate data.

    PubMed

    Mitsakakis, Nicholas; Tomlinson, George

    2015-01-23

    Estimation of the net costs attributed to a disease or other health condition is very important for health economists and policy makers. Skewness and heteroscedasticity are well-known characteristics of cost data, making linear models generally inappropriate and dictating the use of other types of models, such as gamma regression. Additional hurdles emerge when individual-level data are not available. In this paper, we consider the latter case, where data are only available at the aggregate level, containing means and standard deviations for different strata defined by a number of demographic and clinical factors. We summarize a number of methods that can be used for this estimation, and we propose a Bayesian approach that treats the sample stratum-specific standard deviations as stochastic. We investigate the performance of two linear mixed models, comparing them with two proposed gamma regression mixed models, to analyze simulated data generated by gamma and log-normal distributions. Our proposed Bayesian approach seems to have significant advantages for net cost estimation when only aggregate data are available. The implemented gamma models do not seem to offer the expected benefits over the linear models; however, further investigation and refinement are needed.

  10. Capacitance Regression Modelling Analysis on Latex from Selected Rubber Tree Clones

    NASA Astrophysics Data System (ADS)

    Rosli, A. D.; Hashim, H.; Khairuzzaman, N. A.; Mohd Sampian, A. F.; Baharudin, R.; Abdullah, N. E.; Sulaiman, M. S.; Kamaru'zzaman, M.

    2015-11-01

    This paper investigates the capacitance regression modelling performance of latex from various rubber tree clones, namely clones 2002, 2008, 2014 and 3001. Conventionally, rubber tree clone identification is based on observation of tree features such as the shape of the leaf, trunk, branching habit and pattern of seed texture. This method requires experts and is very time-consuming. Currently, there is no sensing device based on electrical properties that can be employed to distinguish different clones from latex samples. Hence, with the hypothesis that the dielectric constant of each clone varies, this paper discusses the development of a capacitance sensor via a Capacitance Comparison Bridge to measure the output voltage of different latex samples. The proposed sensor is initially tested with a 30 ml latex sample prior to the gradual addition of dilution water. The output voltage and capacitance obtained from the test are recorded and analyzed using a Simple Linear Regression (SLR) model. The results show that latex from clone 2002 produced the most reliable linear regression line, with a determination coefficient of 91.24%. In addition, the study also found that the capacitive elements in latex samples deteriorate if diluted with a higher volume of water.
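
    The SLR analysis described above amounts to a straight-line fit of output voltage against dilution; a minimal sketch with entirely fabricated readings:

```python
# Simple linear regression of sensor output voltage against dilution level;
# all readings are hypothetical placeholders for illustration only.
import numpy as np
from scipy.stats import linregress

water_added_ml = np.array([0, 5, 10, 15, 20, 25, 30], dtype=float)
output_v = np.array([2.91, 2.74, 2.60, 2.42, 2.30, 2.13, 1.98])

fit = linregress(water_added_ml, output_v)
print(f"slope={fit.slope:.4f} V/ml, intercept={fit.intercept:.3f} V, "
      f"R^2={fit.rvalue**2:.4f}")   # determination coefficient, cf. 91.24% above
```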

  11. The assessment of groundwater nitrate contamination by using logistic regression model in a representative rural area

    NASA Astrophysics Data System (ADS)

    Ko, K.; Cheong, B.; Koh, D.

    2010-12-01

    Groundwater has been used as a main source of drinking water in rural areas of Korea with no regional potable water supply system. More than 50 percent of rural residents depend on groundwater as drinking water. Thus, research on predicting groundwater pollution is needed for sustainable groundwater usage and protection from potential pollutants. This study was carried out to assess the vulnerability to groundwater nitrate contamination, reflecting the effect of land use, in Nonsan city, a representative rural area of South Korea. About 47% of the study area is occupied by cultivated land that is highly vulnerable to groundwater nitrate contamination, with a nitrogen fertilizer input of 62.3 tons/km2, higher than the country's average of 44.0 tons/km2. Two vulnerability assessment methods, logistic regression and the DRASTIC model, were tested and compared to identify the more suitable technique for assessing groundwater nitrate contamination in the Nonsan area. The groundwater quality data were acquired from analyses of 111 samples from small potable supply systems in the study area. The analyzed nitrate values were classified by land use: residential, upland, paddy, and field areas. One dependent and two independent variables were used for the logistic regression analysis. The dependent variable was binary (0 or 1), indicating whether or not nitrate exceeded thresholds of 1 through 10 mg/L. The independent variables were one continuous variable (slope, representing topography) and categorical land-use data classified as residential, upland, paddy, and field areas. The results of Levene's test and the t-test for slope and land use showed significant differences in mean values among groups at the 95% confidence level. From the logistic regression, we found a negative correlation between slope and nitrate, caused by the decrease of contaminant inputs into groundwater with

  12. Comparing Spatial and Multilevel Regression Models for Binary Outcomes in Neighborhood Studies

    PubMed Central

    Xu, Hongwei

    2013-01-01

    The standard multilevel regressions that are widely used in neighborhood research typically ignore potential between-neighborhood correlations due to underlying spatial processes, and hence produce inappropriate inferences about neighborhood effects. In contrast, spatial models produce estimates and predictions across areas by explicitly modeling the spatial correlations among observations in different locations. A better understanding of the strengths and limitations of spatial models, as compared to the standard multilevel model, is needed to improve research on neighborhood and spatial effects. This research systematically compares model estimates and predictions for binary outcomes between (distance- and lattice-based) spatial models and the standard multilevel model in the presence of both within- and between-neighborhood correlations, through simulations. Results from the simulation analysis reveal that the standard multilevel and spatial models produce similar estimates of fixed effects, but different estimates of random effects variances. Both the standard multilevel and pure spatial models tend to overestimate the corresponding random effects variances, compared to hybrid models, when both non-spatial within-neighborhood and spatial between-neighborhood effects exist. Spatial models also outperform the standard multilevel model by a narrow margin in the case of fully out-of-sample predictions. Distance-based spatial models provide extra spatial information and have stronger predictive power than lattice-based models under certain circumstances. These merits of spatial modeling are exhibited in an empirical analysis of child mortality data from 1880 Newark, New Jersey. PMID:25284905

  13. Longitudinal quantile regression in the presence of informative dropout through longitudinal-survival joint modeling.

    PubMed

    Farcomeni, Alessio; Viviani, Sara

    2015-03-30

    We propose a joint model for a time-to-event outcome and a quantile of a continuous response repeatedly measured over time. The quantile and survival processes are associated via shared latent and manifest variables. Our joint model provides a flexible approach to handle informative dropout in quantile regression. A Monte Carlo expectation maximization strategy based on importance sampling is proposed, which is directly applicable under any distributional assumption for the longitudinal outcome and random effects. We consider both parametric and nonparametric assumptions for the baseline hazard. We illustrate through a simulation study and an application to an original data set about dilated cardiomyopathies.

  14. Hierarchical regression and structural equation modeling: two useful analyses for life course research.

    PubMed

    Berndt, Andrea E; Williams, Priscilla C

    2013-01-01

    This article reviews the life course perspective and considers various life course hypotheses such as trajectories, transitions, critical periods, sequencing, duration, and cumulative effects. Hierarchical regression and structural equation modeling are suggested as analyses to use in life course research. Secondary analysis was performed on the Early Head Start Research and Evaluation Study, 1996-2010, to illustrate their strengths and challenges. Models investigated the influence of mother and infant characteristics and of parent-child dysfunction at 14 and 24 months on children's cognitive outcomes at 36 months. Findings were interpreted and discussed in the context of life course hypotheses.

  15. Regression models for the analysis of longitudinal Gaussian data from multiple sources.

    PubMed

    O'Brien, Liam M; Fitzmaurice, Garrett M

    2005-06-15

    We present a regression model for the joint analysis of longitudinal multiple source Gaussian data. Longitudinal multiple source data arise when repeated measurements are taken from two or more sources, and each source provides a measure of the same underlying variable on the same scale. This type of data generally produces a relatively large number of observations per subject; thus, estimation of an unstructured covariance matrix often may not be possible. We consider two methods by which parsimonious models for the covariance can be obtained for longitudinal multiple source data. The methods are illustrated with an example of multiple informant data arising from a longitudinal interventional trial in psychiatry.

  16. On the impact of covariate measurement error on spatial regression modelling

    PubMed Central

    Huque, Md Hamidul; Bondell, Howard; Ryan, Louise

    2015-01-01

    Spatial regression models have grown in popularity in response to rapid advances in GIS (Geographic Information Systems) technology that allows epidemiologists to incorporate geographically indexed data into their studies. However, it turns out that there are some subtle pitfalls in the use of these models. We show that the presence of covariate measurement error can lead to significant sensitivity of parameter estimation to the choice of spatial correlation structure. We quantify the effect of measurement error on parameter estimates, and then suggest two different ways to produce consistent estimates. We evaluate the methods through a simulation study. These methods are then applied to data on Ischemic Heart Disease (IHD). PMID:25729267

  17. Design Sensitivity for a Subsonic Aircraft Predicted by Neural Network and Regression Models

    NASA Technical Reports Server (NTRS)

    Hopkins, Dale A.; Patnaik, Surya N.

    2005-01-01

    A preliminary methodology was obtained for the design optimization of a subsonic aircraft by coupling NASA Langley Research Center's Flight Optimization System (FLOPS) with NASA Glenn Research Center's design optimization testbed (COMETBOARDS with regression and neural network analysis approximators). The aircraft modeled can carry 200 passengers at a cruise speed of Mach 0.85 over a range of 2500 n mi and can operate on standard 6000-ft takeoff and landing runways. The design simulation was extended to evaluate the optimal airframe and engine parameters for the subsonic aircraft to operate on nonstandard runways. Regression and neural network approximators were used to examine aircraft operation on runways ranging in length from 4500 to 7500 ft.

  18. Consistent model identification of varying coefficient quantile regression with BIC tuning parameter selection

    PubMed Central

    Zheng, Qi; Peng, Limin

    2016-01-01

    Quantile regression provides a flexible platform for evaluating covariate effects on different segments of the conditional distribution of response. As the effects of covariates may change with quantile level, contemporaneously examining a spectrum of quantiles is expected to have a better capacity to identify variables with either partial or full effects on the response distribution, as compared to focusing on a single quantile. Under this motivation, we study a general adaptively weighted LASSO penalization strategy in the quantile regression setting, where a continuum of quantile index is considered and coefficients are allowed to vary with quantile index. We establish the oracle properties of the resulting estimator of coefficient function. Furthermore, we formally investigate a BIC-type uniform tuning parameter selector and show that it can ensure consistent model selection. Our numerical studies confirm the theoretical findings and illustrate an application of the new variable selection procedure. PMID:28008212

  19. Adaptive modelling of gene regulatory network using Bayesian information criterion-guided sparse regression approach.

    PubMed

    Shi, Ming; Shen, Weiming; Wang, Hong-Qiang; Chong, Yanwen

    2016-12-01

    Inferring gene regulatory networks (GRNs) from microarray expression data is an important but challenging issue in systems biology. In this study, the authors propose a Bayesian information criterion (BIC)-guided sparse regression approach for GRN reconstruction. This approach can adaptively model GRNs by optimising the l1-norm regularisation of sparse regression based on a modified version of BIC. The regularisation strategy ensures that the inferred GRNs are as sparse as they naturally are, while the modified BIC allows prior knowledge on expression regulation to be incorporated, avoiding the usual overestimation of the number of expression regulators. In particular, the proposed method provides a clear interpretation of combinatorial regulation of gene expression by optimally extracting regulation coordination for a given target gene. Experimental results on both simulated data and real-world microarray data demonstrate the competent performance of the approach in discovering regulatory relationships for GRN reconstruction.
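
    As a generic analogue of this approach (standard BIC in place of the authors' modified criterion, and a synthetic target gene), scikit-learn's LassoLarsIC tunes the l1 penalty by BIC directly:

    ```python
    # BIC-guided sparse regression for one target gene; a sketch under the
    # assumption of a standard BIC rather than the paper's modified criterion.
    import numpy as np
    from sklearn.linear_model import LassoLarsIC

    rng = np.random.default_rng(1)
    n_samples, n_regulators = 60, 40
    X = rng.normal(size=(n_samples, n_regulators))        # candidate regulator expression
    true_coef = np.zeros(n_regulators)
    true_coef[[3, 7, 19]] = [1.2, -0.8, 0.6]              # a sparse "true" regulation
    y = X @ true_coef + 0.3 * rng.normal(size=n_samples)  # target-gene expression

    lasso_bic = LassoLarsIC(criterion="bic").fit(X, y)
    selected = np.flatnonzero(lasso_bic.coef_)            # regulators kept by BIC-tuned lasso
    print("inferred regulators:", selected)
    ```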

  20. GIS-based groundwater potential mapping using boosted regression tree, classification and regression tree, and random forest machine learning models in Iran.

    PubMed

    Naghibi, Seyed Amir; Pourghasemi, Hamid Reza; Dixon, Barnali

    2016-01-01

    Groundwater is considered one of the most valuable fresh water resources. The main objective of this study was to produce groundwater spring potential maps in the Koohrang Watershed, Chaharmahal-e-Bakhtiari Province, Iran, using three machine learning models: boosted regression tree (BRT), classification and regression tree (CART), and random forest (RF). Thirteen hydrological-geological-physiographical (HGP) factors that influence the locations of springs were considered in this research. These factors include slope degree, slope aspect, altitude, topographic wetness index (TWI), slope length (LS), plan curvature, profile curvature, distance to rivers, distance to faults, lithology, land use, drainage density, and fault density. Subsequently, groundwater spring potential was modeled and mapped using the CART, RF, and BRT algorithms. The predicted results from the three models were validated using the receiver operating characteristic (ROC) curve. Of the 864 springs identified, 605 (≈70%) locations were used for spring potential mapping, while the remaining 259 (≈30%) springs were used for model validation. The area under the curve (AUC) for the BRT model was calculated as 0.8103, and the AUC values for CART and RF were 0.7870 and 0.7119, respectively. It was therefore concluded that the BRT model produced the best predictions of spring locations, followed by the CART and RF models. Geospatially integrated BRT, CART, and RF methods proved to be useful in generating the spring potential map (SPM) with reasonable accuracy.
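
    The validation scheme lends itself to a compact sketch: fit BRT-style and RF classifiers on a 70/30 split of spring presence/absence points and compare AUCs. The features and labels below are synthetic stand-ins for the 13 HGP factors and field-mapped springs.

    ```python
    # Hedged sketch of the 70/30 train/validate comparison with ROC AUC.
    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(2)
    n = 864 * 2  # spring presences plus an equal set of absence points (assumption)
    X = rng.normal(size=(n, 13))  # 13 HGP factors (synthetic)
    logit = 1.5 * X[:, 0] - 1.0 * X[:, 3] + 0.5 * X[:, 7]
    y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    for name, clf in [("BRT", GradientBoostingClassifier()), ("RF", RandomForestClassifier())]:
        clf.fit(X_tr, y_tr)
        auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
        print(f"{name}: AUC = {auc:.4f}")
    ```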

  1. An ensemble Kalman filter for statistical estimation of physics constrained nonlinear regression models

    SciTech Connect

    Harlim, John; Mahdi, Adam; Majda, Andrew J.

    2014-01-15

    A central issue in contemporary science is the development of nonlinear data driven statistical–dynamical models for time series of noisy partial observations from nature or a complex model. It has been established recently that ad-hoc quadratic multi-level regression models can have finite-time blow-up of statistical solutions and/or pathological behavior of their invariant measure. Recently, a new class of physics constrained nonlinear regression models were developed to ameliorate this pathological behavior. Here a new finite ensemble Kalman filtering algorithm is developed for estimating the state, the linear and nonlinear model coefficients, the model and the observation noise covariances from available partial noisy observations of the state. Several stringent tests and applications of the method are developed here. In the most complex application, the perfect model has 57 degrees of freedom involving a zonal (east–west) jet, two topographic Rossby waves, and 54 nonlinearly interacting Rossby waves; the perfect model has significant non-Gaussian statistics in the zonal jet with blocked and unblocked regimes and a non-Gaussian skewed distribution due to interaction with the other 56 modes. We only observe the zonal jet contaminated by noise and apply the ensemble filter algorithm for estimation. Numerically, we find that a three dimensional nonlinear stochastic model with one level of memory mimics the statistical effect of the other 56 modes on the zonal jet in an accurate fashion, including the skew non-Gaussian distribution and autocorrelation decay. On the other hand, a similar stochastic model with zero memory levels fails to capture the crucial non-Gaussian behavior of the zonal jet from the perfect 57-mode model.

  2. A New Climate Adjustment Tool: An update to EPA’s Storm Water Management Model

    EPA Science Inventory

    The US EPA’s newest tool, the Stormwater Management Model (SWMM) – Climate Adjustment Tool (CAT) is meant to help municipal stormwater utilities better address potential climate change impacts affecting their operations.

  3. Selecting Spatial Scale of Covariates in Regression Models of Environmental Exposures

    PubMed Central

    Grant, Lauren P.; Gennings, Chris; Wheeler, David C.

    2015-01-01

    Environmental factors or socioeconomic status variables used in regression models to explain environmental chemical exposures or health outcomes are often in practice modeled at the same buffer distance or spatial scale. In this paper, we present four model selection algorithms that select the best spatial scale for each buffer-based or area-level covariate. Contamination of drinking water by nitrate is a growing problem in agricultural areas of the United States, as ingested nitrate can lead to the endogenous formation of N-nitroso compounds, which are potent carcinogens. We applied our methods to model nitrate levels in private wells in Iowa. We found that environmental variables were selected at different spatial scales and that a model allowing spatial scale to vary across covariates provided the best goodness of fit. Our methods can be applied to investigate the association between environmental risk factors available at multiple spatial scales or buffer distances and measures of disease, including cancers. PMID:25983543

  4. Forecasting peak asthma admissions in London: an application of quantile regression models.

    PubMed

    Soyiri, Ireneous N; Reidpath, Daniel D; Sarran, Christophe

    2013-07-01

    Asthma is a chronic condition of great public health concern globally. The associated morbidity, mortality and healthcare utilisation place an enormous burden on healthcare infrastructure and services. This study demonstrates a multistage quantile regression approach to predicting excess demand for health care services, in the form of daily asthma admissions in London, using retrospective data from the Hospital Episode Statistics, weather and air quality. Trivariate quantile regression models (QRM) of daily asthma admissions were fitted to a 14-day range of lags of environmental factors, accounting for seasonality, in a hold-in sample of the data. Representative lags were pooled to form multivariate predictive models, selected through a systematic backward stepwise reduction approach. Models were cross-validated using a hold-out sample of the data, and their respective root mean square error measures, sensitivity, specificity and predictive values compared. Two of the predictive models were able to detect extreme numbers of daily asthma admissions at sensitivity levels of 76% and 62%, with specificities of 66% and 76%. Their positive predictive values were slightly higher for the hold-out sample (29% and 28%) than for the hold-in model development sample (16% and 18%). QRMs can be used in multiple stages to select suitable variables to forecast extreme asthma events. The associations between asthma and environmental factors, including temperature, ozone and carbon monoxide, can be exploited in predicting future events using QRMs.
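
    A single-stage miniature of the approach: regress daily admissions on lagged environmental covariates at an upper quantile with statsmodels. The 90th percentile, the lag choices, and the data are assumptions for illustration.

    ```python
    # Sketch of one lag-specific quantile regression; the paper pools many such fits.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(3)
    n = 365
    df = pd.DataFrame({
        "temp_lag7": rng.normal(12, 6, n),    # synthetic 7-day-lagged temperature
        "ozone_lag3": rng.normal(40, 12, n),  # synthetic 3-day-lagged ozone
    })
    df["admissions"] = (20 - 0.4 * df["temp_lag7"] + 0.15 * df["ozone_lag3"]
                        + rng.gumbel(0, 3, n))  # heavy upper tail in daily counts

    # Fit the conditional 90th percentile of admissions
    q90 = smf.quantreg("admissions ~ temp_lag7 + ozone_lag3", df).fit(q=0.9)
    print(q90.params)
    ```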

  5. Prediction of Wind Speeds Based on Digital Elevation Models Using Boosted Regression Trees

    NASA Astrophysics Data System (ADS)

    Fischer, P.; Etienne, C.; Tian, J.; Krauß, T.

    2015-12-01

    In this paper a new approach is presented to predict maximum wind speeds using Gradient Boosted Regression Trees (GBRT). GBRT are a non-parametric regression technique used in various applications, suitable for making predictions without in-depth a-priori knowledge of the functional dependencies between the predictors and the response variables. Our aim is to predict maximum wind speeds based on predictors derived from a digital elevation model (DEM). The predictors describe the orography of the Area-of-Interest (AoI) by various means, such as first- and second-order derivatives of the DEM, as well as more sophisticated classifications describing the exposure and shelteredness of the terrain to wind flux. In order to take into account the different scales that probably influence the streams and turbulences of wind flow over complex terrain, the predictors are computed at different spatial resolutions ranging from 30 m up to 2000 m. The geographic area used to examine the approach is Switzerland, a mountainous region in the heart of Europe, dominated by the Alps but also covering large valleys. The full workflow is described in this paper; it consists of data preparation using image processing techniques, model training using a state-of-the-art machine learning algorithm, in-depth analysis of the trained model, validation of the model, and application of the model to generate a wind speed map.
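
    A minimal GBRT sketch in this spirit: predict maximum wind speed from DEM-derived predictors and score by cross-validated RMSE. The predictor names and data are assumptions, not the paper's.

    ```python
    # Hedged GBRT sketch with synthetic DEM-derived predictors at two scales.
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(4)
    n = 2000
    X = np.column_stack([
        rng.uniform(0, 45, n),  # slope (first-order DEM derivative)
        rng.normal(0, 1, n),    # curvature (second-order derivative)
        rng.uniform(0, 1, n),   # terrain exposure index at 30 m resolution
        rng.uniform(0, 1, n),   # terrain exposure index at 2000 m resolution
    ])
    y = 15 + 0.3 * X[:, 0] + 8 * X[:, 3] - 4 * X[:, 2] + rng.normal(0, 2, n)

    gbrt = GradientBoostingRegressor(n_estimators=500, learning_rate=0.05, max_depth=3)
    scores = cross_val_score(gbrt, X, y, cv=5, scoring="neg_root_mean_squared_error")
    print("CV RMSE:", -scores.mean())
    ```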

  6. Depicting Estimates Using the Intercept in Meta-Regression Models: The Moving Constant Technique

    PubMed Central

    Johnson, Blair T.; Huedo-Medina, Tania B.

    2012-01-01

    In any scientific discipline, the ability to portray research patterns graphically often aids greatly in interpreting a phenomenon. In part to depict phenomena, the statistics and capabilities of meta-analytic models have grown increasingly sophisticated. Accordingly, this article details how to move the constant in weighted meta-analysis regression models (viz. “meta-regression”) in order to illuminate the patterns in such models across a range of complexities. Although it is commonly ignored in practice, the constant (or intercept) in such models can be indispensable when it is not relegated to its usual static role. The moving constant technique makes possible estimates and confidence intervals at moderator levels of interest as well as continuous confidence bands around the meta-regression line itself. Such estimates, in turn, can be highly informative to interpret the nature of the phenomenon being studied in the meta-analysis, especially when a comparison to an absolute or a practical criterion is the goal. Knowing the point at which effect size estimates reach statistical significance or other practical criteria of effect size magnitude can be quite important. Examples ranging from simple to complex models illustrate these principles. Limitations and extensions of the strategy are discussed. PMID:24920964
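
    The technique reduces to a one-line change in a weighted meta-regression: re-centre the moderator at the level of interest, and the intercept (with its confidence interval) becomes the estimated effect size at that level. A sketch with synthetic effect sizes and inverse-variance weights:

    ```python
    # Moving-constant sketch: re-centre the moderator so the intercept is the
    # estimate at a chosen moderator level. Data are synthetic assumptions.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(5)
    k = 25
    moderator = rng.uniform(0, 10, k)       # e.g. mean dose across k studies
    var_i = rng.uniform(0.01, 0.05, k)      # within-study sampling variances
    effect = 0.1 + 0.04 * moderator + rng.normal(0, np.sqrt(var_i))

    for level in [0.0, 5.0, 10.0]:          # moderator levels of interest
        X = sm.add_constant(moderator - level)   # "move" the constant to `level`
        fit = sm.WLS(effect, X, weights=1.0 / var_i).fit()
        lo, hi = fit.conf_int()[0]               # CI row for the intercept
        print(f"at moderator={level}: estimate={fit.params[0]:.3f}, CI=({lo:.3f}, {hi:.3f})")
    ```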

  7. A Disequilibrium Adjustment Mechanism for CPE Macroeconometric Models: Initial Testing on SOVMOD.

    DTIC Science & Technology

    1979-02-01

    SRI International, Arlington VA, Strategic Studies Center (Richard B. Foster, Director). Approved for public release; distribution unlimited. The report describes work on the model aimed at facilitating the integration of a disequilibrium adjustment mechanism into the macroeconometric model. The

  8. Estimates of genetic parameters for stayability to consecutive calvings of Canadian Simmentals by random regression models.

    PubMed

    Jamrozik, J; McGrath, S; Kemp, R A; Miller, S P

    2013-08-01

    Stayability to consecutive calvings was selected as a measure of cow longevity in the Canadian Simmental population. Calving performance data on 188,579 cows and culling information from the Total Herd Reporting System were used to determine whether a cow stayed in a herd for her second and later (up to the eighth) calvings, given that she had calved as 2 yr old. Binary records (n = 1,164,319) were analyzed with animal linear and threshold models including fixed effects of year of birth by season of birth by parity number and age of cow at first calving by parity number and random effects of contemporary group (CG) defined as herd of birth within year by season, animal additive genetic effect, and a cow permanent environmental (PE) effect. All random effects were Legendre polynomial regressions of the same order, defined on the scale from second to the eighth calving. Bayesian methods with Gibbs sampling were used to estimate covariance components and genetic parameters for random effects of models and selected variables on the longitudinal scale. Bayes factors and analyses of mean squared error and correlation between observed and predicted observations indicated that the linear model with regressions of order 3 was most plausible for generating the current data compared with a fixed regression and other random regression (both linear and threshold) models of order up to 4. Estimates of variances for all random effects from the best fitting model changed with the calving number. Estimates of heritability decreased in time: from 0.35 (SD = 0.006) for stayability to second calving to 0.13 (SD = 0.004) for stayability to the eighth calving. Variance due to PE effect constituted the largest part of the total variance of stayability for all longitudinal points followed by genetic and CG components. Genetic effects of stayability to different calvings were relatively highly correlated, from 0.62 (SD = 0.011) to 0.99 (SD = 0.001), and correlation decreased with the time

  9. Modeling of an Adjustable Beam Solid State Light Project

    NASA Technical Reports Server (NTRS)

    Clark, Toni

    2015-01-01

    This proposal is for the development of a computational model of a prototype variable beam light source using optical modeling software, Zemax Optics Studio. The variable beam light source would be designed to generate flood, spot, and directional beam patterns, while maintaining the same average power usage. The optical model would demonstrate the possibility of such a light source and its ability to address several issues: commonality of design, human task variability, and light source design process improvements. An adaptive lighting solution that utilizes the same electronics footprint and power constraints while addressing variability of lighting needed for the range of exploration tasks can save costs and allow for the development of common avionics for lighting controls.

  10. Inhibition and regression of tumors in hamster DMBA model following laser microvascular targeting

    NASA Astrophysics Data System (ADS)

    McMillan, Kathleen; Wang, Zhi; Shapshay, Stanley M.

    1998-07-01

    Vascular targeting is a recent approach to cancer therapy that aims at damaging tumor vasculature to induce tumor cell hypoxia and subsequent cell death. Squamous cell cancer arises in the superficial mucosal and cutaneous epithelial layers, and tumor microvasculature therefore may be particularly well suited for targeting by selective photothermolysis. An initial evaluation of the effect of selective eradication of microvasculature on tumor development was undertaken here using the chemically-induced hamster cheek pouch model and a 585 nm pulsed dye laser. In a first group of 6 hamsters, progression of premalignant mucosal lesions was compared between control and laser treatment groups, and laser-induced regression of established tumors was evaluated. In a second group of 12 hamsters, the number of laser treatments required to produce complete regression of tumors of the buccal mucosa was determined. The effect of the laser on tumors appearing on the skin in these animals was also investigated. These experiments showed that laser treatment inhibited tumor development and caused complete regression of established tumors 10 mm3 or smaller. Photothermal microvascular targeting may be useful in treating dysplasia and early tumors of the upper aerodigestive tract and skin, with fewer adverse sequelae than existing modalities.

  11. Exact Analysis of Squared Cross-Validity Coefficient in Predictive Regression Models.

    PubMed

    Shieh, Gwowen

    2009-01-01

    In regression analysis, the notion of population validity is of theoretical interest for describing the usefulness of the underlying regression model, whereas the presumably more important concept of population cross-validity represents the predictive effectiveness for the regression equation in future research. It appears that the inference procedures of the squared multiple correlation coefficient have been extensively developed. In contrast, a full range of statistical methods for the analysis of the squared cross-validity coefficient is considerably far from complete. This article considers a distinct expression for the definition of the squared cross-validity coefficient as the direct connection and monotone transformation to the squared multiple correlation coefficient. Therefore, all the currently available exact methods for interval estimation, power calculation, and sample size determination of the squared multiple correlation coefficient are naturally modified and extended to the analysis of the squared cross-validity coefficient. The adequacies of the existing approximate procedures and the suggested exact method are evaluated through a Monte Carlo study. Furthermore, practical applications in areas of psychology and management are presented to illustrate the essential features of the proposed methodologies. The first empirical example uses 6 control variables related to driver characteristics and traffic congestion and their relation to stress in bus drivers, and the second example relates skills, cognitive performance, and personality to team performance measures. The results in this article can facilitate the recommended practice of cross-validation in psychological and other areas of social science research.
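
    For orientation, one widely cited link between the two coefficients is Browne's (1975) approximation to the expected squared cross-validity; it is shown here as background and may not be the exact expression the article builds on:

    \[
    \hat{\rho}_c^{\,2} \;=\; \frac{(N - K - 3)\,\rho^{4} + \rho^{2}}{(N - 2K - 2)\,\rho^{2} + K},
    \]

    where \(\rho^{2}\) is the squared population multiple correlation, \(K\) the number of predictors, and \(N\) the sample size.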

  12. A comparative study on entrepreneurial attitudes modeled with logistic regression and Bayes nets.

    PubMed

    López Puga, Jorge; García García, Juan

    2012-11-01

    Entrepreneurship research is receiving increasing attention in our context, as entrepreneurs are key social agents involved in economic development. We compare the success of the dichotomic logistic regression model and the Bayes simple classifier to predict entrepreneurship, after manipulating the percentage of missing data and the level of categorization in predictors. A sample of undergraduate university students (N = 1230) completed five scales (motivation, attitude towards business creation, obstacles, deficiencies, and training needs) and we found that each of them predicted different aspects of the tendency to business creation. Additionally, our results show that the receiver operating characteristic (ROC) curve is affected by the rate of missing data in both techniques, but logistic regression seems to be more vulnerable when faced with missing data, whereas Bayes nets underperform slightly when categorization has been manipulated. Our study sheds light on the potential entrepreneur profile and we propose to use Bayesian networks as an additional alternative to overcome the weaknesses of logistic regression when missing data are present in applied research.
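
    The head-to-head comparison can be sketched as follows: the same predictors feed a logistic regression and a Gaussian naive ("simple") Bayes classifier, both scored by ROC AUC. The five scale scores and the outcome below are synthetic.

    ```python
    # Hedged sketch of the logistic-vs-naive-Bayes comparison on synthetic data.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB

    rng = np.random.default_rng(6)
    n = 1230  # sample size reported in the abstract
    X = rng.normal(size=(n, 5))  # motivation, attitude, obstacles, deficiencies, training needs
    y = (rng.uniform(size=n) < 1 / (1 + np.exp(-(0.8 * X[:, 0] + 0.5 * X[:, 1])))).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    for name, clf in [("logistic", LogisticRegression()), ("naive Bayes", GaussianNB())]:
        clf.fit(X_tr, y_tr)
        print(name, roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
    ```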

  13. Analysis of pulsed eddy current data using regression models for steam generator tube support structure inspection

    NASA Astrophysics Data System (ADS)

    Buck, J. A.; Underhill, P. R.; Morelli, J.; Krause, T. W.

    2016-02-01

    Nuclear steam generators (SGs) are a critical component for ensuring safe and efficient operation of a reactor. Life management strategies are implemented in which SG tubes are regularly inspected by conventional eddy current testing (ECT) and ultrasonic testing (UT) technologies to size flaws, and safe operating life of SGs is predicted based on growth models. ECT, the more commonly used technique, due to the rapidity with which full SG tube wall inspection can be performed, is challenged when inspecting ferromagnetic support structure materials in the presence of magnetite sludge and multiple overlapping degradation modes. In this work, an emerging inspection method, pulsed eddy current (PEC), is being investigated to address some of these particular inspection conditions. Time-domain signals were collected by an 8 coil array PEC probe in which ferromagnetic drilled support hole diameter, depth of rectangular tube frets and 2D tube off-centering were varied. Data sets were analyzed with a modified principal components analysis (MPCA) to extract dominant signal features. Multiple linear regression models were applied to MPCA scores to size hole diameter as well as size rectangular outer diameter tube frets. Models were improved through exploratory factor analysis, which was applied to MPCA scores to refine selection for regression models inputs by removing nonessential information.
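
    A compact analogue of the analysis chain: reduce the multi-coil time signals to principal-component scores, then size defects by multiple linear regression on the scores. Plain PCA stands in for the modified PCA (MPCA), and the signals are synthetic.

    ```python
    # Sketch of PCA feature extraction followed by linear regression sizing.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(7)
    n_scans, n_features = 200, 8 * 64  # 8 coils x 64 time samples (assumption)
    signals = rng.normal(size=(n_scans, n_features))
    hole_diameter = 10 + signals[:, :5].sum(axis=1) + 0.1 * rng.normal(size=n_scans)

    sizing_model = make_pipeline(PCA(n_components=5), LinearRegression())
    sizing_model.fit(signals, hole_diameter)
    print("R^2 on training scans:", sizing_model.score(signals, hole_diameter))
    ```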

  14. Inference of dense spectral reflectance images from sparse reflectance measurement using non-linear regression modeling

    NASA Astrophysics Data System (ADS)

    Deglint, Jason; Kazemzadeh, Farnoud; Wong, Alexander; Clausi, David A.

    2015-09-01

    One method to acquire multispectral images is to sequentially capture a series of images where each image contains information from a different bandwidth of light. Another method is to use a series of beamsplitters and dichroic filters to guide different bandwidths of light onto different cameras. However, these methods are very time-consuming and expensive and perform poorly in dynamic scenes or when observing transient phenomena. An alternative strategy for capturing multispectral data is to infer this data using sparse spectral reflectance measurements captured using an imaging device with overlapping bandpass filters, such as a consumer digital camera using a Bayer filter pattern. Currently the only method of inferring dense reflectance spectra is the Wiener adaptive filter, which makes Gaussian assumptions about the data. However, these assumptions may not always hold true for all data. We propose a new technique to infer dense reflectance spectra from sparse spectral measurements through the use of a non-linear regression model. The non-linear regression model used in this technique is the random forest model, an ensemble of decision trees trained via the spectral characterization of the optical imaging system and spectral data pair generation. This model is then evaluated by spectrally characterizing different patches on the Macbeth color chart, as well as by reconstructing inferred multispectral images. Results show that the proposed technique can produce inferred dense reflectance spectra that correlate well with the true dense reflectance spectra, which illustrates the merits of the technique.

  15. Modeling the Philippines' real gross domestic product: A normal estimation equation for multiple linear regression

    NASA Astrophysics Data System (ADS)

    Urrutia, Jackie D.; Tampis, Razzcelle L.; Mercado, Joseph; Baygan, Aaron Vito M.; Baccay, Edcon B.

    2016-02-01

    The objective of this research is to formulate a mathematical model for the Philippines' Real Gross Domestic Product (Real GDP). The following factors are considered: Consumers' Spending (x1), Government's Spending (x2), Capital Formation (x3) and Imports (x4) as the independent variables that can influence the Real GDP of the Philippines (y). The researchers used a normal estimation equation in matrix form to create the model for Real GDP and used α = 0.01. The researchers analyzed quarterly data from 1990 to 2013, acquired from the National Statistical Coordination Board (NSCB), resulting in a total of 96 observations for each variable. The data underwent a logarithmic transformation, particularly the dependent variable (y), to satisfy all the assumptions of multiple linear regression analysis. The mathematical model for Real GDP was formulated using matrices through MATLAB. Based on the results, only three of the independent variables are significant to the dependent variable, namely Consumers' Spending (x1), Capital Formation (x3) and Imports (x4), and hence can predict Real GDP (y). The regression analysis shows that the independent variables explain 98.7% (coefficient of determination) of the variation in the dependent variable. With a paired t-test result of 97.6%, the predicted values obtained from the model showed no significant difference from the actual values of Real GDP. This research will be essential in appraising forthcoming changes and in aiding the Government to implement policies for the development of the economy.
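
    The estimation step amounts to solving the normal equations in matrix form, beta = (X'X)^(-1) X'y; numpy stands in for the paper's MATLAB, and the data below are synthetic placeholders for the 96 quarterly observations.

    ```python
    # The normal estimation equation in matrix form, with synthetic data.
    import numpy as np

    rng = np.random.default_rng(8)
    n = 96
    x1, x2, x3, x4 = (rng.uniform(1, 10, n) for _ in range(4))  # x1..x4 as in the abstract
    log_y = 2.0 + 0.05 * x1 + 0.03 * x3 + 0.02 * x4 + 0.05 * rng.normal(size=n)

    X = np.column_stack([np.ones(n), x1, x2, x3, x4])  # design matrix with intercept
    beta = np.linalg.solve(X.T @ X, X.T @ log_y)       # solve the normal equations
    print("coefficients (b0..b4):", np.round(beta, 4))
    ```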

  16. Spherical Model Integrating Academic Competence with Social Adjustment and Psychopathology.

    ERIC Educational Resources Information Center

    Schaefer, Earl S.; And Others

    This study replicates and elaborates a three-dimensional, spherical model that integrates research findings concerning social and emotional behavior, psychopathology, and academic competence. Kindergarten teachers completed an extensive set of rating scales on 100 children, including the Classroom Behavior Inventory and the Child Adaptive Behavior…

  17. Watershed regressions for pesticides (warp) models for predicting atrazine concentrations in Corn Belt streams

    USGS Publications Warehouse

    Stone, Wesley W.; Gilliom, Robert J.

    2012-01-01

    Watershed Regressions for Pesticides (WARP) models, previously developed for atrazine at the national scale, are improved for application to the United States (U.S.) Corn Belt region by developing region-specific models that include watershed characteristics that are influential in predicting atrazine concentration statistics within the Corn Belt. WARP models for the Corn Belt (WARP-CB) were developed for annual maximum moving-average (14-, 21-, 30-, 60-, and 90-day durations) and annual 95th-percentile atrazine concentrations in streams of the Corn Belt region. The WARP-CB models accounted for 53 to 62% of the variability in the various concentration statistics among the model-development sites. Model predictions were within a factor of 5 of the observed concentration statistic for over 90% of the model-development sites. The WARP-CB residuals and uncertainty are lower than those of the National WARP model for the same sites. Although atrazine-use intensity is the most important explanatory variable in the National WARP models, it is not a significant variable in the WARP-CB models. The WARP-CB models provide improved predictions for Corn Belt streams draining watersheds with atrazine-use intensities of 17 kg/km2 of watershed area or greater.

  18. Joint Bayesian variable and graph selection for regression models with network-structured predictors.

    PubMed

    Peterson, Christine B; Stingo, Francesco C; Vannucci, Marina

    2016-03-30

    In this work, we develop a Bayesian approach to perform selection of predictors that are linked within a network. We achieve this by combining a sparse regression model relating the predictors to a response variable with a graphical model describing conditional dependencies among the predictors. The proposed method is well-suited for genomic applications because it allows the identification of pathways of functionally related genes or proteins that impact an outcome of interest. In contrast to previous approaches for network-guided variable selection, we infer the network among predictors using a Gaussian graphical model and do not assume that network information is available a priori. We demonstrate that our method outperforms existing methods in identifying network-structured predictors in simulation settings and illustrate our proposed model with an application to inference of proteins relevant to glioblastoma survival.

  19. Modeling nonlinear relationships in ERP data using mixed-effects regression with R examples.

    PubMed

    Tremblay, Antoine; Newman, Aaron J

    2015-01-01

    In the analysis of psychological and psychophysiological data, the relationship between two variables is often assumed to be a straight line. This may be due to the prevalence of the general linear model in data analysis in these fields, which makes this assumption implicitly. However, there are many problems for which this assumption does not hold. In this paper, we show that, in the analysis of event-related potential (ERP) data, the assumption of linearity comes at a cost and may significantly affect the inferences drawn from the data. We demonstrate why the assumption of linearity should be relaxed and how to model nonlinear relationships between ERP amplitudes and predictor variables within the familiar framework of generalized linear models, using regression splines and mixed-effects modeling.
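
    Since the paper's R examples are not reproduced here, a Python sketch of the core idea follows: replacing a straight-line term with regression splines. The mixed-effects structure is omitted, and the variable names and data are assumptions.

    ```python
    # Relaxing linearity with regression splines for a synthetic ERP amplitude.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import SplineTransformer

    rng = np.random.default_rng(9)
    reaction_time = rng.uniform(200, 900, 500)                            # predictor (ms)
    amplitude = np.sin(reaction_time / 150) + 0.2 * rng.normal(size=500)  # nonlinear truth

    linear = LinearRegression().fit(reaction_time[:, None], amplitude)
    spline = make_pipeline(SplineTransformer(degree=3, n_knots=8), LinearRegression())
    spline.fit(reaction_time[:, None], amplitude)
    print("linear R^2:", linear.score(reaction_time[:, None], amplitude))
    print("spline R^2:", spline.score(reaction_time[:, None], amplitude))
    ```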

  20. Back-extrapolation of estimates of exposure from current land-use regression models

    NASA Astrophysics Data System (ADS)

    Chen, Hong; Goldberg, Mark S.; Crouse, Dan L.; Burnett, Richard T.; Jerrett, Michael; Villeneuve, Paul J.; Wheeler, Amanda J.; Labrèche, France; Ross, Nancy A.

    2010-11-01

    Land use regression has been used in epidemiologic studies to estimate long-term exposure to air pollution within cities. The models are often developed toward the end of the study using recent air pollution data. Given that there may be spatially-dependent temporal trends in urban air pollution and that there is interest for epidemiologists in assessing period-specific exposures, especially early-life exposure, methods are required to extrapolate these models back in time. We present herein three new methods to back-extrapolate land use regression models. During three two-week periods in 2005-2006, we monitored nitrogen dioxide (NO2) at about 130 locations in Montreal, Quebec, and then developed a land-use regression (LUR) model. Our three extrapolation methods entailed multiplying the predicted concentrations of NO2 by the ratio of past estimates of concentrations from fixed-site monitors, such that they reflected the change in the spatial structure of NO2 from measurements at fixed-site monitors. The specific methods depended on the availability of land use and traffic-related data, and we back-extrapolated the LUR model 10 and 20 years into the past. We then applied these estimates to residential information from subjects enrolled in a case-control study of postmenopausal breast cancer that was conducted in 1996. Observed and predicted concentrations of NO2 in Montreal decreased and were correlated in time. The estimated concentrations using the three extrapolation methods had similar distributions, except that one method yielded slightly lower values. The spatial distributions varied slightly between methods. In the analysis of the breast cancer study, the odds ratios were insensitive to the method but varied with time: for a 5 ppb increase in NO2 using the 2006 LUR, the odds ratio (OR) was about 1.4, and the ORs for predicted past concentrations of NO2 varied (OR ≈ 1.2 for 1985 and OR ≈ 1.3-1.5 for 1996). Thus, the ORs per unit exposure increased with

  1. Application of regression and neural models to predict competitive swimming performance.

    PubMed

    Maszczyk, Adam; Roczniok, Robert; Waśkiewicz, Zbigniew; Czuba, Miłosz; Mikołajec, Kazimierz; Zajac, Adam; Stanula, Arkadiusz

    2012-04-01

    This research problem was indirectly but closely connected with the optimization of an athlete-selection process, based on predictions viewed as determinants of future successes. The research project involved a group of 249 competitive swimmers (age 12 yr., SD = 0.5) who trained and competed for four years. Measures involving fitness (e.g., lung capacity), strength (e.g., standing long jump), swimming technique (turn, glide, distance per stroke cycle), anthropometric variables (e.g., hand and foot size), as well as specific swimming measures (speeds in particular distances), were used. The participants (n = 189) trained from May 2008 to May 2009, which involved five days of swimming workouts per week, and three additional 45-min. sessions devoted to measurements necessary for this study. In June 2009, data from two groups of 30 swimmers each (n = 60) were used to identify predictor variables. Models were then constructed from these variables to predict final swimming performance in the 50 meter and 800 meter crawl events. Nonlinear regression models and neural models were built for the dependent variable of sport results (performance at 50m and 800m). In May 2010, the swimmers' actual race times for these events were compared to the predictions created a year prior to the beginning of the experiment. Results for the nonlinear regression models and perceptron networks structured as 8-4-1 and 4-3-1 indicated that the neural models overall more accurately predicted final swimming performance from initial training, strength, fitness, and body measurements. Differences in the sum of absolute error values were 4:11.96 (n = 30 for 800m) and 20.39 (n = 30 for 50m), for models structured as 8-4-1 and 4-3-1, respectively, with the neural models being more accurate. It seems possible that such models can be used to predict future performance, as well as in the process of recruiting athletes for specific styles and distances in swimming.

  2. Modeling animal-vehicle collisions using diagonal inflated bivariate Poisson regression.

    PubMed

    Lao, Yunteng; Wu, Yao-Jan; Corey, Jonathan; Wang, Yinhai

    2011-01-01

    Two types of animal-vehicle collision (AVC) data are commonly adopted for AVC-related risk analysis research: reported AVC data and carcass removal data. One issue with these two data sets is that they were found to have significant discrepancies by previous studies. In order to model these two types of data together and provide a better understanding of highway AVCs, this study adopts a diagonal inflated bivariate Poisson regression method, an inflated version of bivariate Poisson regression model, to fit the reported AVC and carcass removal data sets collected in Washington State during 2002-2006. The diagonal inflated bivariate Poisson model not only can model paired data with correlation, but also handle under- or over-dispersed data sets as well. Compared with three other types of models, double Poisson, bivariate Poisson, and zero-inflated double Poisson, the diagonal inflated bivariate Poisson model demonstrates its capability of fitting two data sets with remarkable overlapping portions resulting from the same stochastic process. Therefore, the diagonal inflated bivariate Poisson model provides researchers a new approach to investigating AVCs from a different perspective involving the three distribution parameters (λ(1), λ(2) and λ(3)). The modeling results show the impacts of traffic elements, geometric design and geographic characteristics on the occurrences of both reported AVC and carcass removal data. It is found that the increase of some associated factors, such as speed limit, annual average daily traffic, and shoulder width, will increase the numbers of reported AVCs and carcass removals. Conversely, the presence of some geometric factors, such as rolling and mountainous terrain, will decrease the number of reported AVCs.

  3. Random regression model of growth during the first three months of age in Spanish Merino sheep.

    PubMed

    Molina, A; Menéndez-Buxadera, A; Valera, M; Serradilla, J M

    2007-11-01

    A total of 88,727 individual BW records of Spanish Merino lambs, obtained from 30,214 animals between 2 and 92 d of age, were analyzed using a random regression model (RRM). These animals were progeny of 546 rams and 15,586 ewes raised in 30 flocks, between 1992 and 2002, with a total of 45,941 animals in the pedigree. The contemporary groups (animals of the same flock, year, and season, with 452 levels), the lambing number (11 levels), the combination of sex of lamb with type of litter (4 levels), and a fixed regression coefficient of age on BW were included as fixed effects. A total of 7 RRM were compared, and the best fit was obtained for a model of order 3 for the direct and maternal genetic effects and for the individual permanent environmental effect. For the maternal permanent environmental effect the best model had order 2. The residual variance was assumed to be heterogeneous with 10 age classes; the covariance between both genetic effects was included. According to the results of the selected RRM, the heritability for both genetic effects (h_a^2 and h_m^2) increased with age, with estimates of 0.123 to 0.186 for h_a^2 and of 0.059 to 0.108 for h_m^2. The correlations between direct and maternal genetic effects were -0.619 to -0.387 during the first 45 d of age and decreased in magnitude as age increased, reaching values from -0.366 to -0.275 between 45 and 75 d of age. Important changes in the ranking of animals were found when breeding values estimated with the current method were compared with those from the random regression procedure. The use of RRM to analyze the genetic trajectory of growth in this population of Merino sheep is highly recommended.

  4. Model-Based Evaluation of Spontaneous Tumor Regression in Pilocytic Astrocytoma.

    PubMed

    Buder, Thomas; Deutsch, Andreas; Klink, Barbara; Voss-Böhme, Anja

    2015-12-01

    Pilocytic astrocytoma (PA) is the most common brain tumor in children. This tumor is usually benign and has a good prognosis. Total resection is the treatment of choice and will cure the majority of patients. However, often only partial resection is possible due to the location of the tumor. In that case, spontaneous regression, regrowth, or progression to a more aggressive form have been observed. The dependency between the residual tumor size and spontaneous regression is not understood yet. Therefore, the prognosis is largely unpredictable and there is controversy regarding the management of patients for whom complete resection cannot be achieved. Strategies span from pure observation (wait and see) to combinations of surgery, adjuvant chemotherapy, and radiotherapy. Here, we introduce a mathematical model to investigate the growth and progression behavior of PA. In particular, we propose a Markov chain model incorporating cell proliferation and death as well as mutations. Our model analysis shows that the tumor behavior after partial resection is essentially determined by a risk coefficient γ, which can be deduced from epidemiological data about PA. Our results quantitatively predict the regression probability of a partially resected benign PA given the residual tumor size and lead to the hypothesis that this dependency is linear, implying that removing any amount of tumor mass will improve prognosis. This finding stands in contrast to diffuse malignant glioma where an extent of resection threshold has been experimentally shown, below which no benefit for survival is expected. These results have important implications for future therapeutic studies in PA that should include residual tumor volume as a prognostic factor.

  5. ``Regressed experts'' as a new state in teachers' professional development: lessons from Computer Science teachers' adjustments to substantial changes in the curriculum

    NASA Astrophysics Data System (ADS)

    Liberman, Neomi; Ben-David Kolikant, Yifat; Beeri, Catriel

    2012-09-01

    Due to a program reform in Israel, experienced CS high-school teachers faced the need to master and teach a new programming paradigm. This situation served as an opportunity to explore the relationship between teachers' content knowledge (CK) and their pedagogical content knowledge (PCK). This article focuses on three case studies, with emphasis on one of them. Using observations and interviews, we examine how the teachers we observed taught and what development, if any, occurred in their teaching as a result of their teaching experience. Our findings suggest that this situation creates a new hybrid state of teachers, which we term "regressed experts." These teachers incorporate in their professional practice some elements typical of novices and some typical of experts. We also found that these teachers' experience, although established when teaching different content, served as leverage to improve their knowledge and understanding of aspects of the new content.

  6. Hazard regression model and cure rate model in colon cancer relative survival trends: are they telling the same story?

    PubMed

    Bejan-Angoulvant, Theodora; Bouvier, Anne-Marie; Bossard, Nadine; Belot, Aurelien; Jooste, Valérie; Launoy, Guy; Remontet, Laurent

    2008-01-01

    Hazard regression models and cure rate models can be advantageously used in cancer relative survival analysis. We explored the advantages and limits of these two models in colon cancer and focused on the prognostic impact of the year of diagnosis on survival according to the TNM stage at diagnosis. The analysis concerned 9,998 patients from three French registries. In the hazard regression model, the baseline excess death hazard and the time-dependent effects of covariates were modelled using regression splines. The cure rate model estimated the proportion of 'cured' patients and the excess death hazard in 'non-cured' patients. The effects of year of diagnosis on these parameters were estimated for each TNM cancer stage. With the hazard regression model, the excess death hazard decreased significantly with more recent years of diagnoses (hazard ratio, HR 0.97 in stage III and 0.98 in stage IV, P < 0.001). In these advanced stages, this favourable effect was limited to the first years of follow-up. With the cure rate model, recent years of diagnoses were significantly associated with longer survivals in 'non-cured' patients with advanced stages (HR 0.95 in stage III and 0.97 in stage IV, P < 0.001) but had no significant effect on cure (odds ratio, OR 0.99 in stages III and IV, P > 0.5). The two models were complementary and concordant in estimating colon cancer survival and the effects of covariates. They provided two different points of view of the same phenomenon: recent years of diagnosis had a favourable effect on survival, but not on cure.

  7. Semi-parametric estimation of random effects in a logistic regression model using conditional inference.

    PubMed

    Petersen, Jørgen Holm

    2016-01-15

    This paper describes a new approach to the estimation in a logistic regression model with two crossed random effects where special interest is in estimating the variance of one of the effects while not making distributional assumptions about the other effect. A composite likelihood is studied. For each term in the composite likelihood, a conditional likelihood is used that eliminates the influence of the random effects, which results in a composite conditional likelihood consisting of only one-dimensional integrals that may be solved numerically. Good properties of the resulting estimator are described in a small simulation study.

  8. Comparison of universal kriging and regression tree modelling for soil property mapping

    NASA Astrophysics Data System (ADS)

    Kempen, Bas

    2013-04-01

    Geostatistical modelling approaches have dominated the field of digital soil mapping (DSM) since its inception in the early 1980s. In recent years, however, machine learning methods such as classification and regression trees, random forests, and neural networks have quickly gained popularity among researchers in the DSM community. The increased use of these methods has largely come at the cost of geostatistical approaches. Despite the apparent shift in the application of DSM methods from geostatistics to machine learning, quantitative comparisons of the prediction performance of these methods are largely lacking. The aims of this research, therefore, are: i) to map two soil properties (topsoil organic matter content and thickness of the peat layer in the soil profile) using regression tree (RT) modelling and universal kriging (UK), and ii) to compare the prediction performance of these methods with independent data obtained by probability sampling. Using such data for validation not only yields statistically valid and unbiased estimates of map accuracy, but also allows a statistical comparison of the accuracies of the maps generated by the two methods. The topsoil organic matter content and the thickness of the peat layer were mapped for a 14,000 ha area in the province of Drenthe, The Netherlands. The calibration dataset contained soil property observations at 1,715 sites. The covariates used include layers derived from soil and paleogeography maps, land cover, relative elevation, drainage class, land reclamation period, elevation change, and historic land use. The validation dataset contained 125 observations selected by stratified simple random sampling of the study area. The root mean squared error (RMSE) of the soil organic matter map obtained by RT modelling was 0.603 log(%), while that of the map obtained by UK was 0.595 log(%). The difference in map accuracy was not significant (p = 0.377). The RMSE of the peat thickness map obtained by RT

  9. A hydrologic network supporting spatially referenced regression modeling in the Chesapeake Bay watershed

    USGS Publications Warehouse

    Brakebill, J.W.; Preston, S.D.

    2003-01-01

    The U.S. Geological Survey has developed a methodology for statistically relating nutrient sources and land-surface characteristics to nutrient loads of streams. The methodology is referred to as SPAtially Referenced Regressions On Watershed attributes (SPARROW), and relates measured stream nutrient loads to nutrient sources using nonlinear statistical regression models. A spatially detailed digital hydrologic network of stream reaches, stream-reach characteristics such as mean streamflow, water velocity, reach length, and travel time, and their associated watersheds supports the regression models. This network serves as the primary framework for spatially referencing potential nutrient source information such as atmospheric deposition, septic systems, point-sources, land use, land cover, and agricultural sources and land-surface characteristics such as land use, land cover, average-annual precipitation and temperature, slope, and soil permeability. In the Chesapeake Bay watershed that covers parts of Delaware, Maryland, Pennsylvania, New York, Virginia, West Virginia, and Washington D.C., SPARROW was used to generate models estimating loads of total nitrogen and total phosphorus representing 1987 and 1992 land-surface conditions. The 1987 models used a hydrologic network derived from an enhanced version of the U.S. Environmental Protection Agency's digital River Reach File, and course resolution Digital Elevation Models (DEMs). A new hydrologic network was created to support the 1992 models by generating stream reaches representing surface-water pathways defined by flow direction and flow accumulation algorithms from higher resolution DEMs. On a reach-by-reach basis, stream reach characteristics essential to the modeling were transferred to the newly generated pathways or reaches from the enhanced River Reach File used to support the 1987 models. To complete the new network, watersheds for each reach were generated using the direction of surface-water flow derived

  10. Detection of melamine in milk powders using near-infrared hyperspectral imaging combined with regression coefficient of partial least square regression model.

    PubMed

    Lim, Jongguk; Kim, Giyoung; Mo, Changyeun; Kim, Moon S; Chao, Kuanglin; Qin, Jianwei; Fu, Xiaping; Baek, Insuck; Cho, Byoung-Kwan

    2016-05-01

    Illegal use of nitrogen-rich melamine (C3H6N6) to boost the perceived protein content of food products such as milk, infant formula, frozen yogurt, pet food, biscuits, and coffee drinks has caused serious food safety problems. Conventional methods to detect melamine in foods, such as enzyme-linked immunosorbent assay (ELISA), high-performance liquid chromatography (HPLC), and gas chromatography-mass spectrometry (GC-MS), are sensitive, but they are time-consuming, expensive, and labor-intensive. In this research, a near-infrared (NIR) hyperspectral imaging technique combined with the regression coefficients of a partial least squares regression (PLSR) model was used to detect melamine particles in milk powders easily and quickly. NIR hyperspectral reflectance imaging data in the spectral range of 990-1700 nm were acquired from melamine-milk powder mixture samples prepared at various concentrations ranging from 0.02% to 1%. PLSR models were developed to correlate the spectral data (independent variables) with the melamine concentration (dependent variable) in melamine-milk powder mixture samples. PLSR models applying various pretreatment methods were used to reconstruct the two-dimensional PLS images. PLS images were converted to binary images to detect the suspected melamine pixels in milk powder. As the melamine concentration increased, the number of suspected melamine pixels in the binary images also increased. These results suggest that the NIR hyperspectral imaging technique and the PLSR model can be regarded as an effective tool to detect melamine particles in milk powders.
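
    A minimal PLSR analogue of the detection step: regress melamine concentration on NIR spectra, then threshold the pixel-level predictions to flag suspected melamine. The spectra and the assumed absorption feature below are synthetic.

    ```python
    # Hedged PLSR sketch with synthetic NIR spectra over 990-1700 nm.
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    rng = np.random.default_rng(10)
    n_samples, n_bands = 120, 224
    wavelengths = np.linspace(990, 1700, n_bands)
    conc = rng.uniform(0.02, 1.0, n_samples)          # % melamine, per the abstract
    peak = np.exp(-((wavelengths - 1468) / 15) ** 2)  # assumed melamine NIR feature
    spectra = conc[:, None] * peak + 0.01 * rng.normal(size=(n_samples, n_bands))

    pls = PLSRegression(n_components=5).fit(spectra, conc)
    pred = pls.predict(spectra).ravel()
    print("suspected-melamine samples (> 0.02%):", np.sum(pred > 0.02))
    ```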

  11. Identification of Patients Affected by Mitral Valve Prolapse with Severe Regurgitation: A Multivariable Regression Model

    PubMed Central

    Songia, Paola; Chiesa, Mattia; Alamanni, Francesco; Tremoli, Elena

    2017-01-01

    Background. Mitral valve prolapse (MVP) is the most common cause of severe mitral regurgitation. Besides echocardiography, there are as yet no reliable biomarkers available for the identification of this pathology. We aim to generate a predictive model, based on circulating biomarkers, able to identify MVP patients with the highest accuracy. Methods. We analysed 43 patients who underwent mitral valve repair due to MVP and compared them with 29 matched controls. We assessed oxidative stress status by measuring the oxidized and the reduced forms of glutathione with a liquid chromatography-tandem mass spectrometry method. Osteoprotegerin (OPG) plasma levels were measured by an enzyme-linked immunosorbent assay. The combination of these biochemical variables was used to implement several logistic regression models. Results. Oxidative stress levels and OPG concentrations were significantly higher in patients than in control subjects (0.116 ± 0.007 versus 0.053 ± 0.013 and 1748 ± 100.2 versus 1109 ± 45.3 pg/mL, respectively; p < 0.0001). The best regression model correctly classified 62 of 72 samples, with an accuracy in terms of area under the curve of 0.92. Conclusions. To the best of our knowledge, this is the first study to show a strong association between OPG and oxidative stress status in patients affected by MVP with severe regurgitation. PMID:28261377

  12. Marginal regression approach for additive hazards models with clustered current status data.

    PubMed

    Su, Pei-Fang; Chi, Yunchan

    2014-01-15

    Current status data arise naturally from tumorigenicity experiments, epidemiology studies, biomedicine, econometrics, and demographic and sociology studies. Moreover, clustered current status data may occur with animals from the same litter in tumorigenicity experiments or with subjects from the same family in epidemiology studies. Because the only information extracted from current status data is whether the survival times are before or after the monitoring or censoring times, the nonparametric maximum likelihood estimator of the survival function converges at a rate of n^(1/3) to a complicated limiting distribution. Hence, semiparametric regression models such as the additive hazards model have been extended for independent current status data to derive test statistics, whose distributions converge at a rate of n^(1/2), for testing the regression parameters. However, a straightforward application of these statistical methods to clustered current status data is not appropriate because intracluster correlation needs to be taken into account. Therefore, this paper proposes two estimating functions for estimating the parameters in the additive hazards model for clustered current status data. The comparative results from simulation studies are presented, and the application of the proposed estimating functions to one real data set is illustrated.

  13. Automated Decisional Model for Optimum Economic Order Quantity Determination Using Price Regressive Rates

    NASA Astrophysics Data System (ADS)

    Roşu, M. M.; Tarbă, C. I.; Neagu, C.

    2016-11-01

    Current inventory management models are complementary, and together they offer a broad palette of tools for solving the complex problems companies face when establishing the optimum economic order quantity for unfinished products, raw materials, goods, etc. The main objective of this paper is to develop an automated decision model for calculating the economic order quantity, taking into account regressive price rates on the total order quantity. The model has two main aims: first, to determine the ordering periodicity n or the order quantity q; second, to determine the stock levels: alert stock, safety stock, etc. In this way it answers two fundamental questions: How much should be ordered? When should the order be placed? In current practice, a company's business relationships with its suppliers are based on regressive price rates, meaning that suppliers may grant discounts above certain ordered quantities. Thus, the unit price of the products is a variable that depends on the order size, and the most important element in choosing the optimum economic order quantity is the total cost of ordering, which depends on the average price per unit, the stockholding cost, the ordering cost, etc.
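
    To make the total-cost comparison concrete, the sketch below evaluates the classical EOQ at each all-units price break and keeps the cheapest feasible candidate. The demand D, ordering cost K, holding rate h, and price breaks are hypothetical values for illustration, not figures from the paper.

      import math

      D = 12_000        # annual demand (units/year), hypothetical
      K = 150.0         # fixed cost per order, hypothetical
      h = 0.20          # holding cost as a fraction of unit price per year
      price_breaks = [  # (minimum order quantity, unit price): regressive rates
          (0, 10.00),
          (500, 9.50),
          (2000, 9.00),
      ]

      def total_cost(q, c):
          # annual purchase + ordering + holding cost at quantity q, unit price c
          return D * c + K * D / q + h * c * q / 2.0

      best = None
      for q_min, c in price_breaks:
          q_star = math.sqrt(2.0 * K * D / (h * c))  # classical EOQ at this price
          q = max(q_star, q_min)                     # push up to qualify for the discount
          if best is None or total_cost(q, c) < best[1]:
              best = (q, total_cost(q, c), c)

      q_opt, cost_opt, c_opt = best
      print(f"order {q_opt:.0f} units at {c_opt}/unit; annual cost {cost_opt:.2f}")

    For all-units discounts, pushing an infeasible EOQ up to the bracket minimum suffices: a candidate above a bracket's upper bound is always dominated by the next, cheaper bracket.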

  14. Binary Classifier Calibration using an Ensemble of Near Isotonic Regression Models.

    PubMed

    Naeini, Mahdi Pakdaman; Cooper, Gregory F

    2016-12-01

    Learning accurate probabilistic models from data is crucial in many practical tasks in data mining. In this paper we present a new non-parametric calibration method called ensemble of near isotonic regression (ENIR). The method can be considered as an extension of BBQ [20], a recently proposed calibration method, as well as the commonly used calibration method based on isotonic regression (IsoRegC) [27]. ENIR is designed to address the key limitation of IsoRegC which is the monotonicity assumption of the predictions. Similar to BBQ, the method post-processes the output of a binary classifier to obtain calibrated probabilities. Thus it can be used with many existing classification models to generate accurate probabilistic predictions. We demonstrate the performance of ENIR on synthetic and real datasets for commonly applied binary classification models. Experimental results show that the method outperforms several common binary classifier calibration methods. In particular on the real data, ENIR commonly performs statistically significantly better than the other methods, and never worse. It is able to improve the calibration power of classifiers, while retaining their discrimination power. The method is also computationally tractable for large scale datasets, as it is O(N log N) time, where N is the number of samples.
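
    ENIR itself is not packaged in common libraries, but the isotonic-regression baseline (IsoRegC) that it extends is. The sketch below shows that standard post-processing setup with scikit-learn on synthetic data, as a point of reference rather than a reimplementation of ENIR.

      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.isotonic import IsotonicRegression
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import train_test_split

      X, y = make_classification(n_samples=2000, random_state=0)
      X_tr, X_cal, y_tr, y_cal = train_test_split(X, y, test_size=0.5, random_state=0)

      clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
      raw = clf.predict_proba(X_cal)[:, 1]   # uncalibrated classifier scores

      # Post-process: fit a monotone map from score to empirical probability.
      # ENIR relaxes exactly this monotonicity constraint via near-isotonic fits.
      iso = IsotonicRegression(out_of_bounds="clip").fit(raw, y_cal)
      calibrated = iso.predict(raw)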

  15. Perceived Organizational Support for Enhancing Welfare at Work: A Regression Tree Model.

    PubMed

    Giorgi, Gabriele; Dubin, David; Perez, Javier Fiz

    2016-01-01

    When trying to examine outcomes such as welfare and well-being, research tends to focus on main effects and to take into account a limited number of variables at a time. There are a number of techniques that may help address this problem. For example, many statistical packages available in R provide easy-to-use methods for modeling complicated analyses such as classification and regression trees (i.e., recursive partitioning). The present research illustrates the value of recursive partitioning in the prediction of perceived organizational support in a sample of more than 6000 Italian bankers. Utilizing the tree function of the party package in R, we estimated a regression tree model predicting perceived organizational support from a multitude of job characteristics including job demand, lack of job control, lack of supervisor support, training, etc. The resulting model appears particularly helpful in pointing out several interactions in the prediction of perceived organizational support. In particular, training is the dominant factor. Another dimension that seems to influence organizational support is reporting (perceived communication about safety and stress concerns). Results are discussed from a theoretical and methodological point of view.

  16. Copula-based regression modeling of bivariate severity of temporary disability and permanent motor injuries.

    PubMed

    Ayuso, Mercedes; Bermúdez, Lluís; Santolino, Miguel

    2016-04-01

    The analysis of factors influencing the severity of the personal injuries suffered by victims of motor accidents is an issue of major interest. Yet, most of the extant literature has tended to address this question by focusing on either the severity of temporary disability or the severity of permanent injury. In this paper, a bivariate copula-based regression model for temporary disability and permanent injury severities is introduced for the joint analysis of the relationship with the set of factors that might influence both categories of injury. Using a motor insurance database with 21,361 observations, the copula-based regression model is shown to give a better performance than that of a model based on the assumption of independence. The inclusion of the dependence structure in the analysis has a higher impact on the variance estimates of the injury severities than it does on the point estimates. By taking into account the dependence between temporary and permanent severities a more extensive factor analysis can be conducted. We illustrate that the conditional distribution functions of injury severities may be estimated, thus, providing decision makers with valuable information.

  17. Surface Roughness Prediction Model using Zirconia Toughened Alumina (ZTA) Turning Inserts: Taguchi Method and Regression Analysis

    NASA Astrophysics Data System (ADS)

    Mandal, Nilrudra; Doloi, Biswanath; Mondal, Biswanath

    2016-01-01

    In the present study, an attempt has been made to apply the Taguchi parameter design method and regression analysis for optimizing the cutting conditions on surface finish while machining AISI 4340 steel with the help of newly developed yttria-based Zirconia Toughened Alumina (ZTA) inserts. These inserts are prepared through a wet chemical co-precipitation route followed by a powder metallurgy process. Experiments have been carried out based on an L9 orthogonal array with three parameters (cutting speed, depth of cut and feed rate) at three levels (low, medium and high). Based on the mean response and signal-to-noise ratio (SNR), the optimal cutting condition was found to be A3B1C1, i.e., a cutting speed of 420 m/min, a depth of cut of 0.5 mm and a feed rate of 0.12 m/min, following the smaller-the-better criterion. Analysis of Variance (ANOVA) is applied to find out the significance and percentage contribution of each parameter. The mathematical model of surface roughness has been developed using regression analysis as a function of the above-mentioned independent variables. The predicted values from the developed model and the experimental values are found to be very close to each other, justifying the significance of the model. A confirmation run has been carried out at a 95% confidence level to verify the optimized result, and the values obtained are within the prescribed limits.

  18. Mixed hidden Markov quantile regression models for longitudinal data with possibly incomplete sequences.

    PubMed

    Marino, Maria Francesca; Tzavidis, Nikos; Alfò, Marco

    2016-01-01

    Quantile regression provides a detailed and robust picture of the distribution of a response variable, conditional on a set of observed covariates. Recently, it has been extended to the analysis of longitudinal continuous outcomes using either time-constant or time-varying random parameters. However, in real-life data, we frequently observe both temporal shocks in the overall trend and individual-specific heterogeneity in model parameters. A benchmark dataset on HIV progression gives a clear example. Here, the evolution of the CD4 log counts exhibits both sudden temporal changes in the overall trend and heterogeneity in the effect of the time since seroconversion on the response dynamics. To accommodate such situations, we propose a quantile regression model, where time-varying and time-constant random coefficients are jointly considered. Since observed data may be incomplete due to early drop-out, we also extend the proposed model in a pattern mixture perspective. We assess the performance of the proposals via a large-scale simulation study and the analysis of the CD4 count data.

  19. Binary Classifier Calibration using an Ensemble of Near Isotonic Regression Models

    PubMed Central

    Naeini, Mahdi Pakdaman; Cooper, Gregory F.

    2017-01-01

    Learning accurate probabilistic models from data is crucial in many practical tasks in data mining. In this paper we present a new non-parametric calibration method called ensemble of near isotonic regression (ENIR). The method can be considered as an extension of BBQ [20], a recently proposed calibration method, as well as the commonly used calibration method based on isotonic regression (IsoRegC) [27]. ENIR is designed to address the key limitation of IsoRegC which is the monotonicity assumption of the predictions. Similar to BBQ, the method post-processes the output of a binary classifier to obtain calibrated probabilities. Thus it can be used with many existing classification models to generate accurate probabilistic predictions. We demonstrate the performance of ENIR on synthetic and real datasets for commonly applied binary classification models. Experimental results show that the method outperforms several common binary classifier calibration methods. In particular on the real data, ENIR commonly performs statistically significantly better than the other methods, and never worse. It is able to improve the calibration power of classifiers, while retaining their discrimination power. The method is also computationally tractable for large scale datasets, as it is O(N log N) time, where N is the number of samples. PMID:28316511

  20. Perceived Organizational Support for Enhancing Welfare at Work: A Regression Tree Model

    PubMed Central

    Giorgi, Gabriele; Dubin, David; Perez, Javier Fiz

    2016-01-01

    When trying to examine outcomes such as welfare and well-being, research tends to focus on main effects and to take into account a limited number of variables at a time. There are a number of techniques that may help address this problem. For example, many statistical packages available in R provide easy-to-use methods for modeling complicated analyses such as classification and regression trees (i.e., recursive partitioning). The present research illustrates the value of recursive partitioning in the prediction of perceived organizational support in a sample of more than 6000 Italian bankers. Utilizing the tree function of the party package in R, we estimated a regression tree model predicting perceived organizational support from a multitude of job characteristics including job demand, lack of job control, lack of supervisor support, training, etc. The resulting model appears particularly helpful in pointing out several interactions in the prediction of perceived organizational support. In particular, training is the dominant factor. Another dimension that seems to influence organizational support is reporting (perceived communication about safety and stress concerns). Results are discussed from a theoretical and methodological point of view. PMID:28082924

  1. Multiple regression based imputation for individualizing template human model from a small number of measured dimensions.

    PubMed

    Nohara, Ryuki; Endo, Yui; Murai, Akihiko; Takemura, Hiroshi; Kouchi, Makiko; Tada, Mitsunori

    2016-08-01

    Individual human models are usually created by direct 3D scanning or by deforming a template model according to measured dimensions. In this paper, we propose a method to estimate all the dimensions necessary for human model individualization (the full set) from a small number of measured dimensions (a subset) and a human dimension database. For this purpose, we solved a multiple regression equation from the dimension database, with the full-set dimensions as the objective variables and the subset dimensions as the explanatory variables. Thus, the full-set dimensions are obtained by simply multiplying the subset dimensions by the coefficient matrix of the regression equation. We verified the accuracy of our method by imputing hand, foot, and whole-body dimensions from their dimension databases. Leave-one-out cross validation was employed in this evaluation. The mean absolute errors (MAE) between the measured and the estimated dimensions were computed from 4 dimensions (hand length, breadth, middle finger breadth at proximal, and middle finger depth at proximal) in the hand, 3 dimensions (foot length, breadth, and lateral malleolus height) in the foot, and height and weight in the whole body. The average MAE of non-measured dimensions was 4.58% in the hand, 4.42% in the foot, and 3.54% in the whole body, while that of measured dimensions was 0.00%.
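
    A hedged sketch of the core computation: estimate the coefficient matrix by least squares on a dimension database, then multiply a new subject's subset measurements by it to impute the full set. The database here is synthetic, and the variable names and sizes are illustrative, not the authors' data.

      import numpy as np

      rng = np.random.default_rng(0)
      n, n_subset, n_full = 500, 4, 40          # subjects, measured dims, full-set dims

      subset = rng.normal(size=(n, n_subset))   # e.g., hand length, breadth, ...
      B_true = rng.normal(size=(n_subset + 1, n_full))
      full = np.hstack([subset, np.ones((n, 1))]) @ B_true \
             + 0.05 * rng.normal(size=(n, n_full))

      # One multiple regression per full-set dimension, solved jointly.
      A = np.hstack([subset, np.ones((n, 1))])  # intercept column appended
      B_hat, *_ = np.linalg.lstsq(A, full, rcond=None)

      # Imputation: a new subject's 4 measurements times the coefficient matrix.
      new_subset = rng.normal(size=(1, n_subset))
      full_estimate = np.hstack([new_subset, np.ones((1, 1))]) @ B_hat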

  2. Land Use Regression Models for Ultrafine Particles in Six European Areas

    PubMed Central

    2017-01-01

    Long-term ultrafine particle (UFP) exposure estimates at a fine spatial scale are needed for epidemiological studies. Land use regression (LUR) models were developed and evaluated for six European areas based on repeated 30 min monitoring following standardized protocols. In each area; Basel (Switzerland), Heraklion (Greece), Amsterdam, Maastricht, and Utrecht (“The Netherlands”), Norwich (United Kingdom), Sabadell (Spain), and Turin (Italy), 160–240 sites were monitored to develop LUR models by supervised stepwise selection of GIS predictors. For each area and all areas combined, 10 models were developed in stratified random selections of 90% of sites. UFP prediction robustness was evaluated with the intraclass correlation coefficient (ICC) at 31–50 external sites per area. Models from Basel and The Netherlands were validated against repeated 24 h outdoor measurements. Structure and model R2 of local models were similar within, but varied between areas (e.g., 38–43% Turin; 25–31% Sabadell). Robustness of predictions within areas was high (ICC 0.73–0.98). External validation R2 was 53% in Basel and 50% in The Netherlands. Combined area models were robust (ICC 0.93–1.00) and explained UFP variation almost equally well as local models. In conclusion, robust UFP LUR models could be developed on short-term monitoring, explaining around 50% of spatial variance in longer-term measurements. PMID:28244744

  3. Land Use Regression Models for Ultrafine Particles in Six European Areas.

    PubMed

    van Nunen, Erik; Vermeulen, Roel; Tsai, Ming-Yi; Probst-Hensch, Nicole; Ineichen, Alex; Davey, Mark; Imboden, Medea; Ducret-Stich, Regina; Naccarati, Alessio; Raffaele, Daniela; Ranzi, Andrea; Ivaldi, Cristiana; Galassi, Claudia; Nieuwenhuijsen, Mark; Curto, Ariadna; Donaire-Gonzalez, David; Cirach, Marta; Chatzi, Leda; Kampouri, Mariza; Vlaanderen, Jelle; Meliefste, Kees; Buijtenhuijs, Daan; Brunekreef, Bert; Morley, David; Vineis, Paolo; Gulliver, John; Hoek, Gerard

    2017-03-21

    Long-term ultrafine particle (UFP) exposure estimates at a fine spatial scale are needed for epidemiological studies. Land use regression (LUR) models were developed and evaluated for six European areas based on repeated 30 min monitoring following standardized protocols. In each area; Basel (Switzerland), Heraklion (Greece), Amsterdam, Maastricht, and Utrecht ("The Netherlands"), Norwich (United Kingdom), Sabadell (Spain), and Turin (Italy), 160-240 sites were monitored to develop LUR models by supervised stepwise selection of GIS predictors. For each area and all areas combined, 10 models were developed in stratified random selections of 90% of sites. UFP prediction robustness was evaluated with the intraclass correlation coefficient (ICC) at 31-50 external sites per area. Models from Basel and The Netherlands were validated against repeated 24 h outdoor measurements. Structure and model R2 of local models were similar within, but varied between areas (e.g., 38-43% Turin; 25-31% Sabadell). Robustness of predictions within areas was high (ICC 0.73-0.98). External validation R2 was 53% in Basel and 50% in The Netherlands. Combined area models were robust (ICC 0.93-1.00) and explained UFP variation almost equally well as local models. In conclusion, robust UFP LUR models could be developed on short-term monitoring, explaining around 50% of spatial variance in longer-term measurements.

  4. Maximum likelihood training of connectionist models: comparison with least squares back-propagation and logistic regression.

    PubMed Central

    Spackman, K. A.

    1991-01-01

    This paper presents maximum likelihood back-propagation (ML-BP), an approach to training neural networks. The widely reported original approach uses least squares back-propagation (LS-BP), minimizing the sum of squared errors (SSE). Unfortunately, least squares estimation does not give a maximum likelihood (ML) estimate of the weights in the network. Logistic regression, on the other hand, gives ML estimates for single layer linear models only. This report describes how to obtain ML estimates of the weights in a multi-layer model, and compares LS-BP to ML-BP using several examples. It shows that in many neural networks, least squares estimation gives inferior results and should be abandoned in favor of maximum likelihood estimation. Questions remain about the potential uses of multi-level connectionist models in such areas as diagnostic systems and risk-stratification in outcomes research. PMID:1807606
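
    For a single sigmoid unit, the distinction the paper draws corresponds to the choice of loss: minimizing squared error (LS-BP) versus the negative log-likelihood, i.e., cross-entropy (ML-BP). Below is a minimal numpy sketch of the ML-BP gradient on invented data; it is an illustration of the loss difference, not the paper's multi-layer procedure.

      import numpy as np

      rng = np.random.default_rng(0)
      X = rng.normal(size=(100, 3))
      y = (X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=100) > 0).astype(float)

      def sigmoid(z):
          return 1.0 / (1.0 + np.exp(-z))

      w = np.zeros(3)
      for _ in range(500):
          p = sigmoid(X @ w)
          # ML-BP / cross-entropy gradient: X^T (p - y) / n  (no p(1-p) factor)
          grad_ml = X.T @ (p - y) / len(y)
          # The LS-BP / squared-error gradient would instead be:
          #   X.T @ ((p - y) * p * (1 - p)) / len(y)
          w -= 0.5 * grad_ml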

  5. Model selection for marginal regression analysis of longitudinal data with missing observations and covariate measurement error.

    PubMed

    Shen, Chung-Wei; Chen, Yi-Hau

    2015-10-01

    Missing observations and covariate measurement error commonly arise in longitudinal data. However, existing methods for model selection in marginal regression analysis of longitudinal data fail to address the potential bias resulting from these issues. To tackle this problem, we propose a new model selection criterion, the Generalized Longitudinal Information Criterion, which is based on an approximately unbiased estimator for the expected quadratic error of a considered marginal model accounting for both data missingness and covariate measurement error. The simulation results reveal that the proposed method performs quite well in the presence of missing data and covariate measurement error. In contrast, naive procedures that do not account for such complexities in the data may perform quite poorly. The proposed method is applied to data from the Taiwan Longitudinal Study on Aging to assess the relationship of depression with health and social status in the elderly, accommodating measurement error in the covariate as well as missing observations.

  6. Efficient Inference of Parsimonious Phenomenological Models of Cellular Dynamics Using S-Systems and Alternating Regression

    PubMed Central

    Daniels, Bryan C.; Nemenman, Ilya

    2015-01-01

    The nonlinearity of dynamics in systems biology makes it hard to infer them from experimental data. Simple linear models are computationally efficient, but cannot incorporate these important nonlinearities. An adaptive method based on the S-system formalism, which is a sensible representation of nonlinear mass-action kinetics typically found in cellular dynamics, maintains the efficiency of linear regression. We combine this approach with adaptive model selection to obtain efficient and parsimonious representations of cellular dynamics. The approach is tested by inferring the dynamics of yeast glycolysis from simulated data. With little computing time, it produces dynamical models with high predictive power and with structural complexity adapted to the difficulty of the inference problem. PMID:25806510

  7. The development of a risk-adjusted capitation payment system: the Maryland Medicaid model.

    PubMed

    Weiner, J P; Tucker, A M; Collins, A M; Fakhraei, H; Lieberman, R; Abrams, C; Trapnell, G R; Folkemer, J G

    1998-10-01

    This article describes the risk-adjusted payment methodology employed by the Maryland Medicaid program to pay managed care organizations. It also presents an empirical simulation analysis using claims data from 230,000 Maryland Medicaid recipients. This simulation suggests that the new payment model will help adjust for adverse or favorable selection. The article is intended for a wide audience, including state and national policy makers concerned with the design of managed care Medicaid programs and actuaries, analysts, and researchers involved in the design and implementation of risk-adjusted capitation payment systems.

  8. Using regression heteroscedasticity to model trends in the mean and variance of floods

    NASA Astrophysics Data System (ADS)

    Hecht, Jory; Vogel, Richard

    2015-04-01

    Changes in the frequency of extreme floods have been observed and anticipated in many hydrological settings in response to numerous drivers of environmental change, including climate, land cover, and infrastructure. To help decision-makers design flood control infrastructure in settings with non-stationary hydrological regimes, a parsimonious approach for detecting and modeling trends in extreme floods is needed. An approach using ordinary least squares (OLS) to fit a heteroscedastic regression model can accommodate nonstationarity in both the mean and variance of flood series while simultaneously offering a means of (i) analytically evaluating type I and type II trend detection errors, (ii) analytically generating expressions of uncertainty, such as confidence and prediction intervals, (iii) providing updated estimates of the frequency of floods exceeding the flood of record, (iv) accommodating a wide range of non-linear functions through ladder of powers transformations, and (v) communicating hydrological changes in a single graphical image. Previous research has shown that the two-parameter lognormal distribution can adequately model the annual maximum flood distribution of both stationary and non-stationary hydrological regimes in many regions of the United States. A simple logarithmic transformation of annual maximum flood series enables an OLS heteroscedastic regression modeling approach to be especially suitable for creating a non-stationary flood frequency distribution with parameters that are conditional upon time or physically meaningful covariates. While heteroscedasticity is often viewed as an impediment, we document how detecting and modeling heteroscedasticity presents an opportunity for characterizing both the conditional mean and variance of annual maximum floods. We introduce an approach through which variance trend models can be analytically derived from the behavior of residuals of the conditional mean flood model. Through case studies of
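
    A minimal sketch of the idea under the assumptions the abstract states (log-transformed annual maxima, OLS): fit the conditional mean on time, then fit an auxiliary regression of squared residuals on time to characterize the variance trend. The flood series below is a synthetic stand-in.

      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(0)
      year = np.arange(1950, 2015)
      t = year - year.mean()
      # synthetic log-floods with trends in both the mean and the spread
      logq = 6.0 + 0.01 * t + (0.3 + 0.003 * t) * rng.normal(size=t.size)

      X = sm.add_constant(t)
      mean_fit = sm.OLS(logq, X).fit()            # conditional mean model

      # Heteroscedasticity as signal: a variance-trend model derived from the
      # behavior of the residuals of the conditional mean model.
      var_fit = sm.OLS(mean_fit.resid ** 2, X).fit()

      print(mean_fit.params)  # intercept and trend of the conditional mean
      print(var_fit.params)   # baseline level and trend of the conditional variance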

  9. Combining genomic and genealogical information in a reproducing kernel Hilbert spaces regression model for genome-enabled predictions in dairy cattle.

    PubMed

    Rodríguez-Ramilo, Silvia Teresa; García-Cortés, Luis Alberto; González-Recio, Oscar

    2014-01-01

    Genome-enhanced genotypic evaluations are becoming popular in several livestock species. For this purpose, the combination of the pedigree-based relationship matrix with a genomic similarities matrix between individuals is a common approach. However, the weight placed on each matrix has been so far established with ad hoc procedures, without formal estimation thereof. In addition, when using marker- and pedigree-based relationship matrices together, the resulting combined relationship matrix needs to be adjusted to the same scale in reference to the base population. This study proposes a semi-parametric Bayesian method for combining marker- and pedigree-based information on genome-enabled predictions. A kernel matrix from a reproducing kernel Hilbert spaces regression model was used to combine genomic and genealogical information in a semi-parametric scenario, avoiding inversion and adjustment complications. In addition, the weights on marker- versus pedigree-based information were inferred from a Bayesian model with Markov chain Monte Carlo. The proposed method was assessed involving a large number of SNPs and a large reference population. Five phenotypes, including production and type traits of dairy cattle were evaluated. The reliability of the genome-based predictions was assessed using the correlation, regression coefficient and mean squared error between the predicted and observed values. The results indicated that when a larger weight was given to the pedigree-based relationship matrix the correlation coefficient was lower than in situations where more weight was given to genomic information. Importantly, the posterior means of the inferred weight were near the maximum of 1. The behavior of the regression coefficient and the mean squared error was similar to the performance of the correlation, that is, more weight to the genomic information provided a regression coefficient closer to one and a smaller mean squared error. Our results also indicated a greater

  10. Modeling and Control of the Redundant Parallel Adjustment Mechanism on a Deployable Antenna Panel

    PubMed Central

    Tian, Lili; Bao, Hong; Wang, Meng; Duan, Xuechao

    2016-01-01

    With the aim of developing multiple input and multiple output (MIMO) coupling systems with a redundant parallel adjustment mechanism on the deployable antenna panel, a structural control integrated design methodology is proposed in this paper. Firstly, the modal information from the finite element model of the structure of the antenna panel is extracted, and then the mathematical model is established with the Hamilton principle; Secondly, the discrete Linear Quadratic Regulator (LQR) controller is added to the model in order to control the actuators and adjust the shape of the panel. Finally, the engineering practicality of the modeling and control method based on finite element analysis simulation is verified. PMID:27706076

  11. Modeling and Control of the Redundant Parallel Adjustment Mechanism on a Deployable Antenna Panel.

    PubMed

    Tian, Lili; Bao, Hong; Wang, Meng; Duan, Xuechao

    2016-10-01

    With the aim of developing multiple input and multiple output (MIMO) coupling systems with a redundant parallel adjustment mechanism on the deployable antenna panel, a structural control integrated design methodology is proposed in this paper. Firstly, the modal information from the finite element model of the structure of the antenna panel is extracted, and then the mathematical model is established with the Hamilton principle; Secondly, the discrete Linear Quadratic Regulator (LQR) controller is added to the model in order to control the actuators and adjust the shape of the panel. Finally, the engineering practicality of the modeling and control method based on finite element analysis simulation is verified.

  12. Breeding value accuracy estimates for growth traits using random regression and multi-trait models in Nelore cattle.

    PubMed

    Boligon, A A; Baldi, F; Mercadante, M E Z; Lobo, R B; Pereira, R J; Albuquerque, L G

    2011-06-28

    We quantified the potential increase in accuracy of expected breeding value for weights of Nelore cattle, from birth to mature age, using multi-trait and random regression models on Legendre polynomials and B-spline functions. A total of 87,712 weight records from 8144 females were used, recorded every three months from birth to mature age from the Nelore Brazil Program. For random regression analyses, all female weight records from birth to eight years of age (data set I) were considered. From this general data set, a subset was created (data set II), which included only nine weight records: at birth, weaning, 365 and 550 days of age, and 2, 3, 4, 5, and 6 years of age. Data set II was analyzed using random regression and multi-trait models. The model of analysis included the contemporary group as fixed effects and age of dam as a linear and quadratic covariable. In the random regression analyses, average growth trends were modeled using a cubic regression on orthogonal polynomials of age. Residual variances were modeled by a step function with five classes. Legendre polynomials of fourth and sixth order were utilized to model the direct genetic and animal permanent environmental effects, respectively, while third-order Legendre polynomials were considered for maternal genetic and maternal permanent environmental effects. Quadratic polynomials were applied to model all random effects in random regression models on B-spline functions. Direct genetic and animal permanent environmental effects were modeled using three segments or five coefficients, and genetic maternal and maternal permanent environmental effects were modeled with one segment or three coefficients in the random regression models on B-spline functions. For both data sets (I and II), animals ranked differently according to expected breeding value obtained by random regression or multi-trait models. With random regression models, the highest gains in accuracy were obtained at ages with a low number of

  13. Modeling Group Size and Scalar Stress by Logistic Regression from an Archaeological Perspective

    PubMed Central

    Alberti, Gianmarco

    2014-01-01

    Johnson’s scalar stress theory, describing the mechanics of (and the remedies to) the increase in in-group conflictuality that parallels the increase in groups’ size, provides scholars with a useful theoretical framework for the understanding of different aspects of the material culture of past communities (i.e., social organization, communal food consumption, ceramic style, architecture and settlement layout). Due to its relevance in archaeology and anthropology, the article aims at proposing a predictive model of critical level of scalar stress on the basis of community size. Drawing upon Johnson’s theory and on Dunbar’s findings on the cognitive constrains to human group size, a model is built by means of Logistic Regression on the basis of the data on colony fissioning among the Hutterites of North America. On the grounds of the theoretical framework sketched in the first part of the article, the absence or presence of colony fissioning is considered expression of not critical vs. critical level of scalar stress for the sake of the model building. The model, which is also tested against a sample of archaeological and ethnographic cases: a) confirms the existence of a significant relationship between critical scalar stress and group size, setting the issue on firmer statistical grounds; b) allows calculating the intercept and slope of the logistic regression model, which can be used in any time to estimate the probability that a community experienced a critical level of scalar stress; c) allows locating a critical scalar stress threshold at community size 127 (95% CI: 122–132), while the maximum probability of critical scale stress is predicted at size 158 (95% CI: 147–170). The model ultimately provides grounds to assess, for the sake of any further archaeological/anthropological interpretation, the probability that a group reached a hot spot of size development critical for its internal cohesion. PMID:24626241

  14. Modeling group size and scalar stress by logistic regression from an archaeological perspective.

    PubMed

    Alberti, Gianmarco

    2014-01-01

    Johnson's scalar stress theory, describing the mechanics of (and the remedies to) the increase in in-group conflictuality that parallels the increase in groups' size, provides scholars with a useful theoretical framework for the understanding of different aspects of the material culture of past communities (i.e., social organization, communal food consumption, ceramic style, architecture and settlement layout). Due to its relevance in archaeology and anthropology, the article aims at proposing a predictive model of critical level of scalar stress on the basis of community size. Drawing upon Johnson's theory and on Dunbar's findings on the cognitive constrains to human group size, a model is built by means of Logistic Regression on the basis of the data on colony fissioning among the Hutterites of North America. On the grounds of the theoretical framework sketched in the first part of the article, the absence or presence of colony fissioning is considered expression of not critical vs. critical level of scalar stress for the sake of the model building. The model, which is also tested against a sample of archaeological and ethnographic cases: a) confirms the existence of a significant relationship between critical scalar stress and group size, setting the issue on firmer statistical grounds; b) allows calculating the intercept and slope of the logistic regression model, which can be used in any time to estimate the probability that a community experienced a critical level of scalar stress; c) allows locating a critical scalar stress threshold at community size 127 (95% CI: 122-132), while the maximum probability of critical scale stress is predicted at size 158 (95% CI: 147-170). The model ultimately provides grounds to assess, for the sake of any further archaeological/anthropological interpretation, the probability that a group reached a hot spot of size development critical for its internal cohesion.
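
    To illustrate how the fitted intercept and slope can be used "in any time" to estimate the probability of critical scalar stress, here is a small sketch. The coefficients are hypothetical stand-ins chosen only so that the p = 0.5 crossing lands near the reported threshold of 127; they are not Alberti's published estimates.

      import numpy as np

      beta0, beta1 = -8.9, 0.07   # hypothetical intercept and slope

      def p_critical(size):
          # P(critical scalar stress | community size) under the logistic model
          return 1.0 / (1.0 + np.exp(-(beta0 + beta1 * size)))

      threshold = -beta0 / beta1  # community size at which p = 0.5
      print(p_critical(100.0), p_critical(158.0), threshold)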

  15. A Bayesian Nonlinear Mixed-Effects Regression Model for the Characterization of Early Bactericidal Activity of Tuberculosis Drugs

    PubMed Central

    Burger, Divan Aristo; Schall, Robert

    2015-01-01

    Trials of the early bactericidal activity (EBA) of tuberculosis (TB) treatments assess the decline, during the first few days to weeks of treatment, in colony forming unit (CFU) count of Mycobacterium tuberculosis in the sputum of patients with smear-microscopy-positive pulmonary TB. Profiles over time of CFU data have conventionally been modeled using linear, bilinear, or bi-exponential regression. We propose a new biphasic nonlinear regression model for CFU data that comprises linear and bilinear regression models as special cases and is more flexible than bi-exponential regression models. A Bayesian nonlinear mixed-effects (NLME) regression model is fitted jointly to the data of all patients from a trial, and statistical inference about the mean EBA of TB treatments is based on the Bayesian NLME regression model. The posterior predictive distribution of relevant slope parameters of the Bayesian NLME regression model provides insight into the nature of the EBA of TB treatments; specifically, the posterior predictive distribution allows one to judge whether treatments are associated with monolinear or bilinear decline of log(CFU) count, and whether CFU count initially decreases fast, followed by a slower rate of decrease, or vice versa. PMID:25322214

  16. Near infrared spectrometric technique for testing fruit quality: optimisation of regression models using genetic algorithms

    NASA Astrophysics Data System (ADS)

    Isingizwe Nturambirwe, J. Frédéric; Perold, Willem J.; Opara, Umezuruike L.

    2016-02-01

    Near infrared (NIR) spectroscopy has gained extensive use in quality evaluation. It is arguably one of the most advanced spectroscopic tools in non-destructive quality testing of food stuff, from measurement to data analysis and interpretation. NIR spectral data are interpreted through means often involving multivariate statistical analysis, sometimes associated with optimisation techniques for model improvement. The objective of this research was to explore the extent to which genetic algorithms (GA) can be used to enhance model development, for predicting fruit quality. Apple fruits were used, and NIR spectra in the range from 12000 to 4000 cm-1 were acquired on both bruised and healthy tissues, with different degrees of mechanical damage. GAs were used in combination with partial least squares regression methods to develop bruise severity prediction models, and compared to PLS models developed using the full NIR spectrum. A classification model was developed, which clearly separated bruised from unbruised apple tissue. GAs helped improve prediction models by over 10%, in comparison with full spectrum-based models, as evaluated in terms of error of prediction (Root Mean Square Error of Cross-validation). PLS models to predict internal quality, such as sugar content and acidity were developed and compared to the versions optimized by genetic algorithm. Overall, the results highlighted the potential use of GA method to improve speed and accuracy of fruit quality prediction.

  17. A New Global Regression Analysis Method for the Prediction of Wind Tunnel Model Weight Corrections

    NASA Technical Reports Server (NTRS)

    Ulbrich, Norbert Manfred; Bridge, Thomas M.; Amaya, Max A.

    2014-01-01

    A new global regression analysis method is discussed that predicts wind tunnel model weight corrections for strain-gage balance loads during a wind tunnel test. The method determines corrections by combining "wind-on" model attitude measurements with least squares estimates of the model weight and center of gravity coordinates that are obtained from "wind-off" data points. The method treats the least squares fit of the model weight separately from the fit of the center of gravity coordinates. Therefore, it performs two fits of "wind-off" data points and uses the least squares estimator of the model weight as an input for the fit of the center of gravity coordinates. Explicit equations for the least squares estimators of the weight and center of gravity coordinates are derived that simplify the implementation of the method in the data system software of a wind tunnel. In addition, recommendations for sets of "wind-off" data points are made that take typical model support system constraints into account. Explicit equations of the confidence intervals on the model weight and center of gravity coordinates and two different error analyses of the model weight prediction are also discussed in the appendices of the paper.

  18. Boosted structured additive regression for Escherichia coli fed-batch fermentation modeling.

    PubMed

    Melcher, Michael; Scharl, Theresa; Luchner, Markus; Striedner, Gerald; Leisch, Friedrich

    2017-02-01

    The quality of biopharmaceuticals and patient safety are of the highest priority, and there are tremendous efforts to replace empirical production process designs with knowledge-based approaches. The main challenge in this context is that real-time access to process variables related to product quality and quantity is severely limited. To date, comprehensive on- and offline monitoring platforms are used to generate process data sets that allow for the development of mechanistic and/or data-driven models for real-time prediction of these important quantities. The ultimate goal is to implement model-based feedback control loops that facilitate online control of product quality. In this contribution, we explore structured additive regression (STAR) models in combination with boosting as a variable selection tool for modeling the cell dry mass, product concentration, and optical density on the basis of online available process variables and two-dimensional fluorescence spectroscopic data. STAR models are powerful extensions of linear models allowing for the inclusion of smooth effects or interactions between predictors. Boosting constructs the final model in a stepwise manner and provides a variable importance measure via predictor selection frequencies. Our results show that the cell dry mass can be modeled with a relative error of about ±3%, the optical density with ±6%, the soluble protein with ±16%, and the insoluble product with an accuracy of ±12%.

  19. Simultaneous confidence bands for nonlinear regression models with application to population pharmacokinetic analyses.

    PubMed

    Gsteiger, S; Bretz, F; Liu, W

    2011-07-01

    Many applications in biostatistics rely on nonlinear regression models, such as population pharmacokinetic and pharmacodynamic modeling, or modeling approaches for dose-response characterization and dose selection. Such models are often expressed as nonlinear mixed-effects models, which are implemented in all major statistical software packages. Inference on the model curve can be based on the estimated parameters, from which pointwise confidence intervals for the mean profile at any single point in the covariate region (time, dose, etc.) can be derived. These pointwise confidence intervals, however, should not be used for simultaneous inferences beyond that single covariate value. If assessment over the entire covariate region is required, the joint coverage probability of the combined pointwise confidence intervals is likely to be less than the nominal coverage probability. In this paper we consider simultaneous confidence bands for the mean profile over the covariate region of interest and propose two large-sample methods for their construction. The first method is based on the Schwarz inequality and an asymptotic χ2 distribution. The second method relies on simulating from a multivariate normal distribution. We illustrate the methods with the pharmacokinetics of theophylline. In addition, we report the results of an extensive simulation study to investigate the operating characteristics of the two construction methods. Finally, we present extensions to construct simultaneous confidence bands for the difference of two models and to assess equivalence between two models in biosimilarity applications.
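
    A compact sketch of the second (simulation-based) construction, applied for clarity to a straight-line model rather than a nonlinear mixed-effects one: simulate coefficient draws from the estimated multivariate normal, and take a high quantile of the maximum standardized deviation over the covariate grid as the simultaneous critical value. All data below are synthetic.

      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(0)
      x = np.linspace(0.0, 24.0, 40)                 # e.g., time after dose (h)
      y = 3.0 + 0.5 * x + rng.normal(scale=1.0, size=x.size)

      fit = sm.OLS(y, sm.add_constant(x)).fit()
      beta, cov = fit.params, fit.cov_params()

      grid = sm.add_constant(np.linspace(0.0, 24.0, 200))
      se = np.sqrt(np.einsum("ij,jk,ik->i", grid, cov, grid))  # pointwise SEs

      draws = rng.multivariate_normal(np.zeros(beta.size), cov, size=5000)
      dev = np.abs(grid @ draws.T) / se[:, None]     # standardized deviations
      crit = np.quantile(dev.max(axis=0), 0.95)      # simultaneous critical value

      lower = grid @ beta - crit * se                # simultaneous 95% band
      upper = grid @ beta + crit * se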

  20. [Application of Land-use Regression Models in Spatial-temporal Differentiation of Air Pollution].

    PubMed

    Wu, Jian-sheng; Xie, Wu-dan; Li, Jia-cheng

    2016-02-15

    With the rapid development of urbanization, industrialization and motorization, air pollution has become one of the most serious environmental problems in our country, with negative impacts on public health and the ecological environment. The LUR model is one of the common methods for simulating the spatial-temporal differentiation of air pollution at the city scale. It has broad application in Europe and North America, but has so far seen little use in China. Based on many studies at home and abroad, this study started with the main steps in developing an LUR model, including obtaining the monitoring data, generating variables, developing the model, model validation and regression mapping. A conclusion was then drawn on the progress of LUR models in the spatial-temporal differentiation of air pollution. Furthermore, future research foci and orientations were discussed, including highlighting spatial-temporal differentiation, increasing the classes of model variables and improving the methods of model development. This paper aims to popularize the application of the LUR model in China, and to provide a methodological basis for human exposure, epidemiologic studies and health risk assessment.

  1. Bayesian quantile regression-based nonlinear mixed-effects joint models for time-to-event and longitudinal data with multiple features.

    PubMed

    Huang, Yangxin; Chen, Jiaqing

    2016-12-30

    This article explores Bayesian joint models for a quantile of longitudinal response, mismeasured covariate and event time outcome with an attempt to (i) characterize the entire conditional distribution of the response variable based on quantile regression that may be more robust to outliers and misspecification of error distribution; (ii) tailor accuracy from measurement error, evaluate non-ignorable missing observations, and adjust departures from normality in covariate; and (iii) overcome shortages of confidence in specifying a time-to-event model. When statistical inference is carried out for a longitudinal data set with non-central location, non-linearity, non-normality, measurement error, and missing values, as well as interval-censored event times, it is important to account for the simultaneous treatment of these data features in order to obtain more reliable and robust inferential results. Toward this end, we develop a Bayesian joint modeling approach to simultaneously estimate all parameters in three models: a quantile regression-based nonlinear mixed-effects model for the response using the asymmetric Laplace distribution, a linear mixed-effects model with a skew-t distribution for the mismeasured covariate in the presence of informative missingness, and an accelerated failure time model with an unspecified nonparametric distribution for the event time. We apply the proposed modeling approach to analyzing an AIDS clinical data set and conduct simulation studies to assess the performance of the proposed joint models and method.

  2. On the hydrologic adjustment of climate-model projections: The potential pitfall of potential evapotranspiration

    USGS Publications Warehouse

    Milly, P.C.D.; Dunne, K.A.

    2011-01-01

    Hydrologic models often are applied to adjust projections of hydroclimatic change that come from climate models. Such adjustment includes climate-bias correction, spatial refinement ("downscaling"), and consideration of the roles of hydrologic processes that were neglected in the climate model. Described herein is a quantitative analysis of the effects of hydrologic adjustment on the projections of runoff change associated with projected twenty-first-century climate change. In a case study including three climate models and 10 river basins in the contiguous United States, the authors find that relative (i.e., fractional or percentage) runoff change computed with hydrologic adjustment more often than not was less positive (or, equivalently, more negative) than what was projected by the climate models. The dominant contributor to this decrease in runoff was a ubiquitous change in runoff (median −11%) caused by the hydrologic model's apparent amplification of the climate-model-implied growth in potential evapotranspiration. Analysis suggests that the hydrologic model, on the basis of the empirical, temperature-based modified Jensen-Haise formula, calculates a change in potential evapotranspiration that is typically 3 times the change implied by the climate models, which explicitly track surface energy budgets. In comparison with the amplification of potential evapotranspiration, central tendencies of other contributions from hydrologic adjustment (spatial refinement, climate-bias adjustment, and process refinement) were relatively small. The authors' findings highlight the need for caution when projecting changes in potential evapotranspiration for use in hydrologic models or drought indices to evaluate climate-change impacts on water.

  3. Method validation using weighted linear regression models for quantification of UV filters in water samples.

    PubMed

    da Silva, Claudia Pereira; Emídio, Elissandro Soares; de Marchi, Mary Rosa Rodrigues

    2015-01-01

    This paper describes the validation of a method consisting of solid-phase extraction followed by gas chromatography-tandem mass spectrometry for the analysis of the ultraviolet (UV) filters benzophenone-3, ethylhexyl salicylate, ethylhexyl methoxycinnamate and octocrylene. The method validation criteria included evaluation of selectivity, analytical curve, trueness, precision, limits of detection and limits of quantification. The non-weighted linear regression model has traditionally been used for calibration, but it is not necessarily the optimal model in all cases. Because the assumption of homoscedasticity was not met for the analytical data in this work, a weighted least squares linear regression was used for the calibration method. The evaluated analytical parameters were satisfactory for the analytes and showed recoveries at four fortification levels between 62% and 107%, with relative standard deviations less than 14%. The detection limits ranged from 7.6 to 24.1 ng L-1. The proposed method was used to determine the amount of UV filters in water samples from water treatment plants in Araraquara and Jau in São Paulo, Brazil.
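
    The sketch below contrasts ordinary and weighted least squares for a calibration curve, using 1/x^2 weights (a common choice when the response standard deviation grows with concentration); the standards and responses are invented for illustration, not data from the paper.

      import numpy as np
      import statsmodels.api as sm

      conc = np.array([10.0, 25.0, 50.0, 100.0, 250.0, 500.0, 1000.0])  # ng/L
      resp = np.array([0.9, 2.3, 4.8, 10.5, 24.0, 51.0, 98.0])          # peak area

      X = sm.add_constant(conc)
      ols = sm.OLS(resp, X).fit()                         # assumes homoscedasticity
      wls = sm.WLS(resp, X, weights=1.0 / conc**2).fit()  # down-weights high levels

      # With heteroscedastic data, the weighted fit tracks the low-concentration
      # standards better, which is what matters near the limits of quantification.
      print(ols.params, wls.params)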

  4. Causal relationship model between variables using linear regression to improve professional commitment of lecturer

    NASA Astrophysics Data System (ADS)

    Setyaningsih, S.

    2017-01-01

    Building a leading university requires lecturers' commitment in a professional manner. Commitment is measured through willpower, loyalty, pride, and integrity as a professional lecturer. A total of 135 of 337 university lecturers were sampled to collect data. Data were analyzed using validity and reliability tests and multiple linear regression. Many studies have found links involving lecturer commitment, but the underlying causes of the causal relationships are generally neglected. The results indicate that the professional commitment of lecturers is affected by the variables empowerment, academic culture, and trust. The relationship model between the variables is composed of three substructures. The first substructure consists of the endogenous variable professional commitment and three exogenous variables, namely academic culture, empowerment and trust, as well as the residual variable ε_y. The second substructure consists of one endogenous variable, trust, and two exogenous variables, empowerment and academic culture, together with the residual variable ε_3. The third substructure consists of one endogenous variable, academic culture, and one exogenous variable, empowerment, as well as the residual variable ε_2. Multiple linear regression was used in the path model for each substructure. The results support the hypotheses and provide empirical evidence that increasing these variables will increase the professional commitment of lecturers.

  5. Technology diffusion in hospitals: a log odds random effects regression model.

    PubMed

    Blank, Jos L T; Valdmanis, Vivian G

    2015-01-01

    This study identifies the factors that affect the diffusion of hospital innovations. We apply a log odds random effects regression model on hospital micro data. We introduce the concept of clustering innovations and the application of a log odds random effects regression model to describe the diffusion of technologies. We distinguish a number of determinants, such as service, physician, and environmental, financial and organizational characteristics of the 60 Dutch hospitals in our sample. On the basis of this data set on Dutch general hospitals over the period 1995-2002, we conclude that there is a relation between a number of determinants and the diffusion of innovations underlining conclusions from earlier research. Positive effects were found on the basis of the size of the hospitals, competition and a hospital's commitment to innovation. It appears that if a policy is developed to further diffuse innovations, the external effects of demand and market competition need to be examined, which would de facto lead to an efficient use of technology. For the individual hospital, instituting an innovations office appears to be the most prudent course of action.

  6. Regression model for estimating inactivation of microbial aerosols by solar radiation.

    PubMed

    Ben-David, Avishai; Sagripanti, Jose-Luis

    2013-01-01

    The inactivation of pathogenic aerosols by solar radiation is relevant to public health and biodefense. We investigated whether a relatively simple method to calculate solar diffuse and total irradiances could be developed and used in environmental photobiology estimations instead of complex atmospheric radiative transfer computer programs. The second-order regression model that we developed reproduced 13 radiation quantities calculated for equinoxes and solstices at 35° latitude with a computer-intensive and rather complex atmospheric radiative transfer program (MODTRAN) with a mean error <6% (2% for most radiation quantities). Extending the application of the regression model from a reference latitude and date (chosen as 35° latitude for 21 March) to different latitudes and days of the year was accomplished with variable success: usually with a mean error <15% (but as high as 150% for some combinations of latitudes and days of year). This accuracy of the methodology proposed here compares favorably to photobiological experiments where the microbial survival is usually measured with an accuracy no better than ±0.5 log10 units. The approach and equations presented in this study should assist in estimating the maximum time during which microbial pathogens remain infectious after accidental or intentional aerosolization in open environments.

  7. Combining regression analysis and air quality modelling to predict benzene concentration levels

    NASA Astrophysics Data System (ADS)

    Vlachokostas, Ch.; Achillas, Ch.; Chourdakis, E.; Moussiopoulos, N.

    2011-05-01

    State-of-the-art epidemiological research has found consistent associations between traffic-related air pollution and various outcomes, such as respiratory symptoms and premature mortality. However, many urban areas lack the necessary monitoring infrastructure, especially for benzene (C6H6), a known human carcinogen. The use of environmental statistics combined with air quality modelling can be of vital importance for assessing air quality levels of traffic-related pollutants in urban areas where no measurements are available. This paper develops and presents a reliable approach for forecasting C6H6 levels in urban environments, demonstrated for Thessaloniki, Greece. Multiple stepwise regression analysis is used, and a strong statistical relationship is detected between C6H6 and CO. The adopted regression model is validated to demonstrate its applicability and representativeness. The presented results show that the adopted approach is capable of capturing C6H6 concentration trends and should be considered as complementary to air quality monitoring.
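    Stepwise selection of the kind used here can be sketched as a greedy forward search that adds, at each pass, the predictor giving the largest gain in adjusted R². A generic sketch with statsmodels (in the paper's setting the candidates would be co-pollutant and traffic covariates such as CO):

      # Forward stepwise OLS selection by adjusted R^2 (generic sketch).
      import statsmodels.api as sm

      def forward_stepwise(df, response, candidates):
          selected, best_adj_r2, improved = [], -float("inf"), True
          while improved and candidates:
              improved = False
              for var in list(candidates):
                  X = sm.add_constant(df[selected + [var]])
                  adj_r2 = sm.OLS(df[response], X).fit().rsquared_adj
                  if adj_r2 > best_adj_r2:
                      best_adj_r2, best_var, improved = adj_r2, var, True
              if improved:
                  selected.append(best_var)
                  candidates.remove(best_var)
          return selected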

  8. Robust inference in the negative binomial regression model with an application to falls data.

    PubMed

    Aeberhard, William H; Cantoni, Eva; Heritier, Stephane

    2014-12-01

    A popular way to model overdispersed count data, such as the number of falls reported during intervention studies, is by means of the negative binomial (NB) distribution. Classical estimation methods are well known to be sensitive to model misspecification, which in such intervention studies can take the form of patients falling much more often than expected under the NB regression model. In this article, we extend two approaches for building robust M-estimators of the regression parameters in the class of generalized linear models to the NB distribution. The first approach achieves robustness in the response by applying a bounded function to the Pearson residuals arising in the maximum likelihood estimating equations, while the second achieves robustness by bounding the unscaled deviance components. For both approaches, we explore different choices for the bounding functions. Through a unified notation, we show how close these approaches may actually be, as long as the bounding functions are chosen and tuned appropriately, and provide the asymptotic distributions of the resulting estimators. Moreover, we introduce a robust weighted maximum likelihood estimator for the overdispersion parameter, specific to the NB distribution. Simulations under various settings show that redescending bounding functions yield estimates with smaller biases under contamination while keeping high efficiency at the assumed model, for both approaches. We present an application to a recent randomized controlled trial measuring the effectiveness of an exercise program at reducing the number of falls among people suffering from Parkinson's disease, illustrating the diagnostic use of such robust procedures and the need for them in reliable inference.
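    The core of the first approach, bounding the Pearson residuals, can be illustrated in a few lines: fit an NB model, compute Pearson residuals, and downweight observations whose residuals exceed a Huber-type threshold. This is only a sketch of the robustness idea on synthetic data, not the paper's full M-estimation procedure:

      # Bounding Pearson residuals from a negative binomial fit (sketch).
      import numpy as np
      import statsmodels.api as sm

      def huber_psi(r, c=1.345):
          # Bounded psi-function: identity in the core, clipped in the tails.
          return np.clip(r, -c, c)

      rng = np.random.default_rng(0)
      X = sm.add_constant(rng.normal(size=(200, 2)))
      y = rng.negative_binomial(5, 0.5, size=200)   # synthetic fall counts

      fit = sm.NegativeBinomial(y, X).fit(disp=0)
      mu = fit.predict(X)
      alpha = fit.params[-1]                        # estimated dispersion
      pearson = (y - mu) / np.sqrt(mu + alpha * mu ** 2)
      # Robustness weight psi(r)/r: 1 in the core, < 1 for outlying counts.
      weights = np.where(np.abs(pearson) < 1e-8, 1.0,
                         huber_psi(pearson) / pearson)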

  9. A land use regression model incorporating data on industrial point source pollution.

    PubMed

    Chen, Li; Wang, Yuming; Li, Peiwu; Ji, Yaqin; Kong, Shaofei; Li, Zhiyong; Bai, Zhipeng

    2012-01-01

    Advancing the understanding of the spatial aspects of air pollution in the city regional environment is an area where improved methods can be of great benefit to exposure assessment and policy support. We created land use regression (LUR) models for SO2, NO2 and PM10 for Tianjin, China. Traffic volumes, road networks, land use data, population density, meteorological conditions, physical conditions and satellite-derived greenness, brightness and wetness were used for predicting SO2, NO2 and PM10 concentrations. We incorporated data on industrial point sources to improve LUR model performance. In order to consider the impact of different sources, we calculated the PSIndex, LSIndex and area of different land use types (agricultural land, industrial land, commercial land, residential land, green space and water area) within different buffer radii (1 to 20 km). This method compensates for the standard LUR model's lack of consideration of source impacts. Remote sensing-derived variables were significantly correlated with gaseous pollutant concentrations such as SO2 and NO2. R2 values of the multiple linear regression equations for SO2, NO2 and PM10 were 0.78, 0.89 and 0.84, respectively, and the RMSE values were 0.32, 0.18 and 0.21, respectively. Model predictions at validation monitoring sites agreed well with measurements, generally within 15% of measured values. Compared to the relationship between the dependent variables and simple variables (such as traffic or meteorological variables), the relationship between the dependent variables and the integrated variables was closer to linear. Such integration has a discernible influence on both the overall model prediction and health effects assessment on the spatial distribution of air pollution in the city region.
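    A building block of such source-aware LUR models is a point-source index that aggregates emissions within a buffer, weighted by distance. A sketch of one plausible form (the paper's exact PSIndex definition may differ):

      # Inverse-distance-weighted point-source index within a buffer (sketch;
      # the paper's exact PSIndex formula may differ).
      import numpy as np

      def point_source_index(site_xy, sources_xy, emissions, radius_m):
          d = np.linalg.norm(sources_xy - site_xy, axis=1)
          inside = (d > 0) & (d <= radius_m)
          return np.sum(emissions[inside] / d[inside])

      site = np.array([0.0, 0.0])
      sources = np.array([[1000.0, 0.0], [3000.0, 4000.0], [20000.0, 0.0]])
      emissions = np.array([50.0, 80.0, 300.0])     # placeholder tonnes/year
      print(point_source_index(site, sources, emissions, radius_m=10_000))

    Indices computed for several radii (1 to 20 km) then enter the multiple linear regression alongside the land use and remote sensing covariates.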

  10. Object-Oriented Regression for Building Predictive Models with High Dimensional Omics Data from Translational Studies

    PubMed Central

    Zhao, Lue Ping; Bolouri, Hamid

    2016-01-01

    Maturing omics technologies enable researchers to generate high dimension omics data (HDOD) routinely in translational clinical studies. In the field of oncology, The Cancer Genome Atlas (TCGA) provided funding support to researchers to generate different types of omics data on a common set of biospecimens with accompanying clinical data and to make the data available for the research community to mine. One important application, and the focus of this manuscript, is to build predictive models for prognostic outcomes based on HDOD. To complement prevailing regression-based approaches, we propose to use an object-oriented regression (OOR) methodology to identify exemplars specified by HDOD patterns and to assess their associations with prognostic outcome. Through computing patient’s similarities to these exemplars, the OOR-based predictive model produces a risk estimate using a patient’s HDOD. The primary advantages of OOR are twofold: reducing the penalty of high dimensionality and retaining the interpretability to clinical practitioners. To illustrate its utility, we apply OOR to gene expression data from non-small cell lung cancer patients in TCGA and build a predictive model for prognostic survivorship among stage I patients, i.e., we stratify these patients by their prognostic survival risks beyond histological classifications. Identification of these high-risk patients helps oncologists to develop effective treatment protocols and post-treatment disease management plans. Using the TCGA data, the total sample is divided into training and validation data sets. After building up a predictive model in the training set, we compute risk scores from the predictive model, and validate associations of risk scores with prognostic outcome in the validation data (p=0.015). PMID:26972839

  11. Object-oriented regression for building predictive models with high dimensional omics data from translational studies.

    PubMed

    Zhao, Lue Ping; Bolouri, Hamid

    2016-04-01

    Maturing omics technologies enable researchers to generate high dimension omics data (HDOD) routinely in translational clinical studies. In the field of oncology, The Cancer Genome Atlas (TCGA) provided funding support to researchers to generate different types of omics data on a common set of biospecimens with accompanying clinical data and has made the data available for the research community to mine. One important application, and the focus of this manuscript, is to build predictive models for prognostic outcomes based on HDOD. To complement prevailing regression-based approaches, we propose to use an object-oriented regression (OOR) methodology to identify exemplars specified by HDOD patterns and to assess their associations with prognostic outcome. Through computing patient's similarities to these exemplars, the OOR-based predictive model produces a risk estimate using a patient's HDOD. The primary advantages of OOR are twofold: reducing the penalty of high dimensionality and retaining the interpretability to clinical practitioners. To illustrate its utility, we apply OOR to gene expression data from non-small cell lung cancer patients in TCGA and build a predictive model for prognostic survivorship among stage I patients, i.e., we stratify these patients by their prognostic survival risks beyond histological classifications. Identification of these high-risk patients helps oncologists to develop effective treatment protocols and post-treatment disease management plans. Using the TCGA data, the total sample is divided into training and validation data sets. After building up a predictive model in the training set, we compute risk scores from the predictive model, and validate associations of risk scores with prognostic outcome in the validation data (P-value=0.015).
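    The exemplar-plus-similarity construction can be approximated compactly: choose a small set of exemplar profiles from the high-dimensional data, turn each patient's HDOD into similarities to those exemplars, and regress the outcome on the similarity features. The sketch below uses k-means centres as exemplars and an RBF similarity, which is only one plausible reading of the OOR idea, not the authors' algorithm:

      # Exemplar-similarity risk model in the spirit of OOR (sketch).
      import numpy as np
      from sklearn.cluster import KMeans
      from sklearn.linear_model import LogisticRegression
      from sklearn.metrics.pairwise import rbf_kernel

      def fit_exemplar_model(X_train, y_train, n_exemplars=10, gamma=1e-3):
          # Exemplars here are cluster centres; the paper defines them
          # differently, this only illustrates the feature construction.
          km = KMeans(n_clusters=n_exemplars, n_init=10,
                      random_state=0).fit(X_train)
          S = rbf_kernel(X_train, km.cluster_centers_, gamma=gamma)
          clf = LogisticRegression(max_iter=1000).fit(S, y_train)
          return km, clf

      def predict_risk(km, clf, X_new, gamma=1e-3):
          S = rbf_kernel(X_new, km.cluster_centers_, gamma=gamma)
          return clf.predict_proba(S)[:, 1]

    The dimensionality penalty is paid once, when the exemplars are chosen; the downstream regression sees only as many features as there are exemplars, which is what keeps the model interpretable.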

  12. A Pearson-type goodness-of-fit test for stationary and time-continuous Markov regression models.

    PubMed

    Aguirre-Hernández, R; Farewell, V T

    2002-07-15

    Markov regression models describe the way in which a categorical response variable changes over time for subjects with different explanatory variables. Frequently it is difficult to measure the response variable at equally spaced discrete time intervals. Here we propose a Pearson-type goodness-of-fit test for stationary Markov regression models fitted to panel data. A parametric bootstrap algorithm is used to study the distribution of the test statistic. The proposed technique is applied to examine the fit of a Markov regression model used to identify markers for disease progression in psoriatic arthritis.
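    The parametric bootstrap logic generalizes readily: compute the Pearson statistic on the observed data, then repeatedly simulate new responses from the fitted model, refit, and recompute the statistic to obtain its null distribution. A generic sketch for a binary regression (the paper's setting is Markov panel models, which would replace the simulation step):

      # Parametric bootstrap of a Pearson-type statistic (generic sketch).
      import numpy as np
      import statsmodels.api as sm

      def pearson_stat(y, mu):
          return np.sum((y - mu) ** 2 / (mu * (1 - mu)))

      rng = np.random.default_rng(1)
      X = sm.add_constant(rng.normal(size=(300, 1)))
      y = rng.binomial(1, 0.5, size=300)

      fit = sm.Logit(y, X).fit(disp=0)
      t_obs = pearson_stat(y, fit.predict(X))

      t_boot = []
      for _ in range(500):
          y_b = rng.binomial(1, fit.predict(X))   # simulate from the fit
          fit_b = sm.Logit(y_b, X).fit(disp=0)
          t_boot.append(pearson_stat(y_b, fit_b.predict(X)))
      print("bootstrap p-value:", np.mean(np.array(t_boot) >= t_obs))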

  13. Sample size matters: Investigating the optimal sample size for a logistic regression debris flow susceptibility model

    NASA Astrophysics Data System (ADS)

    Heckmann, Tobias; Gegg, Katharina; Becht, Michael

    2013-04-01

    Statistical approaches to landslide susceptibility modelling on the catchment and regional scale are used much more frequently than heuristic and physically based approaches. In the present study, we deal with the problem of the optimal sample size for a logistic regression model. More specifically, a stepwise approach was chosen in order to select those independent variables (from a number of derivatives of a digital elevation model and landcover data) that best explain the spatial distribution of debris flow initiation zones in two neighbouring central alpine catchments in Austria (used mutually for model calculation and validation). In order to minimise problems arising from spatial autocorrelation, we sample a single raster cell from each debris flow initiation zone within an inventory. In addition, as suggested by previous work using the "rare events logistic regression" approach, we take a sample of the remaining "non-event" raster cells. The recommendations given in the literature on the size of this sample appear to be motivated by practical considerations, e.g. the time and cost of acquiring data for non-event cases, which do not apply to the case of spatial data. In our study, we aim to find, empirically, an "optimal" sample size that avoids two problems: First, a sample that is too large will violate the independent-sample assumption, as the independent variables are spatially autocorrelated; hence, a variogram analysis leads to a sample size threshold above which the average distance between sampled cells falls below the autocorrelation range of the independent variables. Second, if the sample is too small, repeated sampling will lead to very different results, i.e. the independent variables, and hence the result of a single model calculation, will be extremely dependent on the choice of non-event cells. Using a Monte-Carlo analysis with stepwise logistic regression, 1000 models are calculated for a wide range of sample sizes. For each sample size
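    The Monte-Carlo part of such a design can be sketched directly: for each candidate non-event sample size, repeatedly draw a sample, fit the logistic regression, and inspect how much the coefficients vary across draws. Synthetic placeholder data stand in for the DEM-derived covariates:

      # Coefficient stability across non-event sample sizes (sketch with
      # synthetic data; the study uses DEM and landcover covariates).
      import numpy as np
      from sklearn.linear_model import LogisticRegression

      rng = np.random.default_rng(2)
      X_events = rng.normal(1.0, 1.0, size=(100, 3))    # debris-flow cells
      X_pool = rng.normal(0.0, 1.0, size=(50_000, 3))   # non-event cells

      for n_nonevents in (200, 1000, 5000):
          coefs = []
          for _ in range(100):
              idx = rng.choice(len(X_pool), n_nonevents, replace=False)
              X = np.vstack([X_events, X_pool[idx]])
              y = np.r_[np.ones(100), np.zeros(n_nonevents)]
              coefs.append(LogisticRegression(max_iter=1000).fit(X, y).coef_[0])
          print(n_nonevents, np.std(coefs, axis=0).round(3))  # smaller = stabler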

  14. Self-modeling ordinal regression with time invariant covariates - An application to prostate cancer data.

    PubMed

    Shirazi, Aliakbar Mastani; Das, Kalyan; Pinheiro, Aluisio

    2015-07-28

    In a prostate cancer study, the severity of genito-urinary (bladder) toxicity is assessed for patients who were given different doses of radiation. The ordinal responses (severity of side effects) are recorded longitudinally, along with the cancer stage of each patient. Differences among the patients due to time-invariant covariates are captured by the parameters. To build a suitable framework for analyzing such data, we propose a self-modeling ordinal longitudinal model in which the conditional cumulative probabilities for an outcome category are related through a shape-invariant model. Since patients suffering from a common disease usually exhibit a similar pattern, it is natural to build a nonlinear model that is shape invariant. The model is essentially semiparametric, with the population time curve modeled by a penalized regression spline. A Monte Carlo expectation-maximization technique is used to estimate the parameters of the model. A simulation study is also carried out to justify the methodology used.
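    As a point of reference for the ordinal part of the model, a plain proportional-odds fit (without the self-modeling, shape-invariant and MCEM machinery) can be set up as below; the covariates, coefficients, and cut points are synthetic placeholders:

      # Baseline proportional-odds model (sketch; the paper's self-modeling
      # semiparametric approach is far richer than this).
      import numpy as np
      import pandas as pd
      from statsmodels.miscmodels.ordinal_model import OrderedModel

      rng = np.random.default_rng(3)
      df = pd.DataFrame({
          "dose": rng.uniform(60, 80, 200),     # radiation dose (Gy), synthetic
          "stage": rng.integers(1, 4, 200),     # cancer stage, synthetic
      })
      latent = 0.05 * df["dose"] + 0.3 * df["stage"] + rng.logistic(size=200)
      df["toxicity"] = pd.cut(latent, [-np.inf, 4, 5, 6, np.inf], labels=False)

      fit = OrderedModel(df["toxicity"], df[["dose", "stage"]],
                         distr="logit").fit(method="bfgs", disp=0)
      print(fit.params)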

  15. Selection of relevant input variables in storm water quality modeling by multiobjective evolutionary polynomial regression paradigm

    NASA Astrophysics Data System (ADS)

    Creaco, E.; Berardi, L.; Sun, Siao; Giustolisi, O.; Savic, D.

    2016-04-01

    The growing availability of field data, from information and communication technologies (ICTs) in "smart" urban infrastructures, allows data modeling to understand complex phenomena and to support management decisions. Among the analyzed phenomena, those related to storm water quality modeling have recently been gaining interest in the scientific literature. Nonetheless, the large amount of available data poses the problem of selecting relevant variables to describe a phenomenon and enable robust data modeling. This paper presents a procedure for the selection of relevant input variables using the multiobjective evolutionary polynomial regression (EPR-MOGA) paradigm. The procedure is based on scrutinizing the explanatory variables that appear inside the set of EPR-MOGA symbolic model expressions of increasing complexity and goodness of fit to target output. The strategy also enables the selection to be validated by engineering judgement. In such context, the multiple case study extension of EPR-MOGA, called MCS-EPR-MOGA, is adopted. The application of the proposed procedure to modeling storm water quality parameters in two French catchments shows that it was able to significantly reduce the number of explanatory variables for successive analyses. Finally, the EPR-MOGA models obtained after the input selection are compared with those obtained by using the same technique without benefitting from input selection and with those obtained in previous works where other data-modeling techniques were used on the same data. The comparison highlights the effectiveness of both EPR-MOGA and the input selection procedure.
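    The selection principle, trading model complexity against goodness of fit and reading off which variables persist across the Pareto front, can be imitated with an exhaustive search over small linear models. This brute-force sketch stands in for the evolutionary search of EPR-MOGA:

      # Complexity-vs-error front over candidate inputs (brute-force sketch
      # standing in for the EPR-MOGA evolutionary search).
      from itertools import combinations
      import numpy as np
      from sklearn.linear_model import LinearRegression
      from sklearn.model_selection import cross_val_score

      def input_selection_front(X, y, names, max_terms=3):
          front = []
          for k in range(1, max_terms + 1):
              best = None
              for cols in combinations(range(X.shape[1]), k):
                  mse = -cross_val_score(
                      LinearRegression(), X[:, cols], y,
                      scoring="neg_mean_squared_error", cv=5).mean()
                  if best is None or mse < best[0]:
                      best = (mse, cols)
              front.append((k, [names[i] for i in best[1]], best[0]))
          return front    # one (complexity, variables, CV error) point per k

    Variables that appear in the best models at every complexity level are the natural candidates to retain for successive analyses.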

  16. On the Hydrologic Adjustment of Climate-Model Projections: The Potential Pitfall of Potential Evapotranspiration

    USGS Publications Warehouse

    Milly, Paul C.D.; Dunne, Krista A.

    2011-01-01

    Hydrologic models often are applied to adjust projections of hydroclimatic change that come from climate models. Such adjustment includes climate-bias correction, spatial refinement ("downscaling"), and consideration of the roles of hydrologic processes that were neglected in the climate model. Described herein is a quantitative analysis of the effects of hydrologic adjustment on the projections of runoff change associated with projected twenty-first-century climate change. In a case study including three climate models and 10 river basins in the contiguous United States, the authors find that relative (i.e., fractional or percentage) runoff change computed with hydrologic adjustment more often than not was less positive (or, equivalently, more negative) than what was projected by the climate models. The dominant contributor to this decrease in runoff was a ubiquitous change in runoff (median -11%) caused by the hydrologic model’s apparent amplification of the climate-model-implied growth in potential evapotranspiration. Analysis suggests that the hydrologic model, on the basis of the empirical, temperature-based modified Jensen–Haise formula, calculates a change in potential evapotranspiration that is typically 3 times the change implied by the climate models, which explicitly track surface energy budgets. In comparison with the amplification of potential evapotranspiration, central tendencies of other contributions from hydrologic adjustment (spatial refinement, climate-bias adjustment, and process refinement) were relatively small. The authors’ findings highlight the need for caution when projecting changes in potential evapotranspiration for use in hydrologic models or drought indices to evaluate climate-change impacts on water.
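    For context, the Jensen-Haise family estimates potential evapotranspiration from mean air temperature and solar radiation alone, which is exactly why it responds so strongly to warming. One commonly cited form is sketched below; the "modified" variant used by the hydrologic model may carry different coefficients, so treat these constants as placeholders:

      # Jensen-Haise-type potential evapotranspiration (sketch; coefficients
      # follow one commonly cited form, not necessarily the study's variant).
      def jensen_haise_pet(t_mean_c: float, rs_mm_day: float) -> float:
          # t_mean_c: mean air temperature (deg C)
          # rs_mm_day: solar radiation as evaporation equivalent (mm/day)
          return max(0.0, (0.025 * t_mean_c + 0.08) * rs_mm_day)

      print(jensen_haise_pet(20.0, 8.0))   # about 4.6 mm/day

    Because temperature enters linearly while the surface energy budget does not, a temperature-driven formula of this kind can amplify a climate model's implied change in potential evapotranspiration, which is the pitfall the study quantifies.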

  17. Identifying significant covariates for anti-HIV treatment response: mechanism-based differential equation models and empirical semiparametric regression models.

    PubMed

    Huang, Yangxin; Liang, Hua; Wu, Hulin

    2008-10-15

    In this paper, the mechanism-based ordinary differential equation (ODE) model and the flexible semiparametric regression model are employed to identify the significant covariates for antiretroviral response in AIDS clinical trials. We consider the treatment effect as a function of three factors (or covariates): pharmacokinetics, drug adherence and susceptibility. Both clinical and simulated data examples are given to illustrate these two different kinds of modeling approaches. We found that the ODE model is more powerful for modeling the mechanism-based nonlinear relationship between treatment effects and virological response biomarkers. The ODE model is also better at identifying the significant factors for virological response, although it is slightly liberal and tends to include more factors (or covariates) in the model. The semiparametric mixed-effects regression model is very flexible for fitting the virological response data, but it is less reliable in identifying the correct factors for the virological response and may sometimes miss them. The ODE model is also biologically justifiable and good for predictions and simulations under various biological scenarios. The limitations of the ODE models include the high cost of computation and the requirement of biological assumptions that may not be easy to validate. The methodologies reviewed in this paper are also generally applicable to studies of other viruses, such as hepatitis B virus or hepatitis C virus.
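    A standard target-cell ODE system of the kind underlying such mechanism-based models is easy to integrate numerically. The sketch below uses generic literature-style parameter values; the paper's model additionally ties the drug efficacy term to pharmacokinetics, adherence and susceptibility covariates:

      # Basic HIV target-cell ODE model (sketch with generic parameters).
      from scipy.integrate import solve_ivp

      def hiv_ode(t, state, lam, d, k, delta, N, c, eta):
          T, I, V = state               # target cells, infected cells, virus
          dT = lam - d * T - (1 - eta) * k * T * V
          dI = (1 - eta) * k * T * V - delta * I
          dV = N * delta * I - c * V
          return [dT, dI, dV]

      # eta is the (time-constant, here) drug efficacy; the paper models it
      # as a function of PK, adherence and susceptibility.
      sol = solve_ivp(hiv_ode, (0, 100), [1e6, 0.0, 1e-3],
                      args=(1e4, 0.01, 2.4e-8, 1.0, 1000, 23, 0.7))
      print(sol.y[2, -1])               # viral load at day 100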

  18. Prediction of Filamentous Sludge Bulking using a State-based Gaussian Processes Regression Model

    NASA Astrophysics Data System (ADS)

    Liu, Yiqi; Guo, Jianhua; Wang, Qilin; Huang, Daoping

    2016-08-01

    The activated sludge process has been widely adopted to remove pollutants in wastewater treatment plants (WWTPs). However, its stable operation is often compromised by the occurrence of filamentous bulking. The aim of this study is to build a proper model for timely diagnosis and prediction of filamentous sludge bulking in an activated sludge process. This study developed a state-based Gaussian Process Regression (GPR) model to monitor the bulking-related parameter sludge volume index (SVI), in such a way that the evolution of SVI can be predicted multiple steps ahead. The methodology was validated with SVI data collected from one full-scale WWTP. Online diagnosis and prediction of filamentous sludge bulking with real-time SVI prediction was tested in a simulation study. The results showed that the proposed methodology is capable of predicting future SVIs with good accuracy, thus providing sufficient lead time for controlling filamentous sludge bulking.

  19. Prediction of Filamentous Sludge Bulking using a State-based Gaussian Processes Regression Model

    PubMed Central

    Liu, Yiqi; Guo, Jianhua; Wang, Qilin; Huang, Daoping

    2016-01-01

    The activated sludge process has been widely adopted to remove pollutants in wastewater treatment plants (WWTPs). However, its stable operation is often compromised by the occurrence of filamentous bulking. The aim of this study is to build a proper model for timely diagnosis and prediction of filamentous sludge bulking in an activated sludge process. This study developed a state-based Gaussian Process Regression (GPR) model to monitor the bulking-related parameter sludge volume index (SVI), in such a way that the evolution of SVI can be predicted multiple steps ahead. The methodology was validated with SVI data collected from one full-scale WWTP. Online diagnosis and prediction of filamentous sludge bulking with real-time SVI prediction was tested in a simulation study. The results showed that the proposed methodology is capable of predicting future SVIs with good accuracy, thus providing sufficient lead time for controlling filamentous sludge bulking. PMID:27498888
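    Multi-step-ahead prediction with a GP can be realized recursively: train on lagged SVI windows, then feed each prediction back in as the newest lag. A sketch with scikit-learn (the paper's state-based formulation is more elaborate than this lag-vector construction):

      # Recursive multi-step-ahead SVI forecasting with a GP on lags (sketch).
      import numpy as np
      from sklearn.gaussian_process import GaussianProcessRegressor
      from sklearn.gaussian_process.kernels import RBF, WhiteKernel

      def fit_gp_on_lags(svi, n_lags=3):
          X = np.column_stack([svi[i:len(svi) - n_lags + i]
                               for i in range(n_lags)])
          y = svi[n_lags:]
          return GaussianProcessRegressor(kernel=RBF() + WhiteKernel()).fit(X, y)

      def forecast(gp, history, steps, n_lags=3):
          window = list(history[-n_lags:])
          preds = []
          for _ in range(steps):        # feed predictions back in as lags
              y_hat = gp.predict(np.array(window[-n_lags:])[None, :])[0]
              preds.append(y_hat)
              window.append(y_hat)
          return np.array(preds)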

  20. Prediction of Corrosion Resistance of Some Dental Metallic Materials with an Adaptive Regression Model

    NASA Astrophysics Data System (ADS)

    Chelariu, Romeu; Suditu, Gabriel Dan; Mareci, Daniel; Bolat, Georgiana; Cimpoesu, Nicanor; Leon, Florin; Curteanu, Silvia

    2015-04-01

    The aim of this study is to investigate the electrochemical behavior of some dental metallic materials in artificial saliva at different pH values (5.6 and 3.4) and NaF contents (500 ppm, 1000 ppm, and 2000 ppm), and with albumin protein addition (0.6 wt.%) at pH 3.4. The corrosion resistance of the alloys was quantitatively evaluated by polarization resistance, estimated by the electrochemical impedance spectroscopy method. An adaptive k-nearest-neighbor regression method was applied to evaluate the corrosion resistance of the alloys by simulation, as a function of the operating conditions. The predictions provided by the model are useful for experimental practice, as they can replace or, at least, help to plan the experiments. The accurate results obtained show that the developed model is reliable and efficient.
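    A k-nearest-neighbor regressor of this general kind weights each neighbor by its distance in the (standardized) space of operating conditions. A sketch with scikit-learn, where the feature encoding and the resistance values are placeholders, not the study's measurements:

      # Distance-weighted k-NN regression of polarization resistance (sketch;
      # features are [pH, NaF ppm, albumin wt.%], values are placeholders).
      import numpy as np
      from sklearn.neighbors import KNeighborsRegressor
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler

      X = np.array([[5.6, 500, 0.0], [5.6, 1000, 0.0], [5.6, 2000, 0.0],
                    [3.4, 500, 0.0], [3.4, 1000, 0.6], [3.4, 2000, 0.6]])
      y = np.array([120.0, 95.0, 70.0, 60.0, 45.0, 30.0])  # placeholder values

      model = make_pipeline(StandardScaler(),
                            KNeighborsRegressor(n_neighbors=3,
                                                weights="distance"))
      model.fit(X, y)
      print(model.predict([[3.4, 1500, 0.6]]))   # interpolated condition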

  1. Mapping soil organic carbon stocks by robust geostatistical and boosted regression models

    NASA Astrophysics Data System (ADS)

    Nussbaum, Madlene; Papritz, Andreas; Baltensweiler, Andri; Walthert, Lorenz

    2013-04-01

    Carbon (C) sequestration in forests offsets greenhouse gas emissions. Therefore, quantifying C stocks and fluxes in forest ecosystems is of interest for greenhouse gas reporting according to the Kyoto protocol. In Switzerland, the National Forest Inventory offers comprehensive data to quantify the aboveground forest biomass and its change over time. Estimating stocks of soil organic C (SOC) in forests is more difficult because the variables needed to quantify stocks vary strongly in space and precise quantification of some of them is very costly. Based on data from 1033 plots, we modeled SOC stocks of the organic layer and the mineral soil to depths of 30 cm and 100 cm for the Swiss forested area. A broad range of covariates was available for the statistical modeling: climate data (e.g., precipitation, temperature), two elevation models (resolutions 25 and 2 m) with respective terrain attributes, and spectral reflectance data representing vegetation. Furthermore, the main mapping units of an overview soil map and a coarse-scale geological map were used to coarsely represent the parent material of the soils. Selecting the important covariates for SOC stock modeling from this large set was a major challenge for the statistical modeling. We used two approaches to deal with this problem: 1) A robust restricted maximum likelihood method to fit a linear regression model with spatially correlated errors. The large number of covariates was first reduced by the LASSO (Least Absolute Shrinkage and Selection Operator) and then further narrowed down to a parsimonious set of important covariates by cross-validation of the robustly fitted model. To account for nonlinear dependencies of the response on the covariates, interaction terms of the latter were included in the model if this improved the fit. 2) A boosted structured regression model with componentwise linear least squares or componentwise smoothing splines as base procedures. The selection of important covariates was done by the
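    The two-stage logic of approach 1, shrinkage first, then a careful refit on the survivors, can be sketched in a few lines. Here plain OLS stands in for the robust REML refit with spatially correlated errors, and the data are synthetic:

      # LASSO screening followed by a refit on surviving covariates (sketch;
      # the study refits with robust REML, not plain OLS).
      import numpy as np
      from sklearn.linear_model import LassoCV, LinearRegression

      rng = np.random.default_rng(4)
      X = rng.normal(size=(300, 50))           # many candidate covariates
      beta = np.zeros(50)
      beta[:4] = [1.5, -1.0, 0.8, 0.5]         # only four are truly active
      y = X @ beta + rng.normal(scale=0.5, size=300)

      lasso = LassoCV(cv=5).fit(X, y)
      keep = np.flatnonzero(lasso.coef_)       # covariates surviving shrinkage
      refit = LinearRegression().fit(X[:, keep], y)
      print(keep, refit.coef_.round(2))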

  2. A note on modeling of tumor regression for estimation of radiobiological parameters

    SciTech Connect

    Zhong, Hualiang; Chetty, Indrin

    2014-08-15

    Purpose: Accurate calculation of radiobiological parameters is crucial to predicting radiation treatment response. Modeling differences may have a significant impact on derived parameters. In this study, the authors have integrated two existing models with kinetic differential equations to formulate a new tumor regression model for estimation of radiobiological parameters for individual patients. Methods: A system of differential equations that characterizes the birth-and-death process of tumor cells in radiation treatment was analytically solved. The solution of this system was used to construct an iterative model (Z-model). The model consists of three parameters: tumor doubling time T_d, half-life of dead cells T_r, and cell survival fraction SF_D under dose D. The Jacobian determinant of this model was proposed as a constraint to optimize the three parameters for six head and neck cancer patients. The derived parameters were compared with those generated from the two existing models: Chvetsov's model (C-model) and Lim's model (L-model). The C-model and L-model were optimized with the parameter T_d fixed. Results: With the Jacobian-constrained Z-model, the mean of the optimized cell survival fractions is 0.43 ± 0.08, and the half-life of dead cells averaged over the six patients is 17.5 ± 3.2 days. The parameters T_r and SF_D optimized with the Z-model differ by 1.2% and 20.3% from those optimized with the T_d-fixed C-model, and by 32.1% and 112.3% from those optimized with the T_d-fixed L-model, respectively. Conclusions: The Z-model was analytically constructed from the differential equations of cell populations that describe changes in the number of different tumor cells during the course of radiation treatment. The Jacobian constraints were proposed to optimize the three radiobiological parameters. The generated model and its optimization method may help develop high-quality treatment regimens for individual patients.
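    The iterative flavor of such a model can be illustrated with a two-compartment simulation: viable cells are reduced by each fraction and regrow with doubling time T_d, while killed cells clear with half-life T_r. The schedule, the T_d value, and the reuse of the paper's mean estimates below are illustrative only:

      # Two-compartment tumor regression iteration (illustrative sketch).
      import numpy as np

      def tumor_volume(days, T_d, T_r, SF_D, treatment_days):
          viable, dead = 1.0, 0.0           # normalized cell populations
          volumes = []
          for day in range(days):
              if day in treatment_days:     # one fraction kills (1 - SF_D)
                  killed = viable * (1.0 - SF_D)
                  viable -= killed
                  dead += killed
              viable *= 2.0 ** (1.0 / T_d)  # regrowth between fractions
              dead *= 0.5 ** (1.0 / T_r)    # clearance of dead cells
              volumes.append(viable + dead)
          return np.array(volumes)

      # Mean estimates reported above: SF_D = 0.43, T_r = 17.5 days;
      # T_d = 40 days and the daily 30-fraction schedule are assumed.
      v = tumor_volume(42, T_d=40.0, T_r=17.5, SF_D=0.43,
                       treatment_days=set(range(30)))
      print(v[-1])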

  3. Inverse regression-based uncertainty quantification algorithms for high-dimensional models: Theory and practice

    SciTech Connect

    Li, Weixuan; Lin, Guang; Li, Bing

    2016-09-01

    A well-known challenge in uncertainty quantification (UQ) is the "curse of dimensionality". However, many high-dimensional UQ problems are essentially low-dimensional, because the randomness of the quantity of interest (QoI) is caused only by uncertain parameters varying within a low-dimensional subspace, known as the sufficient dimension reduction (SDR) subspace. Motivated by this observation, we propose and demonstrate in this paper an inverse regression-based UQ approach (IRUQ) for high-dimensional problems. Specifically, we use an inverse regression procedure to estimate the SDR subspace and then convert the original problem to a low-dimensional one, which can be efficiently solved by building a response surface model such as a polynomial chaos expansion. The novelty and advantages of the proposed approach are seen in its computational efficiency and practicality. Compared with Monte Carlo, the traditionally preferred approach for high-dimensional UQ, IRUQ at a comparable cost generally gives much more accurate solutions even for high-dimensional problems, and even when the dimension reduction is not exactly sufficient. Theoretically, IRUQ is proved to converge twice as fast as the method it employs to seek the SDR subspace. For example, while a sliced inverse regression method converges to the SDR subspace at the rate of $O(n^{-1/2})$, the corresponding IRUQ converges at $O(n^{-1})$. IRUQ also provides several desired conveniences in practice. It is non-intrusive, requiring only a simulator to generate realizations of the QoI, and there is no need to compute the high-dimensional gradient of the QoI. Finally, error bars can be derived for the estimation results reported by IRUQ.
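    Sliced inverse regression itself is only a few lines of linear algebra: standardize X, slice the sample on y, and take the leading eigenvectors of the between-slice covariance of the slice means. A self-contained numpy sketch:

      # Sliced inverse regression (SIR) for SDR subspace estimation (sketch).
      import numpy as np

      def sir_directions(X, y, n_slices=10, n_dirs=2):
          n, p = X.shape
          Xc = X - X.mean(axis=0)
          sigma = np.cov(Xc, rowvar=False)
          L = np.linalg.cholesky(np.linalg.inv(sigma))   # L @ L.T = sigma^{-1}
          Z = Xc @ L                                     # standardized design
          order = np.argsort(y)                          # slice on sorted y
          slices = np.array_split(order, n_slices)
          M = sum(len(s) / n * np.outer(Z[s].mean(axis=0), Z[s].mean(axis=0))
                  for s in slices)
          vals, vecs = np.linalg.eigh(M)                 # ascending eigenvalues
          return L @ vecs[:, ::-1][:, :n_dirs]           # back to X coordinates

    The QoI is then regressed on the projections X @ B, with B the returned directions, for example with a polynomial chaos surrogate; that is the low-dimensional problem IRUQ solves.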

  4. Gradually softening hydrogels for modeling hepatic stellate cell behavior during fibrosis regression.

    PubMed

    Caliari, Steven R; Perepelyuk, Maryna; Soulas, Elizabeth M; Lee, Gi Yun; Wells, Rebecca G; Burdick, Jason A

    2016-06-13

    The extracellular matrix (ECM) presents an e