A method for fitting regression splines with varying polynomial order in the linear mixed model.
Edwards, Lloyd J; Stewart, Paul W; MacDougall, James E; Helms, Ronald W
2006-02-15
The linear mixed model has become a widely used tool for longitudinal analysis of continuous variables. The use of regression splines in these models offers the analyst additional flexibility in the formulation of descriptive analyses, exploratory analyses and hypothesis-driven confirmatory analyses. We propose a method for fitting piecewise polynomial regression splines with varying polynomial order in the fixed effects and/or random effects of the linear mixed model. The polynomial segments are explicitly constrained by side conditions for continuity and some smoothness at the points where they join. By using a reparameterization of this explicitly constrained linear mixed model, an implicitly constrained linear mixed model is constructed that simplifies implementation of fixed-knot regression splines. The proposed approach is relatively simple, handles splines in one variable or multiple variables, and can be easily programmed using existing commercial software such as SAS or S-plus. The method is illustrated using two examples: an analysis of longitudinal viral load data from a study of subjects with acute HIV-1 infection and an analysis of 24-hour ambulatory blood pressure profiles.
ERIC Educational Resources Information Center
Ker, H. W.
2014-01-01
Multilevel data are very common in educational research. Hierarchical linear models/linear mixed-effects models (HLMs/LMEs) are often utilized to analyze multilevel data nowadays. This paper discusses the problems of utilizing ordinary regressions for modeling multilevel educational data, compare the data analytic results from three regression…
Correlation and simple linear regression.
Eberly, Lynn E
2007-01-01
This chapter highlights important steps in using correlation and simple linear regression to address scientific questions about the association of two continuous variables with each other. These steps include estimation and inference, assessing model fit, the connection between regression and ANOVA, and study design. Examples in microbiology are used throughout. This chapter provides a framework that is helpful in understanding more complex statistical techniques, such as multiple linear regression, linear mixed effects models, logistic regression, and proportional hazards regression.
Grajeda, Laura M; Ivanescu, Andrada; Saito, Mayuko; Crainiceanu, Ciprian; Jaganath, Devan; Gilman, Robert H; Crabtree, Jean E; Kelleher, Dermott; Cabrera, Lilia; Cama, Vitaliano; Checkley, William
2016-01-01
Childhood growth is a cornerstone of pediatric research. Statistical models need to consider individual trajectories to adequately describe growth outcomes. Specifically, well-defined longitudinal models are essential to characterize both population and subject-specific growth. Linear mixed-effect models with cubic regression splines can account for the nonlinearity of growth curves and provide reasonable estimators of population and subject-specific growth, velocity and acceleration. We provide a stepwise approach that builds from simple to complex models, and account for the intrinsic complexity of the data. We start with standard cubic splines regression models and build up to a model that includes subject-specific random intercepts and slopes and residual autocorrelation. We then compared cubic regression splines vis-à-vis linear piecewise splines, and with varying number of knots and positions. Statistical code is provided to ensure reproducibility and improve dissemination of methods. Models are applied to longitudinal height measurements in a cohort of 215 Peruvian children followed from birth until their fourth year of life. Unexplained variability, as measured by the variance of the regression model, was reduced from 7.34 when using ordinary least squares to 0.81 (p < 0.001) when using a linear mixed-effect models with random slopes and a first order continuous autoregressive error term. There was substantial heterogeneity in both the intercept (p < 0.001) and slopes (p < 0.001) of the individual growth trajectories. We also identified important serial correlation within the structure of the data (ρ = 0.66; 95 % CI 0.64 to 0.68; p < 0.001), which we modeled with a first order continuous autoregressive error term as evidenced by the variogram of the residuals and by a lack of association among residuals. The final model provides a parametric linear regression equation for both estimation and prediction of population- and individual-level growth in height. We show that cubic regression splines are superior to linear regression splines for the case of a small number of knots in both estimation and prediction with the full linear mixed effect model (AIC 19,352 vs. 19,598, respectively). While the regression parameters are more complex to interpret in the former, we argue that inference for any problem depends more on the estimated curve or differences in curves rather than the coefficients. Moreover, use of cubic regression splines provides biological meaningful growth velocity and acceleration curves despite increased complexity in coefficient interpretation. Through this stepwise approach, we provide a set of tools to model longitudinal childhood data for non-statisticians using linear mixed-effect models.
DOT National Transportation Integrated Search
2016-09-01
We consider the problem of solving mixed random linear equations with k components. This is the noiseless setting of mixed linear regression. The goal is to estimate multiple linear models from mixed samples in the case where the labels (which sample...
Modeling containment of large wildfires using generalized linear mixed-model analysis
Mark Finney; Isaac C. Grenfell; Charles W. McHugh
2009-01-01
Billions of dollars are spent annually in the United States to contain large wildland fires, but the factors contributing to suppression success remain poorly understood. We used a regression model (generalized linear mixed-model) to model containment probability of individual fires, assuming that containment was a repeated-measures problem (fixed effect) and...
Zhang, Hanze; Huang, Yangxin; Wang, Wei; Chen, Henian; Langland-Orban, Barbara
2017-01-01
In longitudinal AIDS studies, it is of interest to investigate the relationship between HIV viral load and CD4 cell counts, as well as the complicated time effect. Most of common models to analyze such complex longitudinal data are based on mean-regression, which fails to provide efficient estimates due to outliers and/or heavy tails. Quantile regression-based partially linear mixed-effects models, a special case of semiparametric models enjoying benefits of both parametric and nonparametric models, have the flexibility to monitor the viral dynamics nonparametrically and detect the varying CD4 effects parametrically at different quantiles of viral load. Meanwhile, it is critical to consider various data features of repeated measurements, including left-censoring due to a limit of detection, covariate measurement error, and asymmetric distribution. In this research, we first establish a Bayesian joint models that accounts for all these data features simultaneously in the framework of quantile regression-based partially linear mixed-effects models. The proposed models are applied to analyze the Multicenter AIDS Cohort Study (MACS) data. Simulation studies are also conducted to assess the performance of the proposed methods under different scenarios.
Confidence Intervals for Assessing Heterogeneity in Generalized Linear Mixed Models
ERIC Educational Resources Information Center
Wagler, Amy E.
2014-01-01
Generalized linear mixed models are frequently applied to data with clustered categorical outcomes. The effect of clustering on the response is often difficult to practically assess partly because it is reported on a scale on which comparisons with regression parameters are difficult to make. This article proposes confidence intervals for…
Spatial Assessment of Model Errors from Four Regression Techniques
Lianjun Zhang; Jeffrey H. Gove; Jeffrey H. Gove
2005-01-01
Fomst modelers have attempted to account for the spatial autocorrelations among trees in growth and yield models by applying alternative regression techniques such as linear mixed models (LMM), generalized additive models (GAM), and geographicalIy weighted regression (GWR). However, the model errors are commonly assessed using average errors across the entire study...
Tutorial on Biostatistics: Linear Regression Analysis of Continuous Correlated Eye Data.
Ying, Gui-Shuang; Maguire, Maureen G; Glynn, Robert; Rosner, Bernard
2017-04-01
To describe and demonstrate appropriate linear regression methods for analyzing correlated continuous eye data. We describe several approaches to regression analysis involving both eyes, including mixed effects and marginal models under various covariance structures to account for inter-eye correlation. We demonstrate, with SAS statistical software, applications in a study comparing baseline refractive error between one eye with choroidal neovascularization (CNV) and the unaffected fellow eye, and in a study determining factors associated with visual field in the elderly. When refractive error from both eyes were analyzed with standard linear regression without accounting for inter-eye correlation (adjusting for demographic and ocular covariates), the difference between eyes with CNV and fellow eyes was 0.15 diopters (D; 95% confidence interval, CI -0.03 to 0.32D, p = 0.10). Using a mixed effects model or a marginal model, the estimated difference was the same but with narrower 95% CI (0.01 to 0.28D, p = 0.03). Standard regression for visual field data from both eyes provided biased estimates of standard error (generally underestimated) and smaller p-values, while analysis of the worse eye provided larger p-values than mixed effects models and marginal models. In research involving both eyes, ignoring inter-eye correlation can lead to invalid inferences. Analysis using only right or left eyes is valid, but decreases power. Worse-eye analysis can provide less power and biased estimates of effect. Mixed effects or marginal models using the eye as the unit of analysis should be used to appropriately account for inter-eye correlation and maximize power and precision.
Montoye, Alexander H K; Begum, Munni; Henning, Zachary; Pfeiffer, Karin A
2017-02-01
This study had three purposes, all related to evaluating energy expenditure (EE) prediction accuracy from body-worn accelerometers: (1) compare linear regression to linear mixed models, (2) compare linear models to artificial neural network models, and (3) compare accuracy of accelerometers placed on the hip, thigh, and wrists. Forty individuals performed 13 activities in a 90 min semi-structured, laboratory-based protocol. Participants wore accelerometers on the right hip, right thigh, and both wrists and a portable metabolic analyzer (EE criterion). Four EE prediction models were developed for each accelerometer: linear regression, linear mixed, and two ANN models. EE prediction accuracy was assessed using correlations, root mean square error (RMSE), and bias and was compared across models and accelerometers using repeated-measures analysis of variance. For all accelerometer placements, there were no significant differences for correlations or RMSE between linear regression and linear mixed models (correlations: r = 0.71-0.88, RMSE: 1.11-1.61 METs; p > 0.05). For the thigh-worn accelerometer, there were no differences in correlations or RMSE between linear and ANN models (ANN-correlations: r = 0.89, RMSE: 1.07-1.08 METs. Linear models-correlations: r = 0.88, RMSE: 1.10-1.11 METs; p > 0.05). Conversely, one ANN had higher correlations and lower RMSE than both linear models for the hip (ANN-correlation: r = 0.88, RMSE: 1.12 METs. Linear models-correlations: r = 0.86, RMSE: 1.18-1.19 METs; p < 0.05), and both ANNs had higher correlations and lower RMSE than both linear models for the wrist-worn accelerometers (ANN-correlations: r = 0.82-0.84, RMSE: 1.26-1.32 METs. Linear models-correlations: r = 0.71-0.73, RMSE: 1.55-1.61 METs; p < 0.01). For studies using wrist-worn accelerometers, machine learning models offer a significant improvement in EE prediction accuracy over linear models. Conversely, linear models showed similar EE prediction accuracy to machine learning models for hip- and thigh-worn accelerometers and may be viable alternative modeling techniques for EE prediction for hip- or thigh-worn accelerometers.
Tutorial on Biostatistics: Linear Regression Analysis of Continuous Correlated Eye Data
Ying, Gui-shuang; Maguire, Maureen G; Glynn, Robert; Rosner, Bernard
2017-01-01
Purpose To describe and demonstrate appropriate linear regression methods for analyzing correlated continuous eye data. Methods We describe several approaches to regression analysis involving both eyes, including mixed effects and marginal models under various covariance structures to account for inter-eye correlation. We demonstrate, with SAS statistical software, applications in a study comparing baseline refractive error between one eye with choroidal neovascularization (CNV) and the unaffected fellow eye, and in a study determining factors associated with visual field data in the elderly. Results When refractive error from both eyes were analyzed with standard linear regression without accounting for inter-eye correlation (adjusting for demographic and ocular covariates), the difference between eyes with CNV and fellow eyes was 0.15 diopters (D; 95% confidence interval, CI −0.03 to 0.32D, P=0.10). Using a mixed effects model or a marginal model, the estimated difference was the same but with narrower 95% CI (0.01 to 0.28D, P=0.03). Standard regression for visual field data from both eyes provided biased estimates of standard error (generally underestimated) and smaller P-values, while analysis of the worse eye provided larger P-values than mixed effects models and marginal models. Conclusion In research involving both eyes, ignoring inter-eye correlation can lead to invalid inferences. Analysis using only right or left eyes is valid, but decreases power. Worse-eye analysis can provide less power and biased estimates of effect. Mixed effects or marginal models using the eye as the unit of analysis should be used to appropriately account for inter-eye correlation and maximize power and precision. PMID:28102741
A Bayesian Semiparametric Latent Variable Model for Mixed Responses
ERIC Educational Resources Information Center
Fahrmeir, Ludwig; Raach, Alexander
2007-01-01
In this paper we introduce a latent variable model (LVM) for mixed ordinal and continuous responses, where covariate effects on the continuous latent variables are modelled through a flexible semiparametric Gaussian regression model. We extend existing LVMs with the usual linear covariate effects by including nonparametric components for nonlinear…
Baqué, Michèle; Amendt, Jens
2013-01-01
Developmental data of juvenile blow flies (Diptera: Calliphoridae) are typically used to calculate the age of immature stages found on or around a corpse and thus to estimate a minimum post-mortem interval (PMI(min)). However, many of those data sets don't take into account that immature blow flies grow in a non-linear fashion. Linear models do not supply a sufficient reliability on age estimates and may even lead to an erroneous determination of the PMI(min). According to the Daubert standard and the need for improvements in forensic science, new statistic tools like smoothing methods and mixed models allow the modelling of non-linear relationships and expand the field of statistical analyses. The present study introduces into the background and application of these statistical techniques by analysing a model which describes the development of the forensically important blow fly Calliphora vicina at different temperatures. The comparison of three statistical methods (linear regression, generalised additive modelling and generalised additive mixed modelling) clearly demonstrates that only the latter provided regression parameters that reflect the data adequately. We focus explicitly on both the exploration of the data--to assure their quality and to show the importance of checking it carefully prior to conducting the statistical tests--and the validation of the resulting models. Hence, we present a common method for evaluating and testing forensic entomological data sets by using for the first time generalised additive mixed models.
Factors associated with parasite dominance in fishes from Brazil.
Amarante, Cristina Fernandes do; Tassinari, Wagner de Souza; Luque, Jose Luis; Pereira, Maria Julia Salim
2016-06-14
The present study used regression models to evaluate the existence of factors that may influence the numerical parasite dominance with an epidemiological approximation. A database including 3,746 fish specimens and their respective parasites were used to evaluate the relationship between parasite dominance and biotic characteristics inherent to the studied hosts and the parasite taxa. Multivariate, classical, and mixed effects linear regression models were fitted. The calculations were performed using R software (95% CI). In the fitting of the classical multiple linear regression model, freshwater and planktivorous fish species and body length, as well as the species of the taxa Trematoda, Monogenea, and Hirudinea, were associated with parasite dominance. However, the fitting of the mixed effects model showed that the body length of the host and the species of the taxa Nematoda, Trematoda, Monogenea, Hirudinea, and Crustacea were significantly associated with parasite dominance. Studies that consider specific biological aspects of the hosts and parasites should expand the knowledge regarding factors that influence the numerical dominance of fish in Brazil. The use of a mixed model shows, once again, the importance of the appropriate use of a model correlated with the characteristics of the data to obtain consistent results.
A comparison of methods for the analysis of binomial clustered outcomes in behavioral research.
Ferrari, Alberto; Comelli, Mario
2016-12-01
In behavioral research, data consisting of a per-subject proportion of "successes" and "failures" over a finite number of trials often arise. This clustered binary data are usually non-normally distributed, which can distort inference if the usual general linear model is applied and sample size is small. A number of more advanced methods is available, but they are often technically challenging and a comparative assessment of their performances in behavioral setups has not been performed. We studied the performances of some methods applicable to the analysis of proportions; namely linear regression, Poisson regression, beta-binomial regression and Generalized Linear Mixed Models (GLMMs). We report on a simulation study evaluating power and Type I error rate of these models in hypothetical scenarios met by behavioral researchers; plus, we describe results from the application of these methods on data from real experiments. Our results show that, while GLMMs are powerful instruments for the analysis of clustered binary outcomes, beta-binomial regression can outperform them in a range of scenarios. Linear regression gave results consistent with the nominal level of significance, but was overall less powerful. Poisson regression, instead, mostly led to anticonservative inference. GLMMs and beta-binomial regression are generally more powerful than linear regression; yet linear regression is robust to model misspecification in some conditions, whereas Poisson regression suffers heavily from violations of the assumptions when used to model proportion data. We conclude providing directions to behavioral scientists dealing with clustered binary data and small sample sizes. Copyright © 2016 Elsevier B.V. All rights reserved.
The PX-EM algorithm for fast stable fitting of Henderson's mixed model
Foulley, Jean-Louis; Van Dyk, David A
2000-01-01
This paper presents procedures for implementing the PX-EM algorithm of Liu, Rubin and Wu to compute REML estimates of variance covariance components in Henderson's linear mixed models. The class of models considered encompasses several correlated random factors having the same vector length e.g., as in random regression models for longitudinal data analysis and in sire-maternal grandsire models for genetic evaluation. Numerical examples are presented to illustrate the procedures. Much better results in terms of convergence characteristics (number of iterations and time required for convergence) are obtained for PX-EM relative to the basic EM algorithm in the random regression. PMID:14736399
Fischer, A; Friggens, N C; Berry, D P; Faverdin, P
2018-07-01
The ability to properly assess and accurately phenotype true differences in feed efficiency among dairy cows is key to the development of breeding programs for improving feed efficiency. The variability among individuals in feed efficiency is commonly characterised by the residual intake approach. Residual feed intake is represented by the residuals of a linear regression of intake on the corresponding quantities of the biological functions that consume (or release) energy. However, the residuals include both, model fitting and measurement errors as well as any variability in cow efficiency. The objective of this study was to isolate the individual animal variability in feed efficiency from the residual component. Two separate models were fitted, in one the standard residual energy intake (REI) was calculated as the residual of a multiple linear regression of lactation average net energy intake (NEI) on lactation average milk energy output, average metabolic BW, as well as lactation loss and gain of body condition score. In the other, a linear mixed model was used to simultaneously fit fixed linear regressions and random cow levels on the biological traits and intercept using fortnight repeated measures for the variables. This method split the predicted NEI in two parts: one quantifying the population mean intercept and coefficients, and one quantifying cow-specific deviations in the intercept and coefficients. The cow-specific part of predicted NEI was assumed to isolate true differences in feed efficiency among cows. NEI and associated energy expenditure phenotypes were available for the first 17 fortnights of lactation from 119 Holstein cows; all fed a constant energy-rich diet. Mixed models fitting cow-specific intercept and coefficients to different combinations of the aforementioned energy expenditure traits, calculated on a fortnightly basis, were compared. The variance of REI estimated with the lactation average model represented only 8% of the variance of measured NEI. Among all compared mixed models, the variance of the cow-specific part of predicted NEI represented between 53% and 59% of the variance of REI estimated from the lactation average model or between 4% and 5% of the variance of measured NEI. The remaining 41% to 47% of the variance of REI estimated with the lactation average model may therefore reflect model fitting errors or measurement errors. In conclusion, the use of a mixed model framework with cow-specific random regressions seems to be a promising method to isolate the cow-specific component of REI in dairy cows.
Regression analysis using dependent Polya trees.
Schörgendorfer, Angela; Branscum, Adam J
2013-11-30
Many commonly used models for linear regression analysis force overly simplistic shape and scale constraints on the residual structure of data. We propose a semiparametric Bayesian model for regression analysis that produces data-driven inference by using a new type of dependent Polya tree prior to model arbitrary residual distributions that are allowed to evolve across increasing levels of an ordinal covariate (e.g., time, in repeated measurement studies). By modeling residual distributions at consecutive covariate levels or time points using separate, but dependent Polya tree priors, distributional information is pooled while allowing for broad pliability to accommodate many types of changing residual distributions. We can use the proposed dependent residual structure in a wide range of regression settings, including fixed-effects and mixed-effects linear and nonlinear models for cross-sectional, prospective, and repeated measurement data. A simulation study illustrates the flexibility of our novel semiparametric regression model to accurately capture evolving residual distributions. In an application to immune development data on immunoglobulin G antibodies in children, our new model outperforms several contemporary semiparametric regression models based on a predictive model selection criterion. Copyright © 2013 John Wiley & Sons, Ltd.
Log-normal frailty models fitted as Poisson generalized linear mixed models.
Hirsch, Katharina; Wienke, Andreas; Kuss, Oliver
2016-12-01
The equivalence of a survival model with a piecewise constant baseline hazard function and a Poisson regression model has been known since decades. As shown in recent studies, this equivalence carries over to clustered survival data: A frailty model with a log-normal frailty term can be interpreted and estimated as a generalized linear mixed model with a binary response, a Poisson likelihood, and a specific offset. Proceeding this way, statistical theory and software for generalized linear mixed models are readily available for fitting frailty models. This gain in flexibility comes at the small price of (1) having to fix the number of pieces for the baseline hazard in advance and (2) having to "explode" the data set by the number of pieces. In this paper we extend the simulations of former studies by using a more realistic baseline hazard (Gompertz) and by comparing the model under consideration with competing models. Furthermore, the SAS macro %PCFrailty is introduced to apply the Poisson generalized linear mixed approach to frailty models. The simulations show good results for the shared frailty model. Our new %PCFrailty macro provides proper estimates, especially in case of 4 events per piece. The suggested Poisson generalized linear mixed approach for log-normal frailty models based on the %PCFrailty macro provides several advantages in the analysis of clustered survival data with respect to more flexible modelling of fixed and random effects, exact (in the sense of non-approximate) maximum likelihood estimation, and standard errors and different types of confidence intervals for all variance parameters. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Bowen, Stephen R; Chappell, Richard J; Bentzen, Søren M; Deveau, Michael A; Forrest, Lisa J; Jeraj, Robert
2012-01-01
Purpose To quantify associations between pre-radiotherapy and post-radiotherapy PET parameters via spatially resolved regression. Materials and methods Ten canine sinonasal cancer patients underwent PET/CT scans of [18F]FDG (FDGpre), [18F]FLT (FLTpre), and [61Cu]Cu-ATSM (Cu-ATSMpre). Following radiotherapy regimens of 50 Gy in 10 fractions, veterinary patients underwent FDG PET/CT scans at three months (FDGpost). Regression of standardized uptake values in baseline FDGpre, FLTpre and Cu-ATSMpre tumour voxels to those in FDGpost images was performed for linear, log-linear, generalized-linear and mixed-fit linear models. Goodness-of-fit in regression coefficients was assessed by R2. Hypothesis testing of coefficients over the patient population was performed. Results Multivariate linear model fits of FDGpre to FDGpost were significantly positive over the population (FDGpost~0.17 FDGpre, p=0.03), and classified slopes of RECIST non-responders and responders to be different (0.37 vs. 0.07, p=0.01). Generalized-linear model fits related FDGpre to FDGpost by a linear power law (FDGpost~FDGpre0.93, p<0.001). Univariate mixture model fits of FDGpre improved R2 from 0.17 to 0.52. Neither baseline FLT PET nor Cu-ATSM PET uptake contributed statistically significant multivariate regression coefficients. Conclusions Spatially resolved regression analysis indicates that pre-treatment FDG PET uptake is most strongly associated with three-month post-treatment FDG PET uptake in this patient population, though associations are histopathology-dependent. PMID:22682748
Mendez, Javier; Monleon-Getino, Antonio; Jofre, Juan; Lucena, Francisco
2017-10-01
The present study aimed to establish the kinetics of the appearance of coliphage plaques using the double agar layer titration technique to evaluate the feasibility of using traditional coliphage plaque forming unit (PFU) enumeration as a rapid quantification method. Repeated measurements of the appearance of plaques of coliphages titrated according to ISO 10705-2 at different times were analysed using non-linear mixed-effects regression to determine the most suitable model of their appearance kinetics. Although this model is adequate, to simplify its applicability two linear models were developed to predict the numbers of coliphages reliably, using the PFU counts as determined by the ISO after only 3 hours of incubation. One linear model, when the number of plaques detected was between 4 and 26 PFU after 3 hours, had a linear fit of: (1.48 × Counts 3 h + 1.97); and the other, values >26 PFU, had a fit of (1.18 × Counts 3 h + 2.95). If the number of plaques detected was <4 PFU after 3 hours, we recommend incubation for (18 ± 3) hours. The study indicates that the traditional coliphage plating technique has a reasonable potential to provide results in a single working day without the need to invest in additional laboratory equipment.
Mixed effect Poisson log-linear models for clinical and epidemiological sleep hypnogram data
Swihart, Bruce J.; Caffo, Brian S.; Crainiceanu, Ciprian; Punjabi, Naresh M.
2013-01-01
Bayesian Poisson log-linear multilevel models scalable to epidemiological studies are proposed to investigate population variability in sleep state transition rates. Hierarchical random effects are used to account for pairings of subjects and repeated measures within those subjects, as comparing diseased to non-diseased subjects while minimizing bias is of importance. Essentially, non-parametric piecewise constant hazards are estimated and smoothed, allowing for time-varying covariates and segment of the night comparisons. The Bayesian Poisson regression is justified through a re-derivation of a classical algebraic likelihood equivalence of Poisson regression with a log(time) offset and survival regression assuming exponentially distributed survival times. Such re-derivation allows synthesis of two methods currently used to analyze sleep transition phenomena: stratified multi-state proportional hazards models and log-linear models with GEE for transition counts. An example data set from the Sleep Heart Health Study is analyzed. Supplementary material includes the analyzed data set as well as the code for a reproducible analysis. PMID:22241689
Fokkema, M; Smits, N; Zeileis, A; Hothorn, T; Kelderman, H
2017-10-25
Identification of subgroups of patients for whom treatment A is more effective than treatment B, and vice versa, is of key importance to the development of personalized medicine. Tree-based algorithms are helpful tools for the detection of such interactions, but none of the available algorithms allow for taking into account clustered or nested dataset structures, which are particularly common in psychological research. Therefore, we propose the generalized linear mixed-effects model tree (GLMM tree) algorithm, which allows for the detection of treatment-subgroup interactions, while accounting for the clustered structure of a dataset. The algorithm uses model-based recursive partitioning to detect treatment-subgroup interactions, and a GLMM to estimate the random-effects parameters. In a simulation study, GLMM trees show higher accuracy in recovering treatment-subgroup interactions, higher predictive accuracy, and lower type II error rates than linear-model-based recursive partitioning and mixed-effects regression trees. Also, GLMM trees show somewhat higher predictive accuracy than linear mixed-effects models with pre-specified interaction effects, on average. We illustrate the application of GLMM trees on an individual patient-level data meta-analysis on treatments for depression. We conclude that GLMM trees are a promising exploratory tool for the detection of treatment-subgroup interactions in clustered datasets.
Alan K. Swanson; Solomon Z. Dobrowski; Andrew O. Finley; James H. Thorne; Michael K. Schwartz
2013-01-01
The uncertainty associated with species distribution model (SDM) projections is poorly characterized, despite its potential value to decision makers. Error estimates from most modelling techniques have been shown to be biased due to their failure to account for spatial autocorrelation (SAC) of residual error. Generalized linear mixed models (GLMM) have the ability to...
Zhang, J; Feng, J-Y; Ni, Y-L; Wen, Y-J; Niu, Y; Tamba, C L; Yue, C; Song, Q; Zhang, Y-M
2017-06-01
Multilocus genome-wide association studies (GWAS) have become the state-of-the-art procedure to identify quantitative trait nucleotides (QTNs) associated with complex traits. However, implementation of multilocus model in GWAS is still difficult. In this study, we integrated least angle regression with empirical Bayes to perform multilocus GWAS under polygenic background control. We used an algorithm of model transformation that whitened the covariance matrix of the polygenic matrix K and environmental noise. Markers on one chromosome were included simultaneously in a multilocus model and least angle regression was used to select the most potentially associated single-nucleotide polymorphisms (SNPs), whereas the markers on the other chromosomes were used to calculate kinship matrix as polygenic background control. The selected SNPs in multilocus model were further detected for their association with the trait by empirical Bayes and likelihood ratio test. We herein refer to this method as the pLARmEB (polygenic-background-control-based least angle regression plus empirical Bayes). Results from simulation studies showed that pLARmEB was more powerful in QTN detection and more accurate in QTN effect estimation, had less false positive rate and required less computing time than Bayesian hierarchical generalized linear model, efficient mixed model association (EMMA) and least angle regression plus empirical Bayes. pLARmEB, multilocus random-SNP-effect mixed linear model and fast multilocus random-SNP-effect EMMA methods had almost equal power of QTN detection in simulation experiments. However, only pLARmEB identified 48 previously reported genes for 7 flowering time-related traits in Arabidopsis thaliana.
Cook, James P; Mahajan, Anubha; Morris, Andrew P
2017-02-01
Linear mixed models are increasingly used for the analysis of genome-wide association studies (GWAS) of binary phenotypes because they can efficiently and robustly account for population stratification and relatedness through inclusion of random effects for a genetic relationship matrix. However, the utility of linear (mixed) models in the context of meta-analysis of GWAS of binary phenotypes has not been previously explored. In this investigation, we present simulations to compare the performance of linear and logistic regression models under alternative weighting schemes in a fixed-effects meta-analysis framework, considering designs that incorporate variable case-control imbalance, confounding factors and population stratification. Our results demonstrate that linear models can be used for meta-analysis of GWAS of binary phenotypes, without loss of power, even in the presence of extreme case-control imbalance, provided that one of the following schemes is used: (i) effective sample size weighting of Z-scores or (ii) inverse-variance weighting of allelic effect sizes after conversion onto the log-odds scale. Our conclusions thus provide essential recommendations for the development of robust protocols for meta-analysis of binary phenotypes with linear models.
Correcting for population structure and kinship using the linear mixed model: theory and extensions.
Hoffman, Gabriel E
2013-01-01
Population structure and kinship are widespread confounding factors in genome-wide association studies (GWAS). It has been standard practice to include principal components of the genotypes in a regression model in order to account for population structure. More recently, the linear mixed model (LMM) has emerged as a powerful method for simultaneously accounting for population structure and kinship. The statistical theory underlying the differences in empirical performance between modeling principal components as fixed versus random effects has not been thoroughly examined. We undertake an analysis to formalize the relationship between these widely used methods and elucidate the statistical properties of each. Moreover, we introduce a new statistic, effective degrees of freedom, that serves as a metric of model complexity and a novel low rank linear mixed model (LRLMM) to learn the dimensionality of the correction for population structure and kinship, and we assess its performance through simulations. A comparison of the results of LRLMM and a standard LMM analysis applied to GWAS data from the Multi-Ethnic Study of Atherosclerosis (MESA) illustrates how our theoretical results translate into empirical properties of the mixed model. Finally, the analysis demonstrates the ability of the LRLMM to substantially boost the strength of an association for HDL cholesterol in Europeans.
GWAS with longitudinal phenotypes: performance of approximate procedures
Sikorska, Karolina; Montazeri, Nahid Mostafavi; Uitterlinden, André; Rivadeneira, Fernando; Eilers, Paul HC; Lesaffre, Emmanuel
2015-01-01
Analysis of genome-wide association studies with longitudinal data using standard procedures, such as linear mixed model (LMM) fitting, leads to discouragingly long computation times. There is a need to speed up the computations significantly. In our previous work (Sikorska et al: Fast linear mixed model computations for genome-wide association studies with longitudinal data. Stat Med 2012; 32.1: 165–180), we proposed the conditional two-step (CTS) approach as a fast method providing an approximation to the P-value for the longitudinal single-nucleotide polymorphism (SNP) effect. In the first step a reduced conditional LMM is fit, omitting all the SNP terms. In the second step, the estimated random slopes are regressed on SNPs. The CTS has been applied to the bone mineral density data from the Rotterdam Study and proved to work very well even in unbalanced situations. In another article (Sikorska et al: GWAS on your notebook: fast semi-parallel linear and logistic regression for genome-wide association studies. BMC Bioinformatics 2013; 14: 166), we suggested semi-parallel computations, greatly speeding up fitting many linear regressions. Combining CTS with fast linear regression reduces the computation time from several weeks to a few minutes on a single computer. Here, we explore further the properties of the CTS both analytically and by simulations. We investigate the performance of our proposal in comparison with a related but different approach, the two-step procedure. It is analytically shown that for the balanced case, under mild assumptions, the P-value provided by the CTS is the same as from the LMM. For unbalanced data and in realistic situations, simulations show that the CTS method does not inflate the type I error rate and implies only a minimal loss of power. PMID:25712081
Dong, Ling-Bo; Liu, Zhao-Gang; Li, Feng-Ri; Jiang, Li-Chun
2013-09-01
By using the branch analysis data of 955 standard branches from 60 sampled trees in 12 sampling plots of Pinus koraiensis plantation in Mengjiagang Forest Farm in Heilongjiang Province of Northeast China, and based on the linear mixed-effect model theory and methods, the models for predicting branch variables, including primary branch diameter, length, and angle, were developed. Considering tree effect, the MIXED module of SAS software was used to fit the prediction models. The results indicated that the fitting precision of the models could be improved by choosing appropriate random-effect parameters and variance-covariance structure. Then, the correlation structures including complex symmetry structure (CS), first-order autoregressive structure [AR(1)], and first-order autoregressive and moving average structure [ARMA(1,1)] were added to the optimal branch size mixed-effect model. The AR(1) improved the fitting precision of branch diameter and length mixed-effect model significantly, but all the three structures didn't improve the precision of branch angle mixed-effect model. In order to describe the heteroscedasticity during building mixed-effect model, the CF1 and CF2 functions were added to the branch mixed-effect model. CF1 function improved the fitting effect of branch angle mixed model significantly, whereas CF2 function improved the fitting effect of branch diameter and length mixed model significantly. Model validation confirmed that the mixed-effect model could improve the precision of prediction, as compare to the traditional regression model for the branch size prediction of Pinus koraiensis plantation.
NASA Astrophysics Data System (ADS)
Hapugoda, J. C.; Sooriyarachchi, M. R.
2017-09-01
Survival time of patients with a disease and the incidence of that particular disease (count) is frequently observed in medical studies with the data of a clustered nature. In many cases, though, the survival times and the count can be correlated in a way that, diseases that occur rarely could have shorter survival times or vice versa. Due to this fact, joint modelling of these two variables will provide interesting and certainly improved results than modelling these separately. Authors have previously proposed a methodology using Generalized Linear Mixed Models (GLMM) by joining the Discrete Time Hazard model with the Poisson Regression model to jointly model survival and count model. As Aritificial Neural Network (ANN) has become a most powerful computational tool to model complex non-linear systems, it was proposed to develop a new joint model of survival and count of Dengue patients of Sri Lanka by using that approach. Thus, the objective of this study is to develop a model using ANN approach and compare the results with the previously developed GLMM model. As the response variables are continuous in nature, Generalized Regression Neural Network (GRNN) approach was adopted to model the data. To compare the model fit, measures such as root mean square error (RMSE), absolute mean error (AME) and correlation coefficient (R) were used. The measures indicate the GRNN model fits the data better than the GLMM model.
Competing regression models for longitudinal data.
Alencar, Airlane P; Singer, Julio M; Rocha, Francisco Marcelo M
2012-03-01
The choice of an appropriate family of linear models for the analysis of longitudinal data is often a matter of concern for practitioners. To attenuate such difficulties, we discuss some issues that emerge when analyzing this type of data via a practical example involving pretest-posttest longitudinal data. In particular, we consider log-normal linear mixed models (LNLMM), generalized linear mixed models (GLMM), and models based on generalized estimating equations (GEE). We show how some special features of the data, like a nonconstant coefficient of variation, may be handled in the three approaches and evaluate their performance with respect to the magnitude of standard errors of interpretable and comparable parameters. We also show how different diagnostic tools may be employed to identify outliers and comment on available software. We conclude by noting that the results are similar, but that GEE-based models may be preferable when the goal is to compare the marginal expected responses. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Wu, Lingtao; Lord, Dominique
2017-05-01
This study further examined the use of regression models for developing crash modification factors (CMFs), specifically focusing on the misspecification in the link function. The primary objectives were to validate the accuracy of CMFs derived from the commonly used regression models (i.e., generalized linear models or GLMs with additive linear link functions) when some of the variables have nonlinear relationships and quantify the amount of bias as a function of the nonlinearity. Using the concept of artificial realistic data, various linear and nonlinear crash modification functions (CM-Functions) were assumed for three variables. Crash counts were randomly generated based on these CM-Functions. CMFs were then derived from regression models for three different scenarios. The results were compared with the assumed true values. The main findings are summarized as follows: (1) when some variables have nonlinear relationships with crash risk, the CMFs for these variables derived from the commonly used GLMs are all biased, especially around areas away from the baseline conditions (e.g., boundary areas); (2) with the increase in nonlinearity (i.e., nonlinear relationship becomes stronger), the bias becomes more significant; (3) the quality of CMFs for other variables having linear relationships can be influenced when mixed with those having nonlinear relationships, but the accuracy may still be acceptable; and (4) the misuse of the link function for one or more variables can also lead to biased estimates for other parameters. This study raised the importance of the link function when using regression models for developing CMFs. Copyright © 2017 Elsevier Ltd. All rights reserved.
Analysis and generation of groundwater concentration time series
NASA Astrophysics Data System (ADS)
Crăciun, Maria; Vamoş, Călin; Suciu, Nicolae
2018-01-01
Concentration time series are provided by simulated concentrations of a nonreactive solute transported in groundwater, integrated over the transverse direction of a two-dimensional computational domain and recorded at the plume center of mass. The analysis of a statistical ensemble of time series reveals subtle features that are not captured by the first two moments which characterize the approximate Gaussian distribution of the two-dimensional concentration fields. The concentration time series exhibit a complex preasymptotic behavior driven by a nonstationary trend and correlated fluctuations with time-variable amplitude. Time series with almost the same statistics are generated by successively adding to a time-dependent trend a sum of linear regression terms, accounting for correlations between fluctuations around the trend and their increments in time, and terms of an amplitude modulated autoregressive noise of order one with time-varying parameter. The algorithm generalizes mixing models used in probability density function approaches. The well-known interaction by exchange with the mean mixing model is a special case consisting of a linear regression with constant coefficients.
Mixed models, linear dependency, and identification in age-period-cohort models.
O'Brien, Robert M
2017-07-20
This paper examines the identification problem in age-period-cohort models that use either linear or categorically coded ages, periods, and cohorts or combinations of these parameterizations. These models are not identified using the traditional fixed effect regression model approach because of a linear dependency between the ages, periods, and cohorts. However, these models can be identified if the researcher introduces a single just identifying constraint on the model coefficients. The problem with such constraints is that the results can differ substantially depending on the constraint chosen. Somewhat surprisingly, age-period-cohort models that specify one or more of ages and/or periods and/or cohorts as random effects are identified. This is the case without introducing an additional constraint. I label this identification as statistical model identification and show how statistical model identification comes about in mixed models and why which effects are treated as fixed and which are treated as random can substantially change the estimates of the age, period, and cohort effects. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.
The Weight of Euro Coins: Its Distribution Might Not Be as Normal as You Would Expect
ERIC Educational Resources Information Center
Shkedy, Ziv; Aerts, Marc; Callaert, Herman
2006-01-01
Classical regression models, ANOVA models and linear mixed models are just three examples (out of many) in which the normal distribution of the response is an essential assumption of the model. In this paper we use a dataset of 2000 euro coins containing information (up to the milligram) about the weight of each coin, to illustrate that the…
Solving large mixed linear models using preconditioned conjugate gradient iteration.
Strandén, I; Lidauer, M
1999-12-01
Continuous evaluation of dairy cattle with a random regression test-day model requires a fast solving method and algorithm. A new computing technique feasible in Jacobi and conjugate gradient based iterative methods using iteration on data is presented. In the new computing technique, the calculations in multiplication of a vector by a matrix were recorded to three steps instead of the commonly used two steps. The three-step method was implemented in a general mixed linear model program that used preconditioned conjugate gradient iteration. Performance of this program in comparison to other general solving programs was assessed via estimation of breeding values using univariate, multivariate, and random regression test-day models. Central processing unit time per iteration with the new three-step technique was, at best, one-third that needed with the old technique. Performance was best with the test-day model, which was the largest and most complex model used. The new program did well in comparison to other general software. Programs keeping the mixed model equations in random access memory required at least 20 and 435% more time to solve the univariate and multivariate animal models, respectively. Computations of the second best iteration on data took approximately three and five times longer for the animal and test-day models, respectively, than did the new program. Good performance was due to fast computing time per iteration and quick convergence to the final solutions. Use of preconditioned conjugate gradient based methods in solving large breeding value problems is supported by our findings.
Coupé, Christophe
2018-01-01
As statistical approaches are getting increasingly used in linguistics, attention must be paid to the choice of methods and algorithms used. This is especially true since they require assumptions to be satisfied to provide valid results, and because scientific articles still often fall short of reporting whether such assumptions are met. Progress is being, however, made in various directions, one of them being the introduction of techniques able to model data that cannot be properly analyzed with simpler linear regression models. We report recent advances in statistical modeling in linguistics. We first describe linear mixed-effects regression models (LMM), which address grouping of observations, and generalized linear mixed-effects models (GLMM), which offer a family of distributions for the dependent variable. Generalized additive models (GAM) are then introduced, which allow modeling non-linear parametric or non-parametric relationships between the dependent variable and the predictors. We then highlight the possibilities offered by generalized additive models for location, scale, and shape (GAMLSS). We explain how they make it possible to go beyond common distributions, such as Gaussian or Poisson, and offer the appropriate inferential framework to account for ‘difficult’ variables such as count data with strong overdispersion. We also demonstrate how they offer interesting perspectives on data when not only the mean of the dependent variable is modeled, but also its variance, skewness, and kurtosis. As an illustration, the case of phonemic inventory size is analyzed throughout the article. For over 1,500 languages, we consider as predictors the number of speakers, the distance from Africa, an estimation of the intensity of language contact, and linguistic relationships. We discuss the use of random effects to account for genealogical relationships, the choice of appropriate distributions to model count data, and non-linear relationships. Relying on GAMLSS, we assess a range of candidate distributions, including the Sichel, Delaporte, Box-Cox Green and Cole, and Box-Cox t distributions. We find that the Box-Cox t distribution, with appropriate modeling of its parameters, best fits the conditional distribution of phonemic inventory size. We finally discuss the specificities of phoneme counts, weak effects, and how GAMLSS should be considered for other linguistic variables. PMID:29713298
Coupé, Christophe
2018-01-01
As statistical approaches are getting increasingly used in linguistics, attention must be paid to the choice of methods and algorithms used. This is especially true since they require assumptions to be satisfied to provide valid results, and because scientific articles still often fall short of reporting whether such assumptions are met. Progress is being, however, made in various directions, one of them being the introduction of techniques able to model data that cannot be properly analyzed with simpler linear regression models. We report recent advances in statistical modeling in linguistics. We first describe linear mixed-effects regression models (LMM), which address grouping of observations, and generalized linear mixed-effects models (GLMM), which offer a family of distributions for the dependent variable. Generalized additive models (GAM) are then introduced, which allow modeling non-linear parametric or non-parametric relationships between the dependent variable and the predictors. We then highlight the possibilities offered by generalized additive models for location, scale, and shape (GAMLSS). We explain how they make it possible to go beyond common distributions, such as Gaussian or Poisson, and offer the appropriate inferential framework to account for 'difficult' variables such as count data with strong overdispersion. We also demonstrate how they offer interesting perspectives on data when not only the mean of the dependent variable is modeled, but also its variance, skewness, and kurtosis. As an illustration, the case of phonemic inventory size is analyzed throughout the article. For over 1,500 languages, we consider as predictors the number of speakers, the distance from Africa, an estimation of the intensity of language contact, and linguistic relationships. We discuss the use of random effects to account for genealogical relationships, the choice of appropriate distributions to model count data, and non-linear relationships. Relying on GAMLSS, we assess a range of candidate distributions, including the Sichel, Delaporte, Box-Cox Green and Cole, and Box-Cox t distributions. We find that the Box-Cox t distribution, with appropriate modeling of its parameters, best fits the conditional distribution of phonemic inventory size. We finally discuss the specificities of phoneme counts, weak effects, and how GAMLSS should be considered for other linguistic variables.
The Bayesian group lasso for confounded spatial data
Hefley, Trevor J.; Hooten, Mevin B.; Hanks, Ephraim M.; Russell, Robin E.; Walsh, Daniel P.
2017-01-01
Generalized linear mixed models for spatial processes are widely used in applied statistics. In many applications of the spatial generalized linear mixed model (SGLMM), the goal is to obtain inference about regression coefficients while achieving optimal predictive ability. When implementing the SGLMM, multicollinearity among covariates and the spatial random effects can make computation challenging and influence inference. We present a Bayesian group lasso prior with a single tuning parameter that can be chosen to optimize predictive ability of the SGLMM and jointly regularize the regression coefficients and spatial random effect. We implement the group lasso SGLMM using efficient Markov chain Monte Carlo (MCMC) algorithms and demonstrate how multicollinearity among covariates and the spatial random effect can be monitored as a derived quantity. To test our method, we compared several parameterizations of the SGLMM using simulated data and two examples from plant ecology and disease ecology. In all examples, problematic levels multicollinearity occurred and influenced sampling efficiency and inference. We found that the group lasso prior resulted in roughly twice the effective sample size for MCMC samples of regression coefficients and can have higher and less variable predictive accuracy based on out-of-sample data when compared to the standard SGLMM.
A Tutorial on Multilevel Survival Analysis: Methods, Models and Applications
Austin, Peter C.
2017-01-01
Summary Data that have a multilevel structure occur frequently across a range of disciplines, including epidemiology, health services research, public health, education and sociology. We describe three families of regression models for the analysis of multilevel survival data. First, Cox proportional hazards models with mixed effects incorporate cluster-specific random effects that modify the baseline hazard function. Second, piecewise exponential survival models partition the duration of follow-up into mutually exclusive intervals and fit a model that assumes that the hazard function is constant within each interval. This is equivalent to a Poisson regression model that incorporates the duration of exposure within each interval. By incorporating cluster-specific random effects, generalised linear mixed models can be used to analyse these data. Third, after partitioning the duration of follow-up into mutually exclusive intervals, one can use discrete time survival models that use a complementary log–log generalised linear model to model the occurrence of the outcome of interest within each interval. Random effects can be incorporated to account for within-cluster homogeneity in outcomes. We illustrate the application of these methods using data consisting of patients hospitalised with a heart attack. We illustrate the application of these methods using three statistical programming languages (R, SAS and Stata). PMID:29307954
A Tutorial on Multilevel Survival Analysis: Methods, Models and Applications.
Austin, Peter C
2017-08-01
Data that have a multilevel structure occur frequently across a range of disciplines, including epidemiology, health services research, public health, education and sociology. We describe three families of regression models for the analysis of multilevel survival data. First, Cox proportional hazards models with mixed effects incorporate cluster-specific random effects that modify the baseline hazard function. Second, piecewise exponential survival models partition the duration of follow-up into mutually exclusive intervals and fit a model that assumes that the hazard function is constant within each interval. This is equivalent to a Poisson regression model that incorporates the duration of exposure within each interval. By incorporating cluster-specific random effects, generalised linear mixed models can be used to analyse these data. Third, after partitioning the duration of follow-up into mutually exclusive intervals, one can use discrete time survival models that use a complementary log-log generalised linear model to model the occurrence of the outcome of interest within each interval. Random effects can be incorporated to account for within-cluster homogeneity in outcomes. We illustrate the application of these methods using data consisting of patients hospitalised with a heart attack. We illustrate the application of these methods using three statistical programming languages (R, SAS and Stata).
Penalized nonparametric scalar-on-function regression via principal coordinates
Reiss, Philip T.; Miller, David L.; Wu, Pei-Shien; Hua, Wen-Yu
2016-01-01
A number of classical approaches to nonparametric regression have recently been extended to the case of functional predictors. This paper introduces a new method of this type, which extends intermediate-rank penalized smoothing to scalar-on-function regression. In the proposed method, which we call principal coordinate ridge regression, one regresses the response on leading principal coordinates defined by a relevant distance among the functional predictors, while applying a ridge penalty. Our publicly available implementation, based on generalized additive modeling software, allows for fast optimal tuning parameter selection and for extensions to multiple functional predictors, exponential family-valued responses, and mixed-effects models. In an application to signature verification data, principal coordinate ridge regression, with dynamic time warping distance used to define the principal coordinates, is shown to outperform a functional generalized linear model. PMID:29217963
Malloy, Elizabeth J; Morris, Jeffrey S; Adar, Sara D; Suh, Helen; Gold, Diane R; Coull, Brent A
2010-07-01
Frequently, exposure data are measured over time on a grid of discrete values that collectively define a functional observation. In many applications, researchers are interested in using these measurements as covariates to predict a scalar response in a regression setting, with interest focusing on the most biologically relevant time window of exposure. One example is in panel studies of the health effects of particulate matter (PM), where particle levels are measured over time. In such studies, there are many more values of the functional data than observations in the data set so that regularization of the corresponding functional regression coefficient is necessary for estimation. Additional issues in this setting are the possibility of exposure measurement error and the need to incorporate additional potential confounders, such as meteorological or co-pollutant measures, that themselves may have effects that vary over time. To accommodate all these features, we develop wavelet-based linear mixed distributed lag models that incorporate repeated measures of functional data as covariates into a linear mixed model. A Bayesian approach to model fitting uses wavelet shrinkage to regularize functional coefficients. We show that, as long as the exposure error induces fine-scale variability in the functional exposure profile and the distributed lag function representing the exposure effect varies smoothly in time, the model corrects for the exposure measurement error without further adjustment. Both these conditions are likely to hold in the environmental applications we consider. We examine properties of the method using simulations and apply the method to data from a study examining the association between PM, measured as hourly averages for 1-7 days, and markers of acute systemic inflammation. We use the method to fully control for the effects of confounding by other time-varying predictors, such as temperature and co-pollutants.
Huff, Andrew G; Hodges, James S; Kennedy, Shaun P; Kircher, Amy
2015-08-01
To protect and secure food resources for the United States, it is crucial to have a method to compare food systems' criticality. In 2007, the U.S. government funded development of the Food and Agriculture Sector Criticality Assessment Tool (FASCAT) to determine which food and agriculture systems were most critical to the nation. FASCAT was developed in a collaborative process involving government officials and food industry subject matter experts (SMEs). After development, data were collected using FASCAT to quantify threats, vulnerabilities, consequences, and the impacts on the United States from failure of evaluated food and agriculture systems. To examine FASCAT's utility, linear regression models were used to determine: (1) which groups of questions posed in FASCAT were better predictors of cumulative criticality scores; (2) whether the items included in FASCAT's criticality method or the smaller subset of FASCAT items included in DHS's risk analysis method predicted similar criticality scores. Akaike's information criterion was used to determine which regression models best described criticality, and a mixed linear model was used to shrink estimates of criticality for individual food and agriculture systems. The results indicated that: (1) some of the questions used in FASCAT strongly predicted food or agriculture system criticality; (2) the FASCAT criticality formula was a stronger predictor of criticality compared to the DHS risk formula; (3) the cumulative criticality formula predicted criticality more strongly than weighted criticality formula; and (4) the mixed linear regression model did not change the rank-order of food and agriculture system criticality to a large degree. © 2015 Society for Risk Analysis.
NASA Astrophysics Data System (ADS)
Fernández-Manso, O.; Fernández-Manso, A.; Quintano, C.
2014-09-01
Aboveground biomass (AGB) estimation from optical satellite data is usually based on regression models of original or synthetic bands. To overcome the poor relation between AGB and spectral bands due to mixed-pixels when a medium spatial resolution sensor is considered, we propose to base the AGB estimation on fraction images from Linear Spectral Mixture Analysis (LSMA). Our study area is a managed Mediterranean pine woodland (Pinus pinaster Ait.) in central Spain. A total of 1033 circular field plots were used to estimate AGB from Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) optical data. We applied Pearson correlation statistics and stepwise multiple regression to identify suitable predictors from the set of variables of original bands, fraction imagery, Normalized Difference Vegetation Index and Tasselled Cap components. Four linear models and one nonlinear model were tested. A linear combination of ASTER band 2 (red, 0.630-0.690 μm), band 8 (short wave infrared 5, 2.295-2.365 μm) and green vegetation fraction (from LSMA) was the best AGB predictor (Radj2=0.632, the root-mean-squared error of estimated AGB was 13.3 Mg ha-1 (or 37.7%), resulting from cross-validation), rather than other combinations of the above cited independent variables. Results indicated that using ASTER fraction images in regression models improves the AGB estimation in Mediterranean pine forests. The spatial distribution of the estimated AGB, based on a multiple linear regression model, may be used as baseline information for forest managers in future studies, such as quantifying the regional carbon budget, fuel accumulation or monitoring of management practices.
Hao, Xu; Yujun, Sun; Xinjie, Wang; Jin, Wang; Yao, Fu
2015-01-01
A multiple linear model was developed for individual tree crown width of Cunninghamia lanceolata (Lamb.) Hook in Fujian province, southeast China. Data were obtained from 55 sample plots of pure China-fir plantation stands. An Ordinary Linear Least Squares (OLS) regression was used to establish the crown width model. To adjust for correlations between observations from the same sample plots, we developed one level linear mixed-effects (LME) models based on the multiple linear model, which take into account the random effects of plots. The best random effects combinations for the LME models were determined by the Akaike's information criterion, the Bayesian information criterion and the -2logarithm likelihood. Heteroscedasticity was reduced by three residual variance functions: the power function, the exponential function and the constant plus power function. The spatial correlation was modeled by three correlation structures: the first-order autoregressive structure [AR(1)], a combination of first-order autoregressive and moving average structures [ARMA(1,1)], and the compound symmetry structure (CS). Then, the LME model was compared to the multiple linear model using the absolute mean residual (AMR), the root mean square error (RMSE), and the adjusted coefficient of determination (adj-R2). For individual tree crown width models, the one level LME model showed the best performance. An independent dataset was used to test the performance of the models and to demonstrate the advantage of calibrating LME models.
Fitzsimmons, Eric J; Kvam, Vanessa; Souleyrette, Reginald R; Nambisan, Shashi S; Bonett, Douglas G
2013-01-01
Despite recent improvements in highway safety in the United States, serious crashes on curves remain a significant problem. To assist in better understanding causal factors leading to this problem, this article presents and demonstrates a methodology for collection and analysis of vehicle trajectory and speed data for rural and urban curves using Z-configured road tubes. For a large number of vehicle observations at 2 horizontal curves located in Dexter and Ames, Iowa, the article develops vehicle speed and lateral position prediction models for multiple points along these curves. Linear mixed-effects models were used to predict vehicle lateral position and speed along the curves as explained by operational, vehicle, and environmental variables. Behavior was visually represented for an identified subset of "risky" drivers. Linear mixed-effect regression models provided the means to predict vehicle speed and lateral position while taking into account repeated observations of the same vehicle along horizontal curves. Speed and lateral position at point of entry were observed to influence trajectory and speed profiles. Rural horizontal curve site models are presented that indicate that the following variables were significant and influenced both vehicle speed and lateral position: time of day, direction of travel (inside or outside lane), and type of vehicle.
González, Juan R; Carrasco, Josep L; Armengol, Lluís; Villatoro, Sergi; Jover, Lluís; Yasui, Yutaka; Estivill, Xavier
2008-01-01
Background MLPA method is a potentially useful semi-quantitative method to detect copy number alterations in targeted regions. In this paper, we propose a method for the normalization procedure based on a non-linear mixed-model, as well as a new approach for determining the statistical significance of altered probes based on linear mixed-model. This method establishes a threshold by using different tolerance intervals that accommodates the specific random error variability observed in each test sample. Results Through simulation studies we have shown that our proposed method outperforms two existing methods that are based on simple threshold rules or iterative regression. We have illustrated the method using a controlled MLPA assay in which targeted regions are variable in copy number in individuals suffering from different disorders such as Prader-Willi, DiGeorge or Autism showing the best performace. Conclusion Using the proposed mixed-model, we are able to determine thresholds to decide whether a region is altered. These threholds are specific for each individual, incorporating experimental variability, resulting in improved sensitivity and specificity as the examples with real data have revealed. PMID:18522760
Using existing case-mix methods to fund trauma cases.
Monakova, Julia; Blais, Irene; Botz, Charles; Chechulin, Yuriy; Picciano, Gino; Basinski, Antoni
2010-01-01
Policymakers frequently face the need to increase funding in isolated and frequently heterogeneous (clinically and in terms of resource consumption) patient subpopulations. This article presents a methodologic solution for testing the appropriateness of using existing grouping and weighting methodologies for funding subsets of patients in the scenario where a case-mix approach is preferable to a flat-rate based payment system. Using as an example the subpopulation of trauma cases of Ontario lead trauma hospitals, the statistical techniques of linear and nonlinear regression models, regression trees, and spline models were applied to examine the fit of the existing case-mix groups and reference weights for the trauma cases. The analyses demonstrated that for funding Ontario trauma cases, the existing case-mix systems can form the basis for rational and equitable hospital funding, decreasing the need to develop a different grouper for this subset of patients. This study confirmed that Injury Severity Score is a poor predictor of costs for trauma patients. Although our analysis used the Canadian case-mix classification system and cost weights, the demonstrated concept of using existing case-mix systems to develop funding rates for specific subsets of patient populations may be applicable internationally.
Goeyvaerts, Nele; Leuridan, Elke; Faes, Christel; Van Damme, Pierre; Hens, Niel
2015-09-10
Biomedical studies often generate repeated measures of multiple outcomes on a set of subjects. It may be of interest to develop a biologically intuitive model for the joint evolution of these outcomes while assessing inter-subject heterogeneity. Even though it is common for biological processes to entail non-linear relationships, examples of multivariate non-linear mixed models (MNMMs) are still fairly rare. We contribute to this area by jointly analyzing the maternal antibody decay for measles, mumps, rubella, and varicella, allowing for a different non-linear decay model for each infectious disease. We present a general modeling framework to analyze multivariate non-linear longitudinal profiles subject to censoring, by combining multivariate random effects, non-linear growth and Tobit regression. We explore the hypothesis of a common infant-specific mechanism underlying maternal immunity using a pairwise correlated random-effects approach and evaluating different correlation matrix structures. The implied marginal correlation between maternal antibody levels is estimated using simulations. The mean duration of passive immunity was less than 4 months for all diseases with substantial heterogeneity between infants. The maternal antibody levels against rubella and varicella were found to be positively correlated, while little to no correlation could be inferred for the other disease pairs. For some pairs, computational issues occurred with increasing correlation matrix complexity, which underlines the importance of further developing estimation methods for MNMMs. Copyright © 2015 John Wiley & Sons, Ltd.
Radio Propagation Prediction Software for Complex Mixed Path Physical Channels
2006-08-14
63 4.4.6. Applied Linear Regression Analysis in the Frequency Range 1-50 MHz 69 4.4.7. Projected Scaling to...4.4.6. Applied Linear Regression Analysis in the Frequency Range 1-50 MHz In order to construct a comprehensive numerical algorithm capable of
Advanced Statistical Analyses to Reduce Inconsistency of Bond Strength Data.
Minamino, T; Mine, A; Shintani, A; Higashi, M; Kawaguchi-Uemura, A; Kabetani, T; Hagino, R; Imai, D; Tajiri, Y; Matsumoto, M; Yatani, H
2017-11-01
This study was designed to clarify the interrelationship of factors that affect the value of microtensile bond strength (µTBS), focusing on nondestructive testing by which information of the specimens can be stored and quantified. µTBS test specimens were prepared from 10 noncarious human molars. Six factors of µTBS test specimens were evaluated: presence of voids at the interface, X-ray absorption coefficient of resin, X-ray absorption coefficient of dentin, length of dentin part, size of adhesion area, and individual differences of teeth. All specimens were observed nondestructively by optical coherence tomography and micro-computed tomography before µTBS testing. After µTBS testing, the effect of these factors on µTBS data was analyzed by the general linear model, linear mixed effects regression model, and nonlinear regression model with 95% confidence intervals. By the general linear model, a significant difference in individual differences of teeth was observed ( P < 0.001). A significantly positive correlation was shown between µTBS and length of dentin part ( P < 0.001); however, there was no significant nonlinearity ( P = 0.157). Moreover, a significantly negative correlation was observed between µTBS and size of adhesion area ( P = 0.001), with significant nonlinearity ( P = 0.014). No correlation was observed between µTBS and X-ray absorption coefficient of resin ( P = 0.147), and there was no significant nonlinearity ( P = 0.089). Additionally, a significantly positive correlation was observed between µTBS and X-ray absorption coefficient of dentin ( P = 0.022), with significant nonlinearity ( P = 0.036). A significant difference was also observed between the presence and absence of voids by linear mixed effects regression analysis. Our results showed correlations between various parameters of tooth specimens and µTBS data. To evaluate the performance of the adhesive more precisely, the effect of tooth variability and a method to reduce variation in bond strength values should also be considered.
Kim, Yoonsang; Choi, Young-Ku; Emery, Sherry
2013-08-01
Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods' performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages-SAS GLIMMIX Laplace and SuperMix Gaussian quadrature-perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes.
Kim, Yoonsang; Emery, Sherry
2013-01-01
Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods’ performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages—SAS GLIMMIX Laplace and SuperMix Gaussian quadrature—perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes. PMID:24288415
ERIC Educational Resources Information Center
Malin, Heather; Han, Hyemin; Liauw, Indrawati
2017-01-01
This study investigated the effects of internal and demographic variables on civic development in late adolescence using the construct "civic purpose." We conducted surveys on civic engagement with 480 high school seniors, and surveyed them again 2 years later. Using multivariate regression and linear mixed models, we tested the main…
ERIC Educational Resources Information Center
Pratt, Charlotte; Webber, Larry S.; Baggett, Chris D.; Ward, Dianne; Pate, Russell R.; Murray, David; Lohman, Timothy; Lytle, Leslie; Elder, John P.
2008-01-01
This study describes the relationships between sedentary activity and body composition in 1,458 sixth-grade girls from 36 middle schools across the United States. Multivariate associations between sedentary activity and body composition were examined with regression analyses using general linear mixed models. Mean age, body mass index, and…
Curriculum-Based Measurement of Oral Reading: Quality of Progress Monitoring Outcomes
ERIC Educational Resources Information Center
Christ, Theodore J.; Zopluoglu, Cengiz; Long, Jeffery D.; Monaghen, Barbara D.
2012-01-01
Curriculum-based measurement of oral reading (CBM-R) is frequently used to set student goals and monitor student progress. This study examined the quality of growth estimates derived from CBM-R progress monitoring data. The authors used a linear mixed effects regression (LMER) model to simulate progress monitoring data for multiple levels of…
Hobbs, Brian P.; Sargent, Daniel J.; Carlin, Bradley P.
2014-01-01
Assessing between-study variability in the context of conventional random-effects meta-analysis is notoriously difficult when incorporating data from only a small number of historical studies. In order to borrow strength, historical and current data are often assumed to be fully homogeneous, but this can have drastic consequences for power and Type I error if the historical information is biased. In this paper, we propose empirical and fully Bayesian modifications of the commensurate prior model (Hobbs et al., 2011) extending Pocock (1976), and evaluate their frequentist and Bayesian properties for incorporating patient-level historical data using general and generalized linear mixed regression models. Our proposed commensurate prior models lead to preposterior admissible estimators that facilitate alternative bias-variance trade-offs than those offered by pre-existing methodologies for incorporating historical data from a small number of historical studies. We also provide a sample analysis of a colon cancer trial comparing time-to-disease progression using a Weibull regression model. PMID:24795786
Modeling Longitudinal Data Containing Non-Normal Within Subject Errors
NASA Technical Reports Server (NTRS)
Feiveson, Alan; Glenn, Nancy L.
2013-01-01
The mission of the National Aeronautics and Space Administration’s (NASA) human research program is to advance safe human spaceflight. This involves conducting experiments, collecting data, and analyzing data. The data are longitudinal and result from a relatively few number of subjects; typically 10 – 20. A longitudinal study refers to an investigation where participant outcomes and possibly treatments are collected at multiple follow-up times. Standard statistical designs such as mean regression with random effects and mixed–effects regression are inadequate for such data because the population is typically not approximately normally distributed. Hence, more advanced data analysis methods are necessary. This research focuses on four such methods for longitudinal data analysis: the recently proposed linear quantile mixed models (lqmm) by Geraci and Bottai (2013), quantile regression, multilevel mixed–effects linear regression, and robust regression. This research also provides computational algorithms for longitudinal data that scientists can directly use for human spaceflight and other longitudinal data applications, then presents statistical evidence that verifies which method is best for specific situations. This advances the study of longitudinal data in a broad range of applications including applications in the sciences, technology, engineering and mathematics fields.
NASA Astrophysics Data System (ADS)
Qie, G.; Wang, G.; Wang, M.
2016-12-01
Mixed pixels and shadows due to buildings in urban areas impede accurate estimation and mapping of city vegetation carbon density. In most of previous studies, these factors are often ignored, which thus result in underestimation of city vegetation carbon density. In this study we presented an integrated methodology to improve the accuracy of mapping city vegetation carbon density. Firstly, we applied a linear shadow remove analysis (LSRA) on remotely sensed Landsat 8 images to reduce the shadow effects on carbon estimation. Secondly, we integrated a linear spectral unmixing analysis (LSUA) with a linear stepwise regression (LSR), a logistic model-based stepwise regression (LMSR) and k-Nearest Neighbors (kNN), and utilized and compared the integrated models on shadow-removed images to map vegetation carbon density. This methodology was examined in Shenzhen City of Southeast China. A data set from a total of 175 sample plots measured in 2013 and 2014 was used to train the models. The independent variables statistically significantly contributing to improving the fit of the models to the data and reducing the sum of squared errors were selected from a total of 608 variables derived from different image band combinations and transformations. The vegetation fraction from LSUA was then added into the models as an important independent variable. The estimates obtained were evaluated using a cross-validation method. Our results showed that higher accuracies were obtained from the integrated models compared with the ones using traditional methods which ignore the effects of mixed pixels and shadows. This study indicates that the integrated method has great potential on improving the accuracy of urban vegetation carbon density estimation. Key words: Urban vegetation carbon, shadow, spectral unmixing, spatial modeling, Landsat 8 images
ERIC Educational Resources Information Center
Lazar, Ann A.; Zerbe, Gary O.
2011-01-01
Researchers often compare the relationship between an outcome and covariate for two or more groups by evaluating whether the fitted regression curves differ significantly. When they do, researchers need to determine the "significance region," or the values of the covariate where the curves significantly differ. In analysis of covariance (ANCOVA),…
ERIC Educational Resources Information Center
Van Norman, Ethan R.; Christ, Theodore J.; Zopluoglu, Cengiz
2013-01-01
This study examined the effect of baseline estimation on the quality of trend estimates derived from Curriculum Based Measurement of Oral Reading (CBM-R) progress monitoring data. The authors used a linear mixed effects regression (LMER) model to simulate progress monitoring data for schedules ranging from 6-20 weeks for datasets with high and low…
Hossain, Ahmed; Beyene, Joseph
2014-01-01
This article compares baseline, average, and longitudinal data analysis methods for identifying genetic variants in genome-wide association study using the Genetic Analysis Workshop 18 data. We apply methods that include (a) linear mixed models with baseline measures, (b) random intercept linear mixed models with mean measures outcome, and (c) random intercept linear mixed models with longitudinal measurements. In the linear mixed models, covariates are included as fixed effects, whereas relatedness among individuals is incorporated as the variance-covariance structure of the random effect for the individuals. The overall strategy of applying linear mixed models decorrelate the data is based on Aulchenko et al.'s GRAMMAR. By analyzing systolic and diastolic blood pressure, which are used separately as outcomes, we compare the 3 methods in identifying a known genetic variant that is associated with blood pressure from chromosome 3 and simulated phenotype data. We also analyze the real phenotype data to illustrate the methods. We conclude that the linear mixed model with longitudinal measurements of diastolic blood pressure is the most accurate at identifying the known single-nucleotide polymorphism among the methods, but linear mixed models with baseline measures perform best with systolic blood pressure as the outcome.
Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne
2012-01-01
In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models. PMID:23275882
Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne
2012-12-01
In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models.
Pre-natal exposures to cocaine and alcohol and physical growth patterns to age 8 years
Lumeng, Julie C.; Cabral, Howard J.; Gannon, Katherine; Heeren, Timothy; Frank, Deborah A.
2007-01-01
Two hundred and two primarily African American/Caribbean children (classified by maternal report and infant meconium as 38 heavier, 74 lighter and 89 not cocaine-exposed) were measured repeatedly from birth to age 8 years to assess whether there is an independent effect of prenatal cocaine exposure on physical growth patterns. Children with fetal alcohol syndrome identifiable at birth were excluded. At birth, cocaine and alcohol exposures were significantly and independently associated with lower weight, length and head circumference in cross-sectional multiple regression analyses. The relationship over time of pre-natal exposures to weight, height, and head circumference was then examined by multiple linear regression using mixed linear models including covariates: child’s gestational age, gender, ethnicity, age at assessment, current caregiver, birth mother’s use of alcohol, marijuana and tobacco during the pregnancy and pre-pregnancy weight (for child’s weight) and height (for child’s height and head circumference). The cocaine effects did not persist beyond infancy in piecewise linear mixed models, but a significant and independent negative effect of pre-natal alcohol exposure persisted for weight, height, and head circumference. Catch-up growth in cocaine-exposed infants occurred primarily by 6 months of age for all growth parameters, with some small fluctuations in growth rates in the preschool age range but no detectable differences between heavier versus unexposed nor lighter versus unexposed thereafter. PMID:17412558
Conditional Monte Carlo randomization tests for regression models.
Parhat, Parwen; Rosenberger, William F; Diao, Guoqing
2014-08-15
We discuss the computation of randomization tests for clinical trials of two treatments when the primary outcome is based on a regression model. We begin by revisiting the seminal paper of Gail, Tan, and Piantadosi (1988), and then describe a method based on Monte Carlo generation of randomization sequences. The tests based on this Monte Carlo procedure are design based, in that they incorporate the particular randomization procedure used. We discuss permuted block designs, complete randomization, and biased coin designs. We also use a new technique by Plamadeala and Rosenberger (2012) for simple computation of conditional randomization tests. Like Gail, Tan, and Piantadosi, we focus on residuals from generalized linear models and martingale residuals from survival models. Such techniques do not apply to longitudinal data analysis, and we introduce a method for computation of randomization tests based on the predicted rate of change from a generalized linear mixed model when outcomes are longitudinal. We show, by simulation, that these randomization tests preserve the size and power well under model misspecification. Copyright © 2014 John Wiley & Sons, Ltd.
The Application of the Cumulative Logistic Regression Model to Automated Essay Scoring
ERIC Educational Resources Information Center
Haberman, Shelby J.; Sinharay, Sandip
2010-01-01
Most automated essay scoring programs use a linear regression model to predict an essay score from several essay features. This article applied a cumulative logit model instead of the linear regression model to automated essay scoring. Comparison of the performances of the linear regression model and the cumulative logit model was performed on a…
BIODEGRADATION PROBABILITY PROGRAM (BIODEG)
The Biodegradation Probability Program (BIODEG) calculates the probability that a chemical under aerobic conditions with mixed cultures of microorganisms will biodegrade rapidly or slowly. It uses fragment constants developed using multiple linear and non-linear regressions and d...
Coker, Freya; Williams, Cylie M; Taylor, Nicholas F; Caspers, Kirsten; McAlinden, Fiona; Wilton, Anita; Shields, Nora; Haines, Terry P
2018-05-10
This protocol considers three allied health staffing models across public health subacute hospitals. This quasi-experimental mixed-methods study, including qualitative process evaluation, aims to evaluate the impact of additional allied health services in subacute care, in rehabilitation and geriatric evaluation management settings, on patient, health service and societal outcomes. This health services research will analyse outcomes of patients exposed to different allied health models of care at three health services. Each health service will have a control ward (routine care) and an intervention ward (additional allied health). This project has two parts. Part 1: a whole of site data extraction for included wards. Outcome measures will include: length of stay, rate of readmissions, discharge destinations, community referrals, patient feedback and staff perspectives. Part 2: Functional Independence Measure scores will be collected every 2-3 days for the duration of 60 patient admissions.Data from part 1 will be analysed by linear regression analysis for continuous outcomes using patient-level data and logistic regression analysis for binary outcomes. Qualitative data will be analysed using a deductive thematic approach. For part 2, a linear mixed model analysis will be conducted using therapy service delivery and days since admission to subacute care as fixed factors in the model and individual participant as a random factor. Graphical analysis will be used to examine the growth curve of the model and transformations. The days since admission factor will be used to examine non-linear growth trajectories to determine if they lead to better model fit. Findings will be disseminated through local reports and to the Department of Health and Human Services Victoria. Results will be presented at conferences and submitted to peer-reviewed journals. The Monash Health Human Research Ethics committee approved this multisite research (HREC/17/MonH/144 and HREC/17/MonH/547). © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
A Technique of Fuzzy C-Mean in Multiple Linear Regression Model toward Paddy Yield
NASA Astrophysics Data System (ADS)
Syazwan Wahab, Nur; Saifullah Rusiman, Mohd; Mohamad, Mahathir; Amira Azmi, Nur; Che Him, Norziha; Ghazali Kamardan, M.; Ali, Maselan
2018-04-01
In this paper, we propose a hybrid model which is a combination of multiple linear regression model and fuzzy c-means method. This research involved a relationship between 20 variates of the top soil that are analyzed prior to planting of paddy yields at standard fertilizer rates. Data used were from the multi-location trials for rice carried out by MARDI at major paddy granary in Peninsular Malaysia during the period from 2009 to 2012. Missing observations were estimated using mean estimation techniques. The data were analyzed using multiple linear regression model and a combination of multiple linear regression model and fuzzy c-means method. Analysis of normality and multicollinearity indicate that the data is normally scattered without multicollinearity among independent variables. Analysis of fuzzy c-means cluster the yield of paddy into two clusters before the multiple linear regression model can be used. The comparison between two method indicate that the hybrid of multiple linear regression model and fuzzy c-means method outperform the multiple linear regression model with lower value of mean square error.
Liu, Zun-Lei; Yuan, Xing-Wei; Yan, Li-Ping; Yang, Lin-Lin; Cheng, Jia-Hua
2013-09-01
By using the 2008-2010 investigation data about the body condition of small yellow croaker in the offshore waters of southern Yellow Sea (SYS), open waters of northern East China Sea (NECS), and offshore waters of middle East China Sea (MECS), this paper analyzed the spatial heterogeneity of body length-body mass of juvenile and adult small yellow croakers by the statistical approaches of mean regression model and quantile regression model. The results showed that the residual standard errors from the analysis of covariance (ANCOVA) and the linear mixed-effects model were similar, and those from the simple linear regression were the highest. For the juvenile small yellow croakers, their mean body mass in SYS and NECS estimated by the mixed-effects mean regression model was higher than the overall average mass across the three regions, while the mean body mass in MECS was below the overall average. For the adult small yellow croakers, their mean body mass in NECS was higher than the overall average, while the mean body mass in SYS and MECS was below the overall average. The results from quantile regression indicated the substantial differences in the allometric relationships of juvenile small yellow croakers between SYS, NECS, and MECS, with the estimated mean exponent of the allometric relationship in SYS being 2.85, and the interquartile range being from 2.63 to 2.96, which indicated the heterogeneity of body form. The results from ANCOVA showed that the allometric body length-body mass relationships were significantly different between the 25th and 75th percentile exponent values (F=6.38, df=1737, P<0.01) and the 25th percentile and median exponent values (F=2.35, df=1737, P=0.039). The relationship was marginally different between the median and 75th percentile exponent values (F=2.21, df=1737, P=0.051). The estimated body length-body mass exponent of adult small yellow croakers in SYS was 3.01 (10th and 95th percentiles = 2.77 and 3.1, respectively). The estimated body length-body mass relationships were significantly different from the lower and upper quantiles of the exponent (F=3.31, df=2793, P=0.01) and the median and upper quantiles (F=3.56, df=2793, P<0.01), while no significant difference was observed between the lower and median quantiles (F=0.98, df=2793, P=0.43).
Estimation of the linear mixed integrated Ornstein–Uhlenbeck model
Hughes, Rachael A.; Kenward, Michael G.; Sterne, Jonathan A. C.; Tilling, Kate
2017-01-01
ABSTRACT The linear mixed model with an added integrated Ornstein–Uhlenbeck (IOU) process (linear mixed IOU model) allows for serial correlation and estimation of the degree of derivative tracking. It is rarely used, partly due to the lack of available software. We implemented the linear mixed IOU model in Stata and using simulations we assessed the feasibility of fitting the model by restricted maximum likelihood when applied to balanced and unbalanced data. We compared different (1) optimization algorithms, (2) parameterizations of the IOU process, (3) data structures and (4) random-effects structures. Fitting the model was practical and feasible when applied to large and moderately sized balanced datasets (20,000 and 500 observations), and large unbalanced datasets with (non-informative) dropout and intermittent missingness. Analysis of a real dataset showed that the linear mixed IOU model was a better fit to the data than the standard linear mixed model (i.e. independent within-subject errors with constant variance). PMID:28515536
Koerner, Tess K; Zhang, Yang
2017-02-27
Neurophysiological studies are often designed to examine relationships between measures from different testing conditions, time points, or analysis techniques within the same group of participants. Appropriate statistical techniques that can take into account repeated measures and multivariate predictor variables are integral and essential to successful data analysis and interpretation. This work implements and compares conventional Pearson correlations and linear mixed-effects (LME) regression models using data from two recently published auditory electrophysiology studies. For the specific research questions in both studies, the Pearson correlation test is inappropriate for determining strengths between the behavioral responses for speech-in-noise recognition and the multiple neurophysiological measures as the neural responses across listening conditions were simply treated as independent measures. In contrast, the LME models allow a systematic approach to incorporate both fixed-effect and random-effect terms to deal with the categorical grouping factor of listening conditions, between-subject baseline differences in the multiple measures, and the correlational structure among the predictor variables. Together, the comparative data demonstrate the advantages as well as the necessity to apply mixed-effects models to properly account for the built-in relationships among the multiple predictor variables, which has important implications for proper statistical modeling and interpretation of human behavior in terms of neural correlates and biomarkers.
Mixed conditional logistic regression for habitat selection studies.
Duchesne, Thierry; Fortin, Daniel; Courbin, Nicolas
2010-05-01
1. Resource selection functions (RSFs) are becoming a dominant tool in habitat selection studies. RSF coefficients can be estimated with unconditional (standard) and conditional logistic regressions. While the advantage of mixed-effects models is recognized for standard logistic regression, mixed conditional logistic regression remains largely overlooked in ecological studies. 2. We demonstrate the significance of mixed conditional logistic regression for habitat selection studies. First, we use spatially explicit models to illustrate how mixed-effects RSFs can be useful in the presence of inter-individual heterogeneity in selection and when the assumption of independence from irrelevant alternatives (IIA) is violated. The IIA hypothesis states that the strength of preference for habitat type A over habitat type B does not depend on the other habitat types also available. Secondly, we demonstrate the significance of mixed-effects models to evaluate habitat selection of free-ranging bison Bison bison. 3. When movement rules were homogeneous among individuals and the IIA assumption was respected, fixed-effects RSFs adequately described habitat selection by simulated animals. In situations violating the inter-individual homogeneity and IIA assumptions, however, RSFs were best estimated with mixed-effects regressions, and fixed-effects models could even provide faulty conclusions. 4. Mixed-effects models indicate that bison did not select farmlands, but exhibited strong inter-individual variations in their response to farmlands. Less than half of the bison preferred farmlands over forests. Conversely, the fixed-effect model simply suggested an overall selection for farmlands. 5. Conditional logistic regression is recognized as a powerful approach to evaluate habitat selection when resource availability changes. This regression is increasingly used in ecological studies, but almost exclusively in the context of fixed-effects models. Fitness maximization can imply differences in trade-offs among individuals, which can yield inter-individual differences in selection and lead to departure from IIA. These situations are best modelled with mixed-effects models. Mixed-effects conditional logistic regression should become a valuable tool for ecological research.
Kumar, K Vasanth
2007-04-02
Kinetic experiments were carried out for the sorption of safranin onto activated carbon particles. The kinetic data were fitted to pseudo-second order model of Ho, Sobkowsk and Czerwinski, Blanchard et al. and Ritchie by linear and non-linear regression methods. Non-linear method was found to be a better way of obtaining the parameters involved in the second order rate kinetic expressions. Both linear and non-linear regression showed that the Sobkowsk and Czerwinski and Ritchie's pseudo-second order models were the same. Non-linear regression analysis showed that both Blanchard et al. and Ho have similar ideas on the pseudo-second order model but with different assumptions. The best fit of experimental data in Ho's pseudo-second order expression by linear and non-linear regression method showed that Ho pseudo-second order model was a better kinetic expression when compared to other pseudo-second order kinetic expressions.
Wang, S; Martinez-Lage, M; Sakai, Y; Chawla, S; Kim, S G; Alonso-Basanta, M; Lustig, R A; Brem, S; Mohan, S; Wolf, R L; Desai, A; Poptani, H
2016-01-01
Early assessment of treatment response is critical in patients with glioblastomas. A combination of DTI and DSC perfusion imaging parameters was evaluated to distinguish glioblastomas with true progression from mixed response and pseudoprogression. Forty-one patients with glioblastomas exhibiting enhancing lesions within 6 months after completion of chemoradiation therapy were retrospectively studied. All patients underwent surgery after MR imaging and were histologically classified as having true progression (>75% tumor), mixed response (25%-75% tumor), or pseudoprogression (<25% tumor). Mean diffusivity, fractional anisotropy, linear anisotropy coefficient, planar anisotropy coefficient, spheric anisotropy coefficient, and maximum relative cerebral blood volume values were measured from the enhancing tissue. A multivariate logistic regression analysis was used to determine the best model for classification of true progression from mixed response or pseudoprogression. Significantly elevated maximum relative cerebral blood volume, fractional anisotropy, linear anisotropy coefficient, and planar anisotropy coefficient and decreased spheric anisotropy coefficient were observed in true progression compared with pseudoprogression (P < .05). There were also significant differences in maximum relative cerebral blood volume, fractional anisotropy, planar anisotropy coefficient, and spheric anisotropy coefficient measurements between mixed response and true progression groups. The best model to distinguish true progression from non-true progression (pseudoprogression and mixed) consisted of fractional anisotropy, linear anisotropy coefficient, and maximum relative cerebral blood volume, resulting in an area under the curve of 0.905. This model also differentiated true progression from mixed response with an area under the curve of 0.901. A combination of fractional anisotropy and maximum relative cerebral blood volume differentiated pseudoprogression from nonpseudoprogression (true progression and mixed) with an area under the curve of 0.807. DTI and DSC perfusion imaging can improve accuracy in assessing treatment response and may aid in individualized treatment of patients with glioblastomas. © 2016 by American Journal of Neuroradiology.
NASA Astrophysics Data System (ADS)
Lucifredi, A.; Mazzieri, C.; Rossi, M.
2000-05-01
Since the operational conditions of a hydroelectric unit can vary within a wide range, the monitoring system must be able to distinguish between the variations of the monitored variable caused by variations of the operation conditions and those due to arising and progressing of failures and misoperations. The paper aims to identify the best technique to be adopted for the monitoring system. Three different methods have been implemented and compared. Two of them use statistical techniques: the first, the linear multiple regression, expresses the monitored variable as a linear function of the process parameters (independent variables), while the second, the dynamic kriging technique, is a modified technique of multiple linear regression representing the monitored variable as a linear combination of the process variables in such a way as to minimize the variance of the estimate error. The third is based on neural networks. Tests have shown that the monitoring system based on the kriging technique is not affected by some problems common to the other two models e.g. the requirement of a large amount of data for their tuning, both for training the neural network and defining the optimum plane for the multiple regression, not only in the system starting phase but also after a trivial operation of maintenance involving the substitution of machinery components having a direct impact on the observed variable. Or, in addition, the necessity of different models to describe in a satisfactory way the different ranges of operation of the plant. The monitoring system based on the kriging statistical technique overrides the previous difficulties: it does not require a large amount of data to be tuned and is immediately operational: given two points, the third can be immediately estimated; in addition the model follows the system without adapting itself to it. The results of the experimentation performed seem to indicate that a model based on a neural network or on a linear multiple regression is not optimal, and that a different approach is necessary to reduce the amount of work during the learning phase using, when available, all the information stored during the initial phase of the plant to build the reference baseline, elaborating, if it is the case, the raw information available. A mixed approach using the kriging statistical technique and neural network techniques could optimise the result.
Multivariate statistical approach to estimate mixing proportions for unknown end members
Valder, Joshua F.; Long, Andrew J.; Davis, Arden D.; Kenner, Scott J.
2012-01-01
A multivariate statistical method is presented, which includes principal components analysis (PCA) and an end-member mixing model to estimate unknown end-member hydrochemical compositions and the relative mixing proportions of those end members in mixed waters. PCA, together with the Hotelling T2 statistic and a conceptual model of groundwater flow and mixing, was used in selecting samples that best approximate end members, which then were used as initial values in optimization of the end-member mixing model. This method was tested on controlled datasets (i.e., true values of estimates were known a priori) and found effective in estimating these end members and mixing proportions. The controlled datasets included synthetically generated hydrochemical data, synthetically generated mixing proportions, and laboratory analyses of sample mixtures, which were used in an evaluation of the effectiveness of this method for potential use in actual hydrological settings. For three different scenarios tested, correlation coefficients (R2) for linear regression between the estimated and known values ranged from 0.968 to 0.993 for mixing proportions and from 0.839 to 0.998 for end-member compositions. The method also was applied to field data from a study of end-member mixing in groundwater as a field example and partial method validation.
Item Purification in Differential Item Functioning Using Generalized Linear Mixed Models
ERIC Educational Resources Information Center
Liu, Qian
2011-01-01
For this dissertation, four item purification procedures were implemented onto the generalized linear mixed model for differential item functioning (DIF) analysis, and the performance of these item purification procedures was investigated through a series of simulations. Among the four procedures, forward and generalized linear mixed model (GLMM)…
1974-01-01
REGRESSION MODEL - THE UNCONSTRAINED, LINEAR EQUALITY AND INEQUALITY CONSTRAINED APPROACHES January 1974 Nelson Delfino d’Avila Mascarenha;? Image...Report 520 DIGITAL IMAGE RESTORATION UNDER A REGRESSION MODEL THE UNCONSTRAINED, LINEAR EQUALITY AND INEQUALITY CONSTRAINED APPROACHES January...a two- dimensional form adequately describes the linear model . A dis- cretization is performed by using quadrature methods. By trans
Advanced statistics: linear regression, part I: simple linear regression.
Marill, Keith A
2004-01-01
Simple linear regression is a mathematical technique used to model the relationship between a single independent predictor variable and a single dependent outcome variable. In this, the first of a two-part series exploring concepts in linear regression analysis, the four fundamental assumptions and the mechanics of simple linear regression are reviewed. The most common technique used to derive the regression line, the method of least squares, is described. The reader will be acquainted with other important concepts in simple linear regression, including: variable transformations, dummy variables, relationship to inference testing, and leverage. Simplified clinical examples with small datasets and graphic models are used to illustrate the points. This will provide a foundation for the second article in this series: a discussion of multiple linear regression, in which there are multiple predictor variables.
Linear regression crash prediction models : issues and proposed solutions.
DOT National Transportation Integrated Search
2010-05-01
The paper develops a linear regression model approach that can be applied to : crash data to predict vehicle crashes. The proposed approach involves novice data aggregation : to satisfy linear regression assumptions; namely error structure normality ...
Du, Hongying; Wang, Jie; Yao, Xiaojun; Hu, Zhide
2009-01-01
The heuristic method (HM) and support vector machine (SVM) were used to construct quantitative structure-retention relationship models by a series of compounds to predict the gradient retention times of reversed-phase high-performance liquid chromatography (HPLC) in three different columns. The aims of this investigation were to predict the retention times of multifarious compounds, to find the main properties of the three columns, and to indicate the theory of separation procedures. In our method, we correlated the retention times of many diverse structural analytes in three columns (Symmetry C18, Chromolith, and SG-MIX) with their representative molecular descriptors, calculated from the molecular structures alone. HM was used to select the most important molecular descriptors and build linear regression models. Furthermore, non-linear regression models were built using the SVM method; the performance of the SVM models were better than that of the HM models, and the prediction results were in good agreement with the experimental values. This paper could give some insights into the factors that were likely to govern the gradient retention process of the three investigated HPLC columns, which could theoretically supervise the practical experiment.
Mixed kernel function support vector regression for global sensitivity analysis
NASA Astrophysics Data System (ADS)
Cheng, Kai; Lu, Zhenzhou; Wei, Yuhao; Shi, Yan; Zhou, Yicheng
2017-11-01
Global sensitivity analysis (GSA) plays an important role in exploring the respective effects of input variables on an assigned output response. Amongst the wide sensitivity analyses in literature, the Sobol indices have attracted much attention since they can provide accurate information for most models. In this paper, a mixed kernel function (MKF) based support vector regression (SVR) model is employed to evaluate the Sobol indices at low computational cost. By the proposed derivation, the estimation of the Sobol indices can be obtained by post-processing the coefficients of the SVR meta-model. The MKF is constituted by the orthogonal polynomials kernel function and Gaussian radial basis kernel function, thus the MKF possesses both the global characteristic advantage of the polynomials kernel function and the local characteristic advantage of the Gaussian radial basis kernel function. The proposed approach is suitable for high-dimensional and non-linear problems. Performance of the proposed approach is validated by various analytical functions and compared with the popular polynomial chaos expansion (PCE). Results demonstrate that the proposed approach is an efficient method for global sensitivity analysis.
Simplified large African carnivore density estimators from track indices.
Winterbach, Christiaan W; Ferreira, Sam M; Funston, Paul J; Somers, Michael J
2016-01-01
The range, population size and trend of large carnivores are important parameters to assess their status globally and to plan conservation strategies. One can use linear models to assess population size and trends of large carnivores from track-based surveys on suitable substrates. The conventional approach of a linear model with intercept may not intercept at zero, but may fit the data better than linear model through the origin. We assess whether a linear regression through the origin is more appropriate than a linear regression with intercept to model large African carnivore densities and track indices. We did simple linear regression with intercept analysis and simple linear regression through the origin and used the confidence interval for ß in the linear model y = αx + ß, Standard Error of Estimate, Mean Squares Residual and Akaike Information Criteria to evaluate the models. The Lion on Clay and Low Density on Sand models with intercept were not significant ( P > 0.05). The other four models with intercept and the six models thorough origin were all significant ( P < 0.05). The models using linear regression with intercept all included zero in the confidence interval for ß and the null hypothesis that ß = 0 could not be rejected. All models showed that the linear model through the origin provided a better fit than the linear model with intercept, as indicated by the Standard Error of Estimate and Mean Square Residuals. Akaike Information Criteria showed that linear models through the origin were better and that none of the linear models with intercept had substantial support. Our results showed that linear regression through the origin is justified over the more typical linear regression with intercept for all models we tested. A general model can be used to estimate large carnivore densities from track densities across species and study areas. The formula observed track density = 3.26 × carnivore density can be used to estimate densities of large African carnivores using track counts on sandy substrates in areas where carnivore densities are 0.27 carnivores/100 km 2 or higher. To improve the current models, we need independent data to validate the models and data to test for non-linear relationship between track indices and true density at low densities.
Punzo, Antonio; Ingrassia, Salvatore; Maruotti, Antonello
2018-04-22
A time-varying latent variable model is proposed to jointly analyze multivariate mixed-support longitudinal data. The proposal can be viewed as an extension of hidden Markov regression models with fixed covariates (HMRMFCs), which is the state of the art for modelling longitudinal data, with a special focus on the underlying clustering structure. HMRMFCs are inadequate for applications in which a clustering structure can be identified in the distribution of the covariates, as the clustering is independent from the covariates distribution. Here, hidden Markov regression models with random covariates are introduced by explicitly specifying state-specific distributions for the covariates, with the aim of improving the recovering of the clusters in the data with respect to a fixed covariates paradigm. The hidden Markov regression models with random covariates class is defined focusing on the exponential family, in a generalized linear model framework. Model identifiability conditions are sketched, an expectation-maximization algorithm is outlined for parameter estimation, and various implementation and operational issues are discussed. Properties of the estimators of the regression coefficients, as well as of the hidden path parameters, are evaluated through simulation experiments and compared with those of HMRMFCs. The method is applied to physical activity data. Copyright © 2018 John Wiley & Sons, Ltd.
An R2 statistic for fixed effects in the linear mixed model.
Edwards, Lloyd J; Muller, Keith E; Wolfinger, Russell D; Qaqish, Bahjat F; Schabenberger, Oliver
2008-12-20
Statisticians most often use the linear mixed model to analyze Gaussian longitudinal data. The value and familiarity of the R(2) statistic in the linear univariate model naturally creates great interest in extending it to the linear mixed model. We define and describe how to compute a model R(2) statistic for the linear mixed model by using only a single model. The proposed R(2) statistic measures multivariate association between the repeated outcomes and the fixed effects in the linear mixed model. The R(2) statistic arises as a 1-1 function of an appropriate F statistic for testing all fixed effects (except typically the intercept) in a full model. The statistic compares the full model with a null model with all fixed effects deleted (except typically the intercept) while retaining exactly the same covariance structure. Furthermore, the R(2) statistic leads immediately to a natural definition of a partial R(2) statistic. A mixed model in which ethnicity gives a very small p-value as a longitudinal predictor of blood pressure (BP) compellingly illustrates the value of the statistic. In sharp contrast to the extreme p-value, a very small R(2) , a measure of statistical and scientific importance, indicates that ethnicity has an almost negligible association with the repeated BP outcomes for the study.
Koerner, Tess K.; Zhang, Yang
2017-01-01
Neurophysiological studies are often designed to examine relationships between measures from different testing conditions, time points, or analysis techniques within the same group of participants. Appropriate statistical techniques that can take into account repeated measures and multivariate predictor variables are integral and essential to successful data analysis and interpretation. This work implements and compares conventional Pearson correlations and linear mixed-effects (LME) regression models using data from two recently published auditory electrophysiology studies. For the specific research questions in both studies, the Pearson correlation test is inappropriate for determining strengths between the behavioral responses for speech-in-noise recognition and the multiple neurophysiological measures as the neural responses across listening conditions were simply treated as independent measures. In contrast, the LME models allow a systematic approach to incorporate both fixed-effect and random-effect terms to deal with the categorical grouping factor of listening conditions, between-subject baseline differences in the multiple measures, and the correlational structure among the predictor variables. Together, the comparative data demonstrate the advantages as well as the necessity to apply mixed-effects models to properly account for the built-in relationships among the multiple predictor variables, which has important implications for proper statistical modeling and interpretation of human behavior in terms of neural correlates and biomarkers. PMID:28264422
The microcomputer scientific software series 2: general linear model--regression.
Harold M. Rauscher
1983-01-01
The general linear model regression (GLMR) program provides the microcomputer user with a sophisticated regression analysis capability. The output provides a regression ANOVA table, estimators of the regression model coefficients, their confidence intervals, confidence intervals around the predicted Y-values, residuals for plotting, a check for multicollinearity, a...
Lloyd-Jones, Luke R; Robinson, Matthew R; Yang, Jian; Visscher, Peter M
2018-04-01
Genome-wide association studies (GWAS) have identified thousands of loci that are robustly associated with complex diseases. The use of linear mixed model (LMM) methodology for GWAS is becoming more prevalent due to its ability to control for population structure and cryptic relatedness and to increase power. The odds ratio (OR) is a common measure of the association of a disease with an exposure ( e.g. , a genetic variant) and is readably available from logistic regression. However, when the LMM is applied to all-or-none traits it provides estimates of genetic effects on the observed 0-1 scale, a different scale to that in logistic regression. This limits the comparability of results across studies, for example in a meta-analysis, and makes the interpretation of the magnitude of an effect from an LMM GWAS difficult. In this study, we derived transformations from the genetic effects estimated under the LMM to the OR that only rely on summary statistics. To test the proposed transformations, we used real genotypes from two large, publicly available data sets to simulate all-or-none phenotypes for a set of scenarios that differ in underlying model, disease prevalence, and heritability. Furthermore, we applied these transformations to GWAS summary statistics for type 2 diabetes generated from 108,042 individuals in the UK Biobank. In both simulation and real-data application, we observed very high concordance between the transformed OR from the LMM and either the simulated truth or estimates from logistic regression. The transformations derived and validated in this study improve the comparability of results from prospective and already performed LMM GWAS on complex diseases by providing a reliable transformation to a common comparative scale for the genetic effects. Copyright © 2018 by the Genetics Society of America.
Comparison between Linear and Nonlinear Regression in a Laboratory Heat Transfer Experiment
ERIC Educational Resources Information Center
Gonçalves, Carine Messias; Schwaab, Marcio; Pinto, José Carlos
2013-01-01
In order to interpret laboratory experimental data, undergraduate students are used to perform linear regression through linearized versions of nonlinear models. However, the use of linearized models can lead to statistically biased parameter estimates. Even so, it is not an easy task to introduce nonlinear regression and show for the students…
NASA Technical Reports Server (NTRS)
Merenyi, E.; Miller, J. S.; Singer, R. B.
1992-01-01
The linear mixing model approach was successfully applied to data sets of various natures. In these sets, the measured radiance could be assumed to be a linear combination of radiance contributions. The present work is an attempt to analyze a spectral image of Mars with linear mixing modeling.
Mazo Lopera, Mauricio A; Coombes, Brandon J; de Andrade, Mariza
2017-09-27
Gene-environment (GE) interaction has important implications in the etiology of complex diseases that are caused by a combination of genetic factors and environment variables. Several authors have developed GE analysis in the context of independent subjects or longitudinal data using a gene-set. In this paper, we propose to analyze GE interaction for discrete and continuous phenotypes in family studies by incorporating the relatedness among the relatives for each family into a generalized linear mixed model (GLMM) and by using a gene-based variance component test. In addition, we deal with collinearity problems arising from linkage disequilibrium among single nucleotide polymorphisms (SNPs) by considering their coefficients as random effects under the null model estimation. We show that the best linear unbiased predictor (BLUP) of such random effects in the GLMM is equivalent to the ridge regression estimator. This equivalence provides a simple method to estimate the ridge penalty parameter in comparison to other computationally-demanding estimation approaches based on cross-validation schemes. We evaluated the proposed test using simulation studies and applied it to real data from the Baependi Heart Study consisting of 76 families. Using our approach, we identified an interaction between BMI and the Peroxisome Proliferator Activated Receptor Gamma ( PPARG ) gene associated with diabetes.
Wang, Ke-Sheng; Liu, Xuefeng; Ategbole, Muyiwa; Xie, Xin; Liu, Ying; Xu, Chun; Xie, Changchun; Sha, Zhanxin
2017-01-01
Objective: Screening for colorectal cancer (CRC) can reduce disease incidence, morbidity, and mortality. However, few studies have investigated the urban-rural differences in social and behavioral factors influencing CRC screening. The objective of the study was to investigate the potential factors across urban-rural groups on the usage of CRC screening. Methods: A total of 38,505 adults (aged ≥40 years) were selected from the 2009 California Health Interview Survey (CHIS) data - the latest CHIS data on CRC screening. The weighted generalized linear mixed-model (WGLIMM) was used to deal with this hierarchical structure data. Weighted simple and multiple mixed logistic regression analyses in SAS ver. 9.4 were used to obtain the odds ratios (ORs) and their 95% confidence intervals (CIs). Results: The overall prevalence of CRC screening was 48.1% while the prevalence in four residence groups - urban, second city, suburban, and town/rural, were 45.8%, 46.9%, 53.7% and 50.1%, respectively. The results of WGLIMM analysis showed that there was residence effect (p<0.0001) and residence groups had significant interactions with gender, age group, education level, and employment status (p<0.05). Multiple logistic regression analysis revealed that age, race, marital status, education level, employment stats, binge drinking, and smoking status were associated with CRC screening (p<0.05). Stratified by residence regions, age and poverty level showed associations with CRC screening in all four residence groups. Education level was positively associated with CRC screening in second city and suburban. Infrequent binge drinking was associated with CRC screening in urban and suburban; while current smoking was a protective factor in urban and town/rural groups. Conclusions: Mixed models are useful to deal with the clustered survey data. Social factors and behavioral factors (binge drinking and smoking) were associated with CRC screening and the associations were affected by living areas such as urban and rural regions. PMID:28952708
Wang, Ke-Sheng; Liu, Xuefeng; Ategbole, Muyiwa; Xie, Xin; Liu, Ying; Xu, Chun; Xie, Changchun; Sha, Zhanxin
2017-09-27
Objective: Screening for colorectal cancer (CRC) can reduce disease incidence, morbidity, and mortality. However, few studies have investigated the urban-rural differences in social and behavioral factors influencing CRC screening. The objective of the study was to investigate the potential factors across urban-rural groups on the usage of CRC screening. Methods: A total of 38,505 adults (aged ≥40 years) were selected from the 2009 California Health Interview Survey (CHIS) data - the latest CHIS data on CRC screening. The weighted generalized linear mixed-model (WGLIMM) was used to deal with this hierarchical structure data. Weighted simple and multiple mixed logistic regression analyses in SAS ver. 9.4 were used to obtain the odds ratios (ORs) and their 95% confidence intervals (CIs). Results: The overall prevalence of CRC screening was 48.1% while the prevalence in four residence groups - urban, second city, suburban, and town/rural, were 45.8%, 46.9%, 53.7% and 50.1%, respectively. The results of WGLIMM analysis showed that there was residence effect (p<0.0001) and residence groups had significant interactions with gender, age group, education level, and employment status (p<0.05). Multiple logistic regression analysis revealed that age, race, marital status, education level, employment stats, binge drinking, and smoking status were associated with CRC screening (p<0.05). Stratified by residence regions, age and poverty level showed associations with CRC screening in all four residence groups. Education level was positively associated with CRC screening in second city and suburban. Infrequent binge drinking was associated with CRC screening in urban and suburban; while current smoking was a protective factor in urban and town/rural groups. Conclusions: Mixed models are useful to deal with the clustered survey data. Social factors and behavioral factors (binge drinking and smoking) were associated with CRC screening and the associations were affected by living areas such as urban and rural regions. Creative Commons Attribution License
Water mass mixing: The dominant control on the zinc distribution in the North Atlantic Ocean
NASA Astrophysics Data System (ADS)
Roshan, Saeed; Wu, Jingfeng
2015-07-01
Dissolved zinc (dZn) concentration was determined in the North Atlantic during the U.S. GEOTRACES 2010 and 2011 cruise (GOETRACES GA03). A relatively poor linear correlation (R2 = 0.756) was observed between dZn and silicic acid (Si), the slope of which was 0.0577 nM/µmol/kg. We attribute the relatively poor dZn-Si correlation to the following processes: (a) differential regeneration of zinc relative to silicic acid, (b) mixing of multiple water masses that have different Zn/Si, and (c) zinc sources such as sedimentary or hydrothermal. To quantitatively distinguish these possibilities, we use the results of Optimum Multi-Parameter Water Mass Analysis by Jenkins et al. (2015) to model the zinc distribution below 500 m. We hypothesized two scenarios: conservative mixing and regenerative mixing. The first scenario (conservative) could be modeled to results in a correlation with observations with a R2 = 0.846. In the second scenario, we took a Si-related regeneration into account, which could model the observations with a R2 = 0.867. Through this regenerative mixing scenario, we estimated a Zn/Si = 0.0548 nM/µmol/kg that may be more realistic than linear regression slope due to accounting for process b. However, this did not improve the model substantially (R2 = 0.867 versus0.846), which may indicate the insignificant effect of remineralization on the zinc distribution in this region. The relative weakness in the model-observation correlation (R2~0.85 for both scenarios) implies that processes (a) and (c) may be plausible. Furthermore, dZn in the upper 500 m exhibited a very poor correlation with apparent oxygen utilization, suggesting a minimal role for the organic matter-associated remineralization process.
Interpretation of commonly used statistical regression models.
Kasza, Jessica; Wolfe, Rory
2014-01-01
A review of some regression models commonly used in respiratory health applications is provided in this article. Simple linear regression, multiple linear regression, logistic regression and ordinal logistic regression are considered. The focus of this article is on the interpretation of the regression coefficients of each model, which are illustrated through the application of these models to a respiratory health research study. © 2013 The Authors. Respirology © 2013 Asian Pacific Society of Respirology.
Pagniez, Dominique; Duhamel, Alain; Boulanger, Eric; Lessore de Sainte Foy, Celia; Beuscart, Jean-Baptiste
2017-08-31
Glucose is widely used as an osmotic agent in peritoneal dialysis (PD), but exerts untoward effects on the peritoneum. The potential protective effect of a reduced exposure to hypertonic glucose has never been investigated. The cohort of PD patients attending our center which tackled the challenge of a restricted use of hypertonic glucose solutions has been prospectively followed since 1992. Small-solute transport was assessed using an equivalent of the glucose peritoneal equilibration test after 6 months, and then every year. Study was stopped on July 1st, 2008, before use of biocompatible solutions. Repeated measures in patients treated with PD for 54 months were analyzed by using (1) the slopes of the linear regression for D 4 /D 0 ratios over time computed for each individual, and (2) a linear mixed model. In the study period, 44 patients were treated for a total of 2376 months, 2058 without hypertonic glucose. There was one episode of peritoneal infection every 18 patient-months. The mean of slopes of the linear regression for D 4 /D 0 ratios was found to be significantly positive (Student's test, p < .001) and the results of the mixed model reflected a similar significant increase for D 4 /D 0 ratios over time. These results reflected a significant decrease of small-solute transport. In this large series, minimizing the use of hypertonic glucose solutions was associated in patients on long term PD with an overall decrease of small-solute transport within 54 months, despite a high rate of peritoneal infection.
Boosting structured additive quantile regression for longitudinal childhood obesity data.
Fenske, Nora; Fahrmeir, Ludwig; Hothorn, Torsten; Rzehak, Peter; Höhle, Michael
2013-07-25
Childhood obesity and the investigation of its risk factors has become an important public health issue. Our work is based on and motivated by a German longitudinal study including 2,226 children with up to ten measurements on their body mass index (BMI) and risk factors from birth to the age of 10 years. We introduce boosting of structured additive quantile regression as a novel distribution-free approach for longitudinal quantile regression. The quantile-specific predictors of our model include conventional linear population effects, smooth nonlinear functional effects, varying-coefficient terms, and individual-specific effects, such as intercepts and slopes. Estimation is based on boosting, a computer intensive inference method for highly complex models. We propose a component-wise functional gradient descent boosting algorithm that allows for penalized estimation of the large variety of different effects, particularly leading to individual-specific effects shrunken toward zero. This concept allows us to flexibly estimate the nonlinear age curves of upper quantiles of the BMI distribution, both on population and on individual-specific level, adjusted for further risk factors and to detect age-varying effects of categorical risk factors. Our model approach can be regarded as the quantile regression analog of Gaussian additive mixed models (or structured additive mean regression models), and we compare both model classes with respect to our obesity data.
D.b.h./crown diameter relationships in mixed Appalachian hardwood stands
Neil I. Lamson; Neil I. Lamson
1987-01-01
Linear regression formulae for predicting crown diameter as a function of stem diameter are presented for nine species found in 50- to 80-year-old mixed hardwood stands in north-central West Virginia. Generally, crown diameter was closely related to tolerance; more tolerant species had larger crowns.
Vajargah, Kianoush Fathi; Sadeghi-Bazargani, Homayoun; Mehdizadeh-Esfanjani, Robab; Savadi-Oskouei, Daryoush; Farhoudi, Mehdi
2012-01-01
The objective of the present study was to assess the comparable applicability of orthogonal projections to latent structures (OPLS) statistical model vs traditional linear regression in order to investigate the role of trans cranial doppler (TCD) sonography in predicting ischemic stroke prognosis. The study was conducted on 116 ischemic stroke patients admitted to a specialty neurology ward. The Unified Neurological Stroke Scale was used once for clinical evaluation on the first week of admission and again six months later. All data was primarily analyzed using simple linear regression and later considered for multivariate analysis using PLS/OPLS models through the SIMCA P+12 statistical software package. The linear regression analysis results used for the identification of TCD predictors of stroke prognosis were confirmed through the OPLS modeling technique. Moreover, in comparison to linear regression, the OPLS model appeared to have higher sensitivity in detecting the predictors of ischemic stroke prognosis and detected several more predictors. Applying the OPLS model made it possible to use both single TCD measures/indicators and arbitrarily dichotomized measures of TCD single vessel involvement as well as the overall TCD result. In conclusion, the authors recommend PLS/OPLS methods as complementary rather than alternative to the available classical regression models such as linear regression.
Convex set and linear mixing model
NASA Technical Reports Server (NTRS)
Xu, P.; Greeley, R.
1993-01-01
A major goal of optical remote sensing is to determine surface compositions of the earth and other planetary objects. For assessment of composition, single pixels in multi-spectral images usually record a mixture of the signals from various materials within the corresponding surface area. In this report, we introduce a closed and bounded convex set as a mathematical model for linear mixing. This model has a clear geometric implication because the closed and bounded convex set is a natural generalization of a triangle in n-space. The endmembers are extreme points of the convex set. Every point in the convex closure of the endmembers is a linear mixture of those endmembers, which is exactly how linear mixing is defined. With this model, some general criteria for selecting endmembers could be described. This model can lead to a better understanding of linear mixing models.
An Expert System for the Evaluation of Cost Models
1990-09-01
contrast to the condition of equal error variance, called homoscedasticity. (Reference: Applied Linear Regression Models by John Neter - page 423...normal. (Reference: Applied Linear Regression Models by John Neter - page 125) Click Here to continue -> Autocorrelation Click Here for the index - Index...over time. Error terms correlated over time are said to be autocorrelated or serially correlated. (REFERENCE: Applied Linear Regression Models by John
Schaid, Daniel J
2010-01-01
Measures of genomic similarity are the basis of many statistical analytic methods. We review the mathematical and statistical basis of similarity methods, particularly based on kernel methods. A kernel function converts information for a pair of subjects to a quantitative value representing either similarity (larger values meaning more similar) or distance (smaller values meaning more similar), with the requirement that it must create a positive semidefinite matrix when applied to all pairs of subjects. This review emphasizes the wide range of statistical methods and software that can be used when similarity is based on kernel methods, such as nonparametric regression, linear mixed models and generalized linear mixed models, hierarchical models, score statistics, and support vector machines. The mathematical rigor for these methods is summarized, as is the mathematical framework for making kernels. This review provides a framework to move from intuitive and heuristic approaches to define genomic similarities to more rigorous methods that can take advantage of powerful statistical modeling and existing software. A companion paper reviews novel approaches to creating kernels that might be useful for genomic analyses, providing insights with examples [1]. Copyright © 2010 S. Karger AG, Basel.
An overview of longitudinal data analysis methods for neurological research.
Locascio, Joseph J; Atri, Alireza
2011-01-01
The purpose of this article is to provide a concise, broad and readily accessible overview of longitudinal data analysis methods, aimed to be a practical guide for clinical investigators in neurology. In general, we advise that older, traditional methods, including (1) simple regression of the dependent variable on a time measure, (2) analyzing a single summary subject level number that indexes changes for each subject and (3) a general linear model approach with a fixed-subject effect, should be reserved for quick, simple or preliminary analyses. We advocate the general use of mixed-random and fixed-effect regression models for analyses of most longitudinal clinical studies. Under restrictive situations or to provide validation, we recommend: (1) repeated-measure analysis of covariance (ANCOVA), (2) ANCOVA for two time points, (3) generalized estimating equations and (4) latent growth curve/structural equation models.
Assessing Discriminative Performance at External Validation of Clinical Prediction Models
Nieboer, Daan; van der Ploeg, Tjeerd; Steyerberg, Ewout W.
2016-01-01
Introduction External validation studies are essential to study the generalizability of prediction models. Recently a permutation test, focusing on discrimination as quantified by the c-statistic, was proposed to judge whether a prediction model is transportable to a new setting. We aimed to evaluate this test and compare it to previously proposed procedures to judge any changes in c-statistic from development to external validation setting. Methods We compared the use of the permutation test to the use of benchmark values of the c-statistic following from a previously proposed framework to judge transportability of a prediction model. In a simulation study we developed a prediction model with logistic regression on a development set and validated them in the validation set. We concentrated on two scenarios: 1) the case-mix was more heterogeneous and predictor effects were weaker in the validation set compared to the development set, and 2) the case-mix was less heterogeneous in the validation set and predictor effects were identical in the validation and development set. Furthermore we illustrated the methods in a case study using 15 datasets of patients suffering from traumatic brain injury. Results The permutation test indicated that the validation and development set were homogenous in scenario 1 (in almost all simulated samples) and heterogeneous in scenario 2 (in 17%-39% of simulated samples). Previously proposed benchmark values of the c-statistic and the standard deviation of the linear predictors correctly pointed at the more heterogeneous case-mix in scenario 1 and the less heterogeneous case-mix in scenario 2. Conclusion The recently proposed permutation test may provide misleading results when externally validating prediction models in the presence of case-mix differences between the development and validation population. To correctly interpret the c-statistic found at external validation it is crucial to disentangle case-mix differences from incorrect regression coefficients. PMID:26881753
Assessing Discriminative Performance at External Validation of Clinical Prediction Models.
Nieboer, Daan; van der Ploeg, Tjeerd; Steyerberg, Ewout W
2016-01-01
External validation studies are essential to study the generalizability of prediction models. Recently a permutation test, focusing on discrimination as quantified by the c-statistic, was proposed to judge whether a prediction model is transportable to a new setting. We aimed to evaluate this test and compare it to previously proposed procedures to judge any changes in c-statistic from development to external validation setting. We compared the use of the permutation test to the use of benchmark values of the c-statistic following from a previously proposed framework to judge transportability of a prediction model. In a simulation study we developed a prediction model with logistic regression on a development set and validated them in the validation set. We concentrated on two scenarios: 1) the case-mix was more heterogeneous and predictor effects were weaker in the validation set compared to the development set, and 2) the case-mix was less heterogeneous in the validation set and predictor effects were identical in the validation and development set. Furthermore we illustrated the methods in a case study using 15 datasets of patients suffering from traumatic brain injury. The permutation test indicated that the validation and development set were homogenous in scenario 1 (in almost all simulated samples) and heterogeneous in scenario 2 (in 17%-39% of simulated samples). Previously proposed benchmark values of the c-statistic and the standard deviation of the linear predictors correctly pointed at the more heterogeneous case-mix in scenario 1 and the less heterogeneous case-mix in scenario 2. The recently proposed permutation test may provide misleading results when externally validating prediction models in the presence of case-mix differences between the development and validation population. To correctly interpret the c-statistic found at external validation it is crucial to disentangle case-mix differences from incorrect regression coefficients.
Statistical Methodology for the Analysis of Repeated Duration Data in Behavioral Studies.
Letué, Frédérique; Martinez, Marie-José; Samson, Adeline; Vilain, Anne; Vilain, Coriandre
2018-03-15
Repeated duration data are frequently used in behavioral studies. Classical linear or log-linear mixed models are often inadequate to analyze such data, because they usually consist of nonnegative and skew-distributed variables. Therefore, we recommend use of a statistical methodology specific to duration data. We propose a methodology based on Cox mixed models and written under the R language. This semiparametric model is indeed flexible enough to fit duration data. To compare log-linear and Cox mixed models in terms of goodness-of-fit on real data sets, we also provide a procedure based on simulations and quantile-quantile plots. We present two examples from a data set of speech and gesture interactions, which illustrate the limitations of linear and log-linear mixed models, as compared to Cox models. The linear models are not validated on our data, whereas Cox models are. Moreover, in the second example, the Cox model exhibits a significant effect that the linear model does not. We provide methods to select the best-fitting models for repeated duration data and to compare statistical methodologies. In this study, we show that Cox models are best suited to the analysis of our data set.
A Constrained Linear Estimator for Multiple Regression
ERIC Educational Resources Information Center
Davis-Stober, Clintin P.; Dana, Jason; Budescu, David V.
2010-01-01
"Improper linear models" (see Dawes, Am. Psychol. 34:571-582, "1979"), such as equal weighting, have garnered interest as alternatives to standard regression models. We analyze the general circumstances under which these models perform well by recasting a class of "improper" linear models as "proper" statistical models with a single predictor. We…
Estimating linear temporal trends from aggregated environmental monitoring data
Erickson, Richard A.; Gray, Brian R.; Eager, Eric A.
2017-01-01
Trend estimates are often used as part of environmental monitoring programs. These trends inform managers (e.g., are desired species increasing or undesired species decreasing?). Data collected from environmental monitoring programs is often aggregated (i.e., averaged), which confounds sampling and process variation. State-space models allow sampling variation and process variations to be separated. We used simulated time-series to compare linear trend estimations from three state-space models, a simple linear regression model, and an auto-regressive model. We also compared the performance of these five models to estimate trends from a long term monitoring program. We specifically estimated trends for two species of fish and four species of aquatic vegetation from the Upper Mississippi River system. We found that the simple linear regression had the best performance of all the given models because it was best able to recover parameters and had consistent numerical convergence. Conversely, the simple linear regression did the worst job estimating populations in a given year. The state-space models did not estimate trends well, but estimated population sizes best when the models converged. We found that a simple linear regression performed better than more complex autoregression and state-space models when used to analyze aggregated environmental monitoring data.
Unitary Response Regression Models
ERIC Educational Resources Information Center
Lipovetsky, S.
2007-01-01
The dependent variable in a regular linear regression is a numerical variable, and in a logistic regression it is a binary or categorical variable. In these models the dependent variable has varying values. However, there are problems yielding an identity output of a constant value which can also be modelled in a linear or logistic regression with…
Composition and structure of Pinus koraiensis mixed forest respond to spatial climatic changes.
Zhang, Jingli; Zhou, Yong; Zhou, Guangsheng; Xiao, Chunwang
2014-01-01
Although some studies have indicated that climate changes can affect Pinus koraiensis mixed forest, the responses of composition and structure of Pinus koraiensis mixed forests to climatic changes are unknown and the key climatic factors controlling the composition and structure of Pinus koraiensis mixed forest are uncertain. Field survey was conducted in the natural Pinus koraiensis mixed forests along a latitudinal gradient and an elevational gradient in Northeast China. In order to build the mathematical models for simulating the relationships of compositional and structural attributes of the Pinus koraiensis mixed forest with climatic and non-climatic factors, stepwise linear regression analyses were performed, incorporating 14 dependent variables and the linear and quadratic components of 9 factors. All the selected new models were computed under the +2°C and +10% precipitation and +4°C and +10% precipitation scenarios. The Max Temperature of Warmest Month, Mean Temperature of Warmest Quarter and Precipitation of Wettest Month were observed to be key climatic factors controlling the stand densities and total basal areas of Pinus koraiensis mixed forest. Increased summer temperatures and precipitations strongly enhanced the stand densities and total basal areas of broadleaf trees but had little effect on Pinus koraiensis under the +2°C and +10% precipitation scenario and +4°C and +10% precipitation scenario. These results show that the Max Temperature of Warmest Month, Mean Temperature of Warmest Quarter and Precipitation of Wettest Month are key climatic factors which shape the composition and structure of Pinus koraiensis mixed forest. Although the Pinus koraiensis would persist, the current forests dominated by Pinus koraiensis in the region would all shift and become broadleaf-dominated forests due to the dramatic increase of broadleaf trees under the future global warming and increased precipitation.
KMgene: a unified R package for gene-based association analysis for complex traits.
Yan, Qi; Fang, Zhou; Chen, Wei; Stegle, Oliver
2018-02-09
In this report, we introduce an R package KMgene for performing gene-based association tests for familial, multivariate or longitudinal traits using kernel machine (KM) regression under a generalized linear mixed model (GLMM) framework. Extensive simulations were performed to evaluate the validity of the approaches implemented in KMgene. http://cran.r-project.org/web/packages/KMgene. qi.yan@chp.edu or wei.chen@chp.edu. Supplementary data are available at Bioinformatics online. © The Author(s) 2018. Published by Oxford University Press.
NASA Astrophysics Data System (ADS)
Zulvia, Pepi; Kurnia, Anang; Soleh, Agus M.
2017-03-01
Individual and environment are a hierarchical structure consist of units grouped at different levels. Hierarchical data structures are analyzed based on several levels, with the lowest level nested in the highest level. This modeling is commonly call multilevel modeling. Multilevel modeling is widely used in education research, for example, the average score of National Examination (UN). While in Indonesia UN for high school student is divided into natural science and social science. The purpose of this research is to develop multilevel and panel data modeling using linear mixed model on educational data. The first step is data exploration and identification relationships between independent and dependent variable by checking correlation coefficient and variance inflation factor (VIF). Furthermore, we use a simple model approach with highest level of the hierarchy (level-2) is regency/city while school is the lowest of hierarchy (level-1). The best model was determined by comparing goodness-of-fit and checking assumption from residual plots and predictions for each model. Our finding that for natural science and social science, the regression with random effects of regency/city and fixed effects of the time i.e multilevel model has better performance than the linear mixed model in explaining the variability of the dependent variable, which is the average scores of UN.
[From clinical judgment to linear regression model.
Palacios-Cruz, Lino; Pérez, Marcela; Rivas-Ruiz, Rodolfo; Talavera, Juan O
2013-01-01
When we think about mathematical models, such as linear regression model, we think that these terms are only used by those engaged in research, a notion that is far from the truth. Legendre described the first mathematical model in 1805, and Galton introduced the formal term in 1886. Linear regression is one of the most commonly used regression models in clinical practice. It is useful to predict or show the relationship between two or more variables as long as the dependent variable is quantitative and has normal distribution. Stated in another way, the regression is used to predict a measure based on the knowledge of at least one other variable. Linear regression has as it's first objective to determine the slope or inclination of the regression line: Y = a + bx, where "a" is the intercept or regression constant and it is equivalent to "Y" value when "X" equals 0 and "b" (also called slope) indicates the increase or decrease that occurs when the variable "x" increases or decreases in one unit. In the regression line, "b" is called regression coefficient. The coefficient of determination (R 2 ) indicates the importance of independent variables in the outcome.
Advanced statistics: linear regression, part II: multiple linear regression.
Marill, Keith A
2004-01-01
The applications of simple linear regression in medical research are limited, because in most situations, there are multiple relevant predictor variables. Univariate statistical techniques such as simple linear regression use a single predictor variable, and they often may be mathematically correct but clinically misleading. Multiple linear regression is a mathematical technique used to model the relationship between multiple independent predictor variables and a single dependent outcome variable. It is used in medical research to model observational data, as well as in diagnostic and therapeutic studies in which the outcome is dependent on more than one factor. Although the technique generally is limited to data that can be expressed with a linear function, it benefits from a well-developed mathematical framework that yields unique solutions and exact confidence intervals for regression coefficients. Building on Part I of this series, this article acquaints the reader with some of the important concepts in multiple regression analysis. These include multicollinearity, interaction effects, and an expansion of the discussion of inference testing, leverage, and variable transformations to multivariate models. Examples from the first article in this series are expanded on using a primarily graphic, rather than mathematical, approach. The importance of the relationships among the predictor variables and the dependence of the multivariate model coefficients on the choice of these variables are stressed. Finally, concepts in regression model building are discussed.
Valid statistical approaches for analyzing sholl data: Mixed effects versus simple linear models.
Wilson, Machelle D; Sethi, Sunjay; Lein, Pamela J; Keil, Kimberly P
2017-03-01
The Sholl technique is widely used to quantify dendritic morphology. Data from such studies, which typically sample multiple neurons per animal, are often analyzed using simple linear models. However, simple linear models fail to account for intra-class correlation that occurs with clustered data, which can lead to faulty inferences. Mixed effects models account for intra-class correlation that occurs with clustered data; thus, these models more accurately estimate the standard deviation of the parameter estimate, which produces more accurate p-values. While mixed models are not new, their use in neuroscience has lagged behind their use in other disciplines. A review of the published literature illustrates common mistakes in analyses of Sholl data. Analysis of Sholl data collected from Golgi-stained pyramidal neurons in the hippocampus of male and female mice using both simple linear and mixed effects models demonstrates that the p-values and standard deviations obtained using the simple linear models are biased downwards and lead to erroneous rejection of the null hypothesis in some analyses. The mixed effects approach more accurately models the true variability in the data set, which leads to correct inference. Mixed effects models avoid faulty inference in Sholl analysis of data sampled from multiple neurons per animal by accounting for intra-class correlation. Given the widespread practice in neuroscience of obtaining multiple measurements per subject, there is a critical need to apply mixed effects models more widely. Copyright © 2017 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Astuti, H. N.; Saputro, D. R. S.; Susanti, Y.
2017-06-01
MGWR model is combination of linear regression model and geographically weighted regression (GWR) model, therefore, MGWR model could produce parameter estimation that had global parameter estimation, and other parameter that had local parameter in accordance with its observation location. The linkage between locations of the observations expressed in specific weighting that is adaptive bi-square. In this research, we applied MGWR model with weighted adaptive bi-square for case of DHF in Surakarta based on 10 factors (variables) that is supposed to influence the number of people with DHF. The observation unit in the research is 51 urban villages and the variables are number of inhabitants, number of houses, house index, many public places, number of healthy homes, number of Posyandu, area width, level population density, welfare of the family, and high-region. Based on this research, we obtained 51 MGWR models. The MGWR model were divided into 4 groups with significant variable is house index as a global variable, an area width as a local variable and the remaining variables vary in each. Global variables are variables that significantly affect all locations, while local variables are variables that significantly affect a specific location.
ERIC Educational Resources Information Center
Kobrin, Jennifer L.; Sinharay, Sandip; Haberman, Shelby J.; Chajewski, Michael
2011-01-01
This study examined the adequacy of a multiple linear regression model for predicting first-year college grade point average (FYGPA) using SAT[R] scores and high school grade point average (HSGPA). A variety of techniques, both graphical and statistical, were used to examine if it is possible to improve on the linear regression model. The results…
Covariate Selection for Multilevel Models with Missing Data
Marino, Miguel; Buxton, Orfeu M.; Li, Yi
2017-01-01
Missing covariate data hampers variable selection in multilevel regression settings. Current variable selection techniques for multiply-imputed data commonly address missingness in the predictors through list-wise deletion and stepwise-selection methods which are problematic. Moreover, most variable selection methods are developed for independent linear regression models and do not accommodate multilevel mixed effects regression models with incomplete covariate data. We develop a novel methodology that is able to perform covariate selection across multiply-imputed data for multilevel random effects models when missing data is present. Specifically, we propose to stack the multiply-imputed data sets from a multiple imputation procedure and to apply a group variable selection procedure through group lasso regularization to assess the overall impact of each predictor on the outcome across the imputed data sets. Simulations confirm the advantageous performance of the proposed method compared with the competing methods. We applied the method to reanalyze the Healthy Directions-Small Business cancer prevention study, which evaluated a behavioral intervention program targeting multiple risk-related behaviors in a working-class, multi-ethnic population. PMID:28239457
NASA Technical Reports Server (NTRS)
Flynn, Clare; Pickering, Kenneth E.; Crawford, James H.; Lamsol, Lok; Krotkov, Nickolay; Herman, Jay; Weinheimer, Andrew; Chen, Gao; Liu, Xiong; Szykman, James;
2014-01-01
To investigate the ability of column (or partial column) information to represent surface air quality, results of linear regression analyses between surface mixing ratio data and column abundances for O3 and NO2 are presented for the July 2011 Maryland deployment of the DISCOVER-AQ mission. Data collected by the P-3B aircraft, ground-based Pandora spectrometers, Aura/OMI satellite instrument, and simulations for July 2011 from the CMAQ air quality model during this deployment provide a large and varied data set, allowing this problem to be approached from multiple perspectives. O3 columns typically exhibited a statistically significant and high degree of correlation with surface data (R(sup 2) > 0.64) in the P- 3B data set, a moderate degree of correlation (0.16 < R(sup 2) < 0.64) in the CMAQ data set, and a low degree of correlation (R(sup 2) < 0.16) in the Pandora and OMI data sets. NO2 columns typically exhibited a low to moderate degree of correlation with surface data in each data set. The results of linear regression analyses for O3 exhibited smaller errors relative to the observations than NO2 regressions. These results suggest that O3 partial column observations from future satellite instruments with sufficient sensitivity to the lower troposphere can be meaningful for surface air quality analysis.
An Overview of Longitudinal Data Analysis Methods for Neurological Research
Locascio, Joseph J.; Atri, Alireza
2011-01-01
The purpose of this article is to provide a concise, broad and readily accessible overview of longitudinal data analysis methods, aimed to be a practical guide for clinical investigators in neurology. In general, we advise that older, traditional methods, including (1) simple regression of the dependent variable on a time measure, (2) analyzing a single summary subject level number that indexes changes for each subject and (3) a general linear model approach with a fixed-subject effect, should be reserved for quick, simple or preliminary analyses. We advocate the general use of mixed-random and fixed-effect regression models for analyses of most longitudinal clinical studies. Under restrictive situations or to provide validation, we recommend: (1) repeated-measure analysis of covariance (ANCOVA), (2) ANCOVA for two time points, (3) generalized estimating equations and (4) latent growth curve/structural equation models. PMID:22203825
Huber, Stefan; Klein, Elise; Moeller, Korbinian; Willmes, Klaus
2015-10-01
In neuropsychological research, single-cases are often compared with a small control sample. Crawford and colleagues developed inferential methods (i.e., the modified t-test) for such a research design. In the present article, we suggest an extension of the methods of Crawford and colleagues employing linear mixed models (LMM). We first show that a t-test for the significance of a dummy coded predictor variable in a linear regression is equivalent to the modified t-test of Crawford and colleagues. As an extension to this idea, we then generalized the modified t-test to repeated measures data by using LMMs to compare the performance difference in two conditions observed in a single participant to that of a small control group. The performance of LMMs regarding Type I error rates and statistical power were tested based on Monte-Carlo simulations. We found that starting with about 15-20 participants in the control sample Type I error rates were close to the nominal Type I error rate using the Satterthwaite approximation for the degrees of freedom. Moreover, statistical power was acceptable. Therefore, we conclude that LMMs can be applied successfully to statistically evaluate performance differences between a single-case and a control sample. Copyright © 2015 Elsevier Ltd. All rights reserved.
Sensitivity of Chemical Shift-Encoded Fat Quantification to Calibration of Fat MR Spectrum
Wang, Xiaoke; Hernando, Diego; Reeder, Scott B.
2015-01-01
Purpose To evaluate the impact of different fat spectral models on proton density fat-fraction (PDFF) quantification using chemical shift-encoded (CSE) MRI. Material and Methods Simulations and in vivo imaging were performed. In a simulation study, spectral models of fat were compared pairwise. Comparison of magnitude fitting and mixed fitting was performed over a range of echo times and fat fractions. In vivo acquisitions from 41 patients were reconstructed using 7 published spectral models of fat. T2-corrected STEAM-MRS was used as reference. Results Simulations demonstrate that imperfectly calibrated spectral models of fat result in biases that depend on echo times and fat fraction. Mixed fitting is more robust against this bias than magnitude fitting. Multi-peak spectral models showed much smaller differences among themselves than when compared to the single-peak spectral model. In vivo studies show all multi-peak models agree better (for mixed fitting, slope ranged from 0.967–1.045 using linear regression) with reference standard than the single-peak model (for mixed fitting, slope=0.76). Conclusion It is essential to use a multi-peak fat model for accurate quantification of fat with CSE-MRI. Further, fat quantification techniques using multi-peak fat models are comparable and no specific choice of spectral model is shown to be superior to the rest. PMID:25845713
Generalized Multilevel Structural Equation Modeling
ERIC Educational Resources Information Center
Rabe-Hesketh, Sophia; Skrondal, Anders; Pickles, Andrew
2004-01-01
A unifying framework for generalized multilevel structural equation modeling is introduced. The models in the framework, called generalized linear latent and mixed models (GLLAMM), combine features of generalized linear mixed models (GLMM) and structural equation models (SEM) and consist of a response model and a structural model for the latent…
Shirk, Andrew J; Landguth, Erin L; Cushman, Samuel A
2018-01-01
Anthropogenic migration barriers fragment many populations and limit the ability of species to respond to climate-induced biome shifts. Conservation actions designed to conserve habitat connectivity and mitigate barriers are needed to unite fragmented populations into larger, more viable metapopulations, and to allow species to track their climate envelope over time. Landscape genetic analysis provides an empirical means to infer landscape factors influencing gene flow and thereby inform such conservation actions. However, there are currently many methods available for model selection in landscape genetics, and considerable uncertainty as to which provide the greatest accuracy in identifying the true landscape model influencing gene flow among competing alternative hypotheses. In this study, we used population genetic simulations to evaluate the performance of seven regression-based model selection methods on a broad array of landscapes that varied by the number and type of variables contributing to resistance, the magnitude and cohesion of resistance, as well as the functional relationship between variables and resistance. We also assessed the effect of transformations designed to linearize the relationship between genetic and landscape distances. We found that linear mixed effects models had the highest accuracy in every way we evaluated model performance; however, other methods also performed well in many circumstances, particularly when landscape resistance was high and the correlation among competing hypotheses was limited. Our results provide guidance for which regression-based model selection methods provide the most accurate inferences in landscape genetic analysis and thereby best inform connectivity conservation actions. Published 2017. This article is a U.S. Government work and is in the public domain in the USA.
Comparing The Effectiveness of a90/95 Calculations (Preprint)
2006-09-01
Nachtsheim, John Neter, William Li, Applied Linear Statistical Models , 5th ed., McGraw-Hill/Irwin, 2005 5. Mood, Graybill and Boes, Introduction...curves is based on methods that are only valid for ordinary linear regression. Requirements for a valid Ordinary Least-Squares Regression Model There... linear . For example is a linear model ; is not. 2. Uniform variance (homoscedasticity
NASA Astrophysics Data System (ADS)
Zhao, Wei; Fan, Shaojia; Guo, Hai; Gao, Bo; Sun, Jiaren; Chen, Laiguo
2016-11-01
The quantile regression (QR) method has been increasingly introduced to atmospheric environmental studies to explore the non-linear relationship between local meteorological conditions and ozone mixing ratios. In this study, we applied QR for the first time, together with multiple linear regression (MLR), to analyze the dominant meteorological parameters influencing the mean, 10th percentile, 90th percentile and 99th percentile of maximum daily 8-h average (MDA8) ozone concentrations in 2000-2015 in Hong Kong. The dominance analysis (DA) was used to assess the relative importance of meteorological variables in the regression models. Results showed that the MLR models worked better at suburban and rural sites than at urban sites, and worked better in winter than in summer. QR models performed better in summer for 99th and 90th percentiles and performed better in autumn and winter for 10th percentile. And QR models also performed better in suburban and rural areas for 10th percentile. The top 3 dominant variables associated with MDA8 ozone concentrations, changing with seasons and regions, were frequently associated with the six meteorological parameters: boundary layer height, humidity, wind direction, surface solar radiation, total cloud cover and sea level pressure. Temperature rarely became a significant variable in any season, which could partly explain the peak of monthly average ozone concentrations in October in Hong Kong. And we found the effect of solar radiation would be enhanced during extremely ozone pollution episodes (i.e., the 99th percentile). Finally, meteorological effects on MDA8 ozone had no significant changes before and after the 2010 Asian Games.
2015-07-15
Long-term effects on cancer survivors’ quality of life of physical training versus physical training combined with cognitive-behavioral therapy ...COMPARISON OF NEURAL NETWORK AND LINEAR REGRESSION MODELS IN STATISTICALLY PREDICTING MENTAL AND PHYSICAL HEALTH STATUS OF BREAST...34Comparison of Neural Network and Linear Regression Models in Statistically Predicting Mental and Physical Health Status of Breast Cancer Survivors
ERIC Educational Resources Information Center
Rocconi, Louis M.
2011-01-01
Hierarchical linear models (HLM) solve the problems associated with the unit of analysis problem such as misestimated standard errors, heterogeneity of regression and aggregation bias by modeling all levels of interest simultaneously. Hierarchical linear modeling resolves the problem of misestimated standard errors by incorporating a unique random…
Haem, Elham; Harling, Kajsa; Ayatollahi, Seyyed Mohammad Taghi; Zare, Najaf; Karlsson, Mats O
2017-02-01
One important aim in population pharmacokinetics (PK) and pharmacodynamics is identification and quantification of the relationships between the parameters and covariates. Lasso has been suggested as a technique for simultaneous estimation and covariate selection. In linear regression, it has been shown that Lasso possesses no oracle properties, which means it asymptotically performs as though the true underlying model was given in advance. Adaptive Lasso (ALasso) with appropriate initial weights is claimed to possess oracle properties; however, it can lead to poor predictive performance when there is multicollinearity between covariates. This simulation study implemented a new version of ALasso, called adjusted ALasso (AALasso), to take into account the ratio of the standard error of the maximum likelihood (ML) estimator to the ML coefficient as the initial weight in ALasso to deal with multicollinearity in non-linear mixed-effect models. The performance of AALasso was compared with that of ALasso and Lasso. PK data was simulated in four set-ups from a one-compartment bolus input model. Covariates were created by sampling from a multivariate standard normal distribution with no, low (0.2), moderate (0.5) or high (0.7) correlation. The true covariates influenced only clearance at different magnitudes. AALasso, ALasso and Lasso were compared in terms of mean absolute prediction error and error of the estimated covariate coefficient. The results show that AALasso performed better in small data sets, even in those in which a high correlation existed between covariates. This makes AALasso a promising method for covariate selection in nonlinear mixed-effect models.
Omedo, Irene; Mogeni, Polycarp; Bousema, Teun; Rockett, Kirk; Amambua-Ngwa, Alfred; Oyier, Isabella; C. Stevenson, Jennifer; Y. Baidjoe, Amrish; de Villiers, Etienne P.; Fegan, Greg; Ross, Amanda; Hubbart, Christina; Jeffreys, Anne; N. Williams, Thomas; Kwiatkowski, Dominic; Bejon, Philip
2017-01-01
Background: The first models of malaria transmission assumed a completely mixed and homogeneous population of parasites. Recent models include spatial heterogeneity and variably mixed populations. However, there are few empiric estimates of parasite mixing with which to parametize such models. Methods: Here we genotype 276 single nucleotide polymorphisms (SNPs) in 5199 P. falciparum isolates from two Kenyan sites (Kilifi county and Rachuonyo South district) and one Gambian site (Kombo coastal districts) to determine the spatio-temporal extent of parasite mixing, and use Principal Component Analysis (PCA) and linear regression to examine the relationship between genetic relatedness and distance in space and time for parasite pairs. Results: Using 107, 177 and 82 SNPs that were successfully genotyped in 133, 1602, and 1034 parasite isolates from The Gambia, Kilifi and Rachuonyo South district, respectively, we show that there are no discrete geographically restricted parasite sub-populations, but instead we see a diffuse spatio-temporal structure to parasite genotypes. Genetic relatedness of sample pairs is predicted by relatedness in space and time. Conclusions: Our findings suggest that targeted malaria control will benefit the surrounding community, but unfortunately also that emerging drug resistance will spread rapidly through the population. PMID:28612053
The effects of climate change on harp seals (Pagophilus groenlandicus).
Johnston, David W; Bowers, Matthew T; Friedlaender, Ari S; Lavigne, David M
2012-01-01
Harp seals (Pagophilus groenlandicus) have evolved life history strategies to exploit seasonal sea ice as a breeding platform. As such, individuals are prepared to deal with fluctuations in the quantity and quality of ice in their breeding areas. It remains unclear, however, how shifts in climate may affect seal populations. The present study assesses the effects of climate change on harp seals through three linked analyses. First, we tested the effects of short-term climate variability on young-of-the year harp seal mortality using a linear regression of sea ice cover in the Gulf of St. Lawrence against stranding rates of dead harp seals in the region during 1992 to 2010. A similar regression of stranding rates and North Atlantic Oscillation (NAO) index values was also conducted. These analyses revealed negative correlations between both ice cover and NAO conditions and seal mortality, indicating that lighter ice cover and lower NAO values result in higher mortality. A retrospective cross-correlation analysis of NAO conditions and sea ice cover from 1978 to 2011 revealed that NAO-related changes in sea ice may have contributed to the depletion of seals on the east coast of Canada during 1950 to 1972, and to their recovery during 1973 to 2000. This historical retrospective also reveals opposite links between neonatal mortality in harp seals in the Northeast Atlantic and NAO phase. Finally, an assessment of the long-term trends in sea ice cover in the breeding regions of harp seals across the entire North Atlantic during 1979 through 2011 using multiple linear regression models and mixed effects linear regression models revealed that sea ice cover in all harp seal breeding regions has been declining by as much as 6 percent per decade over the time series of available satellite data.
The Effects of Climate Change on Harp Seals (Pagophilus groenlandicus)
Johnston, David W.; Bowers, Matthew T.; Friedlaender, Ari S.; Lavigne, David M.
2012-01-01
Harp seals (Pagophilus groenlandicus) have evolved life history strategies to exploit seasonal sea ice as a breeding platform. As such, individuals are prepared to deal with fluctuations in the quantity and quality of ice in their breeding areas. It remains unclear, however, how shifts in climate may affect seal populations. The present study assesses the effects of climate change on harp seals through three linked analyses. First, we tested the effects of short-term climate variability on young-of-the year harp seal mortality using a linear regression of sea ice cover in the Gulf of St. Lawrence against stranding rates of dead harp seals in the region during 1992 to 2010. A similar regression of stranding rates and North Atlantic Oscillation (NAO) index values was also conducted. These analyses revealed negative correlations between both ice cover and NAO conditions and seal mortality, indicating that lighter ice cover and lower NAO values result in higher mortality. A retrospective cross-correlation analysis of NAO conditions and sea ice cover from 1978 to 2011 revealed that NAO-related changes in sea ice may have contributed to the depletion of seals on the east coast of Canada during 1950 to 1972, and to their recovery during 1973 to 2000. This historical retrospective also reveals opposite links between neonatal mortality in harp seals in the Northeast Atlantic and NAO phase. Finally, an assessment of the long-term trends in sea ice cover in the breeding regions of harp seals across the entire North Atlantic during 1979 through 2011 using multiple linear regression models and mixed effects linear regression models revealed that sea ice cover in all harp seal breeding regions has been declining by as much as 6 percent per decade over the time series of available satellite data. PMID:22238591
A SEMIPARAMETRIC BAYESIAN MODEL FOR CIRCULAR-LINEAR REGRESSION
We present a Bayesian approach to regress a circular variable on a linear predictor. The regression coefficients are assumed to have a nonparametric distribution with a Dirichlet process prior. The semiparametric Bayesian approach gives added flexibility to the model and is usefu...
Watanabe, Hiroyuki; Miyazaki, Hiroyasu
2006-01-01
Over- and/or under-correction of QT intervals for changes in heart rate may lead to misleading conclusions and/or masking the potential of a drug to prolong the QT interval. This study examines a nonparametric regression model (Loess Smoother) to adjust the QT interval for differences in heart rate, with an improved fitness over a wide range of heart rates. 240 sets of (QT, RR) observations collected from each of 8 conscious and non-treated beagle dogs were used as the materials for investigation. The fitness of the nonparametric regression model to the QT-RR relationship was compared with four models (individual linear regression, common linear regression, and Bazett's and Fridericia's correlation models) with reference to Akaike's Information Criterion (AIC). Residuals were visually assessed. The bias-corrected AIC of the nonparametric regression model was the best of the models examined in this study. Although the parametric models did not fit, the nonparametric regression model improved the fitting at both fast and slow heart rates. The nonparametric regression model is the more flexible method compared with the parametric method. The mathematical fit for linear regression models was unsatisfactory at both fast and slow heart rates, while the nonparametric regression model showed significant improvement at all heart rates in beagle dogs.
Anderson, Carl A; McRae, Allan F; Visscher, Peter M
2006-07-01
Standard quantitative trait loci (QTL) mapping techniques commonly assume that the trait is both fully observed and normally distributed. When considering survival or age-at-onset traits these assumptions are often incorrect. Methods have been developed to map QTL for survival traits; however, they are both computationally intensive and not available in standard genome analysis software packages. We propose a grouped linear regression method for the analysis of continuous survival data. Using simulation we compare this method to both the Cox and Weibull proportional hazards models and a standard linear regression method that ignores censoring. The grouped linear regression method is of equivalent power to both the Cox and Weibull proportional hazards methods and is significantly better than the standard linear regression method when censored observations are present. The method is also robust to the proportion of censored individuals and the underlying distribution of the trait. On the basis of linear regression methodology, the grouped linear regression model is computationally simple and fast and can be implemented readily in freely available statistical software.
Rasmussen, Patrick P.; Gray, John R.; Glysson, G. Douglas; Ziegler, Andrew C.
2009-01-01
In-stream continuous turbidity and streamflow data, calibrated with measured suspended-sediment concentration data, can be used to compute a time series of suspended-sediment concentration and load at a stream site. Development of a simple linear (ordinary least squares) regression model for computing suspended-sediment concentrations from instantaneous turbidity data is the first step in the computation process. If the model standard percentage error (MSPE) of the simple linear regression model meets a minimum criterion, this model should be used to compute a time series of suspended-sediment concentrations. Otherwise, a multiple linear regression model using paired instantaneous turbidity and streamflow data is developed and compared to the simple regression model. If the inclusion of the streamflow variable proves to be statistically significant and the uncertainty associated with the multiple regression model results in an improvement over that for the simple linear model, the turbidity-streamflow multiple linear regression model should be used to compute a suspended-sediment concentration time series. The computed concentration time series is subsequently used with its paired streamflow time series to compute suspended-sediment loads by standard U.S. Geological Survey techniques. Once an acceptable regression model is developed, it can be used to compute suspended-sediment concentration beyond the period of record used in model development with proper ongoing collection and analysis of calibration samples. Regression models to compute suspended-sediment concentrations are generally site specific and should never be considered static, but they represent a set period in a continually dynamic system in which additional data will help verify any change in sediment load, type, and source.
Ulibarri, Monica D; Hiller, Sarah P; Lozada, Remedios; Rangel, M Gudelia; Stockman, Jamila K; Silverman, Jay G; Ojeda, Victoria D
2013-01-01
This mixed methods study examined the prevalence and characteristics of physical and sexual abuse and depression symptoms among 624 injection drug-using female sex workers (FSW-IDUs) in Tijuana and Ciudad Juarez, Mexico; a subset of 47 from Tijuana also underwent qualitative interviews. Linear regressions identified correlates of current depression symptoms. In the interviews, FSW-IDUs identified drug use as a method of coping with the trauma they experienced from abuse that occurred before and after age 18 and during the course of sex work. In a multivariate linear regression model, two factors-ever experiencing forced sex and forced sex in the context of sex work-were significantly associated with higher levels of depression symptoms. Our findings suggest the need for integrated mental health and drug abuse services for FSW-IDUs addressing history of trauma as well as for further research on violence revictimization in the context of sex work in Mexico.
Ulibarri, Monica D.; Hiller, Sarah P.; Lozada, Remedios; Rangel, M. Gudelia; Stockman, Jamila K.; Silverman, Jay G.; Ojeda, Victoria D.
2013-01-01
This mixed methods study examined the prevalence and characteristics of physical and sexual abuse and depression symptoms among 624 injection drug-using female sex workers (FSW-IDUs) in Tijuana and Ciudad Juarez, Mexico; a subset of 47 from Tijuana also underwent qualitative interviews. Linear regressions identified correlates of current depression symptoms. In the interviews, FSW-IDUs identified drug use as a method of coping with the trauma they experienced from abuse that occurred before and after age 18 and during the course of sex work. In a multivariate linear regression model, two factors—ever experiencing forced sex and forced sex in the context of sex work—were significantly associated with higher levels of depression symptoms. Our findings suggest the need for integrated mental health and drug abuse services for FSW-IDUs addressing history of trauma as well as for further research on violence revictimization in the context of sex work in Mexico. PMID:23737808
Nikoloulopoulos, Aristidis K
2017-10-01
A bivariate copula mixed model has been recently proposed to synthesize diagnostic test accuracy studies and it has been shown that it is superior to the standard generalized linear mixed model in this context. Here, we call trivariate vine copulas to extend the bivariate meta-analysis of diagnostic test accuracy studies by accounting for disease prevalence. Our vine copula mixed model includes the trivariate generalized linear mixed model as a special case and can also operate on the original scale of sensitivity, specificity, and disease prevalence. Our general methodology is illustrated by re-analyzing the data of two published meta-analyses. Our study suggests that there can be an improvement on trivariate generalized linear mixed model in fit to data and makes the argument for moving to vine copula random effects models especially because of their richness, including reflection asymmetric tail dependence, and computational feasibility despite their three dimensionality.
Yoneoka, Daisuke; Henmi, Masayuki
2017-11-30
Recently, the number of clinical prediction models sharing the same regression task has increased in the medical literature. However, evidence synthesis methodologies that use the results of these regression models have not been sufficiently studied, particularly in meta-analysis settings where only regression coefficients are available. One of the difficulties lies in the differences between the categorization schemes of continuous covariates across different studies. In general, categorization methods using cutoff values are study specific across available models, even if they focus on the same covariates of interest. Differences in the categorization of covariates could lead to serious bias in the estimated regression coefficients and thus in subsequent syntheses. To tackle this issue, we developed synthesis methods for linear regression models with different categorization schemes of covariates. A 2-step approach to aggregate the regression coefficient estimates is proposed. The first step is to estimate the joint distribution of covariates by introducing a latent sampling distribution, which uses one set of individual participant data to estimate the marginal distribution of covariates with categorization. The second step is to use a nonlinear mixed-effects model with correction terms for the bias due to categorization to estimate the overall regression coefficients. Especially in terms of precision, numerical simulations show that our approach outperforms conventional methods, which only use studies with common covariates or ignore the differences between categorization schemes. The method developed in this study is also applied to a series of WHO epidemiologic studies on white blood cell counts. Copyright © 2017 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
Wang, Fei; Zhang, Yijun; Zheng, Dong; Xu, Liangtao; Zhang, Wenjuan; Meng, Qing
2017-10-01
A three-dimensional charge-discharge numerical model is used, in a semi-idealized mode, to simulate a thunder-storm cell. Characteristics of the graupel microphysics and vertical air motion associated with the lightning initiation are revealed, which could be useful in retrieving charge strength during lightning when no charge-discharge model is available. The results show that the vertical air motion at the lightning initiation sites ( W ini) has a cubic polynomial correlation with the maximum updraft of the storm cell ( W cell-max), with the adjusted regression coefficient R 2 of approximately 0.97. Meanwhile, the graupel mixing ratio at the lightning initiation sites ( q g-ini) has a linear correlation with the maximum graupel mixing ratio of the storm cell ( q g-cell-max) and the initiation height ( z ini), with the coefficients being 0.86 and 0.85, respectively. These linear correlations are more significant during the middle and late stages of lightning activity. A zero-charge zone, namely, the area with very low net charge density between the main positive and negative charge layers, appears above the area of q g-cell-max and below the upper edge of the graupel region, and is found to be an important area for lightning initiation. Inside the zero-charge zone, large electric intensity forms, and the ratio of q ice (ice crystal mixing ratio) to q g (graupel mixing ratio) illustrates an exponential relationship to q g-ini. These relationships provide valuable clues to more accurately locating the high-risk area of lightning initiation in thunderstorms when only dual-polarization radar data or outputs from numerical models without charging/discharging schemes are available. The results can also help understand the environmental conditions at lightning initiation sites.
Srinivas, Nuggehally R
2016-01-01
Although an optimized delivery of rivastigmine for management of Alzhiemer disease (AD) is provided by the transdermal patch, it is critical to establish a limited sampling strategy for the measurement of exposure of rivastigmine/NAP226-90. The relationship Cmax versus AUC0-24h for rivastigmine/NAP226-90 was established by regression models. The derived regression equations enabled the prediction AUC0-24h for rivastigmine and NAP226-90. Models were evaluated using statistical criteria. Mixed model was used to predict AUC0-24h for rivastigmine/NAP226-90 from time points such as 8 (C8h), 12 (C12h), and 18 (C18h) hours. Excellent correlation was established for between Cmax and AUC0-24h for rivastigmine and NAP226-90. AUC0-24h predictions of either rivastigmine or NAP226-90 were within 0.8- to 1.25-fold difference. The RMSE in the AUC0-24h predictions ranged from 17.6% to 21.9%, and the R for prediction were greater than 0.96 for both rivastigmine and NAP226-90. Use of mixed model for C8h, C12h, and C18h resulted in AUC0-24h within 1.5-fold difference for rivastigmine or NAP226-90. Cmax of rivastigmine and NAP226-90 was highly correlated with the corresponding AUC0-24h values confirming the role of a time point closer to Cmax for an effective AUC measurement of rivastigmine or the metabolite.
New methodology for modeling annual-aircraft emissions at airports
DOE Office of Scientific and Technical Information (OSTI.GOV)
Woodmansey, B.G.; Patterson, J.G.
An as-accurate-as-possible estimation of total-aircraft emissions are an essential component of any environmental-impact assessment done for proposed expansions at major airports. To determine the amount of emissions generated by aircraft using present models it is necessary to know the emission characteristics of all engines that are on all planes using the airport. However, the published data base does not cover all engine types and, therefore, a new methodology is needed to assist in estimating annual emissions from aircraft at airports. Linear regression equations relating quantity of emissions to aircraft weight using a known-fleet mix are developed in this paper. Total-annualmore » emissions for CO, NO[sub x], NMHC, SO[sub x], CO[sub 2], and N[sub 2]O are tabulated for Toronto's international airport for 1990. The regression equations are statistically significant for all emissions except for NMHC from large jets and NO[sub x] and NMHC for piston-engine aircraft. This regression model is a relatively simple, fast, and inexpensive method of obtaining an annual-emission inventory for an airport.« less
NASA Astrophysics Data System (ADS)
Mahaboob, B.; Venkateswarlu, B.; Sankar, J. Ravi; Balasiddamuni, P.
2017-11-01
This paper uses matrix calculus techniques to obtain Nonlinear Least Squares Estimator (NLSE), Maximum Likelihood Estimator (MLE) and Linear Pseudo model for nonlinear regression model. David Pollard and Peter Radchenko [1] explained analytic techniques to compute the NLSE. However the present research paper introduces an innovative method to compute the NLSE using principles in multivariate calculus. This study is concerned with very new optimization techniques used to compute MLE and NLSE. Anh [2] derived NLSE and MLE of a heteroscedatistic regression model. Lemcoff [3] discussed a procedure to get linear pseudo model for nonlinear regression model. In this research article a new technique is developed to get the linear pseudo model for nonlinear regression model using multivariate calculus. The linear pseudo model of Edmond Malinvaud [4] has been explained in a very different way in this paper. David Pollard et.al used empirical process techniques to study the asymptotic of the LSE (Least-squares estimation) for the fitting of nonlinear regression function in 2006. In Jae Myung [13] provided a go conceptual for Maximum likelihood estimation in his work “Tutorial on maximum likelihood estimation
ERIC Educational Resources Information Center
Pliszka, Steven R.; Matthews, Thomas L.; Braslow, Kenneth J.; Watson, Melissa A.
2006-01-01
Objective: To determine whether methylphenidate (MPH) and mixed salts amphetamine (MSA) have different effects on growth in children with attention-deficit/hyperactivity disorder. Method: Patients treated for at least 1 year with MPH or MSA were identified. A linear regression was performed to determine the effect of stimulant type, patient…
Functional Additive Mixed Models
Scheipl, Fabian; Staicu, Ana-Maria; Greven, Sonja
2014-01-01
We propose an extensive framework for additive regression models for correlated functional responses, allowing for multiple partially nested or crossed functional random effects with flexible correlation structures for, e.g., spatial, temporal, or longitudinal functional data. Additionally, our framework includes linear and nonlinear effects of functional and scalar covariates that may vary smoothly over the index of the functional response. It accommodates densely or sparsely observed functional responses and predictors which may be observed with additional error and includes both spline-based and functional principal component-based terms. Estimation and inference in this framework is based on standard additive mixed models, allowing us to take advantage of established methods and robust, flexible algorithms. We provide easy-to-use open source software in the pffr() function for the R-package refund. Simulations show that the proposed method recovers relevant effects reliably, handles small sample sizes well and also scales to larger data sets. Applications with spatially and longitudinally observed functional data demonstrate the flexibility in modeling and interpretability of results of our approach. PMID:26347592
Functional Additive Mixed Models.
Scheipl, Fabian; Staicu, Ana-Maria; Greven, Sonja
2015-04-01
We propose an extensive framework for additive regression models for correlated functional responses, allowing for multiple partially nested or crossed functional random effects with flexible correlation structures for, e.g., spatial, temporal, or longitudinal functional data. Additionally, our framework includes linear and nonlinear effects of functional and scalar covariates that may vary smoothly over the index of the functional response. It accommodates densely or sparsely observed functional responses and predictors which may be observed with additional error and includes both spline-based and functional principal component-based terms. Estimation and inference in this framework is based on standard additive mixed models, allowing us to take advantage of established methods and robust, flexible algorithms. We provide easy-to-use open source software in the pffr() function for the R-package refund. Simulations show that the proposed method recovers relevant effects reliably, handles small sample sizes well and also scales to larger data sets. Applications with spatially and longitudinally observed functional data demonstrate the flexibility in modeling and interpretability of results of our approach.
ERIC Educational Resources Information Center
Preacher, Kristopher J.; Curran, Patrick J.; Bauer, Daniel J.
2006-01-01
Simple slopes, regions of significance, and confidence bands are commonly used to evaluate interactions in multiple linear regression (MLR) models, and the use of these techniques has recently been extended to multilevel or hierarchical linear modeling (HLM) and latent curve analysis (LCA). However, conducting these tests and plotting the…
[Application of artificial neural networks on the prediction of surface ozone concentrations].
Shen, Lu-Lu; Wang, Yu-Xuan; Duan, Lei
2011-08-01
Ozone is an important secondary air pollutant in the lower atmosphere. In order to predict the hourly maximum ozone one day in advance based on the meteorological variables for the Wanqingsha site in Guangzhou, Guangdong province, a neural network model (Multi-Layer Perceptron) and a multiple linear regression model were used and compared. Model inputs are meteorological parameters (wind speed, wind direction, air temperature, relative humidity, barometric pressure and solar radiation) of the next day and hourly maximum ozone concentration of the previous day. The OBS (optimal brain surgeon) was adopted to prune the neutral work, to reduce its complexity and to improve its generalization ability. We find that the pruned neural network has the capacity to predict the peak ozone, with an agreement index of 92.3%, the root mean square error of 0.0428 mg/m3, the R-square of 0.737 and the success index of threshold exceedance 77.0% (the threshold O3 mixing ratio of 0.20 mg/m3). When the neural classifier was added to the neural network model, the success index of threshold exceedance increased to 83.6%. Through comparison of the performance indices between the multiple linear regression model and the neural network model, we conclud that that neural network is a better choice to predict peak ozone from meteorological forecast, which may be applied to practical prediction of ozone concentration.
Hanks, Ephraim M.; Schliep, Erin M.; Hooten, Mevin B.; Hoeting, Jennifer A.
2015-01-01
In spatial generalized linear mixed models (SGLMMs), covariates that are spatially smooth are often collinear with spatially smooth random effects. This phenomenon is known as spatial confounding and has been studied primarily in the case where the spatial support of the process being studied is discrete (e.g., areal spatial data). In this case, the most common approach suggested is restricted spatial regression (RSR) in which the spatial random effects are constrained to be orthogonal to the fixed effects. We consider spatial confounding and RSR in the geostatistical (continuous spatial support) setting. We show that RSR provides computational benefits relative to the confounded SGLMM, but that Bayesian credible intervals under RSR can be inappropriately narrow under model misspecification. We propose a posterior predictive approach to alleviating this potential problem and discuss the appropriateness of RSR in a variety of situations. We illustrate RSR and SGLMM approaches through simulation studies and an analysis of malaria frequencies in The Gambia, Africa.
ERIC Educational Resources Information Center
Li, Deping; Oranje, Andreas
2007-01-01
Two versions of a general method for approximating standard error of regression effect estimates within an IRT-based latent regression model are compared. The general method is based on Binder's (1983) approach, accounting for complex samples and finite populations by Taylor series linearization. In contrast, the current National Assessment of…
Case-mix groups for VA hospital-based home care.
Smith, M E; Baker, C R; Branch, L G; Walls, R C; Grimes, R M; Karklins, J M; Kashner, M; Burrage, R; Parks, A; Rogers, P
1992-01-01
The purpose of this study is to group hospital-based home care (HBHC) patients homogeneously by their characteristics with respect to cost of care to develop alternative case mix methods for management and reimbursement (allocation) purposes. Six Veterans Affairs (VA) HBHC programs in Fiscal Year (FY) 1986 that maximized patient, program, and regional variation were selected, all of which agreed to participate. All HBHC patients active in each program on October 1, 1987, in addition to all new admissions through September 30, 1988 (FY88), comprised the sample of 874 unique patients. Statistical methods include the use of classification and regression trees (CART software: Statistical Software; Lafayette, CA), analysis of variance, and multiple linear regression techniques. The resulting algorithm is a three-factor model that explains 20% of the cost variance (R2 = 20%, with a cross validation R2 of 12%). Similar classifications such as the RUG-II, which is utilized for VA nursing home and intermediate care, the VA outpatient resource allocation model, and the RUG-HHC, utilized in some states for reimbursing home health care in the private sector, explained less of the cost variance and, therefore, are less adequate for VA home care resource allocation.
Non-Linear Concentration-Response Relationships between Ambient Ozone and Daily Mortality.
Bae, Sanghyuk; Lim, Youn-Hee; Kashima, Saori; Yorifuji, Takashi; Honda, Yasushi; Kim, Ho; Hong, Yun-Chul
2015-01-01
Ambient ozone (O3) concentration has been reported to be significantly associated with mortality. However, linearity of the relationships and the presence of a threshold has been controversial. The aim of the present study was to examine the concentration-response relationship and threshold of the association between ambient O3 concentration and non-accidental mortality in 13 Japanese and Korean cities from 2000 to 2009. We selected Japanese and Korean cities which have population of over 1 million. We constructed Poisson regression models adjusting daily mean temperature, daily mean PM10, humidity, time trend, season, year, day of the week, holidays and yearly population. The association between O3 concentration and mortality was examined using linear, spline and linear-threshold models. The thresholds were estimated for each city, by constructing linear-threshold models. We also examined the city-combined association using a generalized additive mixed model. The mean O3 concentration did not differ greatly between Korea and Japan, which were 26.2 ppb and 24.2 ppb, respectively. Seven out of 13 cities showed better fits for the spline model compared with the linear model, supporting a non-linear relationships between O3 concentration and mortality. All of the 7 cities showed J or U shaped associations suggesting the existence of thresholds. The range of city-specific thresholds was from 11 to 34 ppb. The city-combined analysis also showed a non-linear association with a threshold around 30-40 ppb. We have observed non-linear concentration-response relationship with thresholds between daily mean ambient O3 concentration and daily number of non-accidental death in Japanese and Korean cities.
Model Selection with the Linear Mixed Model for Longitudinal Data
ERIC Educational Resources Information Center
Ryoo, Ji Hoon
2011-01-01
Model building or model selection with linear mixed models (LMMs) is complicated by the presence of both fixed effects and random effects. The fixed effects structure and random effects structure are codependent, so selection of one influences the other. Most presentations of LMM in psychology and education are based on a multilevel or…
Regression Model Term Selection for the Analysis of Strain-Gage Balance Calibration Data
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert Manfred; Volden, Thomas R.
2010-01-01
The paper discusses the selection of regression model terms for the analysis of wind tunnel strain-gage balance calibration data. Different function class combinations are presented that may be used to analyze calibration data using either a non-iterative or an iterative method. The role of the intercept term in a regression model of calibration data is reviewed. In addition, useful algorithms and metrics originating from linear algebra and statistics are recommended that will help an analyst (i) to identify and avoid both linear and near-linear dependencies between regression model terms and (ii) to make sure that the selected regression model of the calibration data uses only statistically significant terms. Three different tests are suggested that may be used to objectively assess the predictive capability of the final regression model of the calibration data. These tests use both the original data points and regression model independent confirmation points. Finally, data from a simplified manual calibration of the Ames MK40 balance is used to illustrate the application of some of the metrics and tests to a realistic calibration data set.
MULTIVARIATE LINEAR MIXED MODELS FOR MULTIPLE OUTCOMES. (R824757)
We propose a multivariate linear mixed (MLMM) for the analysis of multiple outcomes, which generalizes the latent variable model of Sammel and Ryan. The proposed model assumes a flexible correlation structure among the multiple outcomes, and allows a global test of the impact of ...
Ice cream structural elements that affect melting rate and hardness.
Muse, M R; Hartel, R W
2004-01-01
Statistical models were developed to reveal which structural elements of ice cream affect melting rate and hardness. Ice creams were frozen in a batch freezer with three types of sweetener, three levels of the emulsifier polysorbate 80, and two different draw temperatures to produce ice creams with a range of microstructures. Ice cream mixes were analyzed for viscosity, and finished ice creams were analyzed for air cell and ice crystal size, overrun, and fat destabilization. The ice phase volume of each ice cream were calculated based on the freezing point of the mix. Melting rate and hardness of each hardened ice cream was measured and correlated with the structural attributes by using analysis of variance and multiple linear regression. Fat destabilization, ice crystal size, and the consistency coefficient of the mix were found to affect the melting rate of ice cream, whereas hardness was influenced by ice phase volume, ice crystal size, overrun, fat destabilization, and the rheological properties of the mix.
Validation of ACG Case-mix for equitable resource allocation in Swedish primary health care.
Zielinski, Andrzej; Kronogård, Maria; Lenhoff, Håkan; Halling, Anders
2009-09-18
Adequate resource allocation is an important factor to ensure equity in health care. Previous reimbursement models have been based on age, gender and socioeconomic factors. An explanatory model based on individual need of primary health care (PHC) has not yet been used in Sweden to allocate resources. The aim of this study was to examine to what extent the ACG case-mix system could explain concurrent costs in Swedish PHC. Diagnoses were obtained from electronic PHC records of inhabitants in Blekinge County (approx. 150,000) listed with public PHC (approx. 120,000) for three consecutive years, 2004-2006. The inhabitants were then classified into six different resource utilization bands (RUB) using the ACG case-mix system. The mean costs for primary health care were calculated for each RUB and year. Using linear regression models and log-cost as dependent variable the adjusted R2 was calculated in the unadjusted model (gender) and in consecutive models where age, listing with specific PHC and RUB were added. In an additional model the ACG groups were added. Gender, age and listing with specific PHC explained 14.48-14.88% of the variance in individual costs for PHC. By also adding information on level of co-morbidity, as measured by the ACG case-mix system, to specific PHC the adjusted R2 increased to 60.89-63.41%. The ACG case-mix system explains patient costs in primary care to a high degree. Age and gender are important explanatory factors, but most of the variance in concurrent patient costs was explained by the ACG case-mix system.
Equilibrium Climate Sensitivity Obtained From Multimillennial Runs of Two GFDL Climate Models
NASA Astrophysics Data System (ADS)
Paynter, D.; Frölicher, T. L.; Horowitz, L. W.; Silvers, L. G.
2018-02-01
Equilibrium climate sensitivity (ECS), defined as the long-term change in global mean surface air temperature in response to doubling atmospheric CO2, is usually computed from short atmospheric simulations over a mixed layer ocean, or inferred using a linear regression over a short-time period of adjustment. We report the actual ECS from multimillenial simulations of two Geophysical Fluid Dynamics Laboratory (GFDL) general circulation models (GCMs), ESM2M, and CM3 of 3.3 K and 4.8 K, respectively. Both values are 1 K higher than estimates for the same models reported in the Fifth Assessment Report of the Intergovernmental Panel on Climate Change obtained by regressing the Earth's energy imbalance against temperature. This underestimate is mainly due to changes in the climate feedback parameter (-α) within the first century after atmospheric CO2 has stabilized. For both GCMs it is possible to estimate ECS with linear regression to within 0.3 K by increasing CO2 at 1% per year to doubling and using years 51-350 after CO2 is constant. We show that changes in -α differ between the two GCMs and are strongly tied to the changes in both vertical velocity at 500 hPa (ω500) and estimated inversion strength that the GCMs experience during the progression toward the equilibrium. This suggests that while cloud physics parametrizations are important for determining the strength of -α, the substantially different atmospheric state resulting from a changed sea surface temperature pattern may be of equal importance.
Mikulich-Gilbertson, Susan K; Wagner, Brandie D; Grunwald, Gary K; Riggs, Paula D; Zerbe, Gary O
2018-01-01
Medical research is often designed to investigate changes in a collection of response variables that are measured repeatedly on the same subjects. The multivariate generalized linear mixed model (MGLMM) can be used to evaluate random coefficient associations (e.g. simple correlations, partial regression coefficients) among outcomes that may be non-normal and differently distributed by specifying a multivariate normal distribution for their random effects and then evaluating the latent relationship between them. Empirical Bayes predictors are readily available for each subject from any mixed model and are observable and hence, plotable. Here, we evaluate whether second-stage association analyses of empirical Bayes predictors from a MGLMM, provide a good approximation and visual representation of these latent association analyses using medical examples and simulations. Additionally, we compare these results with association analyses of empirical Bayes predictors generated from separate mixed models for each outcome, a procedure that could circumvent computational problems that arise when the dimension of the joint covariance matrix of random effects is large and prohibits estimation of latent associations. As has been shown in other analytic contexts, the p-values for all second-stage coefficients that were determined by naively assuming normality of empirical Bayes predictors provide a good approximation to p-values determined via permutation analysis. Analyzing outcomes that are interrelated with separate models in the first stage and then associating the resulting empirical Bayes predictors in a second stage results in different mean and covariance parameter estimates from the maximum likelihood estimates generated by a MGLMM. The potential for erroneous inference from using results from these separate models increases as the magnitude of the association among the outcomes increases. Thus if computable, scatterplots of the conditionally independent empirical Bayes predictors from a MGLMM are always preferable to scatterplots of empirical Bayes predictors generated by separate models, unless the true association between outcomes is zero.
Meng, Xia; Fu, Qingyan; Ma, Zongwei; Chen, Li; Zou, Bin; Zhang, Yan; Xue, Wenbo; Wang, Jinnan; Wang, Dongfang; Kan, Haidong; Liu, Yang
2016-01-01
Development of exposure assessment model is the key component for epidemiological studies concerning air pollution, but the evidence from China is limited. Therefore, a linear mixed effects (LME) model was established in this study in a Chinese metropolis by incorporating aerosol optical depth (AOD), meteorological information and the land use regression (LUR) model to predict ground PM10 levels on high spatiotemporal resolution. The cross validation (CV) R(2) and the RMSE of the LME model were 0.87 and 19.2 μg/m(3), respectively. The relative prediction error (RPE) of daily and annual mean predicted PM10 concentrations were 19.1% and 7.5%, respectively. This study was the first attempt in China to estimate both short-term and long-term variation of PM10 levels with high spatial resolution in a Chinese metropolis with the LME model. The results suggested that the LME model could provide exposure assessment for short-term and long-term epidemiological studies in China. Copyright © 2015 Elsevier Ltd. All rights reserved.
Element enrichment factor calculation using grain-size distribution and functional data regression.
Sierra, C; Ordóñez, C; Saavedra, A; Gallego, J R
2015-01-01
In environmental geochemistry studies it is common practice to normalize element concentrations in order to remove the effect of grain size. Linear regression with respect to a particular grain size or conservative element is a widely used method of normalization. In this paper, the utility of functional linear regression, in which the grain-size curve is the independent variable and the concentration of pollutant the dependent variable, is analyzed and applied to detrital sediment. After implementing functional linear regression and classical linear regression models to normalize and calculate enrichment factors, we concluded that the former regression technique has some advantages over the latter. First, functional linear regression directly considers the grain-size distribution of the samples as the explanatory variable. Second, as the regression coefficients are not constant values but functions depending on the grain size, it is easier to comprehend the relationship between grain size and pollutant concentration. Third, regularization can be introduced into the model in order to establish equilibrium between reliability of the data and smoothness of the solutions. Copyright © 2014 Elsevier Ltd. All rights reserved.
A Linear Regression and Markov Chain Model for the Arabian Horse Registry
1993-04-01
as a tax deduction? Yes No T-4367 68 26. Regardless of previous equine tax deductions, do you consider your current horse activities to be... (Mark one...E L T-4367 A Linear Regression and Markov Chain Model For the Arabian Horse Registry Accesion For NTIS CRA&I UT 7 4:iC=D 5 D-IC JA" LI J:13tjlC,3 lO...the Arabian Horse Registry, which needed to forecast its future registration of purebred Arabian horses . A linear regression model was utilized to
A primer for biomedical scientists on how to execute model II linear regression analysis.
Ludbrook, John
2012-04-01
1. There are two very different ways of executing linear regression analysis. One is Model I, when the x-values are fixed by the experimenter. The other is Model II, in which the x-values are free to vary and are subject to error. 2. I have received numerous complaints from biomedical scientists that they have great difficulty in executing Model II linear regression analysis. This may explain the results of a Google Scholar search, which showed that the authors of articles in journals of physiology, pharmacology and biochemistry rarely use Model II regression analysis. 3. I repeat my previous arguments in favour of using least products linear regression analysis for Model II regressions. I review three methods for executing ordinary least products (OLP) and weighted least products (WLP) regression analysis: (i) scientific calculator and/or computer spreadsheet; (ii) specific purpose computer programs; and (iii) general purpose computer programs. 4. Using a scientific calculator and/or computer spreadsheet, it is easy to obtain correct values for OLP slope and intercept, but the corresponding 95% confidence intervals (CI) are inaccurate. 5. Using specific purpose computer programs, the freeware computer program smatr gives the correct OLP regression coefficients and obtains 95% CI by bootstrapping. In addition, smatr can be used to compare the slopes of OLP lines. 6. When using general purpose computer programs, I recommend the commercial programs systat and Statistica for those who regularly undertake linear regression analysis and I give step-by-step instructions in the Supplementary Information as to how to use loss functions. © 2011 The Author. Clinical and Experimental Pharmacology and Physiology. © 2011 Blackwell Publishing Asia Pty Ltd.
Linear regression in astronomy. II
NASA Technical Reports Server (NTRS)
Feigelson, Eric D.; Babu, Gutti J.
1992-01-01
A wide variety of least-squares linear regression procedures used in observational astronomy, particularly investigations of the cosmic distance scale, are presented and discussed. The classes of linear models considered are (1) unweighted regression lines, with bootstrap and jackknife resampling; (2) regression solutions when measurement error, in one or both variables, dominates the scatter; (3) methods to apply a calibration line to new data; (4) truncated regression models, which apply to flux-limited data sets; and (5) censored regression models, which apply when nondetections are present. For the calibration problem we develop two new procedures: a formula for the intercept offset between two parallel data sets, which propagates slope errors from one regression to the other; and a generalization of the Working-Hotelling confidence bands to nonstandard least-squares lines. They can provide improved error analysis for Faber-Jackson, Tully-Fisher, and similar cosmic distance scale relations.
Jose F. Negron; Willis C. Schaupp; Kenneth E. Gibson; John Anhold; Dawn Hansen; Ralph Thier; Phil Mocettini
1999-01-01
Data collected from Douglas-fir stands infected by the Douglas-fir beetle in Wyoming, Montana, Idaho, and Utah, were used to develop models to estimate amount of mortality in terms of basal area killed. Models were built using stepwise linear regression and regression tree approaches. Linear regression models using initial Douglas-fir basal area were built for all...
NASA Astrophysics Data System (ADS)
Samhouri, M.; Al-Ghandoor, A.; Fouad, R. H.
2009-08-01
In this study two techniques, for modeling electricity consumption of the Jordanian industrial sector, are presented: (i) multivariate linear regression and (ii) neuro-fuzzy models. Electricity consumption is modeled as function of different variables such as number of establishments, number of employees, electricity tariff, prevailing fuel prices, production outputs, capacity utilizations, and structural effects. It was found that industrial production and capacity utilization are the most important variables that have significant effect on future electrical power demand. The results showed that both the multivariate linear regression and neuro-fuzzy models are generally comparable and can be used adequately to simulate industrial electricity consumption. However, comparison that is based on the square root average squared error of data suggests that the neuro-fuzzy model performs slightly better for future prediction of electricity consumption than the multivariate linear regression model. Such results are in full agreement with similar work, using different methods, for other countries.
Rajeswaran, Jeevanantham; Blackstone, Eugene H
2017-02-01
In medical sciences, we often encounter longitudinal temporal relationships that are non-linear in nature. The influence of risk factors may also change across longitudinal follow-up. A system of multiphase non-linear mixed effects model is presented to model temporal patterns of longitudinal continuous measurements, with temporal decomposition to identify the phases and risk factors within each phase. Application of this model is illustrated using spirometry data after lung transplantation using readily available statistical software. This application illustrates the usefulness of our flexible model when dealing with complex non-linear patterns and time-varying coefficients.
Weichenthal, Scott; Ryswyk, Keith Van; Goldstein, Alon; Bagg, Scott; Shekkarizfard, Maryam; Hatzopoulou, Marianne
2016-04-01
Existing evidence suggests that ambient ultrafine particles (UFPs) (<0.1µm) may contribute to acute cardiorespiratory morbidity. However, few studies have examined the long-term health effects of these pollutants owing in part to a need for exposure surfaces that can be applied in large population-based studies. To address this need, we developed a land use regression model for UFPs in Montreal, Canada using mobile monitoring data collected from 414 road segments during the summer and winter months between 2011 and 2012. Two different approaches were examined for model development including standard multivariable linear regression and a machine learning approach (kernel-based regularized least squares (KRLS)) that learns the functional form of covariate impacts on ambient UFP concentrations from the data. The final models included parameters for population density, ambient temperature and wind speed, land use parameters (park space and open space), length of local roads and rail, and estimated annual average NOx emissions from traffic. The final multivariable linear regression model explained 62% of the spatial variation in ambient UFP concentrations whereas the KRLS model explained 79% of the variance. The KRLS model performed slightly better than the linear regression model when evaluated using an external dataset (R(2)=0.58 vs. 0.55) or a cross-validation procedure (R(2)=0.67 vs. 0.60). In general, our findings suggest that the KRLS approach may offer modest improvements in predictive performance compared to standard multivariable linear regression models used to estimate spatial variations in ambient UFPs. However, differences in predictive performance were not statistically significant when evaluated using the cross-validation procedure. Crown Copyright © 2015. Published by Elsevier Inc. All rights reserved.
Hemmila, April; McGill, Jim; Ritter, David
2008-03-01
To determine if changes in fingerprint infrared spectra linear with age can be found, partial least squares (PLS1) regression of 155 fingerprint infrared spectra against the person's age was constructed. The regression produced a linear model of age as a function of spectrum with a root mean square error of calibration of less than 4 years, showing an inflection at about 25 years of age. The spectral ranges emphasized by the regression do not correspond to the highest concentration constituents of the fingerprints. Separate linear regression models for old and young people can be constructed with even more statistical rigor. The success of the regression demonstrates that a combination of constituents can be found that changes linearly with age, with a significant shift around puberty.
Karabatsos, George
2017-02-01
Most of applied statistics involves regression analysis of data. In practice, it is important to specify a regression model that has minimal assumptions which are not violated by data, to ensure that statistical inferences from the model are informative and not misleading. This paper presents a stand-alone and menu-driven software package, Bayesian Regression: Nonparametric and Parametric Models, constructed from MATLAB Compiler. Currently, this package gives the user a choice from 83 Bayesian models for data analysis. They include 47 Bayesian nonparametric (BNP) infinite-mixture regression models; 5 BNP infinite-mixture models for density estimation; and 31 normal random effects models (HLMs), including normal linear models. Each of the 78 regression models handles either a continuous, binary, or ordinal dependent variable, and can handle multi-level (grouped) data. All 83 Bayesian models can handle the analysis of weighted observations (e.g., for meta-analysis), and the analysis of left-censored, right-censored, and/or interval-censored data. Each BNP infinite-mixture model has a mixture distribution assigned one of various BNP prior distributions, including priors defined by either the Dirichlet process, Pitman-Yor process (including the normalized stable process), beta (two-parameter) process, normalized inverse-Gaussian process, geometric weights prior, dependent Dirichlet process, or the dependent infinite-probits prior. The software user can mouse-click to select a Bayesian model and perform data analysis via Markov chain Monte Carlo (MCMC) sampling. After the sampling completes, the software automatically opens text output that reports MCMC-based estimates of the model's posterior distribution and model predictive fit to the data. Additional text and/or graphical output can be generated by mouse-clicking other menu options. This includes output of MCMC convergence analyses, and estimates of the model's posterior predictive distribution, for selected functionals and values of covariates. The software is illustrated through the BNP regression analysis of real data.
Orthogonal Regression: A Teaching Perspective
ERIC Educational Resources Information Center
Carr, James R.
2012-01-01
A well-known approach to linear least squares regression is that which involves minimizing the sum of squared orthogonal projections of data points onto the best fit line. This form of regression is known as orthogonal regression, and the linear model that it yields is known as the major axis. A similar method, reduced major axis regression, is…
Shi, J Q; Wang, B; Will, E J; West, R M
2012-11-20
We propose a new semiparametric model for functional regression analysis, combining a parametric mixed-effects model with a nonparametric Gaussian process regression model, namely a mixed-effects Gaussian process functional regression model. The parametric component can provide explanatory information between the response and the covariates, whereas the nonparametric component can add nonlinearity. We can model the mean and covariance structures simultaneously, combining the information borrowed from other subjects with the information collected from each individual subject. We apply the model to dose-response curves that describe changes in the responses of subjects for differing levels of the dose of a drug or agent and have a wide application in many areas. We illustrate the method for the management of renal anaemia. An individual dose-response curve is improved when more information is included by this mechanism from the subject/patient over time, enabling a patient-specific treatment regime. Copyright © 2012 John Wiley & Sons, Ltd.
ERIC Educational Resources Information Center
Rocconi, Louis M.
2013-01-01
This study examined the differing conclusions one may come to depending upon the type of analysis chosen, hierarchical linear modeling or ordinary least squares (OLS) regression. To illustrate this point, this study examined the influences of seniors' self-reported critical thinking abilities three ways: (1) an OLS regression with the student…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sun Wei; Huang, Guo H., E-mail: huang@iseis.org; Institute for Energy, Environment and Sustainable Communities, University of Regina, Regina, Saskatchewan, S4S 0A2
2012-06-15
Highlights: Black-Right-Pointing-Pointer Inexact piecewise-linearization-based fuzzy flexible programming is proposed. Black-Right-Pointing-Pointer It's the first application to waste management under multiple complexities. Black-Right-Pointing-Pointer It tackles nonlinear economies-of-scale effects in interval-parameter constraints. Black-Right-Pointing-Pointer It estimates costs more accurately than the linear-regression-based model. Black-Right-Pointing-Pointer Uncertainties are decreased and more satisfactory interval solutions are obtained. - Abstract: To tackle nonlinear economies-of-scale (EOS) effects in interval-parameter constraints for a representative waste management problem, an inexact piecewise-linearization-based fuzzy flexible programming (IPFP) model is developed. In IPFP, interval parameters for waste amounts and transportation/operation costs can be quantified; aspiration levels for net system costs, as well as tolerancemore » intervals for both capacities of waste treatment facilities and waste generation rates can be reflected; and the nonlinear EOS effects transformed from objective function to constraints can be approximated. An interactive algorithm is proposed for solving the IPFP model, which in nature is an interval-parameter mixed-integer quadratically constrained programming model. To demonstrate the IPFP's advantages, two alternative models are developed to compare their performances. One is a conventional linear-regression-based inexact fuzzy programming model (IPFP2) and the other is an IPFP model with all right-hand-sides of fussy constraints being the corresponding interval numbers (IPFP3). The comparison results between IPFP and IPFP2 indicate that the optimized waste amounts would have the similar patterns in both models. However, when dealing with EOS effects in constraints, the IPFP2 may underestimate the net system costs while the IPFP can estimate the costs more accurately. The comparison results between IPFP and IPFP3 indicate that their solutions would be significantly different. The decreased system uncertainties in IPFP's solutions demonstrate its effectiveness for providing more satisfactory interval solutions than IPFP3. Following its first application to waste management, the IPFP can be potentially applied to other environmental problems under multiple complexities.« less
Zhang, Xin; Liu, Pan; Chen, Yuguang; Bai, Lu; Wang, Wei
2014-01-01
The primary objective of this study was to identify whether the frequency of traffic conflicts at signalized intersections can be modeled. The opposing left-turn conflicts were selected for the development of conflict predictive models. Using data collected at 30 approaches at 20 signalized intersections, the underlying distributions of the conflicts under different traffic conditions were examined. Different conflict-predictive models were developed to relate the frequency of opposing left-turn conflicts to various explanatory variables. The models considered include a linear regression model, a negative binomial model, and separate models developed for four traffic scenarios. The prediction performance of different models was compared. The frequency of traffic conflicts follows a negative binominal distribution. The linear regression model is not appropriate for the conflict frequency data. In addition, drivers behaved differently under different traffic conditions. Accordingly, the effects of conflicting traffic volumes on conflict frequency vary across different traffic conditions. The occurrences of traffic conflicts at signalized intersections can be modeled using generalized linear regression models. The use of conflict predictive models has potential to expand the uses of surrogate safety measures in safety estimation and evaluation.
Statistical classification of drug incidents due to look-alike sound-alike mix-ups.
Wong, Zoie Shui Yee
2016-06-01
It has been recognised that medication names that look or sound similar are a cause of medication errors. This study builds statistical classifiers for identifying medication incidents due to look-alike sound-alike mix-ups. A total of 227 patient safety incident advisories related to medication were obtained from the Canadian Patient Safety Institute's Global Patient Safety Alerts system. Eight feature selection strategies based on frequent terms, frequent drug terms and constituent terms were performed. Statistical text classifiers based on logistic regression, support vector machines with linear, polynomial, radial-basis and sigmoid kernels and decision tree were trained and tested. The models developed achieved an average accuracy of above 0.8 across all the model settings. The receiver operating characteristic curves indicated the classifiers performed reasonably well. The results obtained in this study suggest that statistical text classification can be a feasible method for identifying medication incidents due to look-alike sound-alike mix-ups based on a database of advisories from Global Patient Safety Alerts. © The Author(s) 2014.
Wavelet regression model in forecasting crude oil price
NASA Astrophysics Data System (ADS)
Hamid, Mohd Helmie; Shabri, Ani
2017-05-01
This study presents the performance of wavelet multiple linear regression (WMLR) technique in daily crude oil forecasting. WMLR model was developed by integrating the discrete wavelet transform (DWT) and multiple linear regression (MLR) model. The original time series was decomposed to sub-time series with different scales by wavelet theory. Correlation analysis was conducted to assist in the selection of optimal decomposed components as inputs for the WMLR model. The daily WTI crude oil price series has been used in this study to test the prediction capability of the proposed model. The forecasting performance of WMLR model were also compared with regular multiple linear regression (MLR), Autoregressive Moving Average (ARIMA) and Generalized Autoregressive Conditional Heteroscedasticity (GARCH) using root mean square errors (RMSE) and mean absolute errors (MAE). Based on the experimental results, it appears that the WMLR model performs better than the other forecasting technique tested in this study.
Due to the complexity of the processes contributing to beach bacteria concentrations, many researchers rely on statistical modeling, among which multiple linear regression (MLR) modeling is most widely used. Despite its ease of use and interpretation, there may be time dependence...
USING LINEAR AND POLYNOMIAL MODELS TO EXAMINE THE ENVIRONMENTAL STABILITY OF VIRUSES
The article presents the development of model equations for describing the fate of viral infectivity in environmental samples. Most of the models were based upon the use of a two-step linear regression approach. The first step employs regression of log base 10 transformed viral t...
Linear regression metamodeling as a tool to summarize and present simulation model results.
Jalal, Hawre; Dowd, Bryan; Sainfort, François; Kuntz, Karen M
2013-10-01
Modelers lack a tool to systematically and clearly present complex model results, including those from sensitivity analyses. The objective was to propose linear regression metamodeling as a tool to increase transparency of decision analytic models and better communicate their results. We used a simplified cancer cure model to demonstrate our approach. The model computed the lifetime cost and benefit of 3 treatment options for cancer patients. We simulated 10,000 cohorts in a probabilistic sensitivity analysis (PSA) and regressed the model outcomes on the standardized input parameter values in a set of regression analyses. We used the regression coefficients to describe measures of sensitivity analyses, including threshold and parameter sensitivity analyses. We also compared the results of the PSA to deterministic full-factorial and one-factor-at-a-time designs. The regression intercept represented the estimated base-case outcome, and the other coefficients described the relative parameter uncertainty in the model. We defined simple relationships that compute the average and incremental net benefit of each intervention. Metamodeling produced outputs similar to traditional deterministic 1-way or 2-way sensitivity analyses but was more reliable since it used all parameter values. Linear regression metamodeling is a simple, yet powerful, tool that can assist modelers in communicating model characteristics and sensitivity analyses.
Scoring and staging systems using cox linear regression modeling and recursive partitioning.
Lee, J W; Um, S H; Lee, J B; Mun, J; Cho, H
2006-01-01
Scoring and staging systems are used to determine the order and class of data according to predictors. Systems used for medical data, such as the Child-Turcotte-Pugh scoring and staging systems for ordering and classifying patients with liver disease, are often derived strictly from physicians' experience and intuition. We construct objective and data-based scoring/staging systems using statistical methods. We consider Cox linear regression modeling and recursive partitioning techniques for censored survival data. In particular, to obtain a target number of stages we propose cross-validation and amalgamation algorithms. We also propose an algorithm for constructing scoring and staging systems by integrating local Cox linear regression models into recursive partitioning, so that we can retain the merits of both methods such as superior predictive accuracy, ease of use, and detection of interactions between predictors. The staging system construction algorithms are compared by cross-validation evaluation of real data. The data-based cross-validation comparison shows that Cox linear regression modeling is somewhat better than recursive partitioning when there are only continuous predictors, while recursive partitioning is better when there are significant categorical predictors. The proposed local Cox linear recursive partitioning has better predictive accuracy than Cox linear modeling and simple recursive partitioning. This study indicates that integrating local linear modeling into recursive partitioning can significantly improve prediction accuracy in constructing scoring and staging systems.
Ling, Ru; Liu, Jiawang
2011-12-01
To construct prediction model for health workforce and hospital beds in county hospitals of Hunan by multiple linear regression. We surveyed 16 counties in Hunan with stratified random sampling according to uniform questionnaires,and multiple linear regression analysis with 20 quotas selected by literature view was done. Independent variables in the multiple linear regression model on medical personnels in county hospitals included the counties' urban residents' income, crude death rate, medical beds, business occupancy, professional equipment value, the number of devices valued above 10 000 yuan, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, and utilization rate of hospital beds. Independent variables in the multiple linear regression model on county hospital beds included the the population of aged 65 and above in the counties, disposable income of urban residents, medical personnel of medical institutions in county area, business occupancy, the total value of professional equipment, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, utilization rate of hospital beds, and length of hospitalization. The prediction model shows good explanatory and fitting, and may be used for short- and mid-term forecasting.
Xiaoqiu Zuo; Urs Buehlmann; R. Edward Thomas
2004-01-01
Solving the least-cost lumber grade mix problem allows dimension mills to minimize the cost of dimension part production. This problem, due to its economic importance, has attracted much attention from researchers and industry in the past. Most solutions used linear programming models and assumed that a simple linear relationship existed between lumber grade mix and...
Kumar, K Vasanth; Sivanesan, S
2006-08-25
Pseudo second order kinetic expressions of Ho, Sobkowsk and Czerwinski, Blanachard et al. and Ritchie were fitted to the experimental kinetic data of malachite green onto activated carbon by non-linear and linear method. Non-linear method was found to be a better way of obtaining the parameters involved in the second order rate kinetic expressions. Both linear and non-linear regression showed that the Sobkowsk and Czerwinski and Ritchie's pseudo second order model were the same. Non-linear regression analysis showed that both Blanachard et al. and Ho have similar ideas on the pseudo second order model but with different assumptions. The best fit of experimental data in Ho's pseudo second order expression by linear and non-linear regression method showed that Ho pseudo second order model was a better kinetic expression when compared to other pseudo second order kinetic expressions. The amount of dye adsorbed at equilibrium, q(e), was predicted from Ho pseudo second order expression and were fitted to the Langmuir, Freundlich and Redlich Peterson expressions by both linear and non-linear method to obtain the pseudo isotherms. The best fitting pseudo isotherm was found to be the Langmuir and Redlich Peterson isotherm. Redlich Peterson is a special case of Langmuir when the constant g equals unity.
Leung, Michael; Bassani, Diego G; Racine-Poon, Amy; Goldenberg, Anna; Ali, Syed Asad; Kang, Gagandeep; Premkumar, Prasanna S; Roth, Daniel E
2017-09-10
Conditioning child growth measures on baseline accounts for regression to the mean (RTM). Here, we present the "conditional random slope" (CRS) model, based on a linear-mixed effects model that incorporates a baseline-time interaction term that can accommodate multiple data points for a child while also directly accounting for RTM. In two birth cohorts, we applied five approaches to estimate child growth velocities from 0 to 12 months to assess the effect of increasing data density (number of measures per child) on the magnitude of RTM of unconditional estimates, and the correlation and concordance between the CRS and four alternative metrics. Further, we demonstrated the differential effect of the choice of velocity metric on the magnitude of the association between infant growth and stunting at 2 years. RTM was minimally attenuated by increasing data density for unconditional growth modeling approaches. CRS and classical conditional models gave nearly identical estimates with two measures per child. Compared to the CRS estimates, unconditional metrics had moderate correlation (r = 0.65-0.91), but poor agreement in the classification of infants with relatively slow growth (kappa = 0.38-0.78). Estimates of the velocity-stunting association were the same for CRS and classical conditional models but differed substantially between conditional versus unconditional metrics. The CRS can leverage the flexibility of linear mixed models while addressing RTM in longitudinal analyses. © 2017 The Authors American Journal of Human Biology Published by Wiley Periodicals, Inc.
Predicting recycling behaviour: Comparison of a linear regression model and a fuzzy logic model.
Vesely, Stepan; Klöckner, Christian A; Dohnal, Mirko
2016-03-01
In this paper we demonstrate that fuzzy logic can provide a better tool for predicting recycling behaviour than the customarily used linear regression. To show this, we take a set of empirical data on recycling behaviour (N=664), which we randomly divide into two halves. The first half is used to estimate a linear regression model of recycling behaviour, and to develop a fuzzy logic model of recycling behaviour. As the first comparison, the fit of both models to the data included in estimation of the models (N=332) is evaluated. As the second comparison, predictive accuracy of both models for "new" cases (hold-out data not included in building the models, N=332) is assessed. In both cases, the fuzzy logic model significantly outperforms the regression model in terms of fit. To conclude, when accurate predictions of recycling and possibly other environmental behaviours are needed, fuzzy logic modelling seems to be a promising technique. Copyright © 2015 Elsevier Ltd. All rights reserved.
A Multiphase Non-Linear Mixed Effects Model: An Application to Spirometry after Lung Transplantation
Rajeswaran, Jeevanantham; Blackstone, Eugene H.
2014-01-01
In medical sciences, we often encounter longitudinal temporal relationships that are non-linear in nature. The influence of risk factors may also change across longitudinal follow-up. A system of multiphase non-linear mixed effects model is presented to model temporal patterns of longitudinal continuous measurements, with temporal decomposition to identify the phases and risk factors within each phase. Application of this model is illustrated using spirometry data after lung transplantation using readily available statistical software. This application illustrates the usefulness of our flexible model when dealing with complex non-linear patterns and time varying coefficients. PMID:24919830
Choudhary, Pushpa; Velaga, Nagendra R
2017-09-01
This study analysed and modelled the effects of conversation and texting (each with two difficulty levels) on driving performance of Indian drivers in terms of their mean speed and accident avoiding abilities; and further explored the relationship between speed reduction strategy of the drivers and their corresponding accident frequency. 100 drivers of three different age groups (young, mid-age and old-age) participated in the simulator study. Two sudden events of Indian context: unexpected crossing of pedestrians and joining of parked vehicles from road side, were simulated for estimating the accident probabilities. Generalized linear mixed models approach was used for developing linear regression models for mean speed and binary logistic regression models for accident probability. The results of the models showed that the drivers significantly compensated the increased workload by reducing their mean speed by 2.62m/s and 5.29m/s in the presence of conversation and texting tasks respectively. The logistic models for accident probabilities showed that the accident probabilities increased by 3 and 4 times respectively when the drivers were conversing or texting on a phone during driving. Further, the relationship between the speed reduction patterns and their corresponding accident frequencies showed that all the drivers compensated differently; but, among all the drivers, only few drivers, who compensated by reducing the speed by 30% or more, were able to fully offset the increased accident risk associated with the phone use. Copyright © 2017 Elsevier Ltd. All rights reserved.
Classical Testing in Functional Linear Models.
Kong, Dehan; Staicu, Ana-Maria; Maity, Arnab
2016-01-01
We extend four tests common in classical regression - Wald, score, likelihood ratio and F tests - to functional linear regression, for testing the null hypothesis, that there is no association between a scalar response and a functional covariate. Using functional principal component analysis, we re-express the functional linear model as a standard linear model, where the effect of the functional covariate can be approximated by a finite linear combination of the functional principal component scores. In this setting, we consider application of the four traditional tests. The proposed testing procedures are investigated theoretically for densely observed functional covariates when the number of principal components diverges. Using the theoretical distribution of the tests under the alternative hypothesis, we develop a procedure for sample size calculation in the context of functional linear regression. The four tests are further compared numerically for both densely and sparsely observed noisy functional data in simulation experiments and using two real data applications.
Classical Testing in Functional Linear Models
Kong, Dehan; Staicu, Ana-Maria; Maity, Arnab
2016-01-01
We extend four tests common in classical regression - Wald, score, likelihood ratio and F tests - to functional linear regression, for testing the null hypothesis, that there is no association between a scalar response and a functional covariate. Using functional principal component analysis, we re-express the functional linear model as a standard linear model, where the effect of the functional covariate can be approximated by a finite linear combination of the functional principal component scores. In this setting, we consider application of the four traditional tests. The proposed testing procedures are investigated theoretically for densely observed functional covariates when the number of principal components diverges. Using the theoretical distribution of the tests under the alternative hypothesis, we develop a procedure for sample size calculation in the context of functional linear regression. The four tests are further compared numerically for both densely and sparsely observed noisy functional data in simulation experiments and using two real data applications. PMID:28955155
Functional Mixed Effects Model for Small Area Estimation.
Maiti, Tapabrata; Sinha, Samiran; Zhong, Ping-Shou
2016-09-01
Functional data analysis has become an important area of research due to its ability of handling high dimensional and complex data structures. However, the development is limited in the context of linear mixed effect models, and in particular, for small area estimation. The linear mixed effect models are the backbone of small area estimation. In this article, we consider area level data, and fit a varying coefficient linear mixed effect model where the varying coefficients are semi-parametrically modeled via B-splines. We propose a method of estimating the fixed effect parameters and consider prediction of random effects that can be implemented using a standard software. For measuring prediction uncertainties, we derive an analytical expression for the mean squared errors, and propose a method of estimating the mean squared errors. The procedure is illustrated via a real data example, and operating characteristics of the method are judged using finite sample simulation studies.
Random regression analyses using B-splines to model growth of Australian Angus cattle
Meyer, Karin
2005-01-01
Regression on the basis function of B-splines has been advocated as an alternative to orthogonal polynomials in random regression analyses. Basic theory of splines in mixed model analyses is reviewed, and estimates from analyses of weights of Australian Angus cattle from birth to 820 days of age are presented. Data comprised 84 533 records on 20 731 animals in 43 herds, with a high proportion of animals with 4 or more weights recorded. Changes in weights with age were modelled through B-splines of age at recording. A total of thirteen analyses, considering different combinations of linear, quadratic and cubic B-splines and up to six knots, were carried out. Results showed good agreement for all ages with many records, but fluctuated where data were sparse. On the whole, analyses using B-splines appeared more robust against "end-of-range" problems and yielded more consistent and accurate estimates of the first eigenfunctions than previous, polynomial analyses. A model fitting quadratic B-splines, with knots at 0, 200, 400, 600 and 821 days and a total of 91 covariance components, appeared to be a good compromise between detailedness of the model, number of parameters to be estimated, plausibility of results, and fit, measured as residual mean square error. PMID:16093011
ERIC Educational Resources Information Center
Yan, Jun; Aseltine, Robert H., Jr.; Harel, Ofer
2013-01-01
Comparing regression coefficients between models when one model is nested within another is of great practical interest when two explanations of a given phenomenon are specified as linear models. The statistical problem is whether the coefficients associated with a given set of covariates change significantly when other covariates are added into…
Sources of variation for indoor nitrogen dioxide in rural residences of Ethiopia
2009-01-01
Background Unprocessed biomass fuel is the primary source of indoor air pollution (IAP) in developing countries. The use of biomass fuel has been linked with acute respiratory infections. This study assesses sources of variations associated with the level of indoor nitrogen dioxide (NO2). Materials and methods This study examines household factors affecting the level of indoor pollution by measuring NO2. Repeated measurements of NO2 were made using a passive diffusive sampler. A Saltzman colorimetric method using a spectrometer calibrated at 540 nm was employed to analyze the mass of NO2 on the collection filter that was then subjected to a mass transfer equation to calculate the level of NO2 for the 24 hours of sampling duration. Structured questionnaire was used to collect data on fuel use characteristics. Data entry and cleaning was done in EPI INFO version 6.04, while data was analyzed using SPSS version 15.0. Analysis of variance, multiple linear regression and linear mixed model were used to isolate determining factors contributing to the variation of NO2 concentration. Results A total of 17,215 air samples were fully analyzed during the study period. Wood and crop were principal source of household energy. Biomass fuel characteristics were strongly related to indoor NO2 concentration in one-way analysis of variance. There was variation in repeated measurements of indoor NO2 over time. In a linear mixed model regression analysis, highland setting, wet season, cooking, use of fire events at least twice a day, frequency of cooked food items, and interaction between ecology and season were predictors of indoor NO2 concentration. The volume of the housing unit and the presence of kitchen showed little relevance in the level of NO2 concentration. Conclusion Agro-ecology, season, purpose of fire events, frequency of fire activities, frequency of cooking and physical conditions of housing are predictors of NO2 concentration. Improved kitchen conditions and ventilation are highly recommended. PMID:19922645
Markov and semi-Markov switching linear mixed models used to identify forest tree growth components.
Chaubert-Pereira, Florence; Guédon, Yann; Lavergne, Christian; Trottier, Catherine
2010-09-01
Tree growth is assumed to be mainly the result of three components: (i) an endogenous component assumed to be structured as a succession of roughly stationary phases separated by marked change points that are asynchronous among individuals, (ii) a time-varying environmental component assumed to take the form of synchronous fluctuations among individuals, and (iii) an individual component corresponding mainly to the local environment of each tree. To identify and characterize these three components, we propose to use semi-Markov switching linear mixed models, i.e., models that combine linear mixed models in a semi-Markovian manner. The underlying semi-Markov chain represents the succession of growth phases and their lengths (endogenous component) whereas the linear mixed models attached to each state of the underlying semi-Markov chain represent-in the corresponding growth phase-both the influence of time-varying climatic covariates (environmental component) as fixed effects, and interindividual heterogeneity (individual component) as random effects. In this article, we address the estimation of Markov and semi-Markov switching linear mixed models in a general framework. We propose a Monte Carlo expectation-maximization like algorithm whose iterations decompose into three steps: (i) sampling of state sequences given random effects, (ii) prediction of random effects given state sequences, and (iii) maximization. The proposed statistical modeling approach is illustrated by the analysis of successive annual shoots along Corsican pine trunks influenced by climatic covariates. © 2009, The International Biometric Society.
Breivik, Cathrine Nansdal; Nilsen, Roy Miodini; Myrseth, Erling; Pedersen, Paal Henning; Varughese, Jobin K; Chaudhry, Aqeel Asghar; Lund-Johansen, Morten
2013-07-01
There are few reports about the course of vestibular schwannoma (VS) patients following gamma knife radiosurgery (GKRS) compared with the course following conservative management (CM). In this study, we present prospectively collected data of 237 patients with unilateral VS extending outside the internal acoustic canal who received either GKRS (113) or CM (124). The aim was to measure the effect of GKRS compared with the natural course on tumor growth rate and hearing loss. Secondary end points were postinclusion additional treatment, quality of life (QoL), and symptom development. The patients underwent magnetic resonance imaging scans, clinical examination, and QoL assessment by SF-36 questionnaire. Statistics were performed by using Spearman correlation coefficient, Kaplan-Meier plot, Poisson regression model, mixed linear regression models, and mixed logistic regression models. Mean follow-up time was 55.0 months (26.1 standard deviation, range 10-132). Thirteen patients were lost to follow-up. Serviceable hearing was lost in 54 of 71 (76%) (CM) and 34 of 53 (64%) (GKRS) patients during the study period (not significant, log-rank test). There was a significant reduction in tumor volume over time in the GKRS group. The need for treatment following initial GKRS or CM differed at highly significant levels (log-rank test, P < .001). Symptom and QoL development did not differ significantly between the groups. In VS patients, GKRS reduces the tumor growth rate and thereby the incidence rate of new treatment about tenfold. Hearing is lost at similar rates in both groups. Symptoms and QoL seem not to be significantly affected by GKRS.
We investigated the use of output from Bayesian stable isotope mixing models as constraints for a linear inverse food web model of a temperate intertidal seagrass system in the Marennes-Oléron Bay, France. Linear inverse modeling (LIM) is a technique that estimates a complete net...
A phenomenological biological dose model for proton therapy based on linear energy transfer spectra.
Rørvik, Eivind; Thörnqvist, Sara; Stokkevåg, Camilla H; Dahle, Tordis J; Fjaera, Lars Fredrik; Ytre-Hauge, Kristian S
2017-06-01
The relative biological effectiveness (RBE) of protons varies with the radiation quality, quantified by the linear energy transfer (LET). Most phenomenological models employ a linear dependency of the dose-averaged LET (LET d ) to calculate the biological dose. However, several experiments have indicated a possible non-linear trend. Our aim was to investigate if biological dose models including non-linear LET dependencies should be considered, by introducing a LET spectrum based dose model. The RBE-LET relationship was investigated by fitting of polynomials from 1st to 5th degree to a database of 85 data points from aerobic in vitro experiments. We included both unweighted and weighted regression, the latter taking into account experimental uncertainties. Statistical testing was performed to decide whether higher degree polynomials provided better fits to the data as compared to lower degrees. The newly developed models were compared to three published LET d based models for a simulated spread out Bragg peak (SOBP) scenario. The statistical analysis of the weighted regression analysis favored a non-linear RBE-LET relationship, with the quartic polynomial found to best represent the experimental data (P = 0.010). The results of the unweighted regression analysis were on the borderline of statistical significance for non-linear functions (P = 0.053), and with the current database a linear dependency could not be rejected. For the SOBP scenario, the weighted non-linear model estimated a similar mean RBE value (1.14) compared to the three established models (1.13-1.17). The unweighted model calculated a considerably higher RBE value (1.22). The analysis indicated that non-linear models could give a better representation of the RBE-LET relationship. However, this is not decisive, as inclusion of the experimental uncertainties in the regression analysis had a significant impact on the determination and ranking of the models. As differences between the models were observed for the SOBP scenario, both non-linear LET spectrum- and linear LET d based models should be further evaluated in clinically realistic scenarios. © 2017 American Association of Physicists in Medicine.
Random effects coefficient of determination for mixed and meta-analysis models
Demidenko, Eugene; Sargent, James; Onega, Tracy
2011-01-01
The key feature of a mixed model is the presence of random effects. We have developed a coefficient, called the random effects coefficient of determination, Rr2, that estimates the proportion of the conditional variance of the dependent variable explained by random effects. This coefficient takes values from 0 to 1 and indicates how strong the random effects are. The difference from the earlier suggested fixed effects coefficient of determination is emphasized. If Rr2 is close to 0, there is weak support for random effects in the model because the reduction of the variance of the dependent variable due to random effects is small; consequently, random effects may be ignored and the model simplifies to standard linear regression. The value of Rr2 apart from 0 indicates the evidence of the variance reduction in support of the mixed model. If random effects coefficient of determination is close to 1 the variance of random effects is very large and random effects turn into free fixed effects—the model can be estimated using the dummy variable approach. We derive explicit formulas for Rr2 in three special cases: the random intercept model, the growth curve model, and meta-analysis model. Theoretical results are illustrated with three mixed model examples: (1) travel time to the nearest cancer center for women with breast cancer in the U.S., (2) cumulative time watching alcohol related scenes in movies among young U.S. teens, as a risk factor for early drinking onset, and (3) the classic example of the meta-analysis model for combination of 13 studies on tuberculosis vaccine. PMID:23750070
Strand, Matthew; Sillau, Stefan; Grunwald, Gary K; Rabinovitch, Nathan
2014-02-10
Regression calibration provides a way to obtain unbiased estimators of fixed effects in regression models when one or more predictors are measured with error. Recent development of measurement error methods has focused on models that include interaction terms between measured-with-error predictors, and separately, methods for estimation in models that account for correlated data. In this work, we derive explicit and novel forms of regression calibration estimators and associated asymptotic variances for longitudinal models that include interaction terms, when data from instrumental and unbiased surrogate variables are available but not the actual predictors of interest. The longitudinal data are fit using linear mixed models that contain random intercepts and account for serial correlation and unequally spaced observations. The motivating application involves a longitudinal study of exposure to two pollutants (predictors) - outdoor fine particulate matter and cigarette smoke - and their association in interactive form with levels of a biomarker of inflammation, leukotriene E4 (LTE 4 , outcome) in asthmatic children. Because the exposure concentrations could not be directly observed, we used measurements from a fixed outdoor monitor and urinary cotinine concentrations as instrumental variables, and we used concentrations of fine ambient particulate matter and cigarette smoke measured with error by personal monitors as unbiased surrogate variables. We applied the derived regression calibration methods to estimate coefficients of the unobserved predictors and their interaction, allowing for direct comparison of toxicity of the different pollutants. We used simulations to verify accuracy of inferential methods based on asymptotic theory. Copyright © 2013 John Wiley & Sons, Ltd.
1994-09-01
Institute of Technology, Wright- Patterson AFB OH, January 1994. 4. Neter, John and others. Applied Linear Regression Models. Boston: Irwin, 1989. 5...Technology, Wright-Patterson AFB OH 5 April 1994. 29. Neter, John and others. Applied Linear Regression Models. Boston: Irwin, 1989. 30. Office of
NASA Astrophysics Data System (ADS)
Benhalouche, Fatima Zohra; Karoui, Moussa Sofiane; Deville, Yannick; Ouamri, Abdelaziz
2015-10-01
In this paper, a new Spectral-Unmixing-based approach, using Nonnegative Matrix Factorization (NMF), is proposed to locally multi-sharpen hyperspectral data by integrating a Digital Surface Model (DSM) obtained from LIDAR data. In this new approach, the nature of the local mixing model is detected by using the local variance of the object elevations. The hyper/multispectral images are explored using small zones. In each zone, the variance of the object elevations is calculated from the DSM data in this zone. This variance is compared to a threshold value and the adequate linear/linearquadratic spectral unmixing technique is used in the considered zone to independently unmix hyperspectral and multispectral data, using an adequate linear/linear-quadratic NMF-based approach. The obtained spectral and spatial information thus respectively extracted from the hyper/multispectral images are then recombined in the considered zone, according to the selected mixing model. Experiments based on synthetic hyper/multispectral data are carried out to evaluate the performance of the proposed multi-sharpening approach and literature linear/linear-quadratic approaches used on the whole hyper/multispectral data. In these experiments, real DSM data are used to generate synthetic data containing linear and linear-quadratic mixed pixel zones. The DSM data are also used for locally detecting the nature of the mixing model in the proposed approach. Globally, the proposed approach yields good spatial and spectral fidelities for the multi-sharpened data and significantly outperforms the used literature methods.
A Second-Order Conditionally Linear Mixed Effects Model with Observed and Latent Variable Covariates
ERIC Educational Resources Information Center
Harring, Jeffrey R.; Kohli, Nidhi; Silverman, Rebecca D.; Speece, Deborah L.
2012-01-01
A conditionally linear mixed effects model is an appropriate framework for investigating nonlinear change in a continuous latent variable that is repeatedly measured over time. The efficacy of the model is that it allows parameters that enter the specified nonlinear time-response function to be stochastic, whereas those parameters that enter in a…
USDA-ARS?s Scientific Manuscript database
The mixed linear model (MLM) is currently among the most advanced and flexible statistical modeling techniques and its use in tackling problems in plant pathology has begun surfacing in the literature. The longitudinal MLM is a multivariate extension that handles repeatedly measured data, such as r...
Genetic Programming Transforms in Linear Regression Situations
NASA Astrophysics Data System (ADS)
Castillo, Flor; Kordon, Arthur; Villa, Carlos
The chapter summarizes the use of Genetic Programming (GP) inMultiple Linear Regression (MLR) to address multicollinearity and Lack of Fit (LOF). The basis of the proposed method is applying appropriate input transforms (model respecification) that deal with these issues while preserving the information content of the original variables. The transforms are selected from symbolic regression models with optimal trade-off between accuracy of prediction and expressional complexity, generated by multiobjective Pareto-front GP. The chapter includes a comparative study of the GP-generated transforms with Ridge Regression, a variant of ordinary Multiple Linear Regression, which has been a useful and commonly employed approach for reducing multicollinearity. The advantages of GP-generated model respecification are clearly defined and demonstrated. Some recommendations for transforms selection are given as well. The application benefits of the proposed approach are illustrated with a real industrial application in one of the broadest empirical modeling areas in manufacturing - robust inferential sensors. The chapter contributes to increasing the awareness of the potential of GP in statistical model building by MLR.
Mosing, Martina; Waldmann, Andreas D.; MacFarlane, Paul; Iff, Samuel; Auer, Ulrike; Bohm, Stephan H.; Bettschart-Wolfensberger, Regula; Bardell, David
2016-01-01
This study evaluated the breathing pattern and distribution of ventilation in horses prior to and following recovery from general anaesthesia using electrical impedance tomography (EIT). Six horses were anaesthetised for 6 hours in dorsal recumbency. Arterial blood gas and EIT measurements were performed 24 hours before (baseline) and 1, 2, 3, 4, 5 and 6 hours after horses stood following anaesthesia. At each time point 4 representative spontaneous breaths were analysed. The percentage of the total breath length during which impedance remained greater than 50% of the maximum inspiratory impedance change (breath holding), the fraction of total tidal ventilation within each of four stacked regions of interest (ROI) (distribution of ventilation) and the filling time and inflation period of seven ROI evenly distributed over the dorso-ventral height of the lungs were calculated. Mixed effects multi-linear regression and linear regression were used and significance was set at p<0.05. All horses demonstrated inspiratory breath holding until 5 hours after standing. No change from baseline was seen for the distribution of ventilation during inspiration. Filling time and inflation period were more rapid and shorter in ventral and slower and longer in most dorsal ROI compared to baseline, respectively. In a mixed effects multi-linear regression, breath holding was significantly correlated with PaCO2 in both the univariate and multivariate regression. Following recovery from anaesthesia, horses showed inspiratory breath holding during which gas redistributed from ventral into dorsal regions of the lungs. This suggests auto-recruitment of lung tissue which would have been dependent and likely atelectic during anaesthesia. PMID:27331910
As a fast and effective technique, the multiple linear regression (MLR) method has been widely used in modeling and prediction of beach bacteria concentrations. Among previous works on this subject, however, several issues were insufficiently or inconsistently addressed. Those is...
Hossein-Zadeh, Navid Ghavi
2016-08-01
The aim of this study was to compare seven non-linear mathematical models (Brody, Wood, Dhanoa, Sikka, Nelder, Rook and Dijkstra) to examine their efficiency in describing the lactation curves for milk fat to protein ratio (FPR) in Iranian buffaloes. Data were 43 818 test-day records for FPR from the first three lactations of Iranian buffaloes which were collected on 523 dairy herds in the period from 1996 to 2012 by the Animal Breeding Center of Iran. Each model was fitted to monthly FPR records of buffaloes using the non-linear mixed model procedure (PROC NLMIXED) in SAS and the parameters were estimated. The models were tested for goodness of fit using Akaike's information criterion (AIC), Bayesian information criterion (BIC) and log maximum likelihood (-2 Log L). The Nelder and Sikka mixed models provided the best fit of lactation curve for FPR in the first and second lactations of Iranian buffaloes, respectively. However, Wood, Dhanoa and Sikka mixed models provided the best fit of lactation curve for FPR in the third parity buffaloes. Evaluation of first, second and third lactation features showed that all models, except for Dijkstra model in the third lactation, under-predicted test time at which daily FPR was minimum. On the other hand, minimum FPR was over-predicted by all equations. Evaluation of the different models used in this study indicated that non-linear mixed models were sufficient for fitting test-day FPR records of Iranian buffaloes.
Prediction of siRNA potency using sparse logistic regression.
Hu, Wei; Hu, John
2014-06-01
RNA interference (RNAi) can modulate gene expression at post-transcriptional as well as transcriptional levels. Short interfering RNA (siRNA) serves as a trigger for the RNAi gene inhibition mechanism, and therefore is a crucial intermediate step in RNAi. There have been extensive studies to identify the sequence characteristics of potent siRNAs. One such study built a linear model using LASSO (Least Absolute Shrinkage and Selection Operator) to measure the contribution of each siRNA sequence feature. This model is simple and interpretable, but it requires a large number of nonzero weights. We have introduced a novel technique, sparse logistic regression, to build a linear model using single-position specific nucleotide compositions which has the same prediction accuracy of the linear model based on LASSO. The weights in our new model share the same general trend as those in the previous model, but have only 25 nonzero weights out of a total 84 weights, a 54% reduction compared to the previous model. Contrary to the linear model based on LASSO, our model suggests that only a few positions are influential on the efficacy of the siRNA, which are the 5' and 3' ends and the seed region of siRNA sequences. We also employed sparse logistic regression to build a linear model using dual-position specific nucleotide compositions, a task LASSO is not able to accomplish well due to its high dimensional nature. Our results demonstrate the superiority of sparse logistic regression as a technique for both feature selection and regression over LASSO in the context of siRNA design.
Statistical and Biophysical Models for Predicting Total and Outdoor Water Use in Los Angeles
NASA Astrophysics Data System (ADS)
Mini, C.; Hogue, T. S.; Pincetl, S.
2012-04-01
Modeling water demand is a complex exercise in the choice of the functional form, techniques and variables to integrate in the model. The goal of the current research is to identify the determinants that control total and outdoor residential water use in semi-arid cities and to utilize that information in the development of statistical and biophysical models that can forecast spatial and temporal urban water use. The City of Los Angeles is unique in its highly diverse socio-demographic, economic and cultural characteristics across neighborhoods, which introduces significant challenges in modeling water use. Increasing climate variability also contributes to uncertainties in water use predictions in urban areas. Monthly individual water use records were acquired from the Los Angeles Department of Water and Power (LADWP) for the 2000 to 2010 period. Study predictors of residential water use include socio-demographic, economic, climate and landscaping variables at the zip code level collected from US Census database. Climate variables are estimated from ground-based observations and calculated at the centroid of each zip code by inverse-distance weighting method. Remotely-sensed products of vegetation biomass and landscape land cover are also utilized. Two linear regression models were developed based on the panel data and variables described: a pooled-OLS regression model and a linear mixed effects model. Both models show income per capita and the percentage of landscape areas in each zip code as being statistically significant predictors. The pooled-OLS model tends to over-estimate higher water use zip codes and both models provide similar RMSE values.Outdoor water use was estimated at the census tract level as the residual between total water use and indoor use. This residual is being compared with the output from a biophysical model including tree and grass cover areas, climate variables and estimates of evapotranspiration at very high spatial resolution. A genetic algorithm based model (Shuffled Complex Evolution-UA; SCE-UA) is also being developed to provide estimates of the predictions and parameters uncertainties and to compare against the linear regression models. Ultimately, models will be selected to undertake predictions for a range of climate change and landscape scenarios. Finally, project results will contribute to a better understanding of water demand to help predict future water use and implement targeted landscaping conservation programs to maintain sustainable water needs for a growing population under uncertain climate variability.
Predictive and mechanistic multivariate linear regression models for reaction development
Santiago, Celine B.; Guo, Jing-Yao
2018-01-01
Multivariate Linear Regression (MLR) models utilizing computationally-derived and empirically-derived physical organic molecular descriptors are described in this review. Several reports demonstrating the effectiveness of this methodological approach towards reaction optimization and mechanistic interrogation are discussed. A detailed protocol to access quantitative and predictive MLR models is provided as a guide for model development and parameter analysis. PMID:29719711
Adding a Parameter Increases the Variance of an Estimated Regression Function
ERIC Educational Resources Information Center
Withers, Christopher S.; Nadarajah, Saralees
2011-01-01
The linear regression model is one of the most popular models in statistics. It is also one of the simplest models in statistics. It has received applications in almost every area of science, engineering and medicine. In this article, the authors show that adding a predictor to a linear model increases the variance of the estimated regression…
Scarneciu, Camelia C; Sangeorzan, Livia; Rus, Horatiu; Scarneciu, Vlad D; Varciu, Mihai S; Andreescu, Oana; Scarneciu, Ioan
2017-01-01
This study aimed at assessing the incidence of pulmonary hypertension (PH) at newly diagnosed hyperthyroid patients and at finding a simple model showing the complex functional relation between pulmonary hypertension in hyperthyroidism and the factors causing it. The 53 hyperthyroid patients (H-group) were evaluated mainly by using an echocardiographical method and compared with 35 euthyroid (E-group) and 25 healthy people (C-group). In order to identify the factors causing pulmonary hypertension the statistical method of comparing the values of arithmetical means is used. The functional relation between the two random variables (PAPs and each of the factors determining it within our research study) can be expressed by linear or non-linear function. By applying the linear regression method described by a first-degree equation the line of regression (linear model) has been determined; by applying the non-linear regression method described by a second degree equation, a parabola-type curve of regression (non-linear or polynomial model) has been determined. We made the comparison and the validation of these two models by calculating the determination coefficient (criterion 1), the comparison of residuals (criterion 2), application of AIC criterion (criterion 3) and use of F-test (criterion 4). From the H-group, 47% have pulmonary hypertension completely reversible when obtaining euthyroidism. The factors causing pulmonary hypertension were identified: previously known- level of free thyroxin, pulmonary vascular resistance, cardiac output; new factors identified in this study- pretreatment period, age, systolic blood pressure. According to the four criteria and to the clinical judgment, we consider that the polynomial model (graphically parabola- type) is better than the linear one. The better model showing the functional relation between the pulmonary hypertension in hyperthyroidism and the factors identified in this study is given by a polynomial equation of second degree where the parabola is its graphical representation.
Evaluation of third-degree and fourth-degree laceration rates as quality indicators.
Friedman, Alexander M; Ananth, Cande V; Prendergast, Eri; D'Alton, Mary E; Wright, Jason D
2015-04-01
To examine the patterns and predictors of third-degree and fourth-degree laceration in women undergoing vaginal delivery. We identified a population-based cohort of women in the United States who underwent a vaginal delivery between 1998 and 2010 using the Nationwide Inpatient Sample. Multivariable log-linear regression models were developed to account for patient, obstetric, and hospital factors related to lacerations. Between-hospital variability of laceration rates was calculated using generalized log-linear mixed models. Among 7,096,056 women who underwent vaginal delivery in 3,070 hospitals, 3.3% (n=232,762) had a third-degree laceration and 1.1% (n=76,347) had a fourth-degree laceration. In an adjusted model for fourth-degree lacerations, important risk factors included shoulder dystocia and forceps and vacuum deliveries with and without episiotomy. Other demographic, obstetric, medical, and hospital variables, although statistically significant, were not major determinants of lacerations. Risk factors in a multivariable model for third-degree lacerations were similar to those in the fourth-degree model. Regression analysis of hospital rates (n=3,070) of lacerations demonstrated limited between-hospital variation. Risk of third-degree and fourth-degree laceration was most strongly related to operative delivery and shoulder dystocia. Between-hospital variation was limited. Given these findings and that the most modifiable practice related to lacerations would be reduction in operative vaginal deliveries (and a possible increase in cesarean delivery), third-degree and fourth-degree laceration rates may be a quality metric of limited utility.
Liu, Danping; Yeung, Edwina H; McLain, Alexander C; Xie, Yunlong; Buck Louis, Germaine M; Sundaram, Rajeshwari
2017-09-01
Imperfect follow-up in longitudinal studies commonly leads to missing outcome data that can potentially bias the inference when the missingness is nonignorable; that is, the propensity of missingness depends on missing values in the data. In the Upstate KIDS Study, we seek to determine if the missingness of child development outcomes is nonignorable, and how a simple model assuming ignorable missingness would compare with more complicated models for a nonignorable mechanism. To correct for nonignorable missingness, the shared random effects model (SREM) jointly models the outcome and the missing mechanism. However, the computational complexity and lack of software packages has limited its practical applications. This paper proposes a novel two-step approach to handle nonignorable missing outcomes in generalized linear mixed models. We first analyse the missing mechanism with a generalized linear mixed model and predict values of the random effects; then, the outcome model is fitted adjusting for the predicted random effects to account for heterogeneity in the missingness propensity. Extensive simulation studies suggest that the proposed method is a reliable approximation to SREM, with a much faster computation. The nonignorability of missing data in the Upstate KIDS Study is estimated to be mild to moderate, and the analyses using the two-step approach or SREM are similar to the model assuming ignorable missingness. The two-step approach is a computationally straightforward method that can be conducted as sensitivity analyses in longitudinal studies to examine violations to the ignorable missingness assumption and the implications relative to health outcomes. © 2017 John Wiley & Sons Ltd.
Marcotte, Thomas D.; Deutsch, Reena; Michael, Benedict Daniel; Franklin, Donald; Cookson, Debra Rosario; Bharti, Ajay R.; Grant, Igor; Letendre, Scott L.
2013-01-01
Background Neurocognitive (NC) impairment (NCI) occurs commonly in people living with HIV. Despite substantial effort, no biomarkers have been sufficiently validated for diagnosis and prognosis of NCI in the clinic. The goal of this project was to identify diagnostic or prognostic biomarkers for NCI in a comprehensively characterized HIV cohort. Methods Multidisciplinary case review selected 98 HIV-infected individuals and categorized them into four NC groups using normative data: stably normal (SN), stably impaired (SI), worsening (Wo), or improving (Im). All subjects underwent comprehensive NC testing, phlebotomy, and lumbar puncture at two timepoints separated by a median of 6.2 months. Eight biomarkers were measured in CSF and blood by immunoassay. Results were analyzed using mixed model linear regression and staged recursive partitioning. Results At the first visit, subjects were mostly middle-aged (median 45) white (58%) men (84%) who had AIDS (70%). Of the 73% who took antiretroviral therapy (ART), 54% had HIV RNA levels below 50 c/mL in plasma. Mixed model linear regression identified that only MCP-1 in CSF was associated with neurocognitive change group. Recursive partitioning models aimed at diagnosis (i.e., correctly classifying neurocognitive status at the first visit) were complex and required most biomarkers to achieve misclassification limits. In contrast, prognostic models were more efficient. A combination of three biomarkers (sCD14, MCP-1, SDF-1α) correctly classified 82% of Wo and SN subjects, including 88% of SN subjects. A combination of two biomarkers (MCP-1, TNF-α) correctly classified 81% of Im and SI subjects, including 100% of SI subjects. Conclusions This analysis of well-characterized individuals identified concise panels of biomarkers associated with NC change. Across all analyses, the two most frequently identified biomarkers were sCD14 and MCP-1, indicators of monocyte/macrophage activation. While the panels differed depending on the outcome and on the degree of misclassification, nearly all stable patients were correctly classified. PMID:24101401
Optimal clinical trial design based on a dichotomous Markov-chain mixed-effect sleep model.
Steven Ernest, C; Nyberg, Joakim; Karlsson, Mats O; Hooker, Andrew C
2014-12-01
D-optimal designs for discrete-type responses have been derived using generalized linear mixed models, simulation based methods and analytical approximations for computing the fisher information matrix (FIM) of non-linear mixed effect models with homogeneous probabilities over time. In this work, D-optimal designs using an analytical approximation of the FIM for a dichotomous, non-homogeneous, Markov-chain phase advanced sleep non-linear mixed effect model was investigated. The non-linear mixed effect model consisted of transition probabilities of dichotomous sleep data estimated as logistic functions using piecewise linear functions. Theoretical linear and nonlinear dose effects were added to the transition probabilities to modify the probability of being in either sleep stage. D-optimal designs were computed by determining an analytical approximation the FIM for each Markov component (one where the previous state was awake and another where the previous state was asleep). Each Markov component FIM was weighted either equally or by the average probability of response being awake or asleep over the night and summed to derive the total FIM (FIM(total)). The reference designs were placebo, 0.1, 1-, 6-, 10- and 20-mg dosing for a 2- to 6-way crossover study in six dosing groups. Optimized design variables were dose and number of subjects in each dose group. The designs were validated using stochastic simulation/re-estimation (SSE). Contrary to expectations, the predicted parameter uncertainty obtained via FIM(total) was larger than the uncertainty in parameter estimates computed by SSE. Nevertheless, the D-optimal designs decreased the uncertainty of parameter estimates relative to the reference designs. Additionally, the improvement for the D-optimal designs were more pronounced using SSE than predicted via FIM(total). Through the use of an approximate analytic solution and weighting schemes, the FIM(total) for a non-homogeneous, dichotomous Markov-chain phase advanced sleep model was computed and provided more efficient trial designs and increased nonlinear mixed-effects modeling parameter precision.
An Application to the Prediction of LOD Change Based on General Regression Neural Network
NASA Astrophysics Data System (ADS)
Zhang, X. H.; Wang, Q. J.; Zhu, J. J.; Zhang, H.
2011-07-01
Traditional prediction of the LOD (length of day) change was based on linear models, such as the least square model and the autoregressive technique, etc. Due to the complex non-linear features of the LOD variation, the performances of the linear model predictors are not fully satisfactory. This paper applies a non-linear neural network - general regression neural network (GRNN) model to forecast the LOD change, and the results are analyzed and compared with those obtained with the back propagation neural network and other models. The comparison shows that the performance of the GRNN model in the prediction of the LOD change is efficient and feasible.
Missing Data in Clinical Studies: Issues and Methods
Ibrahim, Joseph G.; Chu, Haitao; Chen, Ming-Hui
2012-01-01
Missing data are a prevailing problem in any type of data analyses. A participant variable is considered missing if the value of the variable (outcome or covariate) for the participant is not observed. In this article, various issues in analyzing studies with missing data are discussed. Particularly, we focus on missing response and/or covariate data for studies with discrete, continuous, or time-to-event end points in which generalized linear models, models for longitudinal data such as generalized linear mixed effects models, or Cox regression models are used. We discuss various classifications of missing data that may arise in a study and demonstrate in several situations that the commonly used method of throwing out all participants with any missing data may lead to incorrect results and conclusions. The methods described are applied to data from an Eastern Cooperative Oncology Group phase II clinical trial of liver cancer and a phase III clinical trial of advanced non–small-cell lung cancer. Although the main area of application discussed here is cancer, the issues and methods we discuss apply to any type of study. PMID:22649133
Blood biomarkers in male and female participants after an Ironman-distance triathlon.
Danielsson, Tom; Carlsson, Jörg; Schreyer, Hendrik; Ahnesjö, Jonas; Ten Siethoff, Lasse; Ragnarsson, Thony; Tugetam, Åsa; Bergman, Patrick
2017-01-01
While overall physical activity is clearly associated with a better short-term and long-term health, prolonged strenuous physical activity may result in a rise in acute levels of blood-biomarkers used in clinical practice for diagnosis of various conditions or diseases. In this study, we explored the acute effects of a full Ironman-distance triathlon on biomarkers related to heart-, liver-, kidney- and skeletal muscle damage immediately post-race and after one week's rest. We also examined if sex, age, finishing time and body composition influenced the post-race values of the biomarkers. A sample of 30 subjects was recruited (50% women) to the study. The subjects were evaluated for body composition and blood samples were taken at three occasions, before the race (T1), immediately after (T2) and one week after the race (T3). Linear regression models were fitted to analyse the independent contribution of sex and finishing time controlled for weight, body fat percentage and age, on the biomarkers at the termination of the race (T2). Linear mixed models were fitted to examine if the biomarkers differed between the sexes over time (T1-T3). Being male was a significant predictor of higher post-race (T2) levels of myoglobin, CK, and creatinine levels and body weight was negatively associated with myoglobin. In general, the models were unable to explain the variation of the dependent variables. In the linear mixed models, an interaction between time (T1-T3) and sex was seen for myoglobin and creatinine, in which women had a less pronounced response to the race. Overall women appear to tolerate the effects of prolonged strenuous physical activity better than men as illustrated by their lower values of the biomarkers both post-race as well as during recovery.
A componential model of human interaction with graphs: 1. Linear regression modeling
NASA Technical Reports Server (NTRS)
Gillan, Douglas J.; Lewis, Robert
1994-01-01
Task analyses served as the basis for developing the Mixed Arithmetic-Perceptual (MA-P) model, which proposes (1) that people interacting with common graphs to answer common questions apply a set of component processes-searching for indicators, encoding the value of indicators, performing arithmetic operations on the values, making spatial comparisons among indicators, and repsonding; and (2) that the type of graph and user's task determine the combination and order of the components applied (i.e., the processing steps). Two experiments investigated the prediction that response time will be linearly related to the number of processing steps according to the MA-P model. Subjects used line graphs, scatter plots, and stacked bar graphs to answer comparison questions and questions requiring arithmetic calculations. A one-parameter version of the model (with equal weights for all components) and a two-parameter version (with different weights for arithmetic and nonarithmetic processes) accounted for 76%-85% of individual subjects' variance in response time and 61%-68% of the variance taken across all subjects. The discussion addresses possible modifications in the MA-P model, alternative models, and design implications from the MA-P model.
Sociodemographic Factors Associated With Changes in Successful Aging in Spain: A Follow-Up Study.
Domènech-Abella, Joan; Perales, Jaime; Lara, Elvira; Moneta, Maria Victoria; Izquierdo, Ana; Rico-Uribe, Laura Alejandra; Mundó, Jordi; Haro, Josep Maria
2017-06-01
Successful aging (SA) refers to maintaining well-being in old age. Several definitions or models of SA exist (biomedical, psychosocial, and mixed). We examined the longitudinal association between various SA models and sociodemographic factors, and analyzed the patterns of change within these models. This was a nationally representative follow-up in Spain including 3,625 individuals aged ≥50 years. Some 1,970 individuals were interviewed after 3 years. Linear regression models were used to analyze the survey data. Age, sex, and occupation predicted SA in the biomedical model, while marital status, educational level, and urbanicity predicted SA in the psychosocial model. The remaining models included different sets of these predictors as significant. In the psychosocial model, individuals tended to improve over time but this was not the case in the biomedical model. The biomedical and psychosocial components of SA need to be addressed specifically to achieve the best aging trajectories.
1990-03-01
and M.H. Knuter. Applied Linear Regression Models. Homewood IL: Richard D. Erwin Inc., 1983. Pritsker, A. Alan B. Introduction to Simulation and SLAM...Control Variates in Simulation," European Journal of Operational Research, 42: (1989). Neter, J., W. Wasserman, and M.H. Xnuter. Applied Linear Regression Models
Evaluating Differential Effects Using Regression Interactions and Regression Mixture Models
ERIC Educational Resources Information Center
Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung
2015-01-01
Research increasingly emphasizes understanding differential effects. This article focuses on understanding regression mixture models, which are relatively new statistical methods for assessing differential effects by comparing results to using an interactive term in linear regression. The research questions which each model answers, their…
Bai, Lu; Chan, Ching-Yao; Liu, Pan; Xu, Chengcheng
2017-10-03
Electric bikes (e-bikes) have been one of the fastest growing trip modes in Southeast Asia over the past 2 decades. The increasing popularity of e-bikes raised some safety concerns regarding urban transport systems. The primary objective of this study was to identify whether and how the generalized linear regression model (GLM) could be used to relate cyclists' safety with various contributing factors when riding in a mid-block bike lane. The types of 2-wheeled vehicles in the study included bicycle-style electric bicycles (BSEBs), scooter-style electric bicycles (SSEBs), and regular bicycles (RBs). Traffic conflict technology was applied as a surrogate measure to evaluate the safety of 2-wheeled vehicles. The safety performance model was developed by adopting a generalized linear regression model for relating the frequency of rear-end conflicts between e-bikes and regular bikes to the operating speeds of BSEBs, SSEBs, and RBs in mid-block bike lanes. The frequency of rear-end conflicts between e-bikes and bikes increased with an increase in the operating speeds of e-bikes and the volume of e-bikes and bikes and decreased with an increase in the width of bike lanes. The large speed difference between e-bikes and bikes increased the frequency of rear-end conflicts between e-bikes and bikes in mid-block bike lanes. A 1% increase in the average operating speed of e-bikes would increase the expected number of rear-end conflicts between e-bikes and bikes by 1.48%. A 1% increase in the speed difference between e-bikes and bikes would increase the expected number of rear-end conflicts between e-bikes/bikes by 0.16%. The conflict frequency in mid-block bike lanes can be modeled using generalized linear regression models. The factors that significantly affected the frequency of rear-end conflicts included the operating speeds of e-bikes, the speed difference between e-bikes and regular bikes, the volume of e-bikes, the volume of bikes, and the width of bike lanes. The safety performance model can help better understand the causes of crash occurrences in mid-block bike lanes.
Identifying the Factors That Influence Change in SEBD Using Logistic Regression Analysis
ERIC Educational Resources Information Center
Camilleri, Liberato; Cefai, Carmel
2013-01-01
Multiple linear regression and ANOVA models are widely used in applications since they provide effective statistical tools for assessing the relationship between a continuous dependent variable and several predictors. However these models rely heavily on linearity and normality assumptions and they do not accommodate categorical dependent…
Killiches, Matthias; Czado, Claudia
2018-03-22
We propose a model for unbalanced longitudinal data, where the univariate margins can be selected arbitrarily and the dependence structure is described with the help of a D-vine copula. We show that our approach is an extremely flexible extension of the widely used linear mixed model if the correlation is homogeneous over the considered individuals. As an alternative to joint maximum-likelihood a sequential estimation approach for the D-vine copula is provided and validated in a simulation study. The model can handle missing values without being forced to discard data. Since conditional distributions are known analytically, we easily make predictions for future events. For model selection, we adjust the Bayesian information criterion to our situation. In an application to heart surgery data our model performs clearly better than competing linear mixed models. © 2018, The International Biometric Society.
Estimation of Complex Generalized Linear Mixed Models for Measurement and Growth
ERIC Educational Resources Information Center
Jeon, Minjeong
2012-01-01
Maximum likelihood (ML) estimation of generalized linear mixed models (GLMMs) is technically challenging because of the intractable likelihoods that involve high dimensional integrations over random effects. The problem is magnified when the random effects have a crossed design and thus the data cannot be reduced to small independent clusters. A…
Application of General Regression Neural Network to the Prediction of LOD Change
NASA Astrophysics Data System (ADS)
Zhang, Xiao-Hong; Wang, Qi-Jie; Zhu, Jian-Jun; Zhang, Hao
2012-01-01
Traditional methods for predicting the change in length of day (LOD change) are mainly based on some linear models, such as the least square model and autoregression model, etc. However, the LOD change comprises complicated non-linear factors and the prediction effect of the linear models is always not so ideal. Thus, a kind of non-linear neural network — general regression neural network (GRNN) model is tried to make the prediction of the LOD change and the result is compared with the predicted results obtained by taking advantage of the BP (back propagation) neural network model and other models. The comparison result shows that the application of the GRNN to the prediction of the LOD change is highly effective and feasible.
Mohd Yusof, Mohd Yusmiaidil Putera; Cauwels, Rita; Deschepper, Ellen; Martens, Luc
2015-08-01
The third molar development (TMD) has been widely utilized as one of the radiographic method for dental age estimation. By using the same radiograph of the same individual, third molar eruption (TME) information can be incorporated to the TMD regression model. This study aims to evaluate the performance of dental age estimation in individual method models and the combined model (TMD and TME) based on the classic regressions of multiple linear and principal component analysis. A sample of 705 digital panoramic radiographs of Malay sub-adults aged between 14.1 and 23.8 years was collected. The techniques described by Gleiser and Hunt (modified by Kohler) and Olze were employed to stage the TMD and TME, respectively. The data was divided to develop three respective models based on the two regressions of multiple linear and principal component analysis. The trained models were then validated on the test sample and the accuracy of age prediction was compared between each model. The coefficient of determination (R²) and root mean square error (RMSE) were calculated. In both genders, adjusted R² yielded an increment in the linear regressions of combined model as compared to the individual models. The overall decrease in RMSE was detected in combined model as compared to TMD (0.03-0.06) and TME (0.2-0.8). In principal component regression, low value of adjusted R(2) and high RMSE except in male were exhibited in combined model. Dental age estimation is better predicted using combined model in multiple linear regression models. Copyright © 2015 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
Marek K. Jakubowksi; Qinghua Guo; Brandon Collins; Scott Stephens; Maggi Kelly
2013-01-01
We compared the ability of several classification and regression algorithms to predict forest stand structure metrics and standard surface fuel models. Our study area spans a dense, topographically complex Sierra Nevada mixed-conifer forest. We used clustering, regression trees, and support vector machine algorithms to analyze high density (average 9 pulses/m
Linear mixed-effects modeling approach to FMRI group analysis
Chen, Gang; Saad, Ziad S.; Britton, Jennifer C.; Pine, Daniel S.; Cox, Robert W.
2013-01-01
Conventional group analysis is usually performed with Student-type t-test, regression, or standard AN(C)OVA in which the variance–covariance matrix is presumed to have a simple structure. Some correction approaches are adopted when assumptions about the covariance structure is violated. However, as experiments are designed with different degrees of sophistication, these traditional methods can become cumbersome, or even be unable to handle the situation at hand. For example, most current FMRI software packages have difficulty analyzing the following scenarios at group level: (1) taking within-subject variability into account when there are effect estimates from multiple runs or sessions; (2) continuous explanatory variables (covariates) modeling in the presence of a within-subject (repeated measures) factor, multiple subject-grouping (between-subjects) factors, or the mixture of both; (3) subject-specific adjustments in covariate modeling; (4) group analysis with estimation of hemodynamic response (HDR) function by multiple basis functions; (5) various cases of missing data in longitudinal studies; and (6) group studies involving family members or twins. Here we present a linear mixed-effects modeling (LME) methodology that extends the conventional group analysis approach to analyze many complicated cases, including the six prototypes delineated above, whose analyses would be otherwise either difficult or unfeasible under traditional frameworks such as AN(C)OVA and general linear model (GLM). In addition, the strength of the LME framework lies in its flexibility to model and estimate the variance–covariance structures for both random effects and residuals. The intraclass correlation (ICC) values can be easily obtained with an LME model with crossed random effects, even at the presence of confounding fixed effects. The simulations of one prototypical scenario indicate that the LME modeling keeps a balance between the control for false positives and the sensitivity for activation detection. The importance of hypothesis formulation is also illustrated in the simulations. Comparisons with alternative group analysis approaches and the limitations of LME are discussed in details. PMID:23376789
Linear mixed-effects modeling approach to FMRI group analysis.
Chen, Gang; Saad, Ziad S; Britton, Jennifer C; Pine, Daniel S; Cox, Robert W
2013-06-01
Conventional group analysis is usually performed with Student-type t-test, regression, or standard AN(C)OVA in which the variance-covariance matrix is presumed to have a simple structure. Some correction approaches are adopted when assumptions about the covariance structure is violated. However, as experiments are designed with different degrees of sophistication, these traditional methods can become cumbersome, or even be unable to handle the situation at hand. For example, most current FMRI software packages have difficulty analyzing the following scenarios at group level: (1) taking within-subject variability into account when there are effect estimates from multiple runs or sessions; (2) continuous explanatory variables (covariates) modeling in the presence of a within-subject (repeated measures) factor, multiple subject-grouping (between-subjects) factors, or the mixture of both; (3) subject-specific adjustments in covariate modeling; (4) group analysis with estimation of hemodynamic response (HDR) function by multiple basis functions; (5) various cases of missing data in longitudinal studies; and (6) group studies involving family members or twins. Here we present a linear mixed-effects modeling (LME) methodology that extends the conventional group analysis approach to analyze many complicated cases, including the six prototypes delineated above, whose analyses would be otherwise either difficult or unfeasible under traditional frameworks such as AN(C)OVA and general linear model (GLM). In addition, the strength of the LME framework lies in its flexibility to model and estimate the variance-covariance structures for both random effects and residuals. The intraclass correlation (ICC) values can be easily obtained with an LME model with crossed random effects, even at the presence of confounding fixed effects. The simulations of one prototypical scenario indicate that the LME modeling keeps a balance between the control for false positives and the sensitivity for activation detection. The importance of hypothesis formulation is also illustrated in the simulations. Comparisons with alternative group analysis approaches and the limitations of LME are discussed in details. Published by Elsevier Inc.
Simple linear and multivariate regression models.
Rodríguez del Águila, M M; Benítez-Parejo, N
2011-01-01
In biomedical research it is common to find problems in which we wish to relate a response variable to one or more variables capable of describing the behaviour of the former variable by means of mathematical models. Regression techniques are used to this effect, in which an equation is determined relating the two variables. While such equations can have different forms, linear equations are the most widely used form and are easy to interpret. The present article describes simple and multiple linear regression models, how they are calculated, and how their applicability assumptions are checked. Illustrative examples are provided, based on the use of the freely accessible R program. Copyright © 2011 SEICAP. Published by Elsevier Espana. All rights reserved.
Javed, Faizan; Chan, Gregory S H; Savkin, Andrey V; Middleton, Paul M; Malouf, Philip; Steel, Elizabeth; Mackie, James; Lovell, Nigel H
2009-01-01
This paper uses non-linear support vector regression (SVR) to model the blood volume and heart rate (HR) responses in 9 hemodynamically stable kidney failure patients during hemodialysis. Using radial bias function (RBF) kernels the non-parametric models of relative blood volume (RBV) change with time as well as percentage change in HR with respect to RBV were obtained. The e-insensitivity based loss function was used for SVR modeling. Selection of the design parameters which includes capacity (C), insensitivity region (e) and the RBF kernel parameter (sigma) was made based on a grid search approach and the selected models were cross-validated using the average mean square error (AMSE) calculated from testing data based on a k-fold cross-validation technique. Linear regression was also applied to fit the curves and the AMSE was calculated for comparison with SVR. For the model based on RBV with time, SVR gave a lower AMSE for both training (AMSE=1.5) as well as testing data (AMSE=1.4) compared to linear regression (AMSE=1.8 and 1.5). SVR also provided a better fit for HR with RBV for both training as well as testing data (AMSE=15.8 and 16.4) compared to linear regression (AMSE=25.2 and 20.1).
Use of probabilistic weights to enhance linear regression myoelectric control
NASA Astrophysics Data System (ADS)
Smith, Lauren H.; Kuiken, Todd A.; Hargrove, Levi J.
2015-12-01
Objective. Clinically available prostheses for transradial amputees do not allow simultaneous myoelectric control of degrees of freedom (DOFs). Linear regression methods can provide simultaneous myoelectric control, but frequently also result in difficulty with isolating individual DOFs when desired. This study evaluated the potential of using probabilistic estimates of categories of gross prosthesis movement, which are commonly used in classification-based myoelectric control, to enhance linear regression myoelectric control. Approach. Gaussian models were fit to electromyogram (EMG) feature distributions for three movement classes at each DOF (no movement, or movement in either direction) and used to weight the output of linear regression models by the probability that the user intended the movement. Eight able-bodied and two transradial amputee subjects worked in a virtual Fitts’ law task to evaluate differences in controllability between linear regression and probability-weighted regression for an intramuscular EMG-based three-DOF wrist and hand system. Main results. Real-time and offline analyses in able-bodied subjects demonstrated that probability weighting improved performance during single-DOF tasks (p < 0.05) by preventing extraneous movement at additional DOFs. Similar results were seen in experiments with two transradial amputees. Though goodness-of-fit evaluations suggested that the EMG feature distributions showed some deviations from the Gaussian, equal-covariance assumptions used in this experiment, the assumptions were sufficiently met to provide improved performance compared to linear regression control. Significance. Use of probability weights can improve the ability to isolate individual during linear regression myoelectric control, while maintaining the ability to simultaneously control multiple DOFs.
NASA Astrophysics Data System (ADS)
Kutzbach, L.; Schneider, J.; Sachs, T.; Giebels, M.; Nykänen, H.; Shurpali, N. J.; Martikainen, P. J.; Alm, J.; Wilmking, M.
2007-11-01
Closed (non-steady state) chambers are widely used for quantifying carbon dioxide (CO2) fluxes between soils or low-stature canopies and the atmosphere. It is well recognised that covering a soil or vegetation by a closed chamber inherently disturbs the natural CO2 fluxes by altering the concentration gradients between the soil, the vegetation and the overlying air. Thus, the driving factors of CO2 fluxes are not constant during the closed chamber experiment, and no linear increase or decrease of CO2 concentration over time within the chamber headspace can be expected. Nevertheless, linear regression has been applied for calculating CO2 fluxes in many recent, partly influential, studies. This approach has been justified by keeping the closure time short and assuming the concentration change over time to be in the linear range. Here, we test if the application of linear regression is really appropriate for estimating CO2 fluxes using closed chambers over short closure times and if the application of nonlinear regression is necessary. We developed a nonlinear exponential regression model from diffusion and photosynthesis theory. This exponential model was tested with four different datasets of CO2 flux measurements (total number: 1764) conducted at three peatlands sites in Finland and a tundra site in Siberia. Thorough analyses of residuals demonstrated that linear regression was frequently not appropriate for the determination of CO2 fluxes by closed-chamber methods, even if closure times were kept short. The developed exponential model was well suited for nonlinear regression of the concentration over time c(t) evolution in the chamber headspace and estimation of the initial CO2 fluxes at closure time for the majority of experiments. However, a rather large percentage of the exponential regression functions showed curvatures not consistent with the theoretical model which is considered to be caused by violations of the underlying model assumptions. Especially the effects of turbulence and pressure disturbances by the chamber deployment are suspected to have caused unexplainable curvatures. CO2 flux estimates by linear regression can be as low as 40% of the flux estimates of exponential regression for closure times of only two minutes. The degree of underestimation increased with increasing CO2 flux strength and was dependent on soil and vegetation conditions which can disturb not only the quantitative but also the qualitative evaluation of CO2 flux dynamics. The underestimation effect by linear regression was observed to be different for CO2 uptake and release situations which can lead to stronger bias in the daily, seasonal and annual CO2 balances than in the individual fluxes. To avoid serious bias of CO2 flux estimates based on closed chamber experiments, we suggest further tests using published datasets and recommend the use of nonlinear regression models for future closed chamber studies.
Kohli, Nidhi; Sullivan, Amanda L; Sadeh, Shanna; Zopluoglu, Cengiz
2015-04-01
Effective instructional planning and intervening rely heavily on accurate understanding of students' growth, but relatively few researchers have examined mathematics achievement trajectories, particularly for students with special needs. We applied linear, quadratic, and piecewise linear mixed-effects models to identify the best-fitting model for mathematics development over elementary and middle school and to ascertain differences in growth trajectories of children with learning disabilities relative to their typically developing peers. The analytic sample of 2150 students was drawn from the Early Childhood Longitudinal Study - Kindergarten Cohort, a nationally representative sample of United States children who entered kindergarten in 1998. We first modeled students' mathematics growth via multiple mixed-effects models to determine the best fitting model of 9-year growth and then compared the trajectories of students with and without learning disabilities. Results indicate that the piecewise linear mixed-effects model captured best the functional form of students' mathematics trajectories. In addition, there were substantial achievement gaps between students with learning disabilities and students with no disabilities, and their trajectories differed such that students without disabilities progressed at a higher rate than their peers who had learning disabilities. The results underscore the need for further research to understand how to appropriately model students' mathematics trajectories and the need for attention to mathematics achievement gaps in policy. Copyright © 2015 Society for the Study of School Psychology. Published by Elsevier Ltd. All rights reserved.
An Evaluation of the Automated Cost Estimating Integrated Tools (ACEIT) System
1989-09-01
residual and it is described as the residual divided by its standard deviation (13:App A,17). Neter, Wasserman, and Kutner, in Applied Linear Regression Models...others. Applied Linear Regression Models. Homewood IL: Irwin, 1983. 19. Raduchel, William J. "A Professional’s Perspective on User-Friendliness," Byte
Conjoint Analysis: A Study of the Effects of Using Person Variables.
ERIC Educational Resources Information Center
Fraas, John W.; Newman, Isadore
Three statistical techniques--conjoint analysis, a multiple linear regression model, and a multiple linear regression model with a surrogate person variable--were used to estimate the relative importance of five university attributes for students in the process of selecting a college. The five attributes include: availability and variety of…
Random effects coefficient of determination for mixed and meta-analysis models.
Demidenko, Eugene; Sargent, James; Onega, Tracy
2012-01-01
The key feature of a mixed model is the presence of random effects. We have developed a coefficient, called the random effects coefficient of determination, [Formula: see text], that estimates the proportion of the conditional variance of the dependent variable explained by random effects. This coefficient takes values from 0 to 1 and indicates how strong the random effects are. The difference from the earlier suggested fixed effects coefficient of determination is emphasized. If [Formula: see text] is close to 0, there is weak support for random effects in the model because the reduction of the variance of the dependent variable due to random effects is small; consequently, random effects may be ignored and the model simplifies to standard linear regression. The value of [Formula: see text] apart from 0 indicates the evidence of the variance reduction in support of the mixed model. If random effects coefficient of determination is close to 1 the variance of random effects is very large and random effects turn into free fixed effects-the model can be estimated using the dummy variable approach. We derive explicit formulas for [Formula: see text] in three special cases: the random intercept model, the growth curve model, and meta-analysis model. Theoretical results are illustrated with three mixed model examples: (1) travel time to the nearest cancer center for women with breast cancer in the U.S., (2) cumulative time watching alcohol related scenes in movies among young U.S. teens, as a risk factor for early drinking onset, and (3) the classic example of the meta-analysis model for combination of 13 studies on tuberculosis vaccine.
Fabian C.C. Uzoh; William W. Oliver
2008-01-01
A diameter increment model is developed and evaluated for individual trees of ponderosa pine throughout the species range in the United States using a multilevel linear mixed model. Stochastic variability is broken down among period, locale, plot, tree and within-tree components. Covariates acting at tree and stand level, as breast height diameter, density, site index...
NASA Technical Reports Server (NTRS)
MCKissick, Burnell T. (Technical Monitor); Plassman, Gerald E.; Mall, Gerald H.; Quagliano, John R.
2005-01-01
Linear multivariable regression models for predicting day and night Eddy Dissipation Rate (EDR) from available meteorological data sources are defined and validated. Model definition is based on a combination of 1997-2000 Dallas/Fort Worth (DFW) data sources, EDR from Aircraft Vortex Spacing System (AVOSS) deployment data, and regression variables primarily from corresponding Automated Surface Observation System (ASOS) data. Model validation is accomplished through EDR predictions on a similar combination of 1994-1995 Memphis (MEM) AVOSS and ASOS data. Model forms include an intercept plus a single term of fixed optimal power for each of these regression variables; 30-minute forward averaged mean and variance of near-surface wind speed and temperature, variance of wind direction, and a discrete cloud cover metric. Distinct day and night models, regressing on EDR and the natural log of EDR respectively, yield best performance and avoid model discontinuity over day/night data boundaries.
Extended Mixed-Efects Item Response Models with the MH-RM Algorithm
ERIC Educational Resources Information Center
Chalmers, R. Philip
2015-01-01
A mixed-effects item response theory (IRT) model is presented as a logical extension of the generalized linear mixed-effects modeling approach to formulating explanatory IRT models. Fixed and random coefficients in the extended model are estimated using a Metropolis-Hastings Robbins-Monro (MH-RM) stochastic imputation algorithm to accommodate for…
NASA Technical Reports Server (NTRS)
Cromwell, R. L.; Zanello, S. B.; Yarbough, P. O.; Ploutz-Snyder, R.; Taibbi, G.; Brewer, J. L.; Vizzeri, G.
2013-01-01
Mean IOP significantly increased while at 6deg HDT and returned towards pre-bed rest values upon leaving bed rest. While mean IOP increased during bed rest, it remained within the normal limits for subject safety. A diuretic shift and cardiovascular deconditioning occurs during in-bed rest, as expected. There was no demonstrable correlation between the largest change in IOP (pre/post) and cardiovascular measure changes (pre/post). Additional mixed effects linear regression modeling may reveal some subclinical physiological changes that might assist in describing the VIIP syndrome pathophysiology.
Lu, Tao; Lu, Minggen; Wang, Min; Zhang, Jun; Dong, Guang-Hui; Xu, Yong
2017-12-18
Longitudinal competing risks data frequently arise in clinical studies. Skewness and missingness are commonly observed for these data in practice. However, most joint models do not account for these data features. In this article, we propose partially linear mixed-effects joint models to analyze skew longitudinal competing risks data with missingness. In particular, to account for skewness, we replace the commonly assumed symmetric distributions by asymmetric distribution for model errors. To deal with missingness, we employ an informative missing data model. The joint models that couple the partially linear mixed-effects model for the longitudinal process, the cause-specific proportional hazard model for competing risks process and missing data process are developed. To estimate the parameters in the joint models, we propose a fully Bayesian approach based on the joint likelihood. To illustrate the proposed model and method, we implement them to an AIDS clinical study. Some interesting findings are reported. We also conduct simulation studies to validate the proposed method.
Testing hypotheses for differences between linear regression lines
Stanley J. Zarnoch
2009-01-01
Five hypotheses are identified for testing differences between simple linear regression lines. The distinctions between these hypotheses are based on a priori assumptions and illustrated with full and reduced models. The contrast approach is presented as an easy and complete method for testing for overall differences between the regressions and for making pairwise...
Zhai, Hong Lin; Zhai, Yue Yuan; Li, Pei Zhen; Tian, Yue Li
2013-01-21
A very simple approach to quantitative analysis is proposed based on the technology of digital image processing using three-dimensional (3D) spectra obtained by high-performance liquid chromatography coupled with a diode array detector (HPLC-DAD). As the region-based shape features of a grayscale image, Zernike moments with inherently invariance property were employed to establish the linear quantitative models. This approach was applied to the quantitative analysis of three compounds in mixed samples using 3D HPLC-DAD spectra, and three linear models were obtained, respectively. The correlation coefficients (R(2)) for training and test sets were more than 0.999, and the statistical parameters and strict validation supported the reliability of established models. The analytical results suggest that the Zernike moment selected by stepwise regression can be used in the quantitative analysis of target compounds. Our study provides a new idea for quantitative analysis using 3D spectra, which can be extended to the analysis of other 3D spectra obtained by different methods or instruments.
Narayanan, Neethu; Gupta, Suman; Gajbhiye, V T; Manjaiah, K M
2017-04-01
A carboxy methyl cellulose-nano organoclay (nano montmorillonite modified with 35-45 wt % dimethyl dialkyl (C 14 -C 18 ) amine (DMDA)) composite was prepared by solution intercalation method. The prepared composite was characterized by infrared spectroscopy (FTIR), X-Ray diffraction spectroscopy (XRD) and scanning electron microscopy (SEM). The composite was utilized for its pesticide sorption efficiency for atrazine, imidacloprid and thiamethoxam. The sorption data was fitted into Langmuir and Freundlich isotherms using linear and non linear methods. The linear regression method suggested best fitting of sorption data into Type II Langmuir and Freundlich isotherms. In order to avoid the bias resulting from linearization, seven different error parameters were also analyzed by non linear regression method. The non linear error analysis suggested that the sorption data fitted well into Langmuir model rather than in Freundlich model. The maximum sorption capacity, Q 0 (μg/g) was given by imidacloprid (2000) followed by thiamethoxam (1667) and atrazine (1429). The study suggests that the degree of determination of linear regression alone cannot be used for comparing the best fitting of Langmuir and Freundlich models and non-linear error analysis needs to be done to avoid inaccurate results. Copyright © 2017 Elsevier Ltd. All rights reserved.
Laboratory Headphone Studies of Human Response to Low-Amplitude Sonic Booms and Rattle Heard Indoors
NASA Technical Reports Server (NTRS)
Loubeau, Alexandra; Sullivan, Brenda M.; Klos, Jacob; Rathsam, Jonathan; Gavin, Joseph R.
2013-01-01
Human response to sonic booms heard indoors is affected by the generation of contact-induced rattle noise. The annoyance caused by sonic boom-induced rattle noise was studied in a series of psychoacoustics tests. Stimuli were divided into three categories and presented in three different studies: isolated rattles at the same calculated Perceived Level (PL), sonic booms combined with rattles with the mixed sound at a single PL, and sonic booms combined with rattles with the mixed sound at three different PL. Subjects listened to sounds over headphones and were asked to report their annoyance. Annoyance to different rattles was shown to vary significantly according to rattle object size. In addition, the combination of low-amplitude sonic booms and rattles can be more annoying than the sonic boom alone. Correlations and regression analyses for the combined sonic boom and rattle sounds identified the Moore and Glasberg Stationary Loudness (MGSL) metric as a primary predictor of annoyance for the tested sounds. Multiple linear regression models were developed to describe annoyance to the tested sounds, and simplifications for applicability to a wider range of sounds are presented.
ERIC Educational Resources Information Center
Liou, Pey-Yan
2009-01-01
The current study examines three regression models: OLS (ordinary least square) linear regression, Poisson regression, and negative binomial regression for analyzing count data. Simulation results show that the OLS regression model performed better than the others, since it did not produce more false statistically significant relationships than…
NASA Astrophysics Data System (ADS)
Hadley, Brian Christopher
This dissertation assessed remotely sensed data and geospatial modeling technique(s) to map the spatial distribution of total above-ground biomass present on the surface of the Savannah River National Laboratory's (SRNL) Mixed Waste Management Facility (MWMF) hazardous waste landfill. Ordinary least squares (OLS) regression, regression kriging, and tree-structured regression were employed to model the empirical relationship between in-situ measured Bahia (Paspalum notatum Flugge) and Centipede [Eremochloa ophiuroides (Munro) Hack.] grass biomass against an assortment of explanatory variables extracted from fine spatial resolution passive optical and LIDAR remotely sensed data. Explanatory variables included: (1) discrete channels of visible, near-infrared (NIR), and short-wave infrared (SWIR) reflectance, (2) spectral vegetation indices (SVI), (3) spectral mixture analysis (SMA) modeled fractions, (4) narrow-band derivative-based vegetation indices, and (5) LIDAR derived topographic variables (i.e. elevation, slope, and aspect). Results showed that a linear combination of the first- (1DZ_DGVI), second- (2DZ_DGVI), and third-derivative of green vegetation indices (3DZ_DGVI) calculated from hyperspectral data recorded over the 400--960 nm wavelengths of the electromagnetic spectrum explained the largest percentage of statistical variation (R2 = 0.5184) in the total above-ground biomass measurements. In general, the topographic variables did not correlate well with the MWMF biomass data, accounting for less than five percent of the statistical variation. It was concluded that tree-structured regression represented the optimum geospatial modeling technique due to a combination of model performance and efficiency/flexibility factors.
Climate change at upper treeline: How do trees on the edge react to increasing temperatures?
NASA Astrophysics Data System (ADS)
Jochner, Matthias; Bugmann, Harald; Nötzli, Magdalena; Bigler, Christof
2017-04-01
Treeline ecotones are thought to be particularly sensitive to climate warming, and an alteration of their growth conditions may have important implications for the ecosystem services they supply in mountain regions. We use a novel approach to quantify effects of a changing climate on tree growth, using case studies in the European Alps. We compiled tree-ring data from almost 600 trees of four species at treeline in three climate regions of Switzerland. Temperature loggers installed along transects provided data for a precise interpolation of temperatures experienced by the sampled trees. To assess the influence of temperature on annual growth, we used linear mixed-effects models, allowing us to quantify effect sizes and to account for between-tree growth variability. After removing biological growth trends, we isolated temporal trends of ring-width indices. Furthermore, we fitted non-linear regression models to radial growth rates of individual years with temperature and tree age as predicting covariates for a fine-scale investigation of the temperature dependency of tree growth. For all species, climate-growth linear mixed-effects models indicated strong positive responses of ring-width indices to temperature in early summer and previous year's autumn, featuring considerable between-tree variability. All species showed positive ring-width index trends at treeline but different interactions with elevation: Larix decidua exhibited a declining ring-width index trend with decreasing elevation, whereas Picea abies, Pinus cembra and Pinus mugo showed increasing and/or stable trends. Not only reflected our findings the effects of ameliorated growth conditions, they might have also revealed suspected negative and positive feedbacks of climate change on growth, and increased the knowledge about the functional form and parameterization of the temperature dependency of tree growth.
Unit Cohesion and the Surface Navy: Does Cohesion Affect Performance
1989-12-01
v. 68, 1968. Neter, J., Wasserman, W., and Kutner, M. H., Applied Linear Regression Models, 2d ed., Boston, MA: Irwin, 1989. Rand Corporation R-2607...Neter, J., Wasserman, W., and Kutner, M. H., Applied Linear Regression Models, 2d ed., Boston, MA: Irwin, 1989. SAS User’s Guide: Basics, Version 5 ed
ERIC Educational Resources Information Center
Richter, Tobias
2006-01-01
Most reading time studies using naturalistic texts yield data sets characterized by a multilevel structure: Sentences (sentence level) are nested within persons (person level). In contrast to analysis of variance and multiple regression techniques, hierarchical linear models take the multilevel structure of reading time data into account. They…
Di Mauro, Michele; Dato, Guglielmo Mario Actis; Barili, Fabio; Gelsomino, Sandro; Santè, Pasquale; Corte, Alessandro Della; Carrozza, Antonio; Ratta, Ester Della; Cugola, Diego; Galletti, Lorenzo; Devotini, Roger; Casabona, Riccardo; Santini, Francesco; Salsano, Antonio; Scrofani, Roberto; Antona, Carlo; Botta, Luca; Russo, Claudio; Mancuso, Samuel; Rinaldi, Mauro; De Vincentiis, Carlo; Biondi, Andrea; Beghi, Cesare; Cappabianca, Giangiuseppe; Tarzia, Vincenzo; Gerosa, Gino; De Bonis, Michele; Pozzoli, Alberto; Nicolini, Francesco; Benassi, Filippo; Rosato, Francesco; Grasso, Elena; Livi, Ugolino; Sponga, Sandro; Pacini, Davide; Di Bartolomeo, Roberto; De Martino, Andrea; Bortolotti, Uberto; Onorati, Francesco; Faggian, Giuseppe; Lorusso, Roberto; Vizzardi, Enrico; Di Giammarco, Gabriele; Marinelli, Daniele; Villa, Emmanuel; Troise, Giovanni; Picichè, Marco; Musumeci, Francesco; Paparella, Domenico; Margari, Vito; Tritto, Francesco; Damiani, Girolamo; Scrascia, Giuseppe; Zaccaria, Salvatore; Renzulli, Attilio; Serraino, Giuseppe; Mariscalco, Giovanni; Maselli, Daniele; Foschi, Massimiliano; Parolari, Alessandro; Nappi, Giannantonio
2017-08-15
The aim of this large retrospective study was to provide a logistic risk model along an additive score to predict early mortality after surgical treatment of patients with heart valve or prosthesis infective endocarditis (IE). From 2000 to 2015, 2715 patients with native valve endocarditis (NVE) or prosthesis valve endocarditis (PVE) were operated on in 26 Italian Cardiac Surgery Centers. The relationship between early mortality and covariates was evaluated with logistic mixed effect models. Fixed effects are parameters associated with the entire population or with certain repeatable levels of experimental factors, while random effects are associated with individual experimental units (centers). Early mortality was 11.0% (298/2715); At mixed effect logistic regression the following variables were found associated with early mortality: age class, female gender, LVEF, preoperative shock, COPD, creatinine value above 2mg/dl, presence of abscess, number of treated valve/prosthesis (with respect to one treated valve/prosthesis) and the isolation of Staphylococcus aureus, Fungus spp., Pseudomonas Aeruginosa and other micro-organisms, while Streptococcus spp., Enterococcus spp. and other Staphylococci did not affect early mortality, as well as no micro-organisms isolation. LVEF was found linearly associated with outcomes while non-linear association between mortality and age was tested and the best model was found with a categorization into four classes (AUC=0.851). The following study provides a logistic risk model to predict early mortality in patients with heart valve or prosthesis infective endocarditis undergoing surgical treatment, called "The EndoSCORE". Copyright © 2017. Published by Elsevier B.V.
NASA Astrophysics Data System (ADS)
Wibowo, Wahyu; Wene, Chatrien; Budiantara, I. Nyoman; Permatasari, Erma Oktania
2017-03-01
Multiresponse semiparametric regression is simultaneous equation regression model and fusion of parametric and nonparametric model. The regression model comprise several models and each model has two components, parametric and nonparametric. The used model has linear function as parametric and polynomial truncated spline as nonparametric component. The model can handle both linearity and nonlinearity relationship between response and the sets of predictor variables. The aim of this paper is to demonstrate the application of the regression model for modeling of effect of regional socio-economic on use of information technology. More specific, the response variables are percentage of households has access to internet and percentage of households has personal computer. Then, predictor variables are percentage of literacy people, percentage of electrification and percentage of economic growth. Based on identification of the relationship between response and predictor variable, economic growth is treated as nonparametric predictor and the others are parametric predictors. The result shows that the multiresponse semiparametric regression can be applied well as indicate by the high coefficient determination, 90 percent.
Gimelfarb, A.; Willis, J. H.
1994-01-01
An experiment was conducted to investigate the offspring-parent regression for three quantitative traits (weight, abdominal bristles and wing length) in Drosophila melanogaster. Linear and polynomial models were fitted for the regressions of a character in offspring on both parents. It is demonstrated that responses by the characters to selection predicted by the nonlinear regressions may differ substantially from those predicted by the linear regressions. This is true even, and especially, if selection is weak. The realized heritability for a character under selection is shown to be determined not only by the offspring-parent regression but also by the distribution of the character and by the form and strength of selection. PMID:7828818
Aqil, Muhammad; Kita, Ichiro; Yano, Akira; Nishiyama, Soichi
2007-10-01
Traditionally, the multiple linear regression technique has been one of the most widely used models in simulating hydrological time series. However, when the nonlinear phenomenon is significant, the multiple linear will fail to develop an appropriate predictive model. Recently, neuro-fuzzy systems have gained much popularity for calibrating the nonlinear relationships. This study evaluated the potential of a neuro-fuzzy system as an alternative to the traditional statistical regression technique for the purpose of predicting flow from a local source in a river basin. The effectiveness of the proposed identification technique was demonstrated through a simulation study of the river flow time series of the Citarum River in Indonesia. Furthermore, in order to provide the uncertainty associated with the estimation of river flow, a Monte Carlo simulation was performed. As a comparison, a multiple linear regression analysis that was being used by the Citarum River Authority was also examined using various statistical indices. The simulation results using 95% confidence intervals indicated that the neuro-fuzzy model consistently underestimated the magnitude of high flow while the low and medium flow magnitudes were estimated closer to the observed data. The comparison of the prediction accuracy of the neuro-fuzzy and linear regression methods indicated that the neuro-fuzzy approach was more accurate in predicting river flow dynamics. The neuro-fuzzy model was able to improve the root mean square error (RMSE) and mean absolute percentage error (MAPE) values of the multiple linear regression forecasts by about 13.52% and 10.73%, respectively. Considering its simplicity and efficiency, the neuro-fuzzy model is recommended as an alternative tool for modeling of flow dynamics in the study area.
González-Aparicio, I; Hidalgo, J; Baklanov, A; Padró, A; Santa-Coloma, O
2013-07-01
There is extensive evidence of the negative impacts on health linked to the rise of the regional background of particulate matter (PM) 10 levels. These levels are often increased over urban areas becoming one of the main air pollution concerns. This is the case on the Bilbao metropolitan area, Spain. This study describes a data-driven model to diagnose PM10 levels in Bilbao at hourly intervals. The model is built with a training period of 7-year historical data covering different urban environments (inland, city centre and coastal sites). The explanatory variables are quantitative-log [NO2], temperature, short-wave incoming radiation, wind speed and direction, specific humidity, hour and vehicle intensity-and qualitative-working days/weekends, season (winter/summer), the hour (from 00 to 23 UTC) and precipitation/no precipitation. Three different linear regression models are compared: simple linear regression; linear regression with interaction terms (INT); and linear regression with interaction terms following the Sawa's Bayesian Information Criteria (INT-BIC). Each type of model is calculated selecting two different periods: the training (it consists of 6 years) and the testing dataset (it consists of 1 year). The results of each type of model show that the INT-BIC-based model (R(2) = 0.42) is the best. Results were R of 0.65, 0.63 and 0.60 for the city centre, inland and coastal sites, respectively, a level of confidence similar to the state-of-the art methodology. The related error calculated for longer time intervals (monthly or seasonal means) diminished significantly (R of 0.75-0.80 for monthly means and R of 0.80 to 0.98 at seasonally means) with respect to shorter periods.
NASA Astrophysics Data System (ADS)
Tian, Wenli; Cao, Chengxuan
2017-03-01
A generalized interval fuzzy mixed integer programming model is proposed for the multimodal freight transportation problem under uncertainty, in which the optimal mode of transport and the optimal amount of each type of freight transported through each path need to be decided. For practical purposes, three mathematical methods, i.e. the interval ranking method, fuzzy linear programming method and linear weighted summation method, are applied to obtain equivalents of constraints and parameters, and then a fuzzy expected value model is presented. A heuristic algorithm based on a greedy criterion and the linear relaxation algorithm are designed to solve the model.
Afantitis, Antreas; Melagraki, Georgia; Sarimveis, Haralambos; Koutentis, Panayiotis A; Markopoulos, John; Igglessi-Markopoulou, Olga
2006-08-01
A quantitative-structure activity relationship was obtained by applying Multiple Linear Regression Analysis to a series of 80 1-[2-hydroxyethoxy-methyl]-6-(phenylthio) thymine (HEPT) derivatives with significant anti-HIV activity. For the selection of the best among 37 different descriptors, the Elimination Selection Stepwise Regression Method (ES-SWR) was utilized. The resulting QSAR model (R (2) (CV) = 0.8160; S (PRESS) = 0.5680) proved to be very accurate both in training and predictive stages.
Yang, Xiaowei; Nie, Kun
2008-03-15
Longitudinal data sets in biomedical research often consist of large numbers of repeated measures. In many cases, the trajectories do not look globally linear or polynomial, making it difficult to summarize the data or test hypotheses using standard longitudinal data analysis based on various linear models. An alternative approach is to apply the approaches of functional data analysis, which directly target the continuous nonlinear curves underlying discretely sampled repeated measures. For the purposes of data exploration, many functional data analysis strategies have been developed based on various schemes of smoothing, but fewer options are available for making causal inferences regarding predictor-outcome relationships, a common task seen in hypothesis-driven medical studies. To compare groups of curves, two testing strategies with good power have been proposed for high-dimensional analysis of variance: the Fourier-based adaptive Neyman test and the wavelet-based thresholding test. Using a smoking cessation clinical trial data set, this paper demonstrates how to extend the strategies for hypothesis testing into the framework of functional linear regression models (FLRMs) with continuous functional responses and categorical or continuous scalar predictors. The analysis procedure consists of three steps: first, apply the Fourier or wavelet transform to the original repeated measures; then fit a multivariate linear model in the transformed domain; and finally, test the regression coefficients using either adaptive Neyman or thresholding statistics. Since a FLRM can be viewed as a natural extension of the traditional multiple linear regression model, the development of this model and computational tools should enhance the capacity of medical statistics for longitudinal data.
MIXOR: a computer program for mixed-effects ordinal regression analysis.
Hedeker, D; Gibbons, R D
1996-03-01
MIXOR provides maximum marginal likelihood estimates for mixed-effects ordinal probit, logistic, and complementary log-log regression models. These models can be used for analysis of dichotomous and ordinal outcomes from either a clustered or longitudinal design. For clustered data, the mixed-effects model assumes that data within clusters are dependent. The degree of dependency is jointly estimated with the usual model parameters, thus adjusting for dependence resulting from clustering of the data. Similarly, for longitudinal data, the mixed-effects approach can allow for individual-varying intercepts and slopes across time, and can estimate the degree to which these time-related effects vary in the population of individuals. MIXOR uses marginal maximum likelihood estimation, utilizing a Fisher-scoring solution. For the scoring solution, the Cholesky factor of the random-effects variance-covariance matrix is estimated, along with the effects of model covariates. Examples illustrating usage and features of MIXOR are provided.
Generalized linear mixed models with varying coefficients for longitudinal data.
Zhang, Daowen
2004-03-01
The routinely assumed parametric functional form in the linear predictor of a generalized linear mixed model for longitudinal data may be too restrictive to represent true underlying covariate effects. We relax this assumption by representing these covariate effects by smooth but otherwise arbitrary functions of time, with random effects used to model the correlation induced by among-subject and within-subject variation. Due to the usually intractable integration involved in evaluating the quasi-likelihood function, the double penalized quasi-likelihood (DPQL) approach of Lin and Zhang (1999, Journal of the Royal Statistical Society, Series B61, 381-400) is used to estimate the varying coefficients and the variance components simultaneously by representing a nonparametric function by a linear combination of fixed effects and random effects. A scaled chi-squared test based on the mixed model representation of the proposed model is developed to test whether an underlying varying coefficient is a polynomial of certain degree. We evaluate the performance of the procedures through simulation studies and illustrate their application with Indonesian children infectious disease data.
NASA Technical Reports Server (NTRS)
Schlesinger, Robert E.
1990-01-01
Results are presented from a linear Lagrangian entraining parcel model of an overshooting thunderstorm cloud top. The model, which is similar to that of Adler and Mack (1986), gives analytic exact solutions for vertical velocity and temperature by representing mixing with Rayleigh damping instead of nonlinearly. Model results are presented for various combinations of stratospheric lapse rate, drag intensity, and mixing strength. The results are compared to those of Adler and Mack.
An evaluation of bias in propensity score-adjusted non-linear regression models.
Wan, Fei; Mitra, Nandita
2018-03-01
Propensity score methods are commonly used to adjust for observed confounding when estimating the conditional treatment effect in observational studies. One popular method, covariate adjustment of the propensity score in a regression model, has been empirically shown to be biased in non-linear models. However, no compelling underlying theoretical reason has been presented. We propose a new framework to investigate bias and consistency of propensity score-adjusted treatment effects in non-linear models that uses a simple geometric approach to forge a link between the consistency of the propensity score estimator and the collapsibility of non-linear models. Under this framework, we demonstrate that adjustment of the propensity score in an outcome model results in the decomposition of observed covariates into the propensity score and a remainder term. Omission of this remainder term from a non-collapsible regression model leads to biased estimates of the conditional odds ratio and conditional hazard ratio, but not for the conditional rate ratio. We further show, via simulation studies, that the bias in these propensity score-adjusted estimators increases with larger treatment effect size, larger covariate effects, and increasing dissimilarity between the coefficients of the covariates in the treatment model versus the outcome model.
NASA Astrophysics Data System (ADS)
Kutzbach, L.; Schneider, J.; Sachs, T.; Giebels, M.; Nykänen, H.; Shurpali, N. J.; Martikainen, P. J.; Alm, J.; Wilmking, M.
2007-07-01
Closed (non-steady state) chambers are widely used for quantifying carbon dioxide (CO2) fluxes between soils or low-stature canopies and the atmosphere. It is well recognised that covering a soil or vegetation by a closed chamber inherently disturbs the natural CO2 fluxes by altering the concentration gradients between the soil, the vegetation and the overlying air. Thus, the driving factors of CO2 fluxes are not constant during the closed chamber experiment, and no linear increase or decrease of CO2 concentration over time within the chamber headspace can be expected. Nevertheless, linear regression has been applied for calculating CO2 fluxes in many recent, partly influential, studies. This approach was justified by keeping the closure time short and assuming the concentration change over time to be in the linear range. Here, we test if the application of linear regression is really appropriate for estimating CO2 fluxes using closed chambers over short closure times and if the application of nonlinear regression is necessary. We developed a nonlinear exponential regression model from diffusion and photosynthesis theory. This exponential model was tested with four different datasets of CO2 flux measurements (total number: 1764) conducted at three peatland sites in Finland and a tundra site in Siberia. The flux measurements were performed using transparent chambers on vegetated surfaces and opaque chambers on bare peat surfaces. Thorough analyses of residuals demonstrated that linear regression was frequently not appropriate for the determination of CO2 fluxes by closed-chamber methods, even if closure times were kept short. The developed exponential model was well suited for nonlinear regression of the concentration over time c(t) evolution in the chamber headspace and estimation of the initial CO2 fluxes at closure time for the majority of experiments. CO2 flux estimates by linear regression can be as low as 40% of the flux estimates of exponential regression for closure times of only two minutes and even lower for longer closure times. The degree of underestimation increased with increasing CO2 flux strength and is dependent on soil and vegetation conditions which can disturb not only the quantitative but also the qualitative evaluation of CO2 flux dynamics. The underestimation effect by linear regression was observed to be different for CO2 uptake and release situations which can lead to stronger bias in the daily, seasonal and annual CO2 balances than in the individual fluxes. To avoid serious bias of CO2 flux estimates based on closed chamber experiments, we suggest further tests using published datasets and recommend the use of nonlinear regression models for future closed chamber studies.
Senn, Stephen; Graf, Erika; Caputo, Angelika
2007-12-30
Stratifying and matching by the propensity score are increasingly popular approaches to deal with confounding in medical studies investigating effects of a treatment or exposure. A more traditional alternative technique is the direct adjustment for confounding in regression models. This paper discusses fundamental differences between the two approaches, with a focus on linear regression and propensity score stratification, and identifies points to be considered for an adequate comparison. The treatment estimators are examined for unbiasedness and efficiency. This is illustrated in an application to real data and supplemented by an investigation on properties of the estimators for a range of underlying linear models. We demonstrate that in specific circumstances the propensity score estimator is identical to the effect estimated from a full linear model, even if it is built on coarser covariate strata than the linear model. As a consequence the coarsening property of the propensity score-adjustment for a one-dimensional confounder instead of a high-dimensional covariate-may be viewed as a way to implement a pre-specified, richly parametrized linear model. We conclude that the propensity score estimator inherits the potential for overfitting and that care should be taken to restrict covariates to those relevant for outcome. Copyright (c) 2007 John Wiley & Sons, Ltd.
Approximating a nonlinear advanced-delayed equation from acoustics
NASA Astrophysics Data System (ADS)
Teodoro, M. Filomena
2016-10-01
We approximate the solution of a particular non-linear mixed type functional differential equation from physiology, the mucosal wave model of the vocal oscillation during phonation. The mathematical equation models a superficial wave propagating through the tissues. The numerical scheme is adapted from the work presented in [1, 2, 3], using homotopy analysis method (HAM) to solve the non linear mixed type equation under study.
Improving the Power of GWAS and Avoiding Confounding from Population Stratification with PC-Select
Tucker, George; Price, Alkes L.; Berger, Bonnie
2014-01-01
Using a reduced subset of SNPs in a linear mixed model can improve power for genome-wide association studies, yet this can result in insufficient correction for population stratification. We propose a hybrid approach using principal components that does not inflate statistics in the presence of population stratification and improves power over standard linear mixed models. PMID:24788602
Empirical Models for the Shielding and Reflection of Jet Mixing Noise by a Surface
NASA Technical Reports Server (NTRS)
Brown, Cliff
2015-01-01
Empirical models for the shielding and refection of jet mixing noise by a nearby surface are described and the resulting models evaluated. The flow variables are used to non-dimensionalize the surface position variables, reducing the variable space and producing models that are linear function of non-dimensional surface position and logarithmic in Strouhal frequency. A separate set of coefficients are determined at each observer angle in the dataset and linear interpolation is used to for the intermediate observer angles. The shielding and rejection models are then combined with existing empirical models for the jet mixing and jet-surface interaction noise sources to produce predicted spectra for a jet operating near a surface. These predictions are then evaluated against experimental data.
Empirical Models for the Shielding and Reflection of Jet Mixing Noise by a Surface
NASA Technical Reports Server (NTRS)
Brown, Clifford A.
2016-01-01
Empirical models for the shielding and reflection of jet mixing noise by a nearby surface are described and the resulting models evaluated. The flow variables are used to non-dimensionalize the surface position variables, reducing the variable space and producing models that are linear function of non-dimensional surface position and logarithmic in Strouhal frequency. A separate set of coefficients are determined at each observer angle in the dataset and linear interpolation is used to for the intermediate observer angles. The shielding and reflection models are then combined with existing empirical models for the jet mixing and jet-surface interaction noise sources to produce predicted spectra for a jet operating near a surface. These predictions are then evaluated against experimental data.
Rotolo, Federico; Paoletti, Xavier; Michiels, Stefan
2018-03-01
Surrogate endpoints are attractive for use in clinical trials instead of well-established endpoints because of practical convenience. To validate a surrogate endpoint, two important measures can be estimated in a meta-analytic context when individual patient data are available: the R indiv 2 or the Kendall's τ at the individual level, and the R trial 2 at the trial level. We aimed at providing an R implementation of classical and well-established as well as more recent statistical methods for surrogacy assessment with failure time endpoints. We also intended incorporating utilities for model checking and visualization and data generating methods described in the literature to date. In the case of failure time endpoints, the classical approach is based on two steps. First, a Kendall's τ is estimated as measure of individual level surrogacy using a copula model. Then, the R trial 2 is computed via a linear regression of the estimated treatment effects; at this second step, the estimation uncertainty can be accounted for via measurement-error model or via weights. In addition to the classical approach, we recently developed an approach based on bivariate auxiliary Poisson models with individual random effects to measure the Kendall's τ and treatment-by-trial interactions to measure the R trial 2 . The most common data simulation models described in the literature are based on: copula models, mixed proportional hazard models, and mixture of half-normal and exponential random variables. The R package surrosurv implements the classical two-step method with Clayton, Plackett, and Hougaard copulas. It also allows to optionally adjusting the second-step linear regression for measurement-error. The mixed Poisson approach is implemented with different reduced models in addition to the full model. We present the package functions for estimating the surrogacy models, for checking their convergence, for performing leave-one-trial-out cross-validation, and for plotting the results. We illustrate their use in practice on individual patient data from a meta-analysis of 4069 patients with advanced gastric cancer from 20 trials of chemotherapy. The surrosurv package provides an R implementation of classical and recent statistical methods for surrogacy assessment of failure time endpoints. Flexible simulation functions are available to generate data according to the methods described in the literature. Copyright © 2017 Elsevier B.V. All rights reserved.
Linear Mixed Models: Gum and Beyond
NASA Astrophysics Data System (ADS)
Arendacká, Barbora; Täubner, Angelika; Eichstädt, Sascha; Bruns, Thomas; Elster, Clemens
2014-04-01
In Annex H.5, the Guide to the Evaluation of Uncertainty in Measurement (GUM) [1] recognizes the necessity to analyze certain types of experiments by applying random effects ANOVA models. These belong to the more general family of linear mixed models that we focus on in the current paper. Extending the short introduction provided by the GUM, our aim is to show that the more general, linear mixed models cover a wider range of situations occurring in practice and can be beneficial when employed in data analysis of long-term repeated experiments. Namely, we point out their potential as an aid in establishing an uncertainty budget and as means for gaining more insight into the measurement process. We also comment on computational issues and to make the explanations less abstract, we illustrate all the concepts with the help of a measurement campaign conducted in order to challenge the uncertainty budget in calibration of accelerometers.
Wang, Huifang; Xiao, Bo; Wang, Mingyu; Shao, Ming'an
2013-01-01
Soil water retention parameters are critical to quantify flow and solute transport in vadose zone, while the presence of rock fragments remarkably increases their variability. Therefore a novel method for determining water retention parameters of soil-gravel mixtures is required. The procedure to generate such a model is based firstly on the determination of the quantitative relationship between the content of rock fragments and the effective saturation of soil-gravel mixtures, and then on the integration of this relationship with former analytical equations of water retention curves (WRCs). In order to find such relationships, laboratory experiments were conducted to determine WRCs of soil-gravel mixtures obtained with a clay loam soil mixed with shale clasts or pebbles in three size groups with various gravel contents. Data showed that the effective saturation of the soil-gravel mixtures with the same kind of gravels within one size group had a linear relation with gravel contents, and had a power relation with the bulk density of samples at any pressure head. Revised formulas for water retention properties of the soil-gravel mixtures are proposed to establish the water retention curved surface models of the power-linear functions and power functions. The analysis of the parameters obtained by regression and validation of the empirical models showed that they were acceptable by using either the measured data of separate gravel size group or those of all the three gravel size groups having a large size range. Furthermore, the regression parameters of the curved surfaces for the soil-gravel mixtures with a large range of gravel content could be determined from the water retention data of the soil-gravel mixtures with two representative gravel contents or bulk densities. Such revised water retention models are potentially applicable in regional or large scale field investigations of significantly heterogeneous media, where various gravel sizes and different gravel contents are present.
Wang, Huifang; Xiao, Bo; Wang, Mingyu; Shao, Ming'an
2013-01-01
Soil water retention parameters are critical to quantify flow and solute transport in vadose zone, while the presence of rock fragments remarkably increases their variability. Therefore a novel method for determining water retention parameters of soil-gravel mixtures is required. The procedure to generate such a model is based firstly on the determination of the quantitative relationship between the content of rock fragments and the effective saturation of soil-gravel mixtures, and then on the integration of this relationship with former analytical equations of water retention curves (WRCs). In order to find such relationships, laboratory experiments were conducted to determine WRCs of soil-gravel mixtures obtained with a clay loam soil mixed with shale clasts or pebbles in three size groups with various gravel contents. Data showed that the effective saturation of the soil-gravel mixtures with the same kind of gravels within one size group had a linear relation with gravel contents, and had a power relation with the bulk density of samples at any pressure head. Revised formulas for water retention properties of the soil-gravel mixtures are proposed to establish the water retention curved surface models of the power-linear functions and power functions. The analysis of the parameters obtained by regression and validation of the empirical models showed that they were acceptable by using either the measured data of separate gravel size group or those of all the three gravel size groups having a large size range. Furthermore, the regression parameters of the curved surfaces for the soil-gravel mixtures with a large range of gravel content could be determined from the water retention data of the soil-gravel mixtures with two representative gravel contents or bulk densities. Such revised water retention models are potentially applicable in regional or large scale field investigations of significantly heterogeneous media, where various gravel sizes and different gravel contents are present. PMID:23555040
Determining Predictor Importance in Hierarchical Linear Models Using Dominance Analysis
ERIC Educational Resources Information Center
Luo, Wen; Azen, Razia
2013-01-01
Dominance analysis (DA) is a method used to evaluate the relative importance of predictors that was originally proposed for linear regression models. This article proposes an extension of DA that allows researchers to determine the relative importance of predictors in hierarchical linear models (HLM). Commonly used measures of model adequacy in…
Quantile Regression in the Study of Developmental Sciences
Petscher, Yaacov; Logan, Jessica A. R.
2014-01-01
Linear regression analysis is one of the most common techniques applied in developmental research, but only allows for an estimate of the average relations between the predictor(s) and the outcome. This study describes quantile regression, which provides estimates of the relations between the predictor(s) and outcome, but across multiple points of the outcome’s distribution. Using data from the High School and Beyond and U.S. Sustained Effects Study databases, quantile regression is demonstrated and contrasted with linear regression when considering models with: (a) one continuous predictor, (b) one dichotomous predictor, (c) a continuous and a dichotomous predictor, and (d) a longitudinal application. Results from each example exhibited the differential inferences which may be drawn using linear or quantile regression. PMID:24329596
Modelling daily water temperature from air temperature for the Missouri River.
Zhu, Senlin; Nyarko, Emmanuel Karlo; Hadzima-Nyarko, Marijana
2018-01-01
The bio-chemical and physical characteristics of a river are directly affected by water temperature, which thereby affects the overall health of aquatic ecosystems. It is a complex problem to accurately estimate water temperature. Modelling of river water temperature is usually based on a suitable mathematical model and field measurements of various atmospheric factors. In this article, the air-water temperature relationship of the Missouri River is investigated by developing three different machine learning models (Artificial Neural Network (ANN), Gaussian Process Regression (GPR), and Bootstrap Aggregated Decision Trees (BA-DT)). Standard models (linear regression, non-linear regression, and stochastic models) are also developed and compared to machine learning models. Analyzing the three standard models, the stochastic model clearly outperforms the standard linear model and nonlinear model. All the three machine learning models have comparable results and outperform the stochastic model, with GPR having slightly better results for stations No. 2 and 3, while BA-DT has slightly better results for station No. 1. The machine learning models are very effective tools which can be used for the prediction of daily river temperature.
Madarang, Krish J; Kang, Joo-Hyon
2014-06-01
Stormwater runoff has been identified as a source of pollution for the environment, especially for receiving waters. In order to quantify and manage the impacts of stormwater runoff on the environment, predictive models and mathematical models have been developed. Predictive tools such as regression models have been widely used to predict stormwater discharge characteristics. Storm event characteristics, such as antecedent dry days (ADD), have been related to response variables, such as pollutant loads and concentrations. However it has been a controversial issue among many studies to consider ADD as an important variable in predicting stormwater discharge characteristics. In this study, we examined the accuracy of general linear regression models in predicting discharge characteristics of roadway runoff. A total of 17 storm events were monitored in two highway segments, located in Gwangju, Korea. Data from the monitoring were used to calibrate United States Environmental Protection Agency's Storm Water Management Model (SWMM). The calibrated SWMM was simulated for 55 storm events, and the results of total suspended solid (TSS) discharge loads and event mean concentrations (EMC) were extracted. From these data, linear regression models were developed. R(2) and p-values of the regression of ADD for both TSS loads and EMCs were investigated. Results showed that pollutant loads were better predicted than pollutant EMC in the multiple regression models. Regression may not provide the true effect of site-specific characteristics, due to uncertainty in the data. Copyright © 2014 The Research Centre for Eco-Environmental Sciences, Chinese Academy of Sciences. Published by Elsevier B.V. All rights reserved.
Cernuda, Carlos; Lughofer, Edwin; Klein, Helmut; Forster, Clemens; Pawliczek, Marcin; Brandstetter, Markus
2017-01-01
During the production process of beer, it is of utmost importance to guarantee a high consistency of the beer quality. For instance, the bitterness is an essential quality parameter which has to be controlled within the specifications at the beginning of the production process in the unfermented beer (wort) as well as in final products such as beer and beer mix beverages. Nowadays, analytical techniques for quality control in beer production are mainly based on manual supervision, i.e., samples are taken from the process and analyzed in the laboratory. This typically requires significant lab technicians efforts for only a small fraction of samples to be analyzed, which leads to significant costs for beer breweries and companies. Fourier transform mid-infrared (FT-MIR) spectroscopy was used in combination with nonlinear multivariate calibration techniques to overcome (i) the time consuming off-line analyses in beer production and (ii) already known limitations of standard linear chemometric methods, like partial least squares (PLS), for important quality parameters Speers et al. (J I Brewing. 2003;109(3):229-235), Zhang et al. (J I Brewing. 2012;118(4):361-367) such as bitterness, citric acid, total acids, free amino nitrogen, final attenuation, or foam stability. The calibration models are established with enhanced nonlinear techniques based (i) on a new piece-wise linear version of PLS by employing fuzzy rules for local partitioning the latent variable space and (ii) on extensions of support vector regression variants (-PLSSVR and ν-PLSSVR), for overcoming high computation times in high-dimensional problems and time-intensive and inappropriate settings of the kernel parameters. Furthermore, we introduce a new model selection scheme based on bagged ensembles in order to improve robustness and thus predictive quality of the final models. The approaches are tested on real-world calibration data sets for wort and beer mix beverages, and successfully compared to linear methods, showing a clear out-performance in most cases and being able to meet the model quality requirements defined by the experts at the beer company. Figure Workflow for calibration of non-Linear model ensembles from FT-MIR spectra in beer production .
Predicting U.S. Army Reserve Unit Manning Using Market Demographics
2015-06-01
develops linear regression , classification tree, and logistic regression models to determine the ability of the location to support manning requirements... logistic regression model delivers predictive results that allow decision-makers to identify locations with a high probability of meeting unit...manning requirements. The recommendation of this thesis is that the USAR implement the logistic regression model. 14. SUBJECT TERMS U.S
Weigel, K A; VanRaden, P M; Norman, H D; Grosu, H
2017-12-01
In the early 1900s, breed society herdbooks had been established and milk-recording programs were in their infancy. Farmers wanted to improve the productivity of their cattle, but the foundations of population genetics, quantitative genetics, and animal breeding had not been laid. Early animal breeders struggled to identify genetically superior families using performance records that were influenced by local environmental conditions and herd-specific management practices. Daughter-dam comparisons were used for more than 30 yr and, although genetic progress was minimal, the attention given to performance recording, genetic theory, and statistical methods paid off in future years. Contemporary (herdmate) comparison methods allowed more accurate accounting for environmental factors and genetic progress began to accelerate when these methods were coupled with artificial insemination and progeny testing. Advances in computing facilitated the implementation of mixed linear models that used pedigree and performance data optimally and enabled accurate selection decisions. Sequencing of the bovine genome led to a revolution in dairy cattle breeding, and the pace of scientific discovery and genetic progress accelerated rapidly. Pedigree-based models have given way to whole-genome prediction, and Bayesian regression models and machine learning algorithms have joined mixed linear models in the toolbox of modern animal breeders. Future developments will likely include elucidation of the mechanisms of genetic inheritance and epigenetic modification in key biological pathways, and genomic data will be used with data from on-farm sensors to facilitate precision management on modern dairy farms. Copyright © 2017 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Ernst, Anja F; Albers, Casper J
2017-01-01
Misconceptions about the assumptions behind the standard linear regression model are widespread and dangerous. These lead to using linear regression when inappropriate, and to employing alternative procedures with less statistical power when unnecessary. Our systematic literature review investigated employment and reporting of assumption checks in twelve clinical psychology journals. Findings indicate that normality of the variables themselves, rather than of the errors, was wrongfully held for a necessary assumption in 4% of papers that use regression. Furthermore, 92% of all papers using linear regression were unclear about their assumption checks, violating APA-recommendations. This paper appeals for a heightened awareness for and increased transparency in the reporting of statistical assumption checking.
Ernst, Anja F.
2017-01-01
Misconceptions about the assumptions behind the standard linear regression model are widespread and dangerous. These lead to using linear regression when inappropriate, and to employing alternative procedures with less statistical power when unnecessary. Our systematic literature review investigated employment and reporting of assumption checks in twelve clinical psychology journals. Findings indicate that normality of the variables themselves, rather than of the errors, was wrongfully held for a necessary assumption in 4% of papers that use regression. Furthermore, 92% of all papers using linear regression were unclear about their assumption checks, violating APA-recommendations. This paper appeals for a heightened awareness for and increased transparency in the reporting of statistical assumption checking. PMID:28533971
SEMIPARAMETRIC QUANTILE REGRESSION WITH HIGH-DIMENSIONAL COVARIATES
Zhu, Liping; Huang, Mian; Li, Runze
2012-01-01
This paper is concerned with quantile regression for a semiparametric regression model, in which both the conditional mean and conditional variance function of the response given the covariates admit a single-index structure. This semiparametric regression model enables us to reduce the dimension of the covariates and simultaneously retains the flexibility of nonparametric regression. Under mild conditions, we show that the simple linear quantile regression offers a consistent estimate of the index parameter vector. This is a surprising and interesting result because the single-index model is possibly misspecified under the linear quantile regression. With a root-n consistent estimate of the index vector, one may employ a local polynomial regression technique to estimate the conditional quantile function. This procedure is computationally efficient, which is very appealing in high-dimensional data analysis. We show that the resulting estimator of the quantile function performs asymptotically as efficiently as if the true value of the index vector were known. The methodologies are demonstrated through comprehensive simulation studies and an application to a real dataset. PMID:24501536
Logistic Mixed Models to Investigate Implicit and Explicit Belief Tracking.
Lages, Martin; Scheel, Anne
2016-01-01
We investigated the proposition of a two-systems Theory of Mind in adults' belief tracking. A sample of N = 45 participants predicted the choice of one of two opponent players after observing several rounds in an animated card game. Three matches of this card game were played and initial gaze direction on target and subsequent choice predictions were recorded for each belief task and participant. We conducted logistic regressions with mixed effects on the binary data and developed Bayesian logistic mixed models to infer implicit and explicit mentalizing in true belief and false belief tasks. Although logistic regressions with mixed effects predicted the data well a Bayesian logistic mixed model with latent task- and subject-specific parameters gave a better account of the data. As expected explicit choice predictions suggested a clear understanding of true and false beliefs (TB/FB). Surprisingly, however, model parameters for initial gaze direction also indicated belief tracking. We discuss why task-specific parameters for initial gaze directions are different from choice predictions yet reflect second-order perspective taking.
The long-solved problem of the best-fit straight line: application to isotopic mixing lines
NASA Astrophysics Data System (ADS)
Wehr, Richard; Saleska, Scott R.
2017-01-01
It has been almost 50 years since York published an exact and general solution for the best-fit straight line to independent points with normally distributed errors in both x and y. York's solution is highly cited in the geophysical literature but almost unknown outside of it, so that there has been no ebb in the tide of books and papers wrestling with the problem. Much of the post-1969 literature on straight-line fitting has sown confusion not merely by its content but by its very existence. The optimal least-squares fit is already known; the problem is already solved. Here we introduce the non-specialist reader to York's solution and demonstrate its application in the interesting case of the isotopic mixing line, an analytical tool widely used to determine the isotopic signature of trace gas sources for the study of biogeochemical cycles. The most commonly known linear regression methods - ordinary least-squares regression (OLS), geometric mean regression (GMR), and orthogonal distance regression (ODR) - have each been recommended as the best method for fitting isotopic mixing lines. In fact, OLS, GMR, and ODR are all special cases of York's solution that are valid only under particular measurement conditions, and those conditions do not hold in general for isotopic mixing lines. Using Monte Carlo simulations, we quantify the biases in OLS, GMR, and ODR under various conditions and show that York's general - and convenient - solution is always the least biased.
NASA Astrophysics Data System (ADS)
Jakubowski, J.; Stypulkowski, J. B.; Bernardeau, F. G.
2017-12-01
The first phase of the Abu Hamour drainage and storm tunnel was completed in early 2017. The 9.5 km long, 3.7 m diameter tunnel was excavated with two Earth Pressure Balance (EPB) Tunnel Boring Machines from Herrenknecht. TBM operation processes were monitored and recorded by Data Acquisition and Evaluation System. The authors coupled collected TBM drive data with available information on rock mass properties, cleansed, completed with secondary variables and aggregated by weeks and shifts. Correlations and descriptive statistics charts were examined. Multivariate Linear Regression and CART regression tree models linking TBM penetration rate (PR), penetration per revolution (PPR) and field penetration index (FPI) with TBM operational and geotechnical characteristics were performed for the conditions of the weak/soft rock of Doha. Both regression methods are interpretable and the data were screened with different computational approaches allowing enriched insight. The primary goal of the analysis was to investigate empirical relations between multiple explanatory and responding variables, to search for best subsets of explanatory variables and to evaluate the strength of linear and non-linear relations. For each of the penetration indices, a predictive model coupling both regression methods was built and validated. The resultant models appeared to be stronger than constituent ones and indicated an opportunity for more accurate and robust TBM performance predictions.
Logistic models--an odd(s) kind of regression.
Jupiter, Daniel C
2013-01-01
The logistic regression model bears some similarity to the multivariable linear regression with which we are familiar. However, the differences are great enough to warrant a discussion of the need for and interpretation of logistic regression. Copyright © 2013 American College of Foot and Ankle Surgeons. Published by Elsevier Inc. All rights reserved.
Height and Weight Estimation From Anthropometric Measurements Using Machine Learning Regressions
Fernandes, Bruno J. T.; Roque, Alexandre
2018-01-01
Height and weight are measurements explored to tracking nutritional diseases, energy expenditure, clinical conditions, drug dosages, and infusion rates. Many patients are not ambulant or may be unable to communicate, and a sequence of these factors may not allow accurate estimation or measurements; in those cases, it can be estimated approximately by anthropometric means. Different groups have proposed different linear or non-linear equations which coefficients are obtained by using single or multiple linear regressions. In this paper, we present a complete study of the application of different learning models to estimate height and weight from anthropometric measurements: support vector regression, Gaussian process, and artificial neural networks. The predicted values are significantly more accurate than that obtained with conventional linear regressions. In all the cases, the predictions are non-sensitive to ethnicity, and to gender, if more than two anthropometric parameters are analyzed. The learning model analysis creates new opportunities for anthropometric applications in industry, textile technology, security, and health care. PMID:29651366
Skew-t partially linear mixed-effects models for AIDS clinical studies.
Lu, Tao
2016-01-01
We propose partially linear mixed-effects models with asymmetry and missingness to investigate the relationship between two biomarkers in clinical studies. The proposed models take into account irregular time effects commonly observed in clinical studies under a semiparametric model framework. In addition, commonly assumed symmetric distributions for model errors are substituted by asymmetric distribution to account for skewness. Further, informative missing data mechanism is accounted for. A Bayesian approach is developed to perform parameter estimation simultaneously. The proposed model and method are applied to an AIDS dataset and comparisons with alternative models are performed.
Chen, Han; Wang, Chaolong; Conomos, Matthew P.; Stilp, Adrienne M.; Li, Zilin; Sofer, Tamar; Szpiro, Adam A.; Chen, Wei; Brehm, John M.; Celedón, Juan C.; Redline, Susan; Papanicolaou, George J.; Thornton, Timothy A.; Laurie, Cathy C.; Rice, Kenneth; Lin, Xihong
2016-01-01
Linear mixed models (LMMs) are widely used in genome-wide association studies (GWASs) to account for population structure and relatedness, for both continuous and binary traits. Motivated by the failure of LMMs to control type I errors in a GWAS of asthma, a binary trait, we show that LMMs are generally inappropriate for analyzing binary traits when population stratification leads to violation of the LMM’s constant-residual variance assumption. To overcome this problem, we develop a computationally efficient logistic mixed model approach for genome-wide analysis of binary traits, the generalized linear mixed model association test (GMMAT). This approach fits a logistic mixed model once per GWAS and performs score tests under the null hypothesis of no association between a binary trait and individual genetic variants. We show in simulation studies and real data analysis that GMMAT effectively controls for population structure and relatedness when analyzing binary traits in a wide variety of study designs. PMID:27018471
Anodic microbial community diversity as a predictor of the power output of microbial fuel cells.
Stratford, James P; Beecroft, Nelli J; Slade, Robert C T; Grüning, André; Avignone-Rossa, Claudio
2014-03-01
The relationship between the diversity of mixed-species microbial consortia and their electrogenic potential in the anodes of microbial fuel cells was examined using different diversity measures as predictors. Identical microbial fuel cells were sampled at multiple time-points. Biofilm and suspension communities were analysed by denaturing gradient gel electrophoresis to calculate the number and relative abundance of species. Shannon and Simpson indices and richness were examined for association with power using bivariate and multiple linear regression, with biofilm DNA as an additional variable. In simple bivariate regressions, the correlation of Shannon diversity of the biofilm and power is stronger (r=0.65, p=0.001) than between power and richness (r=0.39, p=0.076), or between power and the Simpson index (r=0.5, p=0.018). Using Shannon diversity and biofilm DNA as predictors of power, a regression model can be constructed (r=0.73, p<0.001). Ecological parameters such as the Shannon index are predictive of the electrogenic potential of microbial communities. Copyright © 2014 Elsevier Ltd. All rights reserved.
Understory response following varying levels of overstory removal in mixed conifer stands
Fabian C.C. Uzoh; Leroy K. Dolph; John R. Anstead
1997-01-01
Diameter growth rates of understory trees were measured for periods both before and after overstory removal on six study areas in northern California. All the species responded with increased diameter growth after adjusting to their new environments. Linear regression equations that predict post treatment diameter growth increment of the residual trees are presented...
Non-Linear Approach in Kinesiology Should Be Preferred to the Linear--A Case of Basketball.
Trninić, Marko; Jeličić, Mario; Papić, Vladan
2015-07-01
In kinesiology, medicine, biology and psychology, in which research focus is on dynamical self-organized systems, complex connections exist between variables. Non-linear nature of complex systems has been discussed and explained by the example of non-linear anthropometric predictors of performance in basketball. Previous studies interpreted relations between anthropometric features and measures of effectiveness in basketball by (a) using linear correlation models, and by (b) including all basketball athletes in the same sample of participants regardless of their playing position. In this paper the significance and character of linear and non-linear relations between simple anthropometric predictors (AP) and performance criteria consisting of situation-related measures of effectiveness (SE) in basketball were determined and evaluated. The sample of participants consisted of top-level junior basketball players divided in three groups according to their playing time (8 minutes and more per game) and playing position: guards (N = 42), forwards (N = 26) and centers (N = 40). Linear (general model) and non-linear (general model) regression models were calculated simultaneously and separately for each group. The conclusion is viable: non-linear regressions are frequently superior to linear correlations when interpreting actual association logic among research variables.
Goodarzi, Mohammad; Jensen, Richard; Vander Heyden, Yvan
2012-12-01
A Quantitative Structure-Retention Relationship (QSRR) is proposed to estimate the chromatographic retention of 83 diverse drugs on a Unisphere poly butadiene (PBD) column, using isocratic elutions at pH 11.7. Previous work has generated QSRR models for them using Classification And Regression Trees (CART). In this work, Ant Colony Optimization is used as a feature selection method to find the best molecular descriptors from a large pool. In addition, several other selection methods have been applied, such as Genetic Algorithms, Stepwise Regression and the Relief method, not only to evaluate Ant Colony Optimization as a feature selection method but also to investigate its ability to find the important descriptors in QSRR. Multiple Linear Regression (MLR) and Support Vector Machines (SVMs) were applied as linear and nonlinear regression methods, respectively, giving excellent correlation between the experimental, i.e. extrapolated to a mobile phase consisting of pure water, and predicted logarithms of the retention factors of the drugs (logk(w)). The overall best model was the SVM one built using descriptors selected by ACO. Copyright © 2012 Elsevier B.V. All rights reserved.
Use of AMMI and linear regression models to analyze genotype-environment interaction in durum wheat.
Nachit, M M; Nachit, G; Ketata, H; Gauch, H G; Zobel, R W
1992-03-01
The joint durum wheat (Triticum turgidum L var 'durum') breeding program of the International Maize and Wheat Improvement Center (CIMMYT) and the International Center for Agricultural Research in the Dry Areas (ICARDA) for the Mediterranean region employs extensive multilocation testing. Multilocation testing produces significant genotype-environment (GE) interaction that reduces the accuracy for estimating yield and selecting appropriate germ plasm. The sum of squares (SS) of GE interaction was partitioned by linear regression techniques into joint, genotypic, and environmental regressions, and by Additive Main effects and the Multiplicative Interactions (AMMI) model into five significant Interaction Principal Component Axes (IPCA). The AMMI model was more effective in partitioning the interaction SS than the linear regression technique. The SS contained in the AMMI model was 6 times higher than the SS for all three regressions. Postdictive assessment recommended the use of the first five IPCA axes, while predictive assessment AMMI1 (main effects plus IPCA1). After elimination of random variation, AMMI1 estimates for genotypic yields within sites were more precise than unadjusted means. This increased precision was equivalent to increasing the number of replications by a factor of 3.7.
The Relationship Between Oxygen Reserve Index and Arterial Partial Pressure of Oxygen During Surgery
Dorotta, Ihab L.; Wells, Briana; Juma, David; Applegate, Patricia M.
2016-01-01
BACKGROUND: The use of intraoperative pulse oximetry (Spo2) enhances hypoxia detection and is associated with fewer perioperative hypoxic events. However, Spo2 may be reported as 98% when arterial partial pressure of oxygen (Pao2) is as low as 70 mm Hg. Therefore, Spo2 may not provide advance warning of falling arterial oxygenation until Pao2 approaches this level. Multiwave pulse co-oximetry can provide a calculated oxygen reserve index (ORI) that may add to information from pulse oximetry when Spo2 is >98%. This study evaluates the ORI to Pao2 relationship during surgery. METHODS: We studied patients undergoing scheduled surgery in which arterial catheterization and intraoperative arterial blood gas analysis were planned. Data from multiple pulse co-oximetry sensors on each patient were continuously collected and stored on a research computer. Regression analysis was used to compare ORI with Pao2 obtained from each arterial blood gas measurement and changes in ORI with changes in Pao2 from sequential measurements. Linear mixed-effects regression models for repeated measures were then used to account for within-subject correlation across the repeatedly measured Pao2 and ORI and for the unequal time intervals of Pao2 determination over elapsed surgical time. Regression plots were inspected for ORI values corresponding to Pao2 of 100 and 150 mm Hg. ORI and Pao2 were compared using mixed-effects models with a subject-specific random intercept. RESULTS: ORI values and Pao2 measurements were obtained from intraoperative data collected from 106 patients. Regression analysis showed that the ORI to Pao2 relationship was stronger for Pao2 to 240 mm Hg (r2 = 0.536) than for Pao2 over 240 mm Hg (r2 = 0.0016). Measured Pao2 was ≥100 mm Hg for all ORI over 0.24. Measured Pao2 was ≥150 mm Hg in 96.6% of samples when ORI was over 0.55. A random intercept variance component linear mixed-effects model for repeated measures indicated that Pao2 was significantly related to ORI (β[95% confidence interval] = 0.002 [0.0019–0.0022]; P < 0.0001). A similar analysis indicated a significant relationship between change in Pao2 and change in ORI (β [95% confidence interval] = 0.0044 [0.0040–0.0048]; P < 0.0001). CONCLUSIONS: These findings suggest that ORI >0.24 can distinguish Pao2 ≥100 mm Hg when Spo2 is over 98%. Similarly, ORI > 0.55 appears to be a threshold to distinguish Pao2 ≥150 mm Hg. The usefulness of these values should be evaluated prospectively. Decreases in ORI to near 0.24 may provide advance indication of falling Pao2 approaching 100 mm Hg when Spo2 is >98%. The clinical utility of interventions based on continuous ORI monitoring should be studied prospectively. PMID:27007078
Applegate, Richard L; Dorotta, Ihab L; Wells, Briana; Juma, David; Applegate, Patricia M
2016-09-01
The use of intraoperative pulse oximetry (SpO2) enhances hypoxia detection and is associated with fewer perioperative hypoxic events. However, SpO2 may be reported as 98% when arterial partial pressure of oxygen (PaO2) is as low as 70 mm Hg. Therefore, SpO2 may not provide advance warning of falling arterial oxygenation until PaO2 approaches this level. Multiwave pulse co-oximetry can provide a calculated oxygen reserve index (ORI) that may add to information from pulse oximetry when SpO2 is >98%. This study evaluates the ORI to PaO2 relationship during surgery. We studied patients undergoing scheduled surgery in which arterial catheterization and intraoperative arterial blood gas analysis were planned. Data from multiple pulse co-oximetry sensors on each patient were continuously collected and stored on a research computer. Regression analysis was used to compare ORI with PaO2 obtained from each arterial blood gas measurement and changes in ORI with changes in PaO2 from sequential measurements. Linear mixed-effects regression models for repeated measures were then used to account for within-subject correlation across the repeatedly measured PaO2 and ORI and for the unequal time intervals of PaO2 determination over elapsed surgical time. Regression plots were inspected for ORI values corresponding to PaO2 of 100 and 150 mm Hg. ORI and PaO2 were compared using mixed-effects models with a subject-specific random intercept. ORI values and PaO2 measurements were obtained from intraoperative data collected from 106 patients. Regression analysis showed that the ORI to PaO2 relationship was stronger for PaO2 to 240 mm Hg (r = 0.536) than for PaO2 over 240 mm Hg (r = 0.0016). Measured PaO2 was ≥100 mm Hg for all ORI over 0.24. Measured PaO2 was ≥150 mm Hg in 96.6% of samples when ORI was over 0.55. A random intercept variance component linear mixed-effects model for repeated measures indicated that PaO2 was significantly related to ORI (β[95% confidence interval] = 0.002 [0.0019-0.0022]; P < 0.0001). A similar analysis indicated a significant relationship between change in PaO2 and change in ORI (β [95% confidence interval] = 0.0044 [0.0040-0.0048]; P < 0.0001). These findings suggest that ORI >0.24 can distinguish PaO2 ≥100 mm Hg when SpO2 is over 98%. Similarly, ORI > 0.55 appears to be a threshold to distinguish PaO2 ≥150 mm Hg. The usefulness of these values should be evaluated prospectively. Decreases in ORI to near 0.24 may provide advance indication of falling PaO2 approaching 100 mm Hg when SpO2 is >98%. The clinical utility of interventions based on continuous ORI monitoring should be studied prospectively.
Bennett, Bradley C; Husby, Chad E
2008-03-28
Botanical pharmacopoeias are non-random subsets of floras, with some taxonomic groups over- or under-represented. Moerman [Moerman, D.E., 1979. Symbols and selectivity: a statistical analysis of Native American medical ethnobotany, Journal of Ethnopharmacology 1, 111-119] introduced linear regression/residual analysis to examine these patterns. However, regression, the commonly-employed analysis, suffers from several statistical flaws. We use contingency table and binomial analyses to examine patterns of Shuar medicinal plant use (from Amazonian Ecuador). We first analyzed the Shuar data using Moerman's approach, modified to better meet requirements of linear regression analysis. Second, we assessed the exact randomization contingency table test for goodness of fit. Third, we developed a binomial model to test for non-random selection of plants in individual families. Modified regression models (which accommodated assumptions of linear regression) reduced R(2) to from 0.59 to 0.38, but did not eliminate all problems associated with regression analyses. Contingency table analyses revealed that the entire flora departs from the null model of equal proportions of medicinal plants in all families. In the binomial analysis, only 10 angiosperm families (of 115) differed significantly from the null model. These 10 families are largely responsible for patterns seen at higher taxonomic levels. Contingency table and binomial analyses offer an easy and statistically valid alternative to the regression approach.
Multilevel Models for Binary Data
ERIC Educational Resources Information Center
Powers, Daniel A.
2012-01-01
The methods and models for categorical data analysis cover considerable ground, ranging from regression-type models for binary and binomial data, count data, to ordered and unordered polytomous variables, as well as regression models that mix qualitative and continuous data. This article focuses on methods for binary or binomial data, which are…
Spatial Characteristics of Small Green Spaces' Mitigating Effects on Microscopic Urban Heat Islands
NASA Astrophysics Data System (ADS)
Park, J.; Lee, D. K.; Jeong, W.; Kim, J. H.; Huh, K. Y.
2015-12-01
The purpose of the study is to find small greens' disposition, types and sizes to reduce air temperature effectively in urban blocks. The research sites were six high developed blocks in Seoul, Korea. Air temperature was measured with mobile loggers in clear daytime during summer, from August to September, at screen level. Also the measurement repeated over three times a day during three days by walking and circulating around the experimental blocks and the control blocks at the same time. By analyzing spatial characteristics, the averaged air temperatures were classified with three spaces, sunny spaces, building-shaded spaces and small green spaces by using Kruskal-Wallis Test; and small green spaces in 6 blocks were classified into their outward forms, polygonal or linear and single or mixed. The polygonal and mixed types of small green spaces mitigated averaged air temperature of each block which they belonged with a simple linear regression model with adjusted R2 = 0.90**. As the area and volume of these types increased, the effect of air temperature reduction (ΔT; Air temperature difference between sunny space and green space in a block) also increased in a linear relationship. The experimental range of this research is 100m2 ~ 2,000m2 of area, and 1,000m3 ~ 10,000m3 of volume of small green space. As a result, more than 300m2 and 2,300m3 of polygonal green spaces with mixed vegetation is required to lower 1°C; 650m2 and 5,000m3 of them to lower 2°C; about 2,000m2 and about 10,000m3 of them to lower 4°C air temperature reduction in an urban block.
Hejri-Zarifi, Sudiyeh; Ahmadian-Kouchaksaraei, Zahra; Pourfarzad, Amir; Khodaparast, Mohammad Hossein Haddad
2014-12-01
Germinated palm date seeds were milled into two fractions: germ and residue. Dough rheological characteristics, baking (specific volume and sensory evaluation), and textural properties (at first day and during storage for 5 days) were determined in Barbari flat bread. Germ and residue fractions were incorporated at various levels ranged in 0.5-3 g/100 g of wheat flour. Water absorption, arrival time and gelatination temperature were decreased by germ fraction but accompanied by an increasing effect on the mixing tolerance index and degree of softening in most levels. Although improvement in dough stability was monitored but specific volume of bread was not affected by both fractions. Texture analysis of bread samples during 5 days of storage indicated that both fractions of germinated date seeds were able to diminish bread staling. Avrami non-linear regression equation was chosen as useful mathematical model to properly study bread hardening kinetics. In addition, principal component analysis (PCA) allowed discriminating among dough and bread specialties. Partial least squares regression (PLSR) models were applied to determine the relationships between sensory and instrumental data.
London Measure of Unplanned Pregnancy: guidance for its use as an outcome measure
Hall, Jennifer A; Barrett, Geraldine; Copas, Andrew; Stephenson, Judith
2017-01-01
Background The London Measure of Unplanned Pregnancy (LMUP) is a psychometrically validated measure of the degree of intention of a current or recent pregnancy. The LMUP is increasingly being used worldwide, and can be used to evaluate family planning or preconception care programs. However, beyond recommending the use of the full LMUP scale, there is no published guidance on how to use the LMUP as an outcome measure. Ordinal logistic regression has been recommended informally, but studies published to date have all used binary logistic regression and dichotomized the scale at different cut points. There is thus a need for evidence-based guidance to provide a standardized methodology for multivariate analysis and to enable comparison of results. This paper makes recommendations for the regression method for analysis of the LMUP as an outcome measure. Materials and methods Data collected from 4,244 pregnant women in Malawi were used to compare five regression methods: linear, logistic with two cut points, and ordinal logistic with either the full or grouped LMUP score. The recommendations were then tested on the original UK LMUP data. Results There were small but no important differences in the findings across the regression models. Logistic regression resulted in the largest loss of information, and assumptions were violated for the linear and ordinal logistic regression. Consequently, robust standard errors were used for linear regression and a partial proportional odds ordinal logistic regression model attempted. The latter could only be fitted for grouped LMUP score. Conclusion We recommend the linear regression model with robust standard errors to make full use of the LMUP score when analyzed as an outcome measure. Ordinal logistic regression could be considered, but a partial proportional odds model with grouped LMUP score may be required. Logistic regression is the least-favored option, due to the loss of information. For logistic regression, the cut point for un/planned pregnancy should be between nine and ten. These recommendations will standardize the analysis of LMUP data and enhance comparability of results across studies. PMID:28435343
Linear regression analysis of survival data with missing censoring indicators.
Wang, Qihua; Dinse, Gregg E
2011-04-01
Linear regression analysis has been studied extensively in a random censorship setting, but typically all of the censoring indicators are assumed to be observed. In this paper, we develop synthetic data methods for estimating regression parameters in a linear model when some censoring indicators are missing. We define estimators based on regression calibration, imputation, and inverse probability weighting techniques, and we prove all three estimators are asymptotically normal. The finite-sample performance of each estimator is evaluated via simulation. We illustrate our methods by assessing the effects of sex and age on the time to non-ambulatory progression for patients in a brain cancer clinical trial.
Yu, Wenxi; Liu, Yang; Ma, Zongwei; Bi, Jun
2017-08-01
Using satellite-based aerosol optical depth (AOD) measurements and statistical models to estimate ground-level PM 2.5 is a promising way to fill the areas that are not covered by ground PM 2.5 monitors. The statistical models used in previous studies are primarily Linear Mixed Effects (LME) and Geographically Weighted Regression (GWR) models. In this study, we developed a new regression model between PM 2.5 and AOD using Gaussian processes in a Bayesian hierarchical setting. Gaussian processes model the stochastic nature of the spatial random effects, where the mean surface and the covariance function is specified. The spatial stochastic process is incorporated under the Bayesian hierarchical framework to explain the variation of PM 2.5 concentrations together with other factors, such as AOD, spatial and non-spatial random effects. We evaluate the results of our model and compare them with those of other, conventional statistical models (GWR and LME) by within-sample model fitting and out-of-sample validation (cross validation, CV). The results show that our model possesses a CV result (R 2 = 0.81) that reflects higher accuracy than that of GWR and LME (0.74 and 0.48, respectively). Our results indicate that Gaussian process models have the potential to improve the accuracy of satellite-based PM 2.5 estimates.
Casellas, J; Bach, R
2012-06-01
Lambing interval is a relevant reproductive indicator for sheep populations under continuous mating systems, although there is a shortage of selection programs accounting for this trait in the sheep industry. Both the historical assumption of small genetic background and its unorthodox distribution pattern have limited its implementation as a breeding objective. In this manuscript, statistical performances of 3 alternative parametrizations [i.e., symmetric Gaussian mixed linear (GML) model, skew-Gaussian mixed linear (SGML) model, and piecewise Weibull proportional hazard (PWPH) model] have been compared to elucidate the preferred methodology to handle lambing interval data. More specifically, flock-by-flock analyses were performed on 31,986 lambing interval records (257.3 ± 0.2 d) from 6 purebred Ripollesa flocks. Model performances were compared in terms of deviance information criterion (DIC) and Bayes factor (BF). For all flocks, PWPH models were clearly preferred; they generated a reduction of 1,900 or more DIC units and provided BF estimates larger than 100 (i.e., PWPH models against linear models). These differences were reduced when comparing PWPH models with different number of change points for the baseline hazard function. In 4 flocks, only 2 change points were required to minimize the DIC, whereas 4 and 6 change points were needed for the 2 remaining flocks. These differences demonstrated a remarkable degree of heterogeneity across sheep flocks that must be properly accounted for in genetic evaluation models to avoid statistical biases and suboptimal genetic trends. Within this context, all 6 Ripollesa flocks revealed substantial genetic background for lambing interval with heritabilities ranging between 0.13 and 0.19. This study provides the first evidence of the suitability of PWPH models for lambing interval analysis, clearly discarding previous parametrizations focused on mixed linear models.
Louys, Julien; Meloro, Carlo; Elton, Sarah; Ditchfield, Peter; Bishop, Laura C
2015-01-01
We test the performance of two models that use mammalian communities to reconstruct multivariate palaeoenvironments. While both models exploit the correlation between mammal communities (defined in terms of functional groups) and arboreal heterogeneity, the first uses a multiple multivariate regression of community structure and arboreal heterogeneity, while the second uses a linear regression of the principal components of each ecospace. The success of these methods means the palaeoenvironment of a particular locality can be reconstructed in terms of the proportions of heavy, moderate, light, and absent tree canopy cover. The linear regression is less biased, and more precisely and accurately reconstructs heavy tree canopy cover than the multiple multivariate model. However, the multiple multivariate model performs better than the linear regression for all other canopy cover categories. Both models consistently perform better than randomly generated reconstructions. We apply both models to the palaeocommunity of the Upper Laetolil Beds, Tanzania. Our reconstructions indicate that there was very little heavy tree cover at this site (likely less than 10%), with the palaeo-landscape instead comprising a mixture of light and absent tree cover. These reconstructions help resolve the previous conflicting palaeoecological reconstructions made for this site. Copyright © 2014 Elsevier Ltd. All rights reserved.
A New SEYHAN's Approach in Case of Heterogeneity of Regression Slopes in ANCOVA.
Ankarali, Handan; Cangur, Sengul; Ankarali, Seyit
2018-06-01
In this study, when the assumptions of linearity and homogeneity of regression slopes of conventional ANCOVA are not met, a new approach named as SEYHAN has been suggested to use conventional ANCOVA instead of robust or nonlinear ANCOVA. The proposed SEYHAN's approach involves transformation of continuous covariate into categorical structure when the relationship between covariate and dependent variable is nonlinear and the regression slopes are not homogenous. A simulated data set was used to explain SEYHAN's approach. In this approach, we performed conventional ANCOVA in each subgroup which is constituted according to knot values and analysis of variance with two-factor model after MARS method was used for categorization of covariate. The first model is a simpler model than the second model that includes interaction term. Since the model with interaction effect has more subjects, the power of test also increases and the existing significant difference is revealed better. We can say that linearity and homogeneity of regression slopes are not problem for data analysis by conventional linear ANCOVA model by helping this approach. It can be used fast and efficiently for the presence of one or more covariates.
Shek, Daniel T L; Ma, Cecilia M S
2011-01-05
Although different methods are available for the analyses of longitudinal data, analyses based on generalized linear models (GLM) are criticized as violating the assumption of independence of observations. Alternatively, linear mixed models (LMM) are commonly used to understand changes in human behavior over time. In this paper, the basic concepts surrounding LMM (or hierarchical linear models) are outlined. Although SPSS is a statistical analyses package commonly used by researchers, documentation on LMM procedures in SPSS is not thorough or user friendly. With reference to this limitation, the related procedures for performing analyses based on LMM in SPSS are described. To demonstrate the application of LMM analyses in SPSS, findings based on six waves of data collected in the Project P.A.T.H.S. (Positive Adolescent Training through Holistic Social Programmes) in Hong Kong are presented.
Longitudinal Data Analyses Using Linear Mixed Models in SPSS: Concepts, Procedures and Illustrations
Shek, Daniel T. L.; Ma, Cecilia M. S.
2011-01-01
Although different methods are available for the analyses of longitudinal data, analyses based on generalized linear models (GLM) are criticized as violating the assumption of independence of observations. Alternatively, linear mixed models (LMM) are commonly used to understand changes in human behavior over time. In this paper, the basic concepts surrounding LMM (or hierarchical linear models) are outlined. Although SPSS is a statistical analyses package commonly used by researchers, documentation on LMM procedures in SPSS is not thorough or user friendly. With reference to this limitation, the related procedures for performing analyses based on LMM in SPSS are described. To demonstrate the application of LMM analyses in SPSS, findings based on six waves of data collected in the Project P.A.T.H.S. (Positive Adolescent Training through Holistic Social Programmes) in Hong Kong are presented. PMID:21218263
Jesus, Gilmar Mercês de; Assis, Maria Alice Altenburg de; Kupek, Emil; Dias, Lizziane Andrade
2017-01-01
The quality control of data entry in computerized questionnaires is an important step in the validation of new instruments. The study assessed the consistency of recorded weight and height on the Food Intake and Physical Activity of School Children (Web-CAAFE) between repeated measures and against directly measured data. Students from the 2nd to the 5th grade (n = 390) had their weight and height directly measured and then filled out the Web-CAAFE. A subsample (n = 92) filled out the Web-CAAFE twice, three hours apart. The analysis included hierarchical linear regression, mixed linear regression model, to evaluate the bias, and intraclass correlation coefficient (ICC), to assess consistency. Univariate linear regression assessed the effect of gender, reading/writing performance, and computer/internet use and possession on residuals of fixed and random effects. The Web-CAAFE showed high values of ICC between repeated measures (body weight = 0.996, height = 0.937, body mass index - BMI = 0.972), and regarding the checked measures (body weight = 0.962, height = 0.882, BMI = 0.828). The difference between means of body weight, height, and BMI directly measured and recorded was 208 g, -2 mm, and 0.238 kg/m², respectively, indicating slight BMI underestimation due to underestimation of weight and overestimation of height. This trend was related to body weight and age. Height and weight data entered in the Web-CAAFE by children were highly correlated with direct measurements and with the repeated entry. The bias found was similar to validation studies of self-reported weight and height in comparison to direct measurements.
Gonçalves, M A D; Bello, N M; Dritz, S S; Tokach, M D; DeRouchey, J M; Woodworth, J C; Goodband, R D
2016-05-01
Advanced methods for dose-response assessments are used to estimate the minimum concentrations of a nutrient that maximizes a given outcome of interest, thereby determining nutritional requirements for optimal performance. Contrary to standard modeling assumptions, experimental data often present a design structure that includes correlations between observations (i.e., blocking, nesting, etc.) as well as heterogeneity of error variances; either can mislead inference if disregarded. Our objective is to demonstrate practical implementation of linear and nonlinear mixed models for dose-response relationships accounting for correlated data structure and heterogeneous error variances. To illustrate, we modeled data from a randomized complete block design study to evaluate the standardized ileal digestible (SID) Trp:Lys ratio dose-response on G:F of nursery pigs. A base linear mixed model was fitted to explore the functional form of G:F relative to Trp:Lys ratios and assess model assumptions. Next, we fitted 3 competing dose-response mixed models to G:F, namely a quadratic polynomial (QP) model, a broken-line linear (BLL) ascending model, and a broken-line quadratic (BLQ) ascending model, all of which included heteroskedastic specifications, as dictated by the base model. The GLIMMIX procedure of SAS (version 9.4) was used to fit the base and QP models and the NLMIXED procedure was used to fit the BLL and BLQ models. We further illustrated the use of a grid search of initial parameter values to facilitate convergence and parameter estimation in nonlinear mixed models. Fit between competing dose-response models was compared using a maximum likelihood-based Bayesian information criterion (BIC). The QP, BLL, and BLQ models fitted on G:F of nursery pigs yielded BIC values of 353.7, 343.4, and 345.2, respectively, thus indicating a better fit of the BLL model. The BLL breakpoint estimate of the SID Trp:Lys ratio was 16.5% (95% confidence interval [16.1, 17.0]). Problems with the estimation process rendered results from the BLQ model questionable. Importantly, accounting for heterogeneous variance enhanced inferential precision as the breadth of the confidence interval for the mean breakpoint decreased by approximately 44%. In summary, the article illustrates the use of linear and nonlinear mixed models for dose-response relationships accounting for heterogeneous residual variances, discusses important diagnostics and their implications for inference, and provides practical recommendations for computational troubleshooting.
NASA Astrophysics Data System (ADS)
Mathias, Simon A.; Wen, Zhang
2015-05-01
This article presents a numerical study to investigate the combined role of partial well penetration (PWP) and non-Darcy effects concerning the performance of groundwater production wells. A finite difference model is developed in MATLAB to solve the two-dimensional mixed-type boundary value problem associated with flow to a partially penetrating well within a cylindrical confined aquifer. Non-Darcy effects are incorporated using the Forchheimer equation. The model is verified by comparison to results from existing semi-analytical solutions concerning the same problem but assuming Darcy's law. A sensitivity analysis is presented to explore the problem of concern. For constant pressure production, Non-Darcy effects lead to a reduction in production rate, as compared to an equivalent problem solved using Darcy's law. For fully penetrating wells, this reduction in production rate becomes less significant with time. However, for partially penetrating wells, the reduction in production rate persists for much larger times. For constant production rate scenarios, the combined effect of PWP and non-Darcy flow takes the form of a constant additional drawdown term. An approximate solution for this loss term is obtained by performing linear regression on the modeling results.
Guan, Yongtao; Li, Yehua; Sinha, Rajita
2011-01-01
In a cocaine dependence treatment study, we use linear and nonlinear regression models to model posttreatment cocaine craving scores and first cocaine relapse time. A subset of the covariates are summary statistics derived from baseline daily cocaine use trajectories, such as baseline cocaine use frequency and average daily use amount. These summary statistics are subject to estimation error and can therefore cause biased estimators for the regression coefficients. Unlike classical measurement error problems, the error we encounter here is heteroscedastic with an unknown distribution, and there are no replicates for the error-prone variables or instrumental variables. We propose two robust methods to correct for the bias: a computationally efficient method-of-moments-based method for linear regression models and a subsampling extrapolation method that is generally applicable to both linear and nonlinear regression models. Simulations and an application to the cocaine dependence treatment data are used to illustrate the efficacy of the proposed methods. Asymptotic theory and variance estimation for the proposed subsampling extrapolation method and some additional simulation results are described in the online supplementary material. PMID:21984854
Blood biomarkers in male and female participants after an Ironman-distance triathlon
Danielsson, Tom; Carlsson, Jörg; Schreyer, Hendrik; Ahnesjö, Jonas; Ten Siethoff, Lasse; Ragnarsson, Thony; Tugetam, Åsa
2017-01-01
Background While overall physical activity is clearly associated with a better short-term and long-term health, prolonged strenuous physical activity may result in a rise in acute levels of blood-biomarkers used in clinical practice for diagnosis of various conditions or diseases. In this study, we explored the acute effects of a full Ironman-distance triathlon on biomarkers related to heart-, liver-, kidney- and skeletal muscle damage immediately post-race and after one week’s rest. We also examined if sex, age, finishing time and body composition influenced the post-race values of the biomarkers. Methods A sample of 30 subjects was recruited (50% women) to the study. The subjects were evaluated for body composition and blood samples were taken at three occasions, before the race (T1), immediately after (T2) and one week after the race (T3). Linear regression models were fitted to analyse the independent contribution of sex and finishing time controlled for weight, body fat percentage and age, on the biomarkers at the termination of the race (T2). Linear mixed models were fitted to examine if the biomarkers differed between the sexes over time (T1-T3). Results Being male was a significant predictor of higher post-race (T2) levels of myoglobin, CK, and creatinine levels and body weight was negatively associated with myoglobin. In general, the models were unable to explain the variation of the dependent variables. In the linear mixed models, an interaction between time (T1-T3) and sex was seen for myoglobin and creatinine, in which women had a less pronounced response to the race. Conclusion Overall women appear to tolerate the effects of prolonged strenuous physical activity better than men as illustrated by their lower values of the biomarkers both post-race as well as during recovery. PMID:28609447
Terluin, Berend; de Boer, Michiel R; de Vet, Henrica C W
2016-01-01
The network approach to psychopathology conceives mental disorders as sets of symptoms causally impacting on each other. The strengths of the connections between symptoms are key elements in the description of those symptom networks. Typically, the connections are analysed as linear associations (i.e., correlations or regression coefficients). However, there is insufficient awareness of the fact that differences in variance may account for differences in connection strength. Differences in variance frequently occur when subgroups are based on skewed data. An illustrative example is a study published in PLoS One (2013;8(3):e59559) that aimed to test the hypothesis that the development of psychopathology through "staging" was characterized by increasing connection strength between mental states. Three mental states (negative affect, positive affect, and paranoia) were studied in severity subgroups of a general population sample. The connection strength was found to increase with increasing severity in six of nine models. However, the method used (linear mixed modelling) is not suitable for skewed data. We reanalysed the data using inverse Gaussian generalized linear mixed modelling, a method suited for positively skewed data (such as symptoms in the general population). The distribution of positive affect was normal, but the distributions of negative affect and paranoia were heavily skewed. The variance of the skewed variables increased with increasing severity. Reanalysis of the data did not confirm increasing connection strength, except for one of nine models. Reanalysis of the data did not provide convincing evidence in support of staging as characterized by increasing connection strength between mental states. Network researchers should be aware that differences in connection strength between symptoms may be caused by differences in variances, in which case they should not be interpreted as differences in impact of one symptom on another symptom.
Ng, Kar Yong; Awang, Norhashidah
2018-01-06
Frequent haze occurrences in Malaysia have made the management of PM 10 (particulate matter with aerodynamic less than 10 μm) pollution a critical task. This requires knowledge on factors associating with PM 10 variation and good forecast of PM 10 concentrations. Hence, this paper demonstrates the prediction of 1-day-ahead daily average PM 10 concentrations based on predictor variables including meteorological parameters and gaseous pollutants. Three different models were built. They were multiple linear regression (MLR) model with lagged predictor variables (MLR1), MLR model with lagged predictor variables and PM 10 concentrations (MLR2) and regression with time series error (RTSE) model. The findings revealed that humidity, temperature, wind speed, wind direction, carbon monoxide and ozone were the main factors explaining the PM 10 variation in Peninsular Malaysia. Comparison among the three models showed that MLR2 model was on a same level with RTSE model in terms of forecasting accuracy, while MLR1 model was the worst.
The effect of dropout on the efficiency of D-optimal designs of linear mixed models.
Ortega-Azurduy, S A; Tan, F E S; Berger, M P F
2008-06-30
Dropout is often encountered in longitudinal data. Optimal designs will usually not remain optimal in the presence of dropout. In this paper, we study D-optimal designs for linear mixed models where dropout is encountered. Moreover, we estimate the efficiency loss in cases where a D-optimal design for complete data is chosen instead of that for data with dropout. Two types of monotonically decreasing response probability functions are investigated to describe dropout. Our results show that the location of D-optimal design points for the dropout case will shift with respect to that for the complete and uncorrelated data case. Owing to this shift, the information collected at the D-optimal design points for the complete data case does not correspond to the smallest variance. We show that the size of the displacement of the time points depends on the linear mixed model and that the efficiency loss is moderate.
Esserman, Denise A.; Moore, Charity G.; Roth, Mary T.
2009-01-01
Older community dwelling adults often take multiple medications for numerous chronic diseases. Non-adherence to these medications can have a large public health impact. Therefore, the measurement and modeling of medication adherence in the setting of polypharmacy is an important area of research. We apply a variety of different modeling techniques (standard linear regression; weighted linear regression; adjusted linear regression; naïve logistic regression; beta-binomial (BB) regression; generalized estimating equations (GEE)) to binary medication adherence data from a study in a North Carolina based population of older adults, where each medication an individual was taking was classified as adherent or non-adherent. In addition, through simulation we compare these different methods based on Type I error rates, bias, power, empirical 95% coverage, and goodness of fit. We find that estimation and inference using GEE is robust to a wide variety of scenarios and we recommend using this in the setting of polypharmacy when adherence is dichotomously measured for multiple medications per person. PMID:20414358
NASA Astrophysics Data System (ADS)
Fukuda, Jun'ichi; Johnson, Kaj M.
2010-06-01
We present a unified theoretical framework and solution method for probabilistic, Bayesian inversions of crustal deformation data. The inversions involve multiple data sets with unknown relative weights, model parameters that are related linearly or non-linearly through theoretic models to observations, prior information on model parameters and regularization priors to stabilize underdetermined problems. To efficiently handle non-linear inversions in which some of the model parameters are linearly related to the observations, this method combines both analytical least-squares solutions and a Monte Carlo sampling technique. In this method, model parameters that are linearly and non-linearly related to observations, relative weights of multiple data sets and relative weights of prior information and regularization priors are determined in a unified Bayesian framework. In this paper, we define the mixed linear-non-linear inverse problem, outline the theoretical basis for the method, provide a step-by-step algorithm for the inversion, validate the inversion method using synthetic data and apply the method to two real data sets. We apply the method to inversions of multiple geodetic data sets with unknown relative data weights for interseismic fault slip and locking depth. We also apply the method to the problem of estimating the spatial distribution of coseismic slip on faults with unknown fault geometry, relative data weights and smoothing regularization weight.
Experimental and computational prediction of glass transition temperature of drugs.
Alzghoul, Ahmad; Alhalaweh, Amjad; Mahlin, Denny; Bergström, Christel A S
2014-12-22
Glass transition temperature (Tg) is an important inherent property of an amorphous solid material which is usually determined experimentally. In this study, the relation between Tg and melting temperature (Tm) was evaluated using a data set of 71 structurally diverse druglike compounds. Further, in silico models for prediction of Tg were developed based on calculated molecular descriptors and linear (multilinear regression, partial least-squares, principal component regression) and nonlinear (neural network, support vector regression) modeling techniques. The models based on Tm predicted Tg with an RMSE of 19.5 K for the test set. Among the five computational models developed herein the support vector regression gave the best result with RMSE of 18.7 K for the test set using only four chemical descriptors. Hence, two different models that predict Tg of drug-like molecules with high accuracy were developed. If Tm is available, a simple linear regression can be used to predict Tg. However, the results also suggest that support vector regression and calculated molecular descriptors can predict Tg with equal accuracy, already before compound synthesis.
NASA Astrophysics Data System (ADS)
Gonçalves, Karen dos Santos; Winkler, Mirko S.; Benchimol-Barbosa, Paulo Roberto; de Hoogh, Kees; Artaxo, Paulo Eduardo; de Souza Hacon, Sandra; Schindler, Christian; Künzli, Nino
2018-07-01
Epidemiological studies generally use particulate matter measurements with diameter less 2.5 μm (PM2.5) from monitoring networks. Satellite aerosol optical depth (AOD) data has considerable potential in predicting PM2.5 concentrations, and thus provides an alternative method for producing knowledge regarding the level of pollution and its health impact in areas where no ground PM2.5 measurements are available. This is the case in the Brazilian Amazon rainforest region where forest fires are frequent sources of high pollution. In this study, we applied a non-linear model for predicting PM2.5 concentration from AOD retrievals using interaction terms between average temperature, relative humidity, sine, cosine of date in a period of 365,25 days and the square of the lagged relative residual. Regression performance statistics were tested comparing the goodness of fit and R2 based on results from linear regression and non-linear regression for six different models. The regression results for non-linear prediction showed the best performance, explaining on average 82% of the daily PM2.5 concentrations when considering the whole period studied. In the context of Amazonia, it was the first study predicting PM2.5 concentrations using the latest high-resolution AOD products also in combination with the testing of a non-linear model performance. Our results permitted a reliable prediction considering the AOD-PM2.5 relationship and set the basis for further investigations on air pollution impacts in the complex context of Brazilian Amazon Region.
Breast Radiotherapy with Mixed Energy Photons; a Model for Optimal Beam Weighting.
Birgani, Mohammadjavad Tahmasebi; Fatahiasl, Jafar; Hosseini, Seyed Mohammad; Bagheri, Ali; Behrooz, Mohammad Ali; Zabiehzadeh, Mansour; Meskani, Reza; Gomari, Maryam Talaei
2015-01-01
Utilization of high energy photons (>10 MV) with an optimal weight using a mixed energy technique is a practical way to generate a homogenous dose distribution while maintaining adequate target coverage in intact breast radiotherapy. This study represents a model for estimation of this optimal weight for day to day clinical usage. For this purpose, treatment planning computed tomography scans of thirty-three consecutive early stage breast cancer patients following breast conservation surgery were analyzed. After delineation of the breast clinical target volume (CTV) and placing opposed wedge paired isocenteric tangential portals, dosimeteric calculations were conducted and dose volume histograms (DVHs) were generated, first with pure 6 MV photons and then these calculations were repeated ten times with incorporating 18 MV photons (ten percent increase in weight per step) in each individual patient. For each calculation two indexes including maximum dose in the breast CTV (Dmax) and the volume of CTV which covered with 95% Isodose line (VCTV, 95%IDL) were measured according to the DVH data and then normalized values were plotted in a graph. The optimal weight of 18 MV photons was defined as the intersection point of Dmax and VCTV, 95%IDL graphs. For creating a model to predict this optimal weight multiple linear regression analysis was used based on some of the breast and tangential field parameters. The best fitting model for prediction of 18 MV photons optimal weight in breast radiotherapy using mixed energy technique, incorporated chest wall separation plus central lung distance (Adjusted R2=0.776). In conclusion, this study represents a model for the estimation of optimal beam weighting in breast radiotherapy using mixed photon energy technique for routine day to day clinical usage.
Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition
Boosted regression tree (BRT) models were developed to quantify the nonlinear relationships between landscape variables and nutrient concentrations in a mesoscale mixed land cover watershed during base-flow conditions. Factors that affect instream biological components, based on ...
Elliott, Marc N; Cohea, Christopher W; Lehrman, William G; Goldstein, Elizabeth H; Cleary, Paul D; Giordano, Laura A; Beckett, Megan K; Zaslavsky, Alan M
2015-12-01
Measure HCAHPS improvement in hospitals participating in the second and fifth years of HCAHPS public reporting; determine whether change is greater for some hospital types. Surveys from 4,822,960 adult inpatients discharged July 2007-June 2008 or July 2010-June 2011 from 3,541 U.S. hospitals. Linear mixed-effect regression models with fixed effects for time, patient mix, and hospital characteristics (bedsize, ownership, Census division, teaching status, Critical Access status); random effects for hospitals and hospital-time interactions; fixed-effect interactions of hospital characteristics and patient characteristics (gender, health, education) with time predicted HCAHPS measures correcting for regression-to-the-mean biases. National probability sample of adult inpatients in any of four approved survey modes. HCAHPS scores increased by 2.8 percentage points from 2008 to 2011 in the most positive response category. Among the middle 95 percent of hospitals, changes ranged from a 5.1 percent decrease to a 10.2 percent gain overall. The greatest improvement was in for-profit and larger (200 or more beds) hospitals. Five years after HCAHPS public reporting began, meaningful improvement of patients' hospital care experiences continues, especially among initially low-scoring hospitals, reducing some gaps among hospitals. © Health Research and Educational Trust.
Moerbeek, Mirjam; van Schie, Sander
2016-07-11
The number of clusters in a cluster randomized trial is often low. It is therefore likely random assignment of clusters to treatment conditions results in covariate imbalance. There are no studies that quantify the consequences of covariate imbalance in cluster randomized trials on parameter and standard error bias and on power to detect treatment effects. The consequences of covariance imbalance in unadjusted and adjusted linear mixed models are investigated by means of a simulation study. The factors in this study are the degree of imbalance, the covariate effect size, the cluster size and the intraclass correlation coefficient. The covariate is binary and measured at the cluster level; the outcome is continuous and measured at the individual level. The results show covariate imbalance results in negligible parameter bias and small standard error bias in adjusted linear mixed models. Ignoring the possibility of covariate imbalance while calculating the sample size at the cluster level may result in a loss in power of at most 25 % in the adjusted linear mixed model. The results are more severe for the unadjusted linear mixed model: parameter biases up to 100 % and standard error biases up to 200 % may be observed. Power levels based on the unadjusted linear mixed model are often too low. The consequences are most severe for large clusters and/or small intraclass correlation coefficients since then the required number of clusters to achieve a desired power level is smallest. The possibility of covariate imbalance should be taken into account while calculating the sample size of a cluster randomized trial. Otherwise more sophisticated methods to randomize clusters to treatments should be used, such as stratification or balance algorithms. All relevant covariates should be carefully identified, be actually measured and included in the statistical model to avoid severe levels of parameter and standard error bias and insufficient power levels.
Continuing Medical Education Speakers with High Evaluation Scores Use more Image-based Slides.
Ferguson, Ian; Phillips, Andrew W; Lin, Michelle
2017-01-01
Although continuing medical education (CME) presentations are common across health professions, it is unknown whether slide design is independently associated with audience evaluations of the speaker. Based on the conceptual framework of Mayer's theory of multimedia learning, this study aimed to determine whether image use and text density in presentation slides are associated with overall speaker evaluations. This retrospective analysis of six sequential CME conferences (two annual emergency medicine conferences over a three-year period) used a mixed linear regression model to assess whether post-conference speaker evaluations were associated with image fraction (percentage of image-based slides per presentation) and text density (number of words per slide). A total of 105 unique lectures were given by 49 faculty members, and 1,222 evaluations (70.1% response rate) were available for analysis. On average, 47.4% (SD=25.36) of slides had at least one educationally-relevant image (image fraction). Image fraction significantly predicted overall higher evaluation scores [F(1, 100.676)=6.158, p=0.015] in the mixed linear regression model. The mean (SD) text density was 25.61 (8.14) words/slide but was not a significant predictor [F(1, 86.293)=0.55, p=0.815]. Of note, the individual speaker [χ 2 (1)=2.952, p=0.003] and speaker seniority [F(3, 59.713)=4.083, p=0.011] significantly predicted higher scores. This is the first published study to date assessing the linkage between slide design and CME speaker evaluations by an audience of practicing clinicians. The incorporation of images was associated with higher evaluation scores, in alignment with Mayer's theory of multimedia learning. Contrary to this theory, however, text density showed no significant association, suggesting that these scores may be multifactorial. Professional development efforts should focus on teaching best practices in both slide design and presentation skills.
Zeng, Yanni; Navarro, Pau; Fernandez-Pujals, Ana M; Hall, Lynsey S; Clarke, Toni-Kim; Thomson, Pippa A; Smith, Blair H; Hocking, Lynne J; Padmanabhan, Sandosh; Hayward, Caroline; MacIntyre, Donald J; Wray, Naomi R; Deary, Ian J; Porteous, David J; Haley, Chris S; McIntosh, Andrew M
2017-02-15
Genome-wide association studies (GWASs) of major depressive disorder (MDD) have identified few significant associations. Testing the aggregation of genetic variants, in particular biological pathways, may be more powerful. Regional heritability analysis can be used to detect genomic regions that contribute to disease risk. We integrated pathway analysis and multilevel regional heritability analyses in a pipeline designed to identify MDD-associated pathways. The pipeline was applied to two independent GWAS samples [Generation Scotland: The Scottish Family Health Study (GS:SFHS, N = 6455) and Psychiatric Genomics Consortium (PGC:MDD) (N = 18,759)]. A polygenic risk score (PRS) composed of single nucleotide polymorphisms from the pathway most consistently associated with MDD was created, and its accuracy to predict MDD, using area under the curve, logistic regression, and linear mixed model analyses, was tested. In GS:SFHS, four pathways were significantly associated with MDD, and two of these explained a significant amount of pathway-level regional heritability. In PGC:MDD, one pathway was significantly associated with MDD. Pathway-level regional heritability was significant in this pathway in one subset of PGC:MDD. For both samples the regional heritabilities were further localized to the gene and subregion levels. The NETRIN1 signaling pathway showed the most consistent association with MDD across the two samples. PRSs from this pathway showed competitive predictive accuracy compared with the whole-genome PRSs when using area under the curve statistics, logistic regression, and linear mixed model. These post-GWAS analyses highlight the value of combining multiple methods on multiple GWAS data for the identification of risk pathways for MDD. The NETRIN1 signaling pathway is identified as a candidate pathway for MDD and should be explored in further large population studies. Copyright © 2016 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.
Jameson, John; Smith, Peter; Harris, Gerald
2015-01-01
Osteogenesis Imperfecta is a genetic disorder resulting in bone fragility. The mechanisms behind this fragility are not well understood. In addition to characteristic bone mass deficiencies, research suggests that bone material properties are compromised in individuals with this disorder. However, little data exists regarding bone properties beyond the microstructural scale in individuals with this disorder. Specimens were obtained from long bone diaphyses of nine children with osteogenesis imperfecta during routine osteotomy procedures. Small rectangular beams, oriented longitudinally and transversely to the diaphyseal axis, were machined from these specimens and elastic modulus, yield strength, and maximum strength were measured in three-point bending. Intracortical vascular porosity, bone volume fraction, osteocyte lacuna density, and volumetric tissue mineral density were determined by synchrotron micro-computed tomography, and relationships among these mechanical properties and structural parameters were explored. Modulus and strength were on average 64–68% lower in the transverse vs. longitudinal beams (P<0.001, linear mixed model). Vascular porosity ranged between 3–42% of total bone volume. Longitudinal properties were associated negatively with porosity (P≤0.006, linear regressions). Mechanical properties, however, were not associated with osteocyte lacuna density or volumetric tissue mineral density (P≥0.167). Bone properties and structural parameters were not associated significantly with donor age (p≥0.225, linear mixed models). This study presents novel data regarding bone material strength in children with osteogenesis imperfecta. Results confirm that these properties are anisotropic. Elevated vascular porosity was observed in most specimens, and this parameter was associated with reduced bone material strength. These results offer insight towards understanding bone fragility and the role of intracortical porosity on the strength of bone tissue in children with osteogenesis imperfecta. PMID:24928496
Hu, Rongrong; Wang, Chenkun; Gu, Yangshun; Racette, Lyne
2016-01-01
Abstract Detection of progression is paramount to the clinical management of glaucoma. Our goal is to compare the performance of standard automated perimetry (SAP), short-wavelength automated perimetry (SWAP), and frequency-doubling technology (FDT) perimetry in monitoring glaucoma progression. Longitudinal data of paired SAP, SWAP, and FDT from 113 eyes with primary open-angle glaucoma enrolled in the Diagnostic Innovations in Glaucoma Study or the African Descent and Glaucoma Evaluation Study were included. Data from all tests were expressed in comparable units by converting the sensitivity from decibels to unitless contrast sensitivity and by expressing sensitivity values in percent of mean normal based on an independent dataset of 207 healthy eyes with aging deterioration taken into consideration. Pointwise linear regression analysis was performed and 3 criteria (conservative, moderate, and liberal) were used to define progression and improvement. Global mean sensitivity (MS) was fitted with linear mixed models. No statistically significant difference in the proportion of progressing and improving eyes was observed across tests using the conservative criterion. Fewer eyes showed improvement on SAP compared to SWAP and FDT using the moderate criterion; and FDT detected less progressing eyes than SAP and SWAP using the liberal criterion. The agreement between these test types was poor. The linear mixed model showed a progressing trend of global MS overtime for SAP and SWAP, but not for FDT. The baseline estimate of SWAP MS was significantly lower than SAP MS by 21.59% of mean normal. FDT showed comparable estimation of baseline MS with SAP. SWAP and FDT do not appear to have significant benefits over SAP in monitoring glaucoma progression. SAP, SWAP, and FDT may, however, detect progression in different glaucoma eyes. PMID:26886602
Albert, Carolyne; Jameson, John; Smith, Peter; Harris, Gerald
2014-09-01
Osteogenesis imperfecta is a genetic disorder resulting in bone fragility. The mechanisms behind this fragility are not well understood. In addition to characteristic bone mass deficiencies, research suggests that bone material properties are compromised in individuals with this disorder. However, little data exists regarding bone properties beyond the microstructural scale in individuals with this disorder. Specimens were obtained from long bone diaphyses of nine children with osteogenesis imperfecta during routine osteotomy procedures. Small rectangular beams, oriented longitudinally and transversely to the diaphyseal axis, were machined from these specimens and elastic modulus, yield strength, and maximum strength were measured in three-point bending. Intracortical vascular porosity, bone volume fraction, osteocyte lacuna density, and volumetric tissue mineral density were determined by synchrotron micro-computed tomography, and relationships among these mechanical properties and structural parameters were explored. Modulus and strength were on average 64-68% lower in the transverse vs. longitudinal beams (P<0.001, linear mixed model). Vascular porosity ranged between 3 and 42% of total bone volume. Longitudinal properties were associated negatively with porosity (P≤0.006, linear regressions). Mechanical properties, however, were not associated with osteocyte lacuna density or volumetric tissue mineral density (P≥0.167). Bone properties and structural parameters were not associated significantly with donor age (P≥0.225, linear mixed models). This study presents novel data regarding bone material strength in children with osteogenesis imperfecta. Results confirm that these properties are anisotropic. Elevated vascular porosity was observed in most specimens, and this parameter was associated with reduced bone material strength. These results offer insight toward understanding bone fragility and the role of intracortical porosity on the strength of bone tissue in children with osteogenesis imperfecta. Copyright © 2014 Elsevier Inc. All rights reserved.
Precision Efficacy Analysis for Regression.
ERIC Educational Resources Information Center
Brooks, Gordon P.
When multiple linear regression is used to develop a prediction model, sample size must be large enough to ensure stable coefficients. If the derivation sample size is inadequate, the model may not predict well for future subjects. The precision efficacy analysis for regression (PEAR) method uses a cross- validity approach to select sample sizes…
Application of linear regression analysis in accuracy assessment of rolling force calculations
NASA Astrophysics Data System (ADS)
Poliak, E. I.; Shim, M. K.; Kim, G. S.; Choo, W. Y.
1998-10-01
Efficient operation of the computational models employed in process control systems require periodical assessment of the accuracy of their predictions. Linear regression is proposed as a tool which allows separate systematic and random prediction errors from those related to measurements. A quantitative characteristic of the model predictive ability is introduced in addition to standard statistical tests for model adequacy. Rolling force calculations are considered as an example for the application. However, the outlined approach can be used to assess the performance of any computational model.
Access disparities to Magnet hospitals for patients undergoing neurosurgical operations
Missios, Symeon; Bekelis, Kimon
2017-01-01
Background Centers of excellence focusing on quality improvement have demonstrated superior outcomes for a variety of surgical interventions. We investigated the presence of access disparities to hospitals recognized by the Magnet Recognition Program of the American Nurses Credentialing Center (ANCC) for patients undergoing neurosurgical operations. Methods We performed a cohort study of all neurosurgery patients who were registered in the New York Statewide Planning and Research Cooperative System (SPARCS) database from 2009–2013. We examined the association of African-American race and lack of insurance with Magnet status hospitalization for neurosurgical procedures. A mixed effects propensity adjusted multivariable regression analysis was used to control for confounding. Results During the study period, 190,535 neurosurgical patients met the inclusion criteria. Using a multivariable logistic regression, we demonstrate that African-Americans had lower admission rates to Magnet institutions (OR 0.62; 95% CI, 0.58–0.67). This persisted in a mixed effects logistic regression model (OR 0.77; 95% CI, 0.70–0.83) to adjust for clustering at the patient county level, and a propensity score adjusted logistic regression model (OR 0.75; 95% CI, 0.69–0.82). Additionally, lack of insurance was associated with lower admission rates to Magnet institutions (OR 0.71; 95% CI, 0.68–0.73), in a multivariable logistic regression model. This persisted in a mixed effects logistic regression model (OR 0.72; 95% CI, 0.69–0.74), and a propensity score adjusted logistic regression model (OR 0.72; 95% CI, 0.69–0.75). Conclusions Using a comprehensive all-payer cohort of neurosurgery patients in New York State we identified an association of African-American race and lack of insurance with lower rates of admission to Magnet hospitals. PMID:28684152
NASA Astrophysics Data System (ADS)
Shi, Jinfei; Zhu, Songqing; Chen, Ruwen
2017-12-01
An order selection method based on multiple stepwise regressions is proposed for General Expression of Nonlinear Autoregressive model which converts the model order problem into the variable selection of multiple linear regression equation. The partial autocorrelation function is adopted to define the linear term in GNAR model. The result is set as the initial model, and then the nonlinear terms are introduced gradually. Statistics are chosen to study the improvements of both the new introduced and originally existed variables for the model characteristics, which are adopted to determine the model variables to retain or eliminate. So the optimal model is obtained through data fitting effect measurement or significance test. The simulation and classic time-series data experiment results show that the method proposed is simple, reliable and can be applied to practical engineering.
Linear and nonlinear regression techniques for simultaneous and proportional myoelectric control.
Hahne, J M; Biessmann, F; Jiang, N; Rehbaum, H; Farina, D; Meinecke, F C; Muller, K-R; Parra, L C
2014-03-01
In recent years the number of active controllable joints in electrically powered hand-prostheses has increased significantly. However, the control strategies for these devices in current clinical use are inadequate as they require separate and sequential control of each degree-of-freedom (DoF). In this study we systematically compare linear and nonlinear regression techniques for an independent, simultaneous and proportional myoelectric control of wrist movements with two DoF. These techniques include linear regression, mixture of linear experts (ME), multilayer-perceptron, and kernel ridge regression (KRR). They are investigated offline with electro-myographic signals acquired from ten able-bodied subjects and one person with congenital upper limb deficiency. The control accuracy is reported as a function of the number of electrodes and the amount and diversity of training data providing guidance for the requirements in clinical practice. The results showed that KRR, a nonparametric statistical learning method, outperformed the other methods. However, simple transformations in the feature space could linearize the problem, so that linear models could achieve similar performance as KRR at much lower computational costs. Especially ME, a physiologically inspired extension of linear regression represents a promising candidate for the next generation of prosthetic devices.
Effect of water-based recovery on blood lactate removal after high-intensity exercise.
Lucertini, Francesco; Gervasi, Marco; D'Amen, Giancarlo; Sisti, Davide; Rocchi, Marco Bruno Luigi; Stocchi, Vilberto; Benelli, Piero
2017-01-01
This study assessed the effectiveness of water immersion to the shoulders in enhancing blood lactate removal during active and passive recovery after short-duration high-intensity exercise. Seventeen cyclists underwent active water- and land-based recoveries and passive water and land-based recoveries. The recovery conditions lasted 31 minutes each and started after the identification of each cyclist's blood lactate accumulation peak, induced by a 30-second all-out sprint on a cycle ergometer. Active recoveries were performed on a cycle ergometer at 70% of the oxygen consumption corresponding to the lactate threshold (the control for the intensity was oxygen consumption), while passive recoveries were performed with subjects at rest and seated on the cycle ergometer. Blood lactate concentration was measured 8 times during each recovery condition and lactate clearance was modeled over a negative exponential function using non-linear regression. Actual active recovery intensity was compared to the target intensity (one sample t-test) and passive recovery intensities were compared between environments (paired sample t-tests). Non-linear regression parameters (coefficients of the exponential decay of lactate; predicted resting lactates; predicted delta decreases in lactate) were compared between environments (linear mixed model analyses for repeated measures) separately for the active and passive recovery modes. Active recovery intensities did not differ significantly from the target oxygen consumption, whereas passive recovery resulted in a slightly lower oxygen consumption when performed while immersed in water rather than on land. The exponential decay of blood lactate was not significantly different in water- or land-based recoveries in either active or passive recovery conditions. In conclusion, water immersion at 29°C would not appear to be an effective practice for improving post-exercise lactate removal in either the active or passive recovery modes.
NASA Astrophysics Data System (ADS)
Chu, Hone-Jay; Kong, Shish-Jeng; Chang, Chih-Hua
2018-03-01
The turbidity (TB) of a water body varies with time and space. Water quality is traditionally estimated via linear regression based on satellite images. However, estimating and mapping water quality require a spatio-temporal nonstationary model, while TB mapping necessitates the use of geographically and temporally weighted regression (GTWR) and geographically weighted regression (GWR) models, both of which are more precise than linear regression. Given the temporal nonstationary models for mapping water quality, GTWR offers the best option for estimating regional water quality. Compared with GWR, GTWR provides highly reliable information for water quality mapping, boasts a relatively high goodness of fit, improves the explanation of variance from 44% to 87%, and shows a sufficient space-time explanatory power. The seasonal patterns of TB and the main spatial patterns of TB variability can be identified using the estimated TB maps from GTWR and by conducting an empirical orthogonal function (EOF) analysis.
Quality of life in breast cancer patients--a quantile regression analysis.
Pourhoseingholi, Mohamad Amin; Safaee, Azadeh; Moghimi-Dehkordi, Bijan; Zeighami, Bahram; Faghihzadeh, Soghrat; Tabatabaee, Hamid Reza; Pourhoseingholi, Asma
2008-01-01
Quality of life study has an important role in health care especially in chronic diseases, in clinical judgment and in medical resources supplying. Statistical tools like linear regression are widely used to assess the predictors of quality of life. But when the response is not normal the results are misleading. The aim of this study is to determine the predictors of quality of life in breast cancer patients, using quantile regression model and compare to linear regression. A cross-sectional study conducted on 119 breast cancer patients that admitted and treated in chemotherapy ward of Namazi hospital in Shiraz. We used QLQ-C30 questionnaire to assessment quality of life in these patients. A quantile regression was employed to assess the assocciated factors and the results were compared to linear regression. All analysis carried out using SAS. The mean score for the global health status for breast cancer patients was 64.92+/-11.42. Linear regression showed that only grade of tumor, occupational status, menopausal status, financial difficulties and dyspnea were statistically significant. In spite of linear regression, financial difficulties were not significant in quantile regression analysis and dyspnea was only significant for first quartile. Also emotion functioning and duration of disease statistically predicted the QOL score in the third quartile. The results have demonstrated that using quantile regression leads to better interpretation and richer inference about predictors of the breast cancer patient quality of life.
Casero-Alonso, V; López-Fidalgo, J; Torsney, B
2017-01-01
Binary response models are used in many real applications. For these models the Fisher information matrix (FIM) is proportional to the FIM of a weighted simple linear regression model. The same is also true when the weight function has a finite integral. Thus, optimal designs for one binary model are also optimal for the corresponding weighted linear regression model. The main objective of this paper is to provide a tool for the construction of MV-optimal designs, minimizing the maximum of the variances of the estimates, for a general design space. MV-optimality is a potentially difficult criterion because of its nondifferentiability at equal variance designs. A methodology for obtaining MV-optimal designs where the design space is a compact interval [a, b] will be given for several standard weight functions. The methodology will allow us to build a user-friendly computer tool based on Mathematica to compute MV-optimal designs. Some illustrative examples will show a representation of MV-optimal designs in the Euclidean plane, taking a and b as the axes. The applet will be explained using two relevant models. In the first one the case of a weighted linear regression model is considered, where the weight function is directly chosen from a typical family. In the second example a binary response model is assumed, where the probability of the outcome is given by a typical probability distribution. Practitioners can use the provided applet to identify the solution and to know the exact support points and design weights. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Novitski, Linda Nicole
Accurate and cost-effective assessment of water quality is necessary for proper management and restoration of inland water bodies susceptible to algal bloom conditions. Landsat and MODIS satellite images were used to create chlorophyll and Secchi depth predictive models for algal assessment of Great Lakes and other lakes of the United States. Boosted regression tree (BRT) models using satellite imagery are both easy to use and can have high predictive performance. BRT models inferred chlorophyll and Secchi depth more accurately than linear regression models for all study locations. Inferred chlorophyll of inner Saginaw Bay was subsequently used in ecological models to help understand the ecological drivers of algal blooms in this ecosystem. For small lakes (non-Great Lakes), the best national Landsat model for ln-transformed chlorophyll was the BRT model and had a cross-validation R 2 of 0.44 and a 0.76 ln-transformed mug/L RMSE. The best national Landsat model for Secchi depth was also a BRT model that had an adjusted R 2 of 0.52 and a 0.80 m RMSE. We assessed the applicability of the national chlorophyll model for ecological analysis by comparing the total phosphorus- chlorophyll relationship with chlorophyll determined from sampling or remote sensing, which showed the total phosphorus- chlorophyll relationship had an adjusted R2 = 0.58 and 1.02 ln-transformed microg/L RMSE with sampled chlorophyll versus an adjusted R2 = 0.56 and 1.04 ln-transformed mug/L RMSE with chlorophyll determined by the boosted regression tree remote sensing model. For Great Lakes models, the MODIS BRT model predicted chlorophyll most accurately of the three BRT models and compared well to other models in the literature. BRT models for Landsat ETM+ and TM more accurately predicted chlorophyll than the MSS model and all Landsat models had favorable results when compared to the literature. BRT chlorophyll predictive models are useful in helping to understand historical, long-term chlorophyll trends and to inform us of how climate change may alter ecosystems in the future. In inner Saginaw Bay, annual average and upper quartile Landsat-derived chlorophyll decreased from 7.44 to 6.62 and 8.38 to 7.38 mug/L between 1973-1982, and annual upper quartile of 8-day phosphorus loads increased from 5.29 to 6.79 kg between 1973-2012. Simple linear and multiple regression models and Wilcoxon rank test results for MODIS and Landsat-derived chlorophyll indicate that distance from the Saginaw River mouth influences chlorophyll concentration in Saginaw Bay; Landsat-derived surface water temperature and phosphorus loads to a lesser extent. Mixed-effect models for MODIS and Landsat-derived chlorophyll were related to chlorophyll better than simple linear or multiple regressions, with random effects of pixel and sample date contributing substantially to predictive power (NSE=0.35-70), though phosphorus loads, distance to Saginaw River mouth, and water were significant fixed effects in most models. Water quality changes in Saginaw Bay between 1972-2012 were influenced by phosphorus loading and distance to the Saginaw River's mouth. Landsat and MODIS imagery are complementary platforms because of the long history of Landsat operation and the finer spectral resolution and image frequency of MODIS. Remote sensing water quality assessment tools can be valuable for limnological study, ecological assessment, and water resource management.
Gries, Katharine S; Regier, Dean A; Ramsey, Scott D; Patrick, Donald L
2017-06-01
To develop a statistical model generating utility estimates for prostate cancer specific health states, using preference weights derived from the perspectives of prostate cancer patients, men at risk for prostate cancer, and society. Utility estimate values were calculated using standard gamble (SG) methodology. Study participants valued 18 prostate-specific health states with the five attributes: sexual function, urinary function, bowel function, pain, and emotional well-being. Appropriateness of model (linear regression, mixed effects, or generalized estimating equation) to generate prostate cancer utility estimates was determined by paired t-tests to compare observed and predicted values. Mixed-corrected standard SG utility estimates to account for loss aversion were calculated based on prospect theory. 132 study participants assigned values to the health states (n = 40 men at risk for prostate cancer; n = 43 men with prostate cancer; n = 49 general population). In total, 792 valuations were elicited (six health states for each 132 participants). The most appropriate model for the classification system was a mixed effects model; correlations between the mean observed and predicted utility estimates were greater than 0.80 for each perspective. Developing a health-state classification system with preference weights for three different perspectives demonstrates the relative importance of main effects between populations. The predicted values for men with prostate cancer support the hypothesis that patients experiencing the disease state assign higher utility estimates to health states and there is a difference in valuations made by patients and the general population.
Product unit neural network models for predicting the growth limits of Listeria monocytogenes.
Valero, A; Hervás, C; García-Gimeno, R M; Zurera, G
2007-08-01
A new approach to predict the growth/no growth interface of Listeria monocytogenes as a function of storage temperature, pH, citric acid (CA) and ascorbic acid (AA) is presented. A linear logistic regression procedure was performed and a non-linear model was obtained by adding new variables by means of a Neural Network model based on Product Units (PUNN). The classification efficiency of the training data set and the generalization data of the new Logistic Regression PUNN model (LRPU) were compared with Linear Logistic Regression (LLR) and Polynomial Logistic Regression (PLR) models. 92% of the total cases from the LRPU model were correctly classified, an improvement on the percentage obtained using the PLR model (90%) and significantly higher than the results obtained with the LLR model, 80%. On the other hand predictions of LRPU were closer to data observed which permits to design proper formulations in minimally processed foods. This novel methodology can be applied to predictive microbiology for describing growth/no growth interface of food-borne microorganisms such as L. monocytogenes. The optimal balance is trying to find models with an acceptable interpretation capacity and with good ability to fit the data on the boundaries of variable range. The results obtained conclude that these kinds of models might well be very a valuable tool for mathematical modeling.
Yang, Yingbao; Li, Xiaolong; Pan, Xin; Zhang, Yong; Cao, Chen
2017-01-01
Many downscaling algorithms have been proposed to address the issue of coarse-resolution land surface temperature (LST) derived from available satellite-borne sensors. However, few studies have focused on improving LST downscaling in urban areas with several mixed surface types. In this study, LST was downscaled by a multiple linear regression model between LST and multiple scale factors in mixed areas with three or four surface types. The correlation coefficients (CCs) between LST and the scale factors were used to assess the importance of the scale factors within a moving window. CC thresholds determined which factors participated in the fitting of the regression equation. The proposed downscaling approach, which involves an adaptive selection of the scale factors, was evaluated using the LST derived from four Landsat 8 thermal imageries of Nanjing City in different seasons. Results of the visual and quantitative analyses show that the proposed approach achieves relatively satisfactory downscaling results on 11 August, with coefficient of determination and root-mean-square error of 0.87 and 1.13 °C, respectively. Relative to other approaches, our approach shows the similar accuracy and the availability in all seasons. The best (worst) availability occurred in the region of vegetation (water). Thus, the approach is an efficient and reliable LST downscaling method. Future tasks include reliable LST downscaling in challenging regions and the application of our model in middle and low spatial resolutions. PMID:28368301
Wang, Zheng-Xin; Hao, Peng; Yao, Pei-Yi
2017-01-01
The non-linear relationship between provincial economic growth and carbon emissions is investigated by using panel smooth transition regression (PSTR) models. The research indicates that, on the condition of separately taking Gross Domestic Product per capita (GDPpc), energy structure (Es), and urbanisation level (Ul) as transition variables, three models all reject the null hypothesis of a linear relationship, i.e., a non-linear relationship exists. The results show that the three models all contain only one transition function but different numbers of location parameters. The model taking GDPpc as the transition variable has two location parameters, while the other two models separately considering Es and Ul as the transition variables both contain one location parameter. The three models applied in the study all favourably describe the non-linear relationship between economic growth and CO2 emissions in China. It also can be seen that the conversion rate of the influence of Ul on per capita CO2 emissions is significantly higher than those of GDPpc and Es on per capita CO2 emissions. PMID:29236083
Wang, Zheng-Xin; Hao, Peng; Yao, Pei-Yi
2017-12-13
The non-linear relationship between provincial economic growth and carbon emissions is investigated by using panel smooth transition regression (PSTR) models. The research indicates that, on the condition of separately taking Gross Domestic Product per capita (GDPpc), energy structure (Es), and urbanisation level (Ul) as transition variables, three models all reject the null hypothesis of a linear relationship, i.e., a non-linear relationship exists. The results show that the three models all contain only one transition function but different numbers of location parameters. The model taking GDPpc as the transition variable has two location parameters, while the other two models separately considering Es and Ul as the transition variables both contain one location parameter. The three models applied in the study all favourably describe the non-linear relationship between economic growth and CO₂ emissions in China. It also can be seen that the conversion rate of the influence of Ul on per capita CO₂ emissions is significantly higher than those of GDPpc and Es on per capita CO₂ emissions.
Sung, Sheng-Feng; Hsieh, Cheng-Yang; Kao Yang, Yea-Huei; Lin, Huey-Juan; Chen, Chih-Hung; Chen, Yu-Wei; Hu, Ya-Han
2015-11-01
Case-mix adjustment is difficult for stroke outcome studies using administrative data. However, relevant prescription, laboratory, procedure, and service claims might be surrogates for stroke severity. This study proposes a method for developing a stroke severity index (SSI) by using administrative data. We identified 3,577 patients with acute ischemic stroke from a hospital-based registry and analyzed claims data with plenty of features. Stroke severity was measured using the National Institutes of Health Stroke Scale (NIHSS). We used two data mining methods and conventional multiple linear regression (MLR) to develop prediction models, comparing the model performance according to the Pearson correlation coefficient between the SSI and the NIHSS. We validated these models in four independent cohorts by using hospital-based registry data linked to a nationwide administrative database. We identified seven predictive features and developed three models. The k-nearest neighbor model (correlation coefficient, 0.743; 95% confidence interval: 0.737, 0.749) performed slightly better than the MLR model (0.742; 0.736, 0.747), followed by the regression tree model (0.737; 0.731, 0.742). In the validation cohorts, the correlation coefficients were between 0.677 and 0.725 for all three models. The claims-based SSI enables adjusting for disease severity in stroke studies using administrative data. Copyright © 2015 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Gusriani, N.; Firdaniza
2018-03-01
The existence of outliers on multiple linear regression analysis causes the Gaussian assumption to be unfulfilled. If the Least Square method is forcedly used on these data, it will produce a model that cannot represent most data. For that, we need a robust regression method against outliers. This paper will compare the Minimum Covariance Determinant (MCD) method and the TELBS method on secondary data on the productivity of phytoplankton, which contains outliers. Based on the robust determinant coefficient value, MCD method produces a better model compared to TELBS method.
Lee, Eunjee; Zhu, Hongtu; Kong, Dehan; Wang, Yalin; Giovanello, Kelly Sullivan; Ibrahim, Joseph G
2015-01-01
The aim of this paper is to develop a Bayesian functional linear Cox regression model (BFLCRM) with both functional and scalar covariates. This new development is motivated by establishing the likelihood of conversion to Alzheimer’s disease (AD) in 346 patients with mild cognitive impairment (MCI) enrolled in the Alzheimer’s Disease Neuroimaging Initiative 1 (ADNI-1) and the early markers of conversion. These 346 MCI patients were followed over 48 months, with 161 MCI participants progressing to AD at 48 months. The functional linear Cox regression model was used to establish that functional covariates including hippocampus surface morphology and scalar covariates including brain MRI volumes, cognitive performance (ADAS-Cog), and APOE status can accurately predict time to onset of AD. Posterior computation proceeds via an efficient Markov chain Monte Carlo algorithm. A simulation study is performed to evaluate the finite sample performance of BFLCRM. PMID:26900412
Wodschow, Kirstine; Hansen, Birgitte; Schullehner, Jörg; Ersbøll, Annette Kjær
2018-06-08
Concentrations and spatial variations of the four cations Na, K, Mg and Ca are known to some extent for groundwater and to a lesser extent for drinking water. Using Denmark as case, the purpose of this study was to analyze the spatial and temporal variations in the major cations in drinking water. The results will contribute to a better exposure estimation in future studies of the association between cations and diseases. Spatial and temporal variations and the association with aquifer types, were analyzed with spatial scan statistics, linear regression and a multilevel mixed-effects linear regression model. About 65,000 water samples of each cation (1980⁻2017) were included in the study. Results of mean concentrations were 31.4 mg/L, 3.5 mg/L, 12.1 mg/L and 84.5 mg/L for 1980⁻2017 for Na, K, Mg and Ca, respectively. An expected west-east trend in concentrations were confirmed, mainly explained by variations in aquifer types. The trend in concentration was stable for about 31⁻45% of the public water supply areas. It is therefore recommended that the exposure estimate in future health related studies not only be based on a single mean value, but that temporal and spatial variations should also be included.
Ridge Regression for Interactive Models.
ERIC Educational Resources Information Center
Tate, Richard L.
1988-01-01
An exploratory study of the value of ridge regression for interactive models is reported. Assuming that the linear terms in a simple interactive model are centered to eliminate non-essential multicollinearity, a variety of common models, representing both ordinal and disordinal interactions, are shown to have "orientations" that are…
Samsuddin, Niza; Rampal, Krishna Gopal; Ismail, Noor Hassim; Abdullah, Nor Zamzila; Nasreen, Hashima E
2016-02-01
Research findings have linked exposure to pesticides to an increased risk of cardiovascular (CVS) diseases. Therefore, this study aimed to assess the impact of chronic mix-pesticides exposure on CVS hemodynamic parameters. A total of 198 male Malay pesticide-exposed and 195 male Malay nonexposed workers were examined. Data were collected through exposure-matrix assessment, questionnaire, blood analyses, and CVS assessment. Explanatory variables comprised of lipid profiles, paraoxonase 1 (PON1), and oxidized low-density lipoprotein (ox-LDL). Outcome measures comprised of brachial and aortic diastolic blood pressure (DBP) and systolic BP (SBP), heart rate, and pulse wave velocity (PWV). Linear regressions identified the B coefficient showing how many units of CVS parameters are associated with each unit of covariates. Diazoxonase was significantly lower and ox-LDL was higher among pesticide-exposed workers than the comparison group. The final multivariate linear regression model revealed that age, body mass index (BMI), smoking, and pesticide exposure were independent predictors of brachial and aortic DBP and SBP. Pesticide exposure was also associated with heart rate, but not with PWV. Lipid profiles, PON1 enzymes, and ox-LDL showed no association with any of the CVS parameters. Chronic mix-pesticide exposure among workers involved in mosquito control has possible association with depression of diazoxonase and the increase in ox-LDL, brachial and aortic DBP and SBP, and heart rate. This study raises concerns that those using pesticides may be exposed to hitherto unrecognized CVS risks among others. If this is confirmed by further studies, greater efforts will be needed to protect these workers. © American Journal of Hypertension, Ltd 2015. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Reasons for Hierarchical Linear Modeling: A Reminder.
ERIC Educational Resources Information Center
Wang, Jianjun
1999-01-01
Uses examples of hierarchical linear modeling (HLM) at local and national levels to illustrate proper applications of HLM and dummy variable regression. Raises cautions about the circumstances under which hierarchical data do not need HLM. (SLD)
Image interpolation via regularized local linear regression.
Liu, Xianming; Zhao, Debin; Xiong, Ruiqin; Ma, Siwei; Gao, Wen; Sun, Huifang
2011-12-01
The linear regression model is a very attractive tool to design effective image interpolation schemes. Some regression-based image interpolation algorithms have been proposed in the literature, in which the objective functions are optimized by ordinary least squares (OLS). However, it is shown that interpolation with OLS may have some undesirable properties from a robustness point of view: even small amounts of outliers can dramatically affect the estimates. To address these issues, in this paper we propose a novel image interpolation algorithm based on regularized local linear regression (RLLR). Starting with the linear regression model where we replace the OLS error norm with the moving least squares (MLS) error norm leads to a robust estimator of local image structure. To keep the solution stable and avoid overfitting, we incorporate the l(2)-norm as the estimator complexity penalty. Moreover, motivated by recent progress on manifold-based semi-supervised learning, we explicitly consider the intrinsic manifold structure by making use of both measured and unmeasured data points. Specifically, our framework incorporates the geometric structure of the marginal probability distribution induced by unmeasured samples as an additional local smoothness preserving constraint. The optimal model parameters can be obtained with a closed-form solution by solving a convex optimization problem. Experimental results on benchmark test images demonstrate that the proposed method achieves very competitive performance with the state-of-the-art interpolation algorithms, especially in image edge structure preservation. © 2011 IEEE
Sun, Wei; Huang, Guo H; Lv, Ying; Li, Gongchen
2012-06-01
To tackle nonlinear economies-of-scale (EOS) effects in interval-parameter constraints for a representative waste management problem, an inexact piecewise-linearization-based fuzzy flexible programming (IPFP) model is developed. In IPFP, interval parameters for waste amounts and transportation/operation costs can be quantified; aspiration levels for net system costs, as well as tolerance intervals for both capacities of waste treatment facilities and waste generation rates can be reflected; and the nonlinear EOS effects transformed from objective function to constraints can be approximated. An interactive algorithm is proposed for solving the IPFP model, which in nature is an interval-parameter mixed-integer quadratically constrained programming model. To demonstrate the IPFP's advantages, two alternative models are developed to compare their performances. One is a conventional linear-regression-based inexact fuzzy programming model (IPFP2) and the other is an IPFP model with all right-hand-sides of fussy constraints being the corresponding interval numbers (IPFP3). The comparison results between IPFP and IPFP2 indicate that the optimized waste amounts would have the similar patterns in both models. However, when dealing with EOS effects in constraints, the IPFP2 may underestimate the net system costs while the IPFP can estimate the costs more accurately. The comparison results between IPFP and IPFP3 indicate that their solutions would be significantly different. The decreased system uncertainties in IPFP's solutions demonstrate its effectiveness for providing more satisfactory interval solutions than IPFP3. Following its first application to waste management, the IPFP can be potentially applied to other environmental problems under multiple complexities. Copyright © 2012 Elsevier Ltd. All rights reserved.
Logistic Mixed Models to Investigate Implicit and Explicit Belief Tracking
Lages, Martin; Scheel, Anne
2016-01-01
We investigated the proposition of a two-systems Theory of Mind in adults’ belief tracking. A sample of N = 45 participants predicted the choice of one of two opponent players after observing several rounds in an animated card game. Three matches of this card game were played and initial gaze direction on target and subsequent choice predictions were recorded for each belief task and participant. We conducted logistic regressions with mixed effects on the binary data and developed Bayesian logistic mixed models to infer implicit and explicit mentalizing in true belief and false belief tasks. Although logistic regressions with mixed effects predicted the data well a Bayesian logistic mixed model with latent task- and subject-specific parameters gave a better account of the data. As expected explicit choice predictions suggested a clear understanding of true and false beliefs (TB/FB). Surprisingly, however, model parameters for initial gaze direction also indicated belief tracking. We discuss why task-specific parameters for initial gaze directions are different from choice predictions yet reflect second-order perspective taking. PMID:27853440
NASA Technical Reports Server (NTRS)
Banse, Karl; Yong, Marina
1990-01-01
As a proxy for satellite CZCS observations and concurrent measurements of primary production rates, data from 138 stations occupied seasonally during 1967-1968 in the offshore eastern tropical Pacific were analyzed in terms of six temporal groups and our current regimes. Multiple linear regressions on column production Pt show that simulated satellite pigment is generally weakly correlated, but sometimes not correlated with Pt, and that incident irradiance, sea surface temperature, nitrate, transparency, and depths of mixed layer or nitracline assume little or no importance. After a proxy for the light-saturated chlorophyll-specific photosynthetic rate P(max) is added, the coefficient of determination ranges from 0.55 to 0.91 (median of 0.85) for the 10 cases. In stepwise multiple linear regressions the P(max) proxy is the best predictor for Pt.
The long-solved problem of the best-fit straight line: Application to isotopic mixing lines
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wehr, Richard; Saleska, Scott R.
It has been almost 50 years since York published an exact and general solution for the best-fit straight line to independent points with normally distributed errors in both x and y. York's solution is highly cited in the geophysical literature but almost unknown outside of it, so that there has been no ebb in the tide of books and papers wrestling with the problem. Much of the post-1969 literature on straight-line fitting has sown confusion not merely by its content but by its very existence. The optimal least-squares fit is already known; the problem is already solved. Here we introducemore » the non-specialist reader to York's solution and demonstrate its application in the interesting case of the isotopic mixing line, an analytical tool widely used to determine the isotopic signature of trace gas sources for the study of biogeochemical cycles. The most commonly known linear regression methods – ordinary least-squares regression (OLS), geometric mean regression (GMR), and orthogonal distance regression (ODR) – have each been recommended as the best method for fitting isotopic mixing lines. In fact, OLS, GMR, and ODR are all special cases of York's solution that are valid only under particular measurement conditions, and those conditions do not hold in general for isotopic mixing lines. Here, using Monte Carlo simulations, we quantify the biases in OLS, GMR, and ODR under various conditions and show that York's general – and convenient – solution is always the least biased.« less
The long-solved problem of the best-fit straight line: Application to isotopic mixing lines
Wehr, Richard; Saleska, Scott R.
2017-01-03
It has been almost 50 years since York published an exact and general solution for the best-fit straight line to independent points with normally distributed errors in both x and y. York's solution is highly cited in the geophysical literature but almost unknown outside of it, so that there has been no ebb in the tide of books and papers wrestling with the problem. Much of the post-1969 literature on straight-line fitting has sown confusion not merely by its content but by its very existence. The optimal least-squares fit is already known; the problem is already solved. Here we introducemore » the non-specialist reader to York's solution and demonstrate its application in the interesting case of the isotopic mixing line, an analytical tool widely used to determine the isotopic signature of trace gas sources for the study of biogeochemical cycles. The most commonly known linear regression methods – ordinary least-squares regression (OLS), geometric mean regression (GMR), and orthogonal distance regression (ODR) – have each been recommended as the best method for fitting isotopic mixing lines. In fact, OLS, GMR, and ODR are all special cases of York's solution that are valid only under particular measurement conditions, and those conditions do not hold in general for isotopic mixing lines. Here, using Monte Carlo simulations, we quantify the biases in OLS, GMR, and ODR under various conditions and show that York's general – and convenient – solution is always the least biased.« less
How Robust Is Linear Regression with Dummy Variables?
ERIC Educational Resources Information Center
Blankmeyer, Eric
2006-01-01
Researchers in education and the social sciences make extensive use of linear regression models in which the dependent variable is continuous-valued while the explanatory variables are a combination of continuous-valued regressors and dummy variables. The dummies partition the sample into groups, some of which may contain only a few observations.…
Spinnato, J; Roubaud, M-C; Burle, B; Torrésani, B
2015-06-01
The main goal of this work is to develop a model for multisensor signals, such as magnetoencephalography or electroencephalography (EEG) signals that account for inter-trial variability, suitable for corresponding binary classification problems. An important constraint is that the model be simple enough to handle small size and unbalanced datasets, as often encountered in BCI-type experiments. The method involves the linear mixed effects statistical model, wavelet transform, and spatial filtering, and aims at the characterization of localized discriminant features in multisensor signals. After discrete wavelet transform and spatial filtering, a projection onto the relevant wavelet and spatial channels subspaces is used for dimension reduction. The projected signals are then decomposed as the sum of a signal of interest (i.e., discriminant) and background noise, using a very simple Gaussian linear mixed model. Thanks to the simplicity of the model, the corresponding parameter estimation problem is simplified. Robust estimates of class-covariance matrices are obtained from small sample sizes and an effective Bayes plug-in classifier is derived. The approach is applied to the detection of error potentials in multichannel EEG data in a very unbalanced situation (detection of rare events). Classification results prove the relevance of the proposed approach in such a context. The combination of the linear mixed model, wavelet transform and spatial filtering for EEG classification is, to the best of our knowledge, an original approach, which is proven to be effective. This paper improves upon earlier results on similar problems, and the three main ingredients all play an important role.
Barzi, Federica; Jones, Graham R D; Hughes, Jaquelyne T; Lawton, Paul D; Hoy, Wendy; O'Dea, Kerin; Jerums, George; MacIsaac, Richard J; Cass, Alan; Maple-Brown, Louise J
2018-03-01
Being able to estimate kidney decline accurately is particularly important in Indigenous Australians, a population at increased risk of developing chronic kidney disease and end stage kidney disease. The aim of this analysis was to explore the trend of decline in estimated glomerular filtration rate (eGFR) over a four year period using multiple local creatinine measures, compared with estimates derived using centrally-measured enzymatic creatinine and with estimates derived using only two local measures. The eGFR study comprised a cohort of over 600 Aboriginal Australian participants recruited from over twenty sites in urban, regional and remote Australia across five strata of health, diabetes and kidney function. Trajectories of eGFR were explored on 385 participants with at least three local creatinine records using graphical methods that compared the linear trends fitted using linear mixed models with non-linear trends fitted using fractional polynomial equations. Temporal changes of local creatinine were also characterized using group-based modelling. Analyses were stratified by eGFR (<60; 60-89; 90-119 and ≥120ml/min/1.73m 2 ) and albuminuria categories (<3mg/mmol; 3-30mg/mmol; >30mg/mmol). Mean age of the participants was 48years, 64% were female and the median follow-up was 3years. Decline of eGFR was accurately estimated using simple linear regression models and locally measured creatinine was as good as centrally measured creatinine at predicting kidney decline in people with an eGFR<60 and an eGFR 60-90ml/min/1.73m 2 with albuminuria. Analyses showed that one baseline and one follow-up locally measured creatinine may be sufficient to estimate short term (up to four years) kidney function decline. The greatest yearly decline was estimated in those with eGFR 60-90 and macro-albuminuria: -6.21 (-8.20, -4.23) ml/min/1.73m 2 . Short term estimates of kidney function decline can be reliably derived using an easy to implement and simple to interpret linear mixed effect model. Locally measured creatinine did not differ to centrally measured creatinine, thus is an accurate cost-efficient and timely means to monitoring kidney function progression. Copyright © 2018 The Canadian Society of Clinical Chemists. Published by Elsevier Inc. All rights reserved.
A green vehicle routing problem with customer satisfaction criteria
NASA Astrophysics Data System (ADS)
Afshar-Bakeshloo, M.; Mehrabi, A.; Safari, H.; Maleki, M.; Jolai, F.
2016-12-01
This paper develops an MILP model, named Satisfactory-Green Vehicle Routing Problem. It consists of routing a heterogeneous fleet of vehicles in order to serve a set of customers within predefined time windows. In this model in addition to the traditional objective of the VRP, both the pollution and customers' satisfaction have been taken into account. Meanwhile, the introduced model prepares an effective dashboard for decision-makers that determines appropriate routes, the best mixed fleet, speed and idle time of vehicles. Additionally, some new factors evaluate the greening of each decision based on three criteria. This model applies piecewise linear functions (PLFs) to linearize a nonlinear fuzzy interval for incorporating customers' satisfaction into other linear objectives. We have presented a mixed integer linear programming formulation for the S-GVRP. This model enriches managerial insights by providing trade-offs between customers' satisfaction, total costs and emission levels. Finally, we have provided a numerical study for showing the applicability of the model.
Estimating effects of limiting factors with regression quantiles
Cade, B.S.; Terrell, J.W.; Schroeder, R.L.
1999-01-01
In a recent Concepts paper in Ecology, Thomson et al. emphasized that assumptions of conventional correlation and regression analyses fundamentally conflict with the ecological concept of limiting factors, and they called for new statistical procedures to address this problem. The analytical issue is that unmeasured factors may be the active limiting constraint and may induce a pattern of unequal variation in the biological response variable through an interaction with the measured factors. Consequently, changes near the maxima, rather than at the center of response distributions, are better estimates of the effects expected when the observed factor is the active limiting constraint. Regression quantiles provide estimates for linear models fit to any part of a response distribution, including near the upper bounds, and require minimal assumptions about the form of the error distribution. Regression quantiles extend the concept of one-sample quantiles to the linear model by solving an optimization problem of minimizing an asymmetric function of absolute errors. Rank-score tests for regression quantiles provide tests of hypotheses and confidence intervals for parameters in linear models with heteroscedastic errors, conditions likely to occur in models of limiting ecological relations. We used selected regression quantiles (e.g., 5th, 10th, ..., 95th) and confidence intervals to test hypotheses that parameters equal zero for estimated changes in average annual acorn biomass due to forest canopy cover of oak (Quercus spp.) and oak species diversity. Regression quantiles also were used to estimate changes in glacier lily (Erythronium grandiflorum) seedling numbers as a function of lily flower numbers, rockiness, and pocket gopher (Thomomys talpoides fossor) activity, data that motivated the query by Thomson et al. for new statistical procedures. Both example applications showed that effects of limiting factors estimated by changes in some upper regression quantile (e.g., 90-95th) were greater than if effects were estimated by changes in the means from standard linear model procedures. Estimating a range of regression quantiles (e.g., 5-95th) provides a comprehensive description of biological response patterns for exploratory and inferential analyses in observational studies of limiting factors, especially when sampling large spatial and temporal scales.
Chen, Han; Wang, Chaolong; Conomos, Matthew P; Stilp, Adrienne M; Li, Zilin; Sofer, Tamar; Szpiro, Adam A; Chen, Wei; Brehm, John M; Celedón, Juan C; Redline, Susan; Papanicolaou, George J; Thornton, Timothy A; Laurie, Cathy C; Rice, Kenneth; Lin, Xihong
2016-04-07
Linear mixed models (LMMs) are widely used in genome-wide association studies (GWASs) to account for population structure and relatedness, for both continuous and binary traits. Motivated by the failure of LMMs to control type I errors in a GWAS of asthma, a binary trait, we show that LMMs are generally inappropriate for analyzing binary traits when population stratification leads to violation of the LMM's constant-residual variance assumption. To overcome this problem, we develop a computationally efficient logistic mixed model approach for genome-wide analysis of binary traits, the generalized linear mixed model association test (GMMAT). This approach fits a logistic mixed model once per GWAS and performs score tests under the null hypothesis of no association between a binary trait and individual genetic variants. We show in simulation studies and real data analysis that GMMAT effectively controls for population structure and relatedness when analyzing binary traits in a wide variety of study designs. Copyright © 2016 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Seong W. Lee
During this reporting period, the literature survey including the gasifier temperature measurement literature, the ultrasonic application and its background study in cleaning application, and spray coating process are completed. The gasifier simulator (cold model) testing has been successfully conducted. Four factors (blower voltage, ultrasonic application, injection time intervals, particle weight) were considered as significant factors that affect the temperature measurement. The Analysis of Variance (ANOVA) was applied to analyze the test data. The analysis shows that all four factors are significant to the temperature measurements in the gasifier simulator (cold model). The regression analysis for the case with the normalizedmore » room temperature shows that linear model fits the temperature data with 82% accuracy (18% error). The regression analysis for the case without the normalized room temperature shows 72.5% accuracy (27.5% error). The nonlinear regression analysis indicates a better fit than that of the linear regression. The nonlinear regression model's accuracy is 88.7% (11.3% error) for normalized room temperature case, which is better than the linear regression analysis. The hot model thermocouple sleeve design and fabrication are completed. The gasifier simulator (hot model) design and the fabrication are completed. The system tests of the gasifier simulator (hot model) have been conducted and some modifications have been made. Based on the system tests and results analysis, the gasifier simulator (hot model) has met the proposed design requirement and the ready for system test. The ultrasonic cleaning method is under evaluation and will be further studied for the gasifier simulator (hot model) application. The progress of this project has been on schedule.« less
Pfeiffer, R M; Riedl, R
2015-08-15
We assess the asymptotic bias of estimates of exposure effects conditional on covariates when summary scores of confounders, instead of the confounders themselves, are used to analyze observational data. First, we study regression models for cohort data that are adjusted for summary scores. Second, we derive the asymptotic bias for case-control studies when cases and controls are matched on a summary score, and then analyzed either using conditional logistic regression or by unconditional logistic regression adjusted for the summary score. Two scores, the propensity score (PS) and the disease risk score (DRS) are studied in detail. For cohort analysis, when regression models are adjusted for the PS, the estimated conditional treatment effect is unbiased only for linear models, or at the null for non-linear models. Adjustment of cohort data for DRS yields unbiased estimates only for linear regression; all other estimates of exposure effects are biased. Matching cases and controls on DRS and analyzing them using conditional logistic regression yields unbiased estimates of exposure effect, whereas adjusting for the DRS in unconditional logistic regression yields biased estimates, even under the null hypothesis of no association. Matching cases and controls on the PS yield unbiased estimates only under the null for both conditional and unconditional logistic regression, adjusted for the PS. We study the bias for various confounding scenarios and compare our asymptotic results with those from simulations with limited sample sizes. To create realistic correlations among multiple confounders, we also based simulations on a real dataset. Copyright © 2015 John Wiley & Sons, Ltd.
Statistical power analyses using G*Power 3.1: tests for correlation and regression analyses.
Faul, Franz; Erdfelder, Edgar; Buchner, Axel; Lang, Albert-Georg
2009-11-01
G*Power is a free power analysis program for a variety of statistical tests. We present extensions and improvements of the version introduced by Faul, Erdfelder, Lang, and Buchner (2007) in the domain of correlation and regression analyses. In the new version, we have added procedures to analyze the power of tests based on (1) single-sample tetrachoric correlations, (2) comparisons of dependent correlations, (3) bivariate linear regression, (4) multiple linear regression based on the random predictor model, (5) logistic regression, and (6) Poisson regression. We describe these new features and provide a brief introduction to their scope and handling.
Nie, Z Q; Ou, Y Q; Zhuang, J; Qu, Y J; Mai, J Z; Chen, J M; Liu, X Q
2016-05-01
Conditional logistic regression analysis and unconditional logistic regression analysis are commonly used in case control study, but Cox proportional hazard model is often used in survival data analysis. Most literature only refer to main effect model, however, generalized linear model differs from general linear model, and the interaction was composed of multiplicative interaction and additive interaction. The former is only statistical significant, but the latter has biological significance. In this paper, macros was written by using SAS 9.4 and the contrast ratio, attributable proportion due to interaction and synergy index were calculated while calculating the items of logistic and Cox regression interactions, and the confidence intervals of Wald, delta and profile likelihood were used to evaluate additive interaction for the reference in big data analysis in clinical epidemiology and in analysis of genetic multiplicative and additive interactions.
Marrero-Ponce, Yovani; Medina-Marrero, Ricardo; Castillo-Garit, Juan A; Romero-Zaldivar, Vicente; Torrens, Francisco; Castro, Eduardo A
2005-04-15
A novel approach to bio-macromolecular design from a linear algebra point of view is introduced. A protein's total (whole protein) and local (one or more amino acid) linear indices are a new set of bio-macromolecular descriptors of relevance to protein QSAR/QSPR studies. These amino-acid level biochemical descriptors are based on the calculation of linear maps on Rn[f k(xmi):Rn-->Rn] in canonical basis. These bio-macromolecular indices are calculated from the kth power of the macromolecular pseudograph alpha-carbon atom adjacency matrix. Total linear indices are linear functional on Rn. That is, the kth total linear indices are linear maps from Rn to the scalar R[f k(xm):Rn-->R]. Thus, the kth total linear indices are calculated by summing the amino-acid linear indices of all amino acids in the protein molecule. A study of the protein stability effects for a complete set of alanine substitutions in the Arc repressor illustrates this approach. A quantitative model that discriminates near wild-type stability alanine mutants from the reduced-stability ones in a training series was obtained. This model permitted the correct classification of 97.56% (40/41) and 91.67% (11/12) of proteins in the training and test set, respectively. It shows a high Matthews correlation coefficient (MCC=0.952) for the training set and an MCC=0.837 for the external prediction set. Additionally, canonical regression analysis corroborated the statistical quality of the classification model (Rcanc=0.824). This analysis was also used to compute biological stability canonical scores for each Arc alanine mutant. On the other hand, the linear piecewise regression model compared favorably with respect to the linear regression one on predicting the melting temperature (tm) of the Arc alanine mutants. The linear model explains almost 81% of the variance of the experimental tm (R=0.90 and s=4.29) and the LOO press statistics evidenced its predictive ability (q2=0.72 and scv=4.79). Moreover, the TOMOCOMD-CAMPS method produced a linear piecewise regression (R=0.97) between protein backbone descriptors and tm values for alanine mutants of the Arc repressor. A break-point value of 51.87 degrees C characterized two mutant clusters and coincided perfectly with the experimental scale. For this reason, we can use the linear discriminant analysis and piecewise models in combination to classify and predict the stability of the mutant Arc homodimers. These models also permitted the interpretation of the driving forces of such folding process, indicating that topologic/topographic protein backbone interactions control the stability profile of wild-type Arc and its alanine mutants.
Unequal views of inequality: Cross-national support for redistribution 1985-2011.
VanHeuvelen, Tom
2017-05-01
This research examines public views on government responsibility to reduce income inequality, support for redistribution. While individual-level correlates of support for redistribution are relatively well understood, many questions remain at the country-level. Therefore, I examine how country-level characteristics affect aggregate support for redistribution. I test explanations of aggregate support using a unique dataset combining 18 waves of the International Social Survey Programme and European Social Survey. Results from mixed-effects logistic regression and fixed-effects linear regression models show two primary and contrasting effects. States that reduce inequality through bundles of tax and transfer policies are rewarded with more supportive publics. In contrast, economic development has a seemingly equivalent and dampening effect on public support. Importantly, the effect of economic development grows at higher levels of development, potentially overwhelming the amplifying effect of state redistribution. My results therefore suggest a fundamental challenge to proponents of egalitarian politics. Copyright © 2016 Elsevier Inc. All rights reserved.
Bayesian generalized linear mixed modeling of Tuberculosis using informative priors.
Ojo, Oluwatobi Blessing; Lougue, Siaka; Woldegerima, Woldegebriel Assefa
2017-01-01
TB is rated as one of the world's deadliest diseases and South Africa ranks 9th out of the 22 countries with hardest hit of TB. Although many pieces of research have been carried out on this subject, this paper steps further by inculcating past knowledge into the model, using Bayesian approach with informative prior. Bayesian statistics approach is getting popular in data analyses. But, most applications of Bayesian inference technique are limited to situations of non-informative prior, where there is no solid external information about the distribution of the parameter of interest. The main aim of this study is to profile people living with TB in South Africa. In this paper, identical regression models are fitted for classical and Bayesian approach both with non-informative and informative prior, using South Africa General Household Survey (GHS) data for the year 2014. For the Bayesian model with informative prior, South Africa General Household Survey dataset for the year 2011 to 2013 are used to set up priors for the model 2014.
Some comparisons of complexity in dictionary-based and linear computational models.
Gnecco, Giorgio; Kůrková, Věra; Sanguineti, Marcello
2011-03-01
Neural networks provide a more flexible approximation of functions than traditional linear regression. In the latter, one can only adjust the coefficients in linear combinations of fixed sets of functions, such as orthogonal polynomials or Hermite functions, while for neural networks, one may also adjust the parameters of the functions which are being combined. However, some useful properties of linear approximators (such as uniqueness, homogeneity, and continuity of best approximation operators) are not satisfied by neural networks. Moreover, optimization of parameters in neural networks becomes more difficult than in linear regression. Experimental results suggest that these drawbacks of neural networks are offset by substantially lower model complexity, allowing accuracy of approximation even in high-dimensional cases. We give some theoretical results comparing requirements on model complexity for two types of approximators, the traditional linear ones and so called variable-basis types, which include neural networks, radial, and kernel models. We compare upper bounds on worst-case errors in variable-basis approximation with lower bounds on such errors for any linear approximator. Using methods from nonlinear approximation and integral representations tailored to computational units, we describe some cases where neural networks outperform any linear approximator. Copyright © 2010 Elsevier Ltd. All rights reserved.
Applications of MIDAS regression in analysing trends in water quality
NASA Astrophysics Data System (ADS)
Penev, Spiridon; Leonte, Daniela; Lazarov, Zdravetz; Mann, Rob A.
2014-04-01
We discuss novel statistical methods in analysing trends in water quality. Such analysis uses complex data sets of different classes of variables, including water quality, hydrological and meteorological. We analyse the effect of rainfall and flow on trends in water quality utilising a flexible model called Mixed Data Sampling (MIDAS). This model arises because of the mixed frequency in the data collection. Typically, water quality variables are sampled fortnightly, whereas the rain data is sampled daily. The advantage of using MIDAS regression is in the flexible and parsimonious modelling of the influence of the rain and flow on trends in water quality variables. We discuss the model and its implementation on a data set from the Shoalhaven Supply System and Catchments in the state of New South Wales, Australia. Information criteria indicate that MIDAS modelling improves upon simplistic approaches that do not utilise the mixed data sampling nature of the data.
NASA Astrophysics Data System (ADS)
Manolakis, Dimitris G.
2004-10-01
The linear mixing model is widely used in hyperspectral imaging applications to model the reflectance spectra of mixed pixels in the SWIR atmospheric window or the radiance spectra of plume gases in the LWIR atmospheric window. In both cases it is important to detect the presence of materials or gases and then estimate their amount, if they are present. The detection and estimation algorithms available for these tasks are related but they are not identical. The objective of this paper is to theoretically investigate how the heavy tails observed in hyperspectral background data affect the quality of abundance estimates and how the F-test, used for endmember selection, is robust to the presence of heavy tails when the model fits the data.
TG study of the Li0.4Fe2.4Zn0.2O4 ferrite synthesis
NASA Astrophysics Data System (ADS)
Lysenko, E. N.; Nikolaev, E. V.; Surzhikov, A. P.
2016-02-01
In this paper, the kinetic analysis of Li-Zn ferrite synthesis was studied using thermogravimetry (TG) method through the simultaneous application of non-linear regression to several measurements run at different heating rates (multivariate non-linear regression). Using TG-curves obtained for the four heating rates and Netzsch Thermokinetics software package, the kinetic models with minimal adjustable parameters were selected to quantitatively describe the reaction of Li-Zn ferrite synthesis. It was shown that the experimental TG-curves clearly suggest a two-step process for the ferrite synthesis and therefore a model-fitting kinetic analysis based on multivariate non-linear regressions was conducted. The complex reaction was described by a two-step reaction scheme consisting of sequential reaction steps. It is established that the best results were obtained using the Yander three-dimensional diffusion model at the first stage and Ginstling-Bronstein model at the second step. The kinetic parameters for lithium-zinc ferrite synthesis reaction were found and discussed.
Modeling thermal sensation in a Mediterranean climate—a comparison of linear and ordinal models
NASA Astrophysics Data System (ADS)
Pantavou, Katerina; Lykoudis, Spyridon
2014-08-01
A simple thermo-physiological model of outdoor thermal sensation adjusted with psychological factors is developed aiming to predict thermal sensation in Mediterranean climates. Microclimatic measurements simultaneously with interviews on personal and psychological conditions were carried out in a square, a street canyon and a coastal location of the greater urban area of Athens, Greece. Multiple linear and ordinal regression were applied in order to estimate thermal sensation making allowance for all the recorded parameters or specific, empirically selected, subsets producing so-called extensive and empirical models, respectively. Meteorological, thermo-physiological and overall models - considering psychological factors as well - were developed. Predictions were improved when personal and psychological factors were taken into account as compared to meteorological models. The model based on ordinal regression reproduced extreme values of thermal sensation vote more adequately than the linear regression one, while the empirical model produced satisfactory results in relation to the extensive model. The effects of adaptation and expectation on thermal sensation vote were introduced in the models by means of the exposure time, season and preference related to air temperature and irradiation. The assessment of thermal sensation could be a useful criterion in decision making regarding public health, outdoor spaces planning and tourism.
NASA Astrophysics Data System (ADS)
Wang, Jin; Sun, Tao; Fu, Anmin; Xu, Hao; Wang, Xinjie
2018-05-01
Degradation in drylands is a critically important global issue that threatens ecosystem and environmental in many ways. Researchers have tried to use remote sensing data and meteorological data to perform residual trend analysis and identify human-induced vegetation changes. However, complex interactions between vegetation and climate, soil units and topography have not yet been considered. Data used in the study included annual accumulated Moderate Resolution Imaging Spectroradiometer (MODIS) 250 m normalized difference vegetation index (NDVI) from 2002 to 2013, accumulated rainfall from September to August, digital elevation model (DEM) and soil units. This paper presents linear mixed-effect (LME) modeling methods for the NDVI-rainfall relationship. We developed linear mixed-effects models that considered the random effects of sample points nested in soil units for nested two-level modeling and single-level modeling of soil units and sample points, respectively. Additionally, three functions, including the exponential function (exp), the power function (power), and the constant plus power function (CPP), were tested to remove heterogeneity, and an additional three correlation structures, including the first-order autoregressive structure [AR(1)], a combination of first-order autoregressive and moving average structures [ARMA(1,1)] and the compound symmetry structure (CS), were used to address the spatiotemporal correlations. It was concluded that the nested two-level model considering both heteroscedasticity with (CPP) and spatiotemporal correlation with [ARMA(1,1)] showed the best performance (AMR = 0.1881, RMSE = 0.2576, adj- R 2 = 0.9593). Variations between soil units and sample points that may have an effect on the NDVI-rainfall relationship should be included in model structures, and linear mixed-effects modeling achieves this in an effective and accurate way.
Distributed Monitoring of the R(sup 2) Statistic for Linear Regression
NASA Technical Reports Server (NTRS)
Bhaduri, Kanishka; Das, Kamalika; Giannella, Chris R.
2011-01-01
The problem of monitoring a multivariate linear regression model is relevant in studying the evolving relationship between a set of input variables (features) and one or more dependent target variables. This problem becomes challenging for large scale data in a distributed computing environment when only a subset of instances is available at individual nodes and the local data changes frequently. Data centralization and periodic model recomputation can add high overhead to tasks like anomaly detection in such dynamic settings. Therefore, the goal is to develop techniques for monitoring and updating the model over the union of all nodes data in a communication-efficient fashion. Correctness guarantees on such techniques are also often highly desirable, especially in safety-critical application scenarios. In this paper we develop DReMo a distributed algorithm with very low resource overhead, for monitoring the quality of a regression model in terms of its coefficient of determination (R2 statistic). When the nodes collectively determine that R2 has dropped below a fixed threshold, the linear regression model is recomputed via a network-wide convergecast and the updated model is broadcast back to all nodes. We show empirically, using both synthetic and real data, that our proposed method is highly communication-efficient and scalable, and also provide theoretical guarantees on correctness.
NASA Astrophysics Data System (ADS)
Tan, C. H.; Matjafri, M. Z.; Lim, H. S.
2015-10-01
This paper presents the prediction models which analyze and compute the CO2 emission in Malaysia. Each prediction model for CO2 emission will be analyzed based on three main groups which is transportation, electricity and heat production as well as residential buildings and commercial and public services. The prediction models were generated using data obtained from World Bank Open Data. Best subset method will be used to remove irrelevant data and followed by multi linear regression to produce the prediction models. From the results, high R-square (prediction) value was obtained and this implies that the models are reliable to predict the CO2 emission by using specific data. In addition, the CO2 emissions from these three groups are forecasted using trend analysis plots for observation purpose.
VoxelStats: A MATLAB Package for Multi-Modal Voxel-Wise Brain Image Analysis.
Mathotaarachchi, Sulantha; Wang, Seqian; Shin, Monica; Pascoal, Tharick A; Benedet, Andrea L; Kang, Min Su; Beaudry, Thomas; Fonov, Vladimir S; Gauthier, Serge; Labbe, Aurélie; Rosa-Neto, Pedro
2016-01-01
In healthy individuals, behavioral outcomes are highly associated with the variability on brain regional structure or neurochemical phenotypes. Similarly, in the context of neurodegenerative conditions, neuroimaging reveals that cognitive decline is linked to the magnitude of atrophy, neurochemical declines, or concentrations of abnormal protein aggregates across brain regions. However, modeling the effects of multiple regional abnormalities as determinants of cognitive decline at the voxel level remains largely unexplored by multimodal imaging research, given the high computational cost of estimating regression models for every single voxel from various imaging modalities. VoxelStats is a voxel-wise computational framework to overcome these computational limitations and to perform statistical operations on multiple scalar variables and imaging modalities at the voxel level. VoxelStats package has been developed in Matlab(®) and supports imaging formats such as Nifti-1, ANALYZE, and MINC v2. Prebuilt functions in VoxelStats enable the user to perform voxel-wise general and generalized linear models and mixed effect models with multiple volumetric covariates. Importantly, VoxelStats can recognize scalar values or image volumes as response variables and can accommodate volumetric statistical covariates as well as their interaction effects with other variables. Furthermore, this package includes built-in functionality to perform voxel-wise receiver operating characteristic analysis and paired and unpaired group contrast analysis. Validation of VoxelStats was conducted by comparing the linear regression functionality with existing toolboxes such as glim_image and RMINC. The validation results were identical to existing methods and the additional functionality was demonstrated by generating feature case assessments (t-statistics, odds ratio, and true positive rate maps). In summary, VoxelStats expands the current methods for multimodal imaging analysis by allowing the estimation of advanced regional association metrics at the voxel level.
2013-01-01
Background This study aims to improve accuracy of Bioelectrical Impedance Analysis (BIA) prediction equations for estimating fat free mass (FFM) of the elderly by using non-linear Back Propagation Artificial Neural Network (BP-ANN) model and to compare the predictive accuracy with the linear regression model by using energy dual X-ray absorptiometry (DXA) as reference method. Methods A total of 88 Taiwanese elderly adults were recruited in this study as subjects. Linear regression equations and BP-ANN prediction equation were developed using impedances and other anthropometrics for predicting the reference FFM measured by DXA (FFMDXA) in 36 male and 26 female Taiwanese elderly adults. The FFM estimated by BIA prediction equations using traditional linear regression model (FFMLR) and BP-ANN model (FFMANN) were compared to the FFMDXA. The measuring results of an additional 26 elderly adults were used to validate than accuracy of the predictive models. Results The results showed the significant predictors were impedance, gender, age, height and weight in developed FFMLR linear model (LR) for predicting FFM (coefficient of determination, r2 = 0.940; standard error of estimate (SEE) = 2.729 kg; root mean square error (RMSE) = 2.571kg, P < 0.001). The above predictors were set as the variables of the input layer by using five neurons in the BP-ANN model (r2 = 0.987 with a SD = 1.192 kg and relatively lower RMSE = 1.183 kg), which had greater (improved) accuracy for estimating FFM when compared with linear model. The results showed a better agreement existed between FFMANN and FFMDXA than that between FFMLR and FFMDXA. Conclusion When compared the performance of developed prediction equations for estimating reference FFMDXA, the linear model has lower r2 with a larger SD in predictive results than that of BP-ANN model, which indicated ANN model is more suitable for estimating FFM. PMID:23388042
Kilian, Reinhold; Matschinger, Herbert; Löeffler, Walter; Roick, Christiane; Angermeyer, Matthias C
2002-03-01
Transformation of the dependent cost variable is often used to solve the problems of heteroscedasticity and skewness in linear ordinary least square regression of health service cost data. However, transformation may cause difficulties in the interpretation of regression coefficients and the retransformation of predicted values. The study compares the advantages and disadvantages of different methods to estimate regression based cost functions using data on the annual costs of schizophrenia treatment. Annual costs of psychiatric service use and clinical and socio-demographic characteristics of the patients were assessed for a sample of 254 patients with a diagnosis of schizophrenia (ICD-10 F 20.0) living in Leipzig. The clinical characteristics of the participants were assessed by means of the BPRS 4.0, the GAF, and the CAN for service needs. Quality of life was measured by WHOQOL-BREF. A linear OLS regression model with non-parametric standard errors, a log-transformed OLS model and a generalized linear model with a log-link and a gamma distribution were used to estimate service costs. For the estimation of robust non-parametric standard errors, the variance estimator by White and a bootstrap estimator based on 2000 replications were employed. Models were evaluated by the comparison of the R2 and the root mean squared error (RMSE). RMSE of the log-transformed OLS model was computed with three different methods of bias-correction. The 95% confidence intervals for the differences between the RMSE were computed by means of bootstrapping. A split-sample-cross-validation procedure was used to forecast the costs for the one half of the sample on the basis of a regression equation computed for the other half of the sample. All three methods showed significant positive influences of psychiatric symptoms and met psychiatric service needs on service costs. Only the log- transformed OLS model showed a significant negative impact of age, and only the GLM shows a significant negative influences of employment status and partnership on costs. All three models provided a R2 of about.31. The Residuals of the linear OLS model revealed significant deviances from normality and homoscedasticity. The residuals of the log-transformed model are normally distributed but still heteroscedastic. The linear OLS model provided the lowest prediction error and the best forecast of the dependent cost variable. The log-transformed model provided the lowest RMSE if the heteroscedastic bias correction was used. The RMSE of the GLM with a log link and a gamma distribution was higher than those of the linear OLS model and the log-transformed OLS model. The difference between the RMSE of the linear OLS model and that of the log-transformed OLS model without bias correction was significant at the 95% level. As result of the cross-validation procedure, the linear OLS model provided the lowest RMSE followed by the log-transformed OLS model with a heteroscedastic bias correction. The GLM showed the weakest model fit again. None of the differences between the RMSE resulting form the cross- validation procedure were found to be significant. The comparison of the fit indices of the different regression models revealed that the linear OLS model provided a better fit than the log-transformed model and the GLM, but the differences between the models RMSE were not significant. Due to the small number of cases in the study the lack of significance does not sufficiently proof that the differences between the RSME for the different models are zero and the superiority of the linear OLS model can not be generalized. The lack of significant differences among the alternative estimators may reflect a lack of sample size adequate to detect important differences among the estimators employed. Further studies with larger case number are necessary to confirm the results. Specification of an adequate regression models requires a careful examination of the characteristics of the data. Estimation of standard errors and confidence intervals by nonparametric methods which are robust against deviations from the normal distribution and the homoscedasticity of residuals are suitable alternatives to the transformation of the skew distributed dependent variable. Further studies with more adequate case numbers are needed to confirm the results.
Vaeth, Michael; Skovlund, Eva
2004-06-15
For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination. Copyright 2004 John Wiley & Sons, Ltd.
Yokoo, Takeshi; Serai, Suraj D; Pirasteh, Ali; Bashir, Mustafa R; Hamilton, Gavin; Hernando, Diego; Hu, Houchun H; Hetterich, Holger; Kühn, Jens-Peter; Kukuk, Guido M; Loomba, Rohit; Middleton, Michael S; Obuchowski, Nancy A; Song, Ji Soo; Tang, An; Wu, Xinhuai; Reeder, Scott B; Sirlin, Claude B
2018-02-01
Purpose To determine the linearity, bias, and precision of hepatic proton density fat fraction (PDFF) measurements by using magnetic resonance (MR) imaging across different field strengths, imager manufacturers, and reconstruction methods. Materials and Methods This meta-analysis was performed in accordance with Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. A systematic literature search identified studies that evaluated the linearity and/or bias of hepatic PDFF measurements by using MR imaging (hereafter, MR imaging-PDFF) against PDFF measurements by using colocalized MR spectroscopy (hereafter, MR spectroscopy-PDFF) or the precision of MR imaging-PDFF. The quality of each study was evaluated by using the Quality Assessment of Studies of Diagnostic Accuracy 2 tool. De-identified original data sets from the selected studies were pooled. Linearity was evaluated by using linear regression between MR imaging-PDFF and MR spectroscopy-PDFF measurements. Bias, defined as the mean difference between MR imaging-PDFF and MR spectroscopy-PDFF measurements, was evaluated by using Bland-Altman analysis. Precision, defined as the agreement between repeated MR imaging-PDFF measurements, was evaluated by using a linear mixed-effects model, with field strength, imager manufacturer, reconstruction method, and region of interest as random effects. Results Twenty-three studies (1679 participants) were selected for linearity and bias analyses and 11 studies (425 participants) were selected for precision analyses. MR imaging-PDFF was linear with MR spectroscopy-PDFF (R 2 = 0.96). Regression slope (0.97; P < .001) and mean Bland-Altman bias (-0.13%; 95% limits of agreement: -3.95%, 3.40%) indicated minimal underestimation by using MR imaging-PDFF. MR imaging-PDFF was precise at the region-of-interest level, with repeatability and reproducibility coefficients of 2.99% and 4.12%, respectively. Field strength, imager manufacturer, and reconstruction method each had minimal effects on reproducibility. Conclusion MR imaging-PDFF has excellent linearity, bias, and precision across different field strengths, imager manufacturers, and reconstruction methods. © RSNA, 2017 Online supplemental material is available for this article. An earlier incorrect version of this article appeared online. This article was corrected on October 2, 2017.
Selection of higher order regression models in the analysis of multi-factorial transcription data.
Prazeres da Costa, Olivia; Hoffman, Arthur; Rey, Johannes W; Mansmann, Ulrich; Buch, Thorsten; Tresch, Achim
2014-01-01
Many studies examine gene expression data that has been obtained under the influence of multiple factors, such as genetic background, environmental conditions, or exposure to diseases. The interplay of multiple factors may lead to effect modification and confounding. Higher order linear regression models can account for these effects. We present a new methodology for linear model selection and apply it to microarray data of bone marrow-derived macrophages. This experiment investigates the influence of three variable factors: the genetic background of the mice from which the macrophages were obtained, Yersinia enterocolitica infection (two strains, and a mock control), and treatment/non-treatment with interferon-γ. We set up four different linear regression models in a hierarchical order. We introduce the eruption plot as a new practical tool for model selection complementary to global testing. It visually compares the size and significance of effect estimates between two nested models. Using this methodology we were able to select the most appropriate model by keeping only relevant factors showing additional explanatory power. Application to experimental data allowed us to qualify the interaction of factors as either neutral (no interaction), alleviating (co-occurring effects are weaker than expected from the single effects), or aggravating (stronger than expected). We find a biologically meaningful gene cluster of putative C2TA target genes that appear to be co-regulated with MHC class II genes. We introduced the eruption plot as a tool for visual model comparison to identify relevant higher order interactions in the analysis of expression data obtained under the influence of multiple factors. We conclude that model selection in higher order linear regression models should generally be performed for the analysis of multi-factorial microarray data.
NASA Astrophysics Data System (ADS)
Wellen, Christopher; Arhonditsis, George B.; Labencki, Tanya; Boyd, Duncan
2012-10-01
Regression-type, hybrid empirical/process-based models (e.g., SPARROW, PolFlow) have assumed a prominent role in efforts to estimate the sources and transport of nutrient pollution at river basin scales. However, almost no attempts have been made to explicitly accommodate interannual nutrient loading variability in their structure, despite empirical and theoretical evidence indicating that the associated source/sink processes are quite variable at annual timescales. In this study, we present two methodological approaches to accommodate interannual variability with the Spatially Referenced Regressions on Watershed attributes (SPARROW) nonlinear regression model. The first strategy uses the SPARROW model to estimate a static baseline load and climatic variables (e.g., precipitation) to drive the interannual variability. The second approach allows the source/sink processes within the SPARROW model to vary at annual timescales using dynamic parameter estimation techniques akin to those used in dynamic linear models. Model parameterization is founded upon Bayesian inference techniques that explicitly consider calibration data and model uncertainty. Our case study is the Hamilton Harbor watershed, a mixed agricultural and urban residential area located at the western end of Lake Ontario, Canada. Our analysis suggests that dynamic parameter estimation is the more parsimonious of the two strategies tested and can offer insights into the temporal structural changes associated with watershed functioning. Consistent with empirical and theoretical work, model estimated annual in-stream attenuation rates varied inversely with annual discharge. Estimated phosphorus source areas were concentrated near the receiving water body during years of high in-stream attenuation and dispersed along the main stems of the streams during years of low attenuation, suggesting that nutrient source areas are subject to interannual variability.
Su, Liyun; Zhao, Yanyong; Yan, Tianshun; Li, Fenglan
2012-01-01
Multivariate local polynomial fitting is applied to the multivariate linear heteroscedastic regression model. Firstly, the local polynomial fitting is applied to estimate heteroscedastic function, then the coefficients of regression model are obtained by using generalized least squares method. One noteworthy feature of our approach is that we avoid the testing for heteroscedasticity by improving the traditional two-stage method. Due to non-parametric technique of local polynomial estimation, it is unnecessary to know the form of heteroscedastic function. Therefore, we can improve the estimation precision, when the heteroscedastic function is unknown. Furthermore, we verify that the regression coefficients is asymptotic normal based on numerical simulations and normal Q-Q plots of residuals. Finally, the simulation results and the local polynomial estimation of real data indicate that our approach is surely effective in finite-sample situations.
Martin, Guillaume; Magne, Marie-Angélina; Cristobal, Magali San
2017-01-01
The need to adapt to decrease farm vulnerability to adverse contextual events has been extensively discussed on a theoretical basis. We developed an integrated and operational method to assess farm vulnerability to multiple and interacting contextual changes and explain how this vulnerability can best be reduced according to farm configurations and farmers' technical adaptations over time. Our method considers farm vulnerability as a function of the raw measurements of vulnerability variables (e.g., economic efficiency of production), the slope of the linear regression of these measurements over time, and the residuals of this linear regression. The last two are extracted from linear mixed models considering a random regression coefficient (an intercept common to all farms), a global trend (a slope common to all farms), a random deviation from the general mean for each farm, and a random deviation from the general trend for each farm. Among all possible combinations, the lowest farm vulnerability is obtained through a combination of high values of measurements, a stable or increasing trend and low variability for all vulnerability variables considered. Our method enables relating the measurements, trends and residuals of vulnerability variables to explanatory variables that illustrate farm exposure to climatic and economic variability, initial farm configurations and farmers' technical adaptations over time. We applied our method to 19 cattle (beef, dairy, and mixed) farms over the period 2008-2013. Selected vulnerability variables, i.e., farm productivity and economic efficiency, varied greatly among cattle farms and across years, with means ranging from 43.0 to 270.0 kg protein/ha and 29.4-66.0% efficiency, respectively. No farm had a high level, stable or increasing trend and low residuals for both farm productivity and economic efficiency of production. Thus, the least vulnerable farms represented a compromise among measurement value, trend, and variability of both performances. No specific combination of farmers' practices emerged for reducing cattle farm vulnerability to climatic and economic variability. In the least vulnerable farms, the practices implemented (stocking rate, input use…) were more consistent with the objective of developing the properties targeted (efficiency, robustness…). Our method can be used to support farmers with sector-specific and local insights about most promising farm adaptations.
Martin, Guillaume; Magne, Marie-Angélina; Cristobal, Magali San
2017-01-01
The need to adapt to decrease farm vulnerability to adverse contextual events has been extensively discussed on a theoretical basis. We developed an integrated and operational method to assess farm vulnerability to multiple and interacting contextual changes and explain how this vulnerability can best be reduced according to farm configurations and farmers’ technical adaptations over time. Our method considers farm vulnerability as a function of the raw measurements of vulnerability variables (e.g., economic efficiency of production), the slope of the linear regression of these measurements over time, and the residuals of this linear regression. The last two are extracted from linear mixed models considering a random regression coefficient (an intercept common to all farms), a global trend (a slope common to all farms), a random deviation from the general mean for each farm, and a random deviation from the general trend for each farm. Among all possible combinations, the lowest farm vulnerability is obtained through a combination of high values of measurements, a stable or increasing trend and low variability for all vulnerability variables considered. Our method enables relating the measurements, trends and residuals of vulnerability variables to explanatory variables that illustrate farm exposure to climatic and economic variability, initial farm configurations and farmers’ technical adaptations over time. We applied our method to 19 cattle (beef, dairy, and mixed) farms over the period 2008–2013. Selected vulnerability variables, i.e., farm productivity and economic efficiency, varied greatly among cattle farms and across years, with means ranging from 43.0 to 270.0 kg protein/ha and 29.4–66.0% efficiency, respectively. No farm had a high level, stable or increasing trend and low residuals for both farm productivity and economic efficiency of production. Thus, the least vulnerable farms represented a compromise among measurement value, trend, and variability of both performances. No specific combination of farmers’ practices emerged for reducing cattle farm vulnerability to climatic and economic variability. In the least vulnerable farms, the practices implemented (stocking rate, input use…) were more consistent with the objective of developing the properties targeted (efficiency, robustness…). Our method can be used to support farmers with sector-specific and local insights about most promising farm adaptations. PMID:28900435
Magezi, David A
2015-01-01
Linear mixed-effects models (LMMs) are increasingly being used for data analysis in cognitive neuroscience and experimental psychology, where within-participant designs are common. The current article provides an introductory review of the use of LMMs for within-participant data analysis and describes a free, simple, graphical user interface (LMMgui). LMMgui uses the package lme4 (Bates et al., 2014a,b) in the statistical environment R (R Core Team).
Huang, Jian; Zhang, Cun-Hui
2013-01-01
The ℓ1-penalized method, or the Lasso, has emerged as an important tool for the analysis of large data sets. Many important results have been obtained for the Lasso in linear regression which have led to a deeper understanding of high-dimensional statistical problems. In this article, we consider a class of weighted ℓ1-penalized estimators for convex loss functions of a general form, including the generalized linear models. We study the estimation, prediction, selection and sparsity properties of the weighted ℓ1-penalized estimator in sparse, high-dimensional settings where the number of predictors p can be much larger than the sample size n. Adaptive Lasso is considered as a special case. A multistage method is developed to approximate concave regularized estimation by applying an adaptive Lasso recursively. We provide prediction and estimation oracle inequalities for single- and multi-stage estimators, a general selection consistency theorem, and an upper bound for the dimension of the Lasso estimator. Important models including the linear regression, logistic regression and log-linear models are used throughout to illustrate the applications of the general results. PMID:24348100
Imatoh, Takuya; Kamimura, Seiichiro; Miyazaki, Motonobu
2017-03-01
It has been reported that adipocytes secrete vascular endothelial growth factor. Therefore, we conducted a 5-year longitudinal epidemiological study to further elucidate the association between vascular endothelial growth factor levels and temporal changes in body mass index. Our study subjects were Japanese male workers, who had regular health check-ups. Vascular endothelial growth factor levels were measured at baseline. To examine the association between vascular endothelial growth factor levels and overweight, we calculated the odds ratio using a multivariate logistic regression model. Moreover, linear mixed effect models were used to assess the association between vascular endothelial growth factor level and temporal changes in body mass index during the 5-year follow-up period. Vascular endothelial growth factor levels were marginally higher in subjects with a body mass index greater than 25 kg/m 2 compared with in those with a body mass index less than 25 kg/m 2 (505.4 vs. 465.5 pg/mL, P = 0.1) and were weakly correlated with leptin levels (β: 0.05, P = 0.07). In multivariate logistic regression, subjects in the highest vascular endothelial growth factor quantile were significantly associated with an increased risk for overweight compared with those in the lowest quantile (odds ratio 1.65, 95 % confidential interval: 1.10-2.50). Moreover P for trend was significant (P for trend = 0.003). However, the linear mixed effect model revealed that vascular endothelial growth factor levels were not associated with changes in body mass index over a 5-year period (quantile 2, β: 0.06, P = 0.46; quantile 3, β: -0.06, P = 0.45; quantile 4, β: -0.10, P = 0.22; quantile 1 as reference). Our results suggested that high vascular endothelial growth factor levels were significantly associated with overweight in Japanese males but high vascular endothelial growth factor levels did not necessarily cause obesity.
Genomic prediction based on data from three layer lines using non-linear regression models.
Huang, Heyun; Windig, Jack J; Vereijken, Addie; Calus, Mario P L
2014-11-06
Most studies on genomic prediction with reference populations that include multiple lines or breeds have used linear models. Data heterogeneity due to using multiple populations may conflict with model assumptions used in linear regression methods. In an attempt to alleviate potential discrepancies between assumptions of linear models and multi-population data, two types of alternative models were used: (1) a multi-trait genomic best linear unbiased prediction (GBLUP) model that modelled trait by line combinations as separate but correlated traits and (2) non-linear models based on kernel learning. These models were compared to conventional linear models for genomic prediction for two lines of brown layer hens (B1 and B2) and one line of white hens (W1). The three lines each had 1004 to 1023 training and 238 to 240 validation animals. Prediction accuracy was evaluated by estimating the correlation between observed phenotypes and predicted breeding values. When the training dataset included only data from the evaluated line, non-linear models yielded at best a similar accuracy as linear models. In some cases, when adding a distantly related line, the linear models showed a slight decrease in performance, while non-linear models generally showed no change in accuracy. When only information from a closely related line was used for training, linear models and non-linear radial basis function (RBF) kernel models performed similarly. The multi-trait GBLUP model took advantage of the estimated genetic correlations between the lines. Combining linear and non-linear models improved the accuracy of multi-line genomic prediction. Linear models and non-linear RBF models performed very similarly for genomic prediction, despite the expectation that non-linear models could deal better with the heterogeneous multi-population data. This heterogeneity of the data can be overcome by modelling trait by line combinations as separate but correlated traits, which avoids the occasional occurrence of large negative accuracies when the evaluated line was not included in the training dataset. Furthermore, when using a multi-line training dataset, non-linear models provided information on the genotype data that was complementary to the linear models, which indicates that the underlying data distributions of the three studied lines were indeed heterogeneous.
Finite mixture models for the computation of isotope ratios in mixed isotopic samples
NASA Astrophysics Data System (ADS)
Koffler, Daniel; Laaha, Gregor; Leisch, Friedrich; Kappel, Stefanie; Prohaska, Thomas
2013-04-01
Finite mixture models have been used for more than 100 years, but have seen a real boost in popularity over the last two decades due to the tremendous increase in available computing power. The areas of application of mixture models range from biology and medicine to physics, economics and marketing. These models can be applied to data where observations originate from various groups and where group affiliations are not known, as is the case for multiple isotope ratios present in mixed isotopic samples. Recently, the potential of finite mixture models for the computation of 235U/238U isotope ratios from transient signals measured in individual (sub-)µm-sized particles by laser ablation - multi-collector - inductively coupled plasma mass spectrometry (LA-MC-ICPMS) was demonstrated by Kappel et al. [1]. The particles, which were deposited on the same substrate, were certified with respect to their isotopic compositions. Here, we focus on the statistical model and its application to isotope data in ecogeochemistry. Commonly applied evaluation approaches for mixed isotopic samples are time-consuming and are dependent on the judgement of the analyst. Thus, isotopic compositions may be overlooked due to the presence of more dominant constituents. Evaluation using finite mixture models can be accomplished unsupervised and automatically. The models try to fit several linear models (regression lines) to subgroups of data taking the respective slope as estimation for the isotope ratio. The finite mixture models are parameterised by: • The number of different ratios. • Number of points belonging to each ratio-group. • The ratios (i.e. slopes) of each group. Fitting of the parameters is done by maximising the log-likelihood function using an iterative expectation-maximisation (EM) algorithm. In each iteration step, groups of size smaller than a control parameter are dropped; thereby the number of different ratios is determined. The analyst only influences some control parameters of the algorithm, i.e. the maximum count of ratios, the minimum relative group-size of data points belonging to each ratio has to be defined. Computation of the models can be done with statistical software. In this study Leisch and Grün's flexmix package [2] for the statistical open-source software R was applied. A code example is available in the electronic supplementary material of Kappel et al. [1]. In order to demonstrate the usefulness of finite mixture models in fields dealing with the computation of multiple isotope ratios in mixed samples, a transparent example based on simulated data is presented and problems regarding small group-sizes are illustrated. In addition, the application of finite mixture models to isotope ratio data measured in uranium oxide particles is shown. The results indicate that finite mixture models perform well in computing isotope ratios relative to traditional estimation procedures and can be recommended for more objective and straightforward calculation of isotope ratios in geochemistry than it is current practice. [1] S. Kappel, S. Boulyga, L. Dorta, D. Günther, B. Hattendorf, D. Koffler, G. Laaha, F. Leisch and T. Prohaska: Evaluation Strategies for Isotope Ratio Measurements of Single Particles by LA-MC-ICPMS, Analytical and Bioanalytical Chemistry, 2013, accepted for publication on 2012-12-18 (doi: 10.1007/s00216-012-6674-3) [2] B. Grün and F. Leisch: Fitting finite mixtures of generalized linear regressions in R. Computational Statistics & Data Analysis, 51(11), 5247-5252, 2007. (doi:10.1016/j.csda.2006.08.014)
A Demonstration of Regression False Positive Selection in Data Mining
ERIC Educational Resources Information Center
Pinder, Jonathan P.
2014-01-01
Business analytics courses, such as marketing research, data mining, forecasting, and advanced financial modeling, have substantial predictive modeling components. The predictive modeling in these courses requires students to estimate and test many linear regressions. As a result, false positive variable selection ("type I errors") is…
Real longitudinal data analysis for real people: building a good enough mixed model.
Cheng, Jing; Edwards, Lloyd J; Maldonado-Molina, Mildred M; Komro, Kelli A; Muller, Keith E
2010-02-20
Mixed effects models have become very popular, especially for the analysis of longitudinal data. One challenge is how to build a good enough mixed effects model. In this paper, we suggest a systematic strategy for addressing this challenge and introduce easily implemented practical advice to build mixed effects models. A general discussion of the scientific strategies motivates the recommended five-step procedure for model fitting. The need to model both the mean structure (the fixed effects) and the covariance structure (the random effects and residual error) creates the fundamental flexibility and complexity. Some very practical recommendations help to conquer the complexity. Centering, scaling, and full-rank coding of all the predictor variables radically improve the chances of convergence, computing speed, and numerical accuracy. Applying computational and assumption diagnostics from univariate linear models to mixed model data greatly helps to detect and solve the related computational problems. Applying computational and assumption diagnostics from the univariate linear models to the mixed model data can radically improve the chances of convergence, computing speed, and numerical accuracy. The approach helps to fit more general covariance models, a crucial step in selecting a credible covariance model needed for defensible inference. A detailed demonstration of the recommended strategy is based on data from a published study of a randomized trial of a multicomponent intervention to prevent young adolescents' alcohol use. The discussion highlights a need for additional covariance and inference tools for mixed models. The discussion also highlights the need for improving how scientists and statisticians teach and review the process of finding a good enough mixed model. (c) 2009 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
Martínez-Fernández, J.; Chuvieco, E.; Koutsias, N.
2013-02-01
Humans are responsible for most forest fires in Europe, but anthropogenic factors behind these events are still poorly understood. We tried to identify the driving factors of human-caused fire occurrence in Spain by applying two different statistical approaches. Firstly, assuming stationary processes for the whole country, we created models based on multiple linear regression and binary logistic regression to find factors associated with fire density and fire presence, respectively. Secondly, we used geographically weighted regression (GWR) to better understand and explore the local and regional variations of those factors behind human-caused fire occurrence. The number of human-caused fires occurring within a 25-yr period (1983-2007) was computed for each of the 7638 Spanish mainland municipalities, creating a binary variable (fire/no fire) to develop logistic models, and a continuous variable (fire density) to build standard linear regression models. A total of 383 657 fires were registered in the study dataset. The binary logistic model, which estimates the probability of having/not having a fire, successfully classified 76.4% of the total observations, while the ordinary least squares (OLS) regression model explained 53% of the variation of the fire density patterns (adjusted R2 = 0.53). Both approaches confirmed, in addition to forest and climatic variables, the importance of variables related with agrarian activities, land abandonment, rural population exodus and developmental processes as underlying factors of fire occurrence. For the GWR approach, the explanatory power of the GW linear model for fire density using an adaptive bandwidth increased from 53% to 67%, while for the GW logistic model the correctly classified observations improved only slightly, from 76.4% to 78.4%, but significantly according to the corrected Akaike Information Criterion (AICc), from 3451.19 to 3321.19. The results from GWR indicated a significant spatial variation in the local parameter estimates for all the variables and an important reduction of the autocorrelation in the residuals of the GW linear model. Despite the fitting improvement of local models, GW regression, more than an alternative to "global" or traditional regression modelling, seems to be a valuable complement to explore the non-stationary relationships between the response variable and the explanatory variables. The synergy of global and local modelling provides insights into fire management and policy and helps further our understanding of the fire problem over large areas while at the same time recognizing its local character.
ERIC Educational Resources Information Center
Daniels, Amy M.; Mandell, David S.
2013-01-01
This study estimated compliance with American Academy of Pediatrics (AAP) guidelines for well-child care and the association between compliance and age at diagnosis in a national sample of Medicaid-enrolled children with autism (N = 1,475). Mixed effects linear regression was used to assess the relationship between compliance and age at diagnosis.…
Wennberg, Alexandra M V; Hagen, Clinton E; Edwards, Kelly; Roberts, Rosebud O; Machulda, Mary M; Knopman, David S; Petersen, Ronald C; Mielke, Michelle M
2018-06-05
To determine the cross-sectional and longitudinal associations between diabetes treatment type and cognitive outcomes among type II diabetics. We examined the association between metformin use, as compared to other diabetic treatment (ie, insulin, other oral medications, and diet/exercise) and cognitive test performance and mild cognitive impairment (MCI) diagnosis among 508 cognitively unimpaired at baseline type II diabetics enrolled in the Mayo Clinic Study of Aging. We created propensity scores to adjust for treatment effects. We used multivariate linear and logistic regression models to investigate the cross-sectional association between treatment type and cognitive test z scores, respectively. Mixed effects models and competing risk regression models were used to determine the longitudinal association between treatment type and change in cognitive test z scores and risk of developing incident MCI. In linear regression analyses adjusted for age, sex, education, body mass index, APOE ε4, insulin treatment, medical comorbidities, number of medications, duration of diabetes, and propensity score, we did not observe an association between metformin use and cognitive test performance. Additionally, we did not observe an association between metformin use and cognitive test performance over time (median = 3.7-year follow-up). Metformin was associated with an increased risk of MCI (subhazard ratio (SHR) = 2.75; 95% CI = 1.64, 4.63, P < .001). Similarly, other oral medications (SHR = 1.96; 95% CI = 1.19, 3.25; P = .009) and insulin (SHR = 3.17; 95% CI = 1.27, 7.92; P = .014) use were also associated with risk of MCI diagnosis. These findings suggest that metformin use, as compared to management of diabetes with other treatments, is not associated with cognitive test performance. However, metformin was associated with incident MCI diagnosis. Copyright © 2018 John Wiley & Sons, Ltd.
Damman, Olga C; Stubbe, Janine H; Hendriks, Michelle; Arah, Onyebuchi A; Spreeuwenberg, Peter; Delnoij, Diana M J; Groenewegen, Peter P
2009-04-01
Ratings on the quality of healthcare from the consumer's perspective need to be adjusted for consumer characteristics to ensure fair and accurate comparisons between healthcare providers or health plans. Although multilevel analysis is already considered an appropriate method for analyzing healthcare performance data, it has rarely been used to assess case-mix adjustment of such data. The purpose of this article is to investigate whether multilevel regression analysis is a useful tool to detect case-mix adjusters in consumer assessment of healthcare. We used data on 11,539 consumers from 27 Dutch health plans, which were collected using the Dutch Consumer Quality Index health plan instrument. We conducted multilevel regression analyses of consumers' responses nested within health plans to assess the effects of consumer characteristics on consumer experience. We compared our findings to the results of another methodology: the impact factor approach, which combines the predictive effect of each case-mix variable with its heterogeneity across health plans. Both multilevel regression and impact factor analyses showed that age and education were the most important case-mix adjusters for consumer experience and ratings of health plans. With the exception of age, case-mix adjustment had little impact on the ranking of health plans. On both theoretical and practical grounds, multilevel modeling is useful for adequate case-mix adjustment and analysis of performance ratings.
Wittkopp, Felix; Peeck, Lars; Hafner, Mathias; Frech, Christian
2018-04-13
Process development and characterization based on mathematic modeling provides several advantages and has been applied more frequently over the last few years. In this work, a Donnan equilibrium ion exchange (DIX) model is applied for modelling and simulation of ion exchange chromatography of a monoclonal antibody in linear chromatography. Four different cation exchange resin prototypes consisting of weak, strong and mixed ligands are characterized using pH and salt gradient elution experiments applying the extended DIX model. The modelling results are compared with the results using a classic stoichiometric displacement model. The Donnan equilibrium model is able to describe all four prototype resins while the stoichiometric displacement model fails for the weak and mixed weak/strong ligands. Finally, in silico chromatogram simulations of pH and pH/salt dual gradients are performed to verify the results and to show the consistency of the developed model. Copyright © 2018 Elsevier B.V. All rights reserved.
Wu, Baolin
2006-02-15
Differential gene expression detection and sample classification using microarray data have received much research interest recently. Owing to the large number of genes p and small number of samples n (p > n), microarray data analysis poses big challenges for statistical analysis. An obvious problem owing to the 'large p small n' is over-fitting. Just by chance, we are likely to find some non-differentially expressed genes that can classify the samples very well. The idea of shrinkage is to regularize the model parameters to reduce the effects of noise and produce reliable inferences. Shrinkage has been successfully applied in the microarray data analysis. The SAM statistics proposed by Tusher et al. and the 'nearest shrunken centroid' proposed by Tibshirani et al. are ad hoc shrinkage methods. Both methods are simple, intuitive and prove to be useful in empirical studies. Recently Wu proposed the penalized t/F-statistics with shrinkage by formally using the (1) penalized linear regression models for two-class microarray data, showing good performance. In this paper we systematically discussed the use of penalized regression models for analyzing microarray data. We generalize the two-class penalized t/F-statistics proposed by Wu to multi-class microarray data. We formally derive the ad hoc shrunken centroid used by Tibshirani et al. using the (1) penalized regression models. And we show that the penalized linear regression models provide a rigorous and unified statistical framework for sample classification and differential gene expression detection.
Accounting for the relationship between per diem cost and LOS when estimating hospitalization costs.
Ishak, K Jack; Stolar, Marilyn; Hu, Ming-yi; Alvarez, Piedad; Wang, Yamei; Getsios, Denis; Williams, Gregory C
2012-12-01
Hospitalization costs in clinical trials are typically derived by multiplying the length of stay (LOS) by an average per-diem (PD) cost from external sources. This assumes that PD costs are independent of LOS. Resource utilization in early days of the stay is usually more intense, however, and thus, the PD cost for a short hospitalization may be higher than for longer stays. The shape of this relationship is unlikely to be linear, as PD costs would be expected to gradually plateau. This paper describes how to model the relationship between PD cost and LOS using flexible statistical modelling techniques. An example based on a clinical study of clevidipine for the treatment of peri-operative hypertension during hospitalizations for cardiac surgery is used to illustrate how inferences about cost-savings associated with good blood pressure (BP) control during the stay can be affected by the approach used to derive hospitalization costs.Data on the cost and LOS of hospitalizations for coronary artery bypass grafting (CABG) from the Massachusetts Acute Hospital Case Mix Database (the MA Case Mix Database) were analyzed to link LOS to PD cost, factoring in complications that may have occurred during the hospitalization or post-discharge. The shape of the relationship between LOS and PD costs in the MA Case Mix was explored graphically in a regression framework. A series of statistical models including those based on simple logarithmic transformation of LOS to more flexible models using LOcally wEighted Scatterplot Smoothing (LOESS) techniques were considered. A final model was selected, using simplicity and parsimony as guiding principles in addition traditional fit statistics (like Akaike's Information Criterion, or AIC). This mapping was applied in ECLIPSE to predict an LOS-specific PD cost, and then a total cost of hospitalization. These were then compared for patients who had good vs. poor peri-operative blood-pressure control. The MA Case Mix dataset included data from over 10,000 patients. Visual inspection of PD vs. LOS revealed a non-linear relationship. A logarithmic model and a series of LOESS and piecewise-linear models with varying connection points were tested. The logarithmic model was ultimately favoured for its fit and simplicity. Using this mapping in the ECLIPSE trials, we found that good peri-operative BP control was associated with a cost savings of $5,366 when costs were derived using the mapping, compared with savings of $7,666 obtained using the traditional approach of calculating the cost. PD costs vary systematically with LOS, with short stays being associated with high PD costs that drop gradually and level off. The shape of the relationship may differ in other settings. It is important to assess this and model the observed pattern, as this may have an impact on conclusions based on derived hospitalization costs.
Accounting for the relationship between per diem cost and LOS when estimating hospitalization costs
2012-01-01
Background Hospitalization costs in clinical trials are typically derived by multiplying the length of stay (LOS) by an average per-diem (PD) cost from external sources. This assumes that PD costs are independent of LOS. Resource utilization in early days of the stay is usually more intense, however, and thus, the PD cost for a short hospitalization may be higher than for longer stays. The shape of this relationship is unlikely to be linear, as PD costs would be expected to gradually plateau. This paper describes how to model the relationship between PD cost and LOS using flexible statistical modelling techniques. Methods An example based on a clinical study of clevidipine for the treatment of peri-operative hypertension during hospitalizations for cardiac surgery is used to illustrate how inferences about cost-savings associated with good blood pressure (BP) control during the stay can be affected by the approach used to derive hospitalization costs. Data on the cost and LOS of hospitalizations for coronary artery bypass grafting (CABG) from the Massachusetts Acute Hospital Case Mix Database (the MA Case Mix Database) were analyzed to link LOS to PD cost, factoring in complications that may have occurred during the hospitalization or post-discharge. The shape of the relationship between LOS and PD costs in the MA Case Mix was explored graphically in a regression framework. A series of statistical models including those based on simple logarithmic transformation of LOS to more flexible models using LOcally wEighted Scatterplot Smoothing (LOESS) techniques were considered. A final model was selected, using simplicity and parsimony as guiding principles in addition traditional fit statistics (like Akaike’s Information Criterion, or AIC). This mapping was applied in ECLIPSE to predict an LOS-specific PD cost, and then a total cost of hospitalization. These were then compared for patients who had good vs. poor peri-operative blood-pressure control. Results The MA Case Mix dataset included data from over 10,000 patients. Visual inspection of PD vs. LOS revealed a non-linear relationship. A logarithmic model and a series of LOESS and piecewise-linear models with varying connection points were tested. The logarithmic model was ultimately favoured for its fit and simplicity. Using this mapping in the ECLIPSE trials, we found that good peri-operative BP control was associated with a cost savings of $5,366 when costs were derived using the mapping, compared with savings of $7,666 obtained using the traditional approach of calculating the cost. Conclusions PD costs vary systematically with LOS, with short stays being associated with high PD costs that drop gradually and level off. The shape of the relationship may differ in other settings. It is important to assess this and model the observed pattern, as this may have an impact on conclusions based on derived hospitalization costs. PMID:23198908
Analyzing longitudinal data with the linear mixed models procedure in SPSS.
West, Brady T
2009-09-01
Many applied researchers analyzing longitudinal data share a common misconception: that specialized statistical software is necessary to fit hierarchical linear models (also known as linear mixed models [LMMs], or multilevel models) to longitudinal data sets. Although several specialized statistical software programs of high quality are available that allow researchers to fit these models to longitudinal data sets (e.g., HLM), rapid advances in general purpose statistical software packages have recently enabled analysts to fit these same models when using preferred packages that also enable other more common analyses. One of these general purpose statistical packages is SPSS, which includes a very flexible and powerful procedure for fitting LMMs to longitudinal data sets with continuous outcomes. This article aims to present readers with a practical discussion of how to analyze longitudinal data using the LMMs procedure in the SPSS statistical software package.
Non-Asymptotic Oracle Inequalities for the High-Dimensional Cox Regression via Lasso.
Kong, Shengchun; Nan, Bin
2014-01-01
We consider finite sample properties of the regularized high-dimensional Cox regression via lasso. Existing literature focuses on linear models or generalized linear models with Lipschitz loss functions, where the empirical risk functions are the summations of independent and identically distributed (iid) losses. The summands in the negative log partial likelihood function for censored survival data, however, are neither iid nor Lipschitz.We first approximate the negative log partial likelihood function by a sum of iid non-Lipschitz terms, then derive the non-asymptotic oracle inequalities for the lasso penalized Cox regression using pointwise arguments to tackle the difficulties caused by lacking iid Lipschitz losses.
Non-Asymptotic Oracle Inequalities for the High-Dimensional Cox Regression via Lasso
Kong, Shengchun; Nan, Bin
2013-01-01
We consider finite sample properties of the regularized high-dimensional Cox regression via lasso. Existing literature focuses on linear models or generalized linear models with Lipschitz loss functions, where the empirical risk functions are the summations of independent and identically distributed (iid) losses. The summands in the negative log partial likelihood function for censored survival data, however, are neither iid nor Lipschitz.We first approximate the negative log partial likelihood function by a sum of iid non-Lipschitz terms, then derive the non-asymptotic oracle inequalities for the lasso penalized Cox regression using pointwise arguments to tackle the difficulties caused by lacking iid Lipschitz losses. PMID:24516328
Introduction to the use of regression models in epidemiology.
Bender, Ralf
2009-01-01
Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs so that they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed in dependence on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrating examples from cancer research.
Turbulence closure for mixing length theories
NASA Astrophysics Data System (ADS)
Jermyn, Adam S.; Lesaffre, Pierre; Tout, Christopher A.; Chitre, Shashikumar M.
2018-05-01
We present an approach to turbulence closure based on mixing length theory with three-dimensional fluctuations against a two-dimensional background. This model is intended to be rapidly computable for implementation in stellar evolution software and to capture a wide range of relevant phenomena with just a single free parameter, namely the mixing length. We incorporate magnetic, rotational, baroclinic, and buoyancy effects exactly within the formalism of linear growth theories with non-linear decay. We treat differential rotation effects perturbatively in the corotating frame using a novel controlled approximation, which matches the time evolution of the reference frame to arbitrary order. We then implement this model in an efficient open source code and discuss the resulting turbulent stresses and transport coefficients. We demonstrate that this model exhibits convective, baroclinic, and shear instabilities as well as the magnetorotational instability. It also exhibits non-linear saturation behaviour, and we use this to extract the asymptotic scaling of various transport coefficients in physically interesting limits.
A model for prediction of color change after tooth bleaching based on CIELAB color space
NASA Astrophysics Data System (ADS)
Herrera, Luis J.; Santana, Janiley; Yebra, Ana; Rivas, María. José; Pulgar, Rosa; Pérez, María. M.
2017-08-01
An experimental study aiming to develop a model based on CIELAB color space for prediction of color change after a tooth bleaching procedure is presented. Multivariate linear regression models were obtained to predict the L*, a*, b* and W* post-bleaching values using the pre-bleaching L*, a*and b*values. Moreover, univariate linear regression models were obtained to predict the variation in chroma (C*), hue angle (h°) and W*. The results demonstrated that is possible to estimate color change when using a carbamide peroxide tooth-bleaching system. The models obtained can be applied in clinic to predict the colour change after bleaching.
Modeling Manpower and Equipment Productivity in Tall Building Construction Projects
NASA Astrophysics Data System (ADS)
Mudumbai Krishnaswamy, Parthasarathy; Rajiah, Murugasan; Vasan, Ramya
2017-12-01
Tall building construction projects involve two critical resources of manpower and equipment. Their usage, however, widely varies due to several factors affecting their productivity. Currently, no systematic study for estimating and increasing their productivity is available. What is prevalent is the use of empirical data, experience of similar projects and assumptions. As tall building projects are here to stay and increase, to meet the emerging demands in ever shrinking urban spaces, it is imperative to explore ways and means of scientific productivity models for basic construction activities: concrete, reinforcement, formwork, block work and plastering for the input of specific resources in a mixed environment of manpower and equipment usage. Data pertaining to 72 tall building projects in India were collected and analyzed. Then, suitable productivity estimation models were developed using multiple linear regression analysis and validated using independent field data. It is hoped that the models developed in the study will be useful for quantity surveyors, cost engineers and project managers to estimate productivity of resources in tall building projects.
Improvement of Storm Forecasts Using Gridded Bayesian Linear Regression for Northeast United States
NASA Astrophysics Data System (ADS)
Yang, J.; Astitha, M.; Schwartz, C. S.
2017-12-01
Bayesian linear regression (BLR) is a post-processing technique in which regression coefficients are derived and used to correct raw forecasts based on pairs of observation-model values. This study presents the development and application of a gridded Bayesian linear regression (GBLR) as a new post-processing technique to improve numerical weather prediction (NWP) of rain and wind storm forecasts over northeast United States. Ten controlled variables produced from ten ensemble members of the National Center for Atmospheric Research (NCAR) real-time prediction system are used for a GBLR model. In the GBLR framework, leave-one-storm-out cross-validation is utilized to study the performances of the post-processing technique in a database composed of 92 storms. To estimate the regression coefficients of the GBLR, optimization procedures that minimize the systematic and random error of predicted atmospheric variables (wind speed, precipitation, etc.) are implemented for the modeled-observed pairs of training storms. The regression coefficients calculated for meteorological stations of the National Weather Service are interpolated back to the model domain. An analysis of forecast improvements based on error reductions during the storms will demonstrate the value of GBLR approach. This presentation will also illustrate how the variances are optimized for the training partition in GBLR and discuss the verification strategy for grid points where no observations are available. The new post-processing technique is successful in improving wind speed and precipitation storm forecasts using past event-based data and has the potential to be implemented in real-time.
Shi, Liuhua; Liu, Pengfei; Kloog, Itai; Lee, Mihye; Kosheleva, Anna; Schwartz, Joel
2015-01-01
Accurate estimates of spatio-temporal resolved near-surface air temperature (Ta) are crucial for environmental epidemiological studies. However, values of Ta are conventionally obtained from weather stations, which have limited spatial coverage. Satellite surface temperature (Ts) measurements offer the possibility of local exposure estimates across large domains. The Southeastern United States has different climatic conditions, more small water bodies and wetlands, and greater humidity in contrast to other regions, which add to the challenge of modeling air temperature. In this study, we incorporated satellite Ts to estimate high resolution (1 km × 1 km) daily Ta across the southeastern USA for 2000-2014. We calibrated Ts to Ta measurements using mixed linear models, land use, and separate slopes for each day. A high out-of-sample cross-validated R2 of 0.952 indicated excellent model performance. When satellite Ts were unavailable, linear regression on nearby monitors and spatio-temporal smoothing was used to estimate Ta. The daily Ta estimations were compared to the NASA's Modern-Era Retrospective Analysis for Research and Applications (MERRA) model. A good agreement with an R2 of 0.969 and a mean squared prediction error (RMSPE) of 1.376 °C was achieved. Our results demonstrate that Ta can be reliably predicted using this Ts-based prediction model, even in a large geographical area with topography and weather patterns varying considerably. PMID:26717080
Karimi, Hamid Reza; Gao, Huijun
2008-07-01
A mixed H2/Hinfinity output-feedback control design methodology is presented in this paper for second-order neutral linear systems with time-varying state and input delays. Delay-dependent sufficient conditions for the design of a desired control are given in terms of linear matrix inequalities (LMIs). A controller, which guarantees asymptotic stability and a mixed H2/Hinfinity performance for the closed-loop system of the second-order neutral linear system, is then developed directly instead of coupling the model to a first-order neutral system. A Lyapunov-Krasovskii method underlies the LMI-based mixed H2/Hinfinity output-feedback control design using some free weighting matrices. The simulation results illustrate the effectiveness of the proposed methodology.
Modeling maximum daily temperature using a varying coefficient regression model
Han Li; Xinwei Deng; Dong-Yum Kim; Eric P. Smith
2014-01-01
Relationships between stream water and air temperatures are often modeled using linear or nonlinear regression methods. Despite a strong relationship between water and air temperatures and a variety of models that are effective for data summarized on a weekly basis, such models did not yield consistently good predictions for summaries such as daily maximum temperature...
Modeling indoor particulate exposures in inner city school classrooms
Gaffin, Jonathan M.; Petty, Carter R.; Hauptman, Marissa; Kang, Choong-Min; Wolfson, Jack M.; Awad, Yara Abu; Di, Qian; Lai, Peggy S.; Sheehan, William J.; Baxi, Sachin; Coull, Brent A.; Schwartz, Joel D.; Gold, Diane R.; Koutrakis, Petros; Phipatanakul, Wanda
2016-01-01
Outdoor air pollution penetrates buildings and contributes to total indoor exposures. We investigated the relationship of indoor to outdoor particulate matter in inner-city school classrooms. The School Inner City Asthma Study investigates the effect of classroom-based environmental exposures on students with asthma in the northeast United States. Mixed-effects linear models were used to determine the relationships between indoor PM2.5 and BC and their corresponding outdoor concentrations, and to develop a model for predicting exposures to these pollutants. The indoor-outdoor sulfur ratio was used as an infiltration factor of outdoor fine particles. Weeklong concentrations of PM2.5 and BC in 199 samples from 136 classrooms (30 school buildings) were compared to those measured at a central monitoring site averaged over the same timeframe. Mixed effects regression models found significant random intercept and slope effects, which indicate that: 1) there are important PM2.5 sources in classrooms; 2) the penetration of outdoor PM2.5 particles varies by school, and 3) the site-specific outside PM2.5 levels (inferred by the models) differ from those observed at the central monitor site. Similar results were found for BC except for lack of indoor sources. The fitted predictions from the sulfur-adjusted models were moderately predictive of observed indoor pollutant levels (Out of sample correlations: PM2.5: r2 = 0.68, BC; r2 = 0.61). Our results suggest that PM2.5 has important classroom sources, which vary by school. Furthermore, using these mixed effects models, classroom exposures can be accurately predicted for dates when central site measures are available but indoor measures are not available. PMID:27599884
Carbon dioxide stripping in aquaculture -- part III: model verification
Colt, John; Watten, Barnaby; Pfeiffer, Tim
2012-01-01
Based on conventional mass transfer models developed for oxygen, the use of the non-linear ASCE method, 2-point method, and one parameter linear-regression method were evaluated for carbon dioxide stripping data. For values of KLaCO2 < approximately 1.5/h, the 2-point or ASCE method are a good fit to experimental data, but the fit breaks down at higher values of KLaCO2. How to correct KLaCO2 for gas phase enrichment remains to be determined. The one-parameter linear regression model was used to vary the C*CO2 over the test, but it did not result in a better fit to the experimental data when compared to the ASCE or fixed C*CO2 assumptions.
Post-processing through linear regression
NASA Astrophysics Data System (ADS)
van Schaeybroeck, B.; Vannitsem, S.
2011-03-01
Various post-processing techniques are compared for both deterministic and ensemble forecasts, all based on linear regression between forecast data and observations. In order to evaluate the quality of the regression methods, three criteria are proposed, related to the effective correction of forecast error, the optimal variability of the corrected forecast and multicollinearity. The regression schemes under consideration include the ordinary least-square (OLS) method, a new time-dependent Tikhonov regularization (TDTR) method, the total least-square method, a new geometric-mean regression (GM), a recently introduced error-in-variables (EVMOS) method and, finally, a "best member" OLS method. The advantages and drawbacks of each method are clarified. These techniques are applied in the context of the 63 Lorenz system, whose model version is affected by both initial condition and model errors. For short forecast lead times, the number and choice of predictors plays an important role. Contrarily to the other techniques, GM degrades when the number of predictors increases. At intermediate lead times, linear regression is unable to provide corrections to the forecast and can sometimes degrade the performance (GM and the best member OLS with noise). At long lead times the regression schemes (EVMOS, TDTR) which yield the correct variability and the largest correlation between ensemble error and spread, should be preferred.
Cho, Sun-Joo; Goodwin, Amanda P
2016-04-01
When word learning is supported by instruction in experimental studies for adolescents, word knowledge outcomes tend to be collected from complex data structure, such as multiple aspects of word knowledge, multilevel reader data, multilevel item data, longitudinal design, and multiple groups. This study illustrates how generalized linear mixed models can be used to measure and explain word learning for data having such complexity. Results from this application provide deeper understanding of word knowledge than could be attained from simpler models and show that word knowledge is multidimensional and depends on word characteristics and instructional contexts.
Small area estimation for semicontinuous data.
Chandra, Hukum; Chambers, Ray
2016-03-01
Survey data often contain measurements for variables that are semicontinuous in nature, i.e. they either take a single fixed value (we assume this is zero) or they have a continuous, often skewed, distribution on the positive real line. Standard methods for small area estimation (SAE) based on the use of linear mixed models can be inefficient for such variables. We discuss SAE techniques for semicontinuous variables under a two part random effects model that allows for the presence of excess zeros as well as the skewed nature of the nonzero values of the response variable. In particular, we first model the excess zeros via a generalized linear mixed model fitted to the probability of a nonzero, i.e. strictly positive, value being observed, and then model the response, given that it is strictly positive, using a linear mixed model fitted on the logarithmic scale. Empirical results suggest that the proposed method leads to efficient small area estimates for semicontinuous data of this type. We also propose a parametric bootstrap method to estimate the MSE of the proposed small area estimator. These bootstrap estimates of the MSE are compared to the true MSE in a simulation study. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Dong, J Q; Zhang, X Y; Wang, S Z; Jiang, X F; Zhang, K; Ma, G W; Wu, M Q; Li, H; Zhang, H
2018-01-01
Plasma very low-density lipoprotein (VLDL) can be used to select for low body fat or abdominal fat (AF) in broilers, but its correlation with AF is limited. We investigated whether any other biochemical indicator can be used in combination with VLDL for a better selective effect. Nineteen plasma biochemical indicators were measured in male chickens from the Northeast Agricultural University broiler lines divergently selected for AF content (NEAUHLF) in the fed state at 46 and 48 d of age. The average concentration of every parameter for the 2 d was used for statistical analysis. Levels of these 19 plasma biochemical parameters were compared between the lean and fat lines. The phenotypic correlations between these plasma biochemical indicators and AF traits were analyzed. Then, multiple linear regression models were constructed to select the best model used for selecting against AF content. and the heritabilities of plasma indicators contained in the best models were estimated. The results showed that 11 plasma biochemical indicators (triglycerides, total bile acid, total protein, globulin, albumin/globulin, aspartate transaminase, alanine transaminase, gamma-glutamyl transpeptidase, uric acid, creatinine, and VLDL) differed significantly between the lean and fat lines (P < 0.01), and correlated significantly with AF traits (P < 0.05). The best multiple linear regression models based on albumin/globulin, VLDL, triglycerides, globulin, total bile acid, and uric acid, had higher R2 (0.73) than the model based only on VLDL (0.21). The plasma parameters included in the best models had moderate heritability estimates (0.21 ≤ h2 ≤ 0.43). These results indicate that these multiple linear regression models can be used to select for lean broiler chickens. © 2017 Poultry Science Association Inc.
Lopes, Marta B; Calado, Cecília R C; Figueiredo, Mário A T; Bioucas-Dias, José M
2017-06-01
The monitoring of biopharmaceutical products using Fourier transform infrared (FT-IR) spectroscopy relies on calibration techniques involving the acquisition of spectra of bioprocess samples along the process. The most commonly used method for that purpose is partial least squares (PLS) regression, under the assumption that a linear model is valid. Despite being successful in the presence of small nonlinearities, linear methods may fail in the presence of strong nonlinearities. This paper studies the potential usefulness of nonlinear regression methods for predicting, from in situ near-infrared (NIR) and mid-infrared (MIR) spectra acquired in high-throughput mode, biomass and plasmid concentrations in Escherichia coli DH5-α cultures producing the plasmid model pVAX-LacZ. The linear methods PLS and ridge regression (RR) are compared with their kernel (nonlinear) versions, kPLS and kRR, as well as with the (also nonlinear) relevance vector machine (RVM) and Gaussian process regression (GPR). For the systems studied, RR provided better predictive performances compared to the remaining methods. Moreover, the results point to further investigation based on larger data sets whenever differences in predictive accuracy between a linear method and its kernelized version could not be found. The use of nonlinear methods, however, shall be judged regarding the additional computational cost required to tune their additional parameters, especially when the less computationally demanding linear methods herein studied are able to successfully monitor the variables under study.
NASA Astrophysics Data System (ADS)
Johnson, Matthew S.; Yates, Emma L.; Iraci, Laura T.; Loewenstein, Max; Tadić, Jovan M.; Wecht, Kevin J.; Jeong, Seongeun; Fischer, Marc L.
2014-12-01
This study analyzes source apportioned methane (CH4) emissions and atmospheric mixing ratios in northern California during the Discover-AQ-CA field campaign using airborne measurement data and model simulations. Source apportioned CH4 emissions from the Emissions Database for Global Atmospheric Research (EDGAR) version 4.2 were applied in the 3-D chemical transport model GEOS-Chem and analyzed using airborne measurements taken as part of the Alpha Jet Atmospheric eXperiment over the San Francisco Bay Area (SFBA) and northern San Joaquin Valley (SJV). During the time period of the Discover-AQ-CA field campaign EDGAR inventory CH4 emissions were ∼5.30 Gg day-1 (Gg = 1.0 × 109 g) (equating to ∼1.90 × 103 Gg yr-1) for all of California. According to EDGAR, the SFBA and northern SJV region contributes ∼30% of total CH4 emissions from California. Source apportionment analysis during this study shows that CH4 mixing ratios over this area of northern California are largely influenced by global emissions from wetlands and local/global emissions from gas and oil production and distribution, waste treatment processes, and livestock management. Model simulations, using EDGAR emissions, suggest that the model under-estimates CH4 mixing ratios in northern California (average normalized mean bias (NMB) = -5.2% and linear regression slope = 0.20). The largest negative biases in the model were calculated on days when large amounts of CH4 were measured over local emission sources and atmospheric CH4 mixing ratios reached values >2.5 parts per million. Sensitivity emission studies conducted during this research suggest that local emissions of CH4 from livestock management processes are likely the primary source of the negative model bias. These results indicate that a variety, and larger quantity, of measurement data needs to be obtained and additional research is necessary to better quantify source apportioned CH4 emissions in California.
Johnson, Matthew S.; Yates, Emma L.; Iraci, Laura T.; ...
2014-12-01
This study analyzes source apportioned methane (CH 4) emissions and atmospheric mixing ratios in northern California during the Discover-AQ-CA field campaign using airborne measurement data and model simulations. Source apportioned CH 4 emissions from the Emissions Database for Global Atmospheric Research (EDGAR) version 4.2 were applied in the 3-D chemical transport model GEOS-Chem and analyzed using airborne measurements taken as part of the Alpha Jet Atmospheric eXperiment over the San Francisco Bay Area (SFBA) and northern San Joaquin Valley (SJV). During the time period of the Discover-AQ-CA field campaign EDGAR inventory CH 4 emissions were ~5.30 Gg day –1 (Ggmore » = 1.0 × 10 9 g) (equating to ~1.90 × 10 3 Gg yr –1) for all of California. According to EDGAR, the SFBA and northern SJV region contributes ~30% of total CH 4 emissions from California. Source apportionment analysis during this study shows that CH 4 mixing ratios over this area of northern California are largely influenced by global emissions from wetlands and local/global emissions from gas and oil production and distribution, waste treatment processes, and livestock management. Model simulations, using EDGAR emissions, suggest that the model under-estimates CH 4 mixing ratios in northern California (average normalized mean bias (NMB) = –5.2% and linear regression slope = 0.20). The largest negative biases in the model were calculated on days when large amounts of CH 4 were measured over local emission sources and atmospheric CH 4 mixing ratios reached values >2.5 parts per million. Sensitivity emission studies conducted during this research suggest that local emissions of CH 4 from livestock management processes are likely the primary source of the negative model bias. These results indicate that a variety, and larger quantity, of measurement data needs to be obtained and additional research is necessary to better quantify source apportioned CH 4 emissions in California.« less
Adjusted variable plots for Cox's proportional hazards regression model.
Hall, C B; Zeger, S L; Bandeen-Roche, K J
1996-01-01
Adjusted variable plots are useful in linear regression for outlier detection and for qualitative evaluation of the fit of a model. In this paper, we extend adjusted variable plots to Cox's proportional hazards model for possibly censored survival data. We propose three different plots: a risk level adjusted variable (RLAV) plot in which each observation in each risk set appears, a subject level adjusted variable (SLAV) plot in which each subject is represented by one point, and an event level adjusted variable (ELAV) plot in which the entire risk set at each failure event is represented by a single point. The latter two plots are derived from the RLAV by combining multiple points. In each point, the regression coefficient and standard error from a Cox proportional hazards regression is obtained by a simple linear regression through the origin fit to the coordinates of the pictured points. The plots are illustrated with a reanalysis of a dataset of 65 patients with multiple myeloma.
NASA Astrophysics Data System (ADS)
Sahabiev, I. A.; Ryazanov, S. S.; Kolcova, T. G.; Grigoryan, B. R.
2018-03-01
The three most common techniques to interpolate soil properties at a field scale—ordinary kriging (OK), regression kriging with multiple linear regression drift model (RK + MLR), and regression kriging with principal component regression drift model (RK + PCR)—were examined. The results of the performed study were compiled into an algorithm of choosing the most appropriate soil mapping technique. Relief attributes were used as the auxiliary variables. When spatial dependence of a target variable was strong, the OK method showed more accurate interpolation results, and the inclusion of the auxiliary data resulted in an insignificant improvement in prediction accuracy. According to the algorithm, the RK + PCR method effectively eliminates multicollinearity of explanatory variables. However, if the number of predictors is less than ten, the probability of multicollinearity is reduced, and application of the PCR becomes irrational. In that case, the multiple linear regression should be used instead.
NASA Astrophysics Data System (ADS)
Boucher, Thomas F.; Ozanne, Marie V.; Carmosino, Marco L.; Dyar, M. Darby; Mahadevan, Sridhar; Breves, Elly A.; Lepore, Kate H.; Clegg, Samuel M.
2015-05-01
The ChemCam instrument on the Mars Curiosity rover is generating thousands of LIBS spectra and bringing interest in this technique to public attention. The key to interpreting Mars or any other types of LIBS data are calibrations that relate laboratory standards to unknowns examined in other settings and enable predictions of chemical composition. Here, LIBS spectral data are analyzed using linear regression methods including partial least squares (PLS-1 and PLS-2), principal component regression (PCR), least absolute shrinkage and selection operator (lasso), elastic net, and linear support vector regression (SVR-Lin). These were compared against results from nonlinear regression methods including kernel principal component regression (K-PCR), polynomial kernel support vector regression (SVR-Py) and k-nearest neighbor (kNN) regression to discern the most effective models for interpreting chemical abundances from LIBS spectra of geological samples. The results were evaluated for 100 samples analyzed with 50 laser pulses at each of five locations averaged together. Wilcoxon signed-rank tests were employed to evaluate the statistical significance of differences among the nine models using their predicted residual sum of squares (PRESS) to make comparisons. For MgO, SiO2, Fe2O3, CaO, and MnO, the sparse models outperform all the others except for linear SVR, while for Na2O, K2O, TiO2, and P2O5, the sparse methods produce inferior results, likely because their emission lines in this energy range have lower transition probabilities. The strong performance of the sparse methods in this study suggests that use of dimensionality-reduction techniques as a preprocessing step may improve the performance of the linear models. Nonlinear methods tend to overfit the data and predict less accurately, while the linear methods proved to be more generalizable with better predictive performance. These results are attributed to the high dimensionality of the data (6144 channels) relative to the small number of samples studied. The best-performing models were SVR-Lin for SiO2, MgO, Fe2O3, and Na2O, lasso for Al2O3, elastic net for MnO, and PLS-1 for CaO, TiO2, and K2O. Although these differences in model performance between methods were identified, most of the models produce comparable results when p ≤ 0.05 and all techniques except kNN produced statistically-indistinguishable results. It is likely that a combination of models could be used together to yield a lower total error of prediction, depending on the requirements of the user.
Auras, Silke; Ostermann, Thomas; de Cruppé, Werner; Bitzer, Eva-Maria; Diel, Franziska; Geraedts, Max
2016-12-01
The study aimed to illustrate the effect of the patients' sex, age, self-rated health and medical practice specialization on patient satisfaction. Secondary analysis of patient survey data using multilevel analysis (generalized linear mixed model, medical practice as random effect) using a sequential modelling strategy. We examined the effects of the patients' sex, age, self-rated health and medical practice specialization on four patient satisfaction dimensions: medical practice organization, information, interaction, professional competence. The study was performed in 92 German medical practices providing ambulatory care in general medicine, internal medicine or gynaecology. In total, 9888 adult patients participated in a patient survey using the validated 'questionnaire on satisfaction with ambulatory care-quality from the patient perspective [ZAP]'. We calculated four models for each satisfaction dimension, revealing regression coefficients with 95% confidence intervals (CIs) for all independent variables, and using Wald Chi-Square statistic for each modelling step (model validity) and LR-Tests to compare the models of each step with the previous model. The patients' sex and age had a weak effect (maximum regression coefficient 1.09, CI 0.39; 1.80), and the patients' self-rated health had the strongest positive effect (maximum regression coefficient 7.66, CI 6.69; 8.63) on satisfaction ratings. The effect of medical practice specialization was heterogeneous. All factors studied, specifically the patients' self-rated health, affected patient satisfaction. Adjustment should always be considered because it improves the comparability of patient satisfaction in medical practices with atypically varying patient populations and increases the acceptance of comparisons. © The Author 2016. Published by Oxford University Press in association with the International Society for Quality in Health Care. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
Cost Estimation of Naval Ship Acquisition.
1983-12-01
one a 9-sub- system model , the other a single total cost model . The models were developed using the linear least squares regression tech- nique with...to Linear Statistical Models , McGraw-Hill, 1961. 11. Helmer, F. T., Bibliography on Pricing Methodology and Cost Estimating, Dept. of Economics and...SUPPI.EMSaTARY NOTES IS. KWRo" (Cowaft. en tever aide of ..aesep M’ Idab~t 6 Week ONNa.) Cost estimation; Acquisition; Parametric cost estimate; linear
Gugushvili, Alexi
2017-08-01
Building on the previously investigated macro-sociological models which analyze the consequences of economic development, income inequality, and international migration on social mobility, this article studies the specific contextual covariates of intergenerational reproduction of occupational status in post-communist societies. It is theorized that social mobility is higher in societies with democratic political regimes and less liberalized economies. The outlined hypotheses are tested by using micro- and macro-level datasets for 21 post-communist societies which are fitted into multilevel mixed-effects linear regressions. The derived findings suggest that factors specific to transition societies, conventional macro-level variables, and the legacy of the Soviet Union explain variation in intergenerational social mobility, but the results vary depending which birth cohorts survey participants belong to and whether or not they stem from advantaged or disadvantaged social origins. These findings are robust to various alternative data, sample, and method specifications. Copyright © 2017 Elsevier Inc. All rights reserved.
Fenske, Nora; Burns, Jacob; Hothorn, Torsten; Rehfuess, Eva A.
2013-01-01
Background Most attempts to address undernutrition, responsible for one third of global child deaths, have fallen behind expectations. This suggests that the assumptions underlying current modelling and intervention practices should be revisited. Objective We undertook a comprehensive analysis of the determinants of child stunting in India, and explored whether the established focus on linear effects of single risks is appropriate. Design Using cross-sectional data for children aged 0–24 months from the Indian National Family Health Survey for 2005/2006, we populated an evidence-based diagram of immediate, intermediate and underlying determinants of stunting. We modelled linear, non-linear, spatial and age-varying effects of these determinants using additive quantile regression for four quantiles of the Z-score of standardized height-for-age and logistic regression for stunting and severe stunting. Results At least one variable within each of eleven groups of determinants was significantly associated with height-for-age in the 35% Z-score quantile regression. The non-modifiable risk factors child age and sex, and the protective factors household wealth, maternal education and BMI showed the largest effects. Being a twin or multiple birth was associated with dramatically decreased height-for-age. Maternal age, maternal BMI, birth order and number of antenatal visits influenced child stunting in non-linear ways. Findings across the four quantile and two logistic regression models were largely comparable. Conclusions Our analysis confirms the multifactorial nature of child stunting. It emphasizes the need to pursue a systems-based approach and to consider non-linear effects, and suggests that differential effects across the height-for-age distribution do not play a major role. PMID:24223839
Fenske, Nora; Burns, Jacob; Hothorn, Torsten; Rehfuess, Eva A
2013-01-01
Most attempts to address undernutrition, responsible for one third of global child deaths, have fallen behind expectations. This suggests that the assumptions underlying current modelling and intervention practices should be revisited. We undertook a comprehensive analysis of the determinants of child stunting in India, and explored whether the established focus on linear effects of single risks is appropriate. Using cross-sectional data for children aged 0-24 months from the Indian National Family Health Survey for 2005/2006, we populated an evidence-based diagram of immediate, intermediate and underlying determinants of stunting. We modelled linear, non-linear, spatial and age-varying effects of these determinants using additive quantile regression for four quantiles of the Z-score of standardized height-for-age and logistic regression for stunting and severe stunting. At least one variable within each of eleven groups of determinants was significantly associated with height-for-age in the 35% Z-score quantile regression. The non-modifiable risk factors child age and sex, and the protective factors household wealth, maternal education and BMI showed the largest effects. Being a twin or multiple birth was associated with dramatically decreased height-for-age. Maternal age, maternal BMI, birth order and number of antenatal visits influenced child stunting in non-linear ways. Findings across the four quantile and two logistic regression models were largely comparable. Our analysis confirms the multifactorial nature of child stunting. It emphasizes the need to pursue a systems-based approach and to consider non-linear effects, and suggests that differential effects across the height-for-age distribution do not play a major role.
Ding, Chuan; Chen, Peng; Jiao, Junfeng
2018-03-01
Although a growing body of literature focuses on the relationship between the built environment and pedestrian crashes, limited evidence is provided about the relative importance of many built environment attributes by accounting for their mutual interaction effects and their non-linear effects on automobile-involved pedestrian crashes. This study adopts the approach of Multiple Additive Poisson Regression Trees (MAPRT) to fill such gaps using pedestrian collision data collected from Seattle, Washington. Traffic analysis zones are chosen as the analytical unit. The effects of various factors on pedestrian crash frequency investigated include characteristics the of road network, street elements, land use patterns, and traffic demand. Density and the degree of mixed land use have major effects on pedestrian crash frequency, accounting for approximately 66% of the effects in total. More importantly, some factors show clear non-linear relationships with pedestrian crash frequency, challenging the linearity assumption commonly used in existing studies which employ statistical models. With various accurately identified non-linear relationships between the built environment and pedestrian crashes, this study suggests local agencies to adopt geo-spatial differentiated policies to establish a safe walking environment. These findings, especially the effective ranges of the built environment, provide evidence to support for transport and land use planning, policy recommendations, and road safety programs. Copyright © 2018 Elsevier Ltd. All rights reserved.
Poisson Mixture Regression Models for Heart Disease Prediction.
Mufudza, Chipo; Erol, Hamza
2016-01-01
Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models is here addressed under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model due to its low Bayesian Information Criteria value. Furthermore, a Zero Inflated Poisson Mixture Regression model turned out to be the best model for heart prediction over all models as it both clusters individuals into high or low risk category and predicts rate to heart disease componentwise given clusters available. It is deduced that heart disease prediction can be effectively done by identifying the major risks componentwise using Poisson mixture regression model.
Poisson Mixture Regression Models for Heart Disease Prediction
Erol, Hamza
2016-01-01
Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models is here addressed under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model due to its low Bayesian Information Criteria value. Furthermore, a Zero Inflated Poisson Mixture Regression model turned out to be the best model for heart prediction over all models as it both clusters individuals into high or low risk category and predicts rate to heart disease componentwise given clusters available. It is deduced that heart disease prediction can be effectively done by identifying the major risks componentwise using Poisson mixture regression model. PMID:27999611
NASA Astrophysics Data System (ADS)
Haris, A.; Nafian, M.; Riyanto, A.
2017-07-01
Danish North Sea Fields consist of several formations (Ekofisk, Tor, and Cromer Knoll) that was started from the age of Paleocene to Miocene. In this study, the integration of seismic and well log data set is carried out to determine the chalk sand distribution in the Danish North Sea field. The integration of seismic and well log data set is performed by using the seismic inversion analysis and seismic multi-attribute. The seismic inversion algorithm, which is used to derive acoustic impedance (AI), is model-based technique. The derived AI is then used as external attributes for the input of multi-attribute analysis. Moreover, the multi-attribute analysis is used to generate the linear and non-linear transformation of among well log properties. In the case of the linear model, selected transformation is conducted by weighting step-wise linear regression (SWR), while for the non-linear model is performed by using probabilistic neural networks (PNN). The estimated porosity, which is resulted by PNN shows better suited to the well log data compared with the results of SWR. This result can be understood since PNN perform non-linear regression so that the relationship between the attribute data and predicted log data can be optimized. The distribution of chalk sand has been successfully identified and characterized by porosity value ranging from 23% up to 30%.
Røislien, Jo; Lossius, Hans Morten; Kristiansen, Thomas
2015-01-01
Background Trauma is a leading global cause of death. Trauma mortality rates are higher in rural areas, constituting a challenge for quality and equality in trauma care. The aim of the study was to explore population density and transport time to hospital care as possible predictors of geographical differences in mortality rates, and to what extent choice of statistical method might affect the analytical results and accompanying clinical conclusions. Methods Using data from the Norwegian Cause of Death registry, deaths from external causes 1998–2007 were analysed. Norway consists of 434 municipalities, and municipality population density and travel time to hospital care were entered as predictors of municipality mortality rates in univariate and multiple regression models of increasing model complexity. We fitted linear regression models with continuous and categorised predictors, as well as piecewise linear and generalised additive models (GAMs). Models were compared using Akaike's information criterion (AIC). Results Population density was an independent predictor of trauma mortality rates, while the contribution of transport time to hospital care was highly dependent on choice of statistical model. A multiple GAM or piecewise linear model was superior, and similar, in terms of AIC. However, while transport time was statistically significant in multiple models with piecewise linear or categorised predictors, it was not in GAM or standard linear regression. Conclusions Population density is an independent predictor of trauma mortality rates. The added explanatory value of transport time to hospital care is marginal and model-dependent, highlighting the importance of exploring several statistical models when studying complex associations in observational data. PMID:25972600
Modified Regression Correlation Coefficient for Poisson Regression Model
NASA Astrophysics Data System (ADS)
Kaengthong, Nattacha; Domthong, Uthumporn
2017-09-01
This study gives attention to indicators in predictive power of the Generalized Linear Model (GLM) which are widely used; however, often having some restrictions. We are interested in regression correlation coefficient for a Poisson regression model. This is a measure of predictive power, and defined by the relationship between the dependent variable (Y) and the expected value of the dependent variable given the independent variables [E(Y|X)] for the Poisson regression model. The dependent variable is distributed as Poisson. The purpose of this research was modifying regression correlation coefficient for Poisson regression model. We also compare the proposed modified regression correlation coefficient with the traditional regression correlation coefficient in the case of two or more independent variables, and having multicollinearity in independent variables. The result shows that the proposed regression correlation coefficient is better than the traditional regression correlation coefficient based on Bias and the Root Mean Square Error (RMSE).
NASA Astrophysics Data System (ADS)
Isa, N. A.; Mohd, W. M. N. Wan; Salleh, S. A.; Ooi, M. C. G.
2018-02-01
Matured trees contain high concentration of chlorophyll that encourages the process of photosynthesis. This process produces oxygen as a by-product and releases it into the atmosphere and helps in lowering the ambient temperature. This study attempts to analyse the effect of green area on air surface temperature of the Kuala Lumpur city. The air surface temperatures of two different dates which are, in March 2006 and March 2016 were simulated using the Weather Research and Forecasting (WRF) model. The green area in the city was extracted using the Normalized Difference Vegetation Index (NDVI) from two Landsat satellite images. The relationship between the air surface temperature and the green area were analysed using linear regression models. From the study, it was found that, the green area was significantly affecting the distribution of air temperature within the city. A strong negative correlation was identified through this study which indicated that higher NDVI values tend to have lower air surface temperature distribution within the focus study area. It was also found that, different urban setting in mixed built-up and vegetated areas resulted in different distributions of air surface temperature. Future studies should focus on analysing the air surface temperature within the area of mixed built-up and vegetated area.
Vaughan, Adam S; Kramer, Michael R; Waller, Lance A; Schieb, Linda J; Greer, Sophia; Casper, Michele
2015-05-01
To demonstrate the implications of choosing analytical methods for quantifying spatiotemporal trends, we compare the assumptions, implementation, and outcomes of popular methods using county-level heart disease mortality in the United States between 1973 and 2010. We applied four regression-based approaches (joinpoint regression, both aspatial and spatial generalized linear mixed models, and Bayesian space-time model) and compared resulting inferences for geographic patterns of local estimates of annual percent change and associated uncertainty. The average local percent change in heart disease mortality from each method was -4.5%, with the Bayesian model having the smallest range of values. The associated uncertainty in percent change differed markedly across the methods, with the Bayesian space-time model producing the narrowest range of variance (0.0-0.8). The geographic pattern of percent change was consistent across methods with smaller declines in the South Central United States and larger declines in the Northeast and Midwest. However, the geographic patterns of uncertainty differed markedly between methods. The similarity of results, including geographic patterns, for magnitude of percent change across these methods validates the underlying spatial pattern of declines in heart disease mortality. However, marked differences in degree of uncertainty indicate that Bayesian modeling offers substantially more precise estimates. Copyright © 2015 Elsevier Inc. All rights reserved.
Construction of mathematical model for measuring material concentration by colorimetric method
NASA Astrophysics Data System (ADS)
Liu, Bing; Gao, Lingceng; Yu, Kairong; Tan, Xianghua
2018-06-01
This paper use the method of multiple linear regression to discuss the data of C problem of mathematical modeling in 2017. First, we have established a regression model for the concentration of 5 substances. But only the regression model of the substance concentration of urea in milk can pass through the significance test. The regression model established by the second sets of data can pass the significance test. But this model exists serious multicollinearity. We have improved the model by principal component analysis. The improved model is used to control the system so that it is possible to measure the concentration of material by direct colorimetric method.
Assessing risk factors for periodontitis using regression
NASA Astrophysics Data System (ADS)
Lobo Pereira, J. A.; Ferreira, Maria Cristina; Oliveira, Teresa
2013-10-01
Multivariate statistical analysis is indispensable to assess the associations and interactions between different factors and the risk of periodontitis. Among others, regression analysis is a statistical technique widely used in healthcare to investigate and model the relationship between variables. In our work we study the impact of socio-demographic, medical and behavioral factors on periodontal health. Using regression, linear and logistic models, we can assess the relevance, as risk factors for periodontitis disease, of the following independent variables (IVs): Age, Gender, Diabetic Status, Education, Smoking status and Plaque Index. The multiple linear regression analysis model was built to evaluate the influence of IVs on mean Attachment Loss (AL). Thus, the regression coefficients along with respective p-values will be obtained as well as the respective p-values from the significance tests. The classification of a case (individual) adopted in the logistic model was the extent of the destruction of periodontal tissues defined by an Attachment Loss greater than or equal to 4 mm in 25% (AL≥4mm/≥25%) of sites surveyed. The association measures include the Odds Ratios together with the correspondent 95% confidence intervals.
INCORPORATING CONCENTRATION DEPENDENCE IN STABLE ISOTOPE MIXING MODELS
Stable isotopes are frequently used to quantify the contributions of multiple sources to a mixture; e.g., C and N isotopic signatures can be used to determine the fraction of three food sources in a consumer's diet. The standard dual isotope, three source linear mixing model ass...
Forecasting daily patient volumes in the emergency department.
Jones, Spencer S; Thomas, Alun; Evans, R Scott; Welch, Shari J; Haug, Peter J; Snow, Gregory L
2008-02-01
Shifts in the supply of and demand for emergency department (ED) resources make the efficient allocation of ED resources increasingly important. Forecasting is a vital activity that guides decision-making in many areas of economic, industrial, and scientific planning, but has gained little traction in the health care industry. There are few studies that explore the use of forecasting methods to predict patient volumes in the ED. The goals of this study are to explore and evaluate the use of several statistical forecasting methods to predict daily ED patient volumes at three diverse hospital EDs and to compare the accuracy of these methods to the accuracy of a previously proposed forecasting method. Daily patient arrivals at three hospital EDs were collected for the period January 1, 2005, through March 31, 2007. The authors evaluated the use of seasonal autoregressive integrated moving average, time series regression, exponential smoothing, and artificial neural network models to forecast daily patient volumes at each facility. Forecasts were made for horizons ranging from 1 to 30 days in advance. The forecast accuracy achieved by the various forecasting methods was compared to the forecast accuracy achieved when using a benchmark forecasting method already available in the emergency medicine literature. All time series methods considered in this analysis provided improved in-sample model goodness of fit. However, post-sample analysis revealed that time series regression models that augment linear regression models by accounting for serial autocorrelation offered only small improvements in terms of post-sample forecast accuracy, relative to multiple linear regression models, while seasonal autoregressive integrated moving average, exponential smoothing, and artificial neural network forecasting models did not provide consistently accurate forecasts of daily ED volumes. This study confirms the widely held belief that daily demand for ED services is characterized by seasonal and weekly patterns. The authors compared several time series forecasting methods to a benchmark multiple linear regression model. The results suggest that the existing methodology proposed in the literature, multiple linear regression based on calendar variables, is a reasonable approach to forecasting daily patient volumes in the ED. However, the authors conclude that regression-based models that incorporate calendar variables, account for site-specific special-day effects, and allow for residual autocorrelation provide a more appropriate, informative, and consistently accurate approach to forecasting daily ED patient volumes.
Rosenblum, Michael; van der Laan, Mark J.
2010-01-01
Models, such as logistic regression and Poisson regression models, are often used to estimate treatment effects in randomized trials. These models leverage information in variables collected before randomization, in order to obtain more precise estimates of treatment effects. However, there is the danger that model misspecification will lead to bias. We show that certain easy to compute, model-based estimators are asymptotically unbiased even when the working model used is arbitrarily misspecified. Furthermore, these estimators are locally efficient. As a special case of our main result, we consider a simple Poisson working model containing only main terms; in this case, we prove the maximum likelihood estimate of the coefficient corresponding to the treatment variable is an asymptotically unbiased estimator of the marginal log rate ratio, even when the working model is arbitrarily misspecified. This is the log-linear analog of ANCOVA for linear models. Our results demonstrate one application of targeted maximum likelihood estimation. PMID:20628636
Multivariate predictors of music perception and appraisal by adult cochlear implant users.
Gfeller, Kate; Oleson, Jacob; Knutson, John F; Breheny, Patrick; Driscoll, Virginia; Olszewski, Carol
2008-02-01
The research examined whether performance by adult cochlear implant recipients on a variety of recognition and appraisal tests derived from real-world music could be predicted from technological, demographic, and life experience variables, as well as speech recognition scores. A representative sample of 209 adults implanted between 1985 and 2006 participated. Using multiple linear regression models and generalized linear mixed models, sets of optimal predictor variables were selected that effectively predicted performance on a test battery that assessed different aspects of music listening. These analyses established the importance of distinguishing between the accuracy of music perception and the appraisal of musical stimuli when using music listening as an index of implant success. Importantly, neither device type nor processing strategy predicted music perception or music appraisal. Speech recognition performance was not a strong predictor of music perception, and primarily predicted music perception when the test stimuli included lyrics. Additionally, limitations in the utility of speech perception in predicting musical perception and appraisal underscore the utility of music perception as an alternative outcome measure for evaluating implant outcomes. Music listening background, residual hearing (i.e., hearing aid use), cognitive factors, and some demographic factors predicted several indices of perceptual accuracy or appraisal of music.
Proximal sensing of within-field mycotoxin variation - a case study in Northeast Germany
NASA Astrophysics Data System (ADS)
Mueller, Marina; Koszinski, Sylvia; Bangs, Donovan E.; Wehrhan, Marc; Ullrich, Andreas; Verch, Gernot; Brenning, Alexander
2017-04-01
Fusarium head blight is a global problem in agriculture that results in yield losses and, more seriously, produces harmful toxins that enter the food chain. This study (Müller et al. 2016) builds on previous research identifying within-field humidity as an important factor in infection processes by Fusarium fungi and its mycotoxin production. Environmental variables describing topographic control of humidity (topographic wetness index TWI), soil texture and related moisture by electrical conductivity (ECa), and canopy humidity by density (normalized difference vegetation index NDVI) were explored in their relationship to the fungal infection rates and mycotoxin accumulation. Field studies at four sites in NE German Lowlands were performed in 2009 and 2011. Sites differed slightly in soil textural properties and, more pronounced, mean annual precipitation. Sampling positions were selected by usage of NDVI values range from remote sensing data base. Environmental data included elevation and its derivatives like topographic wetness index (TWI) from a DEM25, electrical conductivity distribution maps (5 x 5 m) based on EM38DD survey and, orthorectified RapidEye imagery (5 x 5 m2) with resulting NDVI distributions across the field sites. Grain yield, fungal infection rate, microbiological characteristics and mycotoxin accumulation were determined at 223 field positions. Statistical analysis incorporated Spearman rank order correlations and three regression methods (censored regression models, linear mixed-effects models and spatial linear mixed-effects models). Kriging was used to visualize the spatial patterns and trends. All analyses were performed by R software. In 2011, a more wet year than 2009, high Fusarium infection rates and a high concentration of mycotoxins were stated, the latter once exceeding EU threshold values. For both years associations between NDVI and microbiological variables were found, but being more pronounced and more often significant for 2011 than for 2009. ECa was only related with deoxynivalenol concentration (DON) and abundance of trichothecene-producing fusaria (tri6 gene copy number) in 2009 and, to DON and zearalenone (ZEA) in 2011. In contrast to former findings no correlations were found between TWI and mycological data. NDVI and, less importantly, ECa were essential predictors in all three regression models. Mycotoxins DON and ZEA distribution maps could be interpolated by kriging with internal drift based on these two proximal predictor variables. Providing spatial patterns of mycotoxigenic fungi and its effects may be used to infer mycotoxin hot spots, to develop models for risk assessment and, to manage plant and crop treatments or even harvest. Müller, M.E.H., Koszinski, S., Bangs, D.E. et al. Precision Agric (2016) 17: 698. doi:10.1007/s11119-016-9444-y
NASA Astrophysics Data System (ADS)
Caimmi, R.
2011-08-01
Concerning bivariate least squares linear regression, the classical approach pursued for functional models in earlier attempts ( York, 1966, 1969) is reviewed using a new formalism in terms of deviation (matrix) traces which, for unweighted data, reduce to usual quantities leaving aside an unessential (but dimensional) multiplicative factor. Within the framework of classical error models, the dependent variable relates to the independent variable according to the usual additive model. The classes of linear models considered are regression lines in the general case of correlated errors in X and in Y for weighted data, and in the opposite limiting situations of (i) uncorrelated errors in X and in Y, and (ii) completely correlated errors in X and in Y. The special case of (C) generalized orthogonal regression is considered in detail together with well known subcases, namely: (Y) errors in X negligible (ideally null) with respect to errors in Y; (X) errors in Y negligible (ideally null) with respect to errors in X; (O) genuine orthogonal regression; (R) reduced major-axis regression. In the limit of unweighted data, the results determined for functional models are compared with their counterparts related to extreme structural models i.e. the instrumental scatter is negligible (ideally null) with respect to the intrinsic scatter ( Isobe et al., 1990; Feigelson and Babu, 1992). While regression line slope and intercept estimators for functional and structural models necessarily coincide, the contrary holds for related variance estimators even if the residuals obey a Gaussian distribution, with the exception of Y models. An example of astronomical application is considered, concerning the [O/H]-[Fe/H] empirical relations deduced from five samples related to different stars and/or different methods of oxygen abundance determination. For selected samples and assigned methods, different regression models yield consistent results within the errors (∓ σ) for both heteroscedastic and homoscedastic data. Conversely, samples related to different methods produce discrepant results, due to the presence of (still undetected) systematic errors, which implies no definitive statement can be made at present. A comparison is also made between different expressions of regression line slope and intercept variance estimators, where fractional discrepancies are found to be not exceeding a few percent, which grows up to about 20% in the presence of large dispersion data. An extension of the formalism to structural models is left to a forthcoming paper.
Pestaña-Melero, Francisco Luis; Haff, G Gregory; Rojas, Francisco Javier; Pérez-Castilla, Alejandro; García-Ramos, Amador
2017-12-18
This study aimed to compare the between-session reliability of the load-velocity relationship between (1) linear vs. polynomial regression models, (2) concentric-only vs. eccentric-concentric bench press variants, as well as (3) the within-participants vs. the between-participants variability of the velocity attained at each percentage of the one-repetition maximum (%1RM). The load-velocity relationship of 30 men (age: 21.2±3.8 y; height: 1.78±0.07 m, body mass: 72.3±7.3 kg; bench press 1RM: 78.8±13.2 kg) were evaluated by means of linear and polynomial regression models in the concentric-only and eccentric-concentric bench press variants in a Smith Machine. Two sessions were performed with each bench press variant. The main findings were: (1) first-order-polynomials (CV: 4.39%-4.70%) provided the load-velocity relationship with higher reliability than second-order-polynomials (CV: 4.68%-5.04%); (2) the reliability of the load-velocity relationship did not differ between the concentric-only and eccentric-concentric bench press variants; (3) the within-participants variability of the velocity attained at each %1RM was markedly lower than the between-participants variability. Taken together, these results highlight that, regardless of the bench press variant considered, the individual determination of the load-velocity relationship by a linear regression model could be recommended to monitor and prescribe the relative load in the Smith machine bench press exercise.
Queiroz, Valterlinda A O; Assis, Ana Marlúcia O; Pinheiro, Sandra Maria C; Ribeiro, Hugo C Ribeiro
2012-01-01
To investigate covariates that could affect the variation in mean length/age z scores in the first year of life of children born full term with normal birth weight. This was a prospective study of a cohort of mother-infant pairs recruited at public maternity units in two municipalities in the Brazilian state of Bahia, from March 2005 to October 2006. This paper reports the results for linear growth of 489 children who were followed-up for the first 12 months of their lives. A mixed-effect regression model was used to investigate the influence of covariates of mean length/age z score during the first year of life. The multivariate mixed effect analysis indicated that mothers not cohabiting with a partner (beta = 0.2347; p = 0.004) and increased duration of exclusive breastfeeding (beta = 0.0031; p < 0.001) had a positive impact, whereas mother's height less than 150 cm (beta = -0.4393; p < 0.001), birth weight of 2,500-2,999 g (beta = -0.8084; p < 0.001) and anemia in the child (beta = -0.0875; p < 0.001) all had a negative impact on the variation in estimated length/age z score. Therefore, the results of this study indicate that short maternal stature, birth weight < 3,000 g and anemia in the infant had a negative effect on linear growth during the first year of life, whereas longer duration of exclusive breastfeeding and mothers who did not cohabit with a partner had a positive effect.
Duane, B G; Freeman, R; Richards, D; Crosbie, S; Patel, P; White, S; Humphris, G
2017-03-01
To commission dental services for vulnerable (special care) patient groups effectively, consistently and fairly an evidence base is needed of the costs involved. The simplified Case Mixed Tool (sCMT) can assess treatment mode complexity for these patient groups. To determine if the sCMT can be used to identify costs of service provision. Patients (n=495) attending the Sussex Community NHS Trust Special Care Dental Service for care were assessed using the sCMT. sCMT score and costs (staffing, laboratory fees, etc.) besides patient age, whether a new patient and use of general anaesthetic/intravenous sedation. Statistical analysis (adjusted linear regression modelling) compared sCMT score and costs then sensitivity analyses of the costings to age, being a new patient and sedation use were undertaken. Regression tables were produced to present estimates of service costs. Costs increased with sCMT total scale and single item values in a predictable manner in all analyses except for 'cooperation'. Costs increased with the use of IV sedation; with each rising level of the sCMT, and with complexity in every sCMT category, except cooperation. Costs increased with increase in complexity of treatment mode as measured by sCMT scores. Measures such as the sCMT can provide predictions of the resource allocations required when commissioning special care dental services. Copyright© 2017 Dennis Barber Ltd.
Multiple linear regression models are often used to predict levels of fecal indicator bacteria (FIB) in recreational swimming waters based on independent variables (IVs) such as meteorologic, hydrodynamic, and water-quality measures. The IVs used for these analyses are traditiona...
Bias and uncertainty of δ13CO2 isotopic mixing models
Zachary E. Kayler; Lisa Ganio; Mark Hauck; Thomas G. Pypker; Elizabeth W. Sulzman; Alan C. Mix; Barbara J. Bond
2009-01-01
The goal of this study was to evaluate how factorial combinations of two mixing models and two regression approaches (Keeling-OLS, MillerâTans-OLS, Keeling-GMR, MillerâTans-GMR) compare in small [CO2] range versus large[CO2] range regimes, with different combinations of...
Prediction of monthly rainfall in Victoria, Australia: Clusterwise linear regression approach
NASA Astrophysics Data System (ADS)
Bagirov, Adil M.; Mahmood, Arshad; Barton, Andrew
2017-05-01
This paper develops the Clusterwise Linear Regression (CLR) technique for prediction of monthly rainfall. The CLR is a combination of clustering and regression techniques. It is formulated as an optimization problem and an incremental algorithm is designed to solve it. The algorithm is applied to predict monthly rainfall in Victoria, Australia using rainfall data with five input meteorological variables over the period of 1889-2014 from eight geographically diverse weather stations. The prediction performance of the CLR method is evaluated by comparing observed and predicted rainfall values using four measures of forecast accuracy. The proposed method is also compared with the CLR using the maximum likelihood framework by the expectation-maximization algorithm, multiple linear regression, artificial neural networks and the support vector machines for regression models using computational results. The results demonstrate that the proposed algorithm outperforms other methods in most locations.
Regression Methods for Categorical Dependent Variables: Effects on a Model of Student College Choice
ERIC Educational Resources Information Center
Rapp, Kelly E.
2012-01-01
The use of categorical dependent variables with the classical linear regression model (CLRM) violates many of the model's assumptions and may result in biased estimates (Long, 1997; O'Connell, Goldstein, Rogers, & Peng, 2008). Many dependent variables of interest to educational researchers (e.g., professorial rank, educational attainment) are…
Yoneoka, Daisuke; Henmi, Masayuki
2017-06-01
Recently, the number of regression models has dramatically increased in several academic fields. However, within the context of meta-analysis, synthesis methods for such models have not been developed in a commensurate trend. One of the difficulties hindering the development is the disparity in sets of covariates among literature models. If the sets of covariates differ across models, interpretation of coefficients will differ, thereby making it difficult to synthesize them. Moreover, previous synthesis methods for regression models, such as multivariate meta-analysis, often have problems because covariance matrix of coefficients (i.e. within-study correlations) or individual patient data are not necessarily available. This study, therefore, proposes a brief explanation regarding a method to synthesize linear regression models under different covariate sets by using a generalized least squares method involving bias correction terms. Especially, we also propose an approach to recover (at most) threecorrelations of covariates, which is required for the calculation of the bias term without individual patient data. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.
Lu, Jun; Li, Li-Ming; He, Ping-Ping; Cao, Wei-Hua; Zhan, Si-Yan; Hu, Yong-Hua
2004-06-01
To introduce the application of mixed linear model in the analysis of secular trend of blood pressure under antihypertensive treatment. A community-based postmarketing surveillance of benazepril was conducted in 1831 essential hypertensive patients (age range from 35 to 88 years) in Shanghai. Data of blood pressure was analyzed every 3 months with mixed linear model to describe the secular trend of blood pressure and changes of age-specific and gender-specific. The changing trends of systolic blood pressure (SBP) and diastolic blood pressure (DBP) were found to fit the curvilinear models. A piecewise model was fit for pulse pressure (PP), i.e., curvilinear model in the first 9 months and linear model after 9 months of taking medication. Both blood pressure and its velocity gradually slowed down. There were significant variation for the curve parameters of intercept, slope, and acceleration. Blood pressure in patients with higher initial levels was persistently declining in the 3-year-treatment. However blood pressures of patients with relatively low initial levels remained low when dropped down to some degree. Elderly patients showed high SBP but low DBP, so as with higher PP. The velocity and sizes of blood pressure reductions increased with the initial level of blood pressure. Mixed linear model is flexible and robust when applied to the analysis of longitudinal data but with missing values and can also make the maximum use of available information.
A model for predicting thermal properties of asphalt mixtures from their constituents
NASA Astrophysics Data System (ADS)
Keller, Merlin; Roche, Alexis; Lavielle, Marc
Numerous theoretical and experimental approaches have been developed to predict the effective thermal conductivity of composite materials such as polymers, foams, epoxies, soils and concrete. None of such models have been applied to asphalt concrete. This study attempts to develop a model to predict the thermal conductivity of asphalt concrete from its constituents that will contribute to the asphalt industry by reducing costs and saving time on laboratory testing. The necessity to do the laboratory testing would be no longer required when a mix for the pavement is created with desired thermal properties at the design stage by selecting correct constituents. This thesis investigated six existing predictive models for applicability to asphalt mixtures, and four standard mathematical techniques were used to develop a regression model to predict the effective thermal conductivity. The effective thermal conductivities of 81 asphalt specimens were used as the response variables, and the thermal conductivities and volume fractions of their constituents were used as the predictors. The conducted statistical analyses showed that the measured values of thermal conductivities of the mixtures are affected by the bitumen and aggregate content, but not by the air content. Contrarily, the predicted data for some investigated models are highly sensitive to air voids, but not to bitumen and/or aggregate content. Additionally, the comparison of the experimental with analytical data showed that none of the existing models gave satisfactory results; on the other hand, two regression models (Exponential 1* and Linear 3*) are promising for asphalt concrete.
Holtschlag, David J.; Shively, Dawn; Whitman, Richard L.; Haack, Sheridan K.; Fogarty, Lisa R.
2008-01-01
Regression analyses and hydrodynamic modeling were used to identify environmental factors and flow paths associated with Escherichia coli (E. coli) concentrations at Memorial and Metropolitan Beaches on Lake St. Clair in Macomb County, Mich. Lake St. Clair is part of the binational waterway between the United States and Canada that connects Lake Huron with Lake Erie in the Great Lakes Basin. Linear regression, regression-tree, and logistic regression models were developed from E. coli concentration and ancillary environmental data. Linear regression models on log10 E. coli concentrations indicated that rainfall prior to sampling, water temperature, and turbidity were positively associated with bacteria concentrations at both beaches. Flow from Clinton River, changes in water levels, wind conditions, and log10 E. coli concentrations 2 days before or after the target bacteria concentrations were statistically significant at one or both beaches. In addition, various interaction terms were significant at Memorial Beach. Linear regression models for both beaches explained only about 30 percent of the variability in log10 E. coli concentrations. Regression-tree models were developed from data from both Memorial and Metropolitan Beaches but were found to have limited predictive capability in this study. The results indicate that too few observations were available to develop reliable regression-tree models. Linear logistic models were developed to estimate the probability of E. coli concentrations exceeding 300 most probable number (MPN) per 100 milliliters (mL). Rainfall amounts before bacteria sampling were positively associated with exceedance probabilities at both beaches. Flow of Clinton River, turbidity, and log10 E. coli concentrations measured before or after the target E. coli measurements were related to exceedances at one or both beaches. The linear logistic models were effective in estimating bacteria exceedances at both beaches. A receiver operating characteristic (ROC) analysis was used to determine cut points for maximizing the true positive rate prediction while minimizing the false positive rate. A two-dimensional hydrodynamic model was developed to simulate horizontal current patterns on Lake St. Clair in response to wind, flow, and water-level conditions at model boundaries. Simulated velocity fields were used to track hypothetical massless particles backward in time from the beaches along flow paths toward source areas. Reverse particle tracking for idealized steady-state conditions shows changes in expected flow paths and traveltimes with wind speeds and directions from 24 sectors. The results indicate that three to four sets of contiguous wind sectors have similar effects on flow paths in the vicinity of the beaches. In addition, reverse particle tracking was used for transient conditions to identify expected flow paths for 10 E. coli sampling events in 2004. These results demonstrate the ability to track hypothetical particles from the beaches, backward in time, to likely source areas. This ability, coupled with a greater frequency of bacteria sampling, may provide insight into changes in bacteria concentrations between source and sink areas.
Bhamidipati, Ravi Kanth; Syed, Muzeeb; Mullangi, Ramesh; Srinivas, Nuggehally
2018-02-01
1. Dalbavancin, a lipoglycopeptide, is approved for treating gram-positive bacterial infections. Area under plasma concentration versus time curve (AUC inf ) of dalbavancin is a key parameter and AUC inf /MIC ratio is a critical pharmacodynamic marker. 2. Using end of intravenous infusion concentration (i.e. C max ) C max versus AUC inf relationship for dalbavancin was established by regression analyses (i.e. linear, log-log, log-linear and power models) using 21 pairs of subject data. 3. The predictions of the AUC inf were performed using published C max data by application of regression equations. The quotient of observed/predicted values rendered fold difference. The mean absolute error (MAE)/root mean square error (RMSE) and correlation coefficient (r) were used in the assessment. 4. MAE and RMSE values for the various models were comparable. The C max versus AUC inf exhibited excellent correlation (r > 0.9488). The internal data evaluation showed narrow confinement (0.84-1.14-fold difference) with a RMSE < 10.3%. The external data evaluation showed that the models predicted AUC inf with a RMSE of 3.02-27.46% with fold difference largely contained within 0.64-1.48. 5. Regardless of the regression models, a single time point strategy of using C max (i.e. end of 30-min infusion) is amenable as a prospective tool for predicting AUC inf of dalbavancin in patients.
A hierarchical linear model for tree height prediction.
Vicente J. Monleon
2003-01-01
Measuring tree height is a time-consuming process. Often, tree diameter is measured and height is estimated from a published regression model. Trees used to develop these models are clustered into stands, but this structure is ignored and independence is assumed. In this study, hierarchical linear models that account explicitly for the clustered structure of the data...
Cruz, Antonio M; Barr, Cameron; Puñales-Pozo, Elsa
2008-01-01
This research's main goals were to build a predictor for a turnaround time (TAT) indicator for estimating its values and use a numerical clustering technique for finding possible causes of undesirable TAT values. The following stages were used: domain understanding, data characterisation and sample reduction and insight characterisation. Building the TAT indicator multiple linear regression predictor and clustering techniques were used for improving corrective maintenance task efficiency in a clinical engineering department (CED). The indicator being studied was turnaround time (TAT). Multiple linear regression was used for building a predictive TAT value model. The variables contributing to such model were clinical engineering department response time (CE(rt), 0.415 positive coefficient), stock service response time (Stock(rt), 0.734 positive coefficient), priority level (0.21 positive coefficient) and service time (0.06 positive coefficient). The regression process showed heavy reliance on Stock(rt), CE(rt) and priority, in that order. Clustering techniques revealed the main causes of high TAT values. This examination has provided a means for analysing current technical service quality and effectiveness. In doing so, it has demonstrated a process for identifying areas and methods of improvement and a model against which to analyse these methods' effectiveness.
Forcing Regression through a Given Point Using Any Familiar Computational Routine.
1983-03-01
a linear model , Y =a + OX + e ( Model I) then adopt the principle of least squares; and use sample data to estimate the unknown parameters, a and 8...has an expected value of zero indicates that the "average" response is considered linear . If c varies widely, Model I, though conceptually correct, may...relationship is linear from the maximum observed x to x - a, then Model II should be used. To pro- ceed with the customary evaluation of Model I would be
Linear and nonlinear models for predicting fish bioconcentration factors for pesticides.
Yuan, Jintao; Xie, Chun; Zhang, Ting; Sun, Jinfang; Yuan, Xuejie; Yu, Shuling; Zhang, Yingbiao; Cao, Yunyuan; Yu, Xingchen; Yang, Xuan; Yao, Wu
2016-08-01
This work is devoted to the applications of the multiple linear regression (MLR), multilayer perceptron neural network (MLP NN) and projection pursuit regression (PPR) to quantitative structure-property relationship analysis of bioconcentration factors (BCFs) of pesticides tested on Bluegill (Lepomis macrochirus). Molecular descriptors of a total of 107 pesticides were calculated with the DRAGON Software and selected by inverse enhanced replacement method. Based on the selected DRAGON descriptors, a linear model was built by MLR, nonlinear models were developed using MLP NN and PPR. The robustness of the obtained models was assessed by cross-validation and external validation using test set. Outliers were also examined and deleted to improve predictive power. Comparative results revealed that PPR achieved the most accurate predictions. This study offers useful models and information for BCF prediction, risk assessment, and pesticide formulation. Copyright © 2016 Elsevier Ltd. All rights reserved.
Alfvén wave interactions in the solar wind
NASA Astrophysics Data System (ADS)
Webb, G. M.; McKenzie, J. F.; Hu, Q.; le Roux, J. A.; Zank, G. P.
2012-11-01
Alfvén wave mixing (interaction) equations used in locally incompressible turbulence transport equations in the solar wind are analyzed from the perspective of linear wave theory. The connection between the wave mixing equations and non-WKB Alfven wave driven wind theories are delineated. We discuss the physical wave energy equation and the canonical wave energy equation for non-WKB Alfven waves and the WKB limit. Variational principles and conservation laws for the linear wave mixing equations for the Heinemann and Olbert non-WKB wind model are obtained. The connection with wave mixing equations used in locally incompressible turbulence transport in the solar wind are discussed.
NASA Astrophysics Data System (ADS)
Leroux, Romain; Chatellier, Ludovic; David, Laurent
2018-01-01
This article is devoted to the estimation of time-resolved particle image velocimetry (TR-PIV) flow fields using a time-resolved point measurements of a voltage signal obtained by hot-film anemometry. A multiple linear regression model is first defined to map the TR-PIV flow fields onto the voltage signal. Due to the high temporal resolution of the signal acquired by the hot-film sensor, the estimates of the TR-PIV flow fields are obtained with a multiple linear regression method called orthonormalized partial least squares regression (OPLSR). Subsequently, this model is incorporated as the observation equation in an ensemble Kalman filter (EnKF) applied on a proper orthogonal decomposition reduced-order model to stabilize it while reducing the effects of the hot-film sensor noise. This method is assessed for the reconstruction of the flow around a NACA0012 airfoil at a Reynolds number of 1000 and an angle of attack of {20}°. Comparisons with multi-time delay-modified linear stochastic estimation show that both the OPLSR and EnKF combined with OPLSR are more accurate as they produce a much lower relative estimation error, and provide a faithful reconstruction of the time evolution of the velocity flow fields.
Orthogonal Projection in Teaching Regression and Financial Mathematics
ERIC Educational Resources Information Center
Kachapova, Farida; Kachapov, Ilias
2010-01-01
Two improvements in teaching linear regression are suggested. The first is to include the population regression model at the beginning of the topic. The second is to use a geometric approach: to interpret the regression estimate as an orthogonal projection and the estimation error as the distance (which is minimized by the projection). Linear…
ERIC Educational Resources Information Center
Tsai, Tien-Lung; Shau, Wen-Yi; Hu, Fu-Chang
2006-01-01
This article generalizes linear path analysis (PA) and simultaneous equations models (SiEM) to deal with mixed responses of different types in a recursive or triangular system. An efficient instrumental variable (IV) method for estimating the structural coefficients of a 2-equation partially recursive generalized path analysis (GPA) model and…
USDA-ARS?s Scientific Manuscript database
Transformations to multiple trait mixed model equations (MME) which are intended to improve computational efficiency in best linear unbiased prediction (BLUP) and restricted maximum likelihood (REML) are described. It is shown that traits that are expected or estimated to have zero residual variance...
Predicting musically induced emotions from physiological inputs: linear and neural network models.
Russo, Frank A; Vempala, Naresh N; Sandstrom, Gillian M
2013-01-01
Listening to music often leads to physiological responses. Do these physiological responses contain sufficient information to infer emotion induced in the listener? The current study explores this question by attempting to predict judgments of "felt" emotion from physiological responses alone using linear and neural network models. We measured five channels of peripheral physiology from 20 participants-heart rate (HR), respiration, galvanic skin response, and activity in corrugator supercilii and zygomaticus major facial muscles. Using valence and arousal (VA) dimensions, participants rated their felt emotion after listening to each of 12 classical music excerpts. After extracting features from the five channels, we examined their correlation with VA ratings, and then performed multiple linear regression to see if a linear relationship between the physiological responses could account for the ratings. Although linear models predicted a significant amount of variance in arousal ratings, they were unable to do so with valence ratings. We then used a neural network to provide a non-linear account of the ratings. The network was trained on the mean ratings of eight of the 12 excerpts and tested on the remainder. Performance of the neural network confirms that physiological responses alone can be used to predict musically induced emotion. The non-linear model derived from the neural network was more accurate than linear models derived from multiple linear regression, particularly along the valence dimension. A secondary analysis allowed us to quantify the relative contributions of inputs to the non-linear model. The study represents a novel approach to understanding the complex relationship between physiological responses and musically induced emotion.
2012-06-15
Maintenance AFSCs ................................................................................................. 14 2. Variation Inflation Factors...total variability in the data. It is an indication of how much of the 20 variation in the data can be accounted for in the regression model. In... Variation Inflation Factors for each independent variable (predictor) as regressed against all of the other independent variables in the model. The
Quantile regression models of animal habitat relationships
Cade, Brian S.
2003-01-01
Typically, all factors that limit an organism are not measured and included in statistical models used to investigate relationships with their environment. If important unmeasured variables interact multiplicatively with the measured variables, the statistical models often will have heterogeneous response distributions with unequal variances. Quantile regression is an approach for estimating the conditional quantiles of a response variable distribution in the linear model, providing a more complete view of possible causal relationships between variables in ecological processes. Chapter 1 introduces quantile regression and discusses the ordering characteristics, interval nature, sampling variation, weighting, and interpretation of estimates for homogeneous and heterogeneous regression models. Chapter 2 evaluates performance of quantile rankscore tests used for hypothesis testing and constructing confidence intervals for linear quantile regression estimates (0 ≤ τ ≤ 1). A permutation F test maintained better Type I errors than the Chi-square T test for models with smaller n, greater number of parameters p, and more extreme quantiles τ. Both versions of the test required weighting to maintain correct Type I errors when there was heterogeneity under the alternative model. An example application related trout densities to stream channel width:depth. Chapter 3 evaluates a drop in dispersion, F-ratio like permutation test for hypothesis testing and constructing confidence intervals for linear quantile regression estimates (0 ≤ τ ≤ 1). Chapter 4 simulates from a large (N = 10,000) finite population representing grid areas on a landscape to demonstrate various forms of hidden bias that might occur when the effect of a measured habitat variable on some animal was confounded with the effect of another unmeasured variable (spatially and not spatially structured). Depending on whether interactions of the measured habitat and unmeasured variable were negative (interference interactions) or positive (facilitation interactions), either upper (τ > 0.5) or lower (τ < 0.5) quantile regression parameters were less biased than mean rate parameters. Sampling (n = 20 - 300) simulations demonstrated that confidence intervals constructed by inverting rankscore tests provided valid coverage of these biased parameters. Quantile regression was used to estimate effects of physical habitat resources on a bivalve mussel (Macomona liliana) in a New Zealand harbor by modeling the spatial trend surface as a cubic polynomial of location coordinates.
Growth and yield in Eucalyptus globulus
James A. Rinehart; Richard B. Standiford
1983-01-01
A study of the major Eucalyptus globulus stands throughout California conducted by Woodbridge Metcalf in 1924 provides a complete and accurate data set for generating variable site-density yield models. Two models were developed using linear regression techniques. Model I depicts a linear relationship between age and yield best used for stands between five and fifteen...
Linear regression models for solvent accessibility prediction in proteins.
Wagner, Michael; Adamczak, Rafał; Porollo, Aleksey; Meller, Jarosław
2005-04-01
The relative solvent accessibility (RSA) of an amino acid residue in a protein structure is a real number that represents the solvent exposed surface area of this residue in relative terms. The problem of predicting the RSA from the primary amino acid sequence can therefore be cast as a regression problem. Nevertheless, RSA prediction has so far typically been cast as a classification problem. Consequently, various machine learning techniques have been used within the classification framework to predict whether a given amino acid exceeds some (arbitrary) RSA threshold and would thus be predicted to be "exposed," as opposed to "buried." We have recently developed novel methods for RSA prediction using nonlinear regression techniques which provide accurate estimates of the real-valued RSA and outperform classification-based approaches with respect to commonly used two-class projections. However, while their performance seems to provide a significant improvement over previously published approaches, these Neural Network (NN) based methods are computationally expensive to train and involve several thousand parameters. In this work, we develop alternative regression models for RSA prediction which are computationally much less expensive, involve orders-of-magnitude fewer parameters, and are still competitive in terms of prediction quality. In particular, we investigate several regression models for RSA prediction using linear L1-support vector regression (SVR) approaches as well as standard linear least squares (LS) regression. Using rigorously derived validation sets of protein structures and extensive cross-validation analysis, we compare the performance of the SVR with that of LS regression and NN-based methods. In particular, we show that the flexibility of the SVR (as encoded by metaparameters such as the error insensitivity and the error penalization terms) can be very beneficial to optimize the prediction accuracy for buried residues. We conclude that the simple and computationally much more efficient linear SVR performs comparably to nonlinear models and thus can be used in order to facilitate further attempts to design more accurate RSA prediction methods, with applications to fold recognition and de novo protein structure prediction methods.
Linear mixing model applied to AVHRR LAC data
NASA Technical Reports Server (NTRS)
Holben, Brent N.; Shimabukuro, Yosio E.
1993-01-01
A linear mixing model was applied to coarse spatial resolution data from the NOAA Advanced Very High Resolution Radiometer. The reflective component of the 3.55 - 3.93 microns channel was extracted and used with the two reflective channels 0.58 - 0.68 microns and 0.725 - 1.1 microns to run a Constraine Least Squares model to generate vegetation, soil, and shade fraction images for an area in the Western region of Brazil. The Landsat Thematic Mapper data covering the Emas National park region was used for estimating the spectral response of the mixture components and for evaluating the mixing model results. The fraction images were compared with an unsupervised classification derived from Landsat TM data acquired on the same day. The relationship between the fraction images and normalized difference vegetation index images show the potential of the unmixing techniques when using coarse resolution data for global studies.
Study on power grid characteristics in summer based on Linear regression analysis
NASA Astrophysics Data System (ADS)
Tang, Jin-hui; Liu, You-fei; Liu, Juan; Liu, Qiang; Liu, Zhuan; Xu, Xi
2018-05-01
The correlation analysis of power load and temperature is the precondition and foundation for accurate load prediction, and a great deal of research has been made. This paper constructed the linear correlation model between temperature and power load, then the correlation of fault maintenance work orders with the power load is researched. Data details of Jiangxi province in 2017 summer such as temperature, power load, fault maintenance work orders were adopted in this paper to develop data analysis and mining. Linear regression models established in this paper will promote electricity load growth forecast, fault repair work order review, distribution network operation weakness analysis and other work to further deepen the refinement.
Drug awareness in adolescents attending a mental health service: analysis of longitudinal data.
Arnau, Jaume; Bono, Roser; Díaz, Rosa; Goti, Javier
2011-11-01
One of the procedures used most recently with longitudinal data is linear mixed models. In the context of health research the increasing number of studies that now use these models bears witness to the growing interest in this type of analysis. This paper describes the application of linear mixed models to a longitudinal study of a sample of Spanish adolescents attending a mental health service, the aim being to investigate their knowledge about the consumption of alcohol and other drugs. More specifically, the main objective was to compare the efficacy of a motivational interviewing programme with a standard approach to drug awareness. The models used to analyse the overall indicator of drug awareness were as follows: (a) unconditional linear growth curve model; (b) growth model with subject-associated variables; and (c) individual curve model with predictive variables. The results showed that awareness increased over time and that the variable 'schooling years' explained part of the between-subjects variation. The effect of motivational interviewing was also significant.
NASA Astrophysics Data System (ADS)
Kamaruddin, Ainur Amira; Ali, Zalila; Noor, Norlida Mohd.; Baharum, Adam; Ahmad, Wan Muhamad Amir W.
2014-07-01
Logistic regression analysis examines the influence of various factors on a dichotomous outcome by estimating the probability of the event's occurrence. Logistic regression, also called a logit model, is a statistical procedure used to model dichotomous outcomes. In the logit model the log odds of the dichotomous outcome is modeled as a linear combination of the predictor variables. The log odds ratio in logistic regression provides a description of the probabilistic relationship of the variables and the outcome. In conducting logistic regression, selection procedures are used in selecting important predictor variables, diagnostics are used to check that assumptions are valid which include independence of errors, linearity in the logit for continuous variables, absence of multicollinearity, and lack of strongly influential outliers and a test statistic is calculated to determine the aptness of the model. This study used the binary logistic regression model to investigate overweight and obesity among rural secondary school students on the basis of their demographics profile, medical history, diet and lifestyle. The results indicate that overweight and obesity of students are influenced by obesity in family and the interaction between a student's ethnicity and routine meals intake. The odds of a student being overweight and obese are higher for a student having a family history of obesity and for a non-Malay student who frequently takes routine meals as compared to a Malay student.
Efficient least angle regression for identification of linear-in-the-parameters models
Beach, Thomas H.; Rezgui, Yacine
2017-01-01
Least angle regression, as a promising model selection method, differentiates itself from conventional stepwise and stagewise methods, in that it is neither too greedy nor too slow. It is closely related to L1 norm optimization, which has the advantage of low prediction variance through sacrificing part of model bias property in order to enhance model generalization capability. In this paper, we propose an efficient least angle regression algorithm for model selection for a large class of linear-in-the-parameters models with the purpose of accelerating the model selection process. The entire algorithm works completely in a recursive manner, where the correlations between model terms and residuals, the evolving directions and other pertinent variables are derived explicitly and updated successively at every subset selection step. The model coefficients are only computed when the algorithm finishes. The direct involvement of matrix inversions is thereby relieved. A detailed computational complexity analysis indicates that the proposed algorithm possesses significant computational efficiency, compared with the original approach where the well-known efficient Cholesky decomposition is involved in solving least angle regression. Three artificial and real-world examples are employed to demonstrate the effectiveness, efficiency and numerical stability of the proposed algorithm. PMID:28293140
NASA Astrophysics Data System (ADS)
Bruno, Delia Evelina; Barca, Emanuele; Goncalves, Rodrigo Mikosz; de Araujo Queiroz, Heithor Alexandre; Berardi, Luigi; Passarella, Giuseppe
2018-01-01
In this paper, the Evolutionary Polynomial Regression data modelling strategy has been applied to study small scale, short-term coastal morphodynamics, given its capability for treating a wide database of known information, non-linearly. Simple linear and multilinear regression models were also applied to achieve a balance between the computational load and reliability of estimations of the three models. In fact, even though it is easy to imagine that the more complex the model, the more the prediction improves, sometimes a "slight" worsening of estimations can be accepted in exchange for the time saved in data organization and computational load. The models' outcomes were validated through a detailed statistical, error analysis, which revealed a slightly better estimation of the polynomial model with respect to the multilinear model, as expected. On the other hand, even though the data organization was identical for the two models, the multilinear one required a simpler simulation setting and a faster run time. Finally, the most reliable evolutionary polynomial regression model was used in order to make some conjecture about the uncertainty increase with the extension of extrapolation time of the estimation. The overlapping rate between the confidence band of the mean of the known coast position and the prediction band of the estimated position can be a good index of the weakness in producing reliable estimations when the extrapolation time increases too much. The proposed models and tests have been applied to a coastal sector located nearby Torre Colimena in the Apulia region, south Italy.
Time and frequency domain analysis of sampled data controllers via mixed operation equations
NASA Technical Reports Server (NTRS)
Frisch, H. P.
1981-01-01
Specification of the mathematical equations required to define the dynamic response of a linear continuous plant, subject to sampled data control, is complicated by the fact that the digital components of the control system cannot be modeled via linear ordinary differential equations. This complication can be overcome by introducing two new mathematical operations; namely, the operation of zero order hold and digial delay. It is shown that by direct utilization of these operations, a set of linear mixed operation equations can be written and used to define the dynamic response characteristics of the controlled system. It also is shown how these linear mixed operation equations lead, in an automatable manner, directly to a set of finite difference equations which are in a format compatible with follow on time and frequency domain analysis methods.
Wang, Yuanjia; Chen, Huaihou
2012-01-01
Summary We examine a generalized F-test of a nonparametric function through penalized splines and a linear mixed effects model representation. With a mixed effects model representation of penalized splines, we imbed the test of an unspecified function into a test of some fixed effects and a variance component in a linear mixed effects model with nuisance variance components under the null. The procedure can be used to test a nonparametric function or varying-coefficient with clustered data, compare two spline functions, test the significance of an unspecified function in an additive model with multiple components, and test a row or a column effect in a two-way analysis of variance model. Through a spectral decomposition of the residual sum of squares, we provide a fast algorithm for computing the null distribution of the test, which significantly improves the computational efficiency over bootstrap. The spectral representation reveals a connection between the likelihood ratio test (LRT) in a multiple variance components model and a single component model. We examine our methods through simulations, where we show that the power of the generalized F-test may be higher than the LRT, depending on the hypothesis of interest and the true model under the alternative. We apply these methods to compute the genome-wide critical value and p-value of a genetic association test in a genome-wide association study (GWAS), where the usual bootstrap is computationally intensive (up to 108 simulations) and asymptotic approximation may be unreliable and conservative. PMID:23020801
Wang, Yuanjia; Chen, Huaihou
2012-12-01
We examine a generalized F-test of a nonparametric function through penalized splines and a linear mixed effects model representation. With a mixed effects model representation of penalized splines, we imbed the test of an unspecified function into a test of some fixed effects and a variance component in a linear mixed effects model with nuisance variance components under the null. The procedure can be used to test a nonparametric function or varying-coefficient with clustered data, compare two spline functions, test the significance of an unspecified function in an additive model with multiple components, and test a row or a column effect in a two-way analysis of variance model. Through a spectral decomposition of the residual sum of squares, we provide a fast algorithm for computing the null distribution of the test, which significantly improves the computational efficiency over bootstrap. The spectral representation reveals a connection between the likelihood ratio test (LRT) in a multiple variance components model and a single component model. We examine our methods through simulations, where we show that the power of the generalized F-test may be higher than the LRT, depending on the hypothesis of interest and the true model under the alternative. We apply these methods to compute the genome-wide critical value and p-value of a genetic association test in a genome-wide association study (GWAS), where the usual bootstrap is computationally intensive (up to 10(8) simulations) and asymptotic approximation may be unreliable and conservative. © 2012, The International Biometric Society.
Semiparametric regression during 2003–2007*
Ruppert, David; Wand, M.P.; Carroll, Raymond J.
2010-01-01
Semiparametric regression is a fusion between parametric regression and nonparametric regression that integrates low-rank penalized splines, mixed model and hierarchical Bayesian methodology – thus allowing more streamlined handling of longitudinal and spatial correlation. We review progress in the field over the five-year period between 2003 and 2007. We find semiparametric regression to be a vibrant field with substantial involvement and activity, continual enhancement and widespread application. PMID:20305800
Flow of nanofluid past a Riga plate
NASA Astrophysics Data System (ADS)
Ahmad, Adeel; Asghar, Saleem; Afzal, Sumaira
2016-03-01
This paper studies the mixed convection boundary layer flow of a nanofluid past a vertical Riga plate in the presence of strong suction. The mathematical model incorporates the Brownian motion and thermophoresis effects due to nanofluid and the Grinberg-term for the wall parallel Lorentz force due to Riga plate. The analytical solution of the problem is presented using the perturbation method for small Brownian and thermophoresis diffusion parameters. The numerical solution is also presented to ensure the reliability of the asymptotic method. The comparison of the two solutions shows an excellent agreement. The correlation expressions for skin friction, Nusselt number and Sherwood number are developed by performing linear regression on the obtained numerical data. The effects of nanofluid and the Lorentz force due to Riga plate, on the skin friction are discussed.
Wankel, Scott D.; Kendall, Carol; Paytan, Adina
2009-01-01
Nitrate (NO-3 concentrations and dual isotopic composition (??15N and ??18O) were measured during various seasons and tidal conditions in Elkhorn Slough to evaluate mixing of sources of NO-3 within this California estuary. We found the isotopic composition of NO-3 was influenced most heavily by mixing of two primary sources with unique isotopic signatures, a marine (Monterey Bay) and terrestrial agricultural runoff source (Old Salinas River). However, our attempt to use a simple two end-member mixing model to calculate the relative contribution of these two NO-3 sources to the Slough was complicated by periods of nonconservative behavior and/or the presence of additional sources, particularly during the dry season when NO-3 concentrations were low. Although multiple linear regression generally yielded good fits to the observed data, deviations from conservative mixing were still evident. After consideration of potential alternative sources, we concluded that deviations from two end-member mixing were most likely derived from interactions with marsh sediments in regions of the Slough where high rates of NO-3 uptake and nitrification result in NO-3 with low ?? 15N and high ??18O values. A simple steady state dual isotope model is used to illustrate the impact of cycling processes in an estuarine setting which may play a primary role in controlling NO -3 isotopic composition when and where cycling rates and water residence times are high. This work expands our understanding of nitrogen and oxygen isotopes as biogeochemical tools for investigating NO -3 sources and cycling in estuaries, emphasizing the role that cycling processes may play in altering isotopic composition. Copyright 2009 by the American Geophysical Union.
Linear regression analysis: part 14 of a series on evaluation of scientific publications.
Schneider, Astrid; Hommel, Gerhard; Blettner, Maria
2010-11-01
Regression analysis is an important statistical method for the analysis of medical data. It enables the identification and characterization of relationships among multiple factors. It also enables the identification of prognostically relevant risk factors and the calculation of risk scores for individual prognostication. This article is based on selected textbooks of statistics, a selective review of the literature, and our own experience. After a brief introduction of the uni- and multivariable regression models, illustrative examples are given to explain what the important considerations are before a regression analysis is performed, and how the results should be interpreted. The reader should then be able to judge whether the method has been used correctly and interpret the results appropriately. The performance and interpretation of linear regression analysis are subject to a variety of pitfalls, which are discussed here in detail. The reader is made aware of common errors of interpretation through practical examples. Both the opportunities for applying linear regression analysis and its limitations are presented.
Linear Modeling and Evaluation of Controls on Flow Response in Western Post-Fire Watersheds
NASA Astrophysics Data System (ADS)
Saxe, S.; Hogue, T. S.; Hay, L.
2015-12-01
This research investigates the impact of wildfires on watershed flow regimes throughout the western United States, specifically focusing on evaluation of fire events within specified subregions and determination of the impact of climate and geophysical variables in post-fire flow response. Fire events were collected through federal and state-level databases and streamflow data were collected from U.S. Geological Survey stream gages. 263 watersheds were identified with at least 10 years of continuous pre-fire daily streamflow records and 5 years of continuous post-fire daily flow records. For each watershed, percent changes in runoff ratio (RO), annual seven day low-flows (7Q2) and annual seven day high-flows (7Q10) were calculated from pre- to post-fire. Numerous independent variables were identified for each watershed and fire event, including topographic, land cover, climate, burn severity, and soils data. The national watersheds were divided into five regions through K-clustering and a lasso linear regression model, applying the Leave-One-Out calibration method, was calculated for each region. Nash-Sutcliffe Efficiency (NSE) was used to determine the accuracy of the resulting models. The regions encompassing the United States along and west of the Rocky Mountains, excluding the coastal watersheds, produced the most accurate linear models. The Pacific coast region models produced poor and inconsistent results, indicating that the regions need to be further subdivided. Presently, RO and HF response variables appear to be more easily modeled than LF. Results of linear regression modeling showed varying importance of watershed and fire event variables, with conflicting correlation between land cover types and soil types by region. The addition of further independent variables and constriction of current variables based on correlation indicators is ongoing and should allow for more accurate linear regression modeling.
Zhao, Xin; Han, Meng; Ding, Lili; Calin, Adrian Cantemir
2018-01-01
The accurate forecast of carbon dioxide emissions is critical for policy makers to take proper measures to establish a low carbon society. This paper discusses a hybrid of the mixed data sampling (MIDAS) regression model and BP (back propagation) neural network (MIDAS-BP model) to forecast carbon dioxide emissions. Such analysis uses mixed frequency data to study the effects of quarterly economic growth on annual carbon dioxide emissions. The forecasting ability of MIDAS-BP is remarkably better than MIDAS, ordinary least square (OLS), polynomial distributed lags (PDL), autoregressive distributed lags (ADL), and auto-regressive moving average (ARMA) models. The MIDAS-BP model is suitable for forecasting carbon dioxide emissions for both the short and longer term. This research is expected to influence the methodology for forecasting carbon dioxide emissions by improving the forecast accuracy. Empirical results show that economic growth has both negative and positive effects on carbon dioxide emissions that last 15 quarters. Carbon dioxide emissions are also affected by their own change within 3 years. Therefore, there is a need for policy makers to explore an alternative way to develop the economy, especially applying new energy policies to establish a low carbon society.
NASA Astrophysics Data System (ADS)
Huang, Wen Deng; Chen, Guang De; Yuan, Zhao Lin; Yang, Chuang Hua; Ye, Hong Gang; Wu, Ye Long
2016-02-01
The theoretical investigations of the interface optical phonons, electron-phonon couplings and its ternary mixed effects in zinc-blende spherical quantum dots are obtained by using the dielectric continuum model and modified random-element isodisplacement model. The features of dispersion curves, electron-phonon coupling strengths, and its ternary mixed effects for interface optical phonons in a single zinc-blende GaN/AlxGa1-xN spherical quantum dot are calculated and discussed in detail. The numerical results show that there are three branches of interface optical phonons. One branch exists in low frequency region; another two branches exist in high frequency region. The interface optical phonons with small quantum number l have more important contributions to the electron-phonon interactions. It is also found that ternary mixed effects have important influences on the interface optical phonon properties in a single zinc-blende GaN/AlxGa1-xN quantum dot. With the increase of Al component, the interface optical phonon frequencies appear linear changes, and the electron-phonon coupling strengths appear non-linear changes in high frequency region. But in low frequency region, the frequencies appear non-linear changes, and the electron-phonon coupling strengths appear linear changes.
Heteroscedasticity as a Basis of Direction Dependence in Reversible Linear Regression Models.
Wiedermann, Wolfgang; Artner, Richard; von Eye, Alexander
2017-01-01
Heteroscedasticity is a well-known issue in linear regression modeling. When heteroscedasticity is observed, researchers are advised to remedy possible model misspecification of the explanatory part of the model (e.g., considering alternative functional forms and/or omitted variables). The present contribution discusses another source of heteroscedasticity in observational data: Directional model misspecifications in the case of nonnormal variables. Directional misspecification refers to situations where alternative models are equally likely to explain the data-generating process (e.g., x → y versus y → x). It is shown that the homoscedasticity assumption is likely to be violated in models that erroneously treat true nonnormal predictors as response variables. Recently, Direction Dependence Analysis (DDA) has been proposed as a framework to empirically evaluate the direction of effects in linear models. The present study links the phenomenon of heteroscedasticity with DDA and describes visual diagnostics and nine homoscedasticity tests that can be used to make decisions concerning the direction of effects in linear models. Results of a Monte Carlo simulation that demonstrate the adequacy of the approach are presented. An empirical example is provided, and applicability of the methodology in cases of violated assumptions is discussed.