Boligon, A A; Baldi, F; Mercadante, M E Z; Lobo, R B; Pereira, R J; Albuquerque, L G
2011-06-28
We quantified the potential increase in accuracy of expected breeding value for weights of Nelore cattle, from birth to mature age, using multi-trait and random regression models on Legendre polynomials and B-spline functions. A total of 87,712 weight records from 8144 females were used, recorded every three months from birth to mature age from the Nelore Brazil Program. For random regression analyses, all female weight records from birth to eight years of age (data set I) were considered. From this general data set, a subset was created (data set II), which included only nine weight records: at birth, weaning, 365 and 550 days of age, and 2, 3, 4, 5, and 6 years of age. Data set II was analyzed using random regression and multi-trait models. The model of analysis included the contemporary group as fixed effects and age of dam as a linear and quadratic covariable. In the random regression analyses, average growth trends were modeled using a cubic regression on orthogonal polynomials of age. Residual variances were modeled by a step function with five classes. Legendre polynomials of fourth and sixth order were utilized to model the direct genetic and animal permanent environmental effects, respectively, while third-order Legendre polynomials were considered for maternal genetic and maternal permanent environmental effects. Quadratic polynomials were applied to model all random effects in random regression models on B-spline functions. Direct genetic and animal permanent environmental effects were modeled using three segments or five coefficients, and genetic maternal and maternal permanent environmental effects were modeled with one segment or three coefficients in the random regression models on B-spline functions. For both data sets (I and II), animals ranked differently according to expected breeding value obtained by random regression or multi-trait models. With random regression models, the highest gains in accuracy were obtained at ages with a low number of weight records. The results indicate that random regression models provide more accurate expected breeding values than the traditionally finite multi-trait models. Thus, higher genetic responses are expected for beef cattle growth traits by replacing a multi-trait model with random regression models for genetic evaluation. B-spline functions could be applied as an alternative to Legendre polynomials to model covariance functions for weights from birth to mature age.
Prediction models for clustered data: comparison of a random intercept and standard regression model
2013-01-01
Background When study data are clustered, standard regression analysis is considered inappropriate and analytical techniques for clustered data need to be used. For prediction research in which the interest of predictor effects is on the patient level, random effect regression models are probably preferred over standard regression analysis. It is well known that the random effect parameter estimates and the standard logistic regression parameter estimates are different. Here, we compared random effect and standard logistic regression models for their ability to provide accurate predictions. Methods Using an empirical study on 1642 surgical patients at risk of postoperative nausea and vomiting, who were treated by one of 19 anesthesiologists (clusters), we developed prognostic models either with standard or random intercept logistic regression. External validity of these models was assessed in new patients from other anesthesiologists. We supported our results with simulation studies using intra-class correlation coefficients (ICC) of 5%, 15%, or 30%. Standard performance measures and measures adapted for the clustered data structure were estimated. Results The model developed with random effect analysis showed better discrimination than the standard approach, if the cluster effects were used for risk prediction (standard c-index of 0.69 versus 0.66). In the external validation set, both models showed similar discrimination (standard c-index 0.68 versus 0.67). The simulation study confirmed these results. For datasets with a high ICC (≥15%), model calibration was only adequate in external subjects, if the used performance measure assumed the same data structure as the model development method: standard calibration measures showed good calibration for the standard developed model, calibration measures adapting the clustered data structure showed good calibration for the prediction model with random intercept. Conclusion The models with random intercept discriminate better than the standard model only if the cluster effect is used for predictions. The prediction model with random intercept had good calibration within clusters. PMID:23414436
Bouwmeester, Walter; Twisk, Jos W R; Kappen, Teus H; van Klei, Wilton A; Moons, Karel G M; Vergouwe, Yvonne
2013-02-15
When study data are clustered, standard regression analysis is considered inappropriate and analytical techniques for clustered data need to be used. For prediction research in which the interest of predictor effects is on the patient level, random effect regression models are probably preferred over standard regression analysis. It is well known that the random effect parameter estimates and the standard logistic regression parameter estimates are different. Here, we compared random effect and standard logistic regression models for their ability to provide accurate predictions. Using an empirical study on 1642 surgical patients at risk of postoperative nausea and vomiting, who were treated by one of 19 anesthesiologists (clusters), we developed prognostic models either with standard or random intercept logistic regression. External validity of these models was assessed in new patients from other anesthesiologists. We supported our results with simulation studies using intra-class correlation coefficients (ICC) of 5%, 15%, or 30%. Standard performance measures and measures adapted for the clustered data structure were estimated. The model developed with random effect analysis showed better discrimination than the standard approach, if the cluster effects were used for risk prediction (standard c-index of 0.69 versus 0.66). In the external validation set, both models showed similar discrimination (standard c-index 0.68 versus 0.67). The simulation study confirmed these results. For datasets with a high ICC (≥15%), model calibration was only adequate in external subjects, if the used performance measure assumed the same data structure as the model development method: standard calibration measures showed good calibration for the standard developed model, calibration measures adapting the clustered data structure showed good calibration for the prediction model with random intercept. The models with random intercept discriminate better than the standard model only if the cluster effect is used for predictions. The prediction model with random intercept had good calibration within clusters.
Baldi, F; Alencar, M M; Albuquerque, L G
2010-12-01
The objective of this work was to estimate covariance functions using random regression models on B-splines functions of animal age, for weights from birth to adult age in Canchim cattle. Data comprised 49,011 records on 2435 females. The model of analysis included fixed effects of contemporary groups, age of dam as quadratic covariable and the population mean trend taken into account by a cubic regression on orthogonal polynomials of animal age. Residual variances were modelled through a step function with four classes. The direct and maternal additive genetic effects, and animal and maternal permanent environmental effects were included as random effects in the model. A total of seventeen analyses, considering linear, quadratic and cubic B-splines functions and up to seven knots, were carried out. B-spline functions of the same order were considered for all random effects. Random regression models on B-splines functions were compared to a random regression model on Legendre polynomials and with a multitrait model. Results from different models of analyses were compared using the REML form of the Akaike Information criterion and Schwarz' Bayesian Information criterion. In addition, the variance components and genetic parameters estimated for each random regression model were also used as criteria to choose the most adequate model to describe the covariance structure of the data. A model fitting quadratic B-splines, with four knots or three segments for direct additive genetic effect and animal permanent environmental effect and two knots for maternal additive genetic effect and maternal permanent environmental effect, was the most adequate to describe the covariance structure of the data. Random regression models using B-spline functions as base functions fitted the data better than Legendre polynomials, especially at mature ages, but higher number of parameters need to be estimated with B-splines functions. © 2010 Blackwell Verlag GmbH.
2014-01-01
Background Meta-regression is becoming increasingly used to model study level covariate effects. However this type of statistical analysis presents many difficulties and challenges. Here two methods for calculating confidence intervals for the magnitude of the residual between-study variance in random effects meta-regression models are developed. A further suggestion for calculating credible intervals using informative prior distributions for the residual between-study variance is presented. Methods Two recently proposed and, under the assumptions of the random effects model, exact methods for constructing confidence intervals for the between-study variance in random effects meta-analyses are extended to the meta-regression setting. The use of Generalised Cochran heterogeneity statistics is extended to the meta-regression setting and a Newton-Raphson procedure is developed to implement the Q profile method for meta-analysis and meta-regression. WinBUGS is used to implement informative priors for the residual between-study variance in the context of Bayesian meta-regressions. Results Results are obtained for two contrasting examples, where the first example involves a binary covariate and the second involves a continuous covariate. Intervals for the residual between-study variance are wide for both examples. Conclusions Statistical methods, and R computer software, are available to compute exact confidence intervals for the residual between-study variance under the random effects model for meta-regression. These frequentist methods are almost as easily implemented as their established counterparts for meta-analysis. Bayesian meta-regressions are also easily performed by analysts who are comfortable using WinBUGS. Estimates of the residual between-study variance in random effects meta-regressions should be routinely reported and accompanied by some measure of their uncertainty. Confidence and/or credible intervals are well-suited to this purpose. PMID:25196829
Wang, Wei; Griswold, Michael E
2016-11-30
The random effect Tobit model is a regression model that accommodates both left- and/or right-censoring and within-cluster dependence of the outcome variable. Regression coefficients of random effect Tobit models have conditional interpretations on a constructed latent dependent variable and do not provide inference of overall exposure effects on the original outcome scale. Marginalized random effects model (MREM) permits likelihood-based estimation of marginal mean parameters for the clustered data. For random effect Tobit models, we extend the MREM to marginalize over both the random effects and the normal space and boundary components of the censored response to estimate overall exposure effects at population level. We also extend the 'Average Predicted Value' method to estimate the model-predicted marginal means for each person under different exposure status in a designated reference group by integrating over the random effects and then use the calculated difference to assess the overall exposure effect. The maximum likelihood estimation is proposed utilizing a quasi-Newton optimization algorithm with Gauss-Hermite quadrature to approximate the integration of the random effects. We use these methods to carefully analyze two real datasets. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.
Testing homogeneity in Weibull-regression models.
Bolfarine, Heleno; Valença, Dione M
2005-10-01
In survival studies with families or geographical units it may be of interest testing whether such groups are homogeneous for given explanatory variables. In this paper we consider score type tests for group homogeneity based on a mixing model in which the group effect is modelled as a random variable. As opposed to hazard-based frailty models, this model presents survival times that conditioned on the random effect, has an accelerated failure time representation. The test statistics requires only estimation of the conventional regression model without the random effect and does not require specifying the distribution of the random effect. The tests are derived for a Weibull regression model and in the uncensored situation, a closed form is obtained for the test statistic. A simulation study is used for comparing the power of the tests. The proposed tests are applied to real data sets with censored data.
Covariance functions for body weight from birth to maturity in Nellore cows.
Boligon, A A; Mercadante, M E Z; Forni, S; Lôbo, R B; Albuquerque, L G
2010-03-01
The objective of this study was to estimate (co)variance functions using random regression models on Legendre polynomials for the analysis of repeated measures of BW from birth to adult age. A total of 82,064 records from 8,145 females were analyzed. Different models were compared. The models included additive direct and maternal effects, and animal and maternal permanent environmental effects as random terms. Contemporary group and dam age at calving (linear and quadratic effect) were included as fixed effects, and orthogonal Legendre polynomials of animal age (cubic regression) were considered as random covariables. Eight models with polynomials of third to sixth order were used to describe additive direct and maternal effects, and animal and maternal permanent environmental effects. Residual effects were modeled using 1 (i.e., assuming homogeneity of variances across all ages) or 5 age classes. The model with 5 classes was the best to describe the trajectory of residuals along the growth curve. The model including fourth- and sixth-order polynomials for additive direct and animal permanent environmental effects, respectively, and third-order polynomials for maternal genetic and maternal permanent environmental effects were the best. Estimates of (co)variance obtained with the multi-trait and random regression models were similar. Direct heritability estimates obtained with the random regression models followed a trend similar to that obtained with the multi-trait model. The largest estimates of maternal heritability were those of BW taken close to 240 d of age. In general, estimates of correlation between BW from birth to 8 yr of age decreased with increasing distance between ages.
Analyzing degradation data with a random effects spline regression model
Fugate, Michael Lynn; Hamada, Michael Scott; Weaver, Brian Phillip
2017-03-17
This study proposes using a random effects spline regression model to analyze degradation data. Spline regression avoids having to specify a parametric function for the true degradation of an item. A distribution for the spline regression coefficients captures the variation of the true degradation curves from item to item. We illustrate the proposed methodology with a real example using a Bayesian approach. The Bayesian approach allows prediction of degradation of a population over time and estimation of reliability is easy to perform.
Analyzing degradation data with a random effects spline regression model
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fugate, Michael Lynn; Hamada, Michael Scott; Weaver, Brian Phillip
This study proposes using a random effects spline regression model to analyze degradation data. Spline regression avoids having to specify a parametric function for the true degradation of an item. A distribution for the spline regression coefficients captures the variation of the true degradation curves from item to item. We illustrate the proposed methodology with a real example using a Bayesian approach. The Bayesian approach allows prediction of degradation of a population over time and estimation of reliability is easy to perform.
Bignardi, A B; El Faro, L; Cardoso, V L; Machado, P F; Albuquerque, L G
2009-09-01
The objective of the present study was to estimate milk yield genetic parameters applying random regression models and parametric correlation functions combined with a variance function to model animal permanent environmental effects. A total of 152,145 test-day milk yields from 7,317 first lactations of Holstein cows belonging to herds located in the southeastern region of Brazil were analyzed. Test-day milk yields were divided into 44 weekly classes of days in milk. Contemporary groups were defined by herd-test-day comprising a total of 2,539 classes. The model included direct additive genetic, permanent environmental, and residual random effects. The following fixed effects were considered: contemporary group, age of cow at calving (linear and quadratic regressions), and the population average lactation curve modeled by fourth-order orthogonal Legendre polynomial. Additive genetic effects were modeled by random regression on orthogonal Legendre polynomials of days in milk, whereas permanent environmental effects were estimated using a stationary or nonstationary parametric correlation function combined with a variance function of different orders. The structure of residual variances was modeled using a step function containing 6 variance classes. The genetic parameter estimates obtained with the model using a stationary correlation function associated with a variance function to model permanent environmental effects were similar to those obtained with models employing orthogonal Legendre polynomials for the same effect. A model using a sixth-order polynomial for additive effects and a stationary parametric correlation function associated with a seventh-order variance function to model permanent environmental effects would be sufficient for data fitting.
Rovadoscki, Gregori A; Petrini, Juliana; Ramirez-Diaz, Johanna; Pertile, Simone F N; Pertille, Fábio; Salvian, Mayara; Iung, Laiza H S; Rodriguez, Mary Ana P; Zampar, Aline; Gaya, Leila G; Carvalho, Rachel S B; Coelho, Antonio A D; Savino, Vicente J M; Coutinho, Luiz L; Mourão, Gerson B
2016-09-01
Repeated measures from the same individual have been analyzed by using repeatability and finite dimension models under univariate or multivariate analyses. However, in the last decade, the use of random regression models for genetic studies with longitudinal data have become more common. Thus, the aim of this research was to estimate genetic parameters for body weight of four experimental chicken lines by using univariate random regression models. Body weight data from hatching to 84 days of age (n = 34,730) from four experimental free-range chicken lines (7P, Caipirão da ESALQ, Caipirinha da ESALQ and Carijó Barbado) were used. The analysis model included the fixed effects of contemporary group (gender and rearing system), fixed regression coefficients for age at measurement, and random regression coefficients for permanent environmental effects and additive genetic effects. Heterogeneous variances for residual effects were considered, and one residual variance was assigned for each of six subclasses of age at measurement. Random regression curves were modeled by using Legendre polynomials of the second and third orders, with the best model chosen based on the Akaike Information Criterion, Bayesian Information Criterion, and restricted maximum likelihood. Multivariate analyses under the same animal mixed model were also performed for the validation of the random regression models. The Legendre polynomials of second order were better for describing the growth curves of the lines studied. Moderate to high heritabilities (h(2) = 0.15 to 0.98) were estimated for body weight between one and 84 days of age, suggesting that selection for body weight at all ages can be used as a selection criteria. Genetic correlations among body weight records obtained through multivariate analyses ranged from 0.18 to 0.96, 0.12 to 0.89, 0.06 to 0.96, and 0.28 to 0.96 in 7P, Caipirão da ESALQ, Caipirinha da ESALQ, and Carijó Barbado chicken lines, respectively. Results indicate that genetic gain for body weight can be achieved by selection. Also, selection for body weight at 42 days of age can be maintained as a selection criterion. © 2016 Poultry Science Association Inc.
Bohmanova, J; Miglior, F; Jamrozik, J; Misztal, I; Sullivan, P G
2008-09-01
A random regression model with both random and fixed regressions fitted by Legendre polynomials of order 4 was compared with 3 alternative models fitting linear splines with 4, 5, or 6 knots. The effects common for all models were a herd-test-date effect, fixed regressions on days in milk (DIM) nested within region-age-season of calving class, and random regressions for additive genetic and permanent environmental effects. Data were test-day milk, fat and protein yields, and SCS recorded from 5 to 365 DIM during the first 3 lactations of Canadian Holstein cows. A random sample of 50 herds consisting of 96,756 test-day records was generated to estimate variance components within a Bayesian framework via Gibbs sampling. Two sets of genetic evaluations were subsequently carried out to investigate performance of the 4 models. Models were compared by graphical inspection of variance functions, goodness of fit, error of prediction of breeding values, and stability of estimated breeding values. Models with splines gave lower estimates of variances at extremes of lactations than the model with Legendre polynomials. Differences among models in goodness of fit measured by percentages of squared bias, correlations between predicted and observed records, and residual variances were small. The deviance information criterion favored the spline model with 6 knots. Smaller error of prediction and higher stability of estimated breeding values were achieved by using spline models with 5 and 6 knots compared with the model with Legendre polynomials. In general, the spline model with 6 knots had the best overall performance based upon the considered model comparison criteria.
Multilevel covariance regression with correlated random effects in the mean and variance structure.
Quintero, Adrian; Lesaffre, Emmanuel
2017-09-01
Multivariate regression methods generally assume a constant covariance matrix for the observations. In case a heteroscedastic model is needed, the parametric and nonparametric covariance regression approaches can be restrictive in the literature. We propose a multilevel regression model for the mean and covariance structure, including random intercepts in both components and allowing for correlation between them. The implied conditional covariance function can be different across clusters as a result of the random effect in the variance structure. In addition, allowing for correlation between the random intercepts in the mean and covariance makes the model convenient for skewedly distributed responses. Furthermore, it permits us to analyse directly the relation between the mean response level and the variability in each cluster. Parameter estimation is carried out via Gibbs sampling. We compare the performance of our model to other covariance modelling approaches in a simulation study. Finally, the proposed model is applied to the RN4CAST dataset to identify the variables that impact burnout of nurses in Belgium. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Selapa, N W; Nephawe, K A; Maiwashe, A; Norris, D
2012-02-08
The aim of this study was to estimate genetic parameters for body weights of individually fed beef bulls measured at centralized testing stations in South Africa using random regression models. Weekly body weights of Bonsmara bulls (N = 2919) tested between 1999 and 2003 were available for the analyses. The model included a fixed regression of the body weights on fourth-order orthogonal Legendre polynomials of the actual days on test (7, 14, 21, 28, 35, 42, 49, 56, 63, 70, 77, and 84) for starting age and contemporary group effects. Random regressions on fourth-order orthogonal Legendre polynomials of the actual days on test were included for additive genetic effects and additional uncorrelated random effects of the weaning-herd-year and the permanent environment of the animal. Residual effects were assumed to be independently distributed with heterogeneous variance for each test day. Variance ratios for additive genetic, permanent environment and weaning-herd-year for weekly body weights at different test days ranged from 0.26 to 0.29, 0.37 to 0.44 and 0.26 to 0.34, respectively. The weaning-herd-year was found to have a significant effect on the variation of body weights of bulls despite a 28-day adjustment period. Genetic correlations amongst body weights at different test days were high, ranging from 0.89 to 1.00. Heritability estimates were comparable to literature using multivariate models. Therefore, random regression model could be applied in the genetic evaluation of body weight of individually fed beef bulls in South Africa.
Solving large test-day models by iteration on data and preconditioned conjugate gradient.
Lidauer, M; Strandén, I; Mäntysaari, E A; Pösö, J; Kettunen, A
1999-12-01
A preconditioned conjugate gradient method was implemented into an iteration on a program for data estimation of breeding values, and its convergence characteristics were studied. An algorithm was used as a reference in which one fixed effect was solved by Gauss-Seidel method, and other effects were solved by a second-order Jacobi method. Implementation of the preconditioned conjugate gradient required storing four vectors (size equal to number of unknowns in the mixed model equations) in random access memory and reading the data at each round of iteration. The preconditioner comprised diagonal blocks of the coefficient matrix. Comparison of algorithms was based on solutions of mixed model equations obtained by a single-trait animal model and a single-trait, random regression test-day model. Data sets for both models used milk yield records of primiparous Finnish dairy cows. Animal model data comprised 665,629 lactation milk yields and random regression test-day model data of 6,732,765 test-day milk yields. Both models included pedigree information of 1,099,622 animals. The animal model ¿random regression test-day model¿ required 122 ¿305¿ rounds of iteration to converge with the reference algorithm, but only 88 ¿149¿ were required with the preconditioned conjugate gradient. To solve the random regression test-day model with the preconditioned conjugate gradient required 237 megabytes of random access memory and took 14% of the computation time needed by the reference algorithm.
[How to fit and interpret multilevel models using SPSS].
Pardo, Antonio; Ruiz, Miguel A; San Martín, Rafael
2007-05-01
Hierarchic or multilevel models are used to analyse data when cases belong to known groups and sample units are selected both from the individual level and from the group level. In this work, the multilevel models most commonly discussed in the statistic literature are described, explaining how to fit these models using the SPSS program (any version as of the 11 th ) and how to interpret the outcomes of the analysis. Five particular models are described, fitted, and interpreted: (1) one-way analysis of variance with random effects, (2) regression analysis with means-as-outcomes, (3) one-way analysis of covariance with random effects, (4) regression analysis with random coefficients, and (5) regression analysis with means- and slopes-as-outcomes. All models are explained, trying to make them understandable to researchers in health and behaviour sciences.
Random regression models using different functions to model milk flow in dairy cows.
Laureano, M M M; Bignardi, A B; El Faro, L; Cardoso, V L; Tonhati, H; Albuquerque, L G
2014-09-12
We analyzed 75,555 test-day milk flow records from 2175 primiparous Holstein cows that calved between 1997 and 2005. Milk flow was obtained by dividing the mean milk yield (kg) of the 3 daily milking by the total milking time (min) and was expressed as kg/min. Milk flow was grouped into 43 weekly classes. The analyses were performed using a single-trait Random Regression Models that included direct additive genetic, permanent environmental, and residual random effects. In addition, the contemporary group and linear and quadratic effects of cow age at calving were included as fixed effects. Fourth-order orthogonal Legendre polynomial of days in milk was used to model the mean trend in milk flow. The additive genetic and permanent environmental covariance functions were estimated using random regression Legendre polynomials and B-spline functions of days in milk. The model using a third-order Legendre polynomial for additive genetic effects and a sixth-order polynomial for permanent environmental effects, which contained 7 residual classes, proved to be the most adequate to describe variations in milk flow, and was also the most parsimonious. The heritability in milk flow estimated by the most parsimonious model was of moderate to high magnitude.
Kim, Yoonsang; Choi, Young-Ku; Emery, Sherry
2013-08-01
Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods' performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages-SAS GLIMMIX Laplace and SuperMix Gaussian quadrature-perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes.
Kim, Yoonsang; Emery, Sherry
2013-01-01
Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods’ performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages—SAS GLIMMIX Laplace and SuperMix Gaussian quadrature—perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes. PMID:24288415
Random regression analyses using B-spline functions to model growth of Nellore cattle.
Boligon, A A; Mercadante, M E Z; Lôbo, R B; Baldi, F; Albuquerque, L G
2012-02-01
The objective of this study was to estimate (co)variance components using random regression on B-spline functions to weight records obtained from birth to adulthood. A total of 82 064 weight records of 8145 females obtained from the data bank of the Nellore Breeding Program (PMGRN/Nellore Brazil) which started in 1987, were used. The models included direct additive and maternal genetic effects and animal and maternal permanent environmental effects as random. Contemporary group and dam age at calving (linear and quadratic effect) were included as fixed effects, and orthogonal Legendre polynomials of age (cubic regression) were considered as random covariate. The random effects were modeled using B-spline functions considering linear, quadratic and cubic polynomials for each individual segment. Residual variances were grouped in five age classes. Direct additive genetic and animal permanent environmental effects were modeled using up to seven knots (six segments). A single segment with two knots at the end points of the curve was used for the estimation of maternal genetic and maternal permanent environmental effects. A total of 15 models were studied, with the number of parameters ranging from 17 to 81. The models that used B-splines were compared with multi-trait analyses with nine weight traits and to a random regression model that used orthogonal Legendre polynomials. A model fitting quadratic B-splines, with four knots or three segments for direct additive genetic effect and animal permanent environmental effect and two knots for maternal additive genetic effect and maternal permanent environmental effect, was the most appropriate and parsimonious model to describe the covariance structure of the data. Selection for higher weight, such as at young ages, should be performed taking into account an increase in mature cow weight. Particularly, this is important in most of Nellore beef cattle production systems, where the cow herd is maintained on range conditions. There is limited modification of the growth curve of Nellore cattle with respect to the aim of selecting them for rapid growth at young ages while maintaining constant adult weight.
Probability Theory Plus Noise: Descriptive Estimation and Inferential Judgment.
Costello, Fintan; Watts, Paul
2018-01-01
We describe a computational model of two central aspects of people's probabilistic reasoning: descriptive probability estimation and inferential probability judgment. This model assumes that people's reasoning follows standard frequentist probability theory, but it is subject to random noise. This random noise has a regressive effect in descriptive probability estimation, moving probability estimates away from normative probabilities and toward the center of the probability scale. This random noise has an anti-regressive effect in inferential judgement, however. These regressive and anti-regressive effects explain various reliable and systematic biases seen in people's descriptive probability estimation and inferential probability judgment. This model predicts that these contrary effects will tend to cancel out in tasks that involve both descriptive estimation and inferential judgement, leading to unbiased responses in those tasks. We test this model by applying it to one such task, described by Gallistel et al. ). Participants' median responses in this task were unbiased, agreeing with normative probability theory over the full range of responses. Our model captures the pattern of unbiased responses in this task, while simultaneously explaining systematic biases away from normatively correct probabilities seen in other tasks. Copyright © 2018 Cognitive Science Society, Inc.
David, Ingrid; Garreau, Hervé; Balmisse, Elodie; Billon, Yvon; Canario, Laurianne
2017-01-20
Some genetic studies need to take into account correlations between traits that are repeatedly measured over time. Multiple-trait random regression models are commonly used to analyze repeated traits but suffer from several major drawbacks. In the present study, we developed a multiple-trait extension of the structured antedependence model (SAD) to overcome this issue and validated its usefulness by modeling the association between litter size (LS) and average birth weight (ABW) over parities in pigs and rabbits. The single-trait SAD model assumes that a random effect at time [Formula: see text] can be explained by the previous values of the random effect (i.e. at previous times). The proposed multiple-trait extension of the SAD model consists in adding a cross-antedependence parameter to the single-trait SAD model. This model can be easily fitted using ASReml and the OWN Fortran program that we have developed. In comparison with the random regression model, we used our multiple-trait SAD model to analyze the LS and ABW of 4345 litters from 1817 Large White sows and 8706 litters from 2286 L-1777 does over a maximum of five successive parities. For both species, the multiple-trait SAD fitted the data better than the random regression model. The difference between AIC of the two models (AIC_random regression-AIC_SAD) were equal to 7 and 227 for pigs and rabbits, respectively. A similar pattern of heritability and correlation estimates was obtained for both species. Heritabilities were lower for LS (ranging from 0.09 to 0.29) than for ABW (ranging from 0.23 to 0.39). The general trend was a decrease of the genetic correlation for a given trait between more distant parities. Estimates of genetic correlations between LS and ABW were negative and ranged from -0.03 to -0.52 across parities. No correlation was observed between the permanent environmental effects, except between the permanent environmental effects of LS and ABW of the same parity, for which the estimate of the correlation was strongly negative (ranging from -0.57 to -0.67). We demonstrated that application of our multiple-trait SAD model is feasible for studying several traits with repeated measurements and showed that it provided a better fit to the data than the random regression model.
Silva, F G; Torres, R A; Brito, L F; Euclydes, R F; Melo, A L P; Souza, N O; Ribeiro, J I; Rodrigues, M T
2013-12-11
The objective of this study was to identify the best random regression model using Legendre orthogonal polynomials to evaluate Alpine goats genetically and to estimate the parameters for test day milk yield. On the test day, we analyzed 20,710 records of milk yield of 667 goats from the Goat Sector of the Universidade Federal de Viçosa. The evaluated models had combinations of distinct fitting orders for polynomials (2-5), random genetic (1-7), and permanent environmental (1-7) fixed curves and a number of classes for residual variance (2, 4, 5, and 6). WOMBAT software was used for all genetic analyses. A random regression model using the best Legendre orthogonal polynomial for genetic evaluation of milk yield on the test day of Alpine goats considered a fixed curve of order 4, curve of genetic additive effects of order 2, curve of permanent environmental effects of order 7, and a minimum of 5 classes of residual variance because it was the most economical model among those that were equivalent to the complete model by the likelihood ratio test. Phenotypic variance and heritability were higher at the end of the lactation period, indicating that the length of lactation has more genetic components in relation to the production peak and persistence. It is very important that the evaluation utilizes the best combination of fixed, genetic additive and permanent environmental regressions, and number of classes of heterogeneous residual variance for genetic evaluation using random regression models, thereby enhancing the precision and accuracy of the estimates of parameters and prediction of genetic values.
Baldi, F; Albuquerque, L G; Alencar, M M
2010-08-01
The objective of this work was to estimate covariance functions for direct and maternal genetic effects, animal and maternal permanent environmental effects, and subsequently, to derive relevant genetic parameters for growth traits in Canchim cattle. Data comprised 49,011 weight records on 2435 females from birth to adult age. The model of analysis included fixed effects of contemporary groups (year and month of birth and at weighing) and age of dam as quadratic covariable. Mean trends were taken into account by a cubic regression on orthogonal polynomials of animal age. Residual variances were allowed to vary and were modelled by a step function with 1, 4 or 11 classes based on animal's age. The model fitting four classes of residual variances was the best. A total of 12 random regression models from second to seventh order were used to model direct and maternal genetic effects, animal and maternal permanent environmental effects. The model with direct and maternal genetic effects, animal and maternal permanent environmental effects fitted by quadric, cubic, quintic and linear Legendre polynomials, respectively, was the most adequate to describe the covariance structure of the data. Estimates of direct and maternal heritability obtained by multi-trait (seven traits) and random regression models were very similar. Selection for higher weight at any age, especially after weaning, will produce an increase in mature cow weight. The possibility to modify the growth curve in Canchim cattle to obtain animals with rapid growth at early ages and moderate to low mature cow weight is limited.
Yamazaki, Takeshi; Takeda, Hisato; Hagiya, Koichi; Yamaguchi, Satoshi; Sasaki, Osamu
2018-03-13
Because lactation periods in dairy cows lengthen with increasing total milk production, it is important to predict individual productivities after 305 days in milk (DIM) to determine the optimal lactation period. We therefore examined whether the random regression (RR) coefficient from 306 to 450 DIM (M2) can be predicted from those during the first 305 DIM (M1) by using a random regression model. We analyzed test-day milk records from 85690 Holstein cows in their first lactations and 131727 cows in their later (second to fifth) lactations. Data in M1 and M2 were analyzed separately by using different single-trait RR animal models. We then performed a multiple regression analysis of the RR coefficients of M2 on those of M1 during the first and later lactations. The first-order Legendre polynomials were practical covariates of random regression for the milk yields of M2. All RR coefficients for the additive genetic (AG) effect and the intercept for the permanent environmental (PE) effect of M2 had moderate to strong correlations with the intercept for the AG effect of M1. The coefficients of determination for multiple regression of the combined intercepts for the AG and PE effects of M2 on the coefficients for the AG effect of M1 were moderate to high. The daily milk yields of M2 predicted by using the RR coefficients for the AG effect of M1 were highly correlated with those obtained by using the coefficients of M2. Milk production after 305 DIM can be predicted by using the RR coefficient estimates of the AG effect during the first 305 DIM.
Rosenblum, Michael; van der Laan, Mark J.
2010-01-01
Models, such as logistic regression and Poisson regression models, are often used to estimate treatment effects in randomized trials. These models leverage information in variables collected before randomization, in order to obtain more precise estimates of treatment effects. However, there is the danger that model misspecification will lead to bias. We show that certain easy to compute, model-based estimators are asymptotically unbiased even when the working model used is arbitrarily misspecified. Furthermore, these estimators are locally efficient. As a special case of our main result, we consider a simple Poisson working model containing only main terms; in this case, we prove the maximum likelihood estimate of the coefficient corresponding to the treatment variable is an asymptotically unbiased estimator of the marginal log rate ratio, even when the working model is arbitrarily misspecified. This is the log-linear analog of ANCOVA for linear models. Our results demonstrate one application of targeted maximum likelihood estimation. PMID:20628636
Logistic regression of family data from retrospective study designs.
Whittemore, Alice S; Halpern, Jerry
2003-11-01
We wish to study the effects of genetic and environmental factors on disease risk, using data from families ascertained because they contain multiple cases of the disease. To do so, we must account for the way participants were ascertained, and for within-family correlations in both disease occurrences and covariates. We model the joint probability distribution of the covariates of ascertained family members, given family disease occurrence and pedigree structure. We describe two such covariate models: the random effects model and the marginal model. Both models assume a logistic form for the distribution of one person's covariates that involves a vector beta of regression parameters. The components of beta in the two models have different interpretations, and they differ in magnitude when the covariates are correlated within families. We describe ascertainment assumptions needed to estimate consistently the parameters beta(RE) in the random effects model and the parameters beta(M) in the marginal model. Under the ascertainment assumptions for the random effects model, we show that conditional logistic regression (CLR) of matched family data gives a consistent estimate beta(RE) for beta(RE) and a consistent estimate for the covariance matrix of beta(RE). Under the ascertainment assumptions for the marginal model, we show that unconditional logistic regression (ULR) gives a consistent estimate for beta(M), and we give a consistent estimator for its covariance matrix. The random effects/CLR approach is simple to use and to interpret, but it can use data only from families containing both affected and unaffected members. The marginal/ULR approach uses data from all individuals, but its variance estimates require special computations. A C program to compute these variance estimates is available at http://www.stanford.edu/dept/HRP/epidemiology. We illustrate these pros and cons by application to data on the effects of parity on ovarian cancer risk in mother/daughter pairs, and use simulations to study the performance of the estimates. Copyright 2003 Wiley-Liss, Inc.
Tzavidis, Nikos; Salvati, Nicola; Schmid, Timo; Flouri, Eirini; Midouhas, Emily
2016-02-01
Multilevel modelling is a popular approach for longitudinal data analysis. Statistical models conventionally target a parameter at the centre of a distribution. However, when the distribution of the data is asymmetric, modelling other location parameters, e.g. percentiles, may be more informative. We present a new approach, M -quantile random-effects regression, for modelling multilevel data. The proposed method is used for modelling location parameters of the distribution of the strengths and difficulties questionnaire scores of children in England who participate in the Millennium Cohort Study. Quantile mixed models are also considered. The analyses offer insights to child psychologists about the differential effects of risk factors on children's outcomes.
Empirical likelihood inference in randomized clinical trials.
Zhang, Biao
2017-01-01
In individually randomized controlled trials, in addition to the primary outcome, information is often available on a number of covariates prior to randomization. This information is frequently utilized to undertake adjustment for baseline characteristics in order to increase precision of the estimation of average treatment effects; such adjustment is usually performed via covariate adjustment in outcome regression models. Although the use of covariate adjustment is widely seen as desirable for making treatment effect estimates more precise and the corresponding hypothesis tests more powerful, there are considerable concerns that objective inference in randomized clinical trials can potentially be compromised. In this paper, we study an empirical likelihood approach to covariate adjustment and propose two unbiased estimating functions that automatically decouple evaluation of average treatment effects from regression modeling of covariate-outcome relationships. The resulting empirical likelihood estimator of the average treatment effect is as efficient as the existing efficient adjusted estimators 1 when separate treatment-specific working regression models are correctly specified, yet are at least as efficient as the existing efficient adjusted estimators 1 for any given treatment-specific working regression models whether or not they coincide with the true treatment-specific covariate-outcome relationships. We present a simulation study to compare the finite sample performance of various methods along with some results on analysis of a data set from an HIV clinical trial. The simulation results indicate that the proposed empirical likelihood approach is more efficient and powerful than its competitors when the working covariate-outcome relationships by treatment status are misspecified.
Babcock, Chad; Finley, Andrew O.; Bradford, John B.; Kolka, Randall K.; Birdsey, Richard A.; Ryan, Michael G.
2015-01-01
Many studies and production inventory systems have shown the utility of coupling covariates derived from Light Detection and Ranging (LiDAR) data with forest variables measured on georeferenced inventory plots through regression models. The objective of this study was to propose and assess the use of a Bayesian hierarchical modeling framework that accommodates both residual spatial dependence and non-stationarity of model covariates through the introduction of spatial random effects. We explored this objective using four forest inventory datasets that are part of the North American Carbon Program, each comprising point-referenced measures of above-ground forest biomass and discrete LiDAR. For each dataset, we considered at least five regression model specifications of varying complexity. Models were assessed based on goodness of fit criteria and predictive performance using a 10-fold cross-validation procedure. Results showed that the addition of spatial random effects to the regression model intercept improved fit and predictive performance in the presence of substantial residual spatial dependence. Additionally, in some cases, allowing either some or all regression slope parameters to vary spatially, via the addition of spatial random effects, further improved model fit and predictive performance. In other instances, models showed improved fit but decreased predictive performance—indicating over-fitting and underscoring the need for cross-validation to assess predictive ability. The proposed Bayesian modeling framework provided access to pixel-level posterior predictive distributions that were useful for uncertainty mapping, diagnosing spatial extrapolation issues, revealing missing model covariates, and discovering locally significant parameters.
ERIC Educational Resources Information Center
Clarke, Paul; Crawford, Claire; Steele, Fiona; Vignoles, Anna
2015-01-01
The use of fixed (FE) and random effects (RE) in two-level hierarchical linear regression is discussed in the context of education research. We compare the robustness of FE models with the modelling flexibility and potential efficiency of those from RE models. We argue that the two should be seen as complementary approaches. We then compare both…
A Bayesian ridge regression analysis of congestion's impact on urban expressway safety.
Shi, Qi; Abdel-Aty, Mohamed; Lee, Jaeyoung
2016-03-01
With the rapid growth of traffic in urban areas, concerns about congestion and traffic safety have been heightened. This study leveraged both Automatic Vehicle Identification (AVI) system and Microwave Vehicle Detection System (MVDS) installed on an expressway in Central Florida to explore how congestion impacts the crash occurrence in urban areas. Multiple congestion measures from the two systems were developed. To ensure more precise estimates of the congestion's effects, the traffic data were aggregated into peak and non-peak hours. Multicollinearity among traffic parameters was examined. The results showed the presence of multicollinearity especially during peak hours. As a response, ridge regression was introduced to cope with this issue. Poisson models with uncorrelated random effects, correlated random effects, and both correlated random effects and random parameters were constructed within the Bayesian framework. It was proven that correlated random effects could significantly enhance model performance. The random parameters model has similar goodness-of-fit compared with the model with only correlated random effects. However, by accounting for the unobserved heterogeneity, more variables were found to be significantly related to crash frequency. The models indicated that congestion increased crash frequency during peak hours while during non-peak hours it was not a major crash contributing factor. Using the random parameter model, the three congestion measures were compared. It was found that all congestion indicators had similar effects while Congestion Index (CI) derived from MVDS data was a better congestion indicator for safety analysis. Also, analyses showed that the segments with higher congestion intensity could not only increase property damage only (PDO) crashes, but also more severe crashes. In addition, the issues regarding the necessity to incorporate specific congestion indicator for congestion's effects on safety and to take care of the multicollinearity between explanatory variables were also discussed. By including a specific congestion indicator, the model performance significantly improved. When comparing models with and without ridge regression, the magnitude of the coefficients was altered in the existence of multicollinearity. These conclusions suggest that the use of appropriate congestion measure and consideration of multicolilnearity among the variables would improve the models and our understanding about the effects of congestion on traffic safety. Copyright © 2015 Elsevier Ltd. All rights reserved.
A general framework for the use of logistic regression models in meta-analysis.
Simmonds, Mark C; Higgins, Julian Pt
2016-12-01
Where individual participant data are available for every randomised trial in a meta-analysis of dichotomous event outcomes, "one-stage" random-effects logistic regression models have been proposed as a way to analyse these data. Such models can also be used even when individual participant data are not available and we have only summary contingency table data. One benefit of this one-stage regression model over conventional meta-analysis methods is that it maximises the correct binomial likelihood for the data and so does not require the common assumption that effect estimates are normally distributed. A second benefit of using this model is that it may be applied, with only minor modification, in a range of meta-analytic scenarios, including meta-regression, network meta-analyses and meta-analyses of diagnostic test accuracy. This single model can potentially replace the variety of often complex methods used in these areas. This paper considers, with a range of meta-analysis examples, how random-effects logistic regression models may be used in a number of different types of meta-analyses. This one-stage approach is compared with widely used meta-analysis methods including Bayesian network meta-analysis and the bivariate and hierarchical summary receiver operating characteristic (ROC) models for meta-analyses of diagnostic test accuracy. © The Author(s) 2014.
Littlejohn, B P; Riley, D G; Welsh, T H; Randel, R D; Willard, S T; Vann, R C
2018-05-12
The objective was to estimate genetic parameters of temperament in beef cattle across an age continuum. The population consisted predominantly of Brahman-British crossbred cattle. Temperament was quantified by: 1) pen score (PS), the reaction of a calf to a single experienced evaluator on a scale of 1 to 5 (1 = calm, 5 = excitable); 2) exit velocity (EV), the rate (m/sec) at which a calf traveled 1.83 m upon exiting a squeeze chute; and 3) temperament score (TS), the numerical average of PS and EV. Covariates included days of age and proportion of Bos indicus in the calf and dam. Random regression models included the fixed effects determined from the repeated measures models, except for calf age. Likelihood ratio tests were used to determine the most appropriate random structures. In repeated measures models, the proportion of Bos indicus in the calf was positively related with each calf temperament trait (0.41 ± 0.20, 0.85 ± 0.21, and 0.57 ± 0.18 for PS, EV, and TS, respectively; P < 0.01). There was an effect of contemporary group (combinations of season, year of birth, and management group) and dam age (P < 0.001) in all models. From repeated records analyses, estimates of heritability (h2) were 0.34 ± 0.04, 0.31 ± 0.04, and 0.39 ± 0.04, while estimates of permanent environmental variance as a proportion of the phenotypic variance (c2) were 0.30 ± 0.04, 0.31 ± 0.03, and 0.34 ± 0.04 for PS, EV, and TS, respectively. Quadratic additive genetic random regressions on Legendre polynomials of age were significant for all traits. Quadratic permanent environmental random regressions were significant for PS and TS, but linear permanent environmental random regressions were significant for EV. Random regression results suggested that these components change across the age dimension of these data. There appeared to be an increasing influence of permanent environmental effects and decreasing influence of additive genetic effects corresponding to increasing calf age for EV, and to a lesser extent for TS. Inherited temperament may be overcome by accumulating environmental stimuli with increases in age, especially after weaning.
Summer School Effects in a Randomized Field Trial
ERIC Educational Resources Information Center
Zvoch, Keith; Stevens, Joseph J.
2013-01-01
This field-based randomized trial examined the effect of assignment to and participation in summer school for two moderately at-risk samples of struggling readers. Application of multiple regression models to difference scores capturing the change in summer reading fluency revealed that kindergarten students randomly assigned to summer school…
Neither fixed nor random: weighted least squares meta-regression.
Stanley, T D; Doucouliagos, Hristos
2017-03-01
Our study revisits and challenges two core conventional meta-regression estimators: the prevalent use of 'mixed-effects' or random-effects meta-regression analysis and the correction of standard errors that defines fixed-effects meta-regression analysis (FE-MRA). We show how and explain why an unrestricted weighted least squares MRA (WLS-MRA) estimator is superior to conventional random-effects (or mixed-effects) meta-regression when there is publication (or small-sample) bias that is as good as FE-MRA in all cases and better than fixed effects in most practical applications. Simulations and statistical theory show that WLS-MRA provides satisfactory estimates of meta-regression coefficients that are practically equivalent to mixed effects or random effects when there is no publication bias. When there is publication selection bias, WLS-MRA always has smaller bias than mixed effects or random effects. In practical applications, an unrestricted WLS meta-regression is likely to give practically equivalent or superior estimates to fixed-effects, random-effects, and mixed-effects meta-regression approaches. However, random-effects meta-regression remains viable and perhaps somewhat preferable if selection for statistical significance (publication bias) can be ruled out and when random, additive normal heterogeneity is known to directly affect the 'true' regression coefficient. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.
Grieve, Richard; Nixon, Richard; Thompson, Simon G
2010-01-01
Cost-effectiveness analyses (CEA) may be undertaken alongside cluster randomized trials (CRTs) where randomization is at the level of the cluster (for example, the hospital or primary care provider) rather than the individual. Costs (and outcomes) within clusters may be correlated so that the assumption made by standard bivariate regression models, that observations are independent, is incorrect. This study develops a flexible modeling framework to acknowledge the clustering in CEA that use CRTs. The authors extend previous Bayesian bivariate models for CEA of multicenter trials to recognize the specific form of clustering in CRTs. They develop new Bayesian hierarchical models (BHMs) that allow mean costs and outcomes, and also variances, to differ across clusters. They illustrate how each model can be applied using data from a large (1732 cases, 70 primary care providers) CRT evaluating alternative interventions for reducing postnatal depression. The analyses compare cost-effectiveness estimates from BHMs with standard bivariate regression models that ignore the data hierarchy. The BHMs show high levels of cost heterogeneity across clusters (intracluster correlation coefficient, 0.17). Compared with standard regression models, the BHMs yield substantially increased uncertainty surrounding the cost-effectiveness estimates, and altered point estimates. The authors conclude that ignoring clustering can lead to incorrect inferences. The BHMs that they present offer a flexible modeling framework that can be applied more generally to CEA that use CRTs.
Comparison of random regression test-day models for Polish Black and White cattle.
Strabel, T; Szyda, J; Ptak, E; Jamrozik, J
2005-10-01
Test-day milk yields of first-lactation Black and White cows were used to select the model for routine genetic evaluation of dairy cattle in Poland. The population of Polish Black and White cows is characterized by small herd size, low level of production, and relatively early peak of lactation. Several random regression models for first-lactation milk yield were initially compared using the "percentage of squared bias" criterion and the correlations between true and predicted breeding values. Models with random herd-test-date effects, fixed age-season and herd-year curves, and random additive genetic and permanent environmental curves (Legendre polynomials of different orders were used for all regressions) were chosen for further studies. Additional comparisons included analyses of the residuals and shapes of variance curves in days in milk. The low production level and early peak of lactation of the breed required the use of Legendre polynomials of order 5 to describe age-season lactation curves. For the other curves, Legendre polynomials of order 3 satisfactorily described daily milk yield variation. Fitting third-order polynomials for the permanent environmental effect made it possible to adequately account for heterogeneous residual variance at different stages of lactation.
Borquis, Rusbel Raul Aspilcueta; Neto, Francisco Ribeiro de Araujo; Baldi, Fernando; Hurtado-Lugo, Naudin; de Camargo, Gregório M F; Muñoz-Berrocal, Milthon; Tonhati, Humberto
2013-09-01
In this study, genetic parameters for test-day milk, fat, and protein yield were estimated for the first lactation. The data analyzed consisted of 1,433 first lactations of Murrah buffaloes, daughters of 113 sires from 12 herds in the state of São Paulo, Brazil, with calvings from 1985 to 2007. Ten-month classes of lactation days were considered for the test-day yields. The (co)variance components for the 3 traits were estimated using the regression analyses by Bayesian inference applying an animal model by Gibbs sampling. The contemporary groups were defined as herd-year-month of the test day. In the model, the random effects were additive genetic, permanent environment, and residual. The fixed effects were contemporary group and number of milkings (1 or 2), the linear and quadratic effects of the covariable age of the buffalo at calving, as well as the mean lactation curve of the population, which was modeled by orthogonal Legendre polynomials of fourth order. The random effects for the traits studied were modeled by Legendre polynomials of third and fourth order for additive genetic and permanent environment, respectively, the residual variances were modeled considering 4 residual classes. The heritability estimates for the traits were moderate (from 0.21-0.38), with higher estimates in the intermediate lactation phase. The genetic correlation estimates within and among the traits varied from 0.05 to 0.99. The results indicate that the selection for any trait test day will result in an indirect genetic gain for milk, fat, and protein yield in all periods of the lactation curve. The accuracy associated with estimated breeding values obtained using multi-trait random regression was slightly higher (around 8%) compared with single-trait random regression. This difference may be because to the greater amount of information available per animal. Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Robust, Adaptive Functional Regression in Functional Mixed Model Framework.
Zhu, Hongxiao; Brown, Philip J; Morris, Jeffrey S
2011-09-01
Functional data are increasingly encountered in scientific studies, and their high dimensionality and complexity lead to many analytical challenges. Various methods for functional data analysis have been developed, including functional response regression methods that involve regression of a functional response on univariate/multivariate predictors with nonparametrically represented functional coefficients. In existing methods, however, the functional regression can be sensitive to outlying curves and outlying regions of curves, so is not robust. In this paper, we introduce a new Bayesian method, robust functional mixed models (R-FMM), for performing robust functional regression within the general functional mixed model framework, which includes multiple continuous or categorical predictors and random effect functions accommodating potential between-function correlation induced by the experimental design. The underlying model involves a hierarchical scale mixture model for the fixed effects, random effect and residual error functions. These modeling assumptions across curves result in robust nonparametric estimators of the fixed and random effect functions which down-weight outlying curves and regions of curves, and produce statistics that can be used to flag global and local outliers. These assumptions also lead to distributions across wavelet coefficients that have outstanding sparsity and adaptive shrinkage properties, with great flexibility for the data to determine the sparsity and the heaviness of the tails. Together with the down-weighting of outliers, these within-curve properties lead to fixed and random effect function estimates that appear in our simulations to be remarkably adaptive in their ability to remove spurious features yet retain true features of the functions. We have developed general code to implement this fully Bayesian method that is automatic, requiring the user to only provide the functional data and design matrices. It is efficient enough to handle large data sets, and yields posterior samples of all model parameters that can be used to perform desired Bayesian estimation and inference. Although we present details for a specific implementation of the R-FMM using specific distributional choices in the hierarchical model, 1D functions, and wavelet transforms, the method can be applied more generally using other heavy-tailed distributions, higher dimensional functions (e.g. images), and using other invertible transformations as alternatives to wavelets.
Robust, Adaptive Functional Regression in Functional Mixed Model Framework
Zhu, Hongxiao; Brown, Philip J.; Morris, Jeffrey S.
2012-01-01
Functional data are increasingly encountered in scientific studies, and their high dimensionality and complexity lead to many analytical challenges. Various methods for functional data analysis have been developed, including functional response regression methods that involve regression of a functional response on univariate/multivariate predictors with nonparametrically represented functional coefficients. In existing methods, however, the functional regression can be sensitive to outlying curves and outlying regions of curves, so is not robust. In this paper, we introduce a new Bayesian method, robust functional mixed models (R-FMM), for performing robust functional regression within the general functional mixed model framework, which includes multiple continuous or categorical predictors and random effect functions accommodating potential between-function correlation induced by the experimental design. The underlying model involves a hierarchical scale mixture model for the fixed effects, random effect and residual error functions. These modeling assumptions across curves result in robust nonparametric estimators of the fixed and random effect functions which down-weight outlying curves and regions of curves, and produce statistics that can be used to flag global and local outliers. These assumptions also lead to distributions across wavelet coefficients that have outstanding sparsity and adaptive shrinkage properties, with great flexibility for the data to determine the sparsity and the heaviness of the tails. Together with the down-weighting of outliers, these within-curve properties lead to fixed and random effect function estimates that appear in our simulations to be remarkably adaptive in their ability to remove spurious features yet retain true features of the functions. We have developed general code to implement this fully Bayesian method that is automatic, requiring the user to only provide the functional data and design matrices. It is efficient enough to handle large data sets, and yields posterior samples of all model parameters that can be used to perform desired Bayesian estimation and inference. Although we present details for a specific implementation of the R-FMM using specific distributional choices in the hierarchical model, 1D functions, and wavelet transforms, the method can be applied more generally using other heavy-tailed distributions, higher dimensional functions (e.g. images), and using other invertible transformations as alternatives to wavelets. PMID:22308015
Li, Baoyue; Lingsma, Hester F; Steyerberg, Ewout W; Lesaffre, Emmanuel
2011-05-23
Logistic random effects models are a popular tool to analyze multilevel also called hierarchical data with a binary or ordinal outcome. Here, we aim to compare different statistical software implementations of these models. We used individual patient data from 8509 patients in 231 centers with moderate and severe Traumatic Brain Injury (TBI) enrolled in eight Randomized Controlled Trials (RCTs) and three observational studies. We fitted logistic random effects regression models with the 5-point Glasgow Outcome Scale (GOS) as outcome, both dichotomized as well as ordinal, with center and/or trial as random effects, and as covariates age, motor score, pupil reactivity or trial. We then compared the implementations of frequentist and Bayesian methods to estimate the fixed and random effects. Frequentist approaches included R (lme4), Stata (GLLAMM), SAS (GLIMMIX and NLMIXED), MLwiN ([R]IGLS) and MIXOR, Bayesian approaches included WinBUGS, MLwiN (MCMC), R package MCMCglmm and SAS experimental procedure MCMC.Three data sets (the full data set and two sub-datasets) were analysed using basically two logistic random effects models with either one random effect for the center or two random effects for center and trial. For the ordinal outcome in the full data set also a proportional odds model with a random center effect was fitted. The packages gave similar parameter estimates for both the fixed and random effects and for the binary (and ordinal) models for the main study and when based on a relatively large number of level-1 (patient level) data compared to the number of level-2 (hospital level) data. However, when based on relatively sparse data set, i.e. when the numbers of level-1 and level-2 data units were about the same, the frequentist and Bayesian approaches showed somewhat different results. The software implementations differ considerably in flexibility, computation time, and usability. There are also differences in the availability of additional tools for model evaluation, such as diagnostic plots. The experimental SAS (version 9.2) procedure MCMC appeared to be inefficient. On relatively large data sets, the different software implementations of logistic random effects regression models produced similar results. Thus, for a large data set there seems to be no explicit preference (of course if there is no preference from a philosophical point of view) for either a frequentist or Bayesian approach (if based on vague priors). The choice for a particular implementation may largely depend on the desired flexibility, and the usability of the package. For small data sets the random effects variances are difficult to estimate. In the frequentist approaches the MLE of this variance was often estimated zero with a standard error that is either zero or could not be determined, while for Bayesian methods the estimates could depend on the chosen "non-informative" prior of the variance parameter. The starting value for the variance parameter may be also critical for the convergence of the Markov chain.
Technology diffusion in hospitals: a log odds random effects regression model.
Blank, Jos L T; Valdmanis, Vivian G
2015-01-01
This study identifies the factors that affect the diffusion of hospital innovations. We apply a log odds random effects regression model on hospital micro data. We introduce the concept of clustering innovations and the application of a log odds random effects regression model to describe the diffusion of technologies. We distinguish a number of determinants, such as service, physician, and environmental, financial and organizational characteristics of the 60 Dutch hospitals in our sample. On the basis of this data set on Dutch general hospitals over the period 1995-2002, we conclude that there is a relation between a number of determinants and the diffusion of innovations underlining conclusions from earlier research. Positive effects were found on the basis of the size of the hospitals, competition and a hospital's commitment to innovation. It appears that if a policy is developed to further diffuse innovations, the external effects of demand and market competition need to be examined, which would de facto lead to an efficient use of technology. For the individual hospital, instituting an innovations office appears to be the most prudent course of action. © 2013 The Authors. International Journal of Health Planning and Management published by John Wiley & Sons, Ltd.
Estimation of genetic parameters related to eggshell strength using random regression models.
Guo, J; Ma, M; Qu, L; Shen, M; Dou, T; Wang, K
2015-01-01
This study examined the changes in eggshell strength and the genetic parameters related to this trait throughout a hen's laying life using random regression. The data were collected from a crossbred population between 2011 and 2014, where the eggshell strength was determined repeatedly for 2260 hens. Using random regression models (RRMs), several Legendre polynomials were employed to estimate the fixed, direct genetic and permanent environment effects. The residual effects were treated as independently distributed with heterogeneous variance for each test week. The direct genetic variance was included with second-order Legendre polynomials and the permanent environment with third-order Legendre polynomials. The heritability of eggshell strength ranged from 0.26 to 0.43, the repeatability ranged between 0.47 and 0.69, and the estimated genetic correlations between test weeks was high at > 0.67. The first eigenvalue of the genetic covariance matrix accounted for about 97% of the sum of all the eigenvalues. The flexibility and statistical power of RRM suggest that this model could be an effective method to improve eggshell quality and to reduce losses due to cracked eggs in a breeding plan.
Yen, A M-F; Liou, H-H; Lin, H-L; Chen, T H-H
2006-01-01
The study aimed to develop a predictive model to deal with data fraught with heterogeneity that cannot be explained by sampling variation or measured covariates. The random-effect Poisson regression model was first proposed to deal with over-dispersion for data fraught with heterogeneity after making allowance for measured covariates. Bayesian acyclic graphic model in conjunction with Markov Chain Monte Carlo (MCMC) technique was then applied to estimate the parameters of both relevant covariates and random effect. Predictive distribution was then generated to compare the predicted with the observed for the Bayesian model with and without random effect. Data from repeated measurement of episodes among 44 patients with intractable epilepsy were used as an illustration. The application of Poisson regression without taking heterogeneity into account to epilepsy data yielded a large value of heterogeneity (heterogeneity factor = 17.90, deviance = 1485, degree of freedom (df) = 83). After taking the random effect into account, the value of heterogeneity factor was greatly reduced (heterogeneity factor = 0.52, deviance = 42.5, df = 81). The Pearson chi2 for the comparison between the expected seizure frequencies and the observed ones at two and three months of the model with and without random effect were 34.27 (p = 1.00) and 1799.90 (p < 0.0001), respectively. The Bayesian acyclic model using the MCMC method was demonstrated to have great potential for disease prediction while data show over-dispersion attributed either to correlated property or to subject-to-subject variability.
Shteingart, Hanan; Loewenstein, Yonatan
2016-01-01
There is a long history of experiments in which participants are instructed to generate a long sequence of binary random numbers. The scope of this line of research has shifted over the years from identifying the basic psychological principles and/or the heuristics that lead to deviations from randomness, to one of predicting future choices. In this paper, we used generalized linear regression and the framework of Reinforcement Learning in order to address both points. In particular, we used logistic regression analysis in order to characterize the temporal sequence of participants' choices. Surprisingly, a population analysis indicated that the contribution of the most recent trial has only a weak effect on behavior, compared to more preceding trials, a result that seems irreconcilable with standard sequential effects that decay monotonously with the delay. However, when considering each participant separately, we found that the magnitudes of the sequential effect are a monotonous decreasing function of the delay, yet these individual sequential effects are largely averaged out in a population analysis because of heterogeneity. The substantial behavioral heterogeneity in this task is further demonstrated quantitatively by considering the predictive power of the model. We show that a heterogeneous model of sequential dependencies captures the structure available in random sequence generation. Finally, we show that the results of the logistic regression analysis can be interpreted in the framework of reinforcement learning, allowing us to compare the sequential effects in the random sequence generation task to those in an operant learning task. We show that in contrast to the random sequence generation task, sequential effects in operant learning are far more homogenous across the population. These results suggest that in the random sequence generation task, different participants adopt different cognitive strategies to suppress sequential dependencies when generating the "random" sequences.
Odegård, J; Klemetsdal, G; Heringstad, B
2005-04-01
Several selection criteria for reducing incidence of mastitis were developed from a random regression sire model for test-day somatic cell score (SCS). For comparison, sire transmitting abilities were also predicted based on a cross-sectional model for lactation mean SCS. Only first-crop daughters were used in genetic evaluation of SCS, and the different selection criteria were compared based on their correlation with incidence of clinical mastitis in second-crop daughters (measured as mean daughter deviations). Selection criteria were predicted based on both complete and reduced first-crop daughter groups (261 or 65 daughters per sire, respectively). For complete daughter groups, predicted transmitting abilities at around 30 d in milk showed the best predictive ability for incidence of clinical mastitis, closely followed by average predicted transmitting abilities over the entire lactation. Both of these criteria were derived from the random regression model. These selection criteria improved accuracy of selection by approximately 2% relative to a cross-sectional model. However, for reduced daughter groups, the cross-sectional model yielded increased predictive ability compared with the selection criteria based on the random regression model. This result may be explained by the cross-sectional model being more robust, i.e., less sensitive to precision of (co)variance components estimates and effects of data structure.
Gu, Yingxin; Wylie, Bruce K.; Boyte, Stephen; Picotte, Joshua J.; Howard, Danny; Smith, Kelcy; Nelson, Kurtis
2016-01-01
Regression tree models have been widely used for remote sensing-based ecosystem mapping. Improper use of the sample data (model training and testing data) may cause overfitting and underfitting effects in the model. The goal of this study is to develop an optimal sampling data usage strategy for any dataset and identify an appropriate number of rules in the regression tree model that will improve its accuracy and robustness. Landsat 8 data and Moderate-Resolution Imaging Spectroradiometer-scaled Normalized Difference Vegetation Index (NDVI) were used to develop regression tree models. A Python procedure was designed to generate random replications of model parameter options across a range of model development data sizes and rule number constraints. The mean absolute difference (MAD) between the predicted and actual NDVI (scaled NDVI, value from 0–200) and its variability across the different randomized replications were calculated to assess the accuracy and stability of the models. In our case study, a six-rule regression tree model developed from 80% of the sample data had the lowest MAD (MADtraining = 2.5 and MADtesting = 2.4), which was suggested as the optimal model. This study demonstrates how the training data and rule number selections impact model accuracy and provides important guidance for future remote-sensing-based ecosystem modeling.
Karami, K; Zerehdaran, S; Barzanooni, B; Lotfi, E
2017-12-01
1. The aim of the present study was to estimate genetic parameters for average egg weight (EW) and egg number (EN) at different ages in Japanese quail using multi-trait random regression (MTRR) models. 2. A total of 8534 records from 900 quail, hatched between 2014 and 2015, were used in the study. Average weekly egg weights and egg numbers were measured from second until sixth week of egg production. 3. Nine random regression models were compared to identify the best order of the Legendre polynomials (LP). The most optimal model was identified by the Bayesian Information Criterion. A model with second order of LP for fixed effects, second order of LP for additive genetic effects and third order of LP for permanent environmental effects (MTRR23) was found to be the best. 4. According to the MTRR23 model, direct heritability for EW increased from 0.26 in the second week to 0.53 in the sixth week of egg production, whereas the ratio of permanent environment to phenotypic variance decreased from 0.48 to 0.1. Direct heritability for EN was low, whereas the ratio of permanent environment to phenotypic variance decreased from 0.57 to 0.15 during the production period. 5. For each trait, estimated genetic correlations among weeks of egg production were high (from 0.85 to 0.98). Genetic correlations between EW and EN were low and negative for the first two weeks, but they were low and positive for the rest of the egg production period. 6. In conclusion, random regression models can be used effectively for analysing egg production traits in Japanese quail. Response to selection for increased egg weight would be higher at older ages because of its higher heritability and such a breeding program would have no negative genetic impact on egg production.
ERIC Educational Resources Information Center
Hedeker, Donald; And Others
1996-01-01
Methods are proposed and described for estimating the degree to which relations among variables vary at the individual level. As an example, M. Fishbein and I. Ajzen's theory of reasoned action is examined. This article illustrates the use of empirical Bayes methods based on a random-effects regression model to estimate individual influences…
2011-01-01
Background Logistic random effects models are a popular tool to analyze multilevel also called hierarchical data with a binary or ordinal outcome. Here, we aim to compare different statistical software implementations of these models. Methods We used individual patient data from 8509 patients in 231 centers with moderate and severe Traumatic Brain Injury (TBI) enrolled in eight Randomized Controlled Trials (RCTs) and three observational studies. We fitted logistic random effects regression models with the 5-point Glasgow Outcome Scale (GOS) as outcome, both dichotomized as well as ordinal, with center and/or trial as random effects, and as covariates age, motor score, pupil reactivity or trial. We then compared the implementations of frequentist and Bayesian methods to estimate the fixed and random effects. Frequentist approaches included R (lme4), Stata (GLLAMM), SAS (GLIMMIX and NLMIXED), MLwiN ([R]IGLS) and MIXOR, Bayesian approaches included WinBUGS, MLwiN (MCMC), R package MCMCglmm and SAS experimental procedure MCMC. Three data sets (the full data set and two sub-datasets) were analysed using basically two logistic random effects models with either one random effect for the center or two random effects for center and trial. For the ordinal outcome in the full data set also a proportional odds model with a random center effect was fitted. Results The packages gave similar parameter estimates for both the fixed and random effects and for the binary (and ordinal) models for the main study and when based on a relatively large number of level-1 (patient level) data compared to the number of level-2 (hospital level) data. However, when based on relatively sparse data set, i.e. when the numbers of level-1 and level-2 data units were about the same, the frequentist and Bayesian approaches showed somewhat different results. The software implementations differ considerably in flexibility, computation time, and usability. There are also differences in the availability of additional tools for model evaluation, such as diagnostic plots. The experimental SAS (version 9.2) procedure MCMC appeared to be inefficient. Conclusions On relatively large data sets, the different software implementations of logistic random effects regression models produced similar results. Thus, for a large data set there seems to be no explicit preference (of course if there is no preference from a philosophical point of view) for either a frequentist or Bayesian approach (if based on vague priors). The choice for a particular implementation may largely depend on the desired flexibility, and the usability of the package. For small data sets the random effects variances are difficult to estimate. In the frequentist approaches the MLE of this variance was often estimated zero with a standard error that is either zero or could not be determined, while for Bayesian methods the estimates could depend on the chosen "non-informative" prior of the variance parameter. The starting value for the variance parameter may be also critical for the convergence of the Markov chain. PMID:21605357
Random effects coefficient of determination for mixed and meta-analysis models
Demidenko, Eugene; Sargent, James; Onega, Tracy
2011-01-01
The key feature of a mixed model is the presence of random effects. We have developed a coefficient, called the random effects coefficient of determination, Rr2, that estimates the proportion of the conditional variance of the dependent variable explained by random effects. This coefficient takes values from 0 to 1 and indicates how strong the random effects are. The difference from the earlier suggested fixed effects coefficient of determination is emphasized. If Rr2 is close to 0, there is weak support for random effects in the model because the reduction of the variance of the dependent variable due to random effects is small; consequently, random effects may be ignored and the model simplifies to standard linear regression. The value of Rr2 apart from 0 indicates the evidence of the variance reduction in support of the mixed model. If random effects coefficient of determination is close to 1 the variance of random effects is very large and random effects turn into free fixed effects—the model can be estimated using the dummy variable approach. We derive explicit formulas for Rr2 in three special cases: the random intercept model, the growth curve model, and meta-analysis model. Theoretical results are illustrated with three mixed model examples: (1) travel time to the nearest cancer center for women with breast cancer in the U.S., (2) cumulative time watching alcohol related scenes in movies among young U.S. teens, as a risk factor for early drinking onset, and (3) the classic example of the meta-analysis model for combination of 13 studies on tuberculosis vaccine. PMID:23750070
Pereira, R J; Bignardi, A B; El Faro, L; Verneque, R S; Vercesi Filho, A E; Albuquerque, L G
2013-01-01
Studies investigating the use of random regression models for genetic evaluation of milk production in Zebu cattle are scarce. In this study, 59,744 test-day milk yield records from 7,810 first lactations of purebred dairy Gyr (Bos indicus) and crossbred (dairy Gyr × Holstein) cows were used to compare random regression models in which additive genetic and permanent environmental effects were modeled using orthogonal Legendre polynomials or linear spline functions. Residual variances were modeled considering 1, 5, or 10 classes of days in milk. Five classes fitted the changes in residual variances over the lactation adequately and were used for model comparison. The model that fitted linear spline functions with 6 knots provided the lowest sum of residual variances across lactation. On the other hand, according to the deviance information criterion (DIC) and bayesian information criterion (BIC), a model using third-order and fourth-order Legendre polynomials for additive genetic and permanent environmental effects, respectively, provided the best fit. However, the high rank correlation (0.998) between this model and that applying third-order Legendre polynomials for additive genetic and permanent environmental effects, indicates that, in practice, the same bulls would be selected by both models. The last model, which is less parameterized, is a parsimonious option for fitting dairy Gyr breed test-day milk yield records. Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
An INAR(1) Negative Multinomial Regression Model for Longitudinal Count Data.
ERIC Educational Resources Information Center
Bockenholt, Ulf
1999-01-01
Discusses a regression model for the analysis of longitudinal count data in a panel study by adapting an integer-valued first-order autoregressive (INAR(1)) Poisson process to represent time-dependent correlation between counts. Derives a new negative multinomial distribution by combining INAR(1) representation with a random effects approach.…
Chen, Carla Chia-Ming; Schwender, Holger; Keith, Jonathan; Nunkesser, Robin; Mengersen, Kerrie; Macrossan, Paula
2011-01-01
Due to advancements in computational ability, enhanced technology and a reduction in the price of genotyping, more data are being generated for understanding genetic associations with diseases and disorders. However, with the availability of large data sets comes the inherent challenges of new methods of statistical analysis and modeling. Considering a complex phenotype may be the effect of a combination of multiple loci, various statistical methods have been developed for identifying genetic epistasis effects. Among these methods, logic regression (LR) is an intriguing approach incorporating tree-like structures. Various methods have built on the original LR to improve different aspects of the model. In this study, we review four variations of LR, namely Logic Feature Selection, Monte Carlo Logic Regression, Genetic Programming for Association Studies, and Modified Logic Regression-Gene Expression Programming, and investigate the performance of each method using simulated and real genotype data. We contrast these with another tree-like approach, namely Random Forests, and a Bayesian logistic regression with stochastic search variable selection.
A Tutorial on Multilevel Survival Analysis: Methods, Models and Applications
Austin, Peter C.
2017-01-01
Summary Data that have a multilevel structure occur frequently across a range of disciplines, including epidemiology, health services research, public health, education and sociology. We describe three families of regression models for the analysis of multilevel survival data. First, Cox proportional hazards models with mixed effects incorporate cluster-specific random effects that modify the baseline hazard function. Second, piecewise exponential survival models partition the duration of follow-up into mutually exclusive intervals and fit a model that assumes that the hazard function is constant within each interval. This is equivalent to a Poisson regression model that incorporates the duration of exposure within each interval. By incorporating cluster-specific random effects, generalised linear mixed models can be used to analyse these data. Third, after partitioning the duration of follow-up into mutually exclusive intervals, one can use discrete time survival models that use a complementary log–log generalised linear model to model the occurrence of the outcome of interest within each interval. Random effects can be incorporated to account for within-cluster homogeneity in outcomes. We illustrate the application of these methods using data consisting of patients hospitalised with a heart attack. We illustrate the application of these methods using three statistical programming languages (R, SAS and Stata). PMID:29307954
A Tutorial on Multilevel Survival Analysis: Methods, Models and Applications.
Austin, Peter C
2017-08-01
Data that have a multilevel structure occur frequently across a range of disciplines, including epidemiology, health services research, public health, education and sociology. We describe three families of regression models for the analysis of multilevel survival data. First, Cox proportional hazards models with mixed effects incorporate cluster-specific random effects that modify the baseline hazard function. Second, piecewise exponential survival models partition the duration of follow-up into mutually exclusive intervals and fit a model that assumes that the hazard function is constant within each interval. This is equivalent to a Poisson regression model that incorporates the duration of exposure within each interval. By incorporating cluster-specific random effects, generalised linear mixed models can be used to analyse these data. Third, after partitioning the duration of follow-up into mutually exclusive intervals, one can use discrete time survival models that use a complementary log-log generalised linear model to model the occurrence of the outcome of interest within each interval. Random effects can be incorporated to account for within-cluster homogeneity in outcomes. We illustrate the application of these methods using data consisting of patients hospitalised with a heart attack. We illustrate the application of these methods using three statistical programming languages (R, SAS and Stata).
Random effects coefficient of determination for mixed and meta-analysis models.
Demidenko, Eugene; Sargent, James; Onega, Tracy
2012-01-01
The key feature of a mixed model is the presence of random effects. We have developed a coefficient, called the random effects coefficient of determination, [Formula: see text], that estimates the proportion of the conditional variance of the dependent variable explained by random effects. This coefficient takes values from 0 to 1 and indicates how strong the random effects are. The difference from the earlier suggested fixed effects coefficient of determination is emphasized. If [Formula: see text] is close to 0, there is weak support for random effects in the model because the reduction of the variance of the dependent variable due to random effects is small; consequently, random effects may be ignored and the model simplifies to standard linear regression. The value of [Formula: see text] apart from 0 indicates the evidence of the variance reduction in support of the mixed model. If random effects coefficient of determination is close to 1 the variance of random effects is very large and random effects turn into free fixed effects-the model can be estimated using the dummy variable approach. We derive explicit formulas for [Formula: see text] in three special cases: the random intercept model, the growth curve model, and meta-analysis model. Theoretical results are illustrated with three mixed model examples: (1) travel time to the nearest cancer center for women with breast cancer in the U.S., (2) cumulative time watching alcohol related scenes in movies among young U.S. teens, as a risk factor for early drinking onset, and (3) the classic example of the meta-analysis model for combination of 13 studies on tuberculosis vaccine.
Learning accurate and interpretable models based on regularized random forests regression
2014-01-01
Background Many biology related research works combine data from multiple sources in an effort to understand the underlying problems. It is important to find and interpret the most important information from these sources. Thus it will be beneficial to have an effective algorithm that can simultaneously extract decision rules and select critical features for good interpretation while preserving the prediction performance. Methods In this study, we focus on regression problems for biological data where target outcomes are continuous. In general, models constructed from linear regression approaches are relatively easy to interpret. However, many practical biological applications are nonlinear in essence where we can hardly find a direct linear relationship between input and output. Nonlinear regression techniques can reveal nonlinear relationship of data, but are generally hard for human to interpret. We propose a rule based regression algorithm that uses 1-norm regularized random forests. The proposed approach simultaneously extracts a small number of rules from generated random forests and eliminates unimportant features. Results We tested the approach on some biological data sets. The proposed approach is able to construct a significantly smaller set of regression rules using a subset of attributes while achieving prediction performance comparable to that of random forests regression. Conclusion It demonstrates high potential in aiding prediction and interpretation of nonlinear relationships of the subject being studied. PMID:25350120
Genetic parameters for stayability to consecutive calvings in Zebu cattle.
Silva, D O; Santana, M L; Ayres, D R; Menezes, G R O; Silva, L O C; Nobre, P R C; Pereira, R J
2017-12-22
Longer-lived cows tend to be more profitable and the stayability trait is a selection criterion correlated to longevity. An alternative to the traditional approach to evaluate stayability is its definition based on consecutive calvings, whose main advantage is the more accurate evaluation of young bulls. However, no study using this alternative approach has been conducted for Zebu breeds. Therefore, the objective of this study was to compare linear random regression models to fit stayability to consecutive calvings of Guzerá, Nelore and Tabapuã cows and to estimate genetic parameters for this trait in the respective breeds. Data up to the eighth calving were used. The models included the fixed effects of age at first calving and year-season of birth of the cow and the random effects of contemporary group, additive genetic, permanent environmental and residual. Random regressions were modeled by orthogonal Legendre polynomials of order 1 to 4 (2 to 5 coefficients) for contemporary group, additive genetic and permanent environmental effects. Using Deviance Information Criterion as the selection criterion, the model with 4 regression coefficients for each effect was the most adequate for the Nelore and Tabapuã breeds and the model with 5 coefficients is recommended for the Guzerá breed. For Guzerá, heritabilities ranged from 0.05 to 0.08, showing a quadratic trend with a peak between the fourth and sixth calving. For the Nelore and Tabapuã breeds, the estimates ranged from 0.03 to 0.07 and from 0.03 to 0.08, respectively, and increased with increasing calving number. The additive genetic correlations exhibited a similar trend among breeds and were higher for stayability between closer calvings. Even between more distant calvings (second v. eighth), stayability showed a moderate to high genetic correlation, which was 0.77, 0.57 and 0.79 for the Guzerá, Nelore and Tabapuã breeds, respectively. For Guzerá, when the models with 4 or 5 regression coefficients were compared, the rank correlations between predicted breeding values for the intercept were always higher than 0.99, indicating the possibility of practical application of the least parameterized model. In conclusion, the model with 4 random regression coefficients is recommended for the genetic evaluation of stayability to consecutive calvings in Zebu cattle.
Naserkheil, Masoumeh; Miraie-Ashtiani, Seyed Reza; Nejati-Javaremi, Ardeshir; Son, Jihyun; Lee, Deukhwan
2016-12-01
The objective of this study was to estimate the genetic parameters of milk protein yields in Iranian Holstein dairy cattle. A total of 1,112,082 test-day milk protein yield records of 167,269 first lactation Holstein cows, calved from 1990 to 2010, were analyzed. Estimates of the variance components, heritability, and genetic correlations for milk protein yields were obtained using a random regression test-day model. Milking times, herd, age of recording, year, and month of recording were included as fixed effects in the model. Additive genetic and permanent environmental random effects for the lactation curve were taken into account by applying orthogonal Legendre polynomials of the fourth order in the model. The lowest and highest additive genetic variances were estimated at the beginning and end of lactation, respectively. Permanent environmental variance was higher at both extremes. Residual variance was lowest at the middle of the lactation and contrarily, heritability increased during this period. Maximum heritability was found during the 12th lactation stage (0.213±0.007). Genetic, permanent, and phenotypic correlations among test-days decreased as the interval between consecutive test-days increased. A relatively large data set was used in this study; therefore, the estimated (co)variance components for random regression coefficients could be used for national genetic evaluation of dairy cattle in Iran.
Naserkheil, Masoumeh; Miraie-Ashtiani, Seyed Reza; Nejati-Javaremi, Ardeshir; Son, Jihyun; Lee, Deukhwan
2016-01-01
The objective of this study was to estimate the genetic parameters of milk protein yields in Iranian Holstein dairy cattle. A total of 1,112,082 test-day milk protein yield records of 167,269 first lactation Holstein cows, calved from 1990 to 2010, were analyzed. Estimates of the variance components, heritability, and genetic correlations for milk protein yields were obtained using a random regression test-day model. Milking times, herd, age of recording, year, and month of recording were included as fixed effects in the model. Additive genetic and permanent environmental random effects for the lactation curve were taken into account by applying orthogonal Legendre polynomials of the fourth order in the model. The lowest and highest additive genetic variances were estimated at the beginning and end of lactation, respectively. Permanent environmental variance was higher at both extremes. Residual variance was lowest at the middle of the lactation and contrarily, heritability increased during this period. Maximum heritability was found during the 12th lactation stage (0.213±0.007). Genetic, permanent, and phenotypic correlations among test-days decreased as the interval between consecutive test-days increased. A relatively large data set was used in this study; therefore, the estimated (co)variance components for random regression coefficients could be used for national genetic evaluation of dairy cattle in Iran. PMID:26954192
The Bayesian group lasso for confounded spatial data
Hefley, Trevor J.; Hooten, Mevin B.; Hanks, Ephraim M.; Russell, Robin E.; Walsh, Daniel P.
2017-01-01
Generalized linear mixed models for spatial processes are widely used in applied statistics. In many applications of the spatial generalized linear mixed model (SGLMM), the goal is to obtain inference about regression coefficients while achieving optimal predictive ability. When implementing the SGLMM, multicollinearity among covariates and the spatial random effects can make computation challenging and influence inference. We present a Bayesian group lasso prior with a single tuning parameter that can be chosen to optimize predictive ability of the SGLMM and jointly regularize the regression coefficients and spatial random effect. We implement the group lasso SGLMM using efficient Markov chain Monte Carlo (MCMC) algorithms and demonstrate how multicollinearity among covariates and the spatial random effect can be monitored as a derived quantity. To test our method, we compared several parameterizations of the SGLMM using simulated data and two examples from plant ecology and disease ecology. In all examples, problematic levels multicollinearity occurred and influenced sampling efficiency and inference. We found that the group lasso prior resulted in roughly twice the effective sample size for MCMC samples of regression coefficients and can have higher and less variable predictive accuracy based on out-of-sample data when compared to the standard SGLMM.
Nixon, R M; Duffy, S W; Fender, G R; Day, N E; Prevost, T C
2001-06-30
The Anglia menorrhagia education study tests the effectiveness of an education package for the treatment of menorrhagia given to doctors at a primary care level. General practices were randomized to receive or not receive the package. It is hoped that this intervention will reduce the proportion of women suffering from menorrhagia that are referred to hospital. Data are available on the treatment and referral of women in the practices in the education and control groups, both pre- and post-intervention. We define and demonstrate a random effects logistic regression model that includes pre-intervention data for calculating the effectiveness of the intervention. Copyright 2001 John Wiley & Sons, Ltd.
Liu, Xian; Engel, Charles C
2012-12-20
Researchers often encounter longitudinal health data characterized with three or more ordinal or nominal categories. Random-effects multinomial logit models are generally applied to account for potential lack of independence inherent in such clustered data. When parameter estimates are used to describe longitudinal processes, however, random effects, both between and within individuals, need to be retransformed for correctly predicting outcome probabilities. This study attempts to go beyond existing work by developing a retransformation method that derives longitudinal growth trajectories of unbiased health probabilities. We estimated variances of the predicted probabilities by using the delta method. Additionally, we transformed the covariates' regression coefficients on the multinomial logit function, not substantively meaningful, to the conditional effects on the predicted probabilities. The empirical illustration uses the longitudinal data from the Asset and Health Dynamics among the Oldest Old. Our analysis compared three sets of the predicted probabilities of three health states at six time points, obtained from, respectively, the retransformation method, the best linear unbiased prediction, and the fixed-effects approach. The results demonstrate that neglect of retransforming random errors in the random-effects multinomial logit model results in severely biased longitudinal trajectories of health probabilities as well as overestimated effects of covariates on the probabilities. Copyright © 2012 John Wiley & Sons, Ltd.
Regression discontinuity was a valid design for dichotomous outcomes in three randomized trials.
van Leeuwen, Nikki; Lingsma, Hester F; Mooijaart, Simon P; Nieboer, Daan; Trompet, Stella; Steyerberg, Ewout W
2018-06-01
Regression discontinuity (RD) is a quasi-experimental design that may provide valid estimates of treatment effects in case of continuous outcomes. We aimed to evaluate validity and precision in the RD design for dichotomous outcomes. We performed validation studies in three large randomized controlled trials (RCTs) (Corticosteroid Randomization After Significant Head injury [CRASH], the Global Utilization of Streptokinase and Tissue Plasminogen Activator for Occluded Coronary Arteries [GUSTO], and PROspective Study of Pravastatin in elderly individuals at risk of vascular disease [PROSPER]). To mimic the RD design, we selected patients above and below a cutoff (e.g., age 75 years) randomized to treatment and control, respectively. Adjusted logistic regression models using restricted cubic splines (RCS) and polynomials and local logistic regression models estimated the odds ratio (OR) for treatment, with 95% confidence intervals (CIs) to indicate precision. In CRASH, treatment increased mortality with OR 1.22 [95% CI 1.06-1.40] in the RCT. The RD estimates were 1.42 (0.94-2.16) and 1.13 (0.90-1.40) with RCS adjustment and local regression, respectively. In GUSTO, treatment reduced mortality (OR 0.83 [0.72-0.95]), with more extreme estimates in the RD analysis (OR 0.57 [0.35; 0.92] and 0.67 [0.51; 0.86]). In PROSPER, similar RCT and RD estimates were found, again with less precision in RD designs. We conclude that the RD design provides similar but substantially less precise treatment effect estimates compared with an RCT, with local regression being the preferred method of analysis. Copyright © 2018 Elsevier Inc. All rights reserved.
Model selection for logistic regression models
NASA Astrophysics Data System (ADS)
Duller, Christine
2012-09-01
Model selection for logistic regression models decides which of some given potential regressors have an effect and hence should be included in the final model. The second interesting question is whether a certain factor is heterogeneous among some subsets, i.e. whether the model should include a random intercept or not. In this paper these questions will be answered with classical as well as with Bayesian methods. The application show some results of recent research projects in medicine and business administration.
NASA Astrophysics Data System (ADS)
de Santana, Felipe Bachion; de Souza, André Marcelo; Poppi, Ronei Jesus
2018-02-01
This study evaluates the use of visible and near infrared spectroscopy (Vis-NIRS) combined with multivariate regression based on random forest to quantify some quality soil parameters. The parameters analyzed were soil cation exchange capacity (CEC), sum of exchange bases (SB), organic matter (OM), clay and sand present in the soils of several regions of Brazil. Current methods for evaluating these parameters are laborious, timely and require various wet analytical methods that are not adequate for use in precision agriculture, where faster and automatic responses are required. The random forest regression models were statistically better than PLS regression models for CEC, OM, clay and sand, demonstrating resistance to overfitting, attenuating the effect of outlier samples and indicating the most important variables for the model. The methodology demonstrates the potential of the Vis-NIR as an alternative for determination of CEC, SB, OM, sand and clay, making possible to develop a fast and automatic analytical procedure.
Genetic Parameter Estimates for Metabolizing Two Common Pharmaceuticals in Swine.
Howard, Jeremy T; Ashwell, Melissa S; Baynes, Ronald E; Brooks, James D; Yeatts, James L; Maltecca, Christian
2018-01-01
In livestock, the regulation of drugs used to treat livestock has received increased attention and it is currently unknown how much of the phenotypic variation in drug metabolism is due to the genetics of an animal. Therefore, the objective of the study was to determine the amount of phenotypic variation in fenbendazole and flunixin meglumine drug metabolism due to genetics. The population consisted of crossbred female and castrated male nursery pigs ( n = 198) that were sired by boars represented by four breeds. The animals were spread across nine batches. Drugs were administered intravenously and blood collected a minimum of 10 times over a 48 h period. Genetic parameters for the parent drug and metabolite concentration within each drug were estimated based on pharmacokinetics (PK) parameters or concentrations across time utilizing a random regression model. The PK parameters were estimated using a non-compartmental analysis. The PK model included fixed effects of sex and breed of sire along with random sire and batch effects. The random regression model utilized Legendre polynomials and included a fixed population concentration curve, sex, and breed of sire effects along with a random sire deviation from the population curve and batch effect. The sire effect included the intercept for all models except for the fenbendazole metabolite (i.e., intercept and slope). The mean heritability across PK parameters for the fenbendazole and flunixin meglumine parent drug (metabolite) was 0.15 (0.18) and 0.31 (0.40), respectively. For the parent drug (metabolite), the mean heritability across time was 0.27 (0.60) and 0.14 (0.44) for fenbendazole and flunixin meglumine, respectively. The errors surrounding the heritability estimates for the random regression model were smaller compared to estimates obtained from PK parameters. Across both the PK and plasma drug concentration across model, a moderate heritability was estimated. The model that utilized the plasma drug concentration across time resulted in estimates with a smaller standard error compared to models that utilized PK parameters. The current study found a low to moderate proportion of the phenotypic variation in metabolizing fenbendazole and flunixin meglumine that was explained by genetics in the current study.
Genetic Parameter Estimates for Metabolizing Two Common Pharmaceuticals in Swine
Howard, Jeremy T.; Ashwell, Melissa S.; Baynes, Ronald E.; Brooks, James D.; Yeatts, James L.; Maltecca, Christian
2018-01-01
In livestock, the regulation of drugs used to treat livestock has received increased attention and it is currently unknown how much of the phenotypic variation in drug metabolism is due to the genetics of an animal. Therefore, the objective of the study was to determine the amount of phenotypic variation in fenbendazole and flunixin meglumine drug metabolism due to genetics. The population consisted of crossbred female and castrated male nursery pigs (n = 198) that were sired by boars represented by four breeds. The animals were spread across nine batches. Drugs were administered intravenously and blood collected a minimum of 10 times over a 48 h period. Genetic parameters for the parent drug and metabolite concentration within each drug were estimated based on pharmacokinetics (PK) parameters or concentrations across time utilizing a random regression model. The PK parameters were estimated using a non-compartmental analysis. The PK model included fixed effects of sex and breed of sire along with random sire and batch effects. The random regression model utilized Legendre polynomials and included a fixed population concentration curve, sex, and breed of sire effects along with a random sire deviation from the population curve and batch effect. The sire effect included the intercept for all models except for the fenbendazole metabolite (i.e., intercept and slope). The mean heritability across PK parameters for the fenbendazole and flunixin meglumine parent drug (metabolite) was 0.15 (0.18) and 0.31 (0.40), respectively. For the parent drug (metabolite), the mean heritability across time was 0.27 (0.60) and 0.14 (0.44) for fenbendazole and flunixin meglumine, respectively. The errors surrounding the heritability estimates for the random regression model were smaller compared to estimates obtained from PK parameters. Across both the PK and plasma drug concentration across model, a moderate heritability was estimated. The model that utilized the plasma drug concentration across time resulted in estimates with a smaller standard error compared to models that utilized PK parameters. The current study found a low to moderate proportion of the phenotypic variation in metabolizing fenbendazole and flunixin meglumine that was explained by genetics in the current study. PMID:29487615
Anantha M. Prasad; Louis R. Iverson; Andy Liaw; Andy Liaw
2006-01-01
We evaluated four statistical models - Regression Tree Analysis (RTA), Bagging Trees (BT), Random Forests (RF), and Multivariate Adaptive Regression Splines (MARS) - for predictive vegetation mapping under current and future climate scenarios according to the Canadian Climate Centre global circulation model.
Comparative analysis of used car price evaluation models
NASA Astrophysics Data System (ADS)
Chen, Chuancan; Hao, Lulu; Xu, Cong
2017-05-01
An accurate used car price evaluation is a catalyst for the healthy development of used car market. Data mining has been applied to predict used car price in several articles. However, little is studied on the comparison of using different algorithms in used car price estimation. This paper collects more than 100,000 used car dealing records throughout China to do empirical analysis on a thorough comparison of two algorithms: linear regression and random forest. These two algorithms are used to predict used car price in three different models: model for a certain car make, model for a certain car series and universal model. Results show that random forest has a stable but not ideal effect in price evaluation model for a certain car make, but it shows great advantage in the universal model compared with linear regression. This indicates that random forest is an optimal algorithm when handling complex models with a large number of variables and samples, yet it shows no obvious advantage when coping with simple models with less variables.
Zhang, J; Feng, J-Y; Ni, Y-L; Wen, Y-J; Niu, Y; Tamba, C L; Yue, C; Song, Q; Zhang, Y-M
2017-06-01
Multilocus genome-wide association studies (GWAS) have become the state-of-the-art procedure to identify quantitative trait nucleotides (QTNs) associated with complex traits. However, implementation of multilocus model in GWAS is still difficult. In this study, we integrated least angle regression with empirical Bayes to perform multilocus GWAS under polygenic background control. We used an algorithm of model transformation that whitened the covariance matrix of the polygenic matrix K and environmental noise. Markers on one chromosome were included simultaneously in a multilocus model and least angle regression was used to select the most potentially associated single-nucleotide polymorphisms (SNPs), whereas the markers on the other chromosomes were used to calculate kinship matrix as polygenic background control. The selected SNPs in multilocus model were further detected for their association with the trait by empirical Bayes and likelihood ratio test. We herein refer to this method as the pLARmEB (polygenic-background-control-based least angle regression plus empirical Bayes). Results from simulation studies showed that pLARmEB was more powerful in QTN detection and more accurate in QTN effect estimation, had less false positive rate and required less computing time than Bayesian hierarchical generalized linear model, efficient mixed model association (EMMA) and least angle regression plus empirical Bayes. pLARmEB, multilocus random-SNP-effect mixed linear model and fast multilocus random-SNP-effect EMMA methods had almost equal power of QTN detection in simulation experiments. However, only pLARmEB identified 48 previously reported genes for 7 flowering time-related traits in Arabidopsis thaliana.
Impact of Contextual Factors on Prostate Cancer Risk and Outcomes
2013-07-01
framework with random effects (“frailty models”) while the case-control analyses (Aim 4) will use multilevel unconditional logistic regression models...contextual-level SES on prostate cancer risk within racial/ethnic groups. The survival analyses (Aims 1-3) will utilize a proportional hazards regression
NASA Technical Reports Server (NTRS)
Tomberlin, T. J.
1985-01-01
Research studies of residents' responses to noise consist of interviews with samples of individuals who are drawn from a number of different compact study areas. The statistical techniques developed provide a basis for those sample design decisions. These techniques are suitable for a wide range of sample survey applications. A sample may consist of a random sample of residents selected from a sample of compact study areas, or in a more complex design, of a sample of residents selected from a sample of larger areas (e.g., cities). The techniques may be applied to estimates of the effects on annoyance of noise level, numbers of noise events, the time-of-day of the events, ambient noise levels, or other factors. Methods are provided for determining, in advance, how accurately these effects can be estimated for different sample sizes and study designs. Using a simple cost function, they also provide for optimum allocation of the sample across the stages of the design for estimating these effects. These techniques are developed via a regression model in which the regression coefficients are assumed to be random, with components of variance associated with the various stages of a multi-stage sample design.
Logistic regression for dichotomized counts.
Preisser, John S; Das, Kalyan; Benecha, Habtamu; Stamm, John W
2016-12-01
Sometimes there is interest in a dichotomized outcome indicating whether a count variable is positive or zero. Under this scenario, the application of ordinary logistic regression may result in efficiency loss, which is quantifiable under an assumed model for the counts. In such situations, a shared-parameter hurdle model is investigated for more efficient estimation of regression parameters relating to overall effects of covariates on the dichotomous outcome, while handling count data with many zeroes. One model part provides a logistic regression containing marginal log odds ratio effects of primary interest, while an ancillary model part describes the mean count of a Poisson or negative binomial process in terms of nuisance regression parameters. Asymptotic efficiency of the logistic model parameter estimators of the two-part models is evaluated with respect to ordinary logistic regression. Simulations are used to assess the properties of the models with respect to power and Type I error, the latter investigated under both misspecified and correctly specified models. The methods are applied to data from a randomized clinical trial of three toothpaste formulations to prevent incident dental caries in a large population of Scottish schoolchildren. © The Author(s) 2014.
Finley, Andrew O.; Banerjee, Sudipto; Cook, Bruce D.; Bradford, John B.
2013-01-01
In this paper we detail a multivariate spatial regression model that couples LiDAR, hyperspectral and forest inventory data to predict forest outcome variables at a high spatial resolution. The proposed model is used to analyze forest inventory data collected on the US Forest Service Penobscot Experimental Forest (PEF), ME, USA. In addition to helping meet the regression model's assumptions, results from the PEF analysis suggest that the addition of multivariate spatial random effects improves model fit and predictive ability, compared with two commonly applied modeling approaches. This improvement results from explicitly modeling the covariation among forest outcome variables and spatial dependence among observations through the random effects. Direct application of such multivariate models to even moderately large datasets is often computationally infeasible because of cubic order matrix algorithms involved in estimation. We apply a spatial dimension reduction technique to help overcome this computational hurdle without sacrificing richness in modeling.
Genetic analyses of stillbirth in relation to litter size using random regression models.
Chen, C Y; Misztal, I; Tsuruta, S; Herring, W O; Holl, J; Culbertson, M
2010-12-01
Estimates of genetic parameters for number of stillborns (NSB) in relation to litter size (LS) were obtained with random regression models (RRM). Data were collected from 4 purebred Duroc nucleus farms between 2004 and 2008. Two data sets with 6,575 litters for the first parity (P1) and 6,259 litters for the second to fifth parity (P2-5) with a total of 8,217 and 5,066 animals in the pedigree were analyzed separately. Number of stillborns was studied as a trait on sow level. Fixed effects were contemporary groups (farm-year-season) and fixed cubic regression coefficients on LS with Legendre polynomials. Models for P2-5 included the fixed effect of parity. Random effects were additive genetic effects for both data sets with permanent environmental effects included for P2-5. Random effects modeled with Legendre polynomials (RRM-L), linear splines (RRM-S), and degree 0 B-splines (RRM-BS) with regressions on LS were used. For P1, the order of polynomial, the number of knots, and the number of intervals used for respective models were quadratic, 3, and 3, respectively. For P2-5, the same parameters were linear, 2, and 2, respectively. Heterogeneous residual variances were considered in the models. For P1, estimates of heritability were 12 to 15%, 5 to 6%, and 6 to 7% in LS 5, 9, and 13, respectively. For P2-5, estimates were 15 to 17%, 4 to 5%, and 4 to 6% in LS 6, 9, and 12, respectively. For P1, average estimates of genetic correlations between LS 5 to 9, 5 to 13, and 9 to 13 were 0.53, -0.29, and 0.65, respectively. For P2-5, same estimates averaged for RRM-L and RRM-S were 0.75, -0.21, and 0.50, respectively. For RRM-BS with 2 intervals, the correlation was 0.66 between LS 5 to 7 and 8 to 13. Parameters obtained by 3 RRM revealed the nonlinear relationship between additive genetic effect of NSB and the environmental deviation of LS. The negative correlations between the 2 extreme LS might possibly indicate different genetic bases on incidence of stillbirth.
Maas, Iris L; Nolte, Sandra; Walter, Otto B; Berger, Thomas; Hautzinger, Martin; Hohagen, Fritz; Lutz, Wolfgang; Meyer, Björn; Schröder, Johanna; Späth, Christina; Klein, Jan Philipp; Moritz, Steffen; Rose, Matthias
2017-02-01
To compare treatment effect estimates obtained from a regression discontinuity (RD) design with results from an actual randomized controlled trial (RCT). Data from an RCT (EVIDENT), which studied the effect of an Internet intervention on depressive symptoms measured with the Patient Health Questionnaire (PHQ-9), were used to perform an RD analysis, in which treatment allocation was determined by a cutoff value at baseline (PHQ-9 = 10). A linear regression model was fitted to the data, selecting participants above the cutoff who had received the intervention (n = 317) and control participants below the cutoff (n = 187). Outcome was PHQ-9 sum score 12 weeks after baseline. Robustness of the effect estimate was studied; the estimate was compared with the RCT treatment effect. The final regression model showed a regression coefficient of -2.29 [95% confidence interval (CI): -3.72 to -.85] compared with a treatment effect found in the RCT of -1.57 (95% CI: -2.07 to -1.07). Although the estimates obtained from two designs are not equal, their confidence intervals overlap, suggesting that an RD design can be a valid alternative for RCTs. This finding is particularly important for situations where an RCT may not be feasible or ethical as is often the case in clinical research settings. Copyright © 2016 Elsevier Inc. All rights reserved.
Approximating prediction uncertainty for random forest regression models
John W. Coulston; Christine E. Blinn; Valerie A. Thomas; Randolph H. Wynne
2016-01-01
Machine learning approaches such as random forest have increased for the spatial modeling and mapping of continuous variables. Random forest is a non-parametric ensemble approach, and unlike traditional regression approaches there is no direct quantification of prediction error. Understanding prediction uncertainty is important when using model-based continuous maps as...
MIXREG: a computer program for mixed-effects regression analysis with autocorrelated errors.
Hedeker, D; Gibbons, R D
1996-05-01
MIXREG is a program that provides estimates for a mixed-effects regression model (MRM) for normally-distributed response data including autocorrelated errors. This model can be used for analysis of unbalanced longitudinal data, where individuals may be measured at a different number of timepoints, or even at different timepoints. Autocorrelated errors of a general form or following an AR(1), MA(1), or ARMA(1,1) form are allowable. This model can also be used for analysis of clustered data, where the mixed-effects model assumes data within clusters are dependent. The degree of dependency is estimated jointly with estimates of the usual model parameters, thus adjusting for clustering. MIXREG uses maximum marginal likelihood estimation, utilizing both the EM algorithm and a Fisher-scoring solution. For the scoring solution, the covariance matrix of the random effects is expressed in its Gaussian decomposition, and the diagonal matrix reparameterized using the exponential transformation. Estimation of the individual random effects is accomplished using an empirical Bayes approach. Examples illustrating usage and features of MIXREG are provided.
NASA Astrophysics Data System (ADS)
Sulistianingsih, E.; Kiftiah, M.; Rosadi, D.; Wahyuni, H.
2017-04-01
Gross Domestic Product (GDP) is an indicator of economic growth in a region. GDP is a panel data, which consists of cross-section and time series data. Meanwhile, panel regression is a tool which can be utilised to analyse panel data. There are three models in panel regression, namely Common Effect Model (CEM), Fixed Effect Model (FEM) and Random Effect Model (REM). The models will be chosen based on results of Chow Test, Hausman Test and Lagrange Multiplier Test. This research analyses palm oil about production, export, and government consumption to five district GDP are in West Kalimantan, namely Sanggau, Sintang, Sambas, Ketapang and Bengkayang by panel regression. Based on the results of analyses, it concluded that REM, which adjusted-determination-coefficient is 0,823, is the best model in this case. Also, according to the result, only Export and Government Consumption that influence GDP of the districts.
Extension of the Haseman-Elston regression model to longitudinal data.
Won, Sungho; Elston, Robert C; Park, Taesung
2006-01-01
We propose an extension to longitudinal data of the Haseman and Elston regression method for linkage analysis. The proposed model is a mixed model having several random effects. As response variable, we investigate the sibship sample mean corrected cross-product (smHE) and the BLUP-mean corrected cross product (pmHE), comparing them with the original squared difference (oHE), the overall mean corrected cross-product (rHE), and the weighted average of the squared difference and the squared mean-corrected sum (wHE). The proposed model allows for the correlation structure of longitudinal data. Also, the model can test for gene x time interaction to discover genetic variation over time. The model was applied in an analysis of the Genetic Analysis Workshop 13 (GAW13) simulated dataset for a quantitative trait simulating systolic blood pressure. Independence models did not preserve the test sizes, while the mixed models with both family and sibpair random effects tended to preserve size well. Copyright 2006 S. Karger AG, Basel.
Heritability estimations for inner muscular fat in Hereford cattle using random regressions
USDA-ARS?s Scientific Manuscript database
Random regressions make possible to make genetic predictions and parameters estimation across a gradient of environments, allowing a more accurate and beneficial use of animals as breeders in specific environments. The objective of this study was to use random regression models to estimate heritabil...
Effect of folic acid on appetite in children: ordinal logistic and fuzzy logistic regressions.
Namdari, Mahshid; Abadi, Alireza; Taheri, S Mahmoud; Rezaei, Mansour; Kalantari, Naser; Omidvar, Nasrin
2014-03-01
Reduced appetite and low food intake are often a concern in preschool children, since it can lead to malnutrition, a leading cause of impaired growth and mortality in childhood. It is occasionally considered that folic acid has a positive effect on appetite enhancement and consequently growth in children. The aim of this study was to assess the effect of folic acid on the appetite of preschool children 3 to 6 y old. The study sample included 127 children ages 3 to 6 who were randomly selected from 20 preschools in the city of Tehran in 2011. Since appetite was measured by linguistic terms, a fuzzy logistic regression was applied for modeling. The obtained results were compared with a statistical ordinal logistic model. After controlling for the potential confounders, in a statistical ordinal logistic model, serum folate showed a significantly positive effect on appetite. A small but positive effect of folate was detected by fuzzy logistic regression. Based on fuzzy regression, the risk for poor appetite in preschool children was related to the employment status of their mothers. In this study, a positive association was detected between the levels of serum folate and improved appetite. For further investigation, a randomized controlled, double-blind clinical trial could be helpful to address causality. Copyright © 2014 Elsevier Inc. All rights reserved.
Fenlon, Caroline; O'Grady, Luke; Butler, Stephen; Doherty, Michael L; Dunnion, John
2017-01-01
Herd fertility in pasture-based dairy farms is a key driver of farm economics. Models for predicting nulliparous reproductive outcomes are rare, but age, genetics, weight, and BCS have been identified as factors influencing heifer conception. The aim of this study was to create a simulation model of heifer conception to service with thorough evaluation. Artificial Insemination service records from two research herds and ten commercial herds were provided to build and evaluate the models. All were managed as spring-calving pasture-based systems. The factors studied were related to age, genetics, and time of service. The data were split into training and testing sets and bootstrapping was used to train the models. Logistic regression (with and without random effects) and generalised additive modelling were selected as the model-building techniques. Two types of evaluation were used to test the predictive ability of the models: discrimination and calibration. Discrimination, which includes sensitivity, specificity, accuracy and ROC analysis, measures a model's ability to distinguish between positive and negative outcomes. Calibration measures the accuracy of the predicted probabilities with the Hosmer-Lemeshow goodness-of-fit, calibration plot and calibration error. After data cleaning and the removal of services with missing values, 1396 services remained to train the models and 597 were left for testing. Age, breed, genetic predicted transmitting ability for calving interval, month and year were significant in the multivariate models. The regression models also included an interaction between age and month. Year within herd was a random effect in the mixed regression model. Overall prediction accuracy was between 77.1% and 78.9%. All three models had very high sensitivity, but low specificity. The two regression models were very well-calibrated. The mean absolute calibration errors were all below 4%. Because the models were not adept at identifying unsuccessful services, they are not suggested for use in predicting the outcome of individual heifer services. Instead, they are useful for the comparison of services with different covariate values or as sub-models in whole-farm simulations. The mixed regression model was identified as the best model for prediction, as the random effects can be ignored and the other variables can be easily obtained or simulated.
Development of a Random Field Model for Gas Plume Detection in Multiple LWIR Images.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Heasler, Patrick G.
This report develops a random field model that describes gas plumes in LWIR remote sensing images. The random field model serves as a prior distribution that can be combined with LWIR data to produce a posterior that determines the probability that a gas plume exists in the scene and also maps the most probable location of any plume. The random field model is intended to work with a single pixel regression estimator--a regression model that estimates gas concentration on an individual pixel basis.
Cappella, Elise; Hamre, Bridget K; Kim, Ha Yeon; Henry, David B; Frazier, Stacy L; Atkins, Marc S; Schoenwald, Sonja K
2012-08-01
To examine effects of a teacher consultation and coaching program delivered by school and community mental health professionals on change in observed classroom interactions and child functioning across one school year. Thirty-six classrooms within 5 urban elementary schools (87% Latino, 11% Black) were randomly assigned to intervention (training + consultation/coaching) and control (training only) conditions. Classroom and child outcomes (n = 364; 43% girls) were assessed in the fall and spring. Random effects regression models showed main effects of intervention on teacher-student relationship closeness, academic self-concept, and peer victimization. Results of multiple regression models showed levels of observed teacher emotional support in the fall moderated intervention impact on emotional support at the end of the school year. Results suggest teacher consultation and coaching can be integrated within existing mental health activities in urban schools and impact classroom effectiveness and child adaptation across multiple domains. © 2012 American Psychological Association
Grajeda, Laura M; Ivanescu, Andrada; Saito, Mayuko; Crainiceanu, Ciprian; Jaganath, Devan; Gilman, Robert H; Crabtree, Jean E; Kelleher, Dermott; Cabrera, Lilia; Cama, Vitaliano; Checkley, William
2016-01-01
Childhood growth is a cornerstone of pediatric research. Statistical models need to consider individual trajectories to adequately describe growth outcomes. Specifically, well-defined longitudinal models are essential to characterize both population and subject-specific growth. Linear mixed-effect models with cubic regression splines can account for the nonlinearity of growth curves and provide reasonable estimators of population and subject-specific growth, velocity and acceleration. We provide a stepwise approach that builds from simple to complex models, and account for the intrinsic complexity of the data. We start with standard cubic splines regression models and build up to a model that includes subject-specific random intercepts and slopes and residual autocorrelation. We then compared cubic regression splines vis-à-vis linear piecewise splines, and with varying number of knots and positions. Statistical code is provided to ensure reproducibility and improve dissemination of methods. Models are applied to longitudinal height measurements in a cohort of 215 Peruvian children followed from birth until their fourth year of life. Unexplained variability, as measured by the variance of the regression model, was reduced from 7.34 when using ordinary least squares to 0.81 (p < 0.001) when using a linear mixed-effect models with random slopes and a first order continuous autoregressive error term. There was substantial heterogeneity in both the intercept (p < 0.001) and slopes (p < 0.001) of the individual growth trajectories. We also identified important serial correlation within the structure of the data (ρ = 0.66; 95 % CI 0.64 to 0.68; p < 0.001), which we modeled with a first order continuous autoregressive error term as evidenced by the variogram of the residuals and by a lack of association among residuals. The final model provides a parametric linear regression equation for both estimation and prediction of population- and individual-level growth in height. We show that cubic regression splines are superior to linear regression splines for the case of a small number of knots in both estimation and prediction with the full linear mixed effect model (AIC 19,352 vs. 19,598, respectively). While the regression parameters are more complex to interpret in the former, we argue that inference for any problem depends more on the estimated curve or differences in curves rather than the coefficients. Moreover, use of cubic regression splines provides biological meaningful growth velocity and acceleration curves despite increased complexity in coefficient interpretation. Through this stepwise approach, we provide a set of tools to model longitudinal childhood data for non-statisticians using linear mixed-effect models.
2018-01-01
Objective The objective of this study was to estimate genetic parameters of milk, fat, and protein yields within and across lactations in Tunisian Holsteins using a random regression test-day (TD) model. Methods A random regression multiple trait multiple lactation TD model was used to estimate genetic parameters in the Tunisian dairy cattle population. Data were TD yields of milk, fat, and protein from the first three lactations. Random regressions were modeled with third-order Legendre polynomials for the additive genetic, and permanent environment effects. Heritabilities, and genetic correlations were estimated by Bayesian techniques using the Gibbs sampler. Results All variance components tended to be high in the beginning and the end of lactations. Additive genetic variances for milk, fat, and protein yields were the lowest and were the least variable compared to permanent variances. Heritability values tended to increase with parity. Estimates of heritabilities for 305-d yield-traits were low to moderate, 0.14 to 0.2, 0.12 to 0.17, and 0.13 to 0.18 for milk, fat, and protein yields, respectively. Within-parity, genetic correlations among traits were up to 0.74. Genetic correlations among lactations for the yield traits were relatively high and ranged from 0.78±0.01 to 0.82±0.03, between the first and second parities, from 0.73±0.03 to 0.8±0.04 between the first and third parities, and from 0.82±0.02 to 0.84±0.04 between the second and third parities. Conclusion These results are comparable to previously reported estimates on the same population, indicating that the adoption of a random regression TD model as the official genetic evaluation for production traits in Tunisia, as developed by most Interbull countries, is possible in the Tunisian Holsteins. PMID:28823122
Ben Zaabza, Hafedh; Ben Gara, Abderrahmen; Rekik, Boulbaba
2018-05-01
The objective of this study was to estimate genetic parameters of milk, fat, and protein yields within and across lactations in Tunisian Holsteins using a random regression test-day (TD) model. A random regression multiple trait multiple lactation TD model was used to estimate genetic parameters in the Tunisian dairy cattle population. Data were TD yields of milk, fat, and protein from the first three lactations. Random regressions were modeled with third-order Legendre polynomials for the additive genetic, and permanent environment effects. Heritabilities, and genetic correlations were estimated by Bayesian techniques using the Gibbs sampler. All variance components tended to be high in the beginning and the end of lactations. Additive genetic variances for milk, fat, and protein yields were the lowest and were the least variable compared to permanent variances. Heritability values tended to increase with parity. Estimates of heritabilities for 305-d yield-traits were low to moderate, 0.14 to 0.2, 0.12 to 0.17, and 0.13 to 0.18 for milk, fat, and protein yields, respectively. Within-parity, genetic correlations among traits were up to 0.74. Genetic correlations among lactations for the yield traits were relatively high and ranged from 0.78±0.01 to 0.82±0.03, between the first and second parities, from 0.73±0.03 to 0.8±0.04 between the first and third parities, and from 0.82±0.02 to 0.84±0.04 between the second and third parities. These results are comparable to previously reported estimates on the same population, indicating that the adoption of a random regression TD model as the official genetic evaluation for production traits in Tunisia, as developed by most Interbull countries, is possible in the Tunisian Holsteins.
Hsieh, Chung-Ho; Lu, Ruey-Hwa; Lee, Nai-Hsin; Chiu, Wen-Ta; Hsu, Min-Huei; Li, Yu-Chuan Jack
2011-01-01
Diagnosing acute appendicitis clinically is still difficult. We developed random forests, support vector machines, and artificial neural network models to diagnose acute appendicitis. Between January 2006 and December 2008, patients who had a consultation session with surgeons for suspected acute appendicitis were enrolled. Seventy-five percent of the data set was used to construct models including random forest, support vector machines, artificial neural networks, and logistic regression. Twenty-five percent of the data set was withheld to evaluate model performance. The area under the receiver operating characteristic curve (AUC) was used to evaluate performance, which was compared with that of the Alvarado score. Data from a total of 180 patients were collected, 135 used for training and 45 for testing. The mean age of patients was 39.4 years (range, 16-85). Final diagnosis revealed 115 patients with and 65 without appendicitis. The AUC of random forest, support vector machines, artificial neural networks, logistic regression, and Alvarado was 0.98, 0.96, 0.91, 0.87, and 0.77, respectively. The sensitivity, specificity, positive, and negative predictive values of random forest were 94%, 100%, 100%, and 87%, respectively. Random forest performed better than artificial neural networks, logistic regression, and Alvarado. We demonstrated that random forest can predict acute appendicitis with good accuracy and, deployed appropriately, can be an effective tool in clinical decision making. Copyright © 2011 Mosby, Inc. All rights reserved.
Hanks, Ephraim M.; Schliep, Erin M.; Hooten, Mevin B.; Hoeting, Jennifer A.
2015-01-01
In spatial generalized linear mixed models (SGLMMs), covariates that are spatially smooth are often collinear with spatially smooth random effects. This phenomenon is known as spatial confounding and has been studied primarily in the case where the spatial support of the process being studied is discrete (e.g., areal spatial data). In this case, the most common approach suggested is restricted spatial regression (RSR) in which the spatial random effects are constrained to be orthogonal to the fixed effects. We consider spatial confounding and RSR in the geostatistical (continuous spatial support) setting. We show that RSR provides computational benefits relative to the confounded SGLMM, but that Bayesian credible intervals under RSR can be inappropriately narrow under model misspecification. We propose a posterior predictive approach to alleviating this potential problem and discuss the appropriateness of RSR in a variety of situations. We illustrate RSR and SGLMM approaches through simulation studies and an analysis of malaria frequencies in The Gambia, Africa.
Burkin, M M; Molchanova, E V
To assess an impact of indicators of social stress on demographic processes in regions of the Russian Federation using statistical methods. The data of Rosstat «Regions of Russia» and «Health care in Russia» were used as information base. Indicators of about 80 subjects of the Russian Federation (without autonomous areas) for the ten-year period (2005-2014) have been created in the form of the database consisting of the following blocks: medico-demographic situation, level of economic development of the territory and wellbeing of the population, development of social infrastructure, ecological and climatic conditions, scientific researches and innovations. In total, there were about 70 indicators. Panel data for 80 regions of Russia in 10 years, which combine both indicators of spatial type (cross-section data), and information on temporary ranks (time-series data), were used. Various models of regression according to the panel data have been realized: the integrated model of regression (pooled model), regression model with the fixed effects (fixed effect model), regression model with random effects (random effect model). Main demographic indicators (life expectancy, birth rate, mortality from the external reasons) are to a great extent connected with socio-economic factors. Social tension (social stress) caused by transition to market economy plays an important role. The integral assessment of the impact of the average per capita monetary income, incidence of alcoholism and alcoholic psychoses, criminality, sales volume of alcoholic beverages per capita and marriage relations on demographic indicators is presented. Results of modeling allow to define the priority directions in the field of development of mental health and psychotherapeutic services in the regions of the Russian Federation.
Genetic analysis of partial egg production records in Japanese quail using random regression models.
Abou Khadiga, G; Mahmoud, B Y F; Farahat, G S; Emam, A M; El-Full, E A
2017-08-01
The main objectives of this study were to detect the most appropriate random regression model (RRM) to fit the data of monthly egg production in 2 lines (selected and control) of Japanese quail and to test the consistency of different criteria of model choice. Data from 1,200 female Japanese quails for the first 5 months of egg production from 4 consecutive generations of an egg line selected for egg production in the first month (EP1) was analyzed. Eight RRMs with different orders of Legendre polynomials were compared to determine the proper model for analysis. All criteria of model choice suggested that the adequate model included the second-order Legendre polynomials for fixed effects, and the third-order for additive genetic effects and permanent environmental effects. Predictive ability of the best model was the highest among all models (ρ = 0.987). According to the best model fitted to the data, estimates of heritability were relatively low to moderate (0.10 to 0.17) showed a descending pattern from the first to the fifth month of production. A similar pattern was observed for permanent environmental effects with greater estimates in the first (0.36) and second (0.23) months of production than heritability estimates. Genetic correlations between separate production periods were higher (0.18 to 0.93) than their phenotypic counterparts (0.15 to 0.87). The superiority of the selected line over the control was observed through significant (P < 0.05) linear contrast estimates. Significant (P < 0.05) estimates of covariate effect (age at sexual maturity) showed a decreased pattern with greater impact on egg production in earlier ages (first and second months) than later ones. A methodology based on random regression animal models can be recommended for genetic evaluation of egg production in Japanese quail. © 2017 Poultry Science Association Inc.
A Bayesian Nonparametric Causal Model for Regression Discontinuity Designs
ERIC Educational Resources Information Center
Karabatsos, George; Walker, Stephen G.
2013-01-01
The regression discontinuity (RD) design (Thistlewaite & Campbell, 1960; Cook, 2008) provides a framework to identify and estimate causal effects from a non-randomized design. Each subject of a RD design is assigned to the treatment (versus assignment to a non-treatment) whenever her/his observed value of the assignment variable equals or…
Jackson, Dan; White, Ian R; Riley, Richard D
2013-01-01
Multivariate meta-analysis is becoming more commonly used. Methods for fitting the multivariate random effects model include maximum likelihood, restricted maximum likelihood, Bayesian estimation and multivariate generalisations of the standard univariate method of moments. Here, we provide a new multivariate method of moments for estimating the between-study covariance matrix with the properties that (1) it allows for either complete or incomplete outcomes and (2) it allows for covariates through meta-regression. Further, for complete data, it is invariant to linear transformations. Our method reduces to the usual univariate method of moments, proposed by DerSimonian and Laird, in a single dimension. We illustrate our method and compare it with some of the alternatives using a simulation study and a real example. PMID:23401213
Brügemann, K; Gernand, E; von Borstel, U U; König, S
2011-08-01
Data used in the present study included 1,095,980 first-lactation test-day records for protein yield of 154,880 Holstein cows housed on 196 large-scale dairy farms in Germany. Data were recorded between 2002 and 2009 and merged with meteorological data from public weather stations. The maximum distance between each farm and its corresponding weather station was 50 km. Hourly temperature-humidity indexes (THI) were calculated using the mean of hourly measurements of dry bulb temperature and relative humidity. On the phenotypic scale, an increase in THI was generally associated with a decrease in daily protein yield. For genetic analyses, a random regression model was applied using time-dependent (d in milk, DIM) and THI-dependent covariates. Additive genetic and permanent environmental effects were fitted with this random regression model and Legendre polynomials of order 3 for DIM and THI. In addition, the fixed curve was modeled with Legendre polynomials of order 3. Heterogeneous residuals were fitted by dividing DIM into 5 classes, and by dividing THI into 4 classes, resulting in 20 different classes. Additive genetic variances for daily protein yield decreased with increasing degrees of heat stress and were lowest at the beginning of lactation and at extreme THI. Due to higher additive genetic variances, slightly higher permanent environment variances, and similar residual variances, heritabilities were highest for low THI in combination with DIM at the end of lactation. Genetic correlations among individual values for THI were generally >0.90. These trends from the complex random regression model were verified by applying relatively simple bivariate animal models for protein yield measured in 2 THI environments; that is, defining a THI value of 60 as a threshold. These high correlations indicate the absence of any substantial genotype × environment interaction for protein yield. However, heritabilities and additive genetic variances from the random regression model tended to be slightly higher in the THI range corresponding to cows' comfort zone. Selecting such superior environments for progeny testing can contribute to an accurate genetic differentiation among selection candidates. Copyright © 2011 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Effect of motivational interviewing on rates of early childhood caries: a randomized trial.
Harrison, Rosamund; Benton, Tonya; Everson-Stewart, Siobhan; Weinstein, Phil
2007-01-01
The purposes of this randomized controlled trial were to: (1) test motivational interviewing (MI) to prevent early childhood caries; and (2) use Poisson regression for data analysis. A total of 240 South Asian children 6 to 18 months old were enrolled and randomly assigned to either the MI or control condition. Children had a dental exam, and their mothers completed pretested instruments at baseline and 1 and 2 years postintervention. Other covariates that might explain outcomes over and above treatment differences were modeled using Poisson regression. Hazard ratios were produced. Analyses included all participants whenever possible. Poisson regression supported a protective effect of MI (hazard ratio [HR]=0.54 (95%CI=035-0.84)-that is, the M/ group had about a 46% lower rate of dmfs at 2 years than did control children. Similar treatment effect estimates were obtained from models that included, as alternative outcomes, ds, dms, and dmfs, including "white spot lesions." Exploratory analyses revealed that rates of dmfs were higher in children whose mothers had: (1) prechewed their food; (2) been raised in a rural environment; and (3) a higher family income (P<.05). A motivational interviewing-style intervention shows promise to promote preventive behaviors in mothers of young children at high risk for caries.
Reitsma, Angela; Chu, Rong; Thorpe, Julia; McDonald, Sarah; Thabane, Lehana; Hutton, Eileen
2014-09-26
Clustering of outcomes at centers involved in multicenter trials is a type of center effect. The Consolidated Standards of Reporting Trials Statement recommends that multicenter randomized controlled trials (RCTs) should account for center effects in their analysis, however most do not. The Early External Cephalic Version (EECV) trials published in 2003 and 2011 stratified by center at randomization, but did not account for center in the analyses, and due to the nature of the intervention and number of centers, may have been prone to center effects. Using data from the EECV trials, we undertook an empirical study to compare various statistical approaches to account for center effect while estimating the impact of external cephalic version timing (early or delayed) on the outcomes of cesarean section, preterm birth, and non-cephalic presentation at the time of birth. The data from the EECV pilot trial and the EECV2 trial were merged into one dataset. Fisher's exact method was used to test the overall effect of external cephalic version timing unadjusted for center effects. Seven statistical models that accounted for center effects were applied to the data. The models included: i) the Mantel-Haenszel test, ii) logistic regression with fixed center effect and fixed treatment effect, iii) center-size weighted and iv) un-weighted logistic regression with fixed center effect and fixed treatment-by-center interaction, iv) logistic regression with random center effect and fixed treatment effect, v) logistic regression with random center effect and random treatment-by-center interaction, and vi) generalized estimating equations. For each of the three outcomes of interest approaches to account for center effect did not alter the overall findings of the trial. The results were similar for the majority of the methods used to adjust for center, illustrating the robustness of the findings. Despite literature that suggests center effect can change the estimate of effect in multicenter trials, this empirical study does not show a difference in the outcomes of the EECV trials when accounting for center effect. The EECV2 trial was registered on 30 July 30 2005 with Current Controlled Trials: ISRCTN 56498577.
The arcsine is asinine: the analysis of proportions in ecology.
Warton, David I; Hui, Francis K C
2011-01-01
The arcsine square root transformation has long been standard procedure when analyzing proportional data in ecology, with applications in data sets containing binomial and non-binomial response variables. Here, we argue that the arcsine transform should not be used in either circumstance. For binomial data, logistic regression has greater interpretability and higher power than analyses of transformed data. However, it is important to check the data for additional unexplained variation, i.e., overdispersion, and to account for it via the inclusion of random effects in the model if found. For non-binomial data, the arcsine transform is undesirable on the grounds of interpretability, and because it can produce nonsensical predictions. The logit transformation is proposed as an alternative approach to address these issues. Examples are presented in both cases to illustrate these advantages, comparing various methods of analyzing proportions including untransformed, arcsine- and logit-transformed linear models and logistic regression (with or without random effects). Simulations demonstrate that logistic regression usually provides a gain in power over other methods.
ERIC Educational Resources Information Center
Wing, Coady; Cook, Thomas D.
2013-01-01
The sharp regression discontinuity design (RDD) has three key weaknesses compared to the randomized clinical trial (RCT). It has lower statistical power, it is more dependent on statistical modeling assumptions, and its treatment effect estimates are limited to the narrow subpopulation of cases immediately around the cutoff, which is rarely of…
Bignardi, A B; El Faro, L; Torres Júnior, R A A; Cardoso, V L; Machado, P F; Albuquerque, L G
2011-10-31
We analyzed 152,145 test-day records from 7317 first lactations of Holstein cows recorded from 1995 to 2003. Our objective was to model variations in test-day milk yield during the first lactation of Holstein cows by random regression model (RRM), using various functions in order to obtain adequate and parsimonious models for the estimation of genetic parameters. Test-day milk yields were grouped into weekly classes of days in milk, ranging from 1 to 44 weeks. The contemporary groups were defined as herd-test-day. The analyses were performed using a single-trait RRM, including the direct additive, permanent environmental and residual random effects. In addition, contemporary group and linear and quadratic effects of the age of cow at calving were included as fixed effects. The mean trend of milk yield was modeled with a fourth-order orthogonal Legendre polynomial. The additive genetic and permanent environmental covariance functions were estimated by random regression on two parametric functions, Ali and Schaeffer and Wilmink, and on B-spline functions of days in milk. The covariance components and the genetic parameters were estimated by the restricted maximum likelihood method. Results from RRM parametric and B-spline functions were compared to RRM on Legendre polynomials and with a multi-trait analysis, using the same data set. Heritability estimates presented similar trends during mid-lactation (13 to 31 weeks) and between week 37 and the end of lactation, for all RRM. Heritabilities obtained by multi-trait analysis were of a lower magnitude than those estimated by RRM. The RRMs with a higher number of parameters were more useful to describe the genetic variation of test-day milk yield throughout the lactation. RRM using B-spline and Legendre polynomials as base functions appears to be the most adequate to describe the covariance structure of the data.
Sun, Jin; Xu, Xiaosu; Liu, Yiting; Zhang, Tao; Li, Yao
2016-07-12
In order to reduce the influence of fiber optic gyroscope (FOG) random drift error on inertial navigation systems, an improved auto regressive (AR) model is put forward in this paper. First, based on real-time observations at each restart of the gyroscope, the model of FOG random drift can be established online. In the improved AR model, the FOG measured signal is employed instead of the zero mean signals. Then, the modified Sage-Husa adaptive Kalman filter (SHAKF) is introduced, which can directly carry out real-time filtering on the FOG signals. Finally, static and dynamic experiments are done to verify the effectiveness. The filtering results are analyzed with Allan variance. The analysis results show that the improved AR model has high fitting accuracy and strong adaptability, and the minimum fitting accuracy of single noise is 93.2%. Based on the improved AR(3) model, the denoising method of SHAKF is more effective than traditional methods, and its effect is better than 30%. The random drift error of FOG is reduced effectively, and the precision of the FOG is improved.
Speidel, S E; Peel, R K; Crews, D H; Enns, R M
2016-02-01
Genetic evaluation research designed to reduce the required days to a specified end point has received very little attention in pertinent scientific literature, given that its economic importance was first discussed in 1957. There are many production scenarios in today's beef industry, making a prediction for the required number of days to a single end point a suboptimal option. Random regression is an attractive alternative to calculate days to weight (DTW), days to ultrasound back fat (DTUBF), and days to ultrasound rib eye area (DTUREA) genetic predictions that could overcome weaknesses of a single end point prediction. The objective of this study was to develop random regression approaches for the prediction of the DTW, DTUREA, and DTUBF. Data were obtained from the Agriculture and Agri-Food Canada Research Centre, Lethbridge, AB, Canada. Data consisted of records on 1,324 feedlot cattle spanning 1999 to 2007. Individual animals averaged 5.77 observations with weights, ultrasound rib eye area (UREA), ultrasound back fat depth (UBF), and ages ranging from 293 to 863 kg, 73.39 to 129.54 cm, 1.53 to 30.47 mm, and 276 to 519 d, respectively. Random regression models using Legendre polynomials were used to regress age of the individual on weight, UREA, and UBF. Fixed effects in the model included an overall fixed regression of age on end point (weight, UREA, and UBF) nested within breed to account for the mean relationship between age and weight as well as a contemporary group effect consisting of breed of the animal (Angus, Charolais, and Charolais sired), feedlot pen, and year of measure. Likelihood ratio tests were used to determine the appropriate random polynomial order. Use of the quadratic polynomial did not account for any additional genetic variation in days for DTW ( > 0.11), for DTUREA ( > 0.18), and for DTUBF ( > 0.20) when compared with the linear random polynomial. Heritability estimates from the linear random regression for DTW ranged from 0.54 to 0.74, corresponding to end points of 293 and 863 kg, respectively. Heritability for DTUREA ranged from 0.51 to 0.34 and for DTUBF ranged from 0.55 to 0.37. These estimates correspond to UREA end points of 35 and 125 cm and UBF end points of 1.53 and 30 mm, respectively. This range of heritability shows DTW, DTUREA, and DTUBF to be highly heritable and indicates that selection pressure aimed at reducing the number of days to reach a finish weight end point can result in genetic change given sufficient data.
NASA Astrophysics Data System (ADS)
Zhang, Hong; Hou, Rui; Yi, Lei; Meng, Juan; Pan, Zhisong; Zhou, Yuhuan
2016-07-01
The accurate identification of encrypted data stream helps to regulate illegal data, detect network attacks and protect users' information. In this paper, a novel encrypted data stream identification algorithm is introduced. The proposed method is based on randomness characteristics of encrypted data stream. We use a l1-norm regularized logistic regression to improve sparse representation of randomness features and Fuzzy Gaussian Mixture Model (FGMM) to improve identification accuracy. Experimental results demonstrate that the method can be adopted as an effective technique for encrypted data stream identification.
Cappella, Elise; Hamre, Bridget K.; Kim, Ha Yeon; Henry, David B.; Frazier, Stacy L.; Atkins, Marc S.; Schoenwald, Sonja K.
2012-01-01
Objective To examine effects of a teacher consultation and coaching program delivered by school and community mental health professionals on change in observed classroom interactions and child functioning across one school year. Method Thirty-six classrooms within five urban elementary schools (87% Latino, 11% Black) were randomly assigned to intervention (training + consultation/coaching) and control (training only) conditions. Classroom and child outcomes (n = 364; 43% girls) were assessed in the fall and spring. Results Random effects regression models showed main effects of intervention on teacher-student relationship closeness, academic self-concept, and peer victimization. Results of multiple regression models showed levels of observed teacher emotional support in the fall moderated intervention impact on emotional support at the end of the school year. Conclusions Results suggest teacher consultation and coaching can be integrated within existing mental health activities in urban schools and impact classroom effectiveness and child adaptation across multiple domains. PMID:22428941
Regression Discontinuity Designs in Epidemiology
Moscoe, Ellen; Mutevedzi, Portia; Newell, Marie-Louise; Bärnighausen, Till
2014-01-01
When patients receive an intervention based on whether they score below or above some threshold value on a continuously measured random variable, the intervention will be randomly assigned for patients close to the threshold. The regression discontinuity design exploits this fact to estimate causal treatment effects. In spite of its recent proliferation in economics, the regression discontinuity design has not been widely adopted in epidemiology. We describe regression discontinuity, its implementation, and the assumptions required for causal inference. We show that regression discontinuity is generalizable to the survival and nonlinear models that are mainstays of epidemiologic analysis. We then present an application of regression discontinuity to the much-debated epidemiologic question of when to start HIV patients on antiretroviral therapy. Using data from a large South African cohort (2007–2011), we estimate the causal effect of early versus deferred treatment eligibility on mortality. Patients whose first CD4 count was just below the 200 cells/μL CD4 count threshold had a 35% lower hazard of death (hazard ratio = 0.65 [95% confidence interval = 0.45–0.94]) than patients presenting with CD4 counts just above the threshold. We close by discussing the strengths and limitations of regression discontinuity designs for epidemiology. PMID:25061922
Zhu, K; Lou, Z; Zhou, J; Ballester, N; Kong, N; Parikh, P
2015-01-01
This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare". Hospital readmissions raise healthcare costs and cause significant distress to providers and patients. It is, therefore, of great interest to healthcare organizations to predict what patients are at risk to be readmitted to their hospitals. However, current logistic regression based risk prediction models have limited prediction power when applied to hospital administrative data. Meanwhile, although decision trees and random forests have been applied, they tend to be too complex to understand among the hospital practitioners. Explore the use of conditional logistic regression to increase the prediction accuracy. We analyzed an HCUP statewide inpatient discharge record dataset, which includes patient demographics, clinical and care utilization data from California. We extracted records of heart failure Medicare beneficiaries who had inpatient experience during an 11-month period. We corrected the data imbalance issue with under-sampling. In our study, we first applied standard logistic regression and decision tree to obtain influential variables and derive practically meaning decision rules. We then stratified the original data set accordingly and applied logistic regression on each data stratum. We further explored the effect of interacting variables in the logistic regression modeling. We conducted cross validation to assess the overall prediction performance of conditional logistic regression (CLR) and compared it with standard classification models. The developed CLR models outperformed several standard classification models (e.g., straightforward logistic regression, stepwise logistic regression, random forest, support vector machine). For example, the best CLR model improved the classification accuracy by nearly 20% over the straightforward logistic regression model. Furthermore, the developed CLR models tend to achieve better sensitivity of more than 10% over the standard classification models, which can be translated to correct labeling of additional 400 - 500 readmissions for heart failure patients in the state of California over a year. Lastly, several key predictor identified from the HCUP data include the disposition location from discharge, the number of chronic conditions, and the number of acute procedures. It would be beneficial to apply simple decision rules obtained from the decision tree in an ad-hoc manner to guide the cohort stratification. It could be potentially beneficial to explore the effect of pairwise interactions between influential predictors when building the logistic regression models for different data strata. Judicious use of the ad-hoc CLR models developed offers insights into future development of prediction models for hospital readmissions, which can lead to better intuition in identifying high-risk patients and developing effective post-discharge care strategies. Lastly, this paper is expected to raise the awareness of collecting data on additional markers and developing necessary database infrastructure for larger-scale exploratory studies on readmission risk prediction.
Panel regressions to estimate low-flow response to rainfall variability in ungaged basins
Bassiouni, Maoya; Vogel, Richard M.; Archfield, Stacey A.
2016-01-01
Multicollinearity and omitted-variable bias are major limitations to developing multiple linear regression models to estimate streamflow characteristics in ungaged areas and varying rainfall conditions. Panel regression is used to overcome limitations of traditional regression methods, and obtain reliable model coefficients, in particular to understand the elasticity of streamflow to rainfall. Using annual rainfall and selected basin characteristics at 86 gaged streams in the Hawaiian Islands, regional regression models for three stream classes were developed to estimate the annual low-flow duration discharges. Three panel-regression structures (random effects, fixed effects, and pooled) were compared to traditional regression methods, in which space is substituted for time. Results indicated that panel regression generally was able to reproduce the temporal behavior of streamflow and reduce the standard errors of model coefficients compared to traditional regression, even for models in which the unobserved heterogeneity between streams is significant and the variance inflation factor for rainfall is much greater than 10. This is because both spatial and temporal variability were better characterized in panel regression. In a case study, regional rainfall elasticities estimated from panel regressions were applied to ungaged basins on Maui, using available rainfall projections to estimate plausible changes in surface-water availability and usable stream habitat for native species. The presented panel-regression framework is shown to offer benefits over existing traditional hydrologic regression methods for developing robust regional relations to investigate streamflow response in a changing climate.
Panel regressions to estimate low-flow response to rainfall variability in ungaged basins
NASA Astrophysics Data System (ADS)
Bassiouni, Maoya; Vogel, Richard M.; Archfield, Stacey A.
2016-12-01
Multicollinearity and omitted-variable bias are major limitations to developing multiple linear regression models to estimate streamflow characteristics in ungaged areas and varying rainfall conditions. Panel regression is used to overcome limitations of traditional regression methods, and obtain reliable model coefficients, in particular to understand the elasticity of streamflow to rainfall. Using annual rainfall and selected basin characteristics at 86 gaged streams in the Hawaiian Islands, regional regression models for three stream classes were developed to estimate the annual low-flow duration discharges. Three panel-regression structures (random effects, fixed effects, and pooled) were compared to traditional regression methods, in which space is substituted for time. Results indicated that panel regression generally was able to reproduce the temporal behavior of streamflow and reduce the standard errors of model coefficients compared to traditional regression, even for models in which the unobserved heterogeneity between streams is significant and the variance inflation factor for rainfall is much greater than 10. This is because both spatial and temporal variability were better characterized in panel regression. In a case study, regional rainfall elasticities estimated from panel regressions were applied to ungaged basins on Maui, using available rainfall projections to estimate plausible changes in surface-water availability and usable stream habitat for native species. The presented panel-regression framework is shown to offer benefits over existing traditional hydrologic regression methods for developing robust regional relations to investigate streamflow response in a changing climate.
Chirombo, James; Lowe, Rachel; Kazembe, Lawrence
2014-01-01
Background After years of implementing Roll Back Malaria (RBM) interventions, the changing landscape of malaria in terms of risk factors and spatial pattern has not been fully investigated. This paper uses the 2010 malaria indicator survey data to investigate if known malaria risk factors remain relevant after many years of interventions. Methods We adopted a structured additive logistic regression model that allowed for spatial correlation, to more realistically estimate malaria risk factors. Our model included child and household level covariates, as well as climatic and environmental factors. Continuous variables were modelled by assuming second order random walk priors, while spatial correlation was specified as a Markov random field prior, with fixed effects assigned diffuse priors. Inference was fully Bayesian resulting in an under five malaria risk map for Malawi. Results Malaria risk increased with increasing age of the child. With respect to socio-economic factors, the greater the household wealth, the lower the malaria prevalence. A general decline in malaria risk was observed as altitude increased. Minimum temperatures and average total rainfall in the three months preceding the survey did not show a strong association with disease risk. Conclusions The structured additive regression model offered a flexible extension to standard regression models by enabling simultaneous modelling of possible nonlinear effects of continuous covariates, spatial correlation and heterogeneity, while estimating usual fixed effects of categorical and continuous observed variables. Our results confirmed that malaria epidemiology is a complex interaction of biotic and abiotic factors, both at the individual, household and community level and that risk factors are still relevant many years after extensive implementation of RBM activities. PMID:24991915
Chirombo, James; Lowe, Rachel; Kazembe, Lawrence
2014-01-01
After years of implementing Roll Back Malaria (RBM) interventions, the changing landscape of malaria in terms of risk factors and spatial pattern has not been fully investigated. This paper uses the 2010 malaria indicator survey data to investigate if known malaria risk factors remain relevant after many years of interventions. We adopted a structured additive logistic regression model that allowed for spatial correlation, to more realistically estimate malaria risk factors. Our model included child and household level covariates, as well as climatic and environmental factors. Continuous variables were modelled by assuming second order random walk priors, while spatial correlation was specified as a Markov random field prior, with fixed effects assigned diffuse priors. Inference was fully Bayesian resulting in an under five malaria risk map for Malawi. Malaria risk increased with increasing age of the child. With respect to socio-economic factors, the greater the household wealth, the lower the malaria prevalence. A general decline in malaria risk was observed as altitude increased. Minimum temperatures and average total rainfall in the three months preceding the survey did not show a strong association with disease risk. The structured additive regression model offered a flexible extension to standard regression models by enabling simultaneous modelling of possible nonlinear effects of continuous covariates, spatial correlation and heterogeneity, while estimating usual fixed effects of categorical and continuous observed variables. Our results confirmed that malaria epidemiology is a complex interaction of biotic and abiotic factors, both at the individual, household and community level and that risk factors are still relevant many years after extensive implementation of RBM activities.
Hunter, Paul R
2009-12-01
Household water treatment (HWT) is being widely promoted as an appropriate intervention for reducing the burden of waterborne disease in poor communities in developing countries. A recent study has raised concerns about the effectiveness of HWT, in part because of concerns over the lack of blinding and in part because of considerable heterogeneity in the reported effectiveness of randomized controlled trials. This study set out to attempt to investigate the causes of this heterogeneity and so identify factors associated with good health gains. Studies identified in an earlier systematic review and meta-analysis were supplemented with more recently published randomized controlled trials. A total of 28 separate studies of randomized controlled trials of HWT with 39 intervention arms were included in the analysis. Heterogeneity was studied using the "metareg" command in Stata. Initial analyses with single candidate predictors were undertaken and all variables significant at the P < 0.2 level were included in a final regression model. Further analyses were done to estimate the effect of the interventions over time by MonteCarlo modeling using @Risk and the parameter estimates from the final regression model. The overall effect size of all unblinded studies was relative risk = 0.56 (95% confidence intervals 0.51-0.63), but after adjusting for bias due to lack of blinding the effect size was much lower (RR = 0.85, 95% CI = 0.76-0.97). Four main variables were significant predictors of effectiveness of intervention in a multipredictor meta regression model: Log duration of study follow-up (regression coefficient of log effect size = 0.186, standard error (SE) = 0.072), whether or not the study was blinded (coefficient 0.251, SE 0.066) and being conducted in an emergency setting (coefficient -0.351, SE 0.076) were all significant predictors of effect size in the final model. Compared to the ceramic filter all other interventions were much less effective (Biosand 0.247, 0.073; chlorine and safe waste storage 0.295, 0.061; combined coagulant-chlorine 0.2349, 0.067; SODIS 0.302, 0.068). A Monte Carlo model predicted that over 12 months ceramic filters were likely to be still effective at reducing disease, whereas SODIS, chlorination, and coagulation-chlorination had little if any benefit. Indeed these three interventions are predicted to have the same or less effect than what may be expected due purely to reporting bias in unblinded studies With the currently available evidence ceramic filters are the most effective form of HWT in the longterm, disinfection-only interventions including SODIS appear to have poor if any longterm public health benefit.
Norrie, John; Davidson, Kate; Tata, Philip; Gumley, Andrew
2013-09-01
We investigated the treatment effects reported from a high-quality randomized controlled trial of cognitive behavioural therapy (CBT) for 106 people with borderline personality disorder attending community-based clinics in the UK National Health Service - the BOSCOT trial. Specifically, we examined whether the amount of therapy and therapist competence had an impact on our primary outcome, the number of suicidal acts, using instrumental variables regression modelling. Randomized controlled trial. Participants from across three sites (London, Glasgow, and Ayrshire/Arran) were randomized equally to CBT for personality disorders (CBTpd) plus Treatment as Usual or to Treatment as Usual. Treatment as Usual varied between sites and individuals, but was consistent with routine treatment in the UK National Health Service at the time. CBTpd comprised an average 16 sessions (range 0-35) over 12 months. We used instrumental variable regression modelling to estimate the impact of quantity and quality of therapy received (recording activities and behaviours that took place after randomization) on number of suicidal acts and inpatient psychiatric hospitalization. A total of 101 participants provided full outcome data at 2 years post randomization. The previously reported intention-to-treat (ITT) results showed on average a reduction of 0.91 (95% confidence interval 0.15-1.67) suicidal acts over 2 years for those randomized to CBT. By incorporating the influence of quantity of therapy and therapist competence, we show that this estimate of the effect of CBTpd could be approximately two to three times greater for those receiving the right amount of therapy from a competent therapist. Trials should routinely control for and collect data on both quantity of therapy and therapist competence, which can be used, via instrumental variable regression modelling, to estimate treatment effects for optimal delivery of therapy. Such estimates complement rather than replace the ITT results, which are properly the principal analysis results from such trials. © 2013 The British Psychological Society.
He, Jie; Zhao, Yunfeng; Zhao, Jingli; Gao, Jin; Han, Dandan; Xu, Pao; Yang, Runqing
2017-11-02
Because of their high economic importance, growth traits in fish are under continuous improvement. For growth traits that are recorded at multiple time-points in life, the use of univariate and multivariate animal models is limited because of the variable and irregular timing of these measures. Thus, the univariate random regression model (RRM) was introduced for the genetic analysis of dynamic growth traits in fish breeding. We used a multivariate random regression model (MRRM) to analyze genetic changes in growth traits recorded at multiple time-point of genetically-improved farmed tilapia. Legendre polynomials of different orders were applied to characterize the influences of fixed and random effects on growth trajectories. The final MRRM was determined by optimizing the univariate RRM for the analyzed traits separately via penalizing adaptively the likelihood statistical criterion, which is superior to both the Akaike information criterion and the Bayesian information criterion. In the selected MRRM, the additive genetic effects were modeled by Legendre polynomials of three orders for body weight (BWE) and body length (BL) and of two orders for body depth (BD). By using the covariance functions of the MRRM, estimated heritabilities were between 0.086 and 0.628 for BWE, 0.155 and 0.556 for BL, and 0.056 and 0.607 for BD. Only heritabilities for BD measured from 60 to 140 days of age were consistently higher than those estimated by the univariate RRM. All genetic correlations between growth time-points exceeded 0.5 for either single or pairwise time-points. Moreover, correlations between early and late growth time-points were lower. Thus, for phenotypes that are measured repeatedly in aquaculture, an MRRM can enhance the efficiency of the comprehensive selection for BWE and the main morphological traits.
MIXOR: a computer program for mixed-effects ordinal regression analysis.
Hedeker, D; Gibbons, R D
1996-03-01
MIXOR provides maximum marginal likelihood estimates for mixed-effects ordinal probit, logistic, and complementary log-log regression models. These models can be used for analysis of dichotomous and ordinal outcomes from either a clustered or longitudinal design. For clustered data, the mixed-effects model assumes that data within clusters are dependent. The degree of dependency is jointly estimated with the usual model parameters, thus adjusting for dependence resulting from clustering of the data. Similarly, for longitudinal data, the mixed-effects approach can allow for individual-varying intercepts and slopes across time, and can estimate the degree to which these time-related effects vary in the population of individuals. MIXOR uses marginal maximum likelihood estimation, utilizing a Fisher-scoring solution. For the scoring solution, the Cholesky factor of the random-effects variance-covariance matrix is estimated, along with the effects of model covariates. Examples illustrating usage and features of MIXOR are provided.
A note on variance estimation in random effects meta-regression.
Sidik, Kurex; Jonkman, Jeffrey N
2005-01-01
For random effects meta-regression inference, variance estimation for the parameter estimates is discussed. Because estimated weights are used for meta-regression analysis in practice, the assumed or estimated covariance matrix used in meta-regression is not strictly correct, due to possible errors in estimating the weights. Therefore, this note investigates the use of a robust variance estimation approach for obtaining variances of the parameter estimates in random effects meta-regression inference. This method treats the assumed covariance matrix of the effect measure variables as a working covariance matrix. Using an example of meta-analysis data from clinical trials of a vaccine, the robust variance estimation approach is illustrated in comparison with two other methods of variance estimation. A simulation study is presented, comparing the three methods of variance estimation in terms of bias and coverage probability. We find that, despite the seeming suitability of the robust estimator for random effects meta-regression, the improved variance estimator of Knapp and Hartung (2003) yields the best performance among the three estimators, and thus may provide the best protection against errors in the estimated weights.
Comparison of methods for the analysis of relatively simple mediation models.
Rijnhart, Judith J M; Twisk, Jos W R; Chinapaw, Mai J M; de Boer, Michiel R; Heymans, Martijn W
2017-09-01
Statistical mediation analysis is an often used method in trials, to unravel the pathways underlying the effect of an intervention on a particular outcome variable. Throughout the years, several methods have been proposed, such as ordinary least square (OLS) regression, structural equation modeling (SEM), and the potential outcomes framework. Most applied researchers do not know that these methods are mathematically equivalent when applied to mediation models with a continuous mediator and outcome variable. Therefore, the aim of this paper was to demonstrate the similarities between OLS regression, SEM, and the potential outcomes framework in three mediation models: 1) a crude model, 2) a confounder-adjusted model, and 3) a model with an interaction term for exposure-mediator interaction. Secondary data analysis of a randomized controlled trial that included 546 schoolchildren. In our data example, the mediator and outcome variable were both continuous. We compared the estimates of the total, direct and indirect effects, proportion mediated, and 95% confidence intervals (CIs) for the indirect effect across OLS regression, SEM, and the potential outcomes framework. OLS regression, SEM, and the potential outcomes framework yielded the same effect estimates in the crude mediation model, the confounder-adjusted mediation model, and the mediation model with an interaction term for exposure-mediator interaction. Since OLS regression, SEM, and the potential outcomes framework yield the same results in three mediation models with a continuous mediator and outcome variable, researchers can continue using the method that is most convenient to them.
Polynomial order selection in random regression models via penalizing adaptively the likelihood.
Corrales, J D; Munilla, S; Cantet, R J C
2015-08-01
Orthogonal Legendre polynomials (LP) are used to model the shape of additive genetic and permanent environmental effects in random regression models (RRM). Frequently, the Akaike (AIC) and the Bayesian (BIC) information criteria are employed to select LP order. However, it has been theoretically shown that neither AIC nor BIC is simultaneously optimal in terms of consistency and efficiency. Thus, the goal was to introduce a method, 'penalizing adaptively the likelihood' (PAL), as a criterion to select LP order in RRM. Four simulated data sets and real data (60,513 records, 6675 Colombian Holstein cows) were employed. Nested models were fitted to the data, and AIC, BIC and PAL were calculated for all of them. Results showed that PAL and BIC identified with probability of one the true LP order for the additive genetic and permanent environmental effects, but AIC tended to favour over parameterized models. Conversely, when the true model was unknown, PAL selected the best model with higher probability than AIC. In the latter case, BIC never favoured the best model. To summarize, PAL selected a correct model order regardless of whether the 'true' model was within the set of candidates. © 2015 Blackwell Verlag GmbH.
Use of AMMI and linear regression models to analyze genotype-environment interaction in durum wheat.
Nachit, M M; Nachit, G; Ketata, H; Gauch, H G; Zobel, R W
1992-03-01
The joint durum wheat (Triticum turgidum L var 'durum') breeding program of the International Maize and Wheat Improvement Center (CIMMYT) and the International Center for Agricultural Research in the Dry Areas (ICARDA) for the Mediterranean region employs extensive multilocation testing. Multilocation testing produces significant genotype-environment (GE) interaction that reduces the accuracy for estimating yield and selecting appropriate germ plasm. The sum of squares (SS) of GE interaction was partitioned by linear regression techniques into joint, genotypic, and environmental regressions, and by Additive Main effects and the Multiplicative Interactions (AMMI) model into five significant Interaction Principal Component Axes (IPCA). The AMMI model was more effective in partitioning the interaction SS than the linear regression technique. The SS contained in the AMMI model was 6 times higher than the SS for all three regressions. Postdictive assessment recommended the use of the first five IPCA axes, while predictive assessment AMMI1 (main effects plus IPCA1). After elimination of random variation, AMMI1 estimates for genotypic yields within sites were more precise than unadjusted means. This increased precision was equivalent to increasing the number of replications by a factor of 3.7.
NASA Astrophysics Data System (ADS)
Muller, Sybrand Jacobus; van Niekerk, Adriaan
2016-07-01
Soil salinity often leads to reduced crop yield and quality and can render soils barren. Irrigated areas are particularly at risk due to intensive cultivation and secondary salinization caused by waterlogging. Regular monitoring of salt accumulation in irrigation schemes is needed to keep its negative effects under control. The dynamic spatial and temporal characteristics of remote sensing can provide a cost-effective solution for monitoring salt accumulation at irrigation scheme level. This study evaluated a range of pan-fused SPOT-5 derived features (spectral bands, vegetation indices, image textures and image transformations) for classifying salt-affected areas in two distinctly different irrigation schemes in South Africa, namely Vaalharts and Breede River. The relationship between the input features and electro conductivity measurements were investigated using regression modelling (stepwise linear regression, partial least squares regression, curve fit regression modelling) and supervised classification (maximum likelihood, nearest neighbour, decision tree analysis, support vector machine and random forests). Classification and regression trees and random forest were used to select the most important features for differentiating salt-affected and unaffected areas. The results showed that the regression analyses produced weak models (<0.4 R squared). Better results were achieved using the supervised classifiers, but the algorithms tend to over-estimate salt-affected areas. A key finding was that none of the feature sets or classification algorithms stood out as being superior for monitoring salt accumulation at irrigation scheme level. This was attributed to the large variations in the spectral responses of different crops types at different growing stages, coupled with their individual tolerances to saline conditions.
ERIC Educational Resources Information Center
Tipton, Elizabeth; Pustejovsky, James E.
2015-01-01
Randomized experiments are commonly used to evaluate the effectiveness of educational interventions. The goal of the present investigation is to develop small-sample corrections for multiple contrast hypothesis tests (i.e., F-tests) such as the omnibus test of meta-regression fit or a test for equality of three or more levels of a categorical…
Genetic analysis of groups of mid-infrared predicted fatty acids in milk.
Narayana, S G; Schenkel, F S; Fleming, A; Koeck, A; Malchiodi, F; Jamrozik, J; Johnston, J; Sargolzaei, M; Miglior, F
2017-06-01
The objective of this study was to investigate genetic variability of mid-infrared predicted fatty acid groups in Canadian Holstein cattle. Genetic parameters were estimated for 5 groups of fatty acids: short-chain (4 to 10 carbons), medium-chain (11 to 16 carbons), long-chain (17 to 22 carbons), saturated, and unsaturated fatty acids. The data set included 49,127 test-day records from 10,029 first-lactation Holstein cows in 810 herds. The random regression animal test-day model included days in milk, herd-test date, and age-season of calving (polynomial regression) as fixed effects, herd-year of calving, animal additive genetic effect, and permanent environment effects as random polynomial regressions, and random residual effect. Legendre polynomials of the third degree were selected for the fixed regression for age-season of calving effect and Legendre polynomials of the fourth degree were selected for the random regression for animal additive genetic, permanent environment, and herd-year effect. The average daily heritability over the lactation for the medium-chain fatty acid group (0.32) was higher than for the short-chain (0.24) and long-chain (0.23) fatty acid groups. The average daily heritability for the saturated fatty acid group (0.33) was greater than for the unsaturated fatty acid group (0.21). Estimated average daily genetic correlations were positive among all fatty acid groups and ranged from moderate to high (0.63-0.96). The genetic correlations illustrated similarities and differences in their origin and the makeup of the groupings based on chain length and saturation. These results provide evidence for the existence of genetic variation in mid-infrared predicted fatty acid groups, and the possibility of improving milk fatty acid profile through genetic selection in Canadian dairy cattle. Copyright © 2017 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
A method for fitting regression splines with varying polynomial order in the linear mixed model.
Edwards, Lloyd J; Stewart, Paul W; MacDougall, James E; Helms, Ronald W
2006-02-15
The linear mixed model has become a widely used tool for longitudinal analysis of continuous variables. The use of regression splines in these models offers the analyst additional flexibility in the formulation of descriptive analyses, exploratory analyses and hypothesis-driven confirmatory analyses. We propose a method for fitting piecewise polynomial regression splines with varying polynomial order in the fixed effects and/or random effects of the linear mixed model. The polynomial segments are explicitly constrained by side conditions for continuity and some smoothness at the points where they join. By using a reparameterization of this explicitly constrained linear mixed model, an implicitly constrained linear mixed model is constructed that simplifies implementation of fixed-knot regression splines. The proposed approach is relatively simple, handles splines in one variable or multiple variables, and can be easily programmed using existing commercial software such as SAS or S-plus. The method is illustrated using two examples: an analysis of longitudinal viral load data from a study of subjects with acute HIV-1 infection and an analysis of 24-hour ambulatory blood pressure profiles.
Santellano-Estrada, E; Becerril-Pérez, C M; de Alba, J; Chang, Y M; Gianola, D; Torres-Hernández, G; Ramírez-Valverde, R
2008-11-01
This study inferred genetic and permanent environmental variation of milk yield in Tropical Milking Criollo cattle and compared 5 random regression test-day models using Wilmink's function and Legendre polynomials. Data consisted of 15,377 test-day records from 467 Tropical Milking Criollo cows that calved between 1974 and 2006 in the tropical lowlands of the Gulf Coast of Mexico and in southern Nicaragua. Estimated heritabilities of test-day milk yields ranged from 0.18 to 0.45, and repeatabilities ranged from 0.35 to 0.68 for the period spanning from 6 to 400 d in milk. Genetic correlation between days in milk 10 and 400 was around 0.50 but greater than 0.90 for most pairs of test days. The model that used first-order Legendre polynomials for additive genetic effects and second-order Legendre polynomials for permanent environmental effects gave the smallest residual variance and was also favored by the Akaike information criterion and likelihood ratio tests.
Baba, Hiromi; Takahara, Jun-ichi; Yamashita, Fumiyoshi; Hashida, Mitsuru
2015-11-01
The solvent effect on skin permeability is important for assessing the effectiveness and toxicological risk of new dermatological formulations in pharmaceuticals and cosmetics development. The solvent effect occurs by diverse mechanisms, which could be elucidated by efficient and reliable prediction models. However, such prediction models have been hampered by the small variety of permeants and mixture components archived in databases and by low predictive performance. Here, we propose a solution to both problems. We first compiled a novel large database of 412 samples from 261 structurally diverse permeants and 31 solvents reported in the literature. The data were carefully screened to ensure their collection under consistent experimental conditions. To construct a high-performance predictive model, we then applied support vector regression (SVR) and random forest (RF) with greedy stepwise descriptor selection to our database. The models were internally and externally validated. The SVR achieved higher performance statistics than RF. The (externally validated) determination coefficient, root mean square error, and mean absolute error of SVR were 0.899, 0.351, and 0.268, respectively. Moreover, because all descriptors are fully computational, our method can predict as-yet unsynthesized compounds. Our high-performance prediction model offers an attractive alternative to permeability experiments for pharmaceutical and cosmetic candidate screening and optimizing skin-permeable topical formulations.
Mota, L F M; Martins, P G M A; Littiere, T O; Abreu, L R A; Silva, M A; Bonafé, C M
2018-04-01
The objective was to estimate (co)variance functions using random regression models (RRM) with Legendre polynomials, B-spline function and multi-trait models aimed at evaluating genetic parameters of growth traits in meat-type quail. A database containing the complete pedigree information of 7000 meat-type quail was utilized. The models included the fixed effects of contemporary group and generation. Direct additive genetic and permanent environmental effects, considered as random, were modeled using B-spline functions considering quadratic and cubic polynomials for each individual segment, and Legendre polynomials for age. Residual variances were grouped in four age classes. Direct additive genetic and permanent environmental effects were modeled using 2 to 4 segments and were modeled by Legendre polynomial with orders of fit ranging from 2 to 4. The model with quadratic B-spline adjustment, using four segments for direct additive genetic and permanent environmental effects, was the most appropriate and parsimonious to describe the covariance structure of the data. The RRM using Legendre polynomials presented an underestimation of the residual variance. Lesser heritability estimates were observed for multi-trait models in comparison with RRM for the evaluated ages. In general, the genetic correlations between measures of BW from hatching to 35 days of age decreased as the range between the evaluated ages increased. Genetic trend for BW was positive and significant along the selection generations. The genetic response to selection for BW in the evaluated ages presented greater values for RRM compared with multi-trait models. In summary, RRM using B-spline functions with four residual variance classes and segments were the best fit for genetic evaluation of growth traits in meat-type quail. In conclusion, RRM should be considered in genetic evaluation of breeding programs.
Nixon, R M; Bansback, N; Brennan, A
2007-03-15
Mixed treatment comparison (MTC) is a generalization of meta-analysis. Instead of the same treatment for a disease being tested in a number of studies, a number of different interventions are considered. Meta-regression is also a generalization of meta-analysis where an attempt is made to explain the heterogeneity between the treatment effects in the studies by regressing on study-level covariables. Our focus is where there are several different treatments considered in a number of randomized controlled trials in a specific disease, the same treatment can be applied in several arms within a study, and where differences in efficacy can be explained by differences in the study settings. We develop methods for simultaneously comparing several treatments and adjusting for study-level covariables by combining ideas from MTC and meta-regression. We use a case study from rheumatoid arthritis. We identified relevant trials of biologic verses standard therapy or placebo and extracted the doses, comparators and patient baseline characteristics. Efficacy is measured using the log odds ratio of achieving six-month ACR50 responder status. A random-effects meta-regression model is fitted which adjusts the log odds ratio for study-level prognostic factors. A different random-effect distribution on the log odds ratios is allowed for each different treatment. The odds ratio is found as a function of the prognostic factors for each treatment. The apparent differences in the randomized trials between tumour necrosis factor alpha (TNF- alpha) antagonists are explained by differences in prognostic factors and the analysis suggests that these drugs as a class are not different from each other. Copyright (c) 2006 John Wiley & Sons, Ltd.
Regression dilution bias: tools for correction methods and sample size calculation.
Berglund, Lars
2012-08-01
Random errors in measurement of a risk factor will introduce downward bias of an estimated association to a disease or a disease marker. This phenomenon is called regression dilution bias. A bias correction may be made with data from a validity study or a reliability study. In this article we give a non-technical description of designs of reliability studies with emphasis on selection of individuals for a repeated measurement, assumptions of measurement error models, and correction methods for the slope in a simple linear regression model where the dependent variable is a continuous variable. Also, we describe situations where correction for regression dilution bias is not appropriate. The methods are illustrated with the association between insulin sensitivity measured with the euglycaemic insulin clamp technique and fasting insulin, where measurement of the latter variable carries noticeable random error. We provide software tools for estimation of a corrected slope in a simple linear regression model assuming data for a continuous dependent variable and a continuous risk factor from a main study and an additional measurement of the risk factor in a reliability study. Also, we supply programs for estimation of the number of individuals needed in the reliability study and for choice of its design. Our conclusion is that correction for regression dilution bias is seldom applied in epidemiological studies. This may cause important effects of risk factors with large measurement errors to be neglected.
Punzo, Antonio; Ingrassia, Salvatore; Maruotti, Antonello
2018-04-22
A time-varying latent variable model is proposed to jointly analyze multivariate mixed-support longitudinal data. The proposal can be viewed as an extension of hidden Markov regression models with fixed covariates (HMRMFCs), which is the state of the art for modelling longitudinal data, with a special focus on the underlying clustering structure. HMRMFCs are inadequate for applications in which a clustering structure can be identified in the distribution of the covariates, as the clustering is independent from the covariates distribution. Here, hidden Markov regression models with random covariates are introduced by explicitly specifying state-specific distributions for the covariates, with the aim of improving the recovering of the clusters in the data with respect to a fixed covariates paradigm. The hidden Markov regression models with random covariates class is defined focusing on the exponential family, in a generalized linear model framework. Model identifiability conditions are sketched, an expectation-maximization algorithm is outlined for parameter estimation, and various implementation and operational issues are discussed. Properties of the estimators of the regression coefficients, as well as of the hidden path parameters, are evaluated through simulation experiments and compared with those of HMRMFCs. The method is applied to physical activity data. Copyright © 2018 John Wiley & Sons, Ltd.
Breastfeeding and intelligence: a systematic review and meta-analysis.
Horta, Bernardo L; Loret de Mola, Christian; Victora, Cesar G
2015-12-01
This study was aimed at systematically reviewing evidence of the association between breastfeeding and performance in intelligence tests. Two independent searches were carried out using Medline, LILACS, SCIELO and Web of Science. Studies restricted to infants and those where estimates were not adjusted for stimulation or interaction at home were excluded. Fixed- and random-effects models were used to pool the effect estimates, and a random-effects regression was used to assess potential sources of heterogeneity. We included 17 studies with 18 estimates of the relationship between breastfeeding and performance in intelligence tests. In a random-effects model, breastfed subjects achieved a higher IQ [mean difference: 3.44 points (95% confidence interval: 2.30; 4.58)]. We found no evidence of publication bias. Studies that controlled for maternal IQ showed a smaller benefit from breastfeeding [mean difference 2.62 points (95% confidence interval: 1.25; 3.98)]. In the meta-regression, none of the study characteristics explained the heterogeneity among the studies. Breastfeeding is related to improved performance in intelligence tests. A positive effect of breastfeeding on cognition was also observed in a randomised trial. This suggests that the association is causal. ©2015 The Authors. Acta Paediatrica published by John Wiley & Sons Ltd on behalf of Foundation Acta Paediatrica.
Revisiting crash spatial heterogeneity: A Bayesian spatially varying coefficients approach.
Xu, Pengpeng; Huang, Helai; Dong, Ni; Wong, S C
2017-01-01
This study was performed to investigate the spatially varying relationships between crash frequency and related risk factors. A Bayesian spatially varying coefficients model was elaborately introduced as a methodological alternative to simultaneously account for the unstructured and spatially structured heterogeneity of the regression coefficients in predicting crash frequencies. The proposed method was appealing in that the parameters were modeled via a conditional autoregressive prior distribution, which involved a single set of random effects and a spatial correlation parameter with extreme values corresponding to pure unstructured or pure spatially correlated random effects. A case study using a three-year crash dataset from the Hillsborough County, Florida, was conducted to illustrate the proposed model. Empirical analysis confirmed the presence of both unstructured and spatially correlated variations in the effects of contributory factors on severe crash occurrences. The findings also suggested that ignoring spatially structured heterogeneity may result in biased parameter estimates and incorrect inferences, while assuming the regression coefficients to be spatially clustered only is probably subject to the issue of over-smoothness. Copyright © 2016 Elsevier Ltd. All rights reserved.
Austin, Peter C.; Stryhn, Henrik; Leckie, George; Merlo, Juan
2017-01-01
Multilevel data occur frequently in many research areas like health services research and epidemiology. A suitable way to analyze such data is through the use of multilevel regression models. These models incorporate cluster‐specific random effects that allow one to partition the total variation in the outcome into between‐cluster variation and between‐individual variation. The magnitude of the effect of clustering provides a measure of the general contextual effect. When outcomes are binary or time‐to‐event in nature, the general contextual effect can be quantified by measures of heterogeneity like the median odds ratio or the median hazard ratio, respectively, which can be calculated from a multilevel regression model. Outcomes that are integer counts denoting the number of times that an event occurred are common in epidemiological and medical research. The median (incidence) rate ratio in multilevel Poisson regression for counts that corresponds to the median odds ratio or median hazard ratio for binary or time‐to‐event outcomes respectively is relatively unknown and is rarely used. The median rate ratio is the median relative change in the rate of the occurrence of the event when comparing identical subjects from 2 randomly selected different clusters that are ordered by rate. We also describe how the variance partition coefficient, which denotes the proportion of the variation in the outcome that is attributable to between‐cluster differences, can be computed with count outcomes. We illustrate the application and interpretation of these measures in a case study analyzing the rate of hospital readmission in patients discharged from hospital with a diagnosis of heart failure. PMID:29114926
The PX-EM algorithm for fast stable fitting of Henderson's mixed model
Foulley, Jean-Louis; Van Dyk, David A
2000-01-01
This paper presents procedures for implementing the PX-EM algorithm of Liu, Rubin and Wu to compute REML estimates of variance covariance components in Henderson's linear mixed models. The class of models considered encompasses several correlated random factors having the same vector length e.g., as in random regression models for longitudinal data analysis and in sire-maternal grandsire models for genetic evaluation. Numerical examples are presented to illustrate the procedures. Much better results in terms of convergence characteristics (number of iterations and time required for convergence) are obtained for PX-EM relative to the basic EM algorithm in the random regression. PMID:14736399
Shrinkage Estimation of Varying Covariate Effects Based On Quantile Regression
Peng, Limin; Xu, Jinfeng; Kutner, Nancy
2013-01-01
Varying covariate effects often manifest meaningful heterogeneity in covariate-response associations. In this paper, we adopt a quantile regression model that assumes linearity at a continuous range of quantile levels as a tool to explore such data dynamics. The consideration of potential non-constancy of covariate effects necessitates a new perspective for variable selection, which, under the assumed quantile regression model, is to retain variables that have effects on all quantiles of interest as well as those that influence only part of quantiles considered. Current work on l1-penalized quantile regression either does not concern varying covariate effects or may not produce consistent variable selection in the presence of covariates with partial effects, a practical scenario of interest. In this work, we propose a shrinkage approach by adopting a novel uniform adaptive LASSO penalty. The new approach enjoys easy implementation without requiring smoothing. Moreover, it can consistently identify the true model (uniformly across quantiles) and achieve the oracle estimation efficiency. We further extend the proposed shrinkage method to the case where responses are subject to random right censoring. Numerical studies confirm the theoretical results and support the utility of our proposals. PMID:25332515
Pearl, D L; Louie, M; Chui, L; Doré, K; Grimsrud, K M; Martin, S W; Michel, P; Svenson, L W; McEwen, S A
2008-04-01
Using multivariable models, we compared whether there were significant differences between reported outbreak and sporadic cases in terms of their sex, age, and mode and site of disease transmission. We also determined the potential role of administrative, temporal, and spatial factors within these models. We compared a variety of approaches to account for clustering of cases in outbreaks including weighted logistic regression, random effects models, general estimating equations, robust variance estimates, and the random selection of one case from each outbreak. Age and mode of transmission were the only epidemiologically and statistically significant covariates in our final models using the above approaches. Weighing observations in a logistic regression model by the inverse of their outbreak size appeared to be a relatively robust and valid means for modelling these data. Some analytical techniques, designed to account for clustering, had difficulty converging or producing realistic measures of association.
Conditional Monte Carlo randomization tests for regression models.
Parhat, Parwen; Rosenberger, William F; Diao, Guoqing
2014-08-15
We discuss the computation of randomization tests for clinical trials of two treatments when the primary outcome is based on a regression model. We begin by revisiting the seminal paper of Gail, Tan, and Piantadosi (1988), and then describe a method based on Monte Carlo generation of randomization sequences. The tests based on this Monte Carlo procedure are design based, in that they incorporate the particular randomization procedure used. We discuss permuted block designs, complete randomization, and biased coin designs. We also use a new technique by Plamadeala and Rosenberger (2012) for simple computation of conditional randomization tests. Like Gail, Tan, and Piantadosi, we focus on residuals from generalized linear models and martingale residuals from survival models. Such techniques do not apply to longitudinal data analysis, and we introduce a method for computation of randomization tests based on the predicted rate of change from a generalized linear mixed model when outcomes are longitudinal. We show, by simulation, that these randomization tests preserve the size and power well under model misspecification. Copyright © 2014 John Wiley & Sons, Ltd.
Random forest models to predict aqueous solubility.
Palmer, David S; O'Boyle, Noel M; Glen, Robert C; Mitchell, John B O
2007-01-01
Random Forest regression (RF), Partial-Least-Squares (PLS) regression, Support Vector Machines (SVM), and Artificial Neural Networks (ANN) were used to develop QSPR models for the prediction of aqueous solubility, based on experimental data for 988 organic molecules. The Random Forest regression model predicted aqueous solubility more accurately than those created by PLS, SVM, and ANN and offered methods for automatic descriptor selection, an assessment of descriptor importance, and an in-parallel measure of predictive ability, all of which serve to recommend its use. The prediction of log molar solubility for an external test set of 330 molecules that are solid at 25 degrees C gave an r2 = 0.89 and RMSE = 0.69 log S units. For a standard data set selected from the literature, the model performed well with respect to other documented methods. Finally, the diversity of the training and test sets are compared to the chemical space occupied by molecules in the MDL drug data report, on the basis of molecular descriptors selected by the regression analysis.
Quality Quandaries: Predicting a Population of Curves
Fugate, Michael Lynn; Hamada, Michael Scott; Weaver, Brian Phillip
2017-12-19
We present a random effects spline regression model based on splines that provides an integrated approach for analyzing functional data, i.e., curves, when the shape of the curves is not parametrically specified. An analysis using this model is presented that makes inferences about a population of curves as well as features of the curves.
Quality Quandaries: Predicting a Population of Curves
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fugate, Michael Lynn; Hamada, Michael Scott; Weaver, Brian Phillip
We present a random effects spline regression model based on splines that provides an integrated approach for analyzing functional data, i.e., curves, when the shape of the curves is not parametrically specified. An analysis using this model is presented that makes inferences about a population of curves as well as features of the curves.
A mixed-effects regression model for longitudinal multivariate ordinal data.
Liu, Li C; Hedeker, Donald
2006-03-01
A mixed-effects item response theory model that allows for three-level multivariate ordinal outcomes and accommodates multiple random subject effects is proposed for analysis of multivariate ordinal outcomes in longitudinal studies. This model allows for the estimation of different item factor loadings (item discrimination parameters) for the multiple outcomes. The covariates in the model do not have to follow the proportional odds assumption and can be at any level. Assuming either a probit or logistic response function, maximum marginal likelihood estimation is proposed utilizing multidimensional Gauss-Hermite quadrature for integration of the random effects. An iterative Fisher scoring solution, which provides standard errors for all model parameters, is used. An analysis of a longitudinal substance use data set, where four items of substance use behavior (cigarette use, alcohol use, marijuana use, and getting drunk or high) are repeatedly measured over time, is used to illustrate application of the proposed model.
Yu, Wenxi; Liu, Yang; Ma, Zongwei; Bi, Jun
2017-08-01
Using satellite-based aerosol optical depth (AOD) measurements and statistical models to estimate ground-level PM 2.5 is a promising way to fill the areas that are not covered by ground PM 2.5 monitors. The statistical models used in previous studies are primarily Linear Mixed Effects (LME) and Geographically Weighted Regression (GWR) models. In this study, we developed a new regression model between PM 2.5 and AOD using Gaussian processes in a Bayesian hierarchical setting. Gaussian processes model the stochastic nature of the spatial random effects, where the mean surface and the covariance function is specified. The spatial stochastic process is incorporated under the Bayesian hierarchical framework to explain the variation of PM 2.5 concentrations together with other factors, such as AOD, spatial and non-spatial random effects. We evaluate the results of our model and compare them with those of other, conventional statistical models (GWR and LME) by within-sample model fitting and out-of-sample validation (cross validation, CV). The results show that our model possesses a CV result (R 2 = 0.81) that reflects higher accuracy than that of GWR and LME (0.74 and 0.48, respectively). Our results indicate that Gaussian process models have the potential to improve the accuracy of satellite-based PM 2.5 estimates.
Jarnevich, Catherine S.; Talbert, Marian; Morisette, Jeffrey T.; Aldridge, Cameron L.; Brown, Cynthia; Kumar, Sunil; Manier, Daniel; Talbert, Colin; Holcombe, Tracy R.
2017-01-01
Evaluating the conditions where a species can persist is an important question in ecology both to understand tolerances of organisms and to predict distributions across landscapes. Presence data combined with background or pseudo-absence locations are commonly used with species distribution modeling to develop these relationships. However, there is not a standard method to generate background or pseudo-absence locations, and method choice affects model outcomes. We evaluated combinations of both model algorithms (simple and complex generalized linear models, multivariate adaptive regression splines, Maxent, boosted regression trees, and random forest) and background methods (random, minimum convex polygon, and continuous and binary kernel density estimator (KDE)) to assess the sensitivity of model outcomes to choices made. We evaluated six questions related to model results, including five beyond the common comparison of model accuracy assessment metrics (biological interpretability of response curves, cross-validation robustness, independent data accuracy and robustness, and prediction consistency). For our case study with cheatgrass in the western US, random forest was least sensitive to background choice and the binary KDE method was least sensitive to model algorithm choice. While this outcome may not hold for other locations or species, the methods we used can be implemented to help determine appropriate methodologies for particular research questions.
Reboussin, Beth A; Preisser, John S; Song, Eun-Young; Wolfson, Mark
2012-07-01
Under-age drinking is an enormous public health issue in the USA. Evidence that community level structures may impact on under-age drinking has led to a proliferation of efforts to change the environment surrounding the use of alcohol. Although the focus of these efforts is to reduce drinking by individual youths, environmental interventions are typically implemented at the community level with entire communities randomized to the same intervention condition. A distinct feature of these trials is the tendency of the behaviours of individuals residing in the same community to be more alike than that of others residing in different communities, which is herein called 'clustering'. Statistical analyses and sample size calculations must account for this clustering to avoid type I errors and to ensure an appropriately powered trial. Clustering itself may also be of scientific interest. We consider the alternating logistic regressions procedure within the population-averaged modelling framework to estimate the effect of a law enforcement intervention on the prevalence of under-age drinking behaviours while modelling the clustering at multiple levels, e.g. within communities and within neighbourhoods nested within communities, by using pairwise odds ratios. We then derive sample size formulae for estimating intervention effects when planning a post-test-only or repeated cross-sectional community-randomized trial using the alternating logistic regressions procedure.
[Using fractional polynomials to estimate the safety threshold of fluoride in drinking water].
Pan, Shenling; An, Wei; Li, Hongyan; Yang, Min
2014-01-01
To study the dose-response relationship between fluoride content in drinking water and prevalence of dental fluorosis on the national scale, then to determine the safety threshold of fluoride in drinking water. Meta-regression analysis was applied to the 2001-2002 national endemic fluorosis survey data of key wards. First, fractional polynomial (FP) was adopted to establish fixed effect model, determining the best FP structure, after that restricted maximum likelihood (REML) was adopted to estimate between-study variance, then the best random effect model was established. The best FP structure was first-order logarithmic transformation. Based on the best random effect model, the benchmark dose (BMD) of fluoride in drinking water and its lower limit (BMDL) was calculated as 0.98 mg/L and 0.78 mg/L. Fluoride in drinking water can only explain 35.8% of the variability of the prevalence, among other influencing factors, ward type was a significant factor, while temperature condition and altitude were not. Fractional polynomial-based meta-regression method is simple, practical and can provide good fitting effect, based on it, the safety threshold of fluoride in drinking water of our country is determined as 0.8 mg/L.
Roso, V M; Schenkel, F S; Miller, S P; Schaeffer, L R
2005-08-01
Breed additive, dominance, and epistatic loss effects are of concern in the genetic evaluation of a multibreed population. Multiple regression equations used for fitting these effects may show a high degree of multicollinearity among predictor variables. Typically, when strong linear relationships exist, the regression coefficients have large SE and are sensitive to changes in the data file and to the addition or deletion of variables in the model. Generalized ridge regression methods were applied to obtain stable estimates of direct and maternal breed additive, dominance, and epistatic loss effects in the presence of multicollinearity among predictor variables. Preweaning weight gains of beef calves in Ontario, Canada, from 1986 to 1999 were analyzed. The genetic model included fixed direct and maternal breed additive, dominance, and epistatic loss effects, fixed environmental effects of age of the calf, contemporary group, and age of the dam x sex of the calf, random additive direct and maternal genetic effects, and random maternal permanent environment effect. The degree and the nature of the multicollinearity were identified and ridge regression methods were used as an alternative to ordinary least squares (LS). Ridge parameters were obtained using two different objective methods: 1) generalized ridge estimator of Hoerl and Kennard (R1); and 2) bootstrap in combination with cross-validation (R2). Both ridge regression methods outperformed the LS estimator with respect to mean squared error of predictions (MSEP) and variance inflation factors (VIF) computed over 100 bootstrap samples. The MSEP of R1 and R2 were similar, and they were 3% less than the MSEP of LS. The average VIF of LS, R1, and R2 were equal to 26.81, 6.10, and 4.18, respectively. Ridge regression methods were particularly effective in decreasing the multicollinearity involving predictor variables of breed additive effects. Because of a high degree of confounding between estimates of maternal dominance and direct epistatic loss effects, it was not possible to compare the relative importance of these effects with a high level of confidence. The inclusion of epistatic loss effects in the additive-dominance model did not cause noticeable reranking of sires, dams, and calves based on across-breed EBV. More precise estimates of breed effects as a result of this study may result in more stable across-breed estimated breeding values over the years.
Carabaño, M J; Díaz, C; Ugarte, C; Serrano, M
2007-02-01
Artificial insemination centers routinely collect records of quantity and quality of semen of bulls throughout the animals' productive period. The goal of this paper was to explore the use of random regression models with orthogonal polynomials to analyze repeated measures of semen production of Spanish Holstein bulls. A total of 8,773 records of volume of first ejaculate (VFE) collected between 12 and 30 mo of age from 213 Spanish Holstein bulls was analyzed under alternative random regression models. Legendre polynomial functions of increasing order (0 to 6) were fitted to the average trajectory, additive genetic and permanent environmental effects. Age at collection and days in production were used as time variables. Heterogeneous and homogeneous residual variances were alternatively assumed. Analyses were carried out within a Bayesian framework. The logarithm of the marginal density and the cross-validation predictive ability of the data were used as model comparison criteria. Based on both criteria, age at collection as a time variable and heterogeneous residuals models are recommended to analyze changes of VFE over time. Both criteria indicated that fitting random curves for genetic and permanent environmental components as well as for the average trajector improved the quality of models. Furthermore, models with a higher order polynomial for the permanent environmental (5 to 6) than for the genetic components (4 to 5) and the average trajectory (2 to 3) tended to perform best. High-order polynomials were needed to accommodate the highly oscillating nature of the phenotypic values. Heritability and repeatability estimates, disregarding the extremes of the studied period, ranged from 0.15 to 0.35 and from 0.20 to 0.50, respectively, indicating that selection for VFE may be effective at any stage. Small differences among models were observed. Apart from the extremes, estimated correlations between ages decreased steadily from 0.9 and 0.4 for measures 1 mo apart to 0.4 and 0.2 for most distant measures for additive genetic and phenotypic components, respectively. Further investigation to account for environmental factors that may be responsible for the oscillating observations of VFE is needed.
Sando, Roy; Chase, Katherine J.
2017-03-23
A common statistical procedure for estimating streamflow statistics at ungaged locations is to develop a relational model between streamflow and drainage basin characteristics at gaged locations using least squares regression analysis; however, least squares regression methods are parametric and make constraining assumptions about the data distribution. The random forest regression method provides an alternative nonparametric method for estimating streamflow characteristics at ungaged sites and requires that the data meet fewer statistical conditions than least squares regression methods.Random forest regression analysis was used to develop predictive models for 89 streamflow characteristics using Precipitation-Runoff Modeling System simulated streamflow data and drainage basin characteristics at 179 sites in central and eastern Montana. The predictive models were developed from streamflow data simulated for current (baseline, water years 1982–99) conditions and three future periods (water years 2021–38, 2046–63, and 2071–88) under three different climate-change scenarios. These predictive models were then used to predict streamflow characteristics for baseline conditions and three future periods at 1,707 fish sampling sites in central and eastern Montana. The average root mean square error for all predictive models was about 50 percent. When streamflow predictions at 23 fish sampling sites were compared to nearby locations with simulated data, the mean relative percent difference was about 43 percent. When predictions were compared to streamflow data recorded at 21 U.S. Geological Survey streamflow-gaging stations outside of the calibration basins, the average mean absolute percent error was about 73 percent.
Multiple Imputation of a Randomly Censored Covariate Improves Logistic Regression Analysis.
Atem, Folefac D; Qian, Jing; Maye, Jacqueline E; Johnson, Keith A; Betensky, Rebecca A
2016-01-01
Randomly censored covariates arise frequently in epidemiologic studies. The most commonly used methods, including complete case and single imputation or substitution, suffer from inefficiency and bias. They make strong parametric assumptions or they consider limit of detection censoring only. We employ multiple imputation, in conjunction with semi-parametric modeling of the censored covariate, to overcome these shortcomings and to facilitate robust estimation. We develop a multiple imputation approach for randomly censored covariates within the framework of a logistic regression model. We use the non-parametric estimate of the covariate distribution or the semiparametric Cox model estimate in the presence of additional covariates in the model. We evaluate this procedure in simulations, and compare its operating characteristics to those from the complete case analysis and a survival regression approach. We apply the procedures to an Alzheimer's study of the association between amyloid positivity and maternal age of onset of dementia. Multiple imputation achieves lower standard errors and higher power than the complete case approach under heavy and moderate censoring and is comparable under light censoring. The survival regression approach achieves the highest power among all procedures, but does not produce interpretable estimates of association. Multiple imputation offers a favorable alternative to complete case analysis and ad hoc substitution methods in the presence of randomly censored covariates within the framework of logistic regression.
Norrie, John; Davidson, Kate; Tata, Philip; Gumley, Andrew
2013-01-01
Objectives We investigated the treatment effects reported from a high-quality randomized controlled trial of cognitive behavioural therapy (CBT) for 106 people with borderline personality disorder attending community-based clinics in the UK National Health Service – the BOSCOT trial. Specifically, we examined whether the amount of therapy and therapist competence had an impact on our primary outcome, the number of suicidal acts†, using instrumental variables regression modelling. Design Randomized controlled trial. Participants from across three sites (London, Glasgow, and Ayrshire/Arran) were randomized equally to CBT for personality disorders (CBTpd) plus Treatment as Usual or to Treatment as Usual. Treatment as Usual varied between sites and individuals, but was consistent with routine treatment in the UK National Health Service at the time. CBTpd comprised an average 16 sessions (range 0–35) over 12 months. Method We used instrumental variable regression modelling to estimate the impact of quantity and quality of therapy received (recording activities and behaviours that took place after randomization) on number of suicidal acts and inpatient psychiatric hospitalization. Results A total of 101 participants provided full outcome data at 2 years post randomization. The previously reported intention-to-treat (ITT) results showed on average a reduction of 0.91 (95% confidence interval 0.15–1.67) suicidal acts over 2 years for those randomized to CBT. By incorporating the influence of quantity of therapy and therapist competence, we show that this estimate of the effect of CBTpd could be approximately two to three times greater for those receiving the right amount of therapy from a competent therapist. Conclusions Trials should routinely control for and collect data on both quantity of therapy and therapist competence, which can be used, via instrumental variable regression modelling, to estimate treatment effects for optimal delivery of therapy. Such estimates complement rather than replace the ITT results, which are properly the principal analysis results from such trials. Practitioner points Assessing the impact of the quantity and quality of therapy (competence of therapists) is complex. More competent therapists, trained in CBTpd, may significantly reduce the number of suicidal act in patients with borderline personality disorder. PMID:23420622
Bennett, Bradley C; Husby, Chad E
2008-03-28
Botanical pharmacopoeias are non-random subsets of floras, with some taxonomic groups over- or under-represented. Moerman [Moerman, D.E., 1979. Symbols and selectivity: a statistical analysis of Native American medical ethnobotany, Journal of Ethnopharmacology 1, 111-119] introduced linear regression/residual analysis to examine these patterns. However, regression, the commonly-employed analysis, suffers from several statistical flaws. We use contingency table and binomial analyses to examine patterns of Shuar medicinal plant use (from Amazonian Ecuador). We first analyzed the Shuar data using Moerman's approach, modified to better meet requirements of linear regression analysis. Second, we assessed the exact randomization contingency table test for goodness of fit. Third, we developed a binomial model to test for non-random selection of plants in individual families. Modified regression models (which accommodated assumptions of linear regression) reduced R(2) to from 0.59 to 0.38, but did not eliminate all problems associated with regression analyses. Contingency table analyses revealed that the entire flora departs from the null model of equal proportions of medicinal plants in all families. In the binomial analysis, only 10 angiosperm families (of 115) differed significantly from the null model. These 10 families are largely responsible for patterns seen at higher taxonomic levels. Contingency table and binomial analyses offer an easy and statistically valid alternative to the regression approach.
Hamel, Sandra; Yoccoz, Nigel G; Gaillard, Jean-Michel
2017-05-01
Mixed models are now well-established methods in ecology and evolution because they allow accounting for and quantifying within- and between-individual variation. However, the required normal distribution of the random effects can often be violated by the presence of clusters among subjects, which leads to multi-modal distributions. In such cases, using what is known as mixture regression models might offer a more appropriate approach. These models are widely used in psychology, sociology, and medicine to describe the diversity of trajectories occurring within a population over time (e.g. psychological development, growth). In ecology and evolution, however, these models are seldom used even though understanding changes in individual trajectories is an active area of research in life-history studies. Our aim is to demonstrate the value of using mixture models to describe variation in individual life-history tactics within a population, and hence to promote the use of these models by ecologists and evolutionary ecologists. We first ran a set of simulations to determine whether and when a mixture model allows teasing apart latent clustering, and to contrast the precision and accuracy of estimates obtained from mixture models versus mixed models under a wide range of ecological contexts. We then used empirical data from long-term studies of large mammals to illustrate the potential of using mixture models for assessing within-population variation in life-history tactics. Mixture models performed well in most cases, except for variables following a Bernoulli distribution and when sample size was small. The four selection criteria we evaluated [Akaike information criterion (AIC), Bayesian information criterion (BIC), and two bootstrap methods] performed similarly well, selecting the right number of clusters in most ecological situations. We then showed that the normality of random effects implicitly assumed by evolutionary ecologists when using mixed models was often violated in life-history data. Mixed models were quite robust to this violation in the sense that fixed effects were unbiased at the population level. However, fixed effects at the cluster level and random effects were better estimated using mixture models. Our empirical analyses demonstrated that using mixture models facilitates the identification of the diversity of growth and reproductive tactics occurring within a population. Therefore, using this modelling framework allows testing for the presence of clusters and, when clusters occur, provides reliable estimates of fixed and random effects for each cluster of the population. In the presence or expectation of clusters, using mixture models offers a suitable extension of mixed models, particularly when evolutionary ecologists aim at identifying how ecological and evolutionary processes change within a population. Mixture regression models therefore provide a valuable addition to the statistical toolbox of evolutionary ecologists. As these models are complex and have their own limitations, we provide recommendations to guide future users. © 2016 Cambridge Philosophical Society.
The role of gender in a smoking cessation intervention: a cluster randomized clinical trial.
Puente, Diana; Cabezas, Carmen; Rodriguez-Blanco, Teresa; Fernández-Alonso, Carmen; Cebrian, Tránsito; Torrecilla, Miguel; Clemente, Lourdes; Martín, Carlos
2011-05-23
The prevalence of smoking in Spain is high in both men and women. The aim of our study was to evaluate the role of gender in the effectiveness of a specific smoking cessation intervention conducted in Spain. This study was a secondary analysis of a cluster randomized clinical trial in which the randomization unit was the Basic Care Unit (family physician and nurse who care for the same group of patients). The intervention consisted of a six-month period of implementing the recommendations of a Clinical Practice Guideline. A total of 2,937 current smokers at 82 Primary Care Centers in 13 different regions of Spain were included (2003-2005). The success rate was measured by a six-month continued abstinence rate at the one-year follow-up. A logistic mixed-effects regression model, taking Basic Care Units as random-effect parameter, was performed in order to analyze gender as a predictor of smoking cessation. At the one-year follow-up, the six-month continuous abstinence quit rate was 9.4% in men and 8.5% in women (p = 0.400). The logistic mixed-effects regression model showed that women did not have a higher odds of being an ex-smoker than men after the analysis was adjusted for confounders (OR adjusted = 0.9, 95% CI = 0.7-1.2). Gender does not appear to be a predictor of smoking cessation at the one-year follow-up in individuals presenting at Primary Care Centers. CLINICALTRIALS.GOV IDENTIFIER: NCT00125905.
Linear regression analysis of survival data with missing censoring indicators.
Wang, Qihua; Dinse, Gregg E
2011-04-01
Linear regression analysis has been studied extensively in a random censorship setting, but typically all of the censoring indicators are assumed to be observed. In this paper, we develop synthetic data methods for estimating regression parameters in a linear model when some censoring indicators are missing. We define estimators based on regression calibration, imputation, and inverse probability weighting techniques, and we prove all three estimators are asymptotically normal. The finite-sample performance of each estimator is evaluated via simulation. We illustrate our methods by assessing the effects of sex and age on the time to non-ambulatory progression for patients in a brain cancer clinical trial.
Random forests as cumulative effects models: A case study of lakes and rivers in Muskoka, Canada.
Jones, F Chris; Plewes, Rachel; Murison, Lorna; MacDougall, Mark J; Sinclair, Sarah; Davies, Christie; Bailey, John L; Richardson, Murray; Gunn, John
2017-10-01
Cumulative effects assessment (CEA) - a type of environmental appraisal - lacks effective methods for modeling cumulative effects, evaluating indicators of ecosystem condition, and exploring the likely outcomes of development scenarios. Random forests are an extension of classification and regression trees, which model response variables by recursive partitioning. Random forests were used to model a series of candidate ecological indicators that described lakes and rivers from a case study watershed (The Muskoka River Watershed, Canada). Suitability of the candidate indicators for use in cumulative effects assessment and watershed monitoring was assessed according to how well they could be predicted from natural habitat features and how sensitive they were to human land-use. The best models explained 75% of the variation in a multivariate descriptor of lake benthic-macroinvertebrate community structure, and 76% of the variation in the conductivity of river water. Similar results were obtained by cross-validation. Several candidate indicators detected a simulated doubling of urban land-use in their catchments, and a few were able to detect a simulated doubling of agricultural land-use. The paper demonstrates that random forests can be used to describe the combined and singular effects of multiple stressors and natural environmental factors, and furthermore, that random forests can be used to evaluate the performance of monitoring indicators. The numerical methods presented are applicable to any ecosystem and indicator type, and therefore represent a step forward for CEA. Crown Copyright © 2017. Published by Elsevier Ltd. All rights reserved.
Mixed effect Poisson log-linear models for clinical and epidemiological sleep hypnogram data
Swihart, Bruce J.; Caffo, Brian S.; Crainiceanu, Ciprian; Punjabi, Naresh M.
2013-01-01
Bayesian Poisson log-linear multilevel models scalable to epidemiological studies are proposed to investigate population variability in sleep state transition rates. Hierarchical random effects are used to account for pairings of subjects and repeated measures within those subjects, as comparing diseased to non-diseased subjects while minimizing bias is of importance. Essentially, non-parametric piecewise constant hazards are estimated and smoothed, allowing for time-varying covariates and segment of the night comparisons. The Bayesian Poisson regression is justified through a re-derivation of a classical algebraic likelihood equivalence of Poisson regression with a log(time) offset and survival regression assuming exponentially distributed survival times. Such re-derivation allows synthesis of two methods currently used to analyze sleep transition phenomena: stratified multi-state proportional hazards models and log-linear models with GEE for transition counts. An example data set from the Sleep Heart Health Study is analyzed. Supplementary material includes the analyzed data set as well as the code for a reproducible analysis. PMID:22241689
Roth, Daniel E; Richard, Stephanie A; Black, Robert E
2010-06-01
Routine zinc supplementation is a potential intervention for the prevention of acute lower respiratory infection (ALRI) in developing countries. However, discrepant findings from recent randomized trials remain unexplained. Randomized trials of zinc supplementation in young children in developing countries were identified by a systematic literature review. Trials included in the meta-analysis met specific criteria, including participants <5 years of age, daily/weekly zinc and control supplementation for greater than 3 months, active household surveillance for respiratory morbidity and use of a case definition that included at least one sign of lower respiratory tract illness. ALRI case definitions were classified on the basis of specificity/severity. Incidence rate ratios (IRRs) were pooled by random-effects models. Meta-regression and sub-group analysis were performed to assess potential sources of between-study heterogeneity. Ten trials were eligible for inclusion (n = 49 450 children randomized). Zinc reduced the incidence of ALRI defined by specific clinical criteria [IRR 0.65, 95% confidence interval (CI) 0.52-0.82], but had no effect on lower-specificity ALRI case definitions based on caregiver report (IRR 1.01, 95% CI 0.91-1.12) or World Health Organization 'non-severe pneumonia' (0.96, 95% CI 0.86-1.08). By meta-regression, the effect of zinc was associated with ALRI case definition, but not with mean baseline age, geographic location, nutritional status or zinc dose. Routine zinc supplementation reduced the incidence of childhood ALRI defined by relatively specific clinical criteria, but the effect was null if lower specificity case definitions were applied. The choice of ALRI case definition may substantially influence inferences from community trials regarding the efficacy of preventive interventions.
Berry, D P; Buckley, F; Dillon, P; Evans, R D; Rath, M; Veerkamp, R F
2003-11-01
Genetic (co)variances between body condition score (BCS), body weight (BW), milk yield, and fertility were estimated using a random regression animal model extended to multivariate analysis. The data analyzed included 81,313 BCS observations, 91,937 BW observations, and 100,458 milk test-day yields from 8725 multiparous Holstein-Friesian cows. A cubic random regression was sufficient to model the changing genetic variances for BCS, BW, and milk across different days in milk. The genetic correlations between BCS and fertility changed little over the lactation; genetic correlations between BCS and interval to first service and between BCS and pregnancy rate to first service varied from -0.47 to -0.31, and from 0.15 to 0.38, respectively. This suggests that maximum genetic gain in fertility from indirect selection on BCS should be based on measurements taken in midlactation when the genetic variance for BCS is largest. Selection for increased BW resulted in shorter intervals to first service, but more services and poorer pregnancy rates; genetic correlations between BW and pregnancy rate to first service varied from -0.52 to -0.45. Genetic selection for higher lactation milk yield alone through selection on increased milk yield in early lactation is likely to have a more deleterious effect on genetic merit for fertility than selection on higher milk yield in late lactation.
Nonparametric estimation and testing of fixed effects panel data models
Henderson, Daniel J.; Carroll, Raymond J.; Li, Qi
2009-01-01
In this paper we consider the problem of estimating nonparametric panel data models with fixed effects. We introduce an iterative nonparametric kernel estimator. We also extend the estimation method to the case of a semiparametric partially linear fixed effects model. To determine whether a parametric, semiparametric or nonparametric model is appropriate, we propose test statistics to test between the three alternatives in practice. We further propose a test statistic for testing the null hypothesis of random effects against fixed effects in a nonparametric panel data regression model. Simulations are used to examine the finite sample performance of the proposed estimators and the test statistics. PMID:19444335
Soares, Marta O.; Palmer, Stephen; Ades, Anthony E.; Harrison, David; Shankar-Hari, Manu; Rowan, Kathy M.
2015-01-01
Cost-effectiveness analysis (CEA) models are routinely used to inform health care policy. Key model inputs include relative effectiveness of competing treatments, typically informed by meta-analysis. Heterogeneity is ubiquitous in meta-analysis, and random effects models are usually used when there is variability in effects across studies. In the absence of observed treatment effect modifiers, various summaries from the random effects distribution (random effects mean, predictive distribution, random effects distribution, or study-specific estimate [shrunken or independent of other studies]) can be used depending on the relationship between the setting for the decision (population characteristics, treatment definitions, and other contextual factors) and the included studies. If covariates have been measured that could potentially explain the heterogeneity, then these can be included in a meta-regression model. We describe how covariates can be included in a network meta-analysis model and how the output from such an analysis can be used in a CEA model. We outline a model selection procedure to help choose between competing models and stress the importance of clinical input. We illustrate the approach with a health technology assessment of intravenous immunoglobulin for the management of adult patients with severe sepsis in an intensive care setting, which exemplifies how risk of bias information can be incorporated into CEA models. We show that the results of the CEA and value-of-information analyses are sensitive to the model and highlight the importance of sensitivity analyses when conducting CEA in the presence of heterogeneity. The methods presented extend naturally to heterogeneity in other model inputs, such as baseline risk. PMID:25712447
Welton, Nicky J; Soares, Marta O; Palmer, Stephen; Ades, Anthony E; Harrison, David; Shankar-Hari, Manu; Rowan, Kathy M
2015-07-01
Cost-effectiveness analysis (CEA) models are routinely used to inform health care policy. Key model inputs include relative effectiveness of competing treatments, typically informed by meta-analysis. Heterogeneity is ubiquitous in meta-analysis, and random effects models are usually used when there is variability in effects across studies. In the absence of observed treatment effect modifiers, various summaries from the random effects distribution (random effects mean, predictive distribution, random effects distribution, or study-specific estimate [shrunken or independent of other studies]) can be used depending on the relationship between the setting for the decision (population characteristics, treatment definitions, and other contextual factors) and the included studies. If covariates have been measured that could potentially explain the heterogeneity, then these can be included in a meta-regression model. We describe how covariates can be included in a network meta-analysis model and how the output from such an analysis can be used in a CEA model. We outline a model selection procedure to help choose between competing models and stress the importance of clinical input. We illustrate the approach with a health technology assessment of intravenous immunoglobulin for the management of adult patients with severe sepsis in an intensive care setting, which exemplifies how risk of bias information can be incorporated into CEA models. We show that the results of the CEA and value-of-information analyses are sensitive to the model and highlight the importance of sensitivity analyses when conducting CEA in the presence of heterogeneity. The methods presented extend naturally to heterogeneity in other model inputs, such as baseline risk. © The Author(s) 2015.
Application of random effects to the study of resource selection by animals
Gillies, C.S.; Hebblewhite, M.; Nielsen, S.E.; Krawchuk, M.A.; Aldridge, Cameron L.; Frair, J.L.; Saher, D.J.; Stevens, C.E.; Jerde, C.L.
2006-01-01
1. Resource selection estimated by logistic regression is used increasingly in studies to identify critical resources for animal populations and to predict species occurrence.2. Most frequently, individual animals are monitored and pooled to estimate population-level effects without regard to group or individual-level variation. Pooling assumes that both observations and their errors are independent, and resource selection is constant given individual variation in resource availability.3. Although researchers have identified ways to minimize autocorrelation, variation between individuals caused by differences in selection or available resources, including functional responses in resource selection, have not been well addressed.4. Here we review random-effects models and their application to resource selection modelling to overcome these common limitations. We present a simple case study of an analysis of resource selection by grizzly bears in the foothills of the Canadian Rocky Mountains with and without random effects.5. Both categorical and continuous variables in the grizzly bear model differed in interpretation, both in statistical significance and coefficient sign, depending on how a random effect was included. We used a simulation approach to clarify the application of random effects under three common situations for telemetry studies: (a) discrepancies in sample sizes among individuals; (b) differences among individuals in selection where availability is constant; and (c) differences in availability with and without a functional response in resource selection.6. We found that random intercepts accounted for unbalanced sample designs, and models with random intercepts and coefficients improved model fit given the variation in selection among individuals and functional responses in selection. Our empirical example and simulations demonstrate how including random effects in resource selection models can aid interpretation and address difficult assumptions limiting their generality. This approach will allow researchers to appropriately estimate marginal (population) and conditional (individual) responses, and account for complex grouping, unbalanced sample designs and autocorrelation.
Application of random effects to the study of resource selection by animals.
Gillies, Cameron S; Hebblewhite, Mark; Nielsen, Scott E; Krawchuk, Meg A; Aldridge, Cameron L; Frair, Jacqueline L; Saher, D Joanne; Stevens, Cameron E; Jerde, Christopher L
2006-07-01
1. Resource selection estimated by logistic regression is used increasingly in studies to identify critical resources for animal populations and to predict species occurrence. 2. Most frequently, individual animals are monitored and pooled to estimate population-level effects without regard to group or individual-level variation. Pooling assumes that both observations and their errors are independent, and resource selection is constant given individual variation in resource availability. 3. Although researchers have identified ways to minimize autocorrelation, variation between individuals caused by differences in selection or available resources, including functional responses in resource selection, have not been well addressed. 4. Here we review random-effects models and their application to resource selection modelling to overcome these common limitations. We present a simple case study of an analysis of resource selection by grizzly bears in the foothills of the Canadian Rocky Mountains with and without random effects. 5. Both categorical and continuous variables in the grizzly bear model differed in interpretation, both in statistical significance and coefficient sign, depending on how a random effect was included. We used a simulation approach to clarify the application of random effects under three common situations for telemetry studies: (a) discrepancies in sample sizes among individuals; (b) differences among individuals in selection where availability is constant; and (c) differences in availability with and without a functional response in resource selection. 6. We found that random intercepts accounted for unbalanced sample designs, and models with random intercepts and coefficients improved model fit given the variation in selection among individuals and functional responses in selection. Our empirical example and simulations demonstrate how including random effects in resource selection models can aid interpretation and address difficult assumptions limiting their generality. This approach will allow researchers to appropriately estimate marginal (population) and conditional (individual) responses, and account for complex grouping, unbalanced sample designs and autocorrelation.
Baseline adjustments for binary data in repeated cross-sectional cluster randomized trials.
Nixon, R M; Thompson, S G
2003-09-15
Analysis of covariance models, which adjust for a baseline covariate, are often used to compare treatment groups in a controlled trial in which individuals are randomized. Such analysis adjusts for any baseline imbalance and usually increases the precision of the treatment effect estimate. We assess the value of such adjustments in the context of a cluster randomized trial with repeated cross-sectional design and a binary outcome. In such a design, a new sample of individuals is taken from the clusters at each measurement occasion, so that baseline adjustment has to be at the cluster level. Logistic regression models are used to analyse the data, with cluster level random effects to allow for different outcome probabilities in each cluster. We compare the estimated treatment effect and its precision in models that incorporate a covariate measuring the cluster level probabilities at baseline and those that do not. In two data sets, taken from a cluster randomized trial in the treatment of menorrhagia, the value of baseline adjustment is only evident when the number of subjects per cluster is large. We assess the generalizability of these findings by undertaking a simulation study, and find that increased precision of the treatment effect requires both large cluster sizes and substantial heterogeneity between clusters at baseline, but baseline imbalance arising by chance in a randomized study can always be effectively adjusted for. Copyright 2003 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
Rooper, Christopher N.; Zimmermann, Mark; Prescott, Megan M.
2017-08-01
Deep-sea coral and sponge ecosystems are widespread throughout most of Alaska's marine waters, and are associated with many different species of fishes and invertebrates. These ecosystems are vulnerable to the effects of commercial fishing activities and climate change. We compared four commonly used species distribution models (general linear models, generalized additive models, boosted regression trees and random forest models) and an ensemble model to predict the presence or absence and abundance of six groups of benthic invertebrate taxa in the Gulf of Alaska. All four model types performed adequately on training data for predicting presence and absence, with regression forest models having the best overall performance measured by the area under the receiver-operating-curve (AUC). The models also performed well on the test data for presence and absence with average AUCs ranging from 0.66 to 0.82. For the test data, ensemble models performed the best. For abundance data, there was an obvious demarcation in performance between the two regression-based methods (general linear models and generalized additive models), and the tree-based models. The boosted regression tree and random forest models out-performed the other models by a wide margin on both the training and testing data. However, there was a significant drop-off in performance for all models of invertebrate abundance ( 50%) when moving from the training data to the testing data. Ensemble model performance was between the tree-based and regression-based methods. The maps of predictions from the models for both presence and abundance agreed very well across model types, with an increase in variability in predictions for the abundance data. We conclude that where data conforms well to the modeled distribution (such as the presence-absence data and binomial distribution in this study), the four types of models will provide similar results, although the regression-type models may be more consistent with biological theory. For data with highly zero-inflated distributions and non-normal distributions such as the abundance data from this study, the tree-based methods performed better. Ensemble models that averaged predictions across the four model types, performed better than the GLM or GAM models but slightly poorer than the tree-based methods, suggesting ensemble models might be more robust to overfitting than tree methods, while mitigating some of the disadvantages in predictive performance of regression methods.
Multilevel Modeling in Psychosomatic Medicine Research
Myers, Nicholas D.; Brincks, Ahnalee M.; Ames, Allison J.; Prado, Guillermo J.; Penedo, Frank J.; Benedict, Catherine
2012-01-01
The primary purpose of this manuscript is to provide an overview of multilevel modeling for Psychosomatic Medicine readers and contributors. The manuscript begins with a general introduction to multilevel modeling. Multilevel regression modeling at two-levels is emphasized because of its prevalence in psychosomatic medicine research. Simulated datasets based on some core ideas from the Familias Unidas effectiveness study are used to illustrate key concepts including: communication of model specification, parameter interpretation, sample size and power, and missing data. Input and key output files from Mplus and SAS are provided. A cluster randomized trial with repeated measures (i.e., three-level regression model) is then briefly presented with simulated data based on some core ideas from a cognitive behavioral stress management intervention in prostate cancer. PMID:23107843
Graphical Models for Quasi-Experimental Designs
ERIC Educational Resources Information Center
Steiner, Peter M.; Kim, Yongnam; Hall, Courtney E.; Su, Dan
2017-01-01
Randomized controlled trials (RCTs) and quasi-experimental designs like regression discontinuity (RD) designs, instrumental variable (IV) designs, and matching and propensity score (PS) designs are frequently used for inferring causal effects. It is well known that the features of these designs facilitate the identification of a causal estimand…
Huynh-Tran, V H; Gilbert, H; David, I
2017-11-01
The objective of the present study was to compare a random regression model, usually used in genetic analyses of longitudinal data, with the structured antedependence (SAD) model to study the longitudinal feed conversion ratio (FCR) in growing Large White pigs and to propose criteria for animal selection when used for genetic evaluation. The study was based on data from 11,790 weekly FCR measures collected on 1,186 Large White male growing pigs. Random regression (RR) using orthogonal polynomial Legendre and SAD models was used to estimate genetic parameters and predict FCR-based EBV for each of the 10 wk of the test. The results demonstrated that the best SAD model (1 order of antedependence of degree 2 and a polynomial of degree 2 for the innovation variance for the genetic and permanent environmental effects, i.e., 12 parameters) provided a better fit for the data than RR with a quadratic function for the genetic and permanent environmental effects (13 parameters), with Bayesian information criteria values of -10,060 and -9,838, respectively. Heritabilities with the SAD model were higher than those of RR over the first 7 wk of the test. Genetic correlations between weeks were higher than 0.68 for short intervals between weeks and decreased to 0.08 for the SAD model and -0.39 for RR for the longest intervals. These differences in genetic parameters showed that, contrary to the RR approach, the SAD model does not suffer from border effect problems and can handle genetic correlations that tend to 0. Summarized breeding values were proposed for each approach as linear combinations of the individual weekly EBV weighted by the coefficients of the first or second eigenvector computed from the genetic covariance matrix of the additive genetic effects. These summarized breeding values isolated EBV trajectories over time, capturing either the average general value or the slope of the trajectory. Finally, applying the SAD model over a reduced period of time suggested that similar selection choices would result from the use of the records from the first 8 wk of the test. To conclude, the SAD model performed well for the genetic evaluation of longitudinal phenotypes.
Genetic evaluation of weekly body weight in Japanese quail using random regression models.
Karami, K; Zerehdaran, S; Tahmoorespur, M; Barzanooni, B; Lotfi, E
2017-02-01
1. A total of 11 826 records from 2489 quails, hatched between 2012 and 2013, were used to estimate genetic parameters for BW (body weight) of Japanese quail using random regression models. Weekly BW was measured from hatch until 49 d of age. WOMBAT software (University of New England, Australia) was used for estimating genetic and phenotypic parameters. 2. Nineteen models were evaluated to identify the best orders of Legendre polynomials. A model with Legendre polynomial of order 3 for additive genetic effect, order 3 for permanent environmental effects and order 1 for maternal permanent environmental effects was chosen as the best model. 3. According to the best model, phenotypic and genetic variances were higher at the end of the rearing period. Although direct heritability for BW reduced from 0.18 at hatch to 0.12 at 7 d of age, it gradually increased to 0.42 at 49 d of age. It indicates that BW at older ages is more controlled by genetic components in Japanese quail. 4. Phenotypic and genetic correlations between adjacent periods except hatching weight were more closely correlated than remote periods. The present results suggested that BW at earlier ages, especially at hatch, are different traits compared to BW at older ages. Therefore, BW at earlier ages could not be used as a selection criterion for improving BW at slaughter age.
ERIC Educational Resources Information Center
Bulcock, J. W.; And Others
Advantages of normalization regression estimation over ridge regression estimation are demonstrated by reference to Bloom's model of school learning. Theoretical concern centered on the structure of scholastic achievement at grade 10 in Canadian high schools. Data on 886 students were randomly sampled from the Carnegie Human Resources Data Bank.…
[On the effectiveness of the homeopathic remedy Arnica montana].
Lüdtke, Rainer; Hacke, Daniela
2005-11-01
Arnica montana is a homeopathic remedy often prescribed after traumata and injuries. To assess whether Arnica is effective beyond placebo and to identify factors which support or contradict this effectiveness. All prospective, controlled trials on the effectiveness of homeopathic Arnica were included. Overall effectiveness was assessed by meta-analysis and meta-regression techniques. 68 comparisons from 49 clinical trials show a significant effectiveness of Arnica in traumatic injuries in random effects meta-analysis (odds ratio [OR], 0.36; 95% confidence interval [CI], 0.24-0.55), but not in meta-regression models (OR, 0.37; CI, 0.11-1.24). We found no evidence for publication bias. Studies from Medline-listed journals and high-quality studies are less likely to report positive results (p = 0.0006 and p = 0.0167). The hypothesis that homeopathic Arnica is effective could neither be proved nor rejected. All trials were highly heterogeneous, meta-regression does not help to explain this heterogeneity substantially.
Bayesian Nonparametric Inference – Why and How
Müller, Peter; Mitra, Riten
2013-01-01
We review inference under models with nonparametric Bayesian (BNP) priors. The discussion follows a set of examples for some common inference problems. The examples are chosen to highlight problems that are challenging for standard parametric inference. We discuss inference for density estimation, clustering, regression and for mixed effects models with random effects distributions. While we focus on arguing for the need for the flexibility of BNP models, we also review some of the more commonly used BNP models, thus hopefully answering a bit of both questions, why and how to use BNP. PMID:24368932
Zheng, Han; Kimber, Alan; Goodwin, Victoria A; Pickering, Ruth M
2018-01-01
A common design for a falls prevention trial is to assess falling at baseline, randomize participants into an intervention or control group, and ask them to record the number of falls they experience during a follow-up period of time. This paper addresses how best to include the baseline count in the analysis of the follow-up count of falls in negative binomial (NB) regression. We examine the performance of various approaches in simulated datasets where both counts are generated from a mixed Poisson distribution with shared random subject effect. Including the baseline count after log-transformation as a regressor in NB regression (NB-logged) or as an offset (NB-offset) resulted in greater power than including the untransformed baseline count (NB-unlogged). Cook and Wei's conditional negative binomial (CNB) model replicates the underlying process generating the data. In our motivating dataset, a statistically significant intervention effect resulted from the NB-logged, NB-offset, and CNB models, but not from NB-unlogged, and large, outlying baseline counts were overly influential in NB-unlogged but not in NB-logged. We conclude that there is little to lose by including the log-transformed baseline count in standard NB regression compared to CNB for moderate to larger sized datasets. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Mixed models, linear dependency, and identification in age-period-cohort models.
O'Brien, Robert M
2017-07-20
This paper examines the identification problem in age-period-cohort models that use either linear or categorically coded ages, periods, and cohorts or combinations of these parameterizations. These models are not identified using the traditional fixed effect regression model approach because of a linear dependency between the ages, periods, and cohorts. However, these models can be identified if the researcher introduces a single just identifying constraint on the model coefficients. The problem with such constraints is that the results can differ substantially depending on the constraint chosen. Somewhat surprisingly, age-period-cohort models that specify one or more of ages and/or periods and/or cohorts as random effects are identified. This is the case without introducing an additional constraint. I label this identification as statistical model identification and show how statistical model identification comes about in mixed models and why which effects are treated as fixed and which are treated as random can substantially change the estimates of the age, period, and cohort effects. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.
Modeling Longitudinal Data Containing Non-Normal Within Subject Errors
NASA Technical Reports Server (NTRS)
Feiveson, Alan; Glenn, Nancy L.
2013-01-01
The mission of the National Aeronautics and Space Administration’s (NASA) human research program is to advance safe human spaceflight. This involves conducting experiments, collecting data, and analyzing data. The data are longitudinal and result from a relatively few number of subjects; typically 10 – 20. A longitudinal study refers to an investigation where participant outcomes and possibly treatments are collected at multiple follow-up times. Standard statistical designs such as mean regression with random effects and mixed–effects regression are inadequate for such data because the population is typically not approximately normally distributed. Hence, more advanced data analysis methods are necessary. This research focuses on four such methods for longitudinal data analysis: the recently proposed linear quantile mixed models (lqmm) by Geraci and Bottai (2013), quantile regression, multilevel mixed–effects linear regression, and robust regression. This research also provides computational algorithms for longitudinal data that scientists can directly use for human spaceflight and other longitudinal data applications, then presents statistical evidence that verifies which method is best for specific situations. This advances the study of longitudinal data in a broad range of applications including applications in the sciences, technology, engineering and mathematics fields.
Dynamic prediction in functional concurrent regression with an application to child growth.
Leroux, Andrew; Xiao, Luo; Crainiceanu, Ciprian; Checkley, William
2018-04-15
In many studies, it is of interest to predict the future trajectory of subjects based on their historical data, referred to as dynamic prediction. Mixed effects models have traditionally been used for dynamic prediction. However, the commonly used random intercept and slope model is often not sufficiently flexible for modeling subject-specific trajectories. In addition, there may be useful exposures/predictors of interest that are measured concurrently with the outcome, complicating dynamic prediction. To address these problems, we propose a dynamic functional concurrent regression model to handle the case where both the functional response and the functional predictors are irregularly measured. Currently, such a model cannot be fit by existing software. We apply the model to dynamically predict children's length conditional on prior length, weight, and baseline covariates. Inference on model parameters and subject-specific trajectories is conducted using the mixed effects representation of the proposed model. An extensive simulation study shows that the dynamic functional regression model provides more accurate estimation and inference than existing methods. Methods are supported by fast, flexible, open source software that uses heavily tested smoothing techniques. © 2017 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
Multi-fidelity Gaussian process regression for prediction of random fields
DOE Office of Scientific and Technical Information (OSTI.GOV)
Parussini, L.; Venturi, D., E-mail: venturi@ucsc.edu; Perdikaris, P.
We propose a new multi-fidelity Gaussian process regression (GPR) approach for prediction of random fields based on observations of surrogate models or hierarchies of surrogate models. Our method builds upon recent work on recursive Bayesian techniques, in particular recursive co-kriging, and extends it to vector-valued fields and various types of covariances, including separable and non-separable ones. The framework we propose is general and can be used to perform uncertainty propagation and quantification in model-based simulations, multi-fidelity data fusion, and surrogate-based optimization. We demonstrate the effectiveness of the proposed recursive GPR techniques through various examples. Specifically, we study the stochastic Burgersmore » equation and the stochastic Oberbeck–Boussinesq equations describing natural convection within a square enclosure. In both cases we find that the standard deviation of the Gaussian predictors as well as the absolute errors relative to benchmark stochastic solutions are very small, suggesting that the proposed multi-fidelity GPR approaches can yield highly accurate results.« less
The role of gender in a smoking cessation intervention: a cluster randomized clinical trial
2011-01-01
Background The prevalence of smoking in Spain is high in both men and women. The aim of our study was to evaluate the role of gender in the effectiveness of a specific smoking cessation intervention conducted in Spain. Methods This study was a secondary analysis of a cluster randomized clinical trial in which the randomization unit was the Basic Care Unit (family physician and nurse who care for the same group of patients). The intervention consisted of a six-month period of implementing the recommendations of a Clinical Practice Guideline. A total of 2,937 current smokers at 82 Primary Care Centers in 13 different regions of Spain were included (2003-2005). The success rate was measured by a six-month continued abstinence rate at the one-year follow-up. A logistic mixed-effects regression model, taking Basic Care Units as random-effect parameter, was performed in order to analyze gender as a predictor of smoking cessation. Results At the one-year follow-up, the six-month continuous abstinence quit rate was 9.4% in men and 8.5% in women (p = 0.400). The logistic mixed-effects regression model showed that women did not have a higher odds of being an ex-smoker than men after the analysis was adjusted for confounders (OR adjusted = 0.9, 95% CI = 0.7-1.2). Conclusions Gender does not appear to be a predictor of smoking cessation at the one-year follow-up in individuals presenting at Primary Care Centers. ClinicalTrials.gov Identifier NCT00125905. PMID:21605389
Austin, Peter C; Wagner, Philippe; Merlo, Juan
2017-03-15
Multilevel data occurs frequently in many research areas like health services research and epidemiology. A suitable way to analyze such data is through the use of multilevel regression models (MLRM). MLRM incorporate cluster-specific random effects which allow one to partition the total individual variance into between-cluster variation and between-individual variation. Statistically, MLRM account for the dependency of the data within clusters and provide correct estimates of uncertainty around regression coefficients. Substantively, the magnitude of the effect of clustering provides a measure of the General Contextual Effect (GCE). When outcomes are binary, the GCE can also be quantified by measures of heterogeneity like the Median Odds Ratio (MOR) calculated from a multilevel logistic regression model. Time-to-event outcomes within a multilevel structure occur commonly in epidemiological and medical research. However, the Median Hazard Ratio (MHR) that corresponds to the MOR in multilevel (i.e., 'frailty') Cox proportional hazards regression is rarely used. Analogously to the MOR, the MHR is the median relative change in the hazard of the occurrence of the outcome when comparing identical subjects from two randomly selected different clusters that are ordered by risk. We illustrate the application and interpretation of the MHR in a case study analyzing the hazard of mortality in patients hospitalized for acute myocardial infarction at hospitals in Ontario, Canada. We provide R code for computing the MHR. The MHR is a useful and intuitive measure for expressing cluster heterogeneity in the outcome and, thereby, estimating general contextual effects in multilevel survival analysis. © 2016 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. © 2016 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
Wagner, Philippe; Merlo, Juan
2016-01-01
Multilevel data occurs frequently in many research areas like health services research and epidemiology. A suitable way to analyze such data is through the use of multilevel regression models (MLRM). MLRM incorporate cluster‐specific random effects which allow one to partition the total individual variance into between‐cluster variation and between‐individual variation. Statistically, MLRM account for the dependency of the data within clusters and provide correct estimates of uncertainty around regression coefficients. Substantively, the magnitude of the effect of clustering provides a measure of the General Contextual Effect (GCE). When outcomes are binary, the GCE can also be quantified by measures of heterogeneity like the Median Odds Ratio (MOR) calculated from a multilevel logistic regression model. Time‐to‐event outcomes within a multilevel structure occur commonly in epidemiological and medical research. However, the Median Hazard Ratio (MHR) that corresponds to the MOR in multilevel (i.e., ‘frailty’) Cox proportional hazards regression is rarely used. Analogously to the MOR, the MHR is the median relative change in the hazard of the occurrence of the outcome when comparing identical subjects from two randomly selected different clusters that are ordered by risk. We illustrate the application and interpretation of the MHR in a case study analyzing the hazard of mortality in patients hospitalized for acute myocardial infarction at hospitals in Ontario, Canada. We provide R code for computing the MHR. The MHR is a useful and intuitive measure for expressing cluster heterogeneity in the outcome and, thereby, estimating general contextual effects in multilevel survival analysis. © 2016 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. PMID:27885709
ERIC Educational Resources Information Center
Barry, Tammy D.; Lochman, John E.; Fite, Paula J.; Wells, Karen C.; Colder, Craig R.
2012-01-01
The current study utilized a longitudinal design to examine the effects of neighborhood and parenting on 120 at-risk children's academic and aggressive outcomes, concurrently and at two later timepoints during the transition to middle school. Random effects regression models were estimated to examine whether neighborhood characteristics and harsh…
NASA Technical Reports Server (NTRS)
Hohenemser, K. H.; Crews, S. T.
1972-01-01
A two bladed 16-inch hingeless rotor model was built and tested outside and inside a 24 by 24 inch wind tunnel test section at collective pitch settings up to 5 deg and rotor advance ratios up to .4. The rotor model has a simple eccentric mechanism to provide progressing or regressing cyclic pitch excitation. The flapping responses were compared to analytically determined responses which included flap-bending elasticity but excluded rotor wake effects. Substantial systematic deviations of the measured responses from the computed responses were found, which were interpreted as the effects of interaction of the blades with a rotating asymmetrical wake.
ERIC Educational Resources Information Center
Trochim, William M. K.; And Others
1991-01-01
The regression-discontinuity design involving a treatment interaction effect (TIE), pretest-posttest functional form specification, and choice of point-of-estimation of the TIE are examined. Formulas for controlling the magnitude of TIE in simulations can be used for simulating the randomized experimental case where estimation is not at the…
NASA Astrophysics Data System (ADS)
Tang, Jie; Liu, Rong; Zhang, Yue-Li; Liu, Mou-Ze; Hu, Yong-Fang; Shao, Ming-Jie; Zhu, Li-Jun; Xin, Hua-Wen; Feng, Gui-Wen; Shang, Wen-Jun; Meng, Xiang-Guang; Zhang, Li-Rong; Ming, Ying-Zi; Zhang, Wei
2017-02-01
Tacrolimus has a narrow therapeutic window and considerable variability in clinical use. Our goal was to compare the performance of multiple linear regression (MLR) and eight machine learning techniques in pharmacogenetic algorithm-based prediction of tacrolimus stable dose (TSD) in a large Chinese cohort. A total of 1,045 renal transplant patients were recruited, 80% of which were randomly selected as the “derivation cohort” to develop dose-prediction algorithm, while the remaining 20% constituted the “validation cohort” to test the final selected algorithm. MLR, artificial neural network (ANN), regression tree (RT), multivariate adaptive regression splines (MARS), boosted regression tree (BRT), support vector regression (SVR), random forest regression (RFR), lasso regression (LAR) and Bayesian additive regression trees (BART) were applied and their performances were compared in this work. Among all the machine learning models, RT performed best in both derivation [0.71 (0.67-0.76)] and validation cohorts [0.73 (0.63-0.82)]. In addition, the ideal rate of RT was 4% higher than that of MLR. To our knowledge, this is the first study to use machine learning models to predict TSD, which will further facilitate personalized medicine in tacrolimus administration in the future.
An Empirical Comparison of Randomized Control Trials and Regression Discontinuity Estimations
ERIC Educational Resources Information Center
Barrera-Osorio, Felipe; Filmer, Deon; McIntyre, Joe
2014-01-01
Randomized controlled trials (RCTs) and regression discontinuity (RD) studies both provide estimates of causal effects. A major difference between the two is that RD only estimates local average treatment effects (LATE) near the cutoff point of the forcing variable. This has been cited as a drawback to RD designs (Cook & Wong, 2008).…
Covariate Selection for Multilevel Models with Missing Data
Marino, Miguel; Buxton, Orfeu M.; Li, Yi
2017-01-01
Missing covariate data hampers variable selection in multilevel regression settings. Current variable selection techniques for multiply-imputed data commonly address missingness in the predictors through list-wise deletion and stepwise-selection methods which are problematic. Moreover, most variable selection methods are developed for independent linear regression models and do not accommodate multilevel mixed effects regression models with incomplete covariate data. We develop a novel methodology that is able to perform covariate selection across multiply-imputed data for multilevel random effects models when missing data is present. Specifically, we propose to stack the multiply-imputed data sets from a multiple imputation procedure and to apply a group variable selection procedure through group lasso regularization to assess the overall impact of each predictor on the outcome across the imputed data sets. Simulations confirm the advantageous performance of the proposed method compared with the competing methods. We applied the method to reanalyze the Healthy Directions-Small Business cancer prevention study, which evaluated a behavioral intervention program targeting multiple risk-related behaviors in a working-class, multi-ethnic population. PMID:28239457
Digital Games, Design, and Learning: A Systematic Review and Meta-Analysis
ERIC Educational Resources Information Center
Clark, Douglas B.; Tanner-Smith, Emily E.; Killingsworth, Stephen S.
2016-01-01
In this meta-analysis, we systematically reviewed research on digital games and learning for K-16 students. We synthesized comparisons of game versus nongame conditions (i.e., media comparisons) and comparisons of augmented games versus standard game designs (i.e., value-added comparisons). We used random-effects meta-regression models with robust…
An overview of longitudinal data analysis methods for neurological research.
Locascio, Joseph J; Atri, Alireza
2011-01-01
The purpose of this article is to provide a concise, broad and readily accessible overview of longitudinal data analysis methods, aimed to be a practical guide for clinical investigators in neurology. In general, we advise that older, traditional methods, including (1) simple regression of the dependent variable on a time measure, (2) analyzing a single summary subject level number that indexes changes for each subject and (3) a general linear model approach with a fixed-subject effect, should be reserved for quick, simple or preliminary analyses. We advocate the general use of mixed-random and fixed-effect regression models for analyses of most longitudinal clinical studies. Under restrictive situations or to provide validation, we recommend: (1) repeated-measure analysis of covariance (ANCOVA), (2) ANCOVA for two time points, (3) generalized estimating equations and (4) latent growth curve/structural equation models.
NASA Astrophysics Data System (ADS)
Norajitra, Tobias; Meinzer, Hans-Peter; Maier-Hein, Klaus H.
2015-03-01
During image segmentation, 3D Statistical Shape Models (SSM) usually conduct a limited search for target landmarks within one-dimensional search profiles perpendicular to the model surface. In addition, landmark appearance is modeled only locally based on linear profiles and weak learners, altogether leading to segmentation errors from landmark ambiguities and limited search coverage. We present a new method for 3D SSM segmentation based on 3D Random Forest Regression Voting. For each surface landmark, a Random Regression Forest is trained that learns a 3D spatial displacement function between the according reference landmark and a set of surrounding sample points, based on an infinite set of non-local randomized 3D Haar-like features. Landmark search is then conducted omni-directionally within 3D search spaces, where voxelwise forest predictions on landmark position contribute to a common voting map which reflects the overall position estimate. Segmentation experiments were conducted on a set of 45 CT volumes of the human liver, of which 40 images were randomly chosen for training and 5 for testing. Without parameter optimization, using a simple candidate selection and a single resolution approach, excellent results were achieved, while faster convergence and better concavity segmentation were observed, altogether underlining the potential of our approach in terms of increased robustness from distinct landmark detection and from better search coverage.
Modeling Verdict Outcomes Using Social Network Measures: The Watergate and Caviar Network Cases.
Masías, Víctor Hugo; Valle, Mauricio; Morselli, Carlo; Crespo, Fernando; Vargas, Augusto; Laengle, Sigifredo
2016-01-01
Modelling criminal trial verdict outcomes using social network measures is an emerging research area in quantitative criminology. Few studies have yet analyzed which of these measures are the most important for verdict modelling or which data classification techniques perform best for this application. To compare the performance of different techniques in classifying members of a criminal network, this article applies three different machine learning classifiers-Logistic Regression, Naïve Bayes and Random Forest-with a range of social network measures and the necessary databases to model the verdicts in two real-world cases: the U.S. Watergate Conspiracy of the 1970's and the now-defunct Canada-based international drug trafficking ring known as the Caviar Network. In both cases it was found that the Random Forest classifier did better than either Logistic Regression or Naïve Bayes, and its superior performance was statistically significant. This being so, Random Forest was used not only for classification but also to assess the importance of the measures. For the Watergate case, the most important one proved to be betweenness centrality while for the Caviar Network, it was the effective size of the network. These results are significant because they show that an approach combining machine learning with social network analysis not only can generate accurate classification models but also helps quantify the importance social network variables in modelling verdict outcomes. We conclude our analysis with a discussion and some suggestions for future work in verdict modelling using social network measures.
Regression Discontinuity for Causal Effect Estimation in Epidemiology.
Oldenburg, Catherine E; Moscoe, Ellen; Bärnighausen, Till
Regression discontinuity analyses can generate estimates of the causal effects of an exposure when a continuously measured variable is used to assign the exposure to individuals based on a threshold rule. Individuals just above the threshold are expected to be similar in their distribution of measured and unmeasured baseline covariates to individuals just below the threshold, resulting in exchangeability. At the threshold exchangeability is guaranteed if there is random variation in the continuous assignment variable, e.g., due to random measurement error. Under exchangeability, causal effects can be identified at the threshold. The regression discontinuity intention-to-treat (RD-ITT) effect on an outcome can be estimated as the difference in the outcome between individuals just above (or below) versus just below (or above) the threshold. This effect is analogous to the ITT effect in a randomized controlled trial. Instrumental variable methods can be used to estimate the effect of exposure itself utilizing the threshold as the instrument. We review the recent epidemiologic literature reporting regression discontinuity studies and find that while regression discontinuity designs are beginning to be utilized in a variety of applications in epidemiology, they are still relatively rare, and analytic and reporting practices vary. Regression discontinuity has the potential to greatly contribute to the evidence base in epidemiology, in particular on the real-life and long-term effects and side-effects of medical treatments that are provided based on threshold rules - such as treatments for low birth weight, hypertension or diabetes.
Solvency supervision based on a total balance sheet approach
NASA Astrophysics Data System (ADS)
Pitselis, Georgios
2009-11-01
In this paper we investigate the adequacy of the own funds a company requires in order to remain healthy and avoid insolvency. Two methods are applied here; the quantile regression method and the method of mixed effects models. Quantile regression is capable of providing a more complete statistical analysis of the stochastic relationship among random variables than least squares estimation. The estimated mixed effects line can be considered as an internal industry equation (norm), which explains a systematic relation between a dependent variable (such as own funds) with independent variables (e.g. financial characteristics, such as assets, provisions, etc.). The above two methods are implemented with two data sets.
NASA Astrophysics Data System (ADS)
Hoffman, A.; Forest, C. E.; Kemanian, A.
2016-12-01
A significant number of food-insecure nations exist in regions of the world where dust plays a large role in the climate system. While the impacts of common climate variables (e.g. temperature, precipitation, ozone, and carbon dioxide) on crop yields are relatively well understood, the impact of mineral aerosols on yields have not yet been thoroughly investigated. This research aims to develop the data and tools to progress our understanding of mineral aerosol impacts on crop yields. Suspended dust affects crop yields by altering the amount and type of radiation reaching the plant, modifying local temperature and precipitation. While dust events (i.e. dust storms) affect crop yields by depleting the soil of nutrients or by defoliation via particle abrasion. The impact of dust on yields is modeled statistically because we are uncertain which impacts will dominate the response on national and regional scales considered in this study. Multiple linear regression is used in a number of large-scale statistical crop modeling studies to estimate yield responses to various climate variables. In alignment with previous work, we develop linear crop models, but build upon this simple method of regression with machine-learning techniques (e.g. random forests) to identify important statistical predictors and isolate how dust affects yields on the scales of interest. To perform this analysis, we develop a crop-climate dataset for maize, soybean, groundnut, sorghum, rice, and wheat for the regions of West Africa, East Africa, South Africa, and the Sahel. Random forest regression models consistently model historic crop yields better than the linear models. In several instances, the random forest models accurately capture the temperature and precipitation threshold behavior in crops. Additionally, improving agricultural technology has caused a well-documented positive trend that dominates time series of global and regional yields. This trend is often removed before regression with traditional crop models, but likely at the cost of removing climate information. Our random forest models consistently discover the positive trend without removing any additional data. The application of random forests as a statistical crop model provides insight into understanding the impact of dust on yields in marginal food producing regions.
Veerkamp, R F; Koenen, E P; De Jong, G
2001-10-01
Twenty type classifiers scored body condition (BCS) of 91,738 first-parity cows from 601 sires and 5518 maternal grandsires. Fertility data during first lactation were extracted for 177,220 cows, of which 67,278 also had a BCS observation, and first-lactation 305-d milk, fat, and protein yields were added for 180,631 cows. Heritabilities and genetic correlations were estimated using a sire-maternal grandsire model. Heritability of BCS was 0.38. Heritabilities for fertility traits were low (0.01 to 0.07), but genetic standard deviations were substantial, 9 d for days to first service and calving interval, 0.25 for number of services, and 5% for first-service conception. Phenotypic correlations between fertility and yield or BCS were small (-0.15 to 0.20). Genetic correlations between yield and all fertility traits were unfavorable (0.37 to 0.74). Genetic correlations with BCS were between -0.4 and -0.6 for calving interval and days to first service. Random regression analysis (RR) showed that correlations changed with days in milk for BCS. Little agreement was found between variances and correlations from RR, and analysis including a single month (mo 1 to 10) of data for BCS, especially during early and late lactation. However, this was due to excluding data from the conventional analysis, rather than due to the polynomials used. RR and a conventional five-traits model where BCS in mo 1, 4, 7, and 10 was treated as a separate traits (plus yield or fertility) gave similar results. Thus a parsimonious random regression model gave more realistic estimates for the (co)variances than a series of bivariate analysis on subsets of the data for BCS. A higher genetic merit for yield has unfavorable effects on fertility, but the genetic correlation suggests that BCS (at some stages of lactation) might help to alleviate the unfavorable effect of selection for higher yield on fertility.
Kim, Sungduk; Chen, Ming-Hui; Ibrahim, Joseph G.; Shah, Arvind K.; Lin, Jianxin
2013-01-01
In this paper, we propose a class of Box-Cox transformation regression models with multidimensional random effects for analyzing multivariate responses for individual patient data (IPD) in meta-analysis. Our modeling formulation uses a multivariate normal response meta-analysis model with multivariate random effects, in which each response is allowed to have its own Box-Cox transformation. Prior distributions are specified for the Box-Cox transformation parameters as well as the regression coefficients in this complex model, and the Deviance Information Criterion (DIC) is used to select the best transformation model. Since the model is quite complex, a novel Monte Carlo Markov chain (MCMC) sampling scheme is developed to sample from the joint posterior of the parameters. This model is motivated by a very rich dataset comprising 26 clinical trials involving cholesterol lowering drugs where the goal is to jointly model the three dimensional response consisting of Low Density Lipoprotein Cholesterol (LDL-C), High Density Lipoprotein Cholesterol (HDL-C), and Triglycerides (TG) (LDL-C, HDL-C, TG). Since the joint distribution of (LDL-C, HDL-C, TG) is not multivariate normal and in fact quite skewed, a Box-Cox transformation is needed to achieve normality. In the clinical literature, these three variables are usually analyzed univariately: however, a multivariate approach would be more appropriate since these variables are correlated with each other. A detailed analysis of these data is carried out using the proposed methodology. PMID:23580436
Kim, Sungduk; Chen, Ming-Hui; Ibrahim, Joseph G; Shah, Arvind K; Lin, Jianxin
2013-10-15
In this paper, we propose a class of Box-Cox transformation regression models with multidimensional random effects for analyzing multivariate responses for individual patient data in meta-analysis. Our modeling formulation uses a multivariate normal response meta-analysis model with multivariate random effects, in which each response is allowed to have its own Box-Cox transformation. Prior distributions are specified for the Box-Cox transformation parameters as well as the regression coefficients in this complex model, and the deviance information criterion is used to select the best transformation model. Because the model is quite complex, we develop a novel Monte Carlo Markov chain sampling scheme to sample from the joint posterior of the parameters. This model is motivated by a very rich dataset comprising 26 clinical trials involving cholesterol-lowering drugs where the goal is to jointly model the three-dimensional response consisting of low density lipoprotein cholesterol (LDL-C), high density lipoprotein cholesterol (HDL-C), and triglycerides (TG) (LDL-C, HDL-C, TG). Because the joint distribution of (LDL-C, HDL-C, TG) is not multivariate normal and in fact quite skewed, a Box-Cox transformation is needed to achieve normality. In the clinical literature, these three variables are usually analyzed univariately; however, a multivariate approach would be more appropriate because these variables are correlated with each other. We carry out a detailed analysis of these data by using the proposed methodology. Copyright © 2013 John Wiley & Sons, Ltd.
Arano, Ichiro; Sugimoto, Tomoyuki; Hamasaki, Toshimitsu; Ohno, Yuko
2010-04-23
Survival analysis methods such as the Kaplan-Meier method, log-rank test, and Cox proportional hazards regression (Cox regression) are commonly used to analyze data from randomized withdrawal studies in patients with major depressive disorder. However, unfortunately, such common methods may be inappropriate when a long-term censored relapse-free time appears in data as the methods assume that if complete follow-up were possible for all individuals, each would eventually experience the event of interest. In this paper, to analyse data including such a long-term censored relapse-free time, we discuss a semi-parametric cure regression (Cox cure regression), which combines a logistic formulation for the probability of occurrence of an event with a Cox proportional hazards specification for the time of occurrence of the event. In specifying the treatment's effect on disease-free survival, we consider the fraction of long-term survivors and the risks associated with a relapse of the disease. In addition, we develop a tree-based method for the time to event data to identify groups of patients with differing prognoses (cure survival CART). Although analysis methods typically adapt the log-rank statistic for recursive partitioning procedures, the method applied here used a likelihood ratio (LR) test statistic from a fitting of cure survival regression assuming exponential and Weibull distributions for the latency time of relapse. The method is illustrated using data from a sertraline randomized withdrawal study in patients with major depressive disorder. We concluded that Cox cure regression reveals facts on who may be cured, and how the treatment and other factors effect on the cured incidence and on the relapse time of uncured patients, and that cure survival CART output provides easily understandable and interpretable information, useful both in identifying groups of patients with differing prognoses and in utilizing Cox cure regression models leading to meaningful interpretations.
Tangen, C M; Koch, G G
1999-03-01
In the randomized clinical trial setting, controlling for covariates is expected to produce variance reduction for the treatment parameter estimate and to adjust for random imbalances of covariates between the treatment groups. However, for the logistic regression model, variance reduction is not obviously obtained. This can lead to concerns about the assumptions of the logistic model. We introduce a complementary nonparametric method for covariate adjustment. It provides results that are usually compatible with expectations for analysis of covariance. The only assumptions required are based on randomization and sampling arguments. The resulting treatment parameter is a (unconditional) population average log-odds ratio that has been adjusted for random imbalance of covariates. Data from a randomized clinical trial are used to compare results from the traditional maximum likelihood logistic method with those from the nonparametric logistic method. We examine treatment parameter estimates, corresponding standard errors, and significance levels in models with and without covariate adjustment. In addition, we discuss differences between unconditional population average treatment parameters and conditional subpopulation average treatment parameters. Additional features of the nonparametric method, including stratified (multicenter) and multivariate (multivisit) analyses, are illustrated. Extensions of this methodology to the proportional odds model are also made.
L.R. Iverson; A.M. Prasad; A. Liaw
2004-01-01
More and better machine learning tools are becoming available for landscape ecologists to aid in understanding species-environment relationships and to map probable species occurrence now and potentially into the future. To thal end, we evaluated three statistical models: Regression Tree Analybib (RTA), Bagging Trees (BT) and Random Forest (RF) for their utility in...
Multiscale measurement error models for aggregated small area health data.
Aregay, Mehreteab; Lawson, Andrew B; Faes, Christel; Kirby, Russell S; Carroll, Rachel; Watjou, Kevin
2016-08-01
Spatial data are often aggregated from a finer (smaller) to a coarser (larger) geographical level. The process of data aggregation induces a scaling effect which smoothes the variation in the data. To address the scaling problem, multiscale models that link the convolution models at different scale levels via the shared random effect have been proposed. One of the main goals in aggregated health data is to investigate the relationship between predictors and an outcome at different geographical levels. In this paper, we extend multiscale models to examine whether a predictor effect at a finer level hold true at a coarser level. To adjust for predictor uncertainty due to aggregation, we applied measurement error models in the framework of multiscale approach. To assess the benefit of using multiscale measurement error models, we compare the performance of multiscale models with and without measurement error in both real and simulated data. We found that ignoring the measurement error in multiscale models underestimates the regression coefficient, while it overestimates the variance of the spatially structured random effect. On the other hand, accounting for the measurement error in multiscale models provides a better model fit and unbiased parameter estimates. © The Author(s) 2016.
An application of quantile random forests for predictive mapping of forest attributes
E.A. Freeman; G.G. Moisen
2015-01-01
Increasingly, random forest models are used in predictive mapping of forest attributes. Traditional random forests output the mean prediction from the random trees. Quantile regression forests (QRF) is an extension of random forests developed by Nicolai Meinshausen that provides non-parametric estimates of the median predicted value as well as prediction quantiles. It...
Coupé, Christophe
2018-01-01
As statistical approaches are getting increasingly used in linguistics, attention must be paid to the choice of methods and algorithms used. This is especially true since they require assumptions to be satisfied to provide valid results, and because scientific articles still often fall short of reporting whether such assumptions are met. Progress is being, however, made in various directions, one of them being the introduction of techniques able to model data that cannot be properly analyzed with simpler linear regression models. We report recent advances in statistical modeling in linguistics. We first describe linear mixed-effects regression models (LMM), which address grouping of observations, and generalized linear mixed-effects models (GLMM), which offer a family of distributions for the dependent variable. Generalized additive models (GAM) are then introduced, which allow modeling non-linear parametric or non-parametric relationships between the dependent variable and the predictors. We then highlight the possibilities offered by generalized additive models for location, scale, and shape (GAMLSS). We explain how they make it possible to go beyond common distributions, such as Gaussian or Poisson, and offer the appropriate inferential framework to account for ‘difficult’ variables such as count data with strong overdispersion. We also demonstrate how they offer interesting perspectives on data when not only the mean of the dependent variable is modeled, but also its variance, skewness, and kurtosis. As an illustration, the case of phonemic inventory size is analyzed throughout the article. For over 1,500 languages, we consider as predictors the number of speakers, the distance from Africa, an estimation of the intensity of language contact, and linguistic relationships. We discuss the use of random effects to account for genealogical relationships, the choice of appropriate distributions to model count data, and non-linear relationships. Relying on GAMLSS, we assess a range of candidate distributions, including the Sichel, Delaporte, Box-Cox Green and Cole, and Box-Cox t distributions. We find that the Box-Cox t distribution, with appropriate modeling of its parameters, best fits the conditional distribution of phonemic inventory size. We finally discuss the specificities of phoneme counts, weak effects, and how GAMLSS should be considered for other linguistic variables. PMID:29713298
Coupé, Christophe
2018-01-01
As statistical approaches are getting increasingly used in linguistics, attention must be paid to the choice of methods and algorithms used. This is especially true since they require assumptions to be satisfied to provide valid results, and because scientific articles still often fall short of reporting whether such assumptions are met. Progress is being, however, made in various directions, one of them being the introduction of techniques able to model data that cannot be properly analyzed with simpler linear regression models. We report recent advances in statistical modeling in linguistics. We first describe linear mixed-effects regression models (LMM), which address grouping of observations, and generalized linear mixed-effects models (GLMM), which offer a family of distributions for the dependent variable. Generalized additive models (GAM) are then introduced, which allow modeling non-linear parametric or non-parametric relationships between the dependent variable and the predictors. We then highlight the possibilities offered by generalized additive models for location, scale, and shape (GAMLSS). We explain how they make it possible to go beyond common distributions, such as Gaussian or Poisson, and offer the appropriate inferential framework to account for 'difficult' variables such as count data with strong overdispersion. We also demonstrate how they offer interesting perspectives on data when not only the mean of the dependent variable is modeled, but also its variance, skewness, and kurtosis. As an illustration, the case of phonemic inventory size is analyzed throughout the article. For over 1,500 languages, we consider as predictors the number of speakers, the distance from Africa, an estimation of the intensity of language contact, and linguistic relationships. We discuss the use of random effects to account for genealogical relationships, the choice of appropriate distributions to model count data, and non-linear relationships. Relying on GAMLSS, we assess a range of candidate distributions, including the Sichel, Delaporte, Box-Cox Green and Cole, and Box-Cox t distributions. We find that the Box-Cox t distribution, with appropriate modeling of its parameters, best fits the conditional distribution of phonemic inventory size. We finally discuss the specificities of phoneme counts, weak effects, and how GAMLSS should be considered for other linguistic variables.
Mookprom, S; Boonkum, W; Kunhareang, S; Siripanya, S; Duangjinda, M
2017-02-01
The objective of this research is to investigate appropriate random regression models with various covariance functions, for the genetic evaluation of test-day egg production. Data included 7,884 monthly egg production records from 657 Thai native chickens (Pradu Hang Dam) that were obtained during the first to sixth generation and were born during 2007 to 2014 at the Research and Development Network Center for Animal Breeding (Native Chickens), Khon Kaen University. Average annual and monthly egg productions were 117 ± 41 and 10.20 ± 6.40 eggs, respectively. Nine random regression models were analyzed using the Wilmink function (WM), Koops and Grossman function (KG), Legendre polynomials functions with second, third, and fourth orders (LG2, LG3, LG4), and spline functions with 4, 5, 6, and 8 knots (SP4, SP5, SP6, and SP8). All covariance functions were nested within the same additive genetic and permanent environmental random effects, and the variance components were estimated by Restricted Maximum Likelihood (REML). In model comparisons, mean square error (MSE) and the coefficient of detemination (R 2 ) calculated the goodness of fit; and the correlation between observed and predicted values [Formula: see text] was used to calculate the cross-validated predictive abilities. We found that the covariance functions of SP5, SP6, and SP8 proved appropriate for the genetic evaluation of the egg production curves for Thai native chickens. The estimated heritability of monthly egg production ranged from 0.07 to 0.39, and the highest heritability was found during the first to third months of egg production. In conclusion, the spline functions within monthly egg production can be applied to breeding programs for the improvement of both egg number and persistence of egg production. © 2016 Poultry Science Association Inc.
Fridman, M; Hodgkins, P S; Kahle, J S; Erder, M H
2015-06-01
There are few approved therapies for adults with attention-deficit/hyperactivity disorder (ADHD) in Europe. Lisdexamfetamine (LDX) is an effective treatment for ADHD; however, no clinical trials examining the efficacy of LDX specifically in European adults have been conducted. Therefore, to estimate the efficacy of LDX in European adults we performed a meta-regression of existing clinical data. A systematic review identified US- and Europe-based randomized efficacy trials of LDX, atomoxetine (ATX), or osmotic-release oral system methylphenidate (OROS-MPH) in children/adolescents and adults. A meta-regression model was then fitted to the published/calculated effect sizes (Cohen's d) using medication, geographical location, and age group as predictors. The LDX effect size in European adults was extrapolated from the fitted model. Sensitivity analyses performed included using adult-only studies and adding studies with placebo designs other than a standard pill-placebo design. Twenty-two of 2832 identified articles met inclusion criteria. The model-estimated effect size of LDX for European adults was 1.070 (95% confidence interval: 0.738, 1.401), larger than the 0.8 threshold for large effect sizes. The overall model fit was adequate (80%) and stable in the sensitivity analyses. This model predicts that LDX may have a large treatment effect size in European adults with ADHD. Copyright © 2015 Elsevier Masson SAS. All rights reserved.
Comparative effectiveness research in cancer with observational data.
Giordano, Sharon H
2015-01-01
Observational studies are increasingly being used for comparative effectiveness research. These studies can have the greatest impact when randomized trials are not feasible or when randomized studies have not included the population or outcomes of interest. However, careful attention must be paid to study design to minimize the likelihood of selection biases. Analytic techniques, such as multivariable regression modeling, propensity score analysis, and instrumental variable analysis, also can also be used to help address confounding. Oncology has many existing large and clinically rich observational databases that can be used for comparative effectiveness research. With careful study design, observational studies can produce valid results to assess the benefits and harms of a treatment or intervention in representative real-world populations.
Forecasting Daily Patient Outflow From a Ward Having No Real-Time Clinical Data
Tran, Truyen; Luo, Wei; Phung, Dinh; Venkatesh, Svetha
2016-01-01
Background: Modeling patient flow is crucial in understanding resource demand and prioritization. We study patient outflow from an open ward in an Australian hospital, where currently bed allocation is carried out by a manager relying on past experiences and looking at demand. Automatic methods that provide a reasonable estimate of total next-day discharges can aid in efficient bed management. The challenges in building such methods lie in dealing with large amounts of discharge noise introduced by the nonlinear nature of hospital procedures, and the nonavailability of real-time clinical information in wards. Objective Our study investigates different models to forecast the total number of next-day discharges from an open ward having no real-time clinical data. Methods We compared 5 popular regression algorithms to model total next-day discharges: (1) autoregressive integrated moving average (ARIMA), (2) the autoregressive moving average with exogenous variables (ARMAX), (3) k-nearest neighbor regression, (4) random forest regression, and (5) support vector regression. Although the autoregressive integrated moving average model relied on past 3-month discharges, nearest neighbor forecasting used median of similar discharges in the past in estimating next-day discharge. In addition, the ARMAX model used the day of the week and number of patients currently in ward as exogenous variables. For the random forest and support vector regression models, we designed a predictor set of 20 patient features and 88 ward-level features. Results Our data consisted of 12,141 patient visits over 1826 days. Forecasting quality was measured using mean forecast error, mean absolute error, symmetric mean absolute percentage error, and root mean square error. When compared with a moving average prediction model, all 5 models demonstrated superior performance with the random forests achieving 22.7% improvement in mean absolute error, for all days in the year 2014. Conclusions In the absence of clinical information, our study recommends using patient-level and ward-level data in predicting next-day discharges. Random forest and support vector regression models are able to use all available features from such data, resulting in superior performance over traditional autoregressive methods. An intelligent estimate of available beds in wards plays a crucial role in relieving access block in emergency departments. PMID:27444059
An Overview of Longitudinal Data Analysis Methods for Neurological Research
Locascio, Joseph J.; Atri, Alireza
2011-01-01
The purpose of this article is to provide a concise, broad and readily accessible overview of longitudinal data analysis methods, aimed to be a practical guide for clinical investigators in neurology. In general, we advise that older, traditional methods, including (1) simple regression of the dependent variable on a time measure, (2) analyzing a single summary subject level number that indexes changes for each subject and (3) a general linear model approach with a fixed-subject effect, should be reserved for quick, simple or preliminary analyses. We advocate the general use of mixed-random and fixed-effect regression models for analyses of most longitudinal clinical studies. Under restrictive situations or to provide validation, we recommend: (1) repeated-measure analysis of covariance (ANCOVA), (2) ANCOVA for two time points, (3) generalized estimating equations and (4) latent growth curve/structural equation models. PMID:22203825
Lee, Soo Yee; Mediani, Ahmed; Maulidiani, Maulidiani; Khatib, Alfi; Ismail, Intan Safinar; Zawawi, Norhasnida; Abas, Faridah
2018-01-01
Neptunia oleracea is a plant consumed as a vegetable and which has been used as a folk remedy for several diseases. Herein, two regression models (partial least squares, PLS; and random forest, RF) in a metabolomics approach were compared and applied to the evaluation of the relationship between phenolics and bioactivities of N. oleracea. In addition, the effects of different extraction conditions on the phenolic constituents were assessed by pattern recognition analysis. Comparison of the PLS and RF showed that RF exhibited poorer generalization and hence poorer predictive performance. Both the regression coefficient of PLS and the variable importance of RF revealed that quercetin and kaempferol derivatives, caffeic acid and vitexin-2-O-rhamnoside were significant towards the tested bioactivities. Furthermore, principal component analysis (PCA) and partial least squares-discriminant analysis (PLS-DA) results showed that sonication and absolute ethanol are the preferable extraction method and ethanol ratio, respectively, to produce N. oleracea extracts with high phenolic levels and therefore high DPPH scavenging and α-glucosidase inhibitory activities. Both PLS and RF are useful regression models in metabolomics studies. This work provides insight into the performance of different multivariate data analysis tools and the effects of different extraction conditions on the extraction of desired phenolics from plants. © 2017 Society of Chemical Industry. © 2017 Society of Chemical Industry.
Tehran Air Pollutants Prediction Based on Random Forest Feature Selection Method
NASA Astrophysics Data System (ADS)
Shamsoddini, A.; Aboodi, M. R.; Karami, J.
2017-09-01
Air pollution as one of the most serious forms of environmental pollutions poses huge threat to human life. Air pollution leads to environmental instability, and has harmful and undesirable effects on the environment. Modern prediction methods of the pollutant concentration are able to improve decision making and provide appropriate solutions. This study examines the performance of the Random Forest feature selection in combination with multiple-linear regression and Multilayer Perceptron Artificial Neural Networks methods, in order to achieve an efficient model to estimate carbon monoxide and nitrogen dioxide, sulfur dioxide and PM2.5 contents in the air. The results indicated that Artificial Neural Networks fed by the attributes selected by Random Forest feature selection method performed more accurate than other models for the modeling of all pollutants. The estimation accuracy of sulfur dioxide emissions was lower than the other air contaminants whereas the nitrogen dioxide was predicted more accurate than the other pollutants.
On Models for Binomial Data with Random Numbers of Trials
Comulada, W. Scott; Weiss, Robert E.
2010-01-01
Summary A binomial outcome is a count s of the number of successes out of the total number of independent trials n = s + f, where f is a count of the failures. The n are random variables not fixed by design in many studies. Joint modeling of (s, f) can provide additional insight into the science and into the probability π of success that cannot be directly incorporated by the logistic regression model. Observations where n = 0 are excluded from the binomial analysis yet may be important to understanding how π is influenced by covariates. Correlation between s and f may exist and be of direct interest. We propose Bayesian multivariate Poisson models for the bivariate response (s, f), correlated through random effects. We extend our models to the analysis of longitudinal and multivariate longitudinal binomial outcomes. Our methodology was motivated by two disparate examples, one from teratology and one from an HIV tertiary intervention study. PMID:17688514
Hart, Carl R; Reznicek, Nathan J; Wilson, D Keith; Pettit, Chris L; Nykaza, Edward T
2016-05-01
Many outdoor sound propagation models exist, ranging from highly complex physics-based simulations to simplified engineering calculations, and more recently, highly flexible statistical learning methods. Several engineering and statistical learning models are evaluated by using a particular physics-based model, namely, a Crank-Nicholson parabolic equation (CNPE), as a benchmark. Narrowband transmission loss values predicted with the CNPE, based upon a simulated data set of meteorological, boundary, and source conditions, act as simulated observations. In the simulated data set sound propagation conditions span from downward refracting to upward refracting, for acoustically hard and soft boundaries, and low frequencies. Engineering models used in the comparisons include the ISO 9613-2 method, Harmonoise, and Nord2000 propagation models. Statistical learning methods used in the comparisons include bagged decision tree regression, random forest regression, boosting regression, and artificial neural network models. Computed skill scores are relative to sound propagation in a homogeneous atmosphere over a rigid ground. Overall skill scores for the engineering noise models are 0.6%, -7.1%, and 83.8% for the ISO 9613-2, Harmonoise, and Nord2000 models, respectively. Overall skill scores for the statistical learning models are 99.5%, 99.5%, 99.6%, and 99.6% for bagged decision tree, random forest, boosting, and artificial neural network regression models, respectively.
NASA Astrophysics Data System (ADS)
Kiyohara, Shin; Mizoguchi, Teruyasu
2018-03-01
Grain boundary segregation of dopants plays a crucial role in materials properties. To investigate the dopant segregation behavior at the grain boundary, an enormous number of combinations have to be considered in the segregation of multiple dopants at the complex grain boundary structures. Here, two data mining techniques, the random-forests regression and the genetic algorithm, were applied to determine stable segregation sites at grain boundaries efficiently. Using the random-forests method, a predictive model was constructed from 2% of the segregation configurations and it has been shown that this model could determine the stable segregation configurations. Furthermore, the genetic algorithm also successfully determined the most stable segregation configuration with great efficiency. We demonstrate that these approaches are quite effective to investigate the dopant segregation behaviors at grain boundaries.
NASA Astrophysics Data System (ADS)
Sadler, J. M.; Goodall, J. L.; Morsy, M. M.; Spencer, K.
2018-04-01
Sea level rise has already caused more frequent and severe coastal flooding and this trend will likely continue. Flood prediction is an essential part of a coastal city's capacity to adapt to and mitigate this growing problem. Complex coastal urban hydrological systems however, do not always lend themselves easily to physically-based flood prediction approaches. This paper presents a method for using a data-driven approach to estimate flood severity in an urban coastal setting using crowd-sourced data, a non-traditional but growing data source, along with environmental observation data. Two data-driven models, Poisson regression and Random Forest regression, are trained to predict the number of flood reports per storm event as a proxy for flood severity, given extensive environmental data (i.e., rainfall, tide, groundwater table level, and wind conditions) as input. The method is demonstrated using data from Norfolk, Virginia USA from September 2010 to October 2016. Quality-controlled, crowd-sourced street flooding reports ranging from 1 to 159 per storm event for 45 storm events are used to train and evaluate the models. Random Forest performed better than Poisson regression at predicting the number of flood reports and had a lower false negative rate. From the Random Forest model, total cumulative rainfall was by far the most dominant input variable in predicting flood severity, followed by low tide and lower low tide. These methods serve as a first step toward using data-driven methods for spatially and temporally detailed coastal urban flood prediction.
[Associations between dormitory environment/other factors and sleep quality of medical students].
Zheng, Bang; Wang, Kailu; Pan, Ziqi; Li, Man; Pan, Yuting; Liu, Ting; Xu, Dan; Lyu, Jun
2016-03-01
To investigate the sleep quality and related factors among medical students in China, understand the association between dormitory environment and sleep quality, and provide evidence and recommendations for sleep hygiene intervention. A total of 555 undergraduate students were selected from a medical school of an university in Beijing through stratified-cluster random-sampling to conduct a questionnaire survey by using Chinese version of Pittsburgh Sleep Quality Index (PSQI) and self-designed questionnaire. Analyses were performed by using multiple logistic regression model as well as multilevel linear regression model. The prevalence of sleep disorder was 29.1%(149/512), and 39.1%(200/512) of the students reported that the sleep quality was influenced by dormitory environment. PSQI score was negatively correlated with self-reported rating of dormitory environment (γs=-0.310, P<0.001). Logistic regression analysis showed the related factors of sleep disorder included grade, sleep regularity, self-rated health status, pressures of school work and employment, as well as dormitory environment. RESULTS of multilevel regression analysis also indicated that perception on dormitory environment (individual level) was associated with sleep quality with the dormitory level random effects under control (b=-0.619, P<0.001). The prevalence of sleep disorder was high in medical students, which was associated with multiple factors. Dormitory environment should be taken into consideration when the interventions are taken to improve the sleep quality of students.
The value of a statistical life: a meta-analysis with a mixed effects regression model.
Bellavance, François; Dionne, Georges; Lebeau, Martin
2009-03-01
The value of a statistical life (VSL) is a very controversial topic, but one which is essential to the optimization of governmental decisions. We see a great variability in the values obtained from different studies. The source of this variability needs to be understood, in order to offer public decision-makers better guidance in choosing a value and to set clearer guidelines for future research on the topic. This article presents a meta-analysis based on 39 observations obtained from 37 studies (from nine different countries) which all use a hedonic wage method to calculate the VSL. Our meta-analysis is innovative in that it is the first to use the mixed effects regression model [Raudenbush, S.W., 1994. Random effects models. In: Cooper, H., Hedges, L.V. (Eds.), The Handbook of Research Synthesis. Russel Sage Foundation, New York] to analyze studies on the value of a statistical life. We conclude that the variability found in the values studied stems in large part from differences in methodologies.
Sera, Francesco; Ferrari, Pietro
2015-01-01
In a multicenter study, the overall relationship between exposure and the risk of cancer can be broken down into a within-center component, which reflects the individual level association, and a between-center relationship, which captures the association at the aggregate level. A piecewise exponential proportional hazards model with random effects was used to evaluate the association between dietary fiber intake and colorectal cancer (CRC) risk in the EPIC study. During an average follow-up of 11.0 years, 4,517 CRC events occurred among study participants recruited in 28 centers from ten European countries. Models were adjusted by relevant confounding factors. Heterogeneity among centers was modelled with random effects. Linear regression calibration was used to account for errors in dietary questionnaire (DQ) measurements. Risk ratio estimates for a 10 g/day increment in dietary fiber were equal to 0.90 (95%CI: 0.85, 0.96) and 0.85 (0.64, 1.14), at the individual and aggregate levels, respectively, while calibrated estimates were 0.85 (0.76, 0.94), and 0.87 (0.65, 1.15), respectively. In multicenter studies, over a straightforward ecological analysis, random effects models allow information at the individual and ecologic levels to be captured, while controlling for confounding at both levels of evidence.
Chakraborty, Somsubhra; Weindorf, David C; Li, Bin; Ali Aldabaa, Abdalsamad Abdalsatar; Ghosh, Rakesh Kumar; Paul, Sathi; Nasim Ali, Md
2015-05-01
Using 108 petroleum contaminated soil samples, this pilot study proposed a new analytical approach of combining visible near-infrared diffuse reflectance spectroscopy (VisNIR DRS) and portable X-ray fluorescence spectrometry (PXRF) for rapid and improved quantification of soil petroleum contamination. Results indicated that an advanced fused model where VisNIR DRS spectra-based penalized spline regression (PSR) was used to predict total petroleum hydrocarbon followed by PXRF elemental data-based random forest regression was used to model the PSR residuals, it outperformed (R(2)=0.78, residual prediction deviation (RPD)=2.19) all other models tested, even producing better generalization than using VisNIR DRS alone (RPD's of 1.64, 1.86, and 1.96 for random forest, penalized spline regression, and partial least squares regression, respectively). Additionally, unsupervised principal component analysis using the PXRF+VisNIR DRS system qualitatively separated contaminated soils from control samples. Fusion of PXRF elemental data and VisNIR derivative spectra produced an optimized model for total petroleum hydrocarbon quantification in soils. Copyright © 2015 Elsevier B.V. All rights reserved.
Sharma, Ashok K; Srivastava, Gopal N; Roy, Ankita; Sharma, Vineet K
2017-01-01
The experimental methods for the prediction of molecular toxicity are tedious and time-consuming tasks. Thus, the computational approaches could be used to develop alternative methods for toxicity prediction. We have developed a tool for the prediction of molecular toxicity along with the aqueous solubility and permeability of any molecule/metabolite. Using a comprehensive and curated set of toxin molecules as a training set, the different chemical and structural based features such as descriptors and fingerprints were exploited for feature selection, optimization and development of machine learning based classification and regression models. The compositional differences in the distribution of atoms were apparent between toxins and non-toxins, and hence, the molecular features were used for the classification and regression. On 10-fold cross-validation, the descriptor-based, fingerprint-based and hybrid-based classification models showed similar accuracy (93%) and Matthews's correlation coefficient (0.84). The performances of all the three models were comparable (Matthews's correlation coefficient = 0.84-0.87) on the blind dataset. In addition, the regression-based models using descriptors as input features were also compared and evaluated on the blind dataset. Random forest based regression model for the prediction of solubility performed better ( R 2 = 0.84) than the multi-linear regression (MLR) and partial least square regression (PLSR) models, whereas, the partial least squares based regression model for the prediction of permeability (caco-2) performed better ( R 2 = 0.68) in comparison to the random forest and MLR based regression models. The performance of final classification and regression models was evaluated using the two validation datasets including the known toxins and commonly used constituents of health products, which attests to its accuracy. The ToxiM web server would be a highly useful and reliable tool for the prediction of toxicity, solubility, and permeability of small molecules.
Sharma, Ashok K.; Srivastava, Gopal N.; Roy, Ankita; Sharma, Vineet K.
2017-01-01
The experimental methods for the prediction of molecular toxicity are tedious and time-consuming tasks. Thus, the computational approaches could be used to develop alternative methods for toxicity prediction. We have developed a tool for the prediction of molecular toxicity along with the aqueous solubility and permeability of any molecule/metabolite. Using a comprehensive and curated set of toxin molecules as a training set, the different chemical and structural based features such as descriptors and fingerprints were exploited for feature selection, optimization and development of machine learning based classification and regression models. The compositional differences in the distribution of atoms were apparent between toxins and non-toxins, and hence, the molecular features were used for the classification and regression. On 10-fold cross-validation, the descriptor-based, fingerprint-based and hybrid-based classification models showed similar accuracy (93%) and Matthews's correlation coefficient (0.84). The performances of all the three models were comparable (Matthews's correlation coefficient = 0.84–0.87) on the blind dataset. In addition, the regression-based models using descriptors as input features were also compared and evaluated on the blind dataset. Random forest based regression model for the prediction of solubility performed better (R2 = 0.84) than the multi-linear regression (MLR) and partial least square regression (PLSR) models, whereas, the partial least squares based regression model for the prediction of permeability (caco-2) performed better (R2 = 0.68) in comparison to the random forest and MLR based regression models. The performance of final classification and regression models was evaluated using the two validation datasets including the known toxins and commonly used constituents of health products, which attests to its accuracy. The ToxiM web server would be a highly useful and reliable tool for the prediction of toxicity, solubility, and permeability of small molecules. PMID:29249969
To, Minh-Son; Prakash, Shivesh; Poonnoose, Santosh I; Bihari, Shailesh
2018-05-01
The study uses meta-regression analysis to quantify the dose-dependent effects of statin pharmacotherapy on vasospasm, delayed ischemic neurologic deficits (DIND), and mortality in aneurysmal subarachnoid hemorrhage. Prospective, retrospective observational studies, and randomized controlled trials (RCTs) were retrieved by a systematic database search. Summary estimates were expressed as absolute risk (AR) for a given statin dose or control (placebo). Meta-regression using inverse variance weighting and robust variance estimation was performed to assess the effect of statin dose on transformed AR in a random effects model. Dose-dependence of predicted AR with 95% confidence interval (CI) was recovered by using Miller's Freeman-Tukey inverse. The database search and study selection criteria yielded 18 studies (2594 patients) for analysis. These included 12 RCTs, 4 retrospective observational studies, and 2 prospective observational studies. Twelve studies investigated simvastatin, whereas the remaining studies investigated atorvastatin, pravastatin, or pitavastatin, with simvastatin-equivalent doses ranging from 20 to 80 mg. Meta-regression revealed dose-dependent reductions in Freeman-Tukey-transformed AR of vasospasm (slope coefficient -0.00404, 95% CI -0.00720 to -0.00087; P = 0.0321), DIND (slope coefficient -0.00316, 95% CI -0.00586 to -0.00047; P = 0.0392), and mortality (slope coefficient -0.00345, 95% CI -0.00623 to -0.00067; P = 0.0352). The present meta-regression provides weak evidence for dose-dependent reductions in vasospasm, DIND and mortality associated with acute statin use after aneurysmal subarachnoid hemorrhage. However, the analysis was limited by substantial heterogeneity among individual studies. Greater dosing strategies are a potential consideration for future RCTs. Copyright © 2018 Elsevier Inc. All rights reserved.
Comparing spatial regression to random forests for large ...
Environmental data may be “large” due to number of records, number of covariates, or both. Random forests has a reputation for good predictive performance when using many covariates, whereas spatial regression, when using reduced rank methods, has a reputation for good predictive performance when using many records. In this study, we compare these two techniques using a data set containing the macroinvertebrate multimetric index (MMI) at 1859 stream sites with over 200 landscape covariates. Our primary goal is predicting MMI at over 1.1 million perennial stream reaches across the USA. For spatial regression modeling, we develop two new methods to accommodate large data: (1) a procedure that estimates optimal Box-Cox transformations to linearize covariate relationships; and (2) a computationally efficient covariate selection routine that takes into account spatial autocorrelation. We show that our new methods lead to cross-validated performance similar to random forests, but that there is an advantage for spatial regression when quantifying the uncertainty of the predictions. Simulations are used to clarify advantages for each method. This research investigates different approaches for modeling and mapping national stream condition. We use MMI data from the EPA's National Rivers and Streams Assessment and predictors from StreamCat (Hill et al., 2015). Previous studies have focused on modeling the MMI condition classes (i.e., good, fair, and po
Modeling Verdict Outcomes Using Social Network Measures: The Watergate and Caviar Network Cases
2016-01-01
Modelling criminal trial verdict outcomes using social network measures is an emerging research area in quantitative criminology. Few studies have yet analyzed which of these measures are the most important for verdict modelling or which data classification techniques perform best for this application. To compare the performance of different techniques in classifying members of a criminal network, this article applies three different machine learning classifiers–Logistic Regression, Naïve Bayes and Random Forest–with a range of social network measures and the necessary databases to model the verdicts in two real–world cases: the U.S. Watergate Conspiracy of the 1970’s and the now–defunct Canada–based international drug trafficking ring known as the Caviar Network. In both cases it was found that the Random Forest classifier did better than either Logistic Regression or Naïve Bayes, and its superior performance was statistically significant. This being so, Random Forest was used not only for classification but also to assess the importance of the measures. For the Watergate case, the most important one proved to be betweenness centrality while for the Caviar Network, it was the effective size of the network. These results are significant because they show that an approach combining machine learning with social network analysis not only can generate accurate classification models but also helps quantify the importance social network variables in modelling verdict outcomes. We conclude our analysis with a discussion and some suggestions for future work in verdict modelling using social network measures. PMID:26824351
NASA Astrophysics Data System (ADS)
Elmore, K. L.
2016-12-01
The Metorological Phenomemna Identification NeartheGround (mPING) project is an example of a crowd-sourced, citizen science effort to gather data of sufficeint quality and quantity needed by new post processing methods that use machine learning. Transportation and infrastructure are particularly sensitive to precipitation type in winter weather. We extract attributes from operational numerical forecast models and use them in a random forest to generate forecast winter precipitation types. We find that random forests applied to forecast soundings are effective at generating skillful forecasts of surface ptype with consideralbly more skill than the current algorithms, especuially for ice pellets and freezing rain. We also find that three very different forecast models yuield similar overall results, showing that random forests are able to extract essentially equivalent information from different forecast models. We also show that the random forest for each model, and each profile type is unique to the particular forecast model and that the random forests developed using a particular model suffer significant degradation when given attributes derived from a different model. This implies that no single algorithm can perform well across all forecast models. Clearly, random forests extract information unavailable to "physically based" methods because the physical information in the models does not appear as we expect. One intersting result is that results from the classic "warm nose" sounding profile are, by far, the most sensitive to the particular forecast model, but this profile is also the one for which random forests are most skillful. Finally, a method for calibrarting probabilties for each different ptype using multinomial logistic regression is shown.
NASA Technical Reports Server (NTRS)
Campbell, J. W.
1973-01-01
A stochasitc model of the atmosphere between 30 and 90 km was developed for use in Monte Carlo space shuttle entry studies. The model is actually a family of models, one for each latitude-season category as defined in the 1966 U.S. Standard Atmosphere Supplements. Each latitude-season model generates a pseudo-random temperature profile whose mean is the appropriate temperature profile from the Standard Atmosphere Supplements. The standard deviation of temperature at each altitude for a given latitude-season model was estimated from sounding-rocket data. Departures from the mean temperature at each altitude were produced by assuming a linear regression of temperature on the solar heating rate of ozone. A profile of random ozone concentrations was first generated using an auxiliary stochastic ozone model, also developed as part of this study, and then solar heating rates were computed for the random ozone concentrations.
Correction of confounding bias in non-randomized studies by appropriate weighting.
Schmoor, Claudia; Gall, Christine; Stampf, Susanne; Graf, Erika
2011-03-01
In non-randomized studies, the assessment of a causal effect of treatment or exposure on outcome is hampered by possible confounding. Applying multiple regression models including the effects of treatment and covariates on outcome is the well-known classical approach to adjust for confounding. In recent years other approaches have been promoted. One of them is based on the propensity score and considers the effect of possible confounders on treatment as a relevant criterion for adjustment. Another proposal is based on using an instrumental variable. Here inference relies on a factor, the instrument, which affects treatment but is thought to be otherwise unrelated to outcome, so that it mimics randomization. Each of these approaches can basically be interpreted as a simple reweighting scheme, designed to address confounding. The procedures will be compared with respect to their fundamental properties, namely, which bias they aim to eliminate, which effect they aim to estimate, and which parameter is modelled. We will expand our overview of methods for analysis of non-randomized studies to methods for analysis of randomized controlled trials and show that analyses of both study types may target different effects and different parameters. The considerations will be illustrated using a breast cancer study with a so-called Comprehensive Cohort Study design, including a randomized controlled trial and a non-randomized study in the same patient population as sub-cohorts. This design offers ideal opportunities to discuss and illustrate the properties of the different approaches. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Black Clouds vs Random Variation in Hospital Admissions.
Ong, Luei Wern; Dawson, Jeffrey D; Ely, John W
2018-06-01
Physicians often accuse their peers of being "black clouds" if they repeatedly have more than the average number of hospital admissions while on call. Our purpose was to determine whether the black-cloud phenomenon is real or explainable by random variation. We analyzed hospital admissions to the University of Iowa family medicine service from July 1, 2010 to June 30, 2015. Analyses were stratified by peer group (eg, night shift attending physicians, day shift senior residents). We analyzed admission numbers to find evidence of black-cloud physicians (those with significantly more admissions than their peers) and white-cloud physicians (those with significantly fewer admissions). The statistical significance of whether there were actual differences across physicians was tested with mixed-effects negative binomial regression. The 5-year study included 96 physicians and 6,194 admissions. The number of daytime admissions ranged from 0 to 10 (mean 2.17, SD 1.63). Night admissions ranged from 0 to 11 (mean 1.23, SD 1.22). Admissions increased from 1,016 in the first year to 1,523 in the fifth year. We found 18 white-cloud and 16 black-cloud physicians in simple regression models that did not control for this upward trend. After including study year and other potential confounding variables in the regression models, there were no significant associations between physicians and admission numbers and therefore no true black or white clouds. In this study, apparent black-cloud and white-cloud physicians could be explained by random variation in hospital admissions. However, this randomness incorporated a wide range in workload among physicians, with potential impact on resident education at the low end and patient safety at the high end.
Dong, Chunjiao; Clarke, David B; Yan, Xuedong; Khattak, Asad; Huang, Baoshan
2014-09-01
Crash data are collected through police reports and integrated with road inventory data for further analysis. Integrated police reports and inventory data yield correlated multivariate data for roadway entities (e.g., segments or intersections). Analysis of such data reveals important relationships that can help focus on high-risk situations and coming up with safety countermeasures. To understand relationships between crash frequencies and associated variables, while taking full advantage of the available data, multivariate random-parameters models are appropriate since they can simultaneously consider the correlation among the specific crash types and account for unobserved heterogeneity. However, a key issue that arises with correlated multivariate data is the number of crash-free samples increases, as crash counts have many categories. In this paper, we describe a multivariate random-parameters zero-inflated negative binomial (MRZINB) regression model for jointly modeling crash counts. The full Bayesian method is employed to estimate the model parameters. Crash frequencies at urban signalized intersections in Tennessee are analyzed. The paper investigates the performance of MZINB and MRZINB regression models in establishing the relationship between crash frequencies, pavement conditions, traffic factors, and geometric design features of roadway intersections. Compared to the MZINB model, the MRZINB model identifies additional statistically significant factors and provides better goodness of fit in developing the relationships. The empirical results show that MRZINB model possesses most of the desirable statistical properties in terms of its ability to accommodate unobserved heterogeneity and excess zero counts in correlated data. Notably, in the random-parameters MZINB model, the estimated parameters vary significantly across intersections for different crash types. Copyright © 2014 Elsevier Ltd. All rights reserved.
Simultaneous confidence bands for Cox regression from semiparametric random censorship.
Mondal, Shoubhik; Subramanian, Sundarraman
2016-01-01
Cox regression is combined with semiparametric random censorship models to construct simultaneous confidence bands (SCBs) for subject-specific survival curves. Simulation results are presented to compare the performance of the proposed SCBs with the SCBs that are based only on standard Cox. The new SCBs provide correct empirical coverage and are more informative. The proposed SCBs are illustrated with two real examples. An extension to handle missing censoring indicators is also outlined.
Model selection with multiple regression on distance matrices leads to incorrect inferences.
Franckowiak, Ryan P; Panasci, Michael; Jarvis, Karl J; Acuña-Rodriguez, Ian S; Landguth, Erin L; Fortin, Marie-Josée; Wagner, Helene H
2017-01-01
In landscape genetics, model selection procedures based on Information Theoretic and Bayesian principles have been used with multiple regression on distance matrices (MRM) to test the relationship between multiple vectors of pairwise genetic, geographic, and environmental distance. Using Monte Carlo simulations, we examined the ability of model selection criteria based on Akaike's information criterion (AIC), its small-sample correction (AICc), and the Bayesian information criterion (BIC) to reliably rank candidate models when applied with MRM while varying the sample size. The results showed a serious problem: all three criteria exhibit a systematic bias toward selecting unnecessarily complex models containing spurious random variables and erroneously suggest a high level of support for the incorrectly ranked best model. These problems effectively increased with increasing sample size. The failure of AIC, AICc, and BIC was likely driven by the inflated sample size and different sum-of-squares partitioned by MRM, and the resulting effect on delta values. Based on these findings, we strongly discourage the continued application of AIC, AICc, and BIC for model selection with MRM.
Mainou, Maria; Madenidou, Anastasia-Vasiliki; Liakos, Aris; Paschos, Paschalis; Karagiannis, Thomas; Bekiari, Eleni; Vlachaki, Efthymia; Wang, Zhen; Murad, Mohammad Hassan; Kumar, Shaji; Tsapas, Apostolos
2017-06-01
We performed a systematic review and meta-regression analysis of randomized control trials to investigate the association between response to initial treatment and survival outcomes in patients with newly diagnosed multiple myeloma (MM). Response outcomes included complete response (CR) and the combined outcome of CR or very good partial response (VGPR), while survival outcomes were overall survival (OS) and progression-free survival (PFS). We used random-effect meta-regression models and conducted sensitivity analyses based on definition of CR and study quality. Seventy-two trials were included in the systematic review, 63 of which contributed data in meta-regression analyses. There was no association between OS and CR in patients without autologous stem cell transplant (ASCT) (regression coefficient: .02, 95% confidence interval [CI] -0.06, 0.10), in patients undergoing ASCT (-.11, 95% CI -0.44, 0.22) and in trials comparing ASCT with non-ASCT patients (.04, 95% CI -0.29, 0.38). Similarly, OS did not correlate with the combined metric of CR or VGPR, and no association was evident between response outcomes and PFS. Sensitivity analyses yielded similar results. This meta-regression analysis suggests that there is no association between conventional response outcomes and survival in patients with newly diagnosed MM. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
NASA Astrophysics Data System (ADS)
Koran, John J., Jr.; Koran, Mary Lou
In a study designed to explore the effects of teacher anxiety and modeling on acquisition of a science teaching skill and concomitant student performance, 69 preservice secondary teachers and 295 eighth grade students were randomly assigned to microteaching sessions. Prior to microteaching, teachers were given an anxiety test, then randomly assigned to one of three treatments; a transcript model, a protocol model, or a control condition. Subsequently both teacher and student performance was assessed using written and behavioral measures. Analysis of variance indicated that subjects in the two modeling treatments significantly exceeded performance of control group subjects on all measures of the dependent variable, with the protocol model being generally superior to the transcript model. The differential effects of the modeling treatments were further reflected in student performance. Regression analysis of aptitude-treatment interactions indicated that teacher anxiety scores interacted significantly with instructional treatments, with high anxiety teachers performing best in the protocol modeling treatment. Again, this interaction was reflected in student performance, where students taught by highly anxious teachers performed significantly better when their teachers had received the protocol model. These results were discussed in terms of teacher concerns and a memory model of the effects of anxiety on performance.
Dai, James Y.; Chan, Kwun Chuen Gary; Hsu, Li
2014-01-01
Instrumental variable regression is one way to overcome unmeasured confounding and estimate causal effect in observational studies. Built on structural mean models, there has been considerale work recently developed for consistent estimation of causal relative risk and causal odds ratio. Such models can sometimes suffer from identification issues for weak instruments. This hampered the applicability of Mendelian randomization analysis in genetic epidemiology. When there are multiple genetic variants available as instrumental variables, and causal effect is defined in a generalized linear model in the presence of unmeasured confounders, we propose to test concordance between instrumental variable effects on the intermediate exposure and instrumental variable effects on the disease outcome, as a means to test the causal effect. We show that a class of generalized least squares estimators provide valid and consistent tests of causality. For causal effect of a continuous exposure on a dichotomous outcome in logistic models, the proposed estimators are shown to be asymptotically conservative. When the disease outcome is rare, such estimators are consistent due to the log-linear approximation of the logistic function. Optimality of such estimators relative to the well-known two-stage least squares estimator and the double-logistic structural mean model is further discussed. PMID:24863158
Guo, Yanyong; Li, Zhibin; Wu, Yao; Xu, Chengcheng
2018-06-01
Bicyclists running the red light at crossing facilities increase the potential of colliding with motor vehicles. Exploring the contributing factors could improve the prediction of running red-light probability and develop countermeasures to reduce such behaviors. However, individuals could have unobserved heterogeneities in running a red light, which make the accurate prediction more challenging. Traditional models assume that factor parameters are fixed and cannot capture the varying impacts on red-light running behaviors. In this study, we employed the full Bayesian random parameters logistic regression approach to account for the unobserved heterogeneous effects. Two types of crossing facilities were considered which were the signalized intersection crosswalks and the road segment crosswalks. Electric and conventional bikes were distinguished in the modeling. Data were collected from 16 crosswalks in urban area of Nanjing, China. Factors such as individual characteristics, road geometric design, environmental features, and traffic variables were examined. Model comparison indicates that the full Bayesian random parameters logistic regression approach is statistically superior to the standard logistic regression model. More red-light runners are predicted at signalized intersection crosswalks than at road segment crosswalks. Factors affecting red-light running behaviors are gender, age, bike type, road width, presence of raised median, separation width, signal type, green ratio, bike and vehicle volume, and average vehicle speed. Factors associated with the unobserved heterogeneity are gender, bike type, signal type, separation width, and bike volume. Copyright © 2018 Elsevier Ltd. All rights reserved.
Chander, Geetanjali; Hutton, Heidi E.; Lau, Bryan; Xu, Xiaoqiang; McCaul, Mary E.
2015-01-01
Objective Hazardous alcohol use by HIV-infected women is associated with poor HIV outcomes and HIV transmission risk behaviors. We examined the effectiveness of brief alcohol intervention (BI) among hazardous drinking women receiving care in an urban, HIV clinic. Methods Women were randomized to a 2-session BI or usual care. Outcomes assessed at baseline, 3, 6 and 12 months included 90-day frequency of any alcohol use and heavy/binge drinking (≥4 drinks per occasion), and average drinks per drinking episode. Secondary outcomes included HIV medication and appointment adherence, HIV1-RNA suppression, and days of unprotected vaginal sex. We examined intervention effectiveness using generalized mixed effect models and quantile regression. Results Of 148 eligible women, 74 were randomized to each arm. In mixed effects models, 90-day drinking frequency decreased among intervention group compared to control, with women in the intervention condition less likely to have a drinking day (OR: 0.42 (95% CI: 0.23–0.75). Heavy/binge drinking days and drinks per drinking day did not differ significantly between groups. Quantile regression demonstrated a decrease in drinking frequency in the middle to upper ranges of the distribution of drinking days and heavy/binge drinking days that differed significantly between intervention and control conditions. At follow-up, the intervention group had significantly fewer episodes of unprotected vaginal sex. No intervention effects were observed for other outcomes. Conclusions Brief alcohol intervention reduces frequency of alcohol use and unprotected vaginal sex among HIV-infected women. More intensive services may be needed to lower drinks per drinking day and enhance care for more severely affected drinkers. PMID:25967270
Wolf, Alexander; Leucht, Stefan; Pajonk, Frank-Gerald
2017-04-01
Behavioural and psychological symptoms in dementia (BPSD) are common and often treated with antipsychotics, which are known to have small efficacy and to cause many side effects. One potential side effect might be cognitive decline. We searched MEDLINE, Scopus, CENTRAL and www.ClincalStudyResult.org for randomized, double-blind, placebo-controlled trials using antipsychotics for treating BPSD and evaluated cognitive functioning. The studies identified were summarized in a meta-analysis with the standardized mean difference (SMD, Hedges's g) as the effect size. Meta-regression was additionally performed to identify associated factors. Ten studies provided data on the course of cognitive functioning. The random effects model of the pooled analysis showed a not significant effect (SMD = -0.065, 95 % CI -0.186 to 0.057, I 2 = 41 %). Meta-regression revealed a significant correlation between cognitive impairment and treatment duration (R 2 = 0.78, p < 0.02) as well as baseline MMSE (R 2 = 0.92, p < 0.005). These correlations depend on only two out of ten studies and should interpret cautiously.
Auras, Silke; Ostermann, Thomas; de Cruppé, Werner; Bitzer, Eva-Maria; Diel, Franziska; Geraedts, Max
2016-12-01
The study aimed to illustrate the effect of the patients' sex, age, self-rated health and medical practice specialization on patient satisfaction. Secondary analysis of patient survey data using multilevel analysis (generalized linear mixed model, medical practice as random effect) using a sequential modelling strategy. We examined the effects of the patients' sex, age, self-rated health and medical practice specialization on four patient satisfaction dimensions: medical practice organization, information, interaction, professional competence. The study was performed in 92 German medical practices providing ambulatory care in general medicine, internal medicine or gynaecology. In total, 9888 adult patients participated in a patient survey using the validated 'questionnaire on satisfaction with ambulatory care-quality from the patient perspective [ZAP]'. We calculated four models for each satisfaction dimension, revealing regression coefficients with 95% confidence intervals (CIs) for all independent variables, and using Wald Chi-Square statistic for each modelling step (model validity) and LR-Tests to compare the models of each step with the previous model. The patients' sex and age had a weak effect (maximum regression coefficient 1.09, CI 0.39; 1.80), and the patients' self-rated health had the strongest positive effect (maximum regression coefficient 7.66, CI 6.69; 8.63) on satisfaction ratings. The effect of medical practice specialization was heterogeneous. All factors studied, specifically the patients' self-rated health, affected patient satisfaction. Adjustment should always be considered because it improves the comparability of patient satisfaction in medical practices with atypically varying patient populations and increases the acceptance of comparisons. © The Author 2016. Published by Oxford University Press in association with the International Society for Quality in Health Care. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
Modeling spatial effects of PM{sub 2.5} on term low birth weight in Los Angeles County
DOE Office of Scientific and Technical Information (OSTI.GOV)
Coker, Eric, E-mail: cokerer@onid.orst.edu; Ghosh, Jokay; Jerrett, Michael
Air pollution epidemiological studies suggest that elevated exposure to fine particulate matter (PM{sub 2.5}) is associated with higher prevalence of term low birth weight (TLBW). Previous studies have generally assumed the exposure–response of PM{sub 2.5} on TLBW to be the same throughout a large geographical area. Health effects related to PM{sub 2.5} exposures, however, may not be uniformly distributed spatially, creating a need for studies that explicitly investigate the spatial distribution of the exposure–response relationship between individual-level exposure to PM{sub 2.5} and TLBW. Here, we examine the overall and spatially varying exposure–response relationship between PM{sub 2.5} and TLBW throughout urbanmore » Los Angeles (LA) County, California. We estimated PM{sub 2.5} from a combination of land use regression (LUR), aerosol optical depth from remote sensing, and atmospheric modeling techniques. Exposures were assigned to LA County individual pregnancies identified from electronic birth certificates between the years 1995-2006 (N=1,359,284) provided by the California Department of Public Health. We used a single pollutant multivariate logistic regression model, with multilevel spatially structured and unstructured random effects set in a Bayesian framework to estimate global and spatially varying pollutant effects on TLBW at the census tract level. Overall, increased PM{sub 2.5} level was associated with higher prevalence of TLBW county-wide. The spatial random effects model, however, demonstrated that the exposure–response for PM{sub 2.5} and TLBW was not uniform across urban LA County. Rather, the magnitude and certainty of the exposure–response estimates for PM{sub 2.5} on log odds of TLBW were greatest in the urban core of Central and Southern LA County census tracts. These results suggest that the effects may be spatially patterned, and that simply estimating global pollutant effects obscures disparities suggested by spatial patterns of effects. Studies that incorporate spatial multilevel modeling with random coefficients allow us to identify areas where air pollutant effects on adverse birth outcomes may be most severe and policies to further reduce air pollution might be most effective. - Highlights: • We model the spatial dependency of PM{sub 2.5} effects on term low birth weight (TLBW). • PM{sub 2.5} effects on TLBW are shown to vary spatially across urban LA County. • Modeling spatial dependency of PM{sub 2.5} health effects may identify effect 'hotspots'. • Birth outcomes studies should consider the spatial dependency of PM{sub 2.5} effects.« less
Testing the Hypothesis of a Homoscedastic Error Term in Simple, Nonparametric Regression
ERIC Educational Resources Information Center
Wilcox, Rand R.
2006-01-01
Consider the nonparametric regression model Y = m(X)+ [tau](X)[epsilon], where X and [epsilon] are independent random variables, [epsilon] has a median of zero and variance [sigma][squared], [tau] is some unknown function used to model heteroscedasticity, and m(X) is an unknown function reflecting some conditional measure of location associated…
Prague, Mélanie; Commenges, Daniel; Guedj, Jérémie; Drylewicz, Julia; Thiébaut, Rodolphe
2013-08-01
Models based on ordinary differential equations (ODE) are widespread tools for describing dynamical systems. In biomedical sciences, data from each subject can be sparse making difficult to precisely estimate individual parameters by standard non-linear regression but information can often be gained from between-subjects variability. This makes natural the use of mixed-effects models to estimate population parameters. Although the maximum likelihood approach is a valuable option, identifiability issues favour Bayesian approaches which can incorporate prior knowledge in a flexible way. However, the combination of difficulties coming from the ODE system and from the presence of random effects raises a major numerical challenge. Computations can be simplified by making a normal approximation of the posterior to find the maximum of the posterior distribution (MAP). Here we present the NIMROD program (normal approximation inference in models with random effects based on ordinary differential equations) devoted to the MAP estimation in ODE models. We describe the specific implemented features such as convergence criteria and an approximation of the leave-one-out cross-validation to assess the model quality of fit. In pharmacokinetics models, first, we evaluate the properties of this algorithm and compare it with FOCE and MCMC algorithms in simulations. Then, we illustrate NIMROD use on Amprenavir pharmacokinetics data from the PUZZLE clinical trial in HIV infected patients. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Martínez, Carlos Alberto; Khare, Kshitij; Banerjee, Arunava; Elzo, Mauricio A
2017-03-21
It is important to consider heterogeneity of marker effects and allelic frequencies in across population genome-wide prediction studies. Moreover, all regression models used in genome-wide prediction overlook randomness of genotypes. In this study, a family of hierarchical Bayesian models to perform across population genome-wide prediction modeling genotypes as random variables and allowing population-specific effects for each marker was developed. Models shared a common structure and differed in the priors used and the assumption about residual variances (homogeneous or heterogeneous). Randomness of genotypes was accounted for by deriving the joint probability mass function of marker genotypes conditional on allelic frequencies and pedigree information. As a consequence, these models incorporated kinship and genotypic information that not only permitted to account for heterogeneity of allelic frequencies, but also to include individuals with missing genotypes at some or all loci without the need for previous imputation. This was possible because the non-observed fraction of the design matrix was treated as an unknown model parameter. For each model, a simpler version ignoring population structure, but still accounting for randomness of genotypes was proposed. Implementation of these models and computation of some criteria for model comparison were illustrated using two simulated datasets. Theoretical and computational issues along with possible applications, extensions and refinements were discussed. Some features of the models developed in this study make them promising for genome-wide prediction, the use of information contained in the probability distribution of genotypes is perhaps the most appealing. Further studies to assess the performance of the models proposed here and also to compare them with conventional models used in genome-wide prediction are needed. Copyright © 2017 Elsevier Ltd. All rights reserved.
Lu, Liming; Shi, Leiyu; Zeng, Jingchun; Wen, Zehuai
2017-01-01
Background Previous meta-analyses on the relationship between aspirin use and breast cancer risk have drawn inconsistent results. In addition, the threshold effect of different doses, frequencies and durations of aspirin use in preventing breast cancer have yet to be established. Results The search yielded 13 prospective cohort studies (N=857,831 participants) that reported an average of 7.6 cases/1,000 person-years of breast cancer during a follow-up period of from 4.4 to 14 years. With a random effects model, a borderline significant inverse association was observed between overall aspirin use and breast cancer risk, with a summarized RR = 0.94 (P = 0.051, 95% CI 0.87-1.01). The linear regression model was a better fit for the dose-response relationship, which displayed a potential relationship between the frequency of aspirin use and breast cancer risk (RR = 0.97, 0.95 and 0.90 for 5, 10 and 20 times/week aspirin use, respectively). It was also a better fit for the duration of aspirin use and breast cancer risk (RR = 0.86, 0.73 and 0.54 for 5, 10 and 20 years of aspirin use). Methods We searched MEDLINE, EMBASE and CENTRAL databases through early October 2016 for relevant prospective cohort studies of aspirin use and breast cancer risk. Meta-analysis of relative risks (RR) estimates associated with aspirin intake were presented by fixed or random effects models. The dose-response meta-analysis was performed by linear trend regression and restricted cubic spline regression. Conclusion Our study confirmed a dose-response relationship between aspirin use and breast cancer risk. For clinical prevention, long term (>5 years) consistent use (2-7 times/week) of aspirin appears to be more effective in achieving a protective effect against breast cancer. PMID:28418881
Lu, Liming; Shi, Leiyu; Zeng, Jingchun; Wen, Zehuai
2017-06-20
Previous meta-analyses on the relationship between aspirin use and breast cancer risk have drawn inconsistent results. In addition, the threshold effect of different doses, frequencies and durations of aspirin use in preventing breast cancer have yet to be established. The search yielded 13 prospective cohort studies (N=857,831 participants) that reported an average of 7.6 cases/1,000 person-years of breast cancer during a follow-up period of from 4.4 to 14 years. With a random effects model, a borderline significant inverse association was observed between overall aspirin use and breast cancer risk, with a summarized RR = 0.94 (P = 0.051, 95% CI 0.87-1.01). The linear regression model was a better fit for the dose-response relationship, which displayed a potential relationship between the frequency of aspirin use and breast cancer risk (RR = 0.97, 0.95 and 0.90 for 5, 10 and 20 times/week aspirin use, respectively). It was also a better fit for the duration of aspirin use and breast cancer risk (RR = 0.86, 0.73 and 0.54 for 5, 10 and 20 years of aspirin use). We searched MEDLINE, EMBASE and CENTRAL databases through early October 2016 for relevant prospective cohort studies of aspirin use and breast cancer risk. Meta-analysis of relative risks (RR) estimates associated with aspirin intake were presented by fixed or random effects models. The dose-response meta-analysis was performed by linear trend regression and restricted cubic spline regression. Our study confirmed a dose-response relationship between aspirin use and breast cancer risk. For clinical prevention, long term (>5 years) consistent use (2-7 times/week) of aspirin appears to be more effective in achieving a protective effect against breast cancer.
Genetic background in partitioning of metabolizable energy efficiency in dairy cows.
Mehtiö, T; Negussie, E; Mäntysaari, P; Mäntysaari, E A; Lidauer, M H
2018-05-01
The main objective of this study was to assess the genetic differences in metabolizable energy efficiency and efficiency in partitioning metabolizable energy in different pathways: maintenance, milk production, and growth in primiparous dairy cows. Repeatability models for residual energy intake (REI) and metabolizable energy intake (MEI) were compared and the genetic and permanent environmental variations in MEI were partitioned into its energy sinks using random regression models. We proposed 2 new feed efficiency traits: metabolizable energy efficiency (MEE), which is formed by modeling MEI fitting regressions on energy sinks [metabolic body weight (BW 0.75 ), energy-corrected milk, body weight gain, and body weight loss] directly; and partial MEE (pMEE), where the model for MEE is extended with regressions on energy sinks nested within additive genetic and permanent environmental effects. The data used were collected from Luke's experimental farms Rehtijärvi and Minkiö between 1998 and 2014. There were altogether 12,350 weekly MEI records on 495 primiparous Nordic Red dairy cows from wk 2 to 40 of lactation. Heritability estimates for REI and MEE were moderate, 0.33 and 0.26, respectively. The estimate of the residual variance was smaller for MEE than for REI, indicating that analyzing weekly MEI observations simultaneously with energy sinks is preferable. Model validation based on Akaike's information criterion showed that pMEE models fitted the data even better and also resulted in smaller residual variance estimates. However, models that included random regression on BW 0.75 converged slowly. The resulting genetic standard deviation estimate from the pMEE coefficient for milk production was 0.75 MJ of MEI/kg of energy-corrected milk. The derived partial heritabilities for energy efficiency in maintenance, milk production, and growth were 0.02, 0.06, and 0.04, respectively, indicating that some genetic variation may exist in the efficiency of using metabolizable energy for different pathways in dairy cows. Copyright © 2018 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Stylianou, Neophytos; Akbarov, Artur; Kontopantelis, Evangelos; Buchan, Iain; Dunn, Ken W
2015-08-01
Predicting mortality from burn injury has traditionally employed logistic regression models. Alternative machine learning methods have been introduced in some areas of clinical prediction as the necessary software and computational facilities have become accessible. Here we compare logistic regression and machine learning predictions of mortality from burn. An established logistic mortality model was compared to machine learning methods (artificial neural network, support vector machine, random forests and naïve Bayes) using a population-based (England & Wales) case-cohort registry. Predictive evaluation used: area under the receiver operating characteristic curve; sensitivity; specificity; positive predictive value and Youden's index. All methods had comparable discriminatory abilities, similar sensitivities, specificities and positive predictive values. Although some machine learning methods performed marginally better than logistic regression the differences were seldom statistically significant and clinically insubstantial. Random forests were marginally better for high positive predictive value and reasonable sensitivity. Neural networks yielded slightly better prediction overall. Logistic regression gives an optimal mix of performance and interpretability. The established logistic regression model of burn mortality performs well against more complex alternatives. Clinical prediction with a small set of strong, stable, independent predictors is unlikely to gain much from machine learning outside specialist research contexts. Copyright © 2015 Elsevier Ltd and ISBI. All rights reserved.
Assessing mediation using marginal structural models in the presence of confounding and moderation
Coffman, Donna L.; Zhong, Wei
2012-01-01
This paper presents marginal structural models (MSMs) with inverse propensity weighting (IPW) for assessing mediation. Generally, individuals are not randomly assigned to levels of the mediator. Therefore, confounders of the mediator and outcome may exist that limit causal inferences, a goal of mediation analysis. Either regression adjustment or IPW can be used to take confounding into account, but IPW has several advantages. Regression adjustment of even one confounder of the mediator and outcome that has been influenced by treatment results in biased estimates of the direct effect (i.e., the effect of treatment on the outcome that does not go through the mediator). One advantage of IPW is that it can properly adjust for this type of confounding, assuming there are no unmeasured confounders. Further, we illustrate that IPW estimation provides unbiased estimates of all effects when there is a baseline moderator variable that interacts with the treatment, when there is a baseline moderator variable that interacts with the mediator, and when the treatment interacts with the mediator. IPW estimation also provides unbiased estimates of all effects in the presence of non-randomized treatments. In addition, for testing mediation we propose a test of the null hypothesis of no mediation. Finally, we illustrate this approach with an empirical data set in which the mediator is continuous, as is often the case in psychological research. PMID:22905648
Karabatsos, George
2017-02-01
Most of applied statistics involves regression analysis of data. In practice, it is important to specify a regression model that has minimal assumptions which are not violated by data, to ensure that statistical inferences from the model are informative and not misleading. This paper presents a stand-alone and menu-driven software package, Bayesian Regression: Nonparametric and Parametric Models, constructed from MATLAB Compiler. Currently, this package gives the user a choice from 83 Bayesian models for data analysis. They include 47 Bayesian nonparametric (BNP) infinite-mixture regression models; 5 BNP infinite-mixture models for density estimation; and 31 normal random effects models (HLMs), including normal linear models. Each of the 78 regression models handles either a continuous, binary, or ordinal dependent variable, and can handle multi-level (grouped) data. All 83 Bayesian models can handle the analysis of weighted observations (e.g., for meta-analysis), and the analysis of left-censored, right-censored, and/or interval-censored data. Each BNP infinite-mixture model has a mixture distribution assigned one of various BNP prior distributions, including priors defined by either the Dirichlet process, Pitman-Yor process (including the normalized stable process), beta (two-parameter) process, normalized inverse-Gaussian process, geometric weights prior, dependent Dirichlet process, or the dependent infinite-probits prior. The software user can mouse-click to select a Bayesian model and perform data analysis via Markov chain Monte Carlo (MCMC) sampling. After the sampling completes, the software automatically opens text output that reports MCMC-based estimates of the model's posterior distribution and model predictive fit to the data. Additional text and/or graphical output can be generated by mouse-clicking other menu options. This includes output of MCMC convergence analyses, and estimates of the model's posterior predictive distribution, for selected functionals and values of covariates. The software is illustrated through the BNP regression analysis of real data.
2011-01-01
Background Previous research has documented heterogeneity in the effects of maternal education on adverse birth outcomes by nativity and Hispanic subgroup in the United States. In this article, we considered the risk of preterm birth (PTB) using 9 years of vital statistics birth data from New York City. We employed finer categorizations of exposure than used previously and estimated the risk dose-response across the range of education by nativity and ethnicity. Methods Using Bayesian random effects logistic regression models with restricted quadratic spline terms for years of completed maternal education, we calculated and plotted the estimated posterior probabilities of PTB (gestational age < 37 weeks) for each year of education by ethnic and nativity subgroups adjusted for only maternal age, as well as with more extensive covariate adjustments. We then estimated the posterior risk difference between native and foreign born mothers by ethnicity over the continuous range of education exposures. Results The risk of PTB varied substantially by education, nativity and ethnicity. Native born groups showed higher absolute risk of PTB and declining risk associated with higher levels of education beyond about 10 years, as did foreign-born Puerto Ricans. For most other foreign born groups, however, risk of PTB was flatter across the education range. For Mexicans, Central Americans, Dominicans, South Americans and "Others", the protective effect of foreign birth diminished progressively across the educational range. Only for Puerto Ricans was there no nativity advantage for the foreign born, although small numbers of foreign born Cubans limited precision of estimates for that group. Conclusions Using flexible Bayesian regression models with random effects allowed us to estimate absolute risks without strong modeling assumptions. Risk comparisons for any sub-groups at any exposure level were simple to calculate. Shrinkage of posterior estimates through the use of random effects allowed for finer categorization of exposures without restricting joint effects to follow a fixed parametric scale. Although foreign born Hispanic women with the least education appeared to generally have low risk, this seems likely to be a marker for unmeasured environmental and behavioral factors, rather than a causally protective effect of low education itself. PMID:21504612
Kaufman, Jay S; MacLehose, Richard F; Torrone, Elizabeth A; Savitz, David A
2011-04-19
Previous research has documented heterogeneity in the effects of maternal education on adverse birth outcomes by nativity and Hispanic subgroup in the United States. In this article, we considered the risk of preterm birth (PTB) using 9 years of vital statistics birth data from New York City. We employed finer categorizations of exposure than used previously and estimated the risk dose-response across the range of education by nativity and ethnicity. Using Bayesian random effects logistic regression models with restricted quadratic spline terms for years of completed maternal education, we calculated and plotted the estimated posterior probabilities of PTB (gestational age < 37 weeks) for each year of education by ethnic and nativity subgroups adjusted for only maternal age, as well as with more extensive covariate adjustments. We then estimated the posterior risk difference between native and foreign born mothers by ethnicity over the continuous range of education exposures. The risk of PTB varied substantially by education, nativity and ethnicity. Native born groups showed higher absolute risk of PTB and declining risk associated with higher levels of education beyond about 10 years, as did foreign-born Puerto Ricans. For most other foreign born groups, however, risk of PTB was flatter across the education range. For Mexicans, Central Americans, Dominicans, South Americans and "Others", the protective effect of foreign birth diminished progressively across the educational range. Only for Puerto Ricans was there no nativity advantage for the foreign born, although small numbers of foreign born Cubans limited precision of estimates for that group. Using flexible Bayesian regression models with random effects allowed us to estimate absolute risks without strong modeling assumptions. Risk comparisons for any sub-groups at any exposure level were simple to calculate. Shrinkage of posterior estimates through the use of random effects allowed for finer categorization of exposures without restricting joint effects to follow a fixed parametric scale. Although foreign born Hispanic women with the least education appeared to generally have low risk, this seems likely to be a marker for unmeasured environmental and behavioral factors, rather than a causally protective effect of low education itself.
ERIC Educational Resources Information Center
Shear, Benjamin R.; Zumbo, Bruno D.
2013-01-01
Type I error rates in multiple regression, and hence the chance for false positive research findings, can be drastically inflated when multiple regression models are used to analyze data that contain random measurement error. This article shows the potential for inflated Type I error rates in commonly encountered scenarios and provides new…
Marginalized zero-altered models for longitudinal count data.
Tabb, Loni Philip; Tchetgen, Eric J Tchetgen; Wellenius, Greg A; Coull, Brent A
2016-10-01
Count data often exhibit more zeros than predicted by common count distributions like the Poisson or negative binomial. In recent years, there has been considerable interest in methods for analyzing zero-inflated count data in longitudinal or other correlated data settings. A common approach has been to extend zero-inflated Poisson models to include random effects that account for correlation among observations. However, these models have been shown to have a few drawbacks, including interpretability of regression coefficients and numerical instability of fitting algorithms even when the data arise from the assumed model. To address these issues, we propose a model that parameterizes the marginal associations between the count outcome and the covariates as easily interpretable log relative rates, while including random effects to account for correlation among observations. One of the main advantages of this marginal model is that it allows a basis upon which we can directly compare the performance of standard methods that ignore zero inflation with that of a method that explicitly takes zero inflation into account. We present simulations of these various model formulations in terms of bias and variance estimation. Finally, we apply the proposed approach to analyze toxicological data of the effect of emissions on cardiac arrhythmias.
Marginalized zero-altered models for longitudinal count data
Tabb, Loni Philip; Tchetgen, Eric J. Tchetgen; Wellenius, Greg A.; Coull, Brent A.
2015-01-01
Count data often exhibit more zeros than predicted by common count distributions like the Poisson or negative binomial. In recent years, there has been considerable interest in methods for analyzing zero-inflated count data in longitudinal or other correlated data settings. A common approach has been to extend zero-inflated Poisson models to include random effects that account for correlation among observations. However, these models have been shown to have a few drawbacks, including interpretability of regression coefficients and numerical instability of fitting algorithms even when the data arise from the assumed model. To address these issues, we propose a model that parameterizes the marginal associations between the count outcome and the covariates as easily interpretable log relative rates, while including random effects to account for correlation among observations. One of the main advantages of this marginal model is that it allows a basis upon which we can directly compare the performance of standard methods that ignore zero inflation with that of a method that explicitly takes zero inflation into account. We present simulations of these various model formulations in terms of bias and variance estimation. Finally, we apply the proposed approach to analyze toxicological data of the effect of emissions on cardiac arrhythmias. PMID:27867423
NASA Astrophysics Data System (ADS)
Trigila, Alessandro; Iadanza, Carla; Esposito, Carlo; Scarascia-Mugnozza, Gabriele
2015-11-01
The aim of this work is to define reliable susceptibility models for shallow landslides using Logistic Regression and Random Forests multivariate statistical techniques. The study area, located in North-East Sicily, was hit on October 1st 2009 by a severe rainstorm (225 mm of cumulative rainfall in 7 h) which caused flash floods and more than 1000 landslides. Several small villages, such as Giampilieri, were hit with 31 fatalities, 6 missing persons and damage to buildings and transportation infrastructures. Landslides, mainly types such as earth and debris translational slides evolving into debris flows, were triggered on steep slopes and involved colluvium and regolith materials which cover the underlying metamorphic bedrock. The work has been carried out with the following steps: i) realization of a detailed event landslide inventory map through field surveys coupled with observation of high resolution aerial colour orthophoto; ii) identification of landslide source areas; iii) data preparation of landslide controlling factors and descriptive statistics based on a bivariate method (Frequency Ratio) to get an initial overview on existing relationships between causative factors and shallow landslide source areas; iv) choice of criteria for the selection and sizing of the mapping unit; v) implementation of 5 multivariate statistical susceptibility models based on Logistic Regression and Random Forests techniques and focused on landslide source areas; vi) evaluation of the influence of sample size and type of sampling on results and performance of the models; vii) evaluation of the predictive capabilities of the models using ROC curve, AUC and contingency tables; viii) comparison of model results and obtained susceptibility maps; and ix) analysis of temporal variation of landslide susceptibility related to input parameter changes. Models based on Logistic Regression and Random Forests have demonstrated excellent predictive capabilities. Land use and wildfire variables were found to have a strong control on the occurrence of very rapid shallow landslides.
ERIC Educational Resources Information Center
Moss, Brian G.; Yeaton, William H.; Lloyd, Jane E.
2014-01-01
Using a novel design approach, a randomized experiment (RE) was embedded within a regression discontinuity (RD) design (R-RE-D) to evaluate the impact of developmental mathematics at a large midwestern college ("n" = 2,122). Within a region of uncertainty near the cut-score, estimates of benefit from a prospective RE were closely…
Hyper-Spectral Image Analysis With Partially Latent Regression and Spatial Markov Dependencies
NASA Astrophysics Data System (ADS)
Deleforge, Antoine; Forbes, Florence; Ba, Sileye; Horaud, Radu
2015-09-01
Hyper-spectral data can be analyzed to recover physical properties at large planetary scales. This involves resolving inverse problems which can be addressed within machine learning, with the advantage that, once a relationship between physical parameters and spectra has been established in a data-driven fashion, the learned relationship can be used to estimate physical parameters for new hyper-spectral observations. Within this framework, we propose a spatially-constrained and partially-latent regression method which maps high-dimensional inputs (hyper-spectral images) onto low-dimensional responses (physical parameters such as the local chemical composition of the soil). The proposed regression model comprises two key features. Firstly, it combines a Gaussian mixture of locally-linear mappings (GLLiM) with a partially-latent response model. While the former makes high-dimensional regression tractable, the latter enables to deal with physical parameters that cannot be observed or, more generally, with data contaminated by experimental artifacts that cannot be explained with noise models. Secondly, spatial constraints are introduced in the model through a Markov random field (MRF) prior which provides a spatial structure to the Gaussian-mixture hidden variables. Experiments conducted on a database composed of remotely sensed observations collected from the Mars planet by the Mars Express orbiter demonstrate the effectiveness of the proposed model.
Ready-to-use supplementary food increases fat mass and BMI in Haitian school-aged children.
Iannotti, Lora L; Henretty, Nicole M; Delnatus, Jacques Raymond; Previl, Windy; Stehl, Tom; Vorkoper, Susan; Bodden, Jaime; Maust, Amanda; Smidt, Rachel; Nash, Marilyn L; Tamimie, Courtney A; Owen, Bridget C; Wolff, Patricia B
2015-04-01
In Haiti and other countries, large-scale investments in school feeding programs have been made with marginal evidence of nutrition outcomes. We aimed to examine the effectiveness of a fortified ready-to-use supplementary food (RUSF), Mamba, on reduced anemia and improved body composition in school-aged children compared to an unfortified cereal bar, Tablet Yo, and control groups. A cluster, randomized trial with children ages 3-13 y (n = 1167) was conducted in the north of Haiti. Six schools were matched and randomized to the control group, Tablet Yo group (42 g, 165 kcal), or Mamba group (50 g, 260 kcal, and >75% of the RDA for critical micronutrients). Children in the supplementation groups received the snack daily for 100 d, and all were followed longitudinally for hemoglobin concentrations, anthropometry, and bioelectrical impedance measures: baseline (December 2012), midline (March 2013), and endline (June 2013). Parent surveys were conducted at baseline and endline to examine secondary outcomes of morbidities and dietary intakes. Longitudinal regression modeling using generalized least squares and logit with random effects tested the main effects. At baseline,14.0% of children were stunted, 14.5% underweight, 9.1% thin, and 73% anemic. Fat mass percentage (mean ± SD) was 8.1% ± 4.3% for boys and 12.5% ± 4.4% for girls. In longitudinal modeling, Mamba supplementation increased body mass index z score (regression coefficient ± SEE) 0.25 ± 0.06, fat mass 0.45 ± 0.14 kg, and percentage fat mass 1.28% ± 0.27% compared with control at each time point (P < 0.001). Among boys, Mamba increased fat mass (regression coefficient ± SEE) 0.73 ± 0.19 kg and fat-free mass 0.62 ± 0.34 kg compared with control (P < 0.001). Mamba reduced the odds of developing anemia by 28% compared to control (adjusted OR: 0.72; 95% CI: 0.57, 0.91; P < 0.001). No treatment effect was found for hemoglobin concentration. To our knowledge, this is the first study to give evidence of body composition effects from an RUSF in school-aged children. © 2015 American Society for Nutrition.
Bayesian Regression with Network Prior: Optimal Bayesian Filtering Perspective
Qian, Xiaoning; Dougherty, Edward R.
2017-01-01
The recently introduced intrinsically Bayesian robust filter (IBRF) provides fully optimal filtering relative to a prior distribution over an uncertainty class ofjoint random process models, whereas formerly the theory was limited to model-constrained Bayesian robust filters, for which optimization was limited to the filters that are optimal for models in the uncertainty class. This paper extends the IBRF theory to the situation where there are both a prior on the uncertainty class and sample data. The result is optimal Bayesian filtering (OBF), where optimality is relative to the posterior distribution derived from the prior and the data. The IBRF theories for effective characteristics and canonical expansions extend to the OBF setting. A salient focus of the present work is to demonstrate the advantages of Bayesian regression within the OBF setting over the classical Bayesian approach in the context otlinear Gaussian models. PMID:28824268
Using Cluster Bootstrapping to Analyze Nested Data With a Few Clusters.
Huang, Francis L
2018-04-01
Cluster randomized trials involving participants nested within intact treatment and control groups are commonly performed in various educational, psychological, and biomedical studies. However, recruiting and retaining intact groups present various practical, financial, and logistical challenges to evaluators and often, cluster randomized trials are performed with a low number of clusters (~20 groups). Although multilevel models are often used to analyze nested data, researchers may be concerned of potentially biased results due to having only a few groups under study. Cluster bootstrapping has been suggested as an alternative procedure when analyzing clustered data though it has seen very little use in educational and psychological studies. Using a Monte Carlo simulation that varied the number of clusters, average cluster size, and intraclass correlations, we compared standard errors using cluster bootstrapping with those derived using ordinary least squares regression and multilevel models. Results indicate that cluster bootstrapping, though more computationally demanding, can be used as an alternative procedure for the analysis of clustered data when treatment effects at the group level are of primary interest. Supplementary material showing how to perform cluster bootstrapped regressions using R is also provided.
van der Meer, D; Hoekstra, P J; van Donkelaar, M; Bralten, J; Oosterlaan, J; Heslenfeld, D; Faraone, S V; Franke, B; Buitelaar, J K; Hartman, C A
2017-01-01
Identifying genetic variants contributing to attention-deficit/hyperactivity disorder (ADHD) is complicated by the involvement of numerous common genetic variants with small effects, interacting with each other as well as with environmental factors, such as stress exposure. Random forest regression is well suited to explore this complexity, as it allows for the analysis of many predictors simultaneously, taking into account any higher-order interactions among them. Using random forest regression, we predicted ADHD severity, measured by Conners’ Parent Rating Scales, from 686 adolescents and young adults (of which 281 were diagnosed with ADHD). The analysis included 17 374 single-nucleotide polymorphisms (SNPs) across 29 genes previously linked to hypothalamic–pituitary–adrenal (HPA) axis activity, together with information on exposure to 24 individual long-term difficulties or stressful life events. The model explained 12.5% of variance in ADHD severity. The most important SNP, which also showed the strongest interaction with stress exposure, was located in a region regulating the expression of telomerase reverse transcriptase (TERT). Other high-ranking SNPs were found in or near NPSR1, ESR1, GABRA6, PER3, NR3C2 and DRD4. Chronic stressors were more influential than single, severe, life events. Top hits were partly shared with conduct problems. We conclude that random forest regression may be used to investigate how multiple genetic and environmental factors jointly contribute to ADHD. It is able to implicate novel SNPs of interest, interacting with stress exposure, and may explain inconsistent findings in ADHD genetics. This exploratory approach may be best combined with more hypothesis-driven research; top predictors and their interactions with one another should be replicated in independent samples. PMID:28585928
2012-01-01
Background With the current focus on personalized medicine, patient/subject level inference is often of key interest in translational research. As a result, random effects models (REM) are becoming popular for patient level inference. However, for very large data sets that are characterized by large sample size, it can be difficult to fit REM using commonly available statistical software such as SAS since they require inordinate amounts of computer time and memory allocations beyond what are available preventing model convergence. For example, in a retrospective cohort study of over 800,000 Veterans with type 2 diabetes with longitudinal data over 5 years, fitting REM via generalized linear mixed modeling using currently available standard procedures in SAS (e.g. PROC GLIMMIX) was very difficult and same problems exist in Stata’s gllamm or R’s lme packages. Thus, this study proposes and assesses the performance of a meta regression approach and makes comparison with methods based on sampling of the full data. Data We use both simulated and real data from a national cohort of Veterans with type 2 diabetes (n=890,394) which was created by linking multiple patient and administrative files resulting in a cohort with longitudinal data collected over 5 years. Methods and results The outcome of interest was mean annual HbA1c measured over a 5 years period. Using this outcome, we compared parameter estimates from the proposed random effects meta regression (REMR) with estimates based on simple random sampling and VISN (Veterans Integrated Service Networks) based stratified sampling of the full data. Our results indicate that REMR provides parameter estimates that are less likely to be biased with tighter confidence intervals when the VISN level estimates are homogenous. Conclusion When the interest is to fit REM in repeated measures data with very large sample size, REMR can be used as a good alternative. It leads to reasonable inference for both Gaussian and non-Gaussian responses if parameter estimates are homogeneous across VISNs. PMID:23095325
Risk estimation using probability machines
2014-01-01
Background Logistic regression has been the de facto, and often the only, model used in the description and analysis of relationships between a binary outcome and observed features. It is widely used to obtain the conditional probabilities of the outcome given predictors, as well as predictor effect size estimates using conditional odds ratios. Results We show how statistical learning machines for binary outcomes, provably consistent for the nonparametric regression problem, can be used to provide both consistent conditional probability estimation and conditional effect size estimates. Effect size estimates from learning machines leverage our understanding of counterfactual arguments central to the interpretation of such estimates. We show that, if the data generating model is logistic, we can recover accurate probability predictions and effect size estimates with nearly the same efficiency as a correct logistic model, both for main effects and interactions. We also propose a method using learning machines to scan for possible interaction effects quickly and efficiently. Simulations using random forest probability machines are presented. Conclusions The models we propose make no assumptions about the data structure, and capture the patterns in the data by just specifying the predictors involved and not any particular model structure. So they do not run the same risks of model mis-specification and the resultant estimation biases as a logistic model. This methodology, which we call a “risk machine”, will share properties from the statistical machine that it is derived from. PMID:24581306
Risk estimation using probability machines.
Dasgupta, Abhijit; Szymczak, Silke; Moore, Jason H; Bailey-Wilson, Joan E; Malley, James D
2014-03-01
Logistic regression has been the de facto, and often the only, model used in the description and analysis of relationships between a binary outcome and observed features. It is widely used to obtain the conditional probabilities of the outcome given predictors, as well as predictor effect size estimates using conditional odds ratios. We show how statistical learning machines for binary outcomes, provably consistent for the nonparametric regression problem, can be used to provide both consistent conditional probability estimation and conditional effect size estimates. Effect size estimates from learning machines leverage our understanding of counterfactual arguments central to the interpretation of such estimates. We show that, if the data generating model is logistic, we can recover accurate probability predictions and effect size estimates with nearly the same efficiency as a correct logistic model, both for main effects and interactions. We also propose a method using learning machines to scan for possible interaction effects quickly and efficiently. Simulations using random forest probability machines are presented. The models we propose make no assumptions about the data structure, and capture the patterns in the data by just specifying the predictors involved and not any particular model structure. So they do not run the same risks of model mis-specification and the resultant estimation biases as a logistic model. This methodology, which we call a "risk machine", will share properties from the statistical machine that it is derived from.
Safety and efficacy of antihypertensive prescription at emergency department discharge.
Brody, Aaron; Rahman, Tahsin; Reed, Brian; Millis, Scott; Ference, Brian; Flack, John M; Levy, Phillip D
2015-05-01
Poor blood pressure (BP) control is a primary risk factor for target organ damage in the heart, brain, and kidney. Uncontrolled hypertension is common among emergency department (ED) patients, particularly in underresourced settings, but it is unclear what role ED providers should play in the management of chronic antihypertensive therapy. The objective was to evaluate the safety and efficacy of prescribing antihypertensive therapy from the ED. This was a retrospective study of data pooled from two prospective, longitudinal, randomized controlled trials, both of which enrolled ED patients with asymptomatic hypertension. Antihypertensives were prescribed at emergency physician discretion, and this was not related to randomization arm. Demographic data, BP at screening and randomization visit, and data on adverse effects potentially related to antihypertensive therapy were compiled. Means were compared using Student's t-tests, and proportions were compared using chi-square tests. The effect of antihypertensive therapy on BP control was further analyzed using multivariable regression modeling controlling for age, race, sex, hypertension history, study cohort, and ED BP. Data were abstracted for 217 subjects. The median interval from ED visit to randomization was 12 days. Seventy-six subjects (35%) received one or more prescriptions for antihypertensive therapy. Age, sex, race, hypertension history, and mean duration of hypertension were equivalent between groups. Although mean ED BP was higher among those who received prescriptions, the mean systolic BP (sBP) reduction from ED to randomization was significantly greater (difference = 19 mm Hg, 95% confidence interval = 12 to 26 mm Hg). No patient in either group had an sBP less than 100 mm Hg at randomization. On multiple regression modeling, randomization sBP reduction was independently associated with antihypertensive prescription (p = 0.001). The incidence of adverse effects was equivalent and low in both groups. No new neurological deficits, ischemic events, or life-threatening anaphylactic reactions were reported in either group. Prescription of antihypertensive medication from the ED is associated with significantly lower sBP at short-term outpatient follow-up. Antihypertensive therapy was not associated with an increased incidence of adverse events, and BP reduction did not exceed potentially harmful levels. Initiation of chronic antihypertensive therapy in the ED is safe and effective and may be a reasonable consideration for at-risk populations. © 2015 by the Society for Academic Emergency Medicine.
Honarvar, Mohammad Reza; Eghtesadi, Shahryar; Gill, Pooria; Jazayeri, Shima; Vakili, Mohammad Ali; Shamsardekani, Mohammad Reza; Abbasi, Abdollah
2016-01-01
Background: Acceleration in sputum smear conversion helps faster improvement and decreased probability of the transfer of TB. In this study, we aimed to investigate the effect of green tea extract supplementation on sputum smear conversion and weight changes in smear positive pulmonary TB patients in Iran. Methods: In this double blind clinical study, TB patients were divided into intervention, (n=43) receiving 500 mg green tea extract (GTE), and control groups (n=40) receiving placebo for two months, using balanced randomization. Random allocation and allocation concealment were observed. Height and weight were measured at the beginning, and two and six months post-treatment. Evaluations were performed on three slides, using the ZiehlNeelsen method. Independent and paired t test, McNemar’s, Wilcoxon, Kaplan-Meier, Cox regression model and Log-Rank test were utilized. Statistical significance was set at p<0.05. This trial was registered under IRCT201212232602N11. Results: The interventional changes and the interactive effect of intervention on weight were not significant (p>0.05). In terms of shortening the duration of conversion, the case to control proportion showed a significant difference (p=0.032). Based on the Cox regression model, the hazard ratio of the relative risk of delay in sputum smear conversion was 3.7 (p=0.002) in the higher microbial load group compared to the placebo group and 0.54 (95% CI: 0.31-0.94) in the intervention compared to the placebo group. Conclusion: GTE decreases the risk of delay in sputum smear conversion, but has no effect on weight gain. Moreover, it may be used as an adjuvant therapy for faster rehabilitation for pulmonary TB patients. PMID:27493925
Marchese Robinson, Richard L; Palczewska, Anna; Palczewski, Jan; Kidley, Nathan
2017-08-28
The ability to interpret the predictions made by quantitative structure-activity relationships (QSARs) offers a number of advantages. While QSARs built using nonlinear modeling approaches, such as the popular Random Forest algorithm, might sometimes be more predictive than those built using linear modeling approaches, their predictions have been perceived as difficult to interpret. However, a growing number of approaches have been proposed for interpreting nonlinear QSAR models in general and Random Forest in particular. In the current work, we compare the performance of Random Forest to those of two widely used linear modeling approaches: linear Support Vector Machines (SVMs) (or Support Vector Regression (SVR)) and partial least-squares (PLS). We compare their performance in terms of their predictivity as well as the chemical interpretability of the predictions using novel scoring schemes for assessing heat map images of substructural contributions. We critically assess different approaches for interpreting Random Forest models as well as for obtaining predictions from the forest. We assess the models on a large number of widely employed public-domain benchmark data sets corresponding to regression and binary classification problems of relevance to hit identification and toxicology. We conclude that Random Forest typically yields comparable or possibly better predictive performance than the linear modeling approaches and that its predictions may also be interpreted in a chemically and biologically meaningful way. In contrast to earlier work looking at interpretation of nonlinear QSAR models, we directly compare two methodologically distinct approaches for interpreting Random Forest models. The approaches for interpreting Random Forest assessed in our article were implemented using open-source programs that we have made available to the community. These programs are the rfFC package ( https://r-forge.r-project.org/R/?group_id=1725 ) for the R statistical programming language and the Python program HeatMapWrapper [ https://doi.org/10.5281/zenodo.495163 ] for heat map generation.
2015-01-01
In a multicenter study, the overall relationship between exposure and the risk of cancer can be broken down into a within-center component, which reflects the individual level association, and a between-center relationship, which captures the association at the aggregate level. A piecewise exponential proportional hazards model with random effects was used to evaluate the association between dietary fiber intake and colorectal cancer (CRC) risk in the EPIC study. During an average follow-up of 11.0 years, 4,517 CRC events occurred among study participants recruited in 28 centers from ten European countries. Models were adjusted by relevant confounding factors. Heterogeneity among centers was modelled with random effects. Linear regression calibration was used to account for errors in dietary questionnaire (DQ) measurements. Risk ratio estimates for a 10 g/day increment in dietary fiber were equal to 0.90 (95%CI: 0.85, 0.96) and 0.85 (0.64, 1.14), at the individual and aggregate levels, respectively, while calibrated estimates were 0.85 (0.76, 0.94), and 0.87 (0.65, 1.15), respectively. In multicenter studies, over a straightforward ecological analysis, random effects models allow information at the individual and ecologic levels to be captured, while controlling for confounding at both levels of evidence. PMID:25785729
Cai, W; Kaiser, M S; Dekkers, J C M
2011-05-01
A 5-generation selection experiment in Yorkshire pigs for feed efficiency consists of a line selected for low residual feed intake (LRFI) and a random control line (CTRL). The objectives of this study were to use random regression models to estimate genetic parameters for daily feed intake (DFI), BW, backfat (BF), and loin muscle area (LMA) along the growth trajectory and to evaluate the effect of LRFI selection on genetic curves for DFI and BW. An additional objective was to compare random regression models using polynomials (RRP) and spline functions (RRS). Data from approximately 3 to 8 mo of age on 586 boars and 495 gilts across 5 generations were used. The average number of measurements was 85, 14, 5, and 5 for DFI, BW, BF, and LMA. The RRP models for these 4 traits were fitted with pen × on-test group as a fixed effect, second-order Legendre polynomials of age as fixed curves for each generation, and random curves for additive genetic and permanent environmental effects. Different residual variances were used for the first and second halves of the test period. The RRS models were fitted with the same fixed effects and residual variance structure as the RRP models and included genetic and permanent environmental random effects for both splines and linear Legendre polynomials of age. The RRP model was used for further analysis because the RRS model had erratic estimates of phenotypic variance and heritability, despite having a smaller Bayesian information criterion than the RRP model. From 91 to 210 d of age, estimates of heritability from the RRP model ranged from 0.10 to 0.37 for boars and 0.14 to 0.26 for gilts for DFI, from 0.39 to 0.58 for boars and 0.55 to 0.61 for gilts for BW, from 0.48 to 0.61 for boars and 0.61 to 0.79 for gilts for BF, and from 0.46 to 0.55 for boars and 0.63 to 0.81 for gilts for LMA. In generation 5, LRFI pigs had lower average genetic curves than CTRL pigs for DFI and BW, especially toward the end of the test period; estimated line differences (CTRL-LRFI) for DFI were 0.04 kg/d for boars and 0.12 kg/d for gilts at 105 d and 0.20 kg/d for boars and 0.24 kg/d for gilts at 195 d. Line differences for BW were 0.17 kg for boars and 0.69 kg for gilts at 105 d and 3.49 kg for boars and 8.96 kg for gilts at 195 d. In conclusion, selection for LRFI has resulted in a lower feed intake curve and a lower BW curve toward maturity.
Kassanjee, Reshma; De Angelis, Daniela; Farah, Marian; Hanson, Debra; Labuschagne, Jan Phillipus Lourens; Laeyendecker, Oliver; Le Vu, Stéphane; Tom, Brian; Wang, Rui; Welte, Alex
2017-03-01
The application of biomarkers for 'recent' infection in cross-sectional HIV incidence surveillance requires the estimation of critical biomarker characteristics. Various approaches have been employed for using longitudinal data to estimate the Mean Duration of Recent Infection (MDRI) - the average time in the 'recent' state. In this systematic benchmarking of MDRI estimation approaches, a simulation platform was used to measure accuracy and precision of over twenty approaches, in thirty scenarios capturing various study designs, subject behaviors and test dynamics that may be encountered in practice. Results highlight that assuming a single continuous sojourn in the 'recent' state can produce substantial bias. Simple interpolation provides useful MDRI estimates provided subjects are tested at regular intervals. Regression performs the best - while 'random effects' describe the subject-clustering in the data, regression models without random effects proved easy to implement, stable, and of similar accuracy in scenarios considered; robustness to parametric assumptions was improved by regressing 'recent'/'non-recent' classifications rather than continuous biomarker readings. All approaches were vulnerable to incorrect assumptions about subjects' (unobserved) infection times. Results provided show the relationships between MDRI estimation performance and the number of subjects, inter-visit intervals, missed visits, loss to follow-up, and aspects of biomarker signal and noise.
Liu, Qi; Wu, Youcong; Yuan, Youhua; Bai, Li; Niu, Kun
2011-12-01
To research the relationship between the virulence factors of Saccharomyces albicans (S. albicans) and the random amplified polymorphic DNA (RAPD) bands of them, and establish the regression model by multiple regression analysis. Extracellular phospholipase, secreted proteinase, ability to generate germ tubes and adhere to oral mucosal cells of 92 strains of S. albicans were measured in vitro; RAPD-polymerase chain reaction (RAPD-PCR) was used to get their bands. Multiple regression for virulence factors of S. albicans and RAPD-PCR bands was established. The extracellular phospholipase activity was associated with 4 RAPD bands: 350, 450, 650 and 1 300 bp (P < 0.05); secreted proteinase activity of S. albicans was associated with 2 bands: 350 and 1 200 bp (P < 0.05); the ability of germ tube produce was associated with 2 bands: 400 and 550 bp (P < 0.05). Some RAPD bands will reflect the virulence factors of S. albicans indirectly. These bands would contain some important messages for regulation of S. albicans virulence factors.
Theobald, Roddy; Freeman, Scott
2014-01-01
Although researchers in undergraduate science, technology, engineering, and mathematics education are currently using several methods to analyze learning gains from pre- and posttest data, the most commonly used approaches have significant shortcomings. Chief among these is the inability to distinguish whether differences in learning gains are due to the effect of an instructional intervention or to differences in student characteristics when students cannot be assigned to control and treatment groups at random. Using pre- and posttest scores from an introductory biology course, we illustrate how the methods currently in wide use can lead to erroneous conclusions, and how multiple linear regression offers an effective framework for distinguishing the impact of an instructional intervention from the impact of student characteristics on test score gains. In general, we recommend that researchers always use student-level regression models that control for possible differences in student ability and preparation to estimate the effect of any nonrandomized instructional intervention on student performance. PMID:24591502
Theobald, Roddy; Freeman, Scott
2014-01-01
Although researchers in undergraduate science, technology, engineering, and mathematics education are currently using several methods to analyze learning gains from pre- and posttest data, the most commonly used approaches have significant shortcomings. Chief among these is the inability to distinguish whether differences in learning gains are due to the effect of an instructional intervention or to differences in student characteristics when students cannot be assigned to control and treatment groups at random. Using pre- and posttest scores from an introductory biology course, we illustrate how the methods currently in wide use can lead to erroneous conclusions, and how multiple linear regression offers an effective framework for distinguishing the impact of an instructional intervention from the impact of student characteristics on test score gains. In general, we recommend that researchers always use student-level regression models that control for possible differences in student ability and preparation to estimate the effect of any nonrandomized instructional intervention on student performance.
Linear Regression with a Randomly Censored Covariate: Application to an Alzheimer's Study.
Atem, Folefac D; Qian, Jing; Maye, Jacqueline E; Johnson, Keith A; Betensky, Rebecca A
2017-01-01
The association between maternal age of onset of dementia and amyloid deposition (measured by in vivo positron emission tomography (PET) imaging) in cognitively normal older offspring is of interest. In a regression model for amyloid, special methods are required due to the random right censoring of the covariate of maternal age of onset of dementia. Prior literature has proposed methods to address the problem of censoring due to assay limit of detection, but not random censoring. We propose imputation methods and a survival regression method that do not require parametric assumptions about the distribution of the censored covariate. Existing imputation methods address missing covariates, but not right censored covariates. In simulation studies, we compare these methods to the simple, but inefficient complete case analysis, and to thresholding approaches. We apply the methods to the Alzheimer's study.
Kaitaniemi, Pekka
2008-04-09
Allometric equations are widely used in many branches of biological science. The potential information content of the normalization constant b in allometric equations of the form Y = bX(a) has, however, remained largely neglected. To demonstrate the potential for utilizing this information, I generated a large number of artificial datasets that resembled those that are frequently encountered in biological studies, i.e., relatively small samples including measurement error or uncontrolled variation. The value of X was allowed to vary randomly within the limits describing different data ranges, and a was set to a fixed theoretical value. The constant b was set to a range of values describing the effect of a continuous environmental variable. In addition, a normally distributed random error was added to the values of both X and Y. Two different approaches were then used to model the data. The traditional approach estimated both a and b using a regression model, whereas an alternative approach set the exponent a at its theoretical value and only estimated the value of b. Both approaches produced virtually the same model fit with less than 0.3% difference in the coefficient of determination. Only the alternative approach was able to precisely reproduce the effect of the environmental variable, which was largely lost among noise variation when using the traditional approach. The results show how the value of b can be used as a source of valuable biological information if an appropriate regression model is selected.
Peer influence on pre-adolescent girls' snack intake: effects of weight status.
Salvy, Sarah-Jeanne; Romero, Natalie; Paluch, Rocco; Epstein, Leonard H
2007-07-01
Although most eating occurs in a social context, the effects of peer influence on child eating have not been the object of systematic experimental study. The present study assesses the effects of peer influence on lean and overweight pre-adolescent girls' snack intake as a function of the co-eaters' weight status. The weight status of the participants was varied by studying weight discordant dyads (i.e., one lean and one overweight participant) and weight concordant dyads (i.e., both members of the dyads were either lean or overweight). Results from the random regression model indicate that overweight girls eating with an overweight peer consumed more kilocalories than overweight participants eating with a normal-weight peer. Normal-weight participants eating with overweight peers ate similar amounts as those eating with lean eating companions. The regression model improved when the partners' food intake was entered in the model, indicating that the peers' intake was a significant predictor of participants' snack consumption. This study underscores differences in responses to the social environment between overweight and non-overweight youths.
Hewitt, Angela L.; Popa, Laurentiu S.; Pasalar, Siavash; Hendrix, Claudia M.
2011-01-01
Encoding of movement kinematics in Purkinje cell simple spike discharge has important implications for hypotheses of cerebellar cortical function. Several outstanding questions remain regarding representation of these kinematic signals. It is uncertain whether kinematic encoding occurs in unpredictable, feedback-dependent tasks or kinematic signals are conserved across tasks. Additionally, there is a need to understand the signals encoded in the instantaneous discharge of single cells without averaging across trials or time. To address these questions, this study recorded Purkinje cell firing in monkeys trained to perform a manual random tracking task in addition to circular tracking and center-out reach. Random tracking provides for extensive coverage of kinematic workspaces. Direction and speed errors are significantly greater during random than circular tracking. Cross-correlation analyses comparing hand and target velocity profiles show that hand velocity lags target velocity during random tracking. Correlations between simple spike firing from 120 Purkinje cells and hand position, velocity, and speed were evaluated with linear regression models including a time constant, τ, as a measure of the firing lead/lag relative to the kinematic parameters. Across the population, velocity accounts for the majority of simple spike firing variability (63 ± 30% of Radj2), followed by position (28 ± 24% of Radj2) and speed (11 ± 19% of Radj2). Simple spike firing often leads hand kinematics. Comparison of regression models based on averaged vs. nonaveraged firing and kinematics reveals lower Radj2 values for nonaveraged data; however, regression coefficients and τ values are highly similar. Finally, for most cells, model coefficients generated from random tracking accurately estimate simple spike firing in either circular tracking or center-out reach. These findings imply that the cerebellum controls movement kinematics, consistent with a forward internal model that predicts upcoming limb kinematics. PMID:21795616
The extension of total gain (TG) statistic in survival models: properties and applications.
Choodari-Oskooei, Babak; Royston, Patrick; Parmar, Mahesh K B
2015-07-01
The results of multivariable regression models are usually summarized in the form of parameter estimates for the covariates, goodness-of-fit statistics, and the relevant p-values. These statistics do not inform us about whether covariate information will lead to any substantial improvement in prediction. Predictive ability measures can be used for this purpose since they provide important information about the practical significance of prognostic factors. R (2)-type indices are the most familiar forms of such measures in survival models, but they all have limitations and none is widely used. In this paper, we extend the total gain (TG) measure, proposed for a logistic regression model, to survival models and explore its properties using simulations and real data. TG is based on the binary regression quantile plot, otherwise known as the predictiveness curve. Standardised TG ranges from 0 (no explanatory power) to 1 ('perfect' explanatory power). The results of our simulations show that unlike many of the other R (2)-type predictive ability measures, TG is independent of random censoring. It increases as the effect of a covariate increases and can be applied to different types of survival models, including models with time-dependent covariate effects. We also apply TG to quantify the predictive ability of multivariable prognostic models developed in several disease areas. Overall, TG performs well in our simulation studies and can be recommended as a measure to quantify the predictive ability in survival models.
Mazidi, Mohsen; Karimi, Ehsan; Rezaie, Peyman; Ferns, Gordon A
2017-07-01
To undertake a systematic review and meta-analysis of randomized controlled trials of the effect of glucagon-like peptide-1 receptor agonist (GLP-1 RAs) therapy on serum C-reactive protein (CRP) concentrations. PubMed-Medline, SCOPUS, Web of Science and Google Scholar databases were searched for the period up until March 16, 2016. Prospective studies evaluating the impact of GLP-1 RAs on serum CRP were identified. A random effects model (using the DerSimonian-Laird method) and generic inverse variance methods were used for quantitative data synthesis. Sensitivity analysis was conducted using the leave-one-out method. Heterogeneity was quantitatively assessed using the I 2 index. Random effects meta-regression was performed using unrestricted maximum likelihood method to evaluate the impact of potential moderator. International Prospective Register for Systematic Reviews (PROSPERO) number CRD42016036868. Meta-analysis of the data from 7 treatment arms revealed a significant reduction in serum CRP concentrations following treatment with GLP-1 RAs (WMD -2.14 (mg/dL), 95% CI -3.51, -0.78, P=0.002; I 2 96.1%). Removal of one study in the meta-analysis did not change the result in the sensitivity analysis (WMD -2.14 (mg/dL), 95% CI -3.51, -0.78, P=0.002; I 2 96.1%), indicating that our results could not be solely attributed to the effect of a single study. Random effects meta-regression was performed to evaluate the impact of potential moderator on the estimated effect size. Changes in serum CRP concentration were associated with the duration of treatment (slope -0.097, 95% CI -0.158, -0.042, P<0.001). This meta-analysis suggests that GLP-1 RAs therapy causes a significant reduction in CRP. Copyright © 2016 Elsevier Inc. All rights reserved.
Metabolic Profiling of Adiponectin Levels in Adults: Mendelian Randomization Analysis.
Borges, Maria Carolina; Barros, Aluísio J D; Ferreira, Diana L Santos; Casas, Juan Pablo; Horta, Bernardo Lessa; Kivimaki, Mika; Kumari, Meena; Menon, Usha; Gaunt, Tom R; Ben-Shlomo, Yoav; Freitas, Deise F; Oliveira, Isabel O; Gentry-Maharaj, Aleksandra; Fourkala, Evangelia; Lawlor, Debbie A; Hingorani, Aroon D
2017-12-01
Adiponectin, a circulating adipocyte-derived protein, has insulin-sensitizing, anti-inflammatory, antiatherogenic, and cardiomyocyte-protective properties in animal models. However, the systemic effects of adiponectin in humans are unknown. Our aims were to define the metabolic profile associated with higher blood adiponectin concentration and investigate whether variation in adiponectin concentration affects the systemic metabolic profile. We applied multivariable regression in ≤5909 adults and Mendelian randomization (using cis -acting genetic variants in the vicinity of the adiponectin gene as instrumental variables) for analyzing the causal effect of adiponectin in the metabolic profile of ≤37 545 adults. Participants were largely European from 6 longitudinal studies and 1 genome-wide association consortium. In the multivariable regression analyses, higher circulating adiponectin was associated with higher high-density lipoprotein lipids and lower very-low-density lipoprotein lipids, glucose levels, branched-chain amino acids, and inflammatory markers. However, these findings were not supported by Mendelian randomization analyses for most metabolites. Findings were consistent between sexes and after excluding high-risk groups (defined by age and occurrence of previous cardiovascular event) and 1 study with admixed population. Our findings indicate that blood adiponectin concentration is more likely to be an epiphenomenon in the context of metabolic disease than a key determinant. © 2017 The Authors.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yahya, Noorazrul, E-mail: noorazrul.yahya@research.uwa.edu.au; Ebert, Martin A.; Bulsara, Max
Purpose: Given the paucity of available data concerning radiotherapy-induced urinary toxicity, it is important to ensure derivation of the most robust models with superior predictive performance. This work explores multiple statistical-learning strategies for prediction of urinary symptoms following external beam radiotherapy of the prostate. Methods: The performance of logistic regression, elastic-net, support-vector machine, random forest, neural network, and multivariate adaptive regression splines (MARS) to predict urinary symptoms was analyzed using data from 754 participants accrued by TROG03.04-RADAR. Predictive features included dose-surface data, comorbidities, and medication-intake. Four symptoms were analyzed: dysuria, haematuria, incontinence, and frequency, each with three definitions (grade ≥more » 1, grade ≥ 2 and longitudinal) with event rate between 2.3% and 76.1%. Repeated cross-validations producing matched models were implemented. A synthetic minority oversampling technique was utilized in endpoints with rare events. Parameter optimization was performed on the training data. Area under the receiver operating characteristic curve (AUROC) was used to compare performance using sample size to detect differences of ≥0.05 at the 95% confidence level. Results: Logistic regression, elastic-net, random forest, MARS, and support-vector machine were the highest-performing statistical-learning strategies in 3, 3, 3, 2, and 1 endpoints, respectively. Logistic regression, MARS, elastic-net, random forest, neural network, and support-vector machine were the best, or were not significantly worse than the best, in 7, 7, 5, 5, 3, and 1 endpoints. The best-performing statistical model was for dysuria grade ≥ 1 with AUROC ± standard deviation of 0.649 ± 0.074 using MARS. For longitudinal frequency and dysuria grade ≥ 1, all strategies produced AUROC>0.6 while all haematuria endpoints and longitudinal incontinence models produced AUROC<0.6. Conclusions: Logistic regression and MARS were most likely to be the best-performing strategy for the prediction of urinary symptoms with elastic-net and random forest producing competitive results. The predictive power of the models was modest and endpoint-dependent. New features, including spatial dose maps, may be necessary to achieve better models.« less
Bartholdy, Cecilie; Juhl, Carsten; Christensen, Robin; Lund, Hans; Zhang, Weiya; Henriksen, Marius
2017-08-01
To analyze if exercise interventions for patients with knee osteoarthritis (OA) following the American College of Sports Medicine (ACSM) definition of muscle strength training differs from other types of exercise, and to analyze associations between changes in muscle strength, pain, and disability. A systematic search in 5 electronic databases was performed to identify randomized controlled trials comparing exercise interventions with no intervention in knee OA, and reporting changes in muscle strength and in pain or disability assessed as standardized mean differences (SMD) with 95% confidence intervals (95% CI). Interventions were categorized as ACSM interventions or not-ACSM interventions and compared using stratified random effects meta-analysis models. Associations between knee extensor strength gain and changes in pain/disability were assessed using meta-regression analyses. The 45 eligible trials with 4699 participants and 56 comparisons (22 ACSM interventions) were included in this analysis. A statistically significant difference favoring the ACSM interventions with respect to knee extensor strength was found [SMD difference: 0.448 (95% CI: 0.091-0.805)]. No differences were observed regarding effects on pain and disability. The meta-regressions indicated that increases in knee extensor strength of 30-40% would be necessary for a likely concomitant beneficial effect on pain and disability, respectively. Exercise interventions following the ACSM criteria for strength training provide superior outcomes in knee extensor strength but not in pain or disability. An increase of less than 30% in knee extensor strength is not likely to be clinically beneficial in terms of changes in pain and disability (PROSPERO: CRD42014015344). Copyright © 2017 Elsevier Inc. All rights reserved.
Salas, Eric Ariel L; Valdez, Raul; Michel, Stefan
2017-11-01
We modeled summer and winter habitat suitability of Marco Polo argali in the Pamir Mountains in southeastern Tajikistan using these statistical algorithms: Generalized Linear Model, Random Forest, Boosted Regression Tree, Maxent, and Multivariate Adaptive Regression Splines. Using sheep occurrence data collected from 2009 to 2015 and a set of selected habitat predictors, we produced summer and winter habitat suitability maps and determined the important habitat suitability predictors for both seasons. Our results demonstrated that argali selected proximity to riparian areas and greenness as the two most relevant variables for summer, and the degree of slope (gentler slopes between 0° to 20°) and Landsat temperature band for winter. The terrain roughness was also among the most important variables in summer and winter models. Aspect was only significant for winter habitat, with argali preferring south-facing mountain slopes. We evaluated various measures of model performance such as the Area Under the Curve (AUC) and the True Skill Statistic (TSS). Comparing the five algorithms, the AUC scored highest for Boosted Regression Tree in summer (AUC = 0.94) and winter model runs (AUC = 0.94). In contrast, Random Forest underperformed in both model runs.
Misspecification of Cox regression models with composite endpoints
Wu, Longyang; Cook, Richard J
2012-01-01
Researchers routinely adopt composite endpoints in multicenter randomized trials designed to evaluate the effect of experimental interventions in cardiovascular disease, diabetes, and cancer. Despite their widespread use, relatively little attention has been paid to the statistical properties of estimators of treatment effect based on composite endpoints. We consider this here in the context of multivariate models for time to event data in which copula functions link marginal distributions with a proportional hazards structure. We then examine the asymptotic and empirical properties of the estimator of treatment effect arising from a Cox regression model for the time to the first event. We point out that even when the treatment effect is the same for the component events, the limiting value of the estimator based on the composite endpoint is usually inconsistent for this common value. We find that in this context the limiting value is determined by the degree of association between the events, the stochastic ordering of events, and the censoring distribution. Within the framework adopted, marginal methods for the analysis of multivariate failure time data yield consistent estimators of treatment effect and are therefore preferred. We illustrate the methods by application to a recent asthma study. Copyright © 2012 John Wiley & Sons, Ltd. PMID:22736519
Pu, Jie; Fang, Di; Wilson, Jeffrey R
2017-02-03
The analysis of correlated binary data is commonly addressed through the use of conditional models with random effects included in the systematic component as opposed to generalized estimating equations (GEE) models that addressed the random component. Since the joint distribution of the observations is usually unknown, the conditional distribution is a natural approach. Our objective was to compare the fit of different binary models for correlated data in Tabaco use. We advocate that the joint modeling of the mean and dispersion may be at times just as adequate. We assessed the ability of these models to account for the intraclass correlation. In so doing, we concentrated on fitting logistic regression models to address smoking behaviors. Frequentist and Bayes' hierarchical models were used to predict conditional probabilities, and the joint modeling (GLM and GAM) models were used to predict marginal probabilities. These models were fitted to National Longitudinal Study of Adolescent to Adult Health (Add Health) data for Tabaco use. We found that people were less likely to smoke if they had higher income, high school or higher education and religious. Individuals were more likely to smoke if they had abused drug or alcohol, spent more time on TV and video games, and been arrested. Moreover, individuals who drank alcohol early in life were more likely to be a regular smoker. Children who experienced mistreatment from their parents were more likely to use Tabaco regularly. The joint modeling of the mean and dispersion models offered a flexible and meaningful method of addressing the intraclass correlation. They do not require one to identify random effects nor distinguish from one level of the hierarchy to the other. Moreover, once one can identify the significant random effects, one can obtain similar results to the random coefficient models. We found that the set of marginal models accounting for extravariation through the additional dispersion submodel produced similar results with regards to inferences and predictions. Moreover, both marginal and conditional models demonstrated similar predictive power.
[Theory, method and application of method R on estimation of (co)variance components].
Liu, Wen-Zhong
2004-07-01
Theory, method and application of Method R on estimation of (co)variance components were reviewed in order to make the method be reasonably used. Estimation requires R values,which are regressions of predicted random effects that are calculated using complete dataset on predicted random effects that are calculated using random subsets of the same data. By using multivariate iteration algorithm based on a transformation matrix,and combining with the preconditioned conjugate gradient to solve the mixed model equations, the computation efficiency of Method R is much improved. Method R is computationally inexpensive,and the sampling errors and approximate credible intervals of estimates can be obtained. Disadvantages of Method R include a larger sampling variance than other methods for the same data,and biased estimates in small datasets. As an alternative method, Method R can be used in larger datasets. It is necessary to study its theoretical properties and broaden its application range further.
Not all that glitters is RMT in the forecasting of risk of portfolios in the Brazilian stock market
NASA Astrophysics Data System (ADS)
Sandoval, Leonidas; Bortoluzzo, Adriana Bruscato; Venezuela, Maria Kelly
2014-09-01
Using stocks of the Brazilian stock exchange (BM&F-Bovespa), we build portfolios of stocks based on Markowitz's theory and test the predicted and realized risks. This is done using the correlation matrices between stocks, and also using Random Matrix Theory in order to clean such correlation matrices from noise. We also calculate correlation matrices using a regression model in order to remove the effect of common market movements and their cleaned versions using Random Matrix Theory. This is done for years of both low and high volatility of the Brazilian stock market, from 2004 to 2012. The results show that the use of regression to subtract the market effect on returns greatly increases the accuracy of the prediction of risk, and that, although the cleaning of the correlation matrix often leads to portfolios that better predict risks, in periods of high volatility of the market this procedure may fail to do so. The results may be used in the assessment of the true risks when one builds a portfolio of stocks during periods of crisis.
NASA Technical Reports Server (NTRS)
Boyce, Lola; Bast, Callie C.
1992-01-01
The research included ongoing development of methodology that provides probabilistic lifetime strength of aerospace materials via computational simulation. A probabilistic material strength degradation model, in the form of a randomized multifactor interaction equation, is postulated for strength degradation of structural components of aerospace propulsion systems subjected to a number of effects or primative variables. These primative variable may include high temperature, fatigue or creep. In most cases, strength is reduced as a result of the action of a variable. This multifactor interaction strength degradation equation has been randomized and is included in the computer program, PROMISS. Also included in the research is the development of methodology to calibrate the above described constitutive equation using actual experimental materials data together with linear regression of that data, thereby predicting values for the empirical material constraints for each effect or primative variable. This regression methodology is included in the computer program, PROMISC. Actual experimental materials data were obtained from the open literature for materials typically of interest to those studying aerospace propulsion system components. Material data for Inconel 718 was analyzed using the developed methodology.
Qiu, Lefeng; Wang, Kai; Long, Wenli; Wang, Ke; Hu, Wei; Amable, Gabriel S.
2016-01-01
Soil cadmium (Cd) contamination has attracted a great deal of attention because of its detrimental effects on animals and humans. This study aimed to develop and compare the performances of stepwise linear regression (SLR), classification and regression tree (CART) and random forest (RF) models in the prediction and mapping of the spatial distribution of soil Cd and to identify likely sources of Cd accumulation in Fuyang County, eastern China. Soil Cd data from 276 topsoil (0–20 cm) samples were collected and randomly divided into calibration (222 samples) and validation datasets (54 samples). Auxiliary data, including detailed land use information, soil organic matter, soil pH, and topographic data, were incorporated into the models to simulate the soil Cd concentrations and further identify the main factors influencing soil Cd variation. The predictive models for soil Cd concentration exhibited acceptable overall accuracies (72.22% for SLR, 70.37% for CART, and 75.93% for RF). The SLR model exhibited the largest predicted deviation, with a mean error (ME) of 0.074 mg/kg, a mean absolute error (MAE) of 0.160 mg/kg, and a root mean squared error (RMSE) of 0.274 mg/kg, and the RF model produced the results closest to the observed values, with an ME of 0.002 mg/kg, an MAE of 0.132 mg/kg, and an RMSE of 0.198 mg/kg. The RF model also exhibited the greatest R2 value (0.772). The CART model predictions closely followed, with ME, MAE, RMSE, and R2 values of 0.013 mg/kg, 0.154 mg/kg, 0.230 mg/kg and 0.644, respectively. The three prediction maps generally exhibited similar and realistic spatial patterns of soil Cd contamination. The heavily Cd-affected areas were primarily located in the alluvial valley plain of the Fuchun River and its tributaries because of the dramatic industrialization and urbanization processes that have occurred there. The most important variable for explaining high levels of soil Cd accumulation was the presence of metal smelting industries. The good performance of the RF model was attributable to its ability to handle the non-linear and hierarchical relationships between soil Cd and environmental variables. These results confirm that the RF approach is promising for the prediction and spatial distribution mapping of soil Cd at the regional scale. PMID:26964095
Qiu, Lefeng; Wang, Kai; Long, Wenli; Wang, Ke; Hu, Wei; Amable, Gabriel S
2016-01-01
Soil cadmium (Cd) contamination has attracted a great deal of attention because of its detrimental effects on animals and humans. This study aimed to develop and compare the performances of stepwise linear regression (SLR), classification and regression tree (CART) and random forest (RF) models in the prediction and mapping of the spatial distribution of soil Cd and to identify likely sources of Cd accumulation in Fuyang County, eastern China. Soil Cd data from 276 topsoil (0-20 cm) samples were collected and randomly divided into calibration (222 samples) and validation datasets (54 samples). Auxiliary data, including detailed land use information, soil organic matter, soil pH, and topographic data, were incorporated into the models to simulate the soil Cd concentrations and further identify the main factors influencing soil Cd variation. The predictive models for soil Cd concentration exhibited acceptable overall accuracies (72.22% for SLR, 70.37% for CART, and 75.93% for RF). The SLR model exhibited the largest predicted deviation, with a mean error (ME) of 0.074 mg/kg, a mean absolute error (MAE) of 0.160 mg/kg, and a root mean squared error (RMSE) of 0.274 mg/kg, and the RF model produced the results closest to the observed values, with an ME of 0.002 mg/kg, an MAE of 0.132 mg/kg, and an RMSE of 0.198 mg/kg. The RF model also exhibited the greatest R2 value (0.772). The CART model predictions closely followed, with ME, MAE, RMSE, and R2 values of 0.013 mg/kg, 0.154 mg/kg, 0.230 mg/kg and 0.644, respectively. The three prediction maps generally exhibited similar and realistic spatial patterns of soil Cd contamination. The heavily Cd-affected areas were primarily located in the alluvial valley plain of the Fuchun River and its tributaries because of the dramatic industrialization and urbanization processes that have occurred there. The most important variable for explaining high levels of soil Cd accumulation was the presence of metal smelting industries. The good performance of the RF model was attributable to its ability to handle the non-linear and hierarchical relationships between soil Cd and environmental variables. These results confirm that the RF approach is promising for the prediction and spatial distribution mapping of soil Cd at the regional scale.
Ali, S. M.; Mehmood, C. A; Khan, B.; Jawad, M.; Farid, U; Jadoon, J. K.; Ali, M.; Tareen, N. K.; Usman, S.; Majid, M.; Anwar, S. M.
2016-01-01
In smart grid paradigm, the consumer demands are random and time-dependent, owning towards stochastic probabilities. The stochastically varying consumer demands have put the policy makers and supplying agencies in a demanding position for optimal generation management. The utility revenue functions are highly dependent on the consumer deterministic stochastic demand models. The sudden drifts in weather parameters effects the living standards of the consumers that in turn influence the power demands. Considering above, we analyzed stochastically and statistically the effect of random consumer demands on the fixed and variable revenues of the electrical utilities. Our work presented the Multi-Variate Gaussian Distribution Function (MVGDF) probabilistic model of the utility revenues with time-dependent consumer random demands. Moreover, the Gaussian probabilities outcome of the utility revenues is based on the varying consumer n demands data-pattern. Furthermore, Standard Monte Carlo (SMC) simulations are performed that validated the factor of accuracy in the aforesaid probabilistic demand-revenue model. We critically analyzed the effect of weather data parameters on consumer demands using correlation and multi-linear regression schemes. The statistical analysis of consumer demands provided a relationship between dependent (demand) and independent variables (weather data) for utility load management, generation control, and network expansion. PMID:27314229
Ali, S M; Mehmood, C A; Khan, B; Jawad, M; Farid, U; Jadoon, J K; Ali, M; Tareen, N K; Usman, S; Majid, M; Anwar, S M
2016-01-01
In smart grid paradigm, the consumer demands are random and time-dependent, owning towards stochastic probabilities. The stochastically varying consumer demands have put the policy makers and supplying agencies in a demanding position for optimal generation management. The utility revenue functions are highly dependent on the consumer deterministic stochastic demand models. The sudden drifts in weather parameters effects the living standards of the consumers that in turn influence the power demands. Considering above, we analyzed stochastically and statistically the effect of random consumer demands on the fixed and variable revenues of the electrical utilities. Our work presented the Multi-Variate Gaussian Distribution Function (MVGDF) probabilistic model of the utility revenues with time-dependent consumer random demands. Moreover, the Gaussian probabilities outcome of the utility revenues is based on the varying consumer n demands data-pattern. Furthermore, Standard Monte Carlo (SMC) simulations are performed that validated the factor of accuracy in the aforesaid probabilistic demand-revenue model. We critically analyzed the effect of weather data parameters on consumer demands using correlation and multi-linear regression schemes. The statistical analysis of consumer demands provided a relationship between dependent (demand) and independent variables (weather data) for utility load management, generation control, and network expansion.
Wu, Yiping; Zhao, Xiaohua; Chen, Chen; He, Jiayuan; Rong, Jian; Ma, Jianming
2016-10-01
In China, the Chevron alignment sign on highways is a vertical rectangle with a white arrow and border on a blue background, which differs from its counterpart in other countries. Moreover, little research has been devoted to the effectiveness of China's Chevron signs; there is still no practical method to quantitatively describe the impact of Chevron signs on driver performance in roadway curves. In this paper, a driving simulator experiment collected data on the driving performance of 30 young male drivers as they navigated on 29 different horizontal curves under different conditions (presence of Chevron signs, curve radius and curve direction). To address the heterogeneity issue in the data, three models were estimated and tested: a pooled data linear regression model, a fixed effects model, and a random effects model. According to the Hausman Test and Akaike Information Criterion (AIC), the random effects model offers the best fit. The current study explores the relationship between driver performance (i.e., vehicle speed and lane position) and horizontal curves with respect to the horizontal curvature, presence of Chevron signs, and curve direction. This study lays a foundation for developing procedures and guidelines that would allow more uniform and efficient deployment of Chevron signs on China's highways. Copyright © 2016 Elsevier Ltd. All rights reserved.
Simple to complex modeling of breathing volume using a motion sensor.
John, Dinesh; Staudenmayer, John; Freedson, Patty
2013-06-01
To compare simple and complex modeling techniques to estimate categories of low, medium, and high ventilation (VE) from ActiGraph™ activity counts. Vertical axis ActiGraph™ GT1M activity counts, oxygen consumption and VE were measured during treadmill walking and running, sports, household chores and labor-intensive employment activities. Categories of low (<19.3 l/min), medium (19.3 to 35.4 l/min) and high (>35.4 l/min) VEs were derived from activity intensity classifications (light <2.9 METs, moderate 3.0 to 5.9 METs and vigorous >6.0 METs). We examined the accuracy of two simple techniques (multiple regression and activity count cut-point analyses) and one complex (random forest technique) modeling technique in predicting VE from activity counts. Prediction accuracy of the complex random forest technique was marginally better than the simple multiple regression method. Both techniques accurately predicted VE categories almost 80% of the time. The multiple regression and random forest techniques were more accurate (85 to 88%) in predicting medium VE. Both techniques predicted the high VE (70 to 73%) with greater accuracy than low VE (57 to 60%). Actigraph™ cut-points for light, medium and high VEs were <1381, 1381 to 3660 and >3660 cpm. There were minor differences in prediction accuracy between the multiple regression and the random forest technique. This study provides methods to objectively estimate VE categories using activity monitors that can easily be deployed in the field. Objective estimates of VE should provide a better understanding of the dose-response relationship between internal exposure to pollutants and disease. Copyright © 2013 Elsevier B.V. All rights reserved.
Ramis, Rebeca; Vidal, Enrique; García-Pérez, Javier; Lope, Virginia; Aragonés, Nuria; Pérez-Gómez, Beatriz; Pollán, Marina; López-Abente, Gonzalo
2009-01-01
Background Non-Hodgkin's lymphomas (NHLs) have been linked to proximity to industrial areas, but evidence regarding the health risk posed by residence near pollutant industries is very limited. The European Pollutant Emission Register (EPER) is a public register that furnishes valuable information on industries that release pollutants to air and water, along with their geographical location. This study sought to explore the relationship between NHL mortality in small areas in Spain and environmental exposure to pollutant emissions from EPER-registered industries, using three Poisson-regression-based mathematical models. Methods Observed cases were drawn from mortality registries in Spain for the period 1994–2003. Industries were grouped into the following sectors: energy; metal; mineral; organic chemicals; waste; paper; food; and use of solvents. Populations having an industry within a radius of 1, 1.5, or 2 kilometres from the municipal centroid were deemed to be exposed. Municipalities outside those radii were considered as reference populations. The relative risks (RRs) associated with proximity to pollutant industries were estimated using the following methods: Poisson Regression; mixed Poisson model with random provincial effect; and spatial autoregressive modelling (BYM model). Results Only proximity of paper industries to population centres (>2 km) could be associated with a greater risk of NHL mortality (mixed model: RR:1.24, 95% CI:1.09–1.42; BYM model: RR:1.21, 95% CI:1.01–1.45; Poisson model: RR:1.16, 95% CI:1.06–1.27). Spatial models yielded higher estimates. Conclusion The reported association between exposure to air pollution from the paper, pulp and board industry and NHL mortality is independent of the model used. Inclusion of spatial random effects terms in the risk estimate improves the study of associations between environmental exposures and mortality. The EPER could be of great utility when studying the effects of industrial pollution on the health of the population. PMID:19159450
Analysis of Machine Learning Techniques for Heart Failure Readmissions.
Mortazavi, Bobak J; Downing, Nicholas S; Bucholz, Emily M; Dharmarajan, Kumar; Manhapra, Ajay; Li, Shu-Xia; Negahban, Sahand N; Krumholz, Harlan M
2016-11-01
The current ability to predict readmissions in patients with heart failure is modest at best. It is unclear whether machine learning techniques that address higher dimensional, nonlinear relationships among variables would enhance prediction. We sought to compare the effectiveness of several machine learning algorithms for predicting readmissions. Using data from the Telemonitoring to Improve Heart Failure Outcomes trial, we compared the effectiveness of random forests, boosting, random forests combined hierarchically with support vector machines or logistic regression (LR), and Poisson regression against traditional LR to predict 30- and 180-day all-cause readmissions and readmissions because of heart failure. We randomly selected 50% of patients for a derivation set, and a validation set comprised the remaining patients, validated using 100 bootstrapped iterations. We compared C statistics for discrimination and distributions of observed outcomes in risk deciles for predictive range. In 30-day all-cause readmission prediction, the best performing machine learning model, random forests, provided a 17.8% improvement over LR (mean C statistics, 0.628 and 0.533, respectively). For readmissions because of heart failure, boosting improved the C statistic by 24.9% over LR (mean C statistic 0.678 and 0.543, respectively). For 30-day all-cause readmission, the observed readmission rates in the lowest and highest deciles of predicted risk with random forests (7.8% and 26.2%, respectively) showed a much wider separation than LR (14.2% and 16.4%, respectively). Machine learning methods improved the prediction of readmission after hospitalization for heart failure compared with LR and provided the greatest predictive range in observed readmission rates. © 2016 American Heart Association, Inc.
NASA Technical Reports Server (NTRS)
Duda, David P.; Minnis, Patrick
2009-01-01
Straightforward application of the Schmidt-Appleman contrail formation criteria to diagnose persistent contrail occurrence from numerical weather prediction data is hindered by significant bias errors in the upper tropospheric humidity. Logistic models of contrail occurrence have been proposed to overcome this problem, but basic questions remain about how random measurement error may affect their accuracy. A set of 5000 synthetic contrail observations is created to study the effects of random error in these probabilistic models. The simulated observations are based on distributions of temperature, humidity, and vertical velocity derived from Advanced Regional Prediction System (ARPS) weather analyses. The logistic models created from the simulated observations were evaluated using two common statistical measures of model accuracy, the percent correct (PC) and the Hanssen-Kuipers discriminant (HKD). To convert the probabilistic results of the logistic models into a dichotomous yes/no choice suitable for the statistical measures, two critical probability thresholds are considered. The HKD scores are higher when the climatological frequency of contrail occurrence is used as the critical threshold, while the PC scores are higher when the critical probability threshold is 0.5. For both thresholds, typical random errors in temperature, relative humidity, and vertical velocity are found to be small enough to allow for accurate logistic models of contrail occurrence. The accuracy of the models developed from synthetic data is over 85 percent for both the prediction of contrail occurrence and non-occurrence, although in practice, larger errors would be anticipated.
Irvine, Kathryn M.; Thornton, Jamie; Backus, Vickie M.; Hohmann, Matthew G.; Lehnhoff, Erik A.; Maxwell, Bruce D.; Michels, Kurt; Rew, Lisa
2013-01-01
Commonly in environmental and ecological studies, species distribution data are recorded as presence or absence throughout a spatial domain of interest. Field based studies typically collect observations by sampling a subset of the spatial domain. We consider the effects of six different adaptive and two non-adaptive sampling designs and choice of three binary models on both predictions to unsampled locations and parameter estimation of the regression coefficients (species–environment relationships). Our simulation study is unique compared to others to date in that we virtually sample a true known spatial distribution of a nonindigenous plant species, Bromus inermis. The census of B. inermis provides a good example of a species distribution that is both sparsely (1.9 % prevalence) and patchily distributed. We find that modeling the spatial correlation using a random effect with an intrinsic Gaussian conditionally autoregressive prior distribution was equivalent or superior to Bayesian autologistic regression in terms of predicting to un-sampled areas when strip adaptive cluster sampling was used to survey B. inermis. However, inferences about the relationships between B. inermis presence and environmental predictors differed between the two spatial binary models. The strip adaptive cluster designs we investigate provided a significant advantage in terms of Markov chain Monte Carlo chain convergence when trying to model a sparsely distributed species across a large area. In general, there was little difference in the choice of neighborhood, although the adaptive king was preferred when transects were randomly placed throughout the spatial domain.
Fathers’ Participation in Parenting and Maternal Parenting Stress: Variation by Relationship Status
Nomaguchi, Kei; Brown, Susan L.; Leyman, Tanya M.
2015-01-01
The growing diversity in mother-father relationship status has led to a debate over the role of fathers in parenting. Little is known, however, about how fathers’ participation in parenting is linked to maternal well-being across different mother-father relationship statuses. Using data from the Fragile Families and Child Wellbeing Study (N = 2,062), fixed-effects as well as random-effects regression models show that overall fathers’ engagement with children and sharing in child-related chores are negatively related to maternal parenting stress. Fathers’ cooperative coparenting is negatively related to maternal parenting stress only in the random-effects model, suggesting that the association is driven by selection factors. There is little variation in these associations by mother-father relationship status, once selection factors are controlled for. These findings extend support for the current cultural emphasis on benefits of fathers’ active participation in parenting for mothers and children even after the mother-father relationship dissolved. PMID:28479649
Xiao, Yongling; Abrahamowicz, Michal
2010-03-30
We propose two bootstrap-based methods to correct the standard errors (SEs) from Cox's model for within-cluster correlation of right-censored event times. The cluster-bootstrap method resamples, with replacement, only the clusters, whereas the two-step bootstrap method resamples (i) the clusters, and (ii) individuals within each selected cluster, with replacement. In simulations, we evaluate both methods and compare them with the existing robust variance estimator and the shared gamma frailty model, which are available in statistical software packages. We simulate clustered event time data, with latent cluster-level random effects, which are ignored in the conventional Cox's model. For cluster-level covariates, both proposed bootstrap methods yield accurate SEs, and type I error rates, and acceptable coverage rates, regardless of the true random effects distribution, and avoid serious variance under-estimation by conventional Cox-based standard errors. However, the two-step bootstrap method over-estimates the variance for individual-level covariates. We also apply the proposed bootstrap methods to obtain confidence bands around flexible estimates of time-dependent effects in a real-life analysis of cluster event times.
Kheirabadi, Khabat; Rashidi, Amir; Alijani, Sadegh; Imumorin, Ikhide
2014-11-01
We compared the goodness of fit of three mathematical functions (including: Legendre polynomials, Lidauer-Mäntysaari function and Wilmink function) for describing the lactation curve of primiparous Iranian Holstein cows by using multiple-trait random regression models (MT-RRM). Lactational submodels provided the largest daily additive genetic (AG) and permanent environmental (PE) variance estimates at the end and at the onset of lactation, respectively, as well as low genetic correlations between peripheral test-day records. For all models, heritability estimates were highest at the end of lactation (245 to 305 days) and ranged from 0.05 to 0.26, 0.03 to 0.12 and 0.04 to 0.24 for milk, fat and protein yields, respectively. Generally, the genetic correlations between traits depend on how far apart they are or whether they are on the same day in any two traits. On average, genetic correlations between milk and fat were the lowest and those between fat and protein were intermediate, while those between milk and protein were the highest. Results from all criteria (Akaike's and Schwarz's Bayesian information criterion, and -2*logarithm of the likelihood function) suggested that a model with 2 and 5 coefficients of Legendre polynomials for AG and PE effects, respectively, was the most adequate for fitting the data. © 2014 Japanese Society of Animal Science.
Ebrahimi, Hossein; Sadeghi, Mahdi; Amanpour, Farzaneh; Vahedi, Hamid
2016-04-01
Diabetes education is a major subject in achieving optimal glycemic control. Effective empowerment approach can be beneficial for improving patients' health. The aim of this study was to evaluate the effect of empowerment model on indicators of metabolic control in patients with type 2 diabetes. a randomized controlled trial of 103 patients with type 2 diabetes were randomly assigned to either the intervention (empowerment approach training) or the control group (conventional training) 2014. Empowerment approach training were performed for the experimental group for eight weeks. Data collection tool included demographic information form and indicators of metabolic control checklist. Analysis was performed by one-way analysis of variance, chi-square test, paired t-test, independent t-test and multiple linear regression. Before the intervention, two groups were homogeneous in terms of demographic variables, glycosylated hemoglobin (HbA1C), and other indicators of metabolic control. After the intervention, average HbA1C and other metabolic indicators except for LDL showed significant differences in the experimental group compared to the control group. study results indicated the positive effects of applying the empowerment model on the metabolic control indicators. Therefore, applying this model is recommended to nurses and the relevant authorities in order to improve clinical outcomes in diabetic patients. Copyright © 2015 Primary Care Diabetes Europe. Published by Elsevier Ltd. All rights reserved.
Cho, C. I.; Alam, M.; Choi, T. J.; Choy, Y. H.; Choi, J. G.; Lee, S. S.; Cho, K. H.
2016-01-01
The objectives of the study were to estimate genetic parameters for milk production traits of Holstein cattle using random regression models (RRMs), and to compare the goodness of fit of various RRMs with homogeneous and heterogeneous residual variances. A total of 126,980 test-day milk production records of the first parity Holstein cows between 2007 and 2014 from the Dairy Cattle Improvement Center of National Agricultural Cooperative Federation in South Korea were used. These records included milk yield (MILK), fat yield (FAT), protein yield (PROT), and solids-not-fat yield (SNF). The statistical models included random effects of genetic and permanent environments using Legendre polynomials (LP) of the third to fifth order (L3–L5), fixed effects of herd-test day, year-season at calving, and a fixed regression for the test-day record (third to fifth order). The residual variances in the models were either homogeneous (HOM) or heterogeneous (15 classes, HET15; 60 classes, HET60). A total of nine models (3 orders of polynomials×3 types of residual variance) including L3-HOM, L3-HET15, L3-HET60, L4-HOM, L4-HET15, L4-HET60, L5-HOM, L5-HET15, and L5-HET60 were compared using Akaike information criteria (AIC) and/or Schwarz Bayesian information criteria (BIC) statistics to identify the model(s) of best fit for their respective traits. The lowest BIC value was observed for the models L5-HET15 (MILK; PROT; SNF) and L4-HET15 (FAT), which fit the best. In general, the BIC values of HET15 models for a particular polynomial order was lower than that of the HET60 model in most cases. This implies that the orders of LP and types of residual variances affect the goodness of models. Also, the heterogeneity of residual variances should be considered for the test-day analysis. The heritability estimates of from the best fitted models ranged from 0.08 to 0.15 for MILK, 0.06 to 0.14 for FAT, 0.08 to 0.12 for PROT, and 0.07 to 0.13 for SNF according to days in milk of first lactation. Genetic variances for studied traits tended to decrease during the earlier stages of lactation, which were followed by increases in the middle and decreases further at the end of lactation. With regards to the fitness of the models and the differential genetic parameters across the lactation stages, we could estimate genetic parameters more accurately from RRMs than from lactation models. Therefore, we suggest using RRMs in place of lactation models to make national dairy cattle genetic evaluations for milk production traits in Korea. PMID:26954184
Cho, C I; Alam, M; Choi, T J; Choy, Y H; Choi, J G; Lee, S S; Cho, K H
2016-05-01
The objectives of the study were to estimate genetic parameters for milk production traits of Holstein cattle using random regression models (RRMs), and to compare the goodness of fit of various RRMs with homogeneous and heterogeneous residual variances. A total of 126,980 test-day milk production records of the first parity Holstein cows between 2007 and 2014 from the Dairy Cattle Improvement Center of National Agricultural Cooperative Federation in South Korea were used. These records included milk yield (MILK), fat yield (FAT), protein yield (PROT), and solids-not-fat yield (SNF). The statistical models included random effects of genetic and permanent environments using Legendre polynomials (LP) of the third to fifth order (L3-L5), fixed effects of herd-test day, year-season at calving, and a fixed regression for the test-day record (third to fifth order). The residual variances in the models were either homogeneous (HOM) or heterogeneous (15 classes, HET15; 60 classes, HET60). A total of nine models (3 orders of polynomials×3 types of residual variance) including L3-HOM, L3-HET15, L3-HET60, L4-HOM, L4-HET15, L4-HET60, L5-HOM, L5-HET15, and L5-HET60 were compared using Akaike information criteria (AIC) and/or Schwarz Bayesian information criteria (BIC) statistics to identify the model(s) of best fit for their respective traits. The lowest BIC value was observed for the models L5-HET15 (MILK; PROT; SNF) and L4-HET15 (FAT), which fit the best. In general, the BIC values of HET15 models for a particular polynomial order was lower than that of the HET60 model in most cases. This implies that the orders of LP and types of residual variances affect the goodness of models. Also, the heterogeneity of residual variances should be considered for the test-day analysis. The heritability estimates of from the best fitted models ranged from 0.08 to 0.15 for MILK, 0.06 to 0.14 for FAT, 0.08 to 0.12 for PROT, and 0.07 to 0.13 for SNF according to days in milk of first lactation. Genetic variances for studied traits tended to decrease during the earlier stages of lactation, which were followed by increases in the middle and decreases further at the end of lactation. With regards to the fitness of the models and the differential genetic parameters across the lactation stages, we could estimate genetic parameters more accurately from RRMs than from lactation models. Therefore, we suggest using RRMs in place of lactation models to make national dairy cattle genetic evaluations for milk production traits in Korea.
NASA Astrophysics Data System (ADS)
Dokuchaev, P. M.; Meshalkina, J. L.; Yaroslavtsev, A. M.
2018-01-01
Comparative analysis of soils geospatial modeling using multinomial logistic regression, decision trees, random forest, regression trees and support vector machines algorithms was conducted. The visual interpretation of the digital maps obtained and their comparison with the existing map, as well as the quantitative assessment of the individual soil groups detection overall accuracy and of the models kappa showed that multiple logistic regression, support vector method, and random forest models application with spatial prediction of the conditional soil groups distribution can be reliably used for mapping of the study area. It has shown the most accurate detection for sod-podzolics soils (Phaeozems Albic) lightly eroded and moderately eroded soils. In second place, according to the mean overall accuracy of the prediction, there are sod-podzolics soils - non-eroded and warp one, as well as sod-gley soils (Umbrisols Gleyic) and alluvial soils (Fluvisols Dystric, Umbric). Heavy eroded sod-podzolics and gray forest soils (Phaeozems Albic) were detected by methods of automatic classification worst of all.
Random-effects meta-analysis: the number of studies matters.
Guolo, Annamaria; Varin, Cristiano
2017-06-01
This paper investigates the impact of the number of studies on meta-analysis and meta-regression within the random-effects model framework. It is frequently neglected that inference in random-effects models requires a substantial number of studies included in meta-analysis to guarantee reliable conclusions. Several authors warn about the risk of inaccurate results of the traditional DerSimonian and Laird approach especially in the common case of meta-analysis involving a limited number of studies. This paper presents a selection of likelihood and non-likelihood methods for inference in meta-analysis proposed to overcome the limitations of the DerSimonian and Laird procedure, with a focus on the effect of the number of studies. The applicability and the performance of the methods are investigated in terms of Type I error rates and empirical power to detect effects, according to scenarios of practical interest. Simulation studies and applications to real meta-analyses highlight that it is not possible to identify an approach uniformly superior to alternatives. The overall recommendation is to avoid the DerSimonian and Laird method when the number of meta-analysis studies is modest and prefer a more comprehensive procedure that compares alternative inferential approaches. R code for meta-analysis according to all of the inferential methods examined in the paper is provided.
Crosby, Richard A; Mena, Leandro; Smith, Rachel Vickers
2018-06-01
The aim of this study is to determine, among young Black men who have sex with men (YBMSM), the 12-month efficacy of a single-session, clinic-based intervention promoting condom use to enhance sexual pleasure (purpose 1) and the use of condoms from the start-to-finish of anal sex (purpose 2). A pre-test, post-test randomized controlled trial was conducted, using a 12-month period of follow-up observation, in STI clinics. Data from 394 YBMSM completing baseline and 12-month follow-up assessments were analyzed. The experimental condition comprised a one-to-one, interactive program (Focus on the Future) designed for tailored delivery. Regarding study purpose 1, in an age-adjusted linear regression model for 277 HIV-uninfected men, there was a significant effect of the intervention (Beta=0.13, P =0.036) relative to more favorable sexual experiences when using condoms. Regarding study purpose 2, in an adjusted logistic regression model, for HIV-uninfected men, there was a significant effect of the intervention (AOR=0.54, P =0.048) relative to using condoms from start-to-finish of anal sex. Significant effects for HIV-infected men were not observed. A small, but non-significant, effect was observed relative to men's self-report of always using condoms. This single-session program may be a valuable counseling tool for use in conjunction with pre-exposure prophylaxis-related care for HIV-uninfected YBMSM.
Job strain and resting heart rate: a cross-sectional study in a Swedish random working sample.
Eriksson, Peter; Schiöler, Linus; Söderberg, Mia; Rosengren, Annika; Torén, Kjell
2016-03-05
Numerous studies have reported an association between stressing work conditions and cardiovascular disease. However, more evidence is needed, and the etiological mechanisms are unknown. Elevated resting heart rate has emerged as a possible risk factor for cardiovascular disease, but little is known about the relation to work-related stress. This study therefore investigated the association between job strain, job control, and job demands and resting heart rate. We conducted a cross-sectional survey of randomly selected men and women in Västra Götalandsregionen, Sweden (West county of Sweden) (n = 1552). Information about job strain, job demands, job control, heart rate and covariates was collected during the period 2001-2004 as part of the INTERGENE/ADONIX research project. Six different linear regression models were used with adjustments for gender, age, BMI, smoking, education, and physical activity in the fully adjusted model. Job strain was operationalized as the log-transformed ratio of job demands over job control in the statistical analyses. No associations were seen between resting heart rate and job demands. Job strain was associated with elevated resting heart rate in the unadjusted model (linear regression coefficient 1.26, 95 % CI 0.14 to 2.38), but not in any of the extended models. Low job control was associated with elevated resting heart rate after adjustments for gender, age, BMI, and smoking (linear regression coefficient -0.18, 95 % CI -0.30 to -0.02). However, there were no significant associations in the fully adjusted model. Low job control and job strain, but not job demands, were associated with elevated resting heart rate. However, the observed associations were modest and may be explained by confounding effects.
Effect of soy isoflavone supplementation on plasma lipoprotein(a) concentrations: A meta-analysis.
Simental-Mendía, Luis E; Gotto, Antonio M; Atkin, Stephen L; Banach, Maciej; Pirro, Matteo; Sahebkar, Amirhossein
Soy supplementation has been shown to reduce total and low-density lipoprotein cholesterol, while increasing high-density lipoprotein cholesterol. However, contradictory effects of soy isoflavone supplementation on lipoprotein(a) [Lp(a)] have been reported suggesting the need for a meta-analysis to be undertaken. The aim of the study was to investigate the impact of supplementation with soy isoflavones on plasma Lp(a) levels through a systematic review and meta-analysis of eligible randomized placebo-controlled trials. The search included PubMed-Medline, Scopus, ISI Web of Knowledge, and Google Scholar databases (by March 26, 2017), and quality of studies was evaluated according to Cochrane criteria. Quantitative data synthesis was performed using a random-effects model, with standardized mean difference and 95% confidence interval as summary statistics. Meta-regression and leave-one-out sensitivity analysis were performed to assess the modifiers of treatment response. Ten eligible studies comprising 11 treatment arms with 973 subjects were selected for the meta-analysis. Meta-analysis did not suggest any significant alteration of plasma Lp(a) levels after supplementation with soy isoflavones (standardized mean difference: 0.08, 95% confidence interval: -0.05, 0.20, P = .228). The effect size was robust in the leave-one-out sensitivity analysis. In meta-regression analysis, neither dose nor duration of supplementation with soy isoflavones was significantly associated with the effect size. This meta-analysis of the 10 available randomized placebo-controlled trials revealed no significant effect of soy isoflavones treatment on plasma Lp(a) concentrations. Copyright © 2017 National Lipid Association. Published by Elsevier Inc. All rights reserved.
DOT National Transportation Integrated Search
2016-09-01
We consider the problem of solving mixed random linear equations with k components. This is the noiseless setting of mixed linear regression. The goal is to estimate multiple linear models from mixed samples in the case where the labels (which sample...
E. Freeman; G. Moisen; J. Coulston; B. Wilson
2014-01-01
Random forests (RF) and stochastic gradient boosting (SGB), both involving an ensemble of classification and regression trees, are compared for modeling tree canopy cover for the 2011 National Land Cover Database (NLCD). The objectives of this study were twofold. First, sensitivity of RF and SGB to choices in tuning parameters was explored. Second, performance of the...
Hobbs, Brian P.; Sargent, Daniel J.; Carlin, Bradley P.
2014-01-01
Assessing between-study variability in the context of conventional random-effects meta-analysis is notoriously difficult when incorporating data from only a small number of historical studies. In order to borrow strength, historical and current data are often assumed to be fully homogeneous, but this can have drastic consequences for power and Type I error if the historical information is biased. In this paper, we propose empirical and fully Bayesian modifications of the commensurate prior model (Hobbs et al., 2011) extending Pocock (1976), and evaluate their frequentist and Bayesian properties for incorporating patient-level historical data using general and generalized linear mixed regression models. Our proposed commensurate prior models lead to preposterior admissible estimators that facilitate alternative bias-variance trade-offs than those offered by pre-existing methodologies for incorporating historical data from a small number of historical studies. We also provide a sample analysis of a colon cancer trial comparing time-to-disease progression using a Weibull regression model. PMID:24795786
A refined method for multivariate meta-analysis and meta-regression.
Jackson, Daniel; Riley, Richard D
2014-02-20
Making inferences about the average treatment effect using the random effects model for meta-analysis is problematic in the common situation where there is a small number of studies. This is because estimates of the between-study variance are not precise enough to accurately apply the conventional methods for testing and deriving a confidence interval for the average effect. We have found that a refined method for univariate meta-analysis, which applies a scaling factor to the estimated effects' standard error, provides more accurate inference. We explain how to extend this method to the multivariate scenario and show that our proposal for refined multivariate meta-analysis and meta-regression can provide more accurate inferences than the more conventional approach. We explain how our proposed approach can be implemented using standard output from multivariate meta-analysis software packages and apply our methodology to two real examples. Copyright © 2013 John Wiley & Sons, Ltd.
Modeling nitrate at domestic and public-supply well depths in the Central Valley, California
Nolan, Bernard T.; Gronberg, JoAnn M.; Faunt, Claudia C.; Eberts, Sandra M.; Belitz, Ken
2014-01-01
Aquifer vulnerability models were developed to map groundwater nitrate concentration at domestic and public-supply well depths in the Central Valley, California. We compared three modeling methods for ability to predict nitrate concentration >4 mg/L: logistic regression (LR), random forest classification (RFC), and random forest regression (RFR). All three models indicated processes of nitrogen fertilizer input at the land surface, transmission through coarse-textured, well-drained soils, and transport in the aquifer to the well screen. The total percent correct predictions were similar among the three models (69–82%), but RFR had greater sensitivity (84% for shallow wells and 51% for deep wells). The results suggest that RFR can better identify areas with high nitrate concentration but that LR and RFC may better describe bulk conditions in the aquifer. A unique aspect of the modeling approach was inclusion of outputs from previous, physically based hydrologic and textural models as predictor variables, which were important to the models. Vertical water fluxes in the aquifer and percent coarse material above the well screen were ranked moderately high-to-high in the RFR models, and the average vertical water flux during the irrigation season was highly significant (p < 0.0001) in logistic regression.
Naghibi, Seyed Amir; Pourghasemi, Hamid Reza; Dixon, Barnali
2016-01-01
Groundwater is considered one of the most valuable fresh water resources. The main objective of this study was to produce groundwater spring potential maps in the Koohrang Watershed, Chaharmahal-e-Bakhtiari Province, Iran, using three machine learning models: boosted regression tree (BRT), classification and regression tree (CART), and random forest (RF). Thirteen hydrological-geological-physiographical (HGP) factors that influence locations of springs were considered in this research. These factors include slope degree, slope aspect, altitude, topographic wetness index (TWI), slope length (LS), plan curvature, profile curvature, distance to rivers, distance to faults, lithology, land use, drainage density, and fault density. Subsequently, groundwater spring potential was modeled and mapped using CART, RF, and BRT algorithms. The predicted results from the three models were validated using the receiver operating characteristics curve (ROC). From 864 springs identified, 605 (≈70 %) locations were used for the spring potential mapping, while the remaining 259 (≈30 %) springs were used for the model validation. The area under the curve (AUC) for the BRT model was calculated as 0.8103 and for CART and RF the AUC were 0.7870 and 0.7119, respectively. Therefore, it was concluded that the BRT model produced the best prediction results while predicting locations of springs followed by CART and RF models, respectively. Geospatially integrated BRT, CART, and RF methods proved to be useful in generating the spring potential map (SPM) with reasonable accuracy.
On the Bias-Amplifying Effect of Near Instruments in Observational Studies
ERIC Educational Resources Information Center
Steiner, Peter M.; Kim, Yongnam
2014-01-01
In contrast to randomized experiments, the estimation of unbiased treatment effects from observational data requires an analysis that conditions on all confounding covariates. Conditioning on covariates can be done via standard parametric regression techniques or nonparametric matching like propensity score (PS) matching. The regression or…
Cade, Brian S.; Noon, Barry R.; Scherer, Rick D.; Keane, John J.
2017-01-01
Counts of avian fledglings, nestlings, or clutch size that are bounded below by zero and above by some small integer form a discrete random variable distribution that is not approximated well by conventional parametric count distributions such as the Poisson or negative binomial. We developed a logistic quantile regression model to provide estimates of the empirical conditional distribution of a bounded discrete random variable. The logistic quantile regression model requires that counts are randomly jittered to a continuous random variable, logit transformed to bound them between specified lower and upper values, then estimated in conventional linear quantile regression, repeating the 3 steps and averaging estimates. Back-transformation to the original discrete scale relies on the fact that quantiles are equivariant to monotonic transformations. We demonstrate this statistical procedure by modeling 20 years of California Spotted Owl fledgling production (0−3 per territory) on the Lassen National Forest, California, USA, as related to climate, demographic, and landscape habitat characteristics at territories. Spotted Owl fledgling counts increased nonlinearly with decreasing precipitation in the early nesting period, in the winter prior to nesting, and in the prior growing season; with increasing minimum temperatures in the early nesting period; with adult compared to subadult parents; when there was no fledgling production in the prior year; and when percentage of the landscape surrounding nesting sites (202 ha) with trees ≥25 m height increased. Changes in production were primarily driven by changes in the proportion of territories with 2 or 3 fledglings. Average variances of the discrete cumulative distributions of the estimated fledgling counts indicated that temporal changes in climate and parent age class explained 18% of the annual variance in owl fledgling production, which was 34% of the total variance. Prior fledgling production explained as much of the variance in the fledgling counts as climate, parent age class, and landscape habitat predictors. Our logistic quantile regression model can be used for any discrete response variables with fixed upper and lower bounds.
NASA Technical Reports Server (NTRS)
Bast, Callie Corinne Scheidt
1994-01-01
This thesis presents the on-going development of methodology for a probabilistic material strength degradation model. The probabilistic model, in the form of a postulated randomized multifactor equation, provides for quantification of uncertainty in the lifetime material strength of aerospace propulsion system components subjected to a number of diverse random effects. This model is embodied in the computer program entitled PROMISS, which can include up to eighteen different effects. Presently, the model includes four effects that typically reduce lifetime strength: high temperature, mechanical fatigue, creep, and thermal fatigue. Statistical analysis was conducted on experimental Inconel 718 data obtained from the open literature. This analysis provided regression parameters for use as the model's empirical material constants, thus calibrating the model specifically for Inconel 718. Model calibration was carried out for four variables, namely, high temperature, mechanical fatigue, creep, and thermal fatigue. Methodology to estimate standard deviations of these material constants for input into the probabilistic material strength model was developed. Using the current version of PROMISS, entitled PROMISS93, a sensitivity study for the combined effects of mechanical fatigue, creep, and thermal fatigue was performed. Results, in the form of cumulative distribution functions, illustrated the sensitivity of lifetime strength to any current value of an effect. In addition, verification studies comparing a combination of mechanical fatigue and high temperature effects by model to the combination by experiment were conducted. Thus, for Inconel 718, the basic model assumption of independence between effects was evaluated. Results from this limited verification study strongly supported this assumption.
Medalie, Laura
2014-01-01
Annual and daily concentrations and fluxes of total and dissolved phosphorus, total nitrogen, chloride, and total suspended solids were estimated for 18 monitored tributaries to Lake Champlain by using the Weighted Regressions on Time, Discharge, and Seasons regression model. Estimates were made for 21 or 23 years, depending on data availability, for the purpose of providing timely and accessible summary reports as stipulated in the 2010 update to the Lake Champlain “Opportunities for Action” management plan. Estimates of concentration and flux were provided for each tributary based on (1) observed daily discharges and (2) a flow-normalizing procedure, which removed the random fluctuations of climate-related variability. The flux bias statistic, an indicator of the ability of the Weighted Regressions on Time, Discharge, and Season regression models to provide accurate representations of flux, showed acceptable bias (less than ±10 percent) for 68 out of 72 models for total and dissolved phosphorus, total nitrogen, and chloride. Six out of 18 models for total suspended solids had moderate bias (between 10 and 30 percent), an expected result given the frequently nonlinear relation between total suspended solids and discharge. One model for total suspended solids with a very high bias was influenced by a single extreme value; however, removal of that value, although reducing the bias substantially, had little effect on annual fluxes.
Liu, Danping; Yeung, Edwina H; McLain, Alexander C; Xie, Yunlong; Buck Louis, Germaine M; Sundaram, Rajeshwari
2017-09-01
Imperfect follow-up in longitudinal studies commonly leads to missing outcome data that can potentially bias the inference when the missingness is nonignorable; that is, the propensity of missingness depends on missing values in the data. In the Upstate KIDS Study, we seek to determine if the missingness of child development outcomes is nonignorable, and how a simple model assuming ignorable missingness would compare with more complicated models for a nonignorable mechanism. To correct for nonignorable missingness, the shared random effects model (SREM) jointly models the outcome and the missing mechanism. However, the computational complexity and lack of software packages has limited its practical applications. This paper proposes a novel two-step approach to handle nonignorable missing outcomes in generalized linear mixed models. We first analyse the missing mechanism with a generalized linear mixed model and predict values of the random effects; then, the outcome model is fitted adjusting for the predicted random effects to account for heterogeneity in the missingness propensity. Extensive simulation studies suggest that the proposed method is a reliable approximation to SREM, with a much faster computation. The nonignorability of missing data in the Upstate KIDS Study is estimated to be mild to moderate, and the analyses using the two-step approach or SREM are similar to the model assuming ignorable missingness. The two-step approach is a computationally straightforward method that can be conducted as sensitivity analyses in longitudinal studies to examine violations to the ignorable missingness assumption and the implications relative to health outcomes. © 2017 John Wiley & Sons Ltd.
Calibrating random forests for probability estimation.
Dankowski, Theresa; Ziegler, Andreas
2016-09-30
Probabilities can be consistently estimated using random forests. It is, however, unclear how random forests should be updated to make predictions for other centers or at different time points. In this work, we present two approaches for updating random forests for probability estimation. The first method has been proposed by Elkan and may be used for updating any machine learning approach yielding consistent probabilities, so-called probability machines. The second approach is a new strategy specifically developed for random forests. Using the terminal nodes, which represent conditional probabilities, the random forest is first translated to logistic regression models. These are, in turn, used for re-calibration. The two updating strategies were compared in a simulation study and are illustrated with data from the German Stroke Study Collaboration. In most simulation scenarios, both methods led to similar improvements. In the simulation scenario in which the stricter assumptions of Elkan's method were not met, the logistic regression-based re-calibration approach for random forests outperformed Elkan's method. It also performed better on the stroke data than Elkan's method. The strength of Elkan's method is its general applicability to any probability machine. However, if the strict assumptions underlying this approach are not met, the logistic regression-based approach is preferable for updating random forests for probability estimation. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
Can Predictive Modeling Identify Head and Neck Oncology Patients at Risk for Readmission?
Manning, Amy M; Casper, Keith A; Peter, Kay St; Wilson, Keith M; Mark, Jonathan R; Collar, Ryan M
2018-05-01
Objective Unplanned readmission within 30 days is a contributor to health care costs in the United States. The use of predictive modeling during hospitalization to identify patients at risk for readmission offers a novel approach to quality improvement and cost reduction. Study Design Two-phase study including retrospective analysis of prospectively collected data followed by prospective longitudinal study. Setting Tertiary academic medical center. Subjects and Methods Prospectively collected data for patients undergoing surgical treatment for head and neck cancer from January 2013 to January 2015 were used to build predictive models for readmission within 30 days of discharge using logistic regression, classification and regression tree (CART) analysis, and random forests. One model (logistic regression) was then placed prospectively into the discharge workflow from March 2016 to May 2016 to determine the model's ability to predict which patients would be readmitted within 30 days. Results In total, 174 admissions had descriptive data. Thirty-two were excluded due to incomplete data. Logistic regression, CART, and random forest predictive models were constructed using the remaining 142 admissions. When applied to 106 consecutive prospective head and neck oncology patients at the time of discharge, the logistic regression model predicted readmissions with a specificity of 94%, a sensitivity of 47%, a negative predictive value of 90%, and a positive predictive value of 62% (odds ratio, 14.9; 95% confidence interval, 4.02-55.45). Conclusion Prospectively collected head and neck cancer databases can be used to develop predictive models that can accurately predict which patients will be readmitted. This offers valuable support for quality improvement initiatives and readmission-related cost reduction in head and neck cancer care.
Prediction of Baseflow Index of Catchments using Machine Learning Algorithms
NASA Astrophysics Data System (ADS)
Yadav, B.; Hatfield, K.
2017-12-01
We present the results of eight machine learning techniques for predicting the baseflow index (BFI) of ungauged basins using a surrogate of catchment scale climate and physiographic data. The tested algorithms include ordinary least squares, ridge regression, least absolute shrinkage and selection operator (lasso), elasticnet, support vector machine, gradient boosted regression trees, random forests, and extremely randomized trees. Our work seeks to identify the dominant controls of BFI that can be readily obtained from ancillary geospatial databases and remote sensing measurements, such that the developed techniques can be extended to ungauged catchments. More than 800 gauged catchments spanning the continental United States were selected to develop the general methodology. The BFI calculation was based on the baseflow separated from daily streamflow hydrograph using HYSEP filter. The surrogate catchment attributes were compiled from multiple sources including digital elevation model, soil, landuse, climate data, other publicly available ancillary and geospatial data. 80% catchments were used to train the ML algorithms, and the remaining 20% of the catchments were used as an independent test set to measure the generalization performance of fitted models. A k-fold cross-validation using exhaustive grid search was used to fit the hyperparameters of each model. Initial model development was based on 19 independent variables, but after variable selection and feature ranking, we generated revised sparse models of BFI prediction that are based on only six catchment attributes. These key predictive variables selected after the careful evaluation of bias-variance tradeoff include average catchment elevation, slope, fraction of sand, permeability, temperature, and precipitation. The most promising algorithms exceeding an accuracy score (r-square) of 0.7 on test data include support vector machine, gradient boosted regression trees, random forests, and extremely randomized trees. Considering both the accuracy and the computational complexity of these algorithms, we identify the extremely randomized trees as the best performing algorithm for BFI prediction in ungauged basins.
Predicting recycling behaviour: Comparison of a linear regression model and a fuzzy logic model.
Vesely, Stepan; Klöckner, Christian A; Dohnal, Mirko
2016-03-01
In this paper we demonstrate that fuzzy logic can provide a better tool for predicting recycling behaviour than the customarily used linear regression. To show this, we take a set of empirical data on recycling behaviour (N=664), which we randomly divide into two halves. The first half is used to estimate a linear regression model of recycling behaviour, and to develop a fuzzy logic model of recycling behaviour. As the first comparison, the fit of both models to the data included in estimation of the models (N=332) is evaluated. As the second comparison, predictive accuracy of both models for "new" cases (hold-out data not included in building the models, N=332) is assessed. In both cases, the fuzzy logic model significantly outperforms the regression model in terms of fit. To conclude, when accurate predictions of recycling and possibly other environmental behaviours are needed, fuzzy logic modelling seems to be a promising technique. Copyright © 2015 Elsevier Ltd. All rights reserved.
Strand, Matthew; Sillau, Stefan; Grunwald, Gary K; Rabinovitch, Nathan
2014-02-10
Regression calibration provides a way to obtain unbiased estimators of fixed effects in regression models when one or more predictors are measured with error. Recent development of measurement error methods has focused on models that include interaction terms between measured-with-error predictors, and separately, methods for estimation in models that account for correlated data. In this work, we derive explicit and novel forms of regression calibration estimators and associated asymptotic variances for longitudinal models that include interaction terms, when data from instrumental and unbiased surrogate variables are available but not the actual predictors of interest. The longitudinal data are fit using linear mixed models that contain random intercepts and account for serial correlation and unequally spaced observations. The motivating application involves a longitudinal study of exposure to two pollutants (predictors) - outdoor fine particulate matter and cigarette smoke - and their association in interactive form with levels of a biomarker of inflammation, leukotriene E4 (LTE 4 , outcome) in asthmatic children. Because the exposure concentrations could not be directly observed, we used measurements from a fixed outdoor monitor and urinary cotinine concentrations as instrumental variables, and we used concentrations of fine ambient particulate matter and cigarette smoke measured with error by personal monitors as unbiased surrogate variables. We applied the derived regression calibration methods to estimate coefficients of the unobserved predictors and their interaction, allowing for direct comparison of toxicity of the different pollutants. We used simulations to verify accuracy of inferential methods based on asymptotic theory. Copyright © 2013 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
Pham, Binh Thai; Prakash, Indra; Tien Bui, Dieu
2018-02-01
A hybrid machine learning approach of Random Subspace (RSS) and Classification And Regression Trees (CART) is proposed to develop a model named RSSCART for spatial prediction of landslides. This model is a combination of the RSS method which is known as an efficient ensemble technique and the CART which is a state of the art classifier. The Luc Yen district of Yen Bai province, a prominent landslide prone area of Viet Nam, was selected for the model development. Performance of the RSSCART model was evaluated through the Receiver Operating Characteristic (ROC) curve, statistical analysis methods, and the Chi Square test. Results were compared with other benchmark landslide models namely Support Vector Machines (SVM), single CART, Naïve Bayes Trees (NBT), and Logistic Regression (LR). In the development of model, ten important landslide affecting factors related with geomorphology, geology and geo-environment were considered namely slope angles, elevation, slope aspect, curvature, lithology, distance to faults, distance to rivers, distance to roads, and rainfall. Performance of the RSSCART model (AUC = 0.841) is the best compared with other popular landslide models namely SVM (0.835), single CART (0.822), NBT (0.821), and LR (0.723). These results indicate that performance of the RSSCART is a promising method for spatial landslide prediction.
Austin, Peter C.; van Klaveren, David; Vergouwe, Yvonne; Nieboer, Daan; Lee, Douglas S.; Steyerberg, Ewout W.
2018-01-01
Background Stability in baseline risk and estimated predictor effects both geographically and temporally is a desirable property of clinical prediction models. However, this issue has received little attention in the methodological literature. Our objective was to examine methods for assessing temporal and geographic heterogeneity in baseline risk and predictor effects in prediction models. Methods We studied 14,857 patients hospitalized with heart failure at 90 hospitals in Ontario, Canada, in two time periods. We focussed on geographic and temporal variation in baseline risk (intercept) and predictor effects (regression coefficients) of the EFFECT-HF mortality model for predicting 1-year mortality in patients hospitalized for heart failure. We used random effects logistic regression models for the 14,857 patients. Results The baseline risk of mortality displayed moderate geographic variation, with the hospital-specific probability of 1-year mortality for a reference patient lying between 0.168 and 0.290 for 95% of hospitals. Furthermore, the odds of death were 11% lower in the second period than in the first period. However, we found minimal geographic or temporal variation in predictor effects. Among 11 tests of differences in time for predictor variables, only one had a modestly significant P value (0.03). Conclusions This study illustrates how temporal and geographic heterogeneity of prediction models can be assessed in settings with a large sample of patients from a large number of centers at different time periods. PMID:29350215
Cooley, Richard L.
1982-01-01
Prior information on the parameters of a groundwater flow model can be used to improve parameter estimates obtained from nonlinear regression solution of a modeling problem. Two scales of prior information can be available: (1) prior information having known reliability (that is, bias and random error structure) and (2) prior information consisting of best available estimates of unknown reliability. A regression method that incorporates the second scale of prior information assumes the prior information to be fixed for any particular analysis to produce improved, although biased, parameter estimates. Approximate optimization of two auxiliary parameters of the formulation is used to help minimize the bias, which is almost always much smaller than that resulting from standard ridge regression. It is shown that if both scales of prior information are available, then a combined regression analysis may be made.
Sahebkar, Amirhossein; Simental-Mendía, Luis E; Giorgini, Paolo; Ferri, Claudio; Grassi, Davide
2016-10-15
Transport of oxidized low-density lipoprotein across the endothelium into the artery wall is considered a fundamental priming step for the atherosclerotic process. Recent studies reported potential therapeutic effects of micronutrients found in natural products, indicating positive applications for controlling the pathogenesis of chronic cardiovascular disease driven by cardiovascular risk factors and oxidative stress. A particular attention has been recently addressed to pomegranate; however findings of clinical studies have been contrasting. To evaluate the effects of pomegranate consumption on plasma lipid concentrations through a systematic review and meta-analysis of randomized controlled trials (RCTs). The study was designed according to the preferred reporting items for systematic reviews and meta-analysis (PRISMA) statement. Scopus and Medline databases were searched to identify randomized placebo-controlled trials investigating the impact of pomegranate on plasma lipid concentrations. A fixed-effects model and the generic inverse variance method were used for quantitative data synthesis. Sensitivity analysis was conducted using the one-study remove approach. Random-effects meta-regression was performed to assess the impact of potential confounders on the estimated effect sizes. A total of 545 individuals were recruited from the 12 RCTs. Fixed-effect meta-analysis of data from 12 RCTs (13 treatment arms) did not show any significant effect of pomegranate consumption on plasma lipid concentrations. The results of meta-regression did not suggest any significant association between duration of supplementation and impact of pomegranate on total cholesterol and HDL-C, while an inverse association was found with changes in triglycerides levels (slope: -1.07; 95% CI: -2.03 to -0.11; p = 0.029). There was no association between the amount of pomegranate juice consumed per day and respective changes in plasma total cholesterol, LDL-C, HDL-C and triglycerides. The present meta-analysis of RCTs did not suggest any effect of pomegranate consumption on lipid profile in human. Copyright © 2016. Published by Elsevier GmbH.
Particulate air pollution and panel studies in children: a systematic review
Ward, D; Ayres, J
2004-01-01
Aims: To systematically review the results of such studies in children, estimate summary measures of effect, and investigate potential sources of heterogeneity. Methods: Studies were identified by searching electronic databases to June 2002, including those where outcomes and particulate level measurements were made at least daily for ⩾8 weeks, and analysed using an appropriate regression model. Study results were compared using forest plots, and fixed and random effects summary effect estimates obtained. Publication bias was considered using a funnel plot. Results: Twenty two studies were identified, all except two reporting PM10 (24 hour mean) >50 µg.m-3. Reported effects of PM10 on PEF were widely spread and smaller than those for PM2.5 (fixed effects summary: -0.012 v -0.063 l.min-1 per µg.m-3 rise). A similar pattern was evident for symptoms. Random effects models produced larger estimates. Overall, in between-study comparisons, panels of children with diagnosed asthma or pre-existing respiratory symptoms appeared less affected by PM10 levels than those without, and effect estimates were larger where studies were conducted in higher ozone conditions. Larger PM10 effect estimates were obtained from studies using generalised estimating equations to model autocorrelation and where results were derived by pooling subject specific regression coefficients. A funnel plot of PM10 results for PEF was markedly asymmetrical. Conclusions: The majority of identified studies indicate an adverse effect of particulate air pollution that is greater for PM2.5 than PM10. However, results show considerable heterogeneity and there is evidence consistent with publication bias, so limited confidence may be placed on summary estimates of effect. The possibility of interaction between particle and ozone effects merits further investigation, as does variability due to analytical differences that alter the interpretation of final estimates. PMID:15031404
MoghaddamHosseini, Vahideh; Nazarzadeh, Milad; Jahanfar, Shayesteh
2017-11-07
Fear of childbirth is a problematic mental health issue during pregnancy. But, effective interventions to reduce this problem are not well understood. To examine effective interventions for reducing fear of childbirth. The Cochrane Central Register of Controlled Trials, PubMed, Embase and PsycINFO were searched since inception till September 2017 without any restriction. Randomised controlled trials and quasi-randomised controlled trials comparing interventions for treatment of fear of childbirth were included. The standardized mean differences were pooled using random and fixed effect models. The heterogeneity was determined using the Cochran's test and I 2 index and was further explored in meta-regression model and subgroup analyses. Ten studies inclusive of 3984 participants were included in the meta-analysis (2 quasi-randomized and 8 randomized clinical trials). Eight studies investigated education and two studies investigated hypnosis-based intervention. The pooled standardized mean differences of fear for the education intervention and hypnosis group in comparison with control group were -0.46 (95% CI -0.73 to -0.19) and -0.22 (95% CI -0.34 to -0.10), respectively. Both types of interventions were effective in reducing fear of childbirth; however our pooled results revealed that educational interventions may reduce fear with double the effect of hypnosis. Further large scale randomized clinical trials and individual patient data meta-analysis are warranted for assessing the association. Copyright © 2017 Australian College of Midwives. Published by Elsevier Ltd. All rights reserved.
Hao, Xu; Yujun, Sun; Xinjie, Wang; Jin, Wang; Yao, Fu
2015-01-01
A multiple linear model was developed for individual tree crown width of Cunninghamia lanceolata (Lamb.) Hook in Fujian province, southeast China. Data were obtained from 55 sample plots of pure China-fir plantation stands. An Ordinary Linear Least Squares (OLS) regression was used to establish the crown width model. To adjust for correlations between observations from the same sample plots, we developed one level linear mixed-effects (LME) models based on the multiple linear model, which take into account the random effects of plots. The best random effects combinations for the LME models were determined by the Akaike's information criterion, the Bayesian information criterion and the -2logarithm likelihood. Heteroscedasticity was reduced by three residual variance functions: the power function, the exponential function and the constant plus power function. The spatial correlation was modeled by three correlation structures: the first-order autoregressive structure [AR(1)], a combination of first-order autoregressive and moving average structures [ARMA(1,1)], and the compound symmetry structure (CS). Then, the LME model was compared to the multiple linear model using the absolute mean residual (AMR), the root mean square error (RMSE), and the adjusted coefficient of determination (adj-R2). For individual tree crown width models, the one level LME model showed the best performance. An independent dataset was used to test the performance of the models and to demonstrate the advantage of calibrating LME models.
Leveraging prognostic baseline variables to gain precision in randomized trials
Colantuoni, Elizabeth; Rosenblum, Michael
2015-01-01
We focus on estimating the average treatment effect in a randomized trial. If baseline variables are correlated with the outcome, then appropriately adjusting for these variables can improve precision. An example is the analysis of covariance (ANCOVA) estimator, which applies when the outcome is continuous, the quantity of interest is the difference in mean outcomes comparing treatment versus control, and a linear model with only main effects is used. ANCOVA is guaranteed to be at least as precise as the standard unadjusted estimator, asymptotically, under no parametric model assumptions and also is locally semiparametric efficient. Recently, several estimators have been developed that extend these desirable properties to more general settings that allow any real-valued outcome (e.g., binary or count), contrasts other than the difference in mean outcomes (such as the relative risk), and estimators based on a large class of generalized linear models (including logistic regression). To the best of our knowledge, we give the first simulation study in the context of randomized trials that compares these estimators. Furthermore, our simulations are not based on parametric models; instead, our simulations are based on resampling data from completed randomized trials in stroke and HIV in order to assess estimator performance in realistic scenarios. We provide practical guidance on when these estimators are likely to provide substantial precision gains and describe a quick assessment method that allows clinical investigators to determine whether these estimators could be useful in their specific trial contexts. PMID:25872751
NASA Astrophysics Data System (ADS)
Houborg, Rasmus; McCabe, Matthew F.
2018-01-01
With an increasing volume and dimensionality of Earth observation data, enhanced integration of machine-learning methodologies is needed to effectively analyze and utilize these information rich datasets. In machine-learning, a training dataset is required to establish explicit associations between a suite of explanatory 'predictor' variables and the target property. The specifics of this learning process can significantly influence model validity and portability, with a higher generalization level expected with an increasing number of observable conditions being reflected in the training dataset. Here we propose a hybrid training approach for leaf area index (LAI) estimation, which harnesses synergistic attributes of scattered in-situ measurements and systematically distributed physically based model inversion results to enhance the information content and spatial representativeness of the training data. To do this, a complimentary training dataset of independent LAI was derived from a regularized model inversion of RapidEye surface reflectances and subsequently used to guide the development of LAI regression models via Cubist and random forests (RF) decision tree methods. The application of the hybrid training approach to a broad set of Landsat 8 vegetation index (VI) predictor variables resulted in significantly improved LAI prediction accuracies and spatial consistencies, relative to results relying on in-situ measurements alone for model training. In comparing the prediction capacity and portability of the two machine-learning algorithms, a pair of relatively simple multi-variate regression models established by Cubist performed best, with an overall relative mean absolute deviation (rMAD) of ∼11%, determined based on a stringent scene-specific cross-validation approach. In comparison, the portability of RF regression models was less effective (i.e., an overall rMAD of ∼15%), which was attributed partly to model saturation at high LAI in association with inherent extrapolation and transferability limitations. Explanatory VIs formed from bands in the near-infrared (NIR) and shortwave infrared domains (e.g., NDWI) were associated with the highest predictive ability, whereas Cubist models relying entirely on VIs based on NIR and red band combinations (e.g., NDVI) were associated with comparatively high uncertainties (i.e., rMAD ∼ 21%). The most transferable and best performing models were based on combinations of several predictor variables, which included both NDWI- and NDVI-like variables. In this process, prior screening of input VIs based on an assessment of variable relevance served as an effective mechanism for optimizing prediction accuracies from both Cubist and RF. While this study demonstrated benefit in combining data mining operations with physically based constraints via a hybrid training approach, the concept of transferability and portability warrants further investigations in order to realize the full potential of emerging machine-learning techniques for regression purposes.
Using regression methods to estimate stream phosphorus loads at the Illinois River, Arkansas
Haggard, B.E.; Soerens, T.S.; Green, W.R.; Richards, R.P.
2003-01-01
The development of total maximum daily loads (TMDLs) requires evaluating existing constituent loads in streams. Accurate estimates of constituent loads are needed to calibrate watershed and reservoir models for TMDL development. The best approach to estimate constituent loads is high frequency sampling, particularly during storm events, and mass integration of constituents passing a point in a stream. Most often, resources are limited and discrete water quality samples are collected on fixed intervals and sometimes supplemented with directed sampling during storm events. When resources are limited, mass integration is not an accurate means to determine constituent loads and other load estimation techniques such as regression models are used. The objective of this work was to determine a minimum number of water-quality samples needed to provide constituent concentration data adequate to estimate constituent loads at a large stream. Twenty sets of water quality samples with and without supplemental storm samples were randomly selected at various fixed intervals from a database at the Illinois River, northwest Arkansas. The random sets were used to estimate total phosphorus (TP) loads using regression models. The regression-based annual TP loads were compared to the integrated annual TP load estimated using all the data. At a minimum, monthly sampling plus supplemental storm samples (six samples per year) was needed to produce a root mean square error of less than 15%. Water quality samples should be collected at least semi-monthly (every 15 days) in studies less than two years if seasonal time factors are to be used in the regression models. Annual TP loads estimated from independently collected discrete water quality samples further demonstrated the utility of using regression models to estimate annual TP loads in this stream system.
Arevalillo, Jorge M; Sztein, Marcelo B; Kotloff, Karen L; Levine, Myron M; Simon, Jakub K
2017-10-01
Immunologic correlates of protection are important in vaccine development because they give insight into mechanisms of protection, assist in the identification of promising vaccine candidates, and serve as endpoints in bridging clinical vaccine studies. Our goal is the development of a methodology to identify immunologic correlates of protection using the Shigella challenge as a model. The proposed methodology utilizes the Random Forests (RF) machine learning algorithm as well as Classification and Regression Trees (CART) to detect immune markers that predict protection, identify interactions between variables, and define optimal cutoffs. Logistic regression modeling is applied to estimate the probability of protection and the confidence interval (CI) for such a probability is computed by bootstrapping the logistic regression models. The results demonstrate that the combination of Classification and Regression Trees and Random Forests complements the standard logistic regression and uncovers subtle immune interactions. Specific levels of immunoglobulin IgG antibody in blood on the day of challenge predicted protection in 75% (95% CI 67-86). Of those subjects that did not have blood IgG at or above a defined threshold, 100% were protected if they had IgA antibody secreting cells above a defined threshold. Comparison with the results obtained by applying only logistic regression modeling with standard Akaike Information Criterion for model selection shows the usefulness of the proposed method. Given the complexity of the immune system, the use of machine learning methods may enhance traditional statistical approaches. When applied together, they offer a novel way to quantify important immune correlates of protection that may help the development of vaccines. Copyright © 2017 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Lin, M.; Yang, Z.; Park, H.; Qian, S.; Chen, J.; Fan, P.
2017-12-01
Impervious surface area (ISA) has become an important indicator for studying urban environments, but mapping ISA at the regional or global scale is still challenging due to the complexity of impervious surface features. The Defense Meteorological Satellite Program's Operational Linescan System (DMSP-OLS) nighttime light data is (NTL) and Resolution Imaging Spectroradiometer (MODIS) are the major remote sensing data source for regional ISA mapping. A single regression relationship between fractional ISA and NTL or various index derived based on NTL and MODIS vegetation index (NDVI) data was established in many previous studies for regional ISA mapping. However, due to the varying geographical, climatic, and socio-economic characteristics of different cities, the same regression relationship may vary significantly across different cities in the same region in terms of both fitting performance (i.e. R2) and the rate of change (Slope). In this study, we examined the regression relationship between fractional ISA and Vegetation Adjusted Nighttime light Urban Index (VANUI) for 120 randomly selected cities around the world with a multilevel regression model. We found that indeed there is substantial variability of both the R2 (0.68±0.29) and slopes (0.64±0.40) among individual regressions, which suggests that multilevel/hierarchical models are needed for accuracy improvement of future regional ISA mapping .Further analysis also let us find the this substantial variability are affected by climate conditions, socio-economic status, and urban spatial structures. However, all these effects are nonlinear rather than linear, thus could not modeled explicitly in multilevel linear regression models.
Assessing the predictive capability of randomized tree-based ensembles in streamflow modelling
NASA Astrophysics Data System (ADS)
Galelli, S.; Castelletti, A.
2013-02-01
Combining randomization methods with ensemble prediction is emerging as an effective option to balance accuracy and computational efficiency in data-driven modeling. In this paper we investigate the prediction capability of extremely randomized trees (Extra-Trees), in terms of accuracy, explanation ability and computational efficiency, in a streamflow modeling exercise. Extra-Trees are a totally randomized tree-based ensemble method that (i) alleviates the poor generalization property and tendency to overfitting of traditional standalone decision trees (e.g. CART); (ii) is computationally very efficient; and, (iii) allows to infer the relative importance of the input variables, which might help in the ex-post physical interpretation of the model. The Extra-Trees potential is analyzed on two real-world case studies (Marina catchment (Singapore) and Canning River (Western Australia)) representing two different morphoclimatic contexts comparatively with other tree-based methods (CART and M5) and parametric data-driven approaches (ANNs and multiple linear regression). Results show that Extra-Trees perform comparatively well to the best of the benchmarks (i.e. M5) in both the watersheds, while outperforming the other approaches in terms of computational requirement when adopted on large datasets. In addition, the ranking of the input variable provided can be given a physically meaningful interpretation.
Assessing the predictive capability of randomized tree-based ensembles in streamflow modelling
NASA Astrophysics Data System (ADS)
Galelli, S.; Castelletti, A.
2013-07-01
Combining randomization methods with ensemble prediction is emerging as an effective option to balance accuracy and computational efficiency in data-driven modelling. In this paper, we investigate the prediction capability of extremely randomized trees (Extra-Trees), in terms of accuracy, explanation ability and computational efficiency, in a streamflow modelling exercise. Extra-Trees are a totally randomized tree-based ensemble method that (i) alleviates the poor generalisation property and tendency to overfitting of traditional standalone decision trees (e.g. CART); (ii) is computationally efficient; and, (iii) allows to infer the relative importance of the input variables, which might help in the ex-post physical interpretation of the model. The Extra-Trees potential is analysed on two real-world case studies - Marina catchment (Singapore) and Canning River (Western Australia) - representing two different morphoclimatic contexts. The evaluation is performed against other tree-based methods (CART and M5) and parametric data-driven approaches (ANNs and multiple linear regression). Results show that Extra-Trees perform comparatively well to the best of the benchmarks (i.e. M5) in both the watersheds, while outperforming the other approaches in terms of computational requirement when adopted on large datasets. In addition, the ranking of the input variable provided can be given a physically meaningful interpretation.
Kupek, Emil
2006-03-15
Structural equation modelling (SEM) has been increasingly used in medical statistics for solving a system of related regression equations. However, a great obstacle for its wider use has been its difficulty in handling categorical variables within the framework of generalised linear models. A large data set with a known structure among two related outcomes and three independent variables was generated to investigate the use of Yule's transformation of odds ratio (OR) into Q-metric by (OR-1)/(OR+1) to approximate Pearson's correlation coefficients between binary variables whose covariance structure can be further analysed by SEM. Percent of correctly classified events and non-events was compared with the classification obtained by logistic regression. The performance of SEM based on Q-metric was also checked on a small (N = 100) random sample of the data generated and on a real data set. SEM successfully recovered the generated model structure. SEM of real data suggested a significant influence of a latent confounding variable which would have not been detectable by standard logistic regression. SEM classification performance was broadly similar to that of the logistic regression. The analysis of binary data can be greatly enhanced by Yule's transformation of odds ratios into estimated correlation matrix that can be further analysed by SEM. The interpretation of results is aided by expressing them as odds ratios which are the most frequently used measure of effect in medical statistics.
Application of linear regression analysis in accuracy assessment of rolling force calculations
NASA Astrophysics Data System (ADS)
Poliak, E. I.; Shim, M. K.; Kim, G. S.; Choo, W. Y.
1998-10-01
Efficient operation of the computational models employed in process control systems require periodical assessment of the accuracy of their predictions. Linear regression is proposed as a tool which allows separate systematic and random prediction errors from those related to measurements. A quantitative characteristic of the model predictive ability is introduced in addition to standard statistical tests for model adequacy. Rolling force calculations are considered as an example for the application. However, the outlined approach can be used to assess the performance of any computational model.
Davey, Calum; Aiken, Alexander M; Hayes, Richard J; Hargreaves, James R
2015-01-01
Introduction: Helminth (worm) infections cause morbidity among poor communities worldwide. An influential study conducted in Kenya in 1998–99 reported that a school-based drug-and-educational intervention had benefits for worm infections and school attendance. Methods: In this statistical replication, we re-analysed data from this cluster quasi-randomized stepped-wedge trial, specifying two co-primary outcomes: school attendance and examination performance. We estimated intention-to-treat effects using year-stratified cluster-summary analysis and observation-level random-effects regression, and combined both years with a random-effects model accounting for year. The participants were not blinded to allocation status, and other interventions were concurrently conducted in a sub-set of schools. A protocol guiding outcome data collection was not available. Results: Quasi-randomization resulted in three similar groups of 25 schools. There was a substantial amount of missing data. In year-stratified cluster-summary analysis, there was no clear evidence for improvement in either school attendance or examination performance. In year-stratified regression models, there was some evidence of improvement in school attendance [adjusted odds ratios (aOR): year 1: 1.48, 95% confidence interval (CI) 0.88–2.52, P = 0.147; year 2: 1.23, 95% CI 1.01–1.51, P = 0.044], but not examination performance (adjusted differences: year 1: −0.135, 95% CI −0.323–0.054, P = 0.161; year 2: −0.017, 95% CI −0.201–0.166, P = 0.854). When both years were combined, there was strong evidence of an effect on attendance (aOR 1.82, 95% CI 1.74–1.91, P < 0.001), but not examination performance (adjusted difference −0.121, 95% CI −0.293–0.052, P = 0.169). Conclusions: The evidence supporting an improvement in school attendance differed by analysis method. This, and various other important limitations of the data, caution against over-interpretation of the results. We find that the study provides some evidence, but with high risk of bias, that a school-based drug-treatment and health-education intervention improved school attendance and no evidence of effect on examination performance. PMID:26203171
Canaza-Cayo, Ali William; Lopes, Paulo Sávio; da Silva, Marcos Vinicius Gualberto Barbosa; de Almeida Torres, Robledo; Martins, Marta Fonseca; Arbex, Wagner Antonio; Cobuci, Jaime Araujo
2015-01-01
A total of 32,817 test-day milk yield (TDMY) records of the first lactation of 4,056 Girolando cows daughters of 276 sires, collected from 118 herds between 2000 and 2011 were utilized to estimate the genetic parameters for TDMY via random regression models (RRM) using Legendre’s polynomial functions whose orders varied from 3 to 5. In addition, nine measures of persistency in milk yield (PSi) and the genetic trend of 305-day milk yield (305MY) were evaluated. The fit quality criteria used indicated RRM employing the Legendre’s polynomial of orders 3 and 5 for fitting the genetic additive and permanent environment effects, respectively, as the best model. The heritability and genetic correlation for TDMY throughout the lactation, obtained with the best model, varied from 0.18 to 0.23 and from −0.03 to 1.00, respectively. The heritability and genetic correlation for persistency and 305MY varied from 0.10 to 0.33 and from −0.98 to 1.00, respectively. The use of PS7 would be the most suitable option for the evaluation of Girolando cattle. The estimated breeding values for 305MY of sires and cows showed significant and positive genetic trends. Thus, the use of selection indices would be indicated in the genetic evaluation of Girolando cattle for both traits. PMID:26323397
Aspilcueta-Borquis, Rúsbel R; Araujo Neto, Francisco R; Baldi, Fernando; Santos, Daniel J A; Albuquerque, Lucia G; Tonhati, Humberto
2012-08-01
The test-day yields of milk, fat and protein were analysed from 1433 first lactations of buffaloes of the Murrah breed, daughters of 113 sires from 12 herds in the state of São Paulo, Brazil, born between 1985 and 2007. For the test-day yields, 10 monthly classes of lactation days were considered. The contemporary groups were defined as the herd-year-month of the test day. Random additive genetic, permanent environmental and residual effects were included in the model. The fixed effects considered were the contemporary group, number of milkings (1 or 2 milkings), linear and quadratic effects of the covariable cow age at calving and the mean lactation curve of the population (modelled by third-order Legendre orthogonal polynomials). The random additive genetic and permanent environmental effects were estimated by means of regression on third- to sixth-order Legendre orthogonal polynomials. The residual variances were modelled with a homogenous structure and various heterogeneous classes. According to the likelihood-ratio test, the best model for milk and fat production was that with four residual variance classes, while a third-order Legendre polynomial was best for the additive genetic effect for milk and fat yield, a fourth-order polynomial was best for the permanent environmental effect for milk production and a fifth-order polynomial was best for fat production. For protein yield, the best model was that with three residual variance classes and third- and fourth-order Legendre polynomials were best for the additive genetic and permanent environmental effects, respectively. The heritability estimates for the characteristics analysed were moderate, varying from 0·16±0·05 to 0·29±0·05 for milk yield, 0·20±0·05 to 0·30±0·08 for fat yield and 0·18±0·06 to 0·27±0·08 for protein yield. The estimates of the genetic correlations between the tests varied from 0·18±0·120 to 0·99±0·002; from 0·44±0·080 to 0·99±0·004; and from 0·41±0·080 to 0·99±0·004, for milk, fat and protein production, respectively, indicating that whatever the selection criterion used, indirect genetic gains can be expected throughout the lactation curve.
Estimating Unbiased Treatment Effects in Education Using a Regression Discontinuity Design
ERIC Educational Resources Information Center
Smith, William C.
2014-01-01
The ability of regression discontinuity (RD) designs to provide an unbiased treatment effect while overcoming the ethical concerns plagued by Random Control Trials (RCTs) make it a valuable and useful approach in education evaluation. RD is the only explicitly recognized quasi-experimental approach identified by the Institute of Education…
Sá, Michel Pompeu Barros de Oliveira; Ferraz, Paulo Ernando; Escobar, Rodrigo Renda; Martins, Wendell Nunes; Lustosa, Pablo César; Nunes, Eliobas de Oliveira; Vasconcelos, Frederico Pires; Lima, Ricardo Carvalho
2012-12-01
Most recent published meta-analysis of randomized controlled trials (RCTs) showed that off-pump coronary artery bypass graft surgery (CABG) reduces incidence of stroke by 30% compared with on-pump CABG, but showed no difference in other outcomes. New RCTs were published, indicating need of new meta-analysis to investigate pooled results adding these further studies. MEDLINE, EMBASE, CENTRAL/CCTR, SciELO, LILACS, Google Scholar and reference lists of relevant articles were searched for RCTs that compared outcomes (30-day mortality for all-cause, myocardial infarction or stroke) between off-pump versus on-pump CABG until May 2012. The principal summary measures were relative risk (RR) with 95% Confidence Interval (CI) and P values (considered statistically significant when <0.05). The RR's were combined across studies using DerSimonian-Laird random effects weighted model. Meta-analysis and meta-regression were completed using the software Comprehensive Meta-Analysis version 2 (Biostat Inc., Englewood, New Jersey, USA). Forty-seven RCTs were identified and included 13,524 patients (6,758 for off-pump and 6,766 for on-pump CABG). There was no significant difference between off-pump and on-pump CABG groups in RR for 30-day mortality or myocardial infarction, but there was difference about stroke in favor to off-pump CABG (RR 0.793, 95% CI 0.660-0.920, P=0.049). It was observed no important heterogeneity of effects about any outcome, but it was observed publication bias about outcome "stroke". Meta-regression did not demonstrate influence of female gender, number of grafts or age in outcomes. Off-pump CABG reduces the incidence of post-operative stroke by 20.7% and has no substantial effect on mortality or myocardial infarction in comparison to on-pump CABG. Patient gender, number of grafts performed and age do not seem to explain the effect of off-pump CABG on mortality, myocardial infarction or stroke, respectively.
Saboori, S; Shab-Bidar, S; Speakman, J R; Yousefi Rad, E; Djafarian, K
2015-08-01
C-reactive protein (CRP), a marker of chronic inflammation, has a major role in the etiology of chronic disease. Vitamin E may have anti-inflammatory effects. However, there is no consensus on the effects of vitamin E supplementation on CRP levels in clinical trials. The aim of this study was to systematically review randomized controlled trials (RCTs) that report on the effects of vitamin E supplementation (α- and γ-tocopherols) on CRP levels. A systematic search of RCTs was conducted on Medline and EMBASE through PubMed, Scopus, Ovid and Science Direct, and completed by a manual review of the literature up to May 2014. Pooled effects were estimated by using random-effects models and heterogeneity was assessed by Cochran's Q and I(2) tests. Subgroup analyses and meta-regression analyses were also performed according to intervention duration, dose of supplementation and baseline level of CRP. Of 4734 potentially relevant studies, only 12 trials met the inclusion criteria with 246 participants in the intervention arms and 249 participants in control arms. Pooled analysis showed a significant reduction in CRP levels of 0.62 mg/l (95% confidence interval = -0.92, -0.31; P < 0.001) in vitamin E-treated individuals, with the evidence of heterogeneity across studies. This significant effect was maintained in all subgroups, although the univariate meta-regression analysis showed that the vitamin E supplementation dose, baseline level of CRP and duration of intervention were not the sources of the observed heterogeneity. The results of this meta-analysis suggest that supplementation with vitamin E in the form of either α-tocopherol or γ-tocopherol would reduce serum CRP levels.
Serban, Corina; Sahebkar, Amirhossein; Ursoniu, Sorin; Andrica, Florina; Banach, Maciej
2015-06-01
Hibiscus sabdariffa L. is a tropical wild plant rich in organic acids, polyphenols, anthocyanins, polysaccharides, and volatile constituents that are beneficial for the cardiovascular system. Hibiscus sabdariffa beverages are commonly consumed to treat arterial hypertension, yet the evidence from randomized controlled trials (RCTs) has not been fully conclusive. Therefore, we aimed to assess the potential antihypertensive effects of H. sabdariffa through systematic review of literature and meta-analysis of available RCTs. The search included PUBMED, Cochrane Library, Scopus, and EMBASE (up to July 2014) to identify RCTs investigating the efficacy of H. sabdariffa supplementation on SBP and DBP values. Two independent reviewers extracted data on the study characteristics, methods, and outcomes. Quantitative data synthesis and meta-regression were performed using a fixed-effect model, and sensitivity analysis using leave-one-out method. Five RCTs (comprising seven treatment arms) were selected for the meta-analysis. In total, 390 participants were randomized, of whom 225 were allocated to the H. sabdariffa supplementation group and 165 to the control group in the selected studies. Fixed-effect meta-regression indicated a significant effect of H. sabdariffa supplementation in lowering both SBP (weighed mean difference -7.58 mmHg, 95% confidence interval -9.69 to -5.46, P < 0.00001) and DBP (weighed mean difference -3.53 mmHg, 95% confidence interval -5.16 to -1.89, P < 0.0001). These effects were inversely associated with baseline BP values, and were robust in sensitivity analyses. This meta-analysis of RCTs showed a significant effect of H. sabdariffa in lowering both SBP and DBP. Further well designed trials are necessary to validate these results.
NASA Astrophysics Data System (ADS)
Erener, Arzu; Sivas, A. Abdullah; Selcuk-Kestel, A. Sevtap; Düzgün, H. Sebnem
2017-07-01
All of the quantitative landslide susceptibility mapping (QLSM) methods requires two basic data types, namely, landslide inventory and factors that influence landslide occurrence (landslide influencing factors, LIF). Depending on type of landslides, nature of triggers and LIF, accuracy of the QLSM methods differs. Moreover, how to balance the number of 0 (nonoccurrence) and 1 (occurrence) in the training set obtained from the landslide inventory and how to select which one of the 1's and 0's to be included in QLSM models play critical role in the accuracy of the QLSM. Although performance of various QLSM methods is largely investigated in the literature, the challenge of training set construction is not adequately investigated for the QLSM methods. In order to tackle this challenge, in this study three different training set selection strategies along with the original data set is used for testing the performance of three different regression methods namely Logistic Regression (LR), Bayesian Logistic Regression (BLR) and Fuzzy Logistic Regression (FLR). The first sampling strategy is proportional random sampling (PRS), which takes into account a weighted selection of landslide occurrences in the sample set. The second method, namely non-selective nearby sampling (NNS), includes randomly selected sites and their surrounding neighboring points at certain preselected distances to include the impact of clustering. Selective nearby sampling (SNS) is the third method, which concentrates on the group of 1's and their surrounding neighborhood. A randomly selected group of landslide sites and their neighborhood are considered in the analyses similar to NNS parameters. It is found that LR-PRS, FLR-PRS and BLR-Whole Data set-ups, with order, yield the best fits among the other alternatives. The results indicate that in QLSM based on regression models, avoidance of spatial correlation in the data set is critical for the model's performance.
Teaching quality: High school students' autonomy and competence.
León, Jaime; Medina-Garrido, Elena; Ortega, Miriam
2018-05-01
How teachers manage class learning and interact with students affects students’ motivation and engagement. However, it could be that the effect of students’ representation of teaching quality on the students’ motivation varies between classes. Students from 90 classes participated in the study. We used multilevel random structural equation modeling to analyze whether the relationship of the students’ perception of teaching quality (as an indicator of the students’ mental representation) and students’ motivation varies between classes, and if this variability depends on the class assessment of teaching quality (as an indicator of teaching quality). The effect of teachers’ structure on the regression slope of student perception of student competence was .127. The effect of teachers’ autonomy support on the regression slope of student perception of student autonomy was .066. With this study we contribute a more detailed description of the relationship between teaching quality, competence and autonomy.
Zhou, Ling-Mei; Xu, Jia-Ying; Rao, Chun-Ping; Han, Shufen; Wan, Zhongxiao; Qin, Li-Qiang
2015-01-01
Whey supplementation is beneficial for human health, possibly by reducing the circulating C-reactive protein (CRP) level, a sensitive marker of inflammation. Thus, a meta-analysis of randomized controlled trials was conducted to evaluate their relationship. A systematic literature search was conducted in July, 2014, to identify eligible studies. Either a fixed-effects model or a random-effects model was used to calculate pooled effects. The meta-analysis results of nine trials showed a slight, but no significant, reduction of 0.42 mg/L (95% CI −0.96, 0.13) in CRP level with the supplementation of whey protein and its derivates. Relatively high heterogeneity across studies was observed. Subgroup analyses showed that whey significantly lowered CRP by 0.72 mg/L (95% CI −0.97, −0.47) among trials with a daily whey dose ≥20 g/day and by 0.67 mg/L (95% CI −1.21, −0.14) among trials with baseline CRP ≥3 mg/L. Meta-regression analysis revealed that the baseline CRP level was a potential effect modifier of whey supplementation in reducing CRP. In conclusion, our meta-analysis did not find sufficient evidence that whey and its derivates elicited a beneficial effect in reducing circulating CRP. However, they may significantly reduce CRP among participants with highly supplemental doses or increased baseline CRP levels. PMID:25671415
Huang, Lei
2015-01-01
To solve the problem in which the conventional ARMA modeling methods for gyro random noise require a large number of samples and converge slowly, an ARMA modeling method using a robust Kalman filtering is developed. The ARMA model parameters are employed as state arguments. Unknown time-varying estimators of observation noise are used to achieve the estimated mean and variance of the observation noise. Using the robust Kalman filtering, the ARMA model parameters are estimated accurately. The developed ARMA modeling method has the advantages of a rapid convergence and high accuracy. Thus, the required sample size is reduced. It can be applied to modeling applications for gyro random noise in which a fast and accurate ARMA modeling method is required. PMID:26437409
Siegel, Michael; DeJong, William; Albers, Alison B; Naimi, Timothy S; Jernigan, David H
2013-02-01
This study aims to compare the average price of liquor in the United States between retail alcohol outlets in states that have a monopoly ('control' states) with those that do not ('licence' states). A cross-sectional study of brand-specific alcohol prices in the United States. We determined the average prices in February 2012 of 74 brands of liquor among the 13 control states that maintain a monopoly on liquor sales at the retail level and among a sample of 50 license-state liquor stores, using their online-available prices. We calculated average prices for 74 brands of liquor by control versus license state. We used a random-effects regression model to estimate differences between control and license state prices-overall and by alcoholic beverage type. We also compared prices between the 13 control states. The overall mean price for the 74 brands was $27.79 in the license states [95% confidence interval (CI): $25.26-30.32] and $29.82 in the control states (95% CI: $26.98-32.66). Based on the random-effects linear regression model, the average liquor price was approximately $2 lower (6.9% lower) in license states. In the United States monopoly of alcohol retail outlets appears to be associated with slightly higher liquor prices. © 2012 The Authors, Addiction © 2012 Society for the Study of Addiction.
Possibility of modifying the growth trajectory in Raeini Cashmere goat.
Ghiasi, Heydar; Mokhtari, M S
2018-03-27
The objective of this study was to investigate the possibility of modifying the growth trajectory in Raeini Cashmere goat breed. In total, 13,193 records on live body weight collected from 4788 Raeini Cashmere goats were used. According to Akanke's information criterion (AIC), the sing-trait random regression model included fourth-order Legendre polynomial for direct and maternal genetic effect; maternal and individual permanent environmental effect was the best model for estimating (co)variance components. The matrices of eigenvectors for (co)variances between random regression coefficients of direct additive genetic were used to calculate eigenfunctions, and different eigenvector indices were also constructed. The obtained results showed that the first eigenvalue explained 79.90% of total genetic variance. Therefore, changing the body weights applying the first eigenfunction will be obtained rapidly. Selection based on the first eigenvector will cause favorable positive genetic gains for all body weight considered from birth to 12 months of age. For modifying the growth trajectory in Raeini Cashmere goat, the selection should be based on the second eigenfunction. The second eigenvalue accounted for 14.41% of total genetic variance for body weights that is low in comparison with genetic variance explained by the first eigenvalue. The complex patterns of genetic change in growth trajectory observed under the third and fourth eigenfunction and low amount of genetic variance explained by the third and fourth eigenvalues.
Siegel, Michael; DeJong, William; Albers, Alison B.; Naimi, Timothy S.; Jernigan, David H.
2012-01-01
Aims This study aims to compare the average price of liquor in the United States between retail alcohol outlets in states that have a monopoly ('control' states) with those that do not ('licence' states). Design A cross-sectional study of brand-specific alcohol prices in the United States. Setting We determined the average prices in February 2012 of 74 brands of liquor among the 13 control states that maintain a monopoly on liquor sales at the retail level and among a sample of 50 license-state liquor stores, using their online-available prices. Measurements We calculated average prices for 74 brands of liquor by control vs. license state. We used a random effects regression model to estimate differences between control and license state prices – overall and by alcoholic beverage type. We also compared prices between the 13 control states. Findings The overall mean price for the 74 brands was $27.79 in the license states (95% confidence interval [CI], $25.26–$30.32) and $29.82 in the control states (95% CI, $26.98–$32.66). Based on the random effects linear regression model, the average liquor price was approximately two dollars lower (6.9% lower) in license states. Conclusions In the United States monopoly of alcohol retail outlets appears to be associated with slightly higher liquor prices. PMID:22934914
Cao, Qingqing; Wu, Zhenqiang; Sun, Ying; Wang, Tiezhu; Han, Tengwei; Gu, Chaomei; Sun, Yehuan
2011-11-01
To Eexplore the application of negative binomial regression and modified Poisson regression analysis in analyzing the influential factors for injury frequency and the risk factors leading to the increase of injury frequency. 2917 primary and secondary school students were selected from Hefei by cluster random sampling method and surveyed by questionnaire. The data on the count event-based injuries used to fitted modified Poisson regression and negative binomial regression model. The risk factors incurring the increase of unintentional injury frequency for juvenile students was explored, so as to probe the efficiency of these two models in studying the influential factors for injury frequency. The Poisson model existed over-dispersion (P < 0.0001) based on testing by the Lagrangemultiplier. Therefore, the over-dispersion dispersed data using a modified Poisson regression and negative binomial regression model, was fitted better. respectively. Both showed that male gender, younger age, father working outside of the hometown, the level of the guardian being above junior high school and smoking might be the results of higher injury frequencies. On a tendency of clustered frequency data on injury event, both the modified Poisson regression analysis and negative binomial regression analysis can be used. However, based on our data, the modified Poisson regression fitted better and this model could give a more accurate interpretation of relevant factors affecting the frequency of injury.
Soil salinity study in Northern Great Plains sodium affected soil
NASA Astrophysics Data System (ADS)
Kharel, Tulsi P.
Climate and land-use changes when combined with the marine sediments that underlay portions of the Northern Great Plains have increased the salinization and sodification risks. The objectives of this dissertation were to compare three chemical amendments (calcium chloride, sulfuric acid and gypsum) remediation strategies on water permeability and sodium (Na) transport in undisturbed soil columns and to develop a remote sensing technique to characterize salinization in South Dakota soils. Forty-eight undisturbed soil columns (30 cm x 15 cm) collected from White Lake, Redfield, and Pierpont were used to assess the chemical remediation strategies. In this study the experimental design was a completely randomized design and each treatment was replicated four times. Following the application of chemical remediation strategies, 45.2 cm of water was leached through these columns. The leachate was separated into 120- ml increments and analyzed for Na and electrical conductivity (EC). Sulfuric acid increased Na leaching, whereas gypsum and CaCl2 increased water permeability. Our results further indicate that to maintain effective water permeability, ratio between soil EC and sodium absorption ratio (SAR) should be considered. In the second study, soil samples from 0-15 cm depth in 62 x 62 m grid spacing were taken from the South Dakota Pierpont (65 ha) and Redfield (17 ha) sites. Saturated paste EC was measured on each soil sample. At each sampling points reflectance and derived indices (Landsat 5, 7, 8 images), elevation, slope and aspect (LiDAR) were extracted. Regression models based on multiple linear regression, classification and regression tree, cubist, and random forest techniques were developed and their ability to predict soil EC were compared. Results showed that: 1) Random forest method was found to be the most effective method because of its ability to capture spatially correlated variation, 2) the short wave infrared (1.5 -2.29 mum) and near infrared (0.75-0.90 mum) were very sensitive to soil salinity; 3) EC prediction model using all 3 season (spring, summer and fall) images was better on state wide validation dataset compared to individual season model. Finally, in eastern South Dakota, the model predicted that from 2008 to 2012, EC increased in 569,165 ha or 13.4% of the land seeded to corn (Zea mays L.) or soybeans (Glycine max L).
Guerdoux, Estelle; Trouillet, Raphaël; Brouillet, Denis
2014-07-01
This study aimed to examine the age-related differences in the olfactory-visual cross-correspondences and the extent to which they are moderated by the odors pleasantness. Sixty participants aged from 20- to 75- years (young, middle-aged and older adults) performed a priming task to explore the influence of six olfactory primes (lemon, orange, rose, thyme, mint and fish) on the categorization (cool vs. warm) of six subsequent color targets (yellow, orange, pink, malachite green, grass-green, and blue-gray). We tested mixed effects models. Response times were regressed on covariates models using both fixed effects (Groups of age, olfactory Pleasantness and multimodal Condition) and cross-random effects (Subject, Color and Odor). The random effects coding for Odor (p < .001) and Color (p = .001) were significant. There was a significant interaction effect ( p= .004) between Condition × Pleasantness, but not with Groups of age. The compatibility effect (i.e., when odors and colors were congruent, the targets processing were facilitated) was as much enhanced as the olfactory primes were pleasant. Cross-correspondences between olfaction and vision may be robust in aging. They should be considered alongside spatiotemporal but also emotional congruency.
Miranda, J A; Pires, A V; Abreu, L R A; Mota, L F M; Silva, M A; Bonafé, C M; Lima, H J D; Martins, P G M A
2016-12-01
Our objective was to evaluate changes in breeding values for carcass traits of two meat-type quail (Coturnix coturnix) strains (LF1 and LF2) to changes in the dietary (methionine + cystine):lysine ([Met + Cys]:Lys) ratio due to genotype by environment (G × E) interaction via reaction norm. A total of 7000 records of carcass weight and yield were used for analyses. During the initial phase (from hatching to day 21), five diets with increasing (Met + Cys):Lys ratios (0.61, 0.66, 0.71, 0.76 and 0.81), containing 26.1% crude protein and 2900 kcal ME/kg, were evaluated. Analyses were performed using random regression models that included linear functions of sex (fixed effect) and breeding value (random effect) for carcass weight and yield, without and with heterogeneous residual variance adjustment. Both fixed and random effects were modelled using Legendre polynomials of second order. Genetic variance and heritability estimates were affected by both (Met + Cys):Lys ratio and strain. We observed that a G × E interaction was present, with changes in the breeding value ranking. Therefore, genetic evaluation for carcass traits should be performed under the same (Met + Cys):Lys ratio in which quails are raised. © 2016 Blackwell Verlag GmbH.
Lin, Chao-Cheng; Bai, Ya-Mei; Chen, Jen-Yeu; Hwang, Tzung-Jeng; Chen, Tzu-Ting; Chiu, Hung-Wen; Li, Yu-Chuan
2010-03-01
Metabolic syndrome (MetS) is an important side effect of second-generation antipsychotics (SGAs). However, many SGA-treated patients with MetS remain undetected. In this study, we trained and validated artificial neural network (ANN) and multiple logistic regression models without biochemical parameters to rapidly identify MetS in patients with SGA treatment. A total of 383 patients with a diagnosis of schizophrenia or schizoaffective disorder (DSM-IV criteria) with SGA treatment for more than 6 months were investigated to determine whether they met the MetS criteria according to the International Diabetes Federation. The data for these patients were collected between March 2005 and September 2005. The input variables of ANN and logistic regression were limited to demographic and anthropometric data only. All models were trained by randomly selecting two-thirds of the patient data and were internally validated with the remaining one-third of the data. The models were then externally validated with data from 69 patients from another hospital, collected between March 2008 and June 2008. The area under the receiver operating characteristic curve (AUC) was used to measure the performance of all models. Both the final ANN and logistic regression models had high accuracy (88.3% vs 83.6%), sensitivity (93.1% vs 86.2%), and specificity (86.9% vs 83.8%) to identify MetS in the internal validation set. The mean +/- SD AUC was high for both the ANN and logistic regression models (0.934 +/- 0.033 vs 0.922 +/- 0.035, P = .63). During external validation, high AUC was still obtained for both models. Waist circumference and diastolic blood pressure were the common variables that were left in the final ANN and logistic regression models. Our study developed accurate ANN and logistic regression models to detect MetS in patients with SGA treatment. The models are likely to provide a noninvasive tool for large-scale screening of MetS in this group of patients. (c) 2010 Physicians Postgraduate Press, Inc.
Kim, Dongchul; Kang, Mingon; Biswas, Ashis; Liu, Chunyu; Gao, Jean
2016-08-10
Inferring gene regulatory networks is one of the most interesting research areas in the systems biology. Many inference methods have been developed by using a variety of computational models and approaches. However, there are two issues to solve. First, depending on the structural or computational model of inference method, the results tend to be inconsistent due to innately different advantages and limitations of the methods. Therefore the combination of dissimilar approaches is demanded as an alternative way in order to overcome the limitations of standalone methods through complementary integration. Second, sparse linear regression that is penalized by the regularization parameter (lasso) and bootstrapping-based sparse linear regression methods were suggested in state of the art methods for network inference but they are not effective for a small sample size data and also a true regulator could be missed if the target gene is strongly affected by an indirect regulator with high correlation or another true regulator. We present two novel network inference methods based on the integration of three different criteria, (i) z-score to measure the variation of gene expression from knockout data, (ii) mutual information for the dependency between two genes, and (iii) linear regression-based feature selection. Based on these criterion, we propose a lasso-based random feature selection algorithm (LARF) to achieve better performance overcoming the limitations of bootstrapping as mentioned above. In this work, there are three main contributions. First, our z score-based method to measure gene expression variations from knockout data is more effective than similar criteria of related works. Second, we confirmed that the true regulator selection can be effectively improved by LARF. Lastly, we verified that an integrative approach can clearly outperform a single method when two different methods are effectively jointed. In the experiments, our methods were validated by outperforming the state of the art methods on DREAM challenge data, and then LARF was applied to inferences of gene regulatory network associated with psychiatric disorders.
Influence of Resistance Exercise on Lean Body Mass in Aging Adults: A Meta-Analysis
Peterson, Mark D.; Sen, Ananda; Gordon, Paul M.
2010-01-01
Purpose Sarcopenia plays a principal role in the pathogenesis of frailty and functional impairment that occurs with aging. There are few published accounts which examine the overall benefit of resistance exercise (RE) for lean body mass (LBM), while considering a continuum of dosage schemes and/or age ranges. Therefore the purpose of this meta-analysis was to determine the effects of RE on LBM in older men and women, while taking these factors into consideration. Methods This study followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses recommendations. Randomized controlled trials and randomized or non-randomized studies among adults ≥ 50 years, were included. Heterogeneity between studies was assessed using the Cochran Q and I2 statistics, and publication bias was evaluated through physical inspection of funnel plots as well as formal rank-correlation statistics. Mixed-effects meta-regression was incorporated to assess the relationship between RE dosage and changes in LBM. Results Data from forty-nine studies, representing a total of 1328 participants were pooled using random-effect models. Results demonstrated a positive effect for lean body mass and there was no evidence of publication bias. The Cochran Q statistic for heterogeneity was 497.8, which was significant (p < 0.01). Likewise, I2 was equal to 84%, representing rejection of the null hypothesis of homogeneity. The weighted pooled estimate of mean lean body mass change was 1.1 kg (95% CI, 0.9 kg to 1.2 kg). Meta-regression revealed that higher volume interventions were associated (β = 0.05, p < 0.01) with significantly greater increases in lean body mass, whereas older individuals experienced less increase (β = -0.03, p = 0.01). Conclusions RE is effective for eliciting gains in lean body mass among aging adults, particularly with higher volume programs. Findings suggest that RE participation earlier in life may provide superior effectiveness. PMID:20543750
Dietary interventions to prevent and manage diabetes in worksite settings: a meta-analysis.
Shrestha, Archana; Karmacharya, Biraj Man; Khudyakov, Polyna; Weber, Mary Beth; Spiegelman, Donna
2018-01-25
The translation of lifestyle intervention to improve glucose tolerance into the workplace has been rare. The objective of this meta-analysis is to summarize the evidence for the effectiveness of dietary interventions in worksite settings on lowering blood sugar levels. We searched for studies in PubMed, Embase, Econlit, Ovid, Cochrane, Web of Science, and Cumulative Index to Nursing and Allied Health Literature. Search terms were as follows: (1) Exposure-based: nutrition/diet/dietary intervention/health promotion/primary prevention/health behavior/health education/food /program evaluation; (2) Outcome-based: diabetes/hyperglycemia/glucose/HbA1c/glycated hemoglobin; and (3) Setting-based: workplace/worksite/occupational/industry/job/employee. We manually searched review articles and reference lists of articles identified from 1969 to December 2016. We tested for between-studies heterogeneity and calculated the pooled effect sizes for changes in HbA1c (%) and fasting glucose (mg/dl) using random effect models for meta-analysis in 2016. A total of 17 articles out of 1663 initially selected articles were included in the meta-analysis. With a random-effects model, worksite dietary interventions led to a pooled -0.18% (95% CI, -0.29 to -0.06; P<0.001) difference in HbA1c. With the random-effects model, the interventions resulted in 2.60 mg/dl lower fasting glucose with borderline significance (95% CI: -5.27 to 0.08, P=0.06). In the multivariate meta-regression model, the interventions with high percent of female participants and that used the intervention directly delivered to individuals, rather the environment changes, were associated with more effective interventions. Workplace dietary interventions can improve HbA1c. The effects were larger for the interventions with greater number of female participants and with individual-level interventions.
Effect of vitamins C and E on insulin resistance in diabetes: a meta-analysis study.
Khodaeian, Mehrnoosh; Tabatabaei-Malazy, Ozra; Qorbani, Mostafa; Farzadfar, Farshad; Amini, Peyvand; Larijani, Bagher
2015-11-01
Data regarding the effect of vitamin C (VC) and vitamin E (VE) supplementation on insulin resistance in type 2 diabetes mellitus (T2DM) are controversial. We aimed to systematically review the current data on this topic. All randomized controlled trials (RCTs) conducted to assess the effect of VC and/or VE on insulin resistance in diabetes published in Google Scholar and PubMed web databases until January 2014 were included. Exclusion criteria were studies conducted in animal, type 1 DM, children or pregnant women. Main outcome measure was insulin resistance by homoeostasis model assessment (HOMA) index. According to degree of heterogeneity, fixed- or random-effect model was employed by stata software (11.0). We selected 14 RCTs involving 735 patients with T2DM. VE or mixture-mode supplementation did not have any significant effect on HOMA with a standardized mean difference (SMD): 0·017, 95% CI: -0·376 to 0·411 (P = 0·932); and SMD: -0·035, 95% CI: -0·634 to 0·025 (P = 0·070), respectively, by random-effect model. VC supplement alone did not improve insulin resistance with a SMD: -0·150, 95% CI: -0·494 to 0·194 (P = 0·391), by fixed-effect model. Meta-regression test demonstrated that HOMA index may have not been influenced by the year of publication, dosage or duration of treatment. The sole intake of VC, VE or their combination with other antioxidants could not improve insulin resistance in diabetes. © 2015 Stichting European Society for Clinical Investigation Journal Foundation.
Generated effect modifiers (GEM’s) in randomized clinical trials
Petkova, Eva; Tarpey, Thaddeus; Su, Zhe; Ogden, R. Todd
2017-01-01
In a randomized clinical trial (RCT), it is often of interest not only to estimate the effect of various treatments on the outcome, but also to determine whether any patient characteristic has a different relationship with the outcome, depending on treatment. In regression models for the outcome, if there is a non-zero interaction between treatment and a predictor, that predictor is called an “effect modifier”. Identification of such effect modifiers is crucial as we move towards precision medicine, that is, optimizing individual treatment assignment based on patient measurements assessed when presenting for treatment. In most settings, there will be several baseline predictor variables that could potentially modify the treatment effects. This article proposes optimal methods of constructing a composite variable (defined as a linear combination of pre-treatment patient characteristics) in order to generate an effect modifier in an RCT setting. Several criteria are considered for generating effect modifiers and their performance is studied via simulations. An example from a RCT is provided for illustration. PMID:27465235
Development and validation of a mortality risk model for pediatric sepsis.
Chen, Mengshi; Lu, Xiulan; Hu, Li; Liu, Pingping; Zhao, Wenjiao; Yan, Haipeng; Tang, Liang; Zhu, Yimin; Xiao, Zhenghui; Chen, Lizhang; Tan, Hongzhuan
2017-05-01
Pediatric sepsis is a burdensome public health problem. Assessing the mortality risk of pediatric sepsis patients, offering effective treatment guidance, and improving prognosis to reduce mortality rates, are crucial.We extracted data derived from electronic medical records of pediatric sepsis patients that were collected during the first 24 hours after admission to the pediatric intensive care unit (PICU) of the Hunan Children's hospital from January 2012 to June 2014. A total of 788 children were randomly divided into a training (592, 75%) and validation group (196, 25%). The risk factors for mortality among these patients were identified by conducting multivariate logistic regression in the training group. Based on the established logistic regression equation, the logit probabilities for all patients (in both groups) were calculated to verify the model's internal and external validities.According to the training group, 6 variables (brain natriuretic peptide, albumin, total bilirubin, D-dimer, lactate levels, and mechanical ventilation in 24 hours) were included in the final logistic regression model. The areas under the curves of the model were 0.854 (0.826, 0.881) and 0.844 (0.816, 0.873) in the training and validation groups, respectively.The Mortality Risk Model for Pediatric Sepsis we established in this study showed acceptable accuracy to predict the mortality risk in pediatric sepsis patients.
Development and validation of a mortality risk model for pediatric sepsis
Chen, Mengshi; Lu, Xiulan; Hu, Li; Liu, Pingping; Zhao, Wenjiao; Yan, Haipeng; Tang, Liang; Zhu, Yimin; Xiao, Zhenghui; Chen, Lizhang; Tan, Hongzhuan
2017-01-01
Abstract Pediatric sepsis is a burdensome public health problem. Assessing the mortality risk of pediatric sepsis patients, offering effective treatment guidance, and improving prognosis to reduce mortality rates, are crucial. We extracted data derived from electronic medical records of pediatric sepsis patients that were collected during the first 24 hours after admission to the pediatric intensive care unit (PICU) of the Hunan Children's hospital from January 2012 to June 2014. A total of 788 children were randomly divided into a training (592, 75%) and validation group (196, 25%). The risk factors for mortality among these patients were identified by conducting multivariate logistic regression in the training group. Based on the established logistic regression equation, the logit probabilities for all patients (in both groups) were calculated to verify the model's internal and external validities. According to the training group, 6 variables (brain natriuretic peptide, albumin, total bilirubin, D-dimer, lactate levels, and mechanical ventilation in 24 hours) were included in the final logistic regression model. The areas under the curves of the model were 0.854 (0.826, 0.881) and 0.844 (0.816, 0.873) in the training and validation groups, respectively. The Mortality Risk Model for Pediatric Sepsis we established in this study showed acceptable accuracy to predict the mortality risk in pediatric sepsis patients. PMID:28514310
Ham, Joo-ho; Park, Hun-Young; Kim, Youn-ho; Bae, Sang-kon; Ko, Byung-hoon
2017-01-01
[Purpose] The purpose of this study was to develop a regression model to estimate the heart rate at the lactate threshold (HRLT) and the heart rate at the ventilatory threshold (HRVT) using the heart rate threshold (HRT), and to test the validity of the regression model. [Methods] We performed a graded exercise test with a treadmill in 220 normal individuals (men: 112, women: 108) aged 20–59 years. HRT, HRLT, and HRVT were measured in all subjects. A regression model was developed to estimate HRLT and HRVT using HRT with 70% of the data (men: 79, women: 76) through randomization (7:3), with the Bernoulli trial. The validity of the regression model developed with the remaining 30% of the data (men: 33, women: 32) was also examined. [Results] Based on the regression coefficient, we found that the independent variable HRT was a significant variable in all regression models. The adjusted R2 of the developed regression models averaged about 70%, and the standard error of estimation of the validity test results was 11 bpm, which is similar to that of the developed model. [Conclusion] These results suggest that HRT is a useful parameter for predicting HRLT and HRVT. PMID:29036765
Ham, Joo-Ho; Park, Hun-Young; Kim, Youn-Ho; Bae, Sang-Kon; Ko, Byung-Hoon; Nam, Sang-Seok
2017-09-30
The purpose of this study was to develop a regression model to estimate the heart rate at the lactate threshold (HRLT) and the heart rate at the ventilatory threshold (HRVT) using the heart rate threshold (HRT), and to test the validity of the regression model. We performed a graded exercise test with a treadmill in 220 normal individuals (men: 112, women: 108) aged 20-59 years. HRT, HRLT, and HRVT were measured in all subjects. A regression model was developed to estimate HRLT and HRVT using HRT with 70% of the data (men: 79, women: 76) through randomization (7:3), with the Bernoulli trial. The validity of the regression model developed with the remaining 30% of the data (men: 33, women: 32) was also examined. Based on the regression coefficient, we found that the independent variable HRT was a significant variable in all regression models. The adjusted R2 of the developed regression models averaged about 70%, and the standard error of estimation of the validity test results was 11 bpm, which is similar to that of the developed model. These results suggest that HRT is a useful parameter for predicting HRLT and HRVT. ©2017 The Korean Society for Exercise Nutrition
2015-01-07
vector that helps to manage , predict, and mitigate the risk in the original variable. Residual risk can be exemplified as a quantification of the improved... the random variable of interest is viewed in concert with a related random vector that helps to manage , predict, and mitigate the risk in the original...measures of risk. They view a random variable of interest in concert with an auxiliary random vector that helps to manage , predict and mitigate the risk
Threshold regression to accommodate a censored covariate.
Qian, Jing; Chiou, Sy Han; Maye, Jacqueline E; Atem, Folefac; Johnson, Keith A; Betensky, Rebecca A
2018-06-22
In several common study designs, regression modeling is complicated by the presence of censored covariates. Examples of such covariates include maternal age of onset of dementia that may be right censored in an Alzheimer's amyloid imaging study of healthy subjects, metabolite measurements that are subject to limit of detection censoring in a case-control study of cardiovascular disease, and progressive biomarkers whose baseline values are of interest, but are measured post-baseline in longitudinal neuropsychological studies of Alzheimer's disease. We propose threshold regression approaches for linear regression models with a covariate that is subject to random censoring. Threshold regression methods allow for immediate testing of the significance of the effect of a censored covariate. In addition, they provide for unbiased estimation of the regression coefficient of the censored covariate. We derive the asymptotic properties of the resulting estimators under mild regularity conditions. Simulations demonstrate that the proposed estimators have good finite-sample performance, and often offer improved efficiency over existing methods. We also derive a principled method for selection of the threshold. We illustrate the approach in application to an Alzheimer's disease study that investigated brain amyloid levels in older individuals, as measured through positron emission tomography scans, as a function of maternal age of dementia onset, with adjustment for other covariates. We have developed an R package, censCov, for implementation of our method, available at CRAN. © 2018, The International Biometric Society.
Li, Shao-Hua; Liu, Xu-Xia; Bai, Yong-Yi; Wang, Xiao-Jian; Sun, Kai; Chen, Jing-Zhou; Hui, Ru-Tai
2010-02-01
The effect of isoflavone on endothelial function in postmenopausal women is controversial. The objective of this study was to evaluate the effect of oral isoflavone supplementation on endothelial function, as measured by flow-mediated dilation (FMD), in postmenopausal women. A meta-analysis of randomized placebo-controlled trials was conducted to evaluate the effect of oral isoflavone supplementation on endothelial function in postmenopausal women. Trials were searched in PubMed, Embase, the Cochrane Library database, and reviews and reference lists of relevant articles. Summary estimates of weighted mean differences (WMDs) and 95% CIs were obtained by using random-effects models. Meta-regression and subgroup analyses were performed to identify the source of heterogeneity. A total of 9 trials were reviewed in the present meta-analysis. Overall, the results of the 9 trials showed that isoflavone significantly increased FMD (WMD: 1.75%; 95% CI: 0.83%, 2.67%; P = 0.0002). Meta-regression analysis indicated that the age-adjusted baseline FMD was inversely related to effect size. Subgroup analysis showed that oral supplementation of isoflavone had no influence on FMD if the age-adjusted baseline FMD was > or = 5.2% (4 trials; WMD: 0.24%; 95% CI: -0.94%, 1.42%; P = 0.69). This improvement seemed to be significant when the age-adjusted baseline FMD levels were <5.2% (5 trials; WMD: 2.22%; 95% CI: 1.15%, 3.30%; P < 0.0001), although significant heterogeneity was still detected in this low-baseline-FMD subgroup. Oral isoflavone supplementation does not improve endothelial function in postmenopausal women with high baseline FMD levels but leads to significant improvement in women with low baseline FMD levels.
Statistical power analyses using G*Power 3.1: tests for correlation and regression analyses.
Faul, Franz; Erdfelder, Edgar; Buchner, Axel; Lang, Albert-Georg
2009-11-01
G*Power is a free power analysis program for a variety of statistical tests. We present extensions and improvements of the version introduced by Faul, Erdfelder, Lang, and Buchner (2007) in the domain of correlation and regression analyses. In the new version, we have added procedures to analyze the power of tests based on (1) single-sample tetrachoric correlations, (2) comparisons of dependent correlations, (3) bivariate linear regression, (4) multiple linear regression based on the random predictor model, (5) logistic regression, and (6) Poisson regression. We describe these new features and provide a brief introduction to their scope and handling.
Use and interpretation of logistic regression in habitat-selection studies
Keating, Kim A.; Cherry, Steve
2004-01-01
Logistic regression is an important tool for wildlife habitat-selection studies, but the method frequently has been misapplied due to an inadequate understanding of the logistic model, its interpretation, and the influence of sampling design. To promote better use of this method, we review its application and interpretation under 3 sampling designs: random, case-control, and use-availability. Logistic regression is appropriate for habitat use-nonuse studies employing random sampling and can be used to directly model the conditional probability of use in such cases. Logistic regression also is appropriate for studies employing case-control sampling designs, but careful attention is required to interpret results correctly. Unless bias can be estimated or probability of use is small for all habitats, results of case-control studies should be interpreted as odds ratios, rather than probability of use or relative probability of use. When data are gathered under a use-availability design, logistic regression can be used to estimate approximate odds ratios if probability of use is small, at least on average. More generally, however, logistic regression is inappropriate for modeling habitat selection in use-availability studies. In particular, using logistic regression to fit the exponential model of Manly et al. (2002:100) does not guarantee maximum-likelihood estimates, valid probabilities, or valid likelihoods. We show that the resource selection function (RSF) commonly used for the exponential model is proportional to a logistic discriminant function. Thus, it may be used to rank habitats with respect to probability of use and to identify important habitat characteristics or their surrogates, but it is not guaranteed to be proportional to probability of use. Other problems associated with the exponential model also are discussed. We describe an alternative model based on Lancaster and Imbens (1996) that offers a method for estimating conditional probability of use in use-availability studies. Although promising, this model fails to converge to a unique solution in some important situations. Further work is needed to obtain a robust method that is broadly applicable to use-availability studies.
Hewitt, Angela L; Popa, Laurentiu S; Pasalar, Siavash; Hendrix, Claudia M; Ebner, Timothy J
2011-11-01
Encoding of movement kinematics in Purkinje cell simple spike discharge has important implications for hypotheses of cerebellar cortical function. Several outstanding questions remain regarding representation of these kinematic signals. It is uncertain whether kinematic encoding occurs in unpredictable, feedback-dependent tasks or kinematic signals are conserved across tasks. Additionally, there is a need to understand the signals encoded in the instantaneous discharge of single cells without averaging across trials or time. To address these questions, this study recorded Purkinje cell firing in monkeys trained to perform a manual random tracking task in addition to circular tracking and center-out reach. Random tracking provides for extensive coverage of kinematic workspaces. Direction and speed errors are significantly greater during random than circular tracking. Cross-correlation analyses comparing hand and target velocity profiles show that hand velocity lags target velocity during random tracking. Correlations between simple spike firing from 120 Purkinje cells and hand position, velocity, and speed were evaluated with linear regression models including a time constant, τ, as a measure of the firing lead/lag relative to the kinematic parameters. Across the population, velocity accounts for the majority of simple spike firing variability (63 ± 30% of R(adj)(2)), followed by position (28 ± 24% of R(adj)(2)) and speed (11 ± 19% of R(adj)(2)). Simple spike firing often leads hand kinematics. Comparison of regression models based on averaged vs. nonaveraged firing and kinematics reveals lower R(adj)(2) values for nonaveraged data; however, regression coefficients and τ values are highly similar. Finally, for most cells, model coefficients generated from random tracking accurately estimate simple spike firing in either circular tracking or center-out reach. These findings imply that the cerebellum controls movement kinematics, consistent with a forward internal model that predicts upcoming limb kinematics.
Nonconvex Sparse Logistic Regression With Weakly Convex Regularization
NASA Astrophysics Data System (ADS)
Shen, Xinyue; Gu, Yuantao
2018-06-01
In this work we propose to fit a sparse logistic regression model by a weakly convex regularized nonconvex optimization problem. The idea is based on the finding that a weakly convex function as an approximation of the $\\ell_0$ pseudo norm is able to better induce sparsity than the commonly used $\\ell_1$ norm. For a class of weakly convex sparsity inducing functions, we prove the nonconvexity of the corresponding sparse logistic regression problem, and study its local optimality conditions and the choice of the regularization parameter to exclude trivial solutions. Despite the nonconvexity, a method based on proximal gradient descent is used to solve the general weakly convex sparse logistic regression, and its convergence behavior is studied theoretically. Then the general framework is applied to a specific weakly convex function, and a necessary and sufficient local optimality condition is provided. The solution method is instantiated in this case as an iterative firm-shrinkage algorithm, and its effectiveness is demonstrated in numerical experiments by both randomly generated and real datasets.
Large unbalanced credit scoring using Lasso-logistic regression ensemble.
Wang, Hong; Xu, Qingsong; Zhou, Lifeng
2015-01-01
Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data.
Churilov, Leonid
2018-01-01
The hemodynamic effects of intravenous (IV) paracetamol in patients undergoing cardiac surgery are unknown. We performed a prospective single center placebo controlled randomized study with parallel group design in adult patients undergoing elective cardiac surgery. Participants received paracetamol (1 gram) IV or placebo (an equal volume of 0.9% saline) preoperatively followed by two postoperative doses 6 hours apart. The primary endpoint was the absolute change in systolic (SBP) 30 minutes after the preoperative infusion, analysed using an ANCOVA model. Secondary endpoints included absolute changes in mean arterial pressure (MAP) and diastolic blood pressure (DPB), and other key hemodynamic variables after each infusion. All other endpoints were analysed using random-effect generalized least squares regression modelling with individual patients treated as random effects. Fifty participants were randomly assigned to receive paracetamol (n = 25) or placebo (n = 25). Post preoperative infusion, paracetamol decreased SBP by a mean (SD) of 13 (18) mmHg, p = 0.02, compared to a mean (SD) of 1 (11) mmHg with saline. Paracetamol decreased MAP and DBP by a mean (SD) of 9 (12) mmHg and 8 (9) mmHg (p = 0.01 and 0.02), respectively, compared to a mean (SD) of 1 (8) mmHg and 0 (6) mmHg with placebo. Postoperatively, there were no significant differences in pressure or flow based hemodynamic parameters in both groups. This study provides high quality evidence that the administration of IV paracetamol in patients undergoing cardiac surgery causes a transient decrease in preoperative blood pressure when administered before surgery but no adverse hemodynamic effects when administered in the postoperative setting. PMID:29659631
Chiam, Elizabeth; Bellomo, Rinaldo; Churilov, Leonid; Weinberg, Laurence
2018-01-01
The hemodynamic effects of intravenous (IV) paracetamol in patients undergoing cardiac surgery are unknown. We performed a prospective single center placebo controlled randomized study with parallel group design in adult patients undergoing elective cardiac surgery. Participants received paracetamol (1 gram) IV or placebo (an equal volume of 0.9% saline) preoperatively followed by two postoperative doses 6 hours apart. The primary endpoint was the absolute change in systolic (SBP) 30 minutes after the preoperative infusion, analysed using an ANCOVA model. Secondary endpoints included absolute changes in mean arterial pressure (MAP) and diastolic blood pressure (DPB), and other key hemodynamic variables after each infusion. All other endpoints were analysed using random-effect generalized least squares regression modelling with individual patients treated as random effects. Fifty participants were randomly assigned to receive paracetamol (n = 25) or placebo (n = 25). Post preoperative infusion, paracetamol decreased SBP by a mean (SD) of 13 (18) mmHg, p = 0.02, compared to a mean (SD) of 1 (11) mmHg with saline. Paracetamol decreased MAP and DBP by a mean (SD) of 9 (12) mmHg and 8 (9) mmHg (p = 0.01 and 0.02), respectively, compared to a mean (SD) of 1 (8) mmHg and 0 (6) mmHg with placebo. Postoperatively, there were no significant differences in pressure or flow based hemodynamic parameters in both groups. This study provides high quality evidence that the administration of IV paracetamol in patients undergoing cardiac surgery causes a transient decrease in preoperative blood pressure when administered before surgery but no adverse hemodynamic effects when administered in the postoperative setting.
Does higher education protect against obesity? Evidence using Mendelian randomization.
Böckerman, Petri; Viinikainen, Jutta; Pulkki-Råback, Laura; Hakulinen, Christian; Pitkänen, Niina; Lehtimäki, Terho; Pehkonen, Jaakko; Raitakari, Olli T
2017-08-01
The aim of this explorative study was to examine the effect of education on obesity using Mendelian randomization. Participants (N=2011) were from the on-going nationally representative Young Finns Study (YFS) that began in 1980 when six cohorts (aged 30, 33, 36, 39, 42 and 45 in 2007) were recruited. The average value of BMI (kg/m 2 ) measurements in 2007 and 2011 and genetic information were linked to comprehensive register-based information on the years of education in 2007. We first used a linear regression (Ordinary Least Squares, OLS) to estimate the relationship between education and BMI. To identify a causal relationship, we exploited Mendelian randomization and used a genetic score as an instrument for education. The genetic score was based on 74 genetic variants that genome-wide association studies (GWASs) have found to be associated with the years of education. Because the genotypes are randomly assigned at conception, the instrument causes exogenous variation in the years of education and thus enables identification of causal effects. The years of education in 2007 were associated with lower BMI in 2007/2011 (regression coefficient (b)=-0.22; 95% Confidence Intervals [CI]=-0.29, -0.14) according to the linear regression results. The results based on Mendelian randomization suggests that there may be a negative causal effect of education on BMI (b=-0.84; 95% CI=-1.77, 0.09). The findings indicate that education could be a protective factor against obesity in advanced countries. Copyright © 2017 Elsevier Inc. All rights reserved.
Effects of greening and community reuse of vacant lots on crime
Kondo, Michelle; Hohl, Bernadette; Han, SeungHoon; Branas, Charles
2016-01-01
The Youngstown Neighborhood Development Corporation initiated a ‘Lots of Green’ programme to reuse vacant land in 2010. We performed a difference-in-differences analysis of the effects of this programme on crime in and around newly treated lots, in comparison to crimes in and around randomly selected and matched, untreated vacant lot controls. The effects of two types of vacant lot treatments on crime were tested: a cleaning and greening ‘stabilisation’ treatment and a ‘community reuse’ treatment mostly involving community gardens. The combined effects of both types of vacant lot treatments were also tested. After adjustment for various sociodemographic factors, linear and Poisson regression models demonstrated statistically significant reductions in all crime classes for at least one lot treatment type. Regression models adjusted for spatial autocorrelation found the most consistent significant reductions in burglaries around stabilisation lots, and in assaults around community reuse lots. Spill-over crime reduction effects were found in contiguous areas around newly treated lots. Significant increases in motor vehicle thefts around both types of lots were also found after they had been greened. Community-initiated vacant lot greening may have a greater impact on reducing more serious, violent crimes. PMID:28529389
Efficient strategies for leave-one-out cross validation for genomic best linear unbiased prediction.
Cheng, Hao; Garrick, Dorian J; Fernando, Rohan L
2017-01-01
A random multiple-regression model that simultaneously fit all allele substitution effects for additive markers or haplotypes as uncorrelated random effects was proposed for Best Linear Unbiased Prediction, using whole-genome data. Leave-one-out cross validation can be used to quantify the predictive ability of a statistical model. Naive application of Leave-one-out cross validation is computationally intensive because the training and validation analyses need to be repeated n times, once for each observation. Efficient Leave-one-out cross validation strategies are presented here, requiring little more effort than a single analysis. Efficient Leave-one-out cross validation strategies is 786 times faster than the naive application for a simulated dataset with 1,000 observations and 10,000 markers and 99 times faster with 1,000 observations and 100 markers. These efficiencies relative to the naive approach using the same model will increase with increases in the number of observations. Efficient Leave-one-out cross validation strategies are presented here, requiring little more effort than a single analysis.
Rivera, Margarita; Locke, Adam E.; Corre, Tanguy; Czamara, Darina; Wolf, Christiane; Ching-Lopez, Ana; Milaneschi, Yuri; Kloiber, Stefan; Cohen-Woods, Sara; Rucker, James; Aitchison, Katherine J.; Bergmann, Sven; Boomsma, Dorret I.; Craddock, Nick; Gill, Michael; Holsboer, Florian; Hottenga, Jouke-Jan; Korszun, Ania; Kutalik, Zoltan; Lucae, Susanne; Maier, Wolfgang; Mors, Ole; Müller-Myhsok, Bertram; Owen, Michael J.; Penninx, Brenda W. J. H.; Preisig, Martin; Rice, John; Rietschel, Marcella; Tozzi, Federica; Uher, Rudolf; Vollenweider, Peter; Waeber, Gerard; Willemsen, Gonneke; Craig, Ian W.; Farmer, Anne E.; Lewis, Cathryn M.; Breen, Gerome; McGuffin, Peter
2017-01-01
Background Depression and obesity are highly prevalent, and major impacts on public health frequently co-occur. Recently, we reported that having depression moderates the effect of the FTO gene, suggesting its implication in the association between depression and obesity. Aims To confirm these findings by investigating the FTO polymorphism rs9939609 in new cohorts, and subsequently in a meta-analysis. Method The sample consists of 6902 individuals with depression and 6799 controls from three replication cohorts and two original discovery cohorts. Linear regression models were performed to test for association between rs9939609 and body mass index (BMI), and for the interaction between rs9939609 and depression status for an effect on BMI. Fixed and random effects meta-analyses were performed using METASOFT. Results In the replication cohorts, we observed a significant interaction between FTO, BMI and depression with fixed effects meta-analysis (β = 0.12, P = 2.7 × 10−4) and with the Han/Eskin random effects method (P = 1.4 × 10−7) but not with traditional random effects (β = 0.1, P = 0.35). When combined with the discovery cohorts, random effects meta-analysis also supports the interaction (β = 0.12, P = 0.027) being highly significant based on the Han/Eskin model (P = 6.9 × 10−8). On average, carriers of the risk allele who have depression have a 2.2% higher BMI for each risk allele, over and above the main effect of FTO. Conclusions This meta-analysis provides additional support for a significant interaction between FTO, depression and BMI, indicating that depression increases the effect of FTO on BMI. The findings provide a useful starting point in understanding the biological mechanism involved in the association between obesity and depression. PMID:28642257
Gonzalez-Mulé, Erik; DeGeest, David S; McCormick, Brian W; Seong, Jee Young; Brown, Kenneth G
2014-09-01
Drawing on the group-norms theory of organizational citizenship behaviors and person-environment fit theory, we introduce and test a multilevel model of the effects of additive and dispersion composition models of team members' personality characteristics on group norms and individual helping behaviors. Our model was tested using regression and random coefficients modeling on 102 research and development teams. Results indicated that high mean levels of extraversion are positively related to individual helping behaviors through the mediating effect of cooperative group norms. Further, low variance on agreeableness (supplementary fit) and high variance on extraversion (complementary fit) promote the enactment of individual helping behaviors, but only the effects of extraversion were mediated by cooperative group norms. Implications of these findings for theories of helping behaviors in teams are discussed. PsycINFO Database Record (c) 2014 APA, all rights reserved.
Broderick, Joseph P; Berkhemer, Olvert A; Palesch, Yuko Y; Dippel, Diederik W J; Foster, Lydia D; Roos, Yvo B W E M; van der Lugt, Aad; Tomsick, Thomas A; Majoie, Charles B L M; van Zwam, Wim H; Demchuk, Andrew M; van Oostenbrugge, Robert J; Khatri, Pooja; Lingsma, Hester F; Hill, Michael D; Roozenbeek, Bob; Jauch, Edward C; Jovin, Tudor G; Yan, Bernard; von Kummer, Rüdiger; Molina, Carlos A; Goyal, Mayank; Schonewille, Wouter J; Mazighi, Mikael; Engelter, Stefan T; Anderson, Craig S; Spilker, Judith; Carrozzella, Janice; Ryckborst, Karla J; Janis, L Scott; Simpson, Kit N
2015-12-01
We assessed the effect of endovascular treatment in acute ischemic stroke patients with severe neurological deficit (National Institutes of Health Stroke Scale score, ≥20) after a prespecified analysis plan. The pooled analysis of the Interventional Management of Stroke III (IMS III) and Multicenter Randomized Clinical Trial of Endovascular Therapy for Acute Ischemic Stroke in the Netherlands (MR CLEAN) trials included participants with an National Institutes of Health Stroke Scale score of ≥20 before intravenous tissue-type plasminogen activator (tPA) treatment (IMS III) or randomization (MR CLEAN) who were treated with intravenous tPA ≤3 hours of stroke onset. Our hypothesis was that participants with severe stroke randomized to endovascular therapy after intravenous tPA would have improved 90-day outcome (distribution of modified Rankin Scale scores), when compared with those who received intravenous tPA alone. Among 342 participants in the pooled analysis (194 from IMS III and 148 from MR CLEAN), an ordinal logistic regression model showed that the endovascular group had superior 90-day outcome compared with the intravenous tPA group (adjusted odds ratio, 1.78; 95% confidence interval, 1.20-2.66). In the logistic regression model of the dichotomous outcome (modified Rankin Scale score, 0-2, or functional independence), the endovascular group had superior outcomes (adjusted odds ratio, 1.97; 95% confidence interval, 1.09-3.56). Functional independence (modified Rankin Scale score, ≤2) at 90 days was 25% in the endovascular group when compared with 14% in the intravenous tPA group. Endovascular therapy after intravenous tPA within 3 hours of symptom onset improves functional outcome at 90 days after severe ischemic stroke. URL: http://www.clinicaltrials.gov. Unique identifier: NCT00359424 (IMS III) and ISRCTN10888758 (MR CLEAN). © 2015 American Heart Association, Inc.
Bignardi, A B; El Faro, L; Rosa, G J M; Cardoso, V L; Machado, P F; Albuquerque, L G
2012-04-01
A total of 46,089 individual monthly test-day (TD) milk yields (10 test-days), from 7,331 complete first lactations of Holstein cattle were analyzed. A standard multivariate analysis (MV), reduced rank analyses fitting the first 2, 3, and 4 genetic principal components (PC2, PC3, PC4), and analyses that fitted a factor analytic structure considering 2, 3, and 4 factors (FAS2, FAS3, FAS4), were carried out. The models included the random animal genetic effect and fixed effects of the contemporary groups (herd-year-month of test-day), age of cow (linear and quadratic effects), and days in milk (linear effect). The residual covariance matrix was assumed to have full rank. Moreover, 2 random regression models were applied. Variance components were estimated by restricted maximum likelihood method. The heritability estimates ranged from 0.11 to 0.24. The genetic correlation estimates between TD obtained with the PC2 model were higher than those obtained with the MV model, especially on adjacent test-days at the end of lactation close to unity. The results indicate that for the data considered in this study, only 2 principal components are required to summarize the bulk of genetic variation among the 10 traits. Copyright © 2012 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Ferrer, Assumpta; Formiga, Francesc; Sanz, Héctor; de Vries, Oscar J; Badia, Teresa; Pujol, Ramón
2014-01-01
Background The purpose of this study was to assess the effectiveness of a multifactorial intervention to reduce falls among the oldest-old people, including individuals with cognitive impairment or comorbidities. Methods A randomized, single-blind, parallel-group clinical trial was conducted from January 2009 to December 2010 in seven primary health care centers in Baix Llobregat (Barcelona). Of 696 referred people who were born in 1924, 328 were randomized to an intervention group or a control group. The intervention model used an algorithm and was multifaceted for both patients and their primary care providers. Primary outcomes were risk of falling and time until falls. Data analyses were by intention-to-treat. Results Sixty-five (39.6%) subjects in the intervention group and 48 (29.3%) in the control group fell during follow-up. The difference in the risk of falls was not significant (relative risk 1.28, 95% confidence interval [CI] 0.94–1.75). Cox regression models with time from randomization to the first fall were not significant. Cox models for recurrent falls showed that intervention had a negative effect (hazard ratio [HR] 1.46, 95% CI 1.03–2.09) and that functional impairment (HR 1.42, 95% CI 0.97–2.12), previous falls (HR 1.09, 95% CI 0.74–1.60), and cognitive impairment (HR 1.08, 95% CI 0.72–1.60) had no effect on the assessment. Conclusion This multifactorial intervention among octogenarians, including individuals with cognitive impairment or comorbidities, did not result in a reduction in falls. A history of previous falls, disability, and cognitive impairment had no effect on the program among the community-dwelling subjects in this study. PMID:24596458
NeCamp, Timothy; Kilbourne, Amy; Almirall, Daniel
2017-08-01
Cluster-level dynamic treatment regimens can be used to guide sequential treatment decision-making at the cluster level in order to improve outcomes at the individual or patient-level. In a cluster-level dynamic treatment regimen, the treatment is potentially adapted and re-adapted over time based on changes in the cluster that could be impacted by prior intervention, including aggregate measures of the individuals or patients that compose it. Cluster-randomized sequential multiple assignment randomized trials can be used to answer multiple open questions preventing scientists from developing high-quality cluster-level dynamic treatment regimens. In a cluster-randomized sequential multiple assignment randomized trial, sequential randomizations occur at the cluster level and outcomes are observed at the individual level. This manuscript makes two contributions to the design and analysis of cluster-randomized sequential multiple assignment randomized trials. First, a weighted least squares regression approach is proposed for comparing the mean of a patient-level outcome between the cluster-level dynamic treatment regimens embedded in a sequential multiple assignment randomized trial. The regression approach facilitates the use of baseline covariates which is often critical in the analysis of cluster-level trials. Second, sample size calculators are derived for two common cluster-randomized sequential multiple assignment randomized trial designs for use when the primary aim is a between-dynamic treatment regimen comparison of the mean of a continuous patient-level outcome. The methods are motivated by the Adaptive Implementation of Effective Programs Trial which is, to our knowledge, the first-ever cluster-randomized sequential multiple assignment randomized trial in psychiatry.
Random regression analyses using B-splines to model growth of Australian Angus cattle
Meyer, Karin
2005-01-01
Regression on the basis function of B-splines has been advocated as an alternative to orthogonal polynomials in random regression analyses. Basic theory of splines in mixed model analyses is reviewed, and estimates from analyses of weights of Australian Angus cattle from birth to 820 days of age are presented. Data comprised 84 533 records on 20 731 animals in 43 herds, with a high proportion of animals with 4 or more weights recorded. Changes in weights with age were modelled through B-splines of age at recording. A total of thirteen analyses, considering different combinations of linear, quadratic and cubic B-splines and up to six knots, were carried out. Results showed good agreement for all ages with many records, but fluctuated where data were sparse. On the whole, analyses using B-splines appeared more robust against "end-of-range" problems and yielded more consistent and accurate estimates of the first eigenfunctions than previous, polynomial analyses. A model fitting quadratic B-splines, with knots at 0, 200, 400, 600 and 821 days and a total of 91 covariance components, appeared to be a good compromise between detailedness of the model, number of parameters to be estimated, plausibility of results, and fit, measured as residual mean square error. PMID:16093011
Evaluating differential effects using regression interactions and regression mixture models
Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung
2015-01-01
Research increasingly emphasizes understanding differential effects. This paper focuses on understanding regression mixture models, a relatively new statistical methods for assessing differential effects by comparing results to using an interactive term in linear regression. The research questions which each model answers, their formulation, and their assumptions are compared using Monte Carlo simulations and real data analysis. The capabilities of regression mixture models are described and specific issues to be addressed when conducting regression mixtures are proposed. The paper aims to clarify the role that regression mixtures can take in the estimation of differential effects and increase awareness of the benefits and potential pitfalls of this approach. Regression mixture models are shown to be a potentially effective exploratory method for finding differential effects when these effects can be defined by a small number of classes of respondents who share a typical relationship between a predictor and an outcome. It is also shown that the comparison between regression mixture models and interactions becomes substantially more complex as the number of classes increases. It is argued that regression interactions are well suited for direct tests of specific hypotheses about differential effects and regression mixtures provide a useful approach for exploring effect heterogeneity given adequate samples and study design. PMID:26556903
Santana, Mário L; Bignardi, Annaiza Braga; Pereira, Rodrigo Junqueira; Menéndez-Buxadera, Alberto; El Faro, Lenira
2016-02-01
The present study had the following objectives: to compare random regression models (RRM) considering the time-dependent (days in milk, DIM) and/or temperature × humidity-dependent (THI) covariate for genetic evaluation; to identify the effect of genotype by environment interaction (G×E) due to heat stress on milk yield; and to quantify the loss of milk yield due to heat stress across lactation of cows under tropical conditions. A total of 937,771 test-day records from 3603 first lactations of Brazilian Holstein cows obtained between 2007 and 2013 were analyzed. An important reduction in milk yield due to heat stress was observed for THI values above 66 (-0.23 kg/day/THI). Three phases of milk yield loss were identified during lactation, the most damaging one at the end of lactation (-0.27 kg/day/THI). Using the most complex RRM, the additive genetic variance could be altered simultaneously as a function of both DIM and THI values. This model could be recommended for the genetic evaluation taking into account the effect of G×E. The response to selection in the comfort zone (THI ≤ 66) is expected to be higher than that obtained in the heat stress zone (THI > 66) of the animals. The genetic correlations between milk yield in the comfort and heat stress zones were less than unity at opposite extremes of the environmental gradient. Thus, the best animals for milk yield in the comfort zone are not necessarily the best in the zone of heat stress and, therefore, G×E due to heat stress should not be neglected in the genetic evaluation.
Rotolo, Federico; Paoletti, Xavier; Michiels, Stefan
2018-03-01
Surrogate endpoints are attractive for use in clinical trials instead of well-established endpoints because of practical convenience. To validate a surrogate endpoint, two important measures can be estimated in a meta-analytic context when individual patient data are available: the R indiv 2 or the Kendall's τ at the individual level, and the R trial 2 at the trial level. We aimed at providing an R implementation of classical and well-established as well as more recent statistical methods for surrogacy assessment with failure time endpoints. We also intended incorporating utilities for model checking and visualization and data generating methods described in the literature to date. In the case of failure time endpoints, the classical approach is based on two steps. First, a Kendall's τ is estimated as measure of individual level surrogacy using a copula model. Then, the R trial 2 is computed via a linear regression of the estimated treatment effects; at this second step, the estimation uncertainty can be accounted for via measurement-error model or via weights. In addition to the classical approach, we recently developed an approach based on bivariate auxiliary Poisson models with individual random effects to measure the Kendall's τ and treatment-by-trial interactions to measure the R trial 2 . The most common data simulation models described in the literature are based on: copula models, mixed proportional hazard models, and mixture of half-normal and exponential random variables. The R package surrosurv implements the classical two-step method with Clayton, Plackett, and Hougaard copulas. It also allows to optionally adjusting the second-step linear regression for measurement-error. The mixed Poisson approach is implemented with different reduced models in addition to the full model. We present the package functions for estimating the surrogacy models, for checking their convergence, for performing leave-one-trial-out cross-validation, and for plotting the results. We illustrate their use in practice on individual patient data from a meta-analysis of 4069 patients with advanced gastric cancer from 20 trials of chemotherapy. The surrosurv package provides an R implementation of classical and recent statistical methods for surrogacy assessment of failure time endpoints. Flexible simulation functions are available to generate data according to the methods described in the literature. Copyright © 2017 Elsevier B.V. All rights reserved.
Computer Simulation of Human Service Program Evaluations.
ERIC Educational Resources Information Center
Trochim, William M. K.; Davis, James E.
1985-01-01
Describes uses of computer simulations for the context of human service program evaluation. Presents simple mathematical models for most commonly used human service outcome evaluation designs (pretest-posttest randomized experiment, pretest-posttest nonequivalent groups design, and regression-discontinuity design). Translates models into single…
Taylor, Jeremy M G; Cheng, Wenting; Foster, Jared C
2015-03-01
A recent article (Zhang et al., 2012, Biometrics 168, 1010-1018) compares regression based and inverse probability based methods of estimating an optimal treatment regime and shows for a small number of covariates that inverse probability weighted methods are more robust to model misspecification than regression methods. We demonstrate that using models that fit the data better reduces the concern about non-robustness for the regression methods. We extend the simulation study of Zhang et al. (2012, Biometrics 168, 1010-1018), also considering the situation of a larger number of covariates, and show that incorporating random forests into both regression and inverse probability weighted based methods improves their properties. © 2014, The International Biometric Society.
Simental-Mendia, Luis E; Pirro, Matteo; Atkin, Stephen L; Banach, Maciej; Mikhailidis, Dimitri P; Sahebkar, Amirhossein
2018-01-01
Fibrinogen is a key mediator of thrombosis and it has been implicated in the pathogenesis of atherosclerosis. Because metformin has shown a potential protective effect on different atherothrombotic risk factors, we assessed in this meta-analysis its effect on plasma fibrinogen concentrations. A systematic review and meta-analysis was carried out to identify randomized placebo-controlled trials evaluating the effect of metformin administration on fibrinogen levels. The search included PubMed-Medline, Scopus, ISI Web of Knowledge and Google Scholar databases (by June 2, 2017) and quality of studies was performed according to Cochrane criteria. Quantitative data synthesis was conducted using a random-effects model and sensitivity analysis by the leave-one-out method. Meta-regression analysis was performed to assess the modifiers of treatment response. Meta-analysis of data from 9 randomized placebo-controlled clinical trials with 2302 patients comprising 10 treatment arms did not suggest a significant change in plasma fibrinogen concentrations following metformin therapy (WMD: -0.25 g/L, 95% CI: -0.53, 0.04, p = 0.092). The effect size was robust in the leave-one-out sensitivity analysis and remained non-significant after omission of each single study from the meta-analysis. No significant effect of metformin on plasma fibrinogen concentrations was demonstrated in the current meta-analysis. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
The prediction of food additives in the fruit juice based on electronic nose with chemometrics.
Qiu, Shanshan; Wang, Jun
2017-09-01
Food additives are added to products to enhance their taste, and preserve flavor or appearance. While their use should be restricted to achieve a technological benefit, the contents of food additives should be also strictly controlled. In this study, E-nose was applied as an alternative to traditional monitoring technologies for determining two food additives, namely benzoic acid and chitosan. For quantitative monitoring, support vector machine (SVM), random forest (RF), extreme learning machine (ELM) and partial least squares regression (PLSR) were applied to establish regression models between E-nose signals and the amount of food additives in fruit juices. The monitoring models based on ELM and RF reached higher correlation coefficients (R 2 s) and lower root mean square errors (RMSEs) than models based on PLSR and SVM. This work indicates that E-nose combined with RF or ELM can be a cost-effective, easy-to-build and rapid detection system for food additive monitoring. Copyright © 2017 Elsevier Ltd. All rights reserved.
Wynn, Jonathan K.; Green, Michael F.; Sprock, Joyce; Light, Gregory A.; Widmark, Clifford; Reist, Christopher; Erhart, Stephen; Marder, Stephen R.; Mintz, Jim; Braff, David L.
2009-01-01
Prepulse inhibition (PPI), whereby the startle eyeblink response is inhibited by a relatively weak non-startling stimulus preceding the powerful startle eliciting stimulus, is a measure of sensorimotor gating and has been shown to be deficient in schizophrenia patients. There is considerable interest in whether conventional and/or atypical antipsychotic medications can “normalize” PPI deficits in schizophrenia patients. 51 schizophrenia patients participated in a randomized, double-blind controlled trial on the effects of three commonly-prescribed antipsychotic medications (risperidone, olanzapine, or haloperidol) on PPI, startle habituation, and startle reactivity. Patients were tested at baseline, Week 4 and Week 8. Mixed model regression analyses revealed that olanzapine significantly improved PPI from Week 4 to Week 8, and that at Week 8 patients receiving olanzapine produced significantly greater PPI than those receiving risperidone, but not haloperidol. There were no effects of medication on startle habituation or startle reactivity. These results support the conclusion that olanzapine effectively increased PPI in schizophrenia patients, but that risperidone and haloperidol had no such effects. The results are discussed in terms of animal models, neural substrates, and treatment implications. PMID:17662577
Effect of latitude on the rate of change in incidence of Lyme disease in the United States
Tuite, Ashleigh R.; Greer, Amy L.
2013-01-01
Background Tick-borne illnesses represent an important class of emerging zoonoses, with climate change projected to increase the geographic range within which tick-borne zoonoses might become endemic. We evaluated the impact of latitude on the rate of change in the incidence of Lyme disease in the United States, using publicly available data. Methods We estimated state-level year-on-year incidence rate ratios (IRRs) for Lyme disease for the period 1993 to 2007 using Poisson regression methods. We evaluated between-state heterogeneity in IRRs using a random-effects meta-analytic approach. We identified state-level characteristics associated with increasing incidence using random-effects meta-regression. Results The incidence of Lyme disease in the US increased by about 80% between 1993 and 2007 (IRR per year 1.049, 95% CI [confidence interval] 1.048 to 1.050). There was marked between-state heterogeneity in the average incidence of Lyme disease, ranging from 0.008 per 100 000 person-years in Colorado to 75 per 100 000 in Connecticut, and significant between-state heterogeneity in temporal trends (p < 0.001). In multivariable meta-regression models, increasing incidence showed a linear association with state latitude and population density. These 2 factors explained 27% of the between-state variation in IRRs. No independent association was identified for other state-level characteristics. Interpretation Lyme disease incidence increased in the US as a whole during the study period, but the changes were not uniform. Marked increases were identified in northern-most states, whereas southern states experienced stable or declining rates of Lyme disease. PMID:25077101
Decision tree modeling using R.
Zhang, Zhongheng
2016-08-01
In machine learning field, decision tree learner is powerful and easy to interpret. It employs recursive binary partitioning algorithm that splits the sample in partitioning variable with the strongest association with the response variable. The process continues until some stopping criteria are met. In the example I focus on conditional inference tree, which incorporates tree-structured regression models into conditional inference procedures. While growing a single tree is subject to small changes in the training data, random forests procedure is introduced to address this problem. The sources of diversity for random forests come from the random sampling and restricted set of input variables to be selected. Finally, I introduce R functions to perform model based recursive partitioning. This method incorporates recursive partitioning into conventional parametric model building.
Box–Cox Transformation and Random Regression Models for Fecal egg Count Data
da Silva, Marcos Vinícius Gualberto Barbosa; Van Tassell, Curtis P.; Sonstegard, Tad S.; Cobuci, Jaime Araujo; Gasbarre, Louis C.
2012-01-01
Accurate genetic evaluation of livestock is based on appropriate modeling of phenotypic measurements. In ruminants, fecal egg count (FEC) is commonly used to measure resistance to nematodes. FEC values are not normally distributed and logarithmic transformations have been used in an effort to achieve normality before analysis. However, the transformed data are often still not normally distributed, especially when data are extremely skewed. A series of repeated FEC measurements may provide information about the population dynamics of a group or individual. A total of 6375 FEC measures were obtained for 410 animals between 1992 and 2003 from the Beltsville Agricultural Research Center Angus herd. Original data were transformed using an extension of the Box–Cox transformation to approach normality and to estimate (co)variance components. We also proposed using random regression models (RRM) for genetic and non-genetic studies of FEC. Phenotypes were analyzed using RRM and restricted maximum likelihood. Within the different orders of Legendre polynomials used, those with more parameters (order 4) adjusted FEC data best. Results indicated that the transformation of FEC data utilizing the Box–Cox transformation family was effective in reducing the skewness and kurtosis, and dramatically increased estimates of heritability, and measurements of FEC obtained in the period between 12 and 26 weeks in a 26-week experimental challenge period are genetically correlated. PMID:22303406
Box-Cox Transformation and Random Regression Models for Fecal egg Count Data.
da Silva, Marcos Vinícius Gualberto Barbosa; Van Tassell, Curtis P; Sonstegard, Tad S; Cobuci, Jaime Araujo; Gasbarre, Louis C
2011-01-01
Accurate genetic evaluation of livestock is based on appropriate modeling of phenotypic measurements. In ruminants, fecal egg count (FEC) is commonly used to measure resistance to nematodes. FEC values are not normally distributed and logarithmic transformations have been used in an effort to achieve normality before analysis. However, the transformed data are often still not normally distributed, especially when data are extremely skewed. A series of repeated FEC measurements may provide information about the population dynamics of a group or individual. A total of 6375 FEC measures were obtained for 410 animals between 1992 and 2003 from the Beltsville Agricultural Research Center Angus herd. Original data were transformed using an extension of the Box-Cox transformation to approach normality and to estimate (co)variance components. We also proposed using random regression models (RRM) for genetic and non-genetic studies of FEC. Phenotypes were analyzed using RRM and restricted maximum likelihood. Within the different orders of Legendre polynomials used, those with more parameters (order 4) adjusted FEC data best. Results indicated that the transformation of FEC data utilizing the Box-Cox transformation family was effective in reducing the skewness and kurtosis, and dramatically increased estimates of heritability, and measurements of FEC obtained in the period between 12 and 26 weeks in a 26-week experimental challenge period are genetically correlated.
Zhao, Zeng-hui; Wang, Wei-ming; Gao, Xin; Yan, Ji-xing
2013-01-01
According to the geological characteristics of Xinjiang Ili mine in western area of China, a physical model of interstratified strata composed of soft rock and hard coal seam was established. Selecting the tunnel position, deformation modulus, and strength parameters of each layer as influencing factors, the sensitivity coefficient of roadway deformation to each parameter was firstly analyzed based on a Mohr-Columb strain softening model and nonlinear elastic-plastic finite element analysis. Then the effect laws of influencing factors which showed high sensitivity were further discussed. Finally, a regression model for the relationship between roadway displacements and multifactors was obtained by equivalent linear regression under multiple factors. The results show that the roadway deformation is highly sensitive to the depth of coal seam under the floor which should be considered in the layout of coal roadway; deformation modulus and strength of coal seam and floor have a great influence on the global stability of tunnel; on the contrary, roadway deformation is not sensitive to the mechanical parameters of soft roof; roadway deformation under random combinations of multi-factors can be deduced by the regression model. These conclusions provide theoretical significance to the arrangement and stability maintenance of coal roadway. PMID:24459447
Estimation of retinal vessel caliber using model fitting and random forests
NASA Astrophysics Data System (ADS)
Araújo, Teresa; Mendonça, Ana Maria; Campilho, Aurélio
2017-03-01
Retinal vessel caliber changes are associated with several major diseases, such as diabetes and hypertension. These caliber changes can be evaluated using eye fundus images. However, the clinical assessment is tiresome and prone to errors, motivating the development of automatic methods. An automatic method based on vessel crosssection intensity profile model fitting for the estimation of vessel caliber in retinal images is herein proposed. First, vessels are segmented from the image, vessel centerlines are detected and individual segments are extracted and smoothed. Intensity profiles are extracted perpendicularly to the vessel, and the profile lengths are determined. Then, model fitting is applied to the smoothed profiles. A novel parametric model (DoG-L7) is used, consisting on a Difference-of-Gaussians multiplied by a line which is able to describe profile asymmetry. Finally, the parameters of the best-fit model are used for determining the vessel width through regression using ensembles of bagged regression trees with random sampling of the predictors (random forests). The method is evaluated on the REVIEW public dataset. A precision close to the observers is achieved, outperforming other state-of-the-art methods. The method is robust and reliable for width estimation in images with pathologies and artifacts, with performance independent of the range of diameters.
Luck, Tobias; Motzek, Tom; Luppa, Melanie; Matschinger, Herbert; Fleischer, Steffen; Sesselmann, Yves; Roling, Gudrun; Beutner, Katrin; König, Hans-Helmut; Behrens, Johann; Riedel-Heller, Steffi G
2013-01-01
Background Falls in older people are a major public health issue, but the underlying causes are complex. We sought to evaluate the effectiveness of preventive home visits as a multifactorial, individualized strategy to reduce falls in community-dwelling older people. Methods Data were derived from a prospective randomized controlled trial with follow-up examination after 18 months. Two hundred and thirty participants (≥80 years of age) with functional impairment were randomized to intervention and control groups. The intervention group received up to three preventive home visits including risk assessment, home counseling intervention, and a booster session. The control group received no preventive home visits. Structured interviews at baseline and follow-up provided information concerning falls in both study groups. Random-effects Poisson regression evaluated the effect of preventive home visits on the number of falls controlling for covariates. Results Random-effects Poisson regression showed a significant increase in the number of falls between baseline and follow-up in the control group (incidence rate ratio 1.96) and a significant decrease in the intervention group (incidence rate ratio 0.63) controlling for age, sex, family status, level of care, and impairment in activities of daily living. Conclusion Our results indicate that a preventive home visiting program can be effective in reducing falls in community-dwelling older people. PMID:23788832
Preisser, John S; Long, D Leann; Stamm, John W
2017-01-01
Marginalized zero-inflated count regression models have recently been introduced for the statistical analysis of dental caries indices and other zero-inflated count data as alternatives to traditional zero-inflated and hurdle models. Unlike the standard approaches, the marginalized models directly estimate overall exposure or treatment effects by relating covariates to the marginal mean count. This article discusses model interpretation and model class choice according to the research question being addressed in caries research. Two data sets, one consisting of fictional dmft counts in 2 groups and the other on DMFS among schoolchildren from a randomized clinical trial comparing 3 toothpaste formulations to prevent incident dental caries, are analyzed with negative binomial hurdle, zero-inflated negative binomial, and marginalized zero-inflated negative binomial models. In the first example, estimates of treatment effects vary according to the type of incidence rate ratio (IRR) estimated by the model. Estimates of IRRs in the analysis of the randomized clinical trial were similar despite their distinctive interpretations. The choice of statistical model class should match the study's purpose, while accounting for the broad decline in children's caries experience, such that dmft and DMFS indices more frequently generate zero counts. Marginalized (marginal mean) models for zero-inflated count data should be considered for direct assessment of exposure effects on the marginal mean dental caries count in the presence of high frequencies of zero counts. © 2017 S. Karger AG, Basel.
A test of geographic assignment using isotope tracers in feathers of known origin
Wunder, Michael B.; Kester, C.L.; Knopf, F.L.; Rye, R.O.
2005-01-01
We used feathers of known origin collected from across the breeding range of a migratory shorebird to test the use of isotope tracers for assigning breeding origins. We analyzed δD, δ13C, and δ15N in feathers from 75 mountain plover (Charadrius montanus) chicks sampled in 2001 and from 119 chicks sampled in 2002. We estimated parameters for continuous-response inverse regression models and for discrete-response Bayesian probability models from data for each year independently. We evaluated model predictions with both the training data and by using the alternate year as an independent test dataset. Our results provide weak support for modeling latitude and isotope values as monotonic functions of one another, especially when data are pooled over known sources of variation such as sample year or location. We were unable to make even qualitative statements, such as north versus south, about the likely origin of birds using both δD and δ13C in inverse regression models; results were no better than random assignment. Probability models provided better results and a more natural framework for the problem. Correct assignment rates were highest when considering all three isotopes in the probability framework, but the use of even a single isotope was better than random assignment. The method appears relatively robust to temporal effects and is most sensitive to the isotope discrimination gradients over which samples are taken. We offer that the problem of using isotope tracers to infer geographic origin is best framed as one of assignment, rather than prediction.
Large Unbalanced Credit Scoring Using Lasso-Logistic Regression Ensemble
Wang, Hong; Xu, Qingsong; Zhou, Lifeng
2015-01-01
Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data. PMID:25706988
A Highly Efficient Design Strategy for Regression with Outcome Pooling
Mitchell, Emily M.; Lyles, Robert H.; Manatunga, Amita K.; Perkins, Neil J.; Schisterman, Enrique F.
2014-01-01
The potential for research involving biospecimens can be hindered by the prohibitive cost of performing laboratory assays on individual samples. To mitigate this cost, strategies such as randomly selecting a portion of specimens for analysis or randomly pooling specimens prior to performing laboratory assays may be employed. These techniques, while effective in reducing cost, are often accompanied by a considerable loss of statistical efficiency. We propose a novel pooling strategy based on the k-means clustering algorithm to reduce laboratory costs while maintaining a high level of statistical efficiency when predictor variables are measured on all subjects, but the outcome of interest is assessed in pools. We perform simulations motivated by the BioCycle study to compare this k-means pooling strategy with current pooling and selection techniques under simple and multiple linear regression models. While all of the methods considered produce unbiased estimates and confidence intervals with appropriate coverage, pooling under k-means clustering provides the most precise estimates, closely approximating results from the full data and losing minimal precision as the total number of pools decreases. The benefits of k-means clustering evident in the simulation study are then applied to an analysis of the BioCycle dataset. In conclusion, when the number of lab tests is limited by budget, pooling specimens based on k-means clustering prior to performing lab assays can be an effective way to save money with minimal information loss in a regression setting. PMID:25220822
A highly efficient design strategy for regression with outcome pooling.
Mitchell, Emily M; Lyles, Robert H; Manatunga, Amita K; Perkins, Neil J; Schisterman, Enrique F
2014-12-10
The potential for research involving biospecimens can be hindered by the prohibitive cost of performing laboratory assays on individual samples. To mitigate this cost, strategies such as randomly selecting a portion of specimens for analysis or randomly pooling specimens prior to performing laboratory assays may be employed. These techniques, while effective in reducing cost, are often accompanied by a considerable loss of statistical efficiency. We propose a novel pooling strategy based on the k-means clustering algorithm to reduce laboratory costs while maintaining a high level of statistical efficiency when predictor variables are measured on all subjects, but the outcome of interest is assessed in pools. We perform simulations motivated by the BioCycle study to compare this k-means pooling strategy with current pooling and selection techniques under simple and multiple linear regression models. While all of the methods considered produce unbiased estimates and confidence intervals with appropriate coverage, pooling under k-means clustering provides the most precise estimates, closely approximating results from the full data and losing minimal precision as the total number of pools decreases. The benefits of k-means clustering evident in the simulation study are then applied to an analysis of the BioCycle dataset. In conclusion, when the number of lab tests is limited by budget, pooling specimens based on k-means clustering prior to performing lab assays can be an effective way to save money with minimal information loss in a regression setting. Copyright © 2014 John Wiley & Sons, Ltd.
Nunes, Matheus Henrique
2016-01-01
Tree stem form in native tropical forests is very irregular, posing a challenge to establishing taper equations that can accurately predict the diameter at any height along the stem and subsequently merchantable volume. Artificial intelligence approaches can be useful techniques in minimizing estimation errors within complex variations of vegetation. We evaluated the performance of Random Forest® regression tree and Artificial Neural Network procedures in modelling stem taper. Diameters and volume outside bark were compared to a traditional taper-based equation across a tropical Brazilian savanna, a seasonal semi-deciduous forest and a rainforest. Neural network models were found to be more accurate than the traditional taper equation. Random forest showed trends in the residuals from the diameter prediction and provided the least precise and accurate estimations for all forest types. This study provides insights into the superiority of a neural network, which provided advantages regarding the handling of local effects. PMID:27187074
Nunes, Matheus Henrique; Görgens, Eric Bastos
2016-01-01
Tree stem form in native tropical forests is very irregular, posing a challenge to establishing taper equations that can accurately predict the diameter at any height along the stem and subsequently merchantable volume. Artificial intelligence approaches can be useful techniques in minimizing estimation errors within complex variations of vegetation. We evaluated the performance of Random Forest® regression tree and Artificial Neural Network procedures in modelling stem taper. Diameters and volume outside bark were compared to a traditional taper-based equation across a tropical Brazilian savanna, a seasonal semi-deciduous forest and a rainforest. Neural network models were found to be more accurate than the traditional taper equation. Random forest showed trends in the residuals from the diameter prediction and provided the least precise and accurate estimations for all forest types. This study provides insights into the superiority of a neural network, which provided advantages regarding the handling of local effects.
Introduction to the use of regression models in epidemiology.
Bender, Ralf
2009-01-01
Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs so that they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed in dependence on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrating examples from cancer research.
Evaluating Differential Effects Using Regression Interactions and Regression Mixture Models
ERIC Educational Resources Information Center
Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung
2015-01-01
Research increasingly emphasizes understanding differential effects. This article focuses on understanding regression mixture models, which are relatively new statistical methods for assessing differential effects by comparing results to using an interactive term in linear regression. The research questions which each model answers, their…
Martina, R; Kay, R; van Maanen, R; Ridder, A
2015-01-01
Clinical studies in overactive bladder have traditionally used analysis of covariance or nonparametric methods to analyse the number of incontinence episodes and other count data. It is known that if the underlying distributional assumptions of a particular parametric method do not hold, an alternative parametric method may be more efficient than a nonparametric one, which makes no assumptions regarding the underlying distribution of the data. Therefore, there are advantages in using methods based on the Poisson distribution or extensions of that method, which incorporate specific features that provide a modelling framework for count data. One challenge with count data is overdispersion, but methods are available that can account for this through the introduction of random effect terms in the modelling, and it is this modelling framework that leads to the negative binomial distribution. These models can also provide clinicians with a clearer and more appropriate interpretation of treatment effects in terms of rate ratios. In this paper, the previously used parametric and non-parametric approaches are contrasted with those based on Poisson regression and various extensions in trials evaluating solifenacin and mirabegron in patients with overactive bladder. In these applications, negative binomial models are seen to fit the data well. Copyright © 2014 John Wiley & Sons, Ltd.
Molavi Vardanjani, Mehdi; Masoudi Alavi, Negin; Razavi, Narges Sadat; Aghajani, Mohammad; Azizi-Fini, Esmail; Vaghefi, Seied Morteza
2013-09-01
The anxiety reduction before coronary angiography has clinical advantages and is one of the objectives of nursing. Reflexology is a non-invasive method that has been used in several clinical situations. Applying reflexology might have effect on the reduction of anxiety before coronary angiography. The aim of this randomized clinical trial was to investigate the effect of reflexology on anxiety among patients undergoing coronary angiography. This trial was conducted in Shahid Beheshti Hospital, in Kashan, Iran. One hundred male patients who were undergoing coronary angiography were randomly enrolled into intervention and placebo groups. The intervention protocol was included 30 minutes of general foot massage and the stimulation of three reflex points including solar plexus, pituitary gland, and heart. The placebo group only received the general foot massage. Spielbergers state trait anxiety inventory was used to assess the anxiety experienced by patients. Data was analyzed using Man-Witney, Wilcoxon and Chi-square tests. The stepwise multiple regressions used to analyze the variables that are involved in anxiety reduction. The mean range of anxiety decreased from 53.24 to 45.24 in reflexology group which represented 8 score reduction (P = 0.0001). The reduction in anxiety was 5.9 score in placebo group which was also significant (P = 0.0001). The anxiety reduction was significantly higher in reflexology group (P = 0.014). The stepwise multiple regression analysis showed that doing reflexology can explain the 7.5% of anxiety reduction which made a significant model. Reflexology can decrease the anxiety level before coronary angiography. Therefore, reflexology before coronary angiography is recommended.
Quantifying the impact of between-study heterogeneity in multivariate meta-analyses
Jackson, Dan; White, Ian R; Riley, Richard D
2012-01-01
Measures that quantify the impact of heterogeneity in univariate meta-analysis, including the very popular I2 statistic, are now well established. Multivariate meta-analysis, where studies provide multiple outcomes that are pooled in a single analysis, is also becoming more commonly used. The question of how to quantify heterogeneity in the multivariate setting is therefore raised. It is the univariate R2 statistic, the ratio of the variance of the estimated treatment effect under the random and fixed effects models, that generalises most naturally, so this statistic provides our basis. This statistic is then used to derive a multivariate analogue of I2, which we call . We also provide a multivariate H2 statistic, the ratio of a generalisation of Cochran's heterogeneity statistic and its associated degrees of freedom, with an accompanying generalisation of the usual I2 statistic, . Our proposed heterogeneity statistics can be used alongside all the usual estimates and inferential procedures used in multivariate meta-analysis. We apply our methods to some real datasets and show how our statistics are equally appropriate in the context of multivariate meta-regression, where study level covariate effects are included in the model. Our heterogeneity statistics may be used when applying any procedure for fitting the multivariate random effects model. Copyright © 2012 John Wiley & Sons, Ltd. PMID:22763950
Ensemble habitat mapping of invasive plant species
Stohlgren, T.J.; Ma, P.; Kumar, S.; Rocca, M.; Morisette, J.T.; Jarnevich, C.S.; Benson, N.
2010-01-01
Ensemble species distribution models combine the strengths of several species environmental matching models, while minimizing the weakness of any one model. Ensemble models may be particularly useful in risk analysis of recently arrived, harmful invasive species because species may not yet have spread to all suitable habitats, leaving species-environment relationships difficult to determine. We tested five individual models (logistic regression, boosted regression trees, random forest, multivariate adaptive regression splines (MARS), and maximum entropy model or Maxent) and ensemble modeling for selected nonnative plant species in Yellowstone and Grand Teton National Parks, Wyoming; Sequoia and Kings Canyon National Parks, California, and areas of interior Alaska. The models are based on field data provided by the park staffs, combined with topographic, climatic, and vegetation predictors derived from satellite data. For the four invasive plant species tested, ensemble models were the only models that ranked in the top three models for both field validation and test data. Ensemble models may be more robust than individual species-environment matching models for risk analysis. ?? 2010 Society for Risk Analysis.
A Portuguese value set for the SF-6D.
Ferreira, Lara N; Ferreira, Pedro L; Pereira, Luis N; Brazier, John; Rowen, Donna
2010-08-01
The SF-6D is a preference-based measure of health derived from the SF-36 that can be used for cost-effectiveness analysis using cost-per-quality adjusted life-year analysis. This study seeks to estimate a system weight for the SF-6D for Portugal and to compare the results with the UK system weights. A sample of 55 health states defined by the SF-6D has been valued by a representative random sample of the Portuguese population, stratified by sex and age (n = 140), using the Standard Gamble (SG). Several models are estimated at both the individual and aggregate levels for predicting health-state valuations. Models with main effects, with interaction effects and with the constant forced to unity are presented. Random effects (RE) models are estimated using generalized least squares (GLS) regressions. Generalized estimation equations (GEE) are used to estimate RE models with the constant forced to unity. Estimations at the individual level were performed using 630 health-state valuations. Alternative functional forms are considered to account for the skewed distribution of health-state valuations. The models are analyzed in terms of their coefficients, overall fit, and the ability for predicting the SG-values. The RE models estimated using GLS and through GEE produce significant coefficients, which are robust across model specification. However, there are concerns regarding some inconsistent estimates, and so parsimonious consistent models were estimated. There is evidence of under prediction in some states assigned to poor health. The results are consistent with the UK results. The models estimated provide preference-based quality of life weights for the Portuguese population when health status data have been collected using the SF-36. Although the sample was randomly drowned findings should be treated with caution, given the small sample size, even knowing that they have been estimated at the individual level.
Ristić-Medić, Danijela; Dullemeijer, Carla; Tepsić, Jasna; Petrović-Oggiano, Gordana; Popović, Tamara; Arsić, Aleksandra; Glibetić, Marija; Souverein, Olga W; Collings, Rachel; Cavelaars, Adriënne; de Groot, Lisette; van't Veer, Pieter; Gurinović, Mirjana
2014-03-01
The objective of this systematic review was to identify studies investigating iodine intake and biomarkers of iodine status, to assess the data of the selected studies, and to estimate dose-response relationships using meta-analysis. All randomized controlled trials, prospective cohort studies, nested case-control studies, and cross-sectional studies that supplied or measured dietary iodine and measured iodine biomarkers were included. The overall pooled regression coefficient (β) and the standard error of β were calculated by random-effects meta-analysis on a double-log scale, using the calculated intake-status regression coefficient (β) for each individual study. The results of pooled randomized controlled trials indicated that the doubling of dietary iodine intake increased urinary iodine concentrations by 14% in children and adolescents, by 57% in adults and the elderly, and by 81% in pregnant women. The dose-response relationship between iodine intake and biomarkers of iodine status indicated a 12% decrease in thyroid-stimulating hormone and a 31% decrease in thyroglobulin in pregnant women. The model of dose-response quantification used to describe the relationship between iodine intake and biomarkers of iodine status may be useful for providing complementary evidence to support recommendations for iodine intake in different population groups.
Homogenization Issues in the Combustion of Heterogeneous Solid Propellants
NASA Technical Reports Server (NTRS)
Chen, M.; Buckmaster, J.; Jackson, T. L.; Massa, L.
2002-01-01
We examine random packs of discs or spheres, models for ammonium-perchlorate-in-binder propellants, and discuss their average properties. An analytical strategy is described for calculating the mean or effective heat conduction coefficient in terms of the heat conduction coefficients of the individual components, and the results are verified by comparison with those of direct numerical simulations (dns) for both 2-D (disc) and 3-D (sphere) packs across which a temperature difference is applied. Similarly, when the surface regression speed of each component is related to the surface temperature via a simple Arrhenius law, an analytical strategy is developed for calculating an effective Arrhenius law for the combination, and these results are verified using dns in which a uniform heat flux is applied to the pack surface, causing it to regress. These results are needed for homogenization strategies necessary for fully integrated 2-D or 3-D simulations of heterogeneous propellant combustion.
Ortega Hinojosa, Alberto M; Davies, Molly M; Jarjour, Sarah; Burnett, Richard T; Mann, Jennifer K; Hughes, Edward; Balmes, John R; Turner, Michelle C; Jerrett, Michael
2014-10-01
Globally and in the United States, smoking and obesity are leading causes of death and disability. Reliable estimates of prevalence for these risk factors are often missing variables in public health surveillance programs. This may limit the capacity of public health surveillance to target interventions or to assess associations between other environmental risk factors (e.g., air pollution) and health because smoking and obesity are often important confounders. To generate prevalence estimates of smoking and obesity rates over small areas for the United States (i.e., at the ZIP code and census tract levels). We predicted smoking and obesity prevalence using a combined approach first using a lasso-based variable selection procedure followed by a two-level random effects regression with a Poisson link clustered on state and county. We used data from the Behavioral Risk Factor Surveillance System (BRFSS) from 1991 to 2010 to estimate the model. We used 10-fold cross-validated mean squared errors and the variance of the residuals to test our model. To downscale the estimates we combined the prediction equations with 1990 and 2000 U.S. Census data for each of the four five-year time periods in this time range at the ZIP code and census tract levels. Several sensitivity analyses were conducted using models that included only basic terms, that accounted for spatial autocorrelation, and used Generalized Linear Models that did not include random effects. The two-level random effects model produced improved estimates compared to the fixed effects-only models. Estimates were particularly improved for the two-thirds of the conterminous U.S. where BRFSS data were available to estimate the county level random effects. We downscaled the smoking and obesity rate predictions to derive ZIP code and census tract estimates. To our knowledge these smoking and obesity predictions are the first to be developed for the entire conterminous U.S. for census tracts and ZIP codes. Our estimates could have significant utility for public health surveillance. Copyright © 2014. Published by Elsevier Inc.
Anderson, Weston; Guikema, Seth; Zaitchik, Ben; Pan, William
2014-01-01
Obtaining accurate small area estimates of population is essential for policy and health planning but is often difficult in countries with limited data. In lieu of available population data, small area estimate models draw information from previous time periods or from similar areas. This study focuses on model-based methods for estimating population when no direct samples are available in the area of interest. To explore the efficacy of tree-based models for estimating population density, we compare six different model structures including Random Forest and Bayesian Additive Regression Trees. Results demonstrate that without information from prior time periods, non-parametric tree-based models produced more accurate predictions than did conventional regression methods. Improving estimates of population density in non-sampled areas is important for regions with incomplete census data and has implications for economic, health and development policies.
Anderson, Weston; Guikema, Seth; Zaitchik, Ben; Pan, William
2014-01-01
Obtaining accurate small area estimates of population is essential for policy and health planning but is often difficult in countries with limited data. In lieu of available population data, small area estimate models draw information from previous time periods or from similar areas. This study focuses on model-based methods for estimating population when no direct samples are available in the area of interest. To explore the efficacy of tree-based models for estimating population density, we compare six different model structures including Random Forest and Bayesian Additive Regression Trees. Results demonstrate that without information from prior time periods, non-parametric tree-based models produced more accurate predictions than did conventional regression methods. Improving estimates of population density in non-sampled areas is important for regions with incomplete census data and has implications for economic, health and development policies. PMID:24992657
Fischer, A; Friggens, N C; Berry, D P; Faverdin, P
2018-07-01
The ability to properly assess and accurately phenotype true differences in feed efficiency among dairy cows is key to the development of breeding programs for improving feed efficiency. The variability among individuals in feed efficiency is commonly characterised by the residual intake approach. Residual feed intake is represented by the residuals of a linear regression of intake on the corresponding quantities of the biological functions that consume (or release) energy. However, the residuals include both, model fitting and measurement errors as well as any variability in cow efficiency. The objective of this study was to isolate the individual animal variability in feed efficiency from the residual component. Two separate models were fitted, in one the standard residual energy intake (REI) was calculated as the residual of a multiple linear regression of lactation average net energy intake (NEI) on lactation average milk energy output, average metabolic BW, as well as lactation loss and gain of body condition score. In the other, a linear mixed model was used to simultaneously fit fixed linear regressions and random cow levels on the biological traits and intercept using fortnight repeated measures for the variables. This method split the predicted NEI in two parts: one quantifying the population mean intercept and coefficients, and one quantifying cow-specific deviations in the intercept and coefficients. The cow-specific part of predicted NEI was assumed to isolate true differences in feed efficiency among cows. NEI and associated energy expenditure phenotypes were available for the first 17 fortnights of lactation from 119 Holstein cows; all fed a constant energy-rich diet. Mixed models fitting cow-specific intercept and coefficients to different combinations of the aforementioned energy expenditure traits, calculated on a fortnightly basis, were compared. The variance of REI estimated with the lactation average model represented only 8% of the variance of measured NEI. Among all compared mixed models, the variance of the cow-specific part of predicted NEI represented between 53% and 59% of the variance of REI estimated from the lactation average model or between 4% and 5% of the variance of measured NEI. The remaining 41% to 47% of the variance of REI estimated with the lactation average model may therefore reflect model fitting errors or measurement errors. In conclusion, the use of a mixed model framework with cow-specific random regressions seems to be a promising method to isolate the cow-specific component of REI in dairy cows.
Ahmed, Ali; Husain, Ahsan; Love, Thomas E.; Gambassi, Giovanni; Dell'Italia, Louis J.; Francis, Gary S.; Gheorghiade, Mihai; Allman, Richard M.; Meleth, Sreelatha; Bourge, Robert C.
2008-01-01
Aims Non-potassium-sparing diuretics are commonly used in heart failure (HF). They activate the neurohormonal system, and are potentially harmful. Yet, the long-term effects of chronic diuretic use in HF are largely unknown. We retrospectively analysed the Digitalis Investigation Group (DIG) data to determine the effects of diuretics on HF outcomes. Methods and results Propensity scores for diuretic use were calculated for each of the 7788 DIG participants using a non-parsimonious multivariable logistic regression model, and were used to match 1391 (81%) no-diuretic patients with 1391 diuretic patients. Effects of diuretics on mortality and hospitalization at 40 months of median follow-up were assessed using matched Cox regression models. All-cause mortality was 21% for no-diuretic patients and 29% for diuretic patients [hazard ratio (HR) 1.31; 95% confidence interval (CI) 1.11−1.55; P = 0.002]. HF hospitalizations occurred in 18% of no-diuretic patients and 23% of diuretic patients (HR 1.37; 95% CI 1.13−1.65; P = 0.001). Conclusion Chronic diuretic use was associated with increased long-term mortality and hospitalizations in a wide spectrum of ambulatory chronic systolic and diastolic HF patients. The findings of the current study challenge the wisdom of routine chronic use of diuretics in HF patients who are asymptomatic or minimally symptomatic without fluid retention, and are on complete neurohormonal blockade. These findings, based on a non-randomized design, need to be further studied in randomized trials. PMID:16709595
NASA Astrophysics Data System (ADS)
Saputro, D. R. S.; Amalia, F.; Widyaningsih, P.; Affan, R. C.
2018-05-01
Bayesian method is a method that can be used to estimate the parameters of multivariate multiple regression model. Bayesian method has two distributions, there are prior and posterior distributions. Posterior distribution is influenced by the selection of prior distribution. Jeffreys’ prior distribution is a kind of Non-informative prior distribution. This prior is used when the information about parameter not available. Non-informative Jeffreys’ prior distribution is combined with the sample information resulting the posterior distribution. Posterior distribution is used to estimate the parameter. The purposes of this research is to estimate the parameters of multivariate regression model using Bayesian method with Non-informative Jeffreys’ prior distribution. Based on the results and discussion, parameter estimation of β and Σ which were obtained from expected value of random variable of marginal posterior distribution function. The marginal posterior distributions for β and Σ are multivariate normal and inverse Wishart. However, in calculation of the expected value involving integral of a function which difficult to determine the value. Therefore, approach is needed by generating of random samples according to the posterior distribution characteristics of each parameter using Markov chain Monte Carlo (MCMC) Gibbs sampling algorithm.
Regression-based adaptive sparse polynomial dimensional decomposition for sensitivity analysis
NASA Astrophysics Data System (ADS)
Tang, Kunkun; Congedo, Pietro; Abgrall, Remi
2014-11-01
Polynomial dimensional decomposition (PDD) is employed in this work for global sensitivity analysis and uncertainty quantification of stochastic systems subject to a large number of random input variables. Due to the intimate structure between PDD and Analysis-of-Variance, PDD is able to provide simpler and more direct evaluation of the Sobol' sensitivity indices, when compared to polynomial chaos (PC). Unfortunately, the number of PDD terms grows exponentially with respect to the size of the input random vector, which makes the computational cost of the standard method unaffordable for real engineering applications. In order to address this problem of curse of dimensionality, this work proposes a variance-based adaptive strategy aiming to build a cheap meta-model by sparse-PDD with PDD coefficients computed by regression. During this adaptive procedure, the model representation by PDD only contains few terms, so that the cost to resolve repeatedly the linear system of the least-square regression problem is negligible. The size of the final sparse-PDD representation is much smaller than the full PDD, since only significant terms are eventually retained. Consequently, a much less number of calls to the deterministic model is required to compute the final PDD coefficients.
Geo-additive modelling of malaria in Burundi
2011-01-01
Background Malaria is a major public health issue in Burundi in terms of both morbidity and mortality, with around 2.5 million clinical cases and more than 15,000 deaths each year. It is still the single main cause of mortality in pregnant women and children below five years of age. Because of the severe health and economic burden of malaria, there is still a growing need for methods that will help to understand the influencing factors. Several studies/researches have been done on the subject yielding different results as which factors are most responsible for the increase in malaria transmission. This paper considers the modelling of the dependence of malaria cases on spatial determinants and climatic covariates including rainfall, temperature and humidity in Burundi. Methods The analysis carried out in this work exploits real monthly data collected in the area of Burundi over 12 years (1996-2007). Semi-parametric regression models are used. The spatial analysis is based on a geo-additive model using provinces as the geographic units of study. The spatial effect is split into structured (correlated) and unstructured (uncorrelated) components. Inference is fully Bayesian and uses Markov chain Monte Carlo techniques. The effects of the continuous covariates are modelled by cubic p-splines with 20 equidistant knots and second order random walk penalty. For the spatially correlated effect, Markov random field prior is chosen. The spatially uncorrelated effects are assumed to be i.i.d. Gaussian. The effects of climatic covariates and the effects of other spatial determinants are estimated simultaneously in a unified regression framework. Results The results obtained from the proposed model suggest that although malaria incidence in a given month is strongly positively associated with the minimum temperature of the previous months, regional patterns of malaria that are related to factors other than climatic variables have been identified, without being able to explain them. Conclusions In this paper, semiparametric models are used to model the effects of both climatic covariates and spatial effects on malaria distribution in Burundi. The results obtained from the proposed models suggest a strong positive association between malaria incidence in a given month and the minimum temperature of the previous month. From the spatial effects, important spatial patterns of malaria that are related to factors other than climatic variables are identified. Potential explanations (factors) could be related to socio-economic conditions, food shortage, limited access to health care service, precarious housing, promiscuity, poor hygienic conditions, limited access to drinking water, land use (rice paddies for example), displacement of the population (due to armed conflicts). PMID:21835010
Meseret, S.; Tamir, B.; Gebreyohannes, G.; Lidauer, M.; Negussie, E.
2015-01-01
The development of effective genetic evaluations and selection of sires requires accurate estimates of genetic parameters for all economically important traits in the breeding goal. The main objective of this study was to assess the relative performance of the traditional lactation average model (LAM) against the random regression test-day model (RRM) in the estimation of genetic parameters and prediction of breeding values for Holstein Friesian herds in Ethiopia. The data used consisted of 6,500 test-day (TD) records from 800 first-lactation Holstein Friesian cows that calved between 1997 and 2013. Co-variance components were estimated using the average information restricted maximum likelihood method under single trait animal model. The estimate of heritability for first-lactation milk yield was 0.30 from LAM whilst estimates from the RRM model ranged from 0.17 to 0.29 for the different stages of lactation. Genetic correlations between different TDs in first-lactation Holstein Friesian ranged from 0.37 to 0.99. The observed genetic correlation was less than unity between milk yields at different TDs, which indicated that the assumption of LAM may not be optimal for accurate evaluation of the genetic merit of animals. A close look at estimated breeding values from both models showed that RRM had higher standard deviation compared to LAM indicating that the TD model makes efficient utilization of TD information. Correlations of breeding values between models ranged from 0.90 to 0.96 for different group of sires and cows and marked re-rankings were observed in top sires and cows in moving from the traditional LAM to RRM evaluations. PMID:26194217
Publication bias in obesity treatment trials?
Allison, D B; Faith, M S; Gorman, B S
1996-10-01
The present investigation examined the extent of publication bias (namely the tendency to publish significant findings and file away non-significant findings) within the obesity treatment literature. Quantitative literature synthesis of four published meta-analyses from the obesity treatment literature. Interventions in these studies included pharmacological, educational, child, and couples treatments. To assess publication bias, several regression procedures (for example weighted least-squares, random-effects multi-level modeling, and robust regression methods) were used to regress effect sizes onto their standard errors, or proxies thereof, within each of the four meta-analysis. A significant positive beta weight in these analyses signified publication bias. There was evidence for publication bias within two of the four published meta-analyses, such that reviews of published studies were likely to overestimate clinical efficacy. The lack of evidence for publication bias within the two other meta-analyses might have been due to insufficient statistical power rather than the absence of selection bias. As in other disciplines, publication bias appears to exist in the obesity treatment literature. Suggestions are offered for managing publication bias once identified or reducing its likelihood in the first place.
Mason, W Alex; Fleming, Charles B; Gross, Thomas J; Thompson, Ronald W; Parra, Gilbert R; Haggerty, Kevin P; Snyder, James J
2016-12-01
This randomized controlled trial tested a widely used general parent training program, Common Sense Parenting (CSP), with low-income 8th graders and their families to support a positive transition to high school. The program was tested in its original 6-session format and in a modified format (CSP-Plus), which added 2 sessions that included adolescents. Over 2 annual cohorts, 321 families were enrolled and randomly assigned to either the CSP, CSP-Plus, or minimal-contact control condition. Pretest, posttest, 1-year follow-up, and 2-year follow-up survey data on parenting as well as youth school bonding, social skills, and problem behaviors were collected from parents and youth (94% retention). Extending prior examinations of posttest outcomes, intent-to-treat regression analyses tested for intervention effects at the 2 follow-up assessments, and growth curve analyses examined experimental condition differences in yearly change across time. Separate exploratory tests of moderation by youth gender, youth conduct problems, and family economic hardship also were conducted. Out of 52 regression models predicting 1- and 2-year follow-up outcomes, only 2 out of 104 possible intervention effects were statistically significant. No statistically significant intervention effects were found in the growth curve analyses. Tests of moderation also showed few statistically significant effects. Because CSP already is in widespread use, findings have direct implications for practice. Specifically, findings suggest that the program may not be efficacious with parents of adolescents in a selective prevention context and may reveal the limits of brief, general parent training for achieving outcomes with parents of adolescents. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Binia, Aristea; Jaeger, Jonathan; Hu, Youyou; Singh, Anurag; Zimmermann, Diane
2015-08-01
To evaluate the efficacy of daily potassium intake on decreasing blood pressure in non-medicated normotensive or hypertensive patients, and to determine the relationship between potassium intake, sodium-to-potassium ratio and reduction in blood pressure. Mixed-effect meta-analyses and meta-regression models. Medline and the references of previous meta-analyses. Randomized controlled trials with potassium supplementation, with blood pressure as the primary outcome, in non-medicated patients. Fifteen randomized controlled trials of potassium supplementation in patients without antihypertensive medication were selected for the meta-analyses (917 patients). Potassium supplementation resulted in reduction of SBP by 4.7 mmHg [95% confidence interval (CI) 2.4-7.0] and DBP by 3.5 mmHg (95% CI 1.3-5.7) in all patients. The effect was found to be greater in hypertensive patients, with a reduction of SBP by 6.8 mmHg (95% CI 4.3-9.3) and DBP by 4.6 mmHg (95% CI 1.8-7.5). Meta-regression analysis showed that both increased daily potassium excretion and decreased sodium-to-potassium ratio were associated with blood pressure reduction (P < 0.05). Increased total daily potassium urinary excretion from 60 to 100 mmol/day and decrease of sodium-to-potassium ratio were shown to be necessary to explain the estimated effect. Potassium supplementation is associated with reduction of blood pressure in patients who are not on antihypertensive medication, and the effect is significant in hypertensive patients. The reduction in blood pressure significantly correlates with decreased daily urinary sodium-to-potassium ratio and increased urinary potassium. Patients with elevated blood pressure may benefit from increased potassium intake along with controlled or decreased sodium intake.
Hu, Chen; Steingrimsson, Jon Arni
2018-01-01
A crucial component of making individualized treatment decisions is to accurately predict each patient's disease risk. In clinical oncology, disease risks are often measured through time-to-event data, such as overall survival and progression/recurrence-free survival, and are often subject to censoring. Risk prediction models based on recursive partitioning methods are becoming increasingly popular largely due to their ability to handle nonlinear relationships, higher-order interactions, and/or high-dimensional covariates. The most popular recursive partitioning methods are versions of the Classification and Regression Tree (CART) algorithm, which builds a simple interpretable tree structured model. With the aim of increasing prediction accuracy, the random forest algorithm averages multiple CART trees, creating a flexible risk prediction model. Risk prediction models used in clinical oncology commonly use both traditional demographic and tumor pathological factors as well as high-dimensional genetic markers and treatment parameters from multimodality treatments. In this article, we describe the most commonly used extensions of the CART and random forest algorithms to right-censored outcomes. We focus on how they differ from the methods for noncensored outcomes, and how the different splitting rules and methods for cost-complexity pruning impact these algorithms. We demonstrate these algorithms by analyzing a randomized Phase III clinical trial of breast cancer. We also conduct Monte Carlo simulations to compare the prediction accuracy of survival forests with more commonly used regression models under various scenarios. These simulation studies aim to evaluate how sensitive the prediction accuracy is to the underlying model specifications, the choice of tuning parameters, and the degrees of missing covariates.
Kershaw, Kiarri N; Hankinson, Arlene L; Liu, Kiang; Reis, Jared P; Lewis, Cora E; Loria, Catherine M; Carnethon, Mercedes R
2014-03-01
Few studies have examined longitudinal associations between close social relationships and weight change. Using data from 3,074 participants in the Coronary Artery Risk Development in Young Adults Study who were examined in 2000, 2005, and 2010 (at ages 33-45 years in 2000), we estimated separate logistic regression random-effects models to assess whether patterns of exposure to supportive and negative relationships were associated with 10% or greater increases in body mass index (BMI) (weight (kg)/height (m)(2)) and waist circumference. Linear regression random-effects modeling was used to examine associations of social relationships with mean changes in BMI and waist circumference. Participants with persistently high supportive relationships were significantly less likely to increase their BMI values and waist circumference by 10% or greater compared with those with persistently low supportive relationships after adjustment for sociodemographic characteristics, baseline BMI/waist circumference, depressive symptoms, and health behaviors. Persistently high negative relationships were associated with higher likelihood of 10% or greater increases in waist circumference (odds ratio = 1.62, 95% confidence interval: 1.15, 2.29) and marginally higher BMI increases (odds ratio = 1.50, 95% confidence interval: 1.00, 2.24) compared with participants with persistently low negative relationships. Increasingly negative relationships were associated with increases in waist circumference only. These findings suggest that supportive relationships may minimize weight gain, and that adverse relationships may contribute to weight gain, particularly via central fat accumulation.
Kershaw, Kiarri N.; Hankinson, Arlene L.; Liu, Kiang; Reis, Jared P.; Lewis, Cora E.; Loria, Catherine M.; Carnethon, Mercedes R.
2014-01-01
Few studies have examined longitudinal associations between close social relationships and weight change. Using data from 3,074 participants in the Coronary Artery Risk Development in Young Adults Study who were examined in 2000, 2005, and 2010 (at ages 33–45 years in 2000), we estimated separate logistic regression random-effects models to assess whether patterns of exposure to supportive and negative relationships were associated with 10% or greater increases in body mass index (BMI) (weight (kg)/height (m)2) and waist circumference. Linear regression random-effects modeling was used to examine associations of social relationships with mean changes in BMI and waist circumference. Participants with persistently high supportive relationships were significantly less likely to increase their BMI values and waist circumference by 10% or greater compared with those with persistently low supportive relationships after adjustment for sociodemographic characteristics, baseline BMI/waist circumference, depressive symptoms, and health behaviors. Persistently high negative relationships were associated with higher likelihood of 10% or greater increases in waist circumference (odds ratio = 1.62, 95% confidence interval: 1.15, 2.29) and marginally higher BMI increases (odds ratio = 1.50, 95% confidence interval: 1.00, 2.24) compared with participants with persistently low negative relationships. Increasingly negative relationships were associated with increases in waist circumference only. These findings suggest that supportive relationships may minimize weight gain, and that adverse relationships may contribute to weight gain, particularly via central fat accumulation. PMID:24389018
De Haas, Y; Janss, L L G; Kadarmideen, H N
2007-10-01
Genetic correlations between body condition score (BCS) and fertility traits in dairy cattle were estimated using bivariate random regression models. BCS was recorded by the Swiss Holstein Association on 22,075 lactating heifers (primiparous cows) from 856 sires. Fertility data during first lactation were extracted for 40,736 cows. The fertility traits were days to first service (DFS), days between first and last insemination (DFLI), calving interval (CI), number of services per conception (NSPC) and conception rate to first insemination (CRFI). A bivariate model was used to estimate genetic correlations between BCS as a longitudinal trait by random regression components, and daughter's fertility at the sire level as a single lactation measurement. Heritability of BCS was 0.17, and heritabilities for fertility traits were low (0.01-0.08). Genetic correlations between BCS and fertility over the lactation varied from: -0.45 to -0.14 for DFS; -0.75 to 0.03 for DFLI; from -0.59 to -0.02 for CI; from -0.47 to 0.33 for NSPC and from 0.08 to 0.82 for CRFI. These results show (genetic) interactions between fat reserves and reproduction along the lactation trajectory of modern dairy cows, which can be useful in genetic selection as well as in management. Maximum genetic gain in fertility from indirect selection on BCS should be based on measurements taken in mid lactation when the genetic variance for BCS is largest, and the genetic correlations between BCS and fertility is strongest.
Neighborhood Structural Inequality, Collective Efficacy, and Sexual Risk Behavior among Urban Youth
BROWNING, CHRISTOPHER R.; BURRINGTON, LORI A.; LEVENTHAL, TAMA; BROOKS-GUNN, JEANNE
2011-01-01
We draw on collective efficacy theory to extend a contextual model of early adolescent sexual behavior. Specifically, we hypothesize that neighborhood structural disadvantage—as measured by levels of concentrated poverty, residential instability, and aspects of immigrant concentration—and diminished collective efficacy have consequences for the prevalence of early adolescent multiple sexual partnering. Findings from random effects multinomial logistic regression models of the number of sexual partners among a sample of youth, age 11 to 16, from the Project on Human Development in Chicago Neighborhoods (N = 768) reveal evidence of neighborhood effects on adolescent higher-risk sexual activity. Collective efficacy is negatively associated with having two or more sexual partners versus one (but not zero versus one) sexual partner. The effect of collective efficacy is dependent upon age: The regulatory effect of collective efficacy increases for older adolescents. PMID:18771063
Griffiths, Alison; Paracha, Noman; Davies, Andrew; Branscombe, Neil; Cowie, Martin R; Sculpher, Mark
2017-03-01
The aim of this article is to discuss methods used to analyze health-related quality of life (HRQoL) data from randomized controlled trials (RCTs) for decision analytic models. The analysis presented in this paper was used to provide HRQoL data for the ivabradine health technology assessment (HTA) submission in chronic heart failure. We have used a large, longitudinal EuroQol five-dimension questionnaire (EQ-5D) dataset from the Systolic Heart Failure Treatment with the I f Inhibitor Ivabradine Trial (SHIFT) (clinicaltrials.gov: NCT02441218) to illustrate issues and methods. HRQoL weights (utility values) were estimated from a mixed regression model developed using SHIFT EQ-5D data (n = 5313 patients). The regression model was used to predict HRQoL outcomes according to treatment, patient characteristics, and key clinical outcomes for patients with a heart rate ≥75 bpm. Ivabradine was associated with an HRQoL weight gain of 0.01. HRQoL weights differed according to New York Heart Association (NYHA) class (NYHA I-IV, no hospitalization: standard care 0.82-0.46; ivabradine 0.84-0.47). A reduction in HRQoL weight was associated with hospitalizations within 30 days of an HRQoL assessment visit, with this reduction varying by NYHA class [-0.07 (NYHA I) to -0.21 (NYHA IV)]. The mixed model explained variation in EQ-5D data according to key clinical outcomes and patient characteristics, providing essential information for long-term predictions of patient HRQoL in the cost-effectiveness model. This model was also used to estimate the loss in HRQoL associated with hospitalizations. In SHIFT many hospitalizations did not occur close to EQ-5D visits; hence, any temporary changes in HRQoL associated with such events would not be captured fully in observed RCT evidence, but could be predicted in our cost-effectiveness analysis using the mixed model. Given the large reduction in hospitalizations associated with ivabradine this was an important feature of the analysis. The Servier Research Group.
Solving large mixed linear models using preconditioned conjugate gradient iteration.
Strandén, I; Lidauer, M
1999-12-01
Continuous evaluation of dairy cattle with a random regression test-day model requires a fast solving method and algorithm. A new computing technique feasible in Jacobi and conjugate gradient based iterative methods using iteration on data is presented. In the new computing technique, the calculations in multiplication of a vector by a matrix were recorded to three steps instead of the commonly used two steps. The three-step method was implemented in a general mixed linear model program that used preconditioned conjugate gradient iteration. Performance of this program in comparison to other general solving programs was assessed via estimation of breeding values using univariate, multivariate, and random regression test-day models. Central processing unit time per iteration with the new three-step technique was, at best, one-third that needed with the old technique. Performance was best with the test-day model, which was the largest and most complex model used. The new program did well in comparison to other general software. Programs keeping the mixed model equations in random access memory required at least 20 and 435% more time to solve the univariate and multivariate animal models, respectively. Computations of the second best iteration on data took approximately three and five times longer for the animal and test-day models, respectively, than did the new program. Good performance was due to fast computing time per iteration and quick convergence to the final solutions. Use of preconditioned conjugate gradient based methods in solving large breeding value problems is supported by our findings.
Henry, Stephen G.; Jerant, Anthony; Iosif, Ana-Maria; Feldman, Mitchell D.; Cipri, Camille; Kravitz, Richard L.
2015-01-01
Objective To identify factors associated with participant consent to record visits; to estimate effects of recording on patient-clinician interactions Methods Secondary analysis of data from a randomized trial studying communication about depression; participants were asked for optional consent to audio record study visits. Multiple logistic regression was used to model likelihood of patient and clinician consent. Multivariable regression and propensity score analyses were used to estimate effects of audio recording on 6 dependent variables: discussion of depressive symptoms, preventive health, and depression diagnosis; depression treatment recommendations; visit length; visit difficulty. Results Of 867 visits involving 135 primary care clinicians, 39% were recorded. For clinicians, only working in academic settings (P=0.003) and having worked longer at their current practice (P=0.02) were associated with increased likelihood of consent. For patients, white race (P=0.002) and diabetes (P=0.03) were associated with increased likelihood of consent. Neither multivariable regression nor propensity score analyses revealed any significant effects of recording on the variables examined. Conclusion Few clinician or patient characteristics were significantly associated with consent. Audio recording had no significant effect on any dependent variables. Practice Implications Benefits of recording clinic visits likely outweigh the risks of bias in this setting. PMID:25837372
Nelson, Jon P
2014-01-01
Precise estimates of price elasticities are important for alcohol tax policy. Using meta-analysis, this paper corrects average beer elasticities for heterogeneity, dependence, and publication selection bias. A sample of 191 estimates is obtained from 114 primary studies. Simple and weighted means are reported. Dependence is addressed by restricting number of estimates per study, author-restricted samples, and author-specific variables. Publication bias is addressed using funnel graph, trim-and-fill, and Egger's intercept model. Heterogeneity and selection bias are examined jointly in meta-regressions containing moderator variables for econometric methodology, primary data, and precision of estimates. Results for fixed- and random-effects regressions are reported. Country-specific effects and sample time periods are unimportant, but several methodology variables help explain the dispersion of estimates. In models that correct for selection bias and heterogeneity, the average beer price elasticity is about -0.20, which is less elastic by 50% compared to values commonly used in alcohol tax policy simulations. Copyright © 2013 Elsevier B.V. All rights reserved.
Alghamdi, Manal; Al-Mallah, Mouaz; Keteyian, Steven; Brawner, Clinton; Ehrman, Jonathan; Sakr, Sherif
2017-01-01
Machine learning is becoming a popular and important approach in the field of medical research. In this study, we investigate the relative performance of various machine learning methods such as Decision Tree, Naïve Bayes, Logistic Regression, Logistic Model Tree and Random Forests for predicting incident diabetes using medical records of cardiorespiratory fitness. In addition, we apply different techniques to uncover potential predictors of diabetes. This FIT project study used data of 32,555 patients who are free of any known coronary artery disease or heart failure who underwent clinician-referred exercise treadmill stress testing at Henry Ford Health Systems between 1991 and 2009 and had a complete 5-year follow-up. At the completion of the fifth year, 5,099 of those patients have developed diabetes. The dataset contained 62 attributes classified into four categories: demographic characteristics, disease history, medication use history, and stress test vital signs. We developed an Ensembling-based predictive model using 13 attributes that were selected based on their clinical importance, Multiple Linear Regression, and Information Gain Ranking methods. The negative effect of the imbalance class of the constructed model was handled by Synthetic Minority Oversampling Technique (SMOTE). The overall performance of the predictive model classifier was improved by the Ensemble machine learning approach using the Vote method with three Decision Trees (Naïve Bayes Tree, Random Forest, and Logistic Model Tree) and achieved high accuracy of prediction (AUC = 0.92). The study shows the potential of ensembling and SMOTE approaches for predicting incident diabetes using cardiorespiratory fitness data.
Neither fixed nor random: weighted least squares meta-analysis.
Stanley, T D; Doucouliagos, Hristos
2015-06-15
This study challenges two core conventional meta-analysis methods: fixed effect and random effects. We show how and explain why an unrestricted weighted least squares estimator is superior to conventional random-effects meta-analysis when there is publication (or small-sample) bias and better than a fixed-effect weighted average if there is heterogeneity. Statistical theory and simulations of effect sizes, log odds ratios and regression coefficients demonstrate that this unrestricted weighted least squares estimator provides satisfactory estimates and confidence intervals that are comparable to random effects when there is no publication (or small-sample) bias and identical to fixed-effect meta-analysis when there is no heterogeneity. When there is publication selection bias, the unrestricted weighted least squares approach dominates random effects; when there is excess heterogeneity, it is clearly superior to fixed-effect meta-analysis. In practical applications, an unrestricted weighted least squares weighted average will often provide superior estimates to both conventional fixed and random effects. Copyright © 2015 John Wiley & Sons, Ltd.
Vanderick, S; Harris, B L; Pryce, J E; Gengler, N
2009-03-01
In New Zealand, a large proportion of cows are currently crossbreds, mostly Holstein-Friesians (HF) x Jersey (JE). The genetic evaluation system for milk yields is considering the same additive genetic effects for all breeds. The objective was to model different additive effects according to parental breeds to obtain first estimates of correlations among breed-specific effects and to study the usefulness of this type of random regression test-day model. Estimates of (co)variance components for purebred HF and JE cattle in purebred herds were computed by using a single-breed model. This analysis showed differences between the 2 breeds, with a greater variability in the HF breed. (Co)variance components for purebred HF and JE and crossbred HF x JE cattle were then estimated by using a complete multibreed model in which computations of complete across-breed (co)variances were simplified by correlating only eigenvectors for HF and JE random regressions of the same order as obtained from the single-breed analysis. Parameter estimates differed more strongly than expected between the single-breed and multibreed analyses, especially for JE. This could be due to differences between animals and management in purebred and non-purebred herds. In addition, the model used only partially accounted for heterosis. The multibreed analysis showed additive genetic differences between the HF and JE breeds, expressed as genetic correlations of additive effects in both breeds, especially in linear and quadratic Legendre polynomials (respectively, 0.807 and 0.604). The differences were small for overall milk production (0.926). Results showed that permanent environmental lactation curves were highly correlated across breeds; however, intraherd lactation curves were also affected by the breed-environment interaction. This result may indicate the existence of breed-specific competition effects that vary through the different lactation stages. In conclusion, a multibreed model similar to the one presented could optimally use the environmental and genetic parameters and provide breed-dependent additive breeding values. This model could also be a useful tool to evaluate crossbred dairy cattle populations like those in New Zealand. However, a routine evaluation would still require the development of an improved methodology. It would also be computationally very challenging because of the simultaneous presence of a large number of breeds.
NASA Astrophysics Data System (ADS)
Hasan, Haliza; Ahmad, Sanizah; Osman, Balkish Mohd; Sapri, Shamsiah; Othman, Nadirah
2017-08-01
In regression analysis, missing covariate data has been a common problem. Many researchers use ad hoc methods to overcome this problem due to the ease of implementation. However, these methods require assumptions about the data that rarely hold in practice. Model-based methods such as Maximum Likelihood (ML) using the expectation maximization (EM) algorithm and Multiple Imputation (MI) are more promising when dealing with difficulties caused by missing data. Then again, inappropriate methods of missing value imputation can lead to serious bias that severely affects the parameter estimates. The main objective of this study is to provide a better understanding regarding missing data concept that can assist the researcher to select the appropriate missing data imputation methods. A simulation study was performed to assess the effects of different missing data techniques on the performance of a regression model. The covariate data were generated using an underlying multivariate normal distribution and the dependent variable was generated as a combination of explanatory variables. Missing values in covariate were simulated using a mechanism called missing at random (MAR). Four levels of missingness (10%, 20%, 30% and 40%) were imposed. ML and MI techniques available within SAS software were investigated. A linear regression analysis was fitted and the model performance measures; MSE, and R-Squared were obtained. Results of the analysis showed that MI is superior in handling missing data with highest R-Squared and lowest MSE when percent of missingness is less than 30%. Both methods are unable to handle larger than 30% level of missingness.
Benchmarking dairy herd health status using routinely recorded herd summary data.
Parker Gaddis, K L; Cole, J B; Clay, J S; Maltecca, C
2016-02-01
Genetic improvement of dairy cattle health through the use of producer-recorded data has been determined to be feasible. Low estimated heritabilities indicate that genetic progress will be slow. Variation observed in lowly heritable traits can largely be attributed to nongenetic factors, such as the environment. More rapid improvement of dairy cattle health may be attainable if herd health programs incorporate environmental and managerial aspects. More than 1,100 herd characteristics are regularly recorded on farm test-days. We combined these data with producer-recorded health event data, and parametric and nonparametric models were used to benchmark herd and cow health status. Health events were grouped into 3 categories for analyses: mastitis, reproductive, and metabolic. Both herd incidence and individual incidence were used as dependent variables. Models implemented included stepwise logistic regression, support vector machines, and random forests. At both the herd and individual levels, random forest models attained the highest accuracy for predicting health status in all health event categories when evaluated with 10-fold cross-validation. Accuracy (SD) ranged from 0.61 (0.04) to 0.63 (0.04) when using random forest models at the herd level. Accuracy of prediction (SD) at the individual cow level ranged from 0.87 (0.06) to 0.93 (0.001) with random forest models. Highly significant variables and key words from logistic regression and random forest models were also investigated. All models identified several of the same key factors for each health event category, including movement out of the herd, size of the herd, and weather-related variables. We concluded that benchmarking health status using routinely collected herd data is feasible. Nonparametric models were better suited to handle this complex data with numerous variables. These data mining techniques were able to perform prediction of health status and could add evidence to personal experience in herd management. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
ERIC Educational Resources Information Center
Posavac, Steven S.; Posavac, Emil J.
2017-01-01
The authors describe the Pennies for Milk exercise, a participative classroom experience in which students generate a regression to the mean effect within the context of simulated household milk purchases. Regression to the mean is a ubiquitous threat for marketing researchers and managers but is often hard for students to understand. The Pennies…
Predicting recreational water quality advisories: A comparison of statistical methods
Brooks, Wesley R.; Corsi, Steven R.; Fienen, Michael N.; Carvin, Rebecca B.
2016-01-01
Epidemiological studies indicate that fecal indicator bacteria (FIB) in beach water are associated with illnesses among people having contact with the water. In order to mitigate public health impacts, many beaches are posted with an advisory when the concentration of FIB exceeds a beach action value. The most commonly used method of measuring FIB concentration takes 18–24 h before returning a result. In order to avoid the 24 h lag, it has become common to ”nowcast” the FIB concentration using statistical regressions on environmental surrogate variables. Most commonly, nowcast models are estimated using ordinary least squares regression, but other regression methods from the statistical and machine learning literature are sometimes used. This study compares 14 regression methods across 7 Wisconsin beaches to identify which consistently produces the most accurate predictions. A random forest model is identified as the most accurate, followed by multiple regression fit using the adaptive LASSO.
A refined method for multivariate meta-analysis and meta-regression
Jackson, Daniel; Riley, Richard D
2014-01-01
Making inferences about the average treatment effect using the random effects model for meta-analysis is problematic in the common situation where there is a small number of studies. This is because estimates of the between-study variance are not precise enough to accurately apply the conventional methods for testing and deriving a confidence interval for the average effect. We have found that a refined method for univariate meta-analysis, which applies a scaling factor to the estimated effects’ standard error, provides more accurate inference. We explain how to extend this method to the multivariate scenario and show that our proposal for refined multivariate meta-analysis and meta-regression can provide more accurate inferences than the more conventional approach. We explain how our proposed approach can be implemented using standard output from multivariate meta-analysis software packages and apply our methodology to two real examples. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd. PMID:23996351
Estimation of genetic parameters for milk yield in Murrah buffaloes by Bayesian inference.
Breda, F C; Albuquerque, L G; Euclydes, R F; Bignardi, A B; Baldi, F; Torres, R A; Barbosa, L; Tonhati, H
2010-02-01
Random regression models were used to estimate genetic parameters for test-day milk yield in Murrah buffaloes using Bayesian inference. Data comprised 17,935 test-day milk records from 1,433 buffaloes. Twelve models were tested using different combinations of third-, fourth-, fifth-, sixth-, and seventh-order orthogonal polynomials of weeks of lactation for additive genetic and permanent environmental effects. All models included the fixed effects of contemporary group, number of daily milkings and age of cow at calving as covariate (linear and quadratic effect). In addition, residual variances were considered to be heterogeneous with 6 classes of variance. Models were selected based on the residual mean square error, weighted average of residual variance estimates, and estimates of variance components, heritabilities, correlations, eigenvalues, and eigenfunctions. Results indicated that changes in the order of fit for additive genetic and permanent environmental random effects influenced the estimation of genetic parameters. Heritability estimates ranged from 0.19 to 0.31. Genetic correlation estimates were close to unity between adjacent test-day records, but decreased gradually as the interval between test-days increased. Results from mean squared error and weighted averages of residual variance estimates suggested that a model considering sixth- and seventh-order Legendre polynomials for additive and permanent environmental effects, respectively, and 6 classes for residual variances, provided the best fit. Nevertheless, this model presented the largest degree of complexity. A more parsimonious model, with fourth- and sixth-order polynomials, respectively, for these same effects, yielded very similar genetic parameter estimates. Therefore, this last model is recommended for routine applications. Copyright 2010 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Bayesian logistic regression approaches to predict incorrect DRG assignment.
Suleiman, Mani; Demirhan, Haydar; Boyd, Leanne; Girosi, Federico; Aksakalli, Vural
2018-05-07
Episodes of care involving similar diagnoses and treatments and requiring similar levels of resource utilisation are grouped to the same Diagnosis-Related Group (DRG). In jurisdictions which implement DRG based payment systems, DRGs are a major determinant of funding for inpatient care. Hence, service providers often dedicate auditing staff to the task of checking that episodes have been coded to the correct DRG. The use of statistical models to estimate an episode's probability of DRG error can significantly improve the efficiency of clinical coding audits. This study implements Bayesian logistic regression models with weakly informative prior distributions to estimate the likelihood that episodes require a DRG revision, comparing these models with each other and to classical maximum likelihood estimates. All Bayesian approaches had more stable model parameters than maximum likelihood. The best performing Bayesian model improved overall classification per- formance by 6% compared to maximum likelihood, with a 34% gain compared to random classification, respectively. We found that the original DRG, coder and the day of coding all have a significant effect on the likelihood of DRG error. Use of Bayesian approaches has improved model parameter stability and classification accuracy. This method has already lead to improved audit efficiency in an operational capacity.
The impact of green stormwater infrastructure installation on surrounding health and safety.
Kondo, Michelle C; Low, Sarah C; Henning, Jason; Branas, Charles C
2015-03-01
We investigated the health and safety effects of urban green stormwater infrastructure (GSI) installments. We conducted a difference-in-differences analysis of the effects of GSI installments on health (e.g., blood pressure, cholesterol and stress levels) and safety (e.g., felonies, nuisance and property crimes, narcotics crimes) outcomes from 2000 to 2012 in Philadelphia, Pennsylvania. We used mixed-effects regression models to compare differences in pre- and posttreatment measures of outcomes for treatment sites (n=52) and randomly chosen, matched control sites (n=186) within multiple geographic extents surrounding GSI sites. Regression-adjusted models showed consistent and statistically significant reductions in narcotics possession (18%-27% less) within 16th-mile, quarter-mile, half-mile (P<.001), and eighth-mile (P<.01) distances from treatment sites and at the census tract level (P<.01). Narcotics manufacture and burglaries were also significantly reduced at multiple scales. Nonsignificant reductions in homicides, assaults, thefts, public drunkenness, and narcotics sales were associated with GSI installation in at least 1 geographic extent. Health and safety considerations should be included in future assessments of GSI programs. Subsequent studies should assess mechanisms of this association.
The Impact of Green Stormwater Infrastructure Installation on Surrounding Health and Safety
Low, Sarah C.; Henning, Jason; Branas, Charles C.
2015-01-01
Objectives. We investigated the health and safety effects of urban green stormwater infrastructure (GSI) installments. Methods. We conducted a difference-in-differences analysis of the effects of GSI installments on health (e.g., blood pressure, cholesterol and stress levels) and safety (e.g., felonies, nuisance and property crimes, narcotics crimes) outcomes from 2000 to 2012 in Philadelphia, Pennsylvania. We used mixed-effects regression models to compare differences in pre- and posttreatment measures of outcomes for treatment sites (n = 52) and randomly chosen, matched control sites (n = 186) within multiple geographic extents surrounding GSI sites. Results. Regression-adjusted models showed consistent and statistically significant reductions in narcotics possession (18%–27% less) within 16th-mile, quarter-mile, half-mile (P < .001), and eighth-mile (P < .01) distances from treatment sites and at the census tract level (P < .01). Narcotics manufacture and burglaries were also significantly reduced at multiple scales. Nonsignificant reductions in homicides, assaults, thefts, public drunkenness, and narcotics sales were associated with GSI installation in at least 1 geographic extent. Conclusions. Health and safety considerations should be included in future assessments of GSI programs. Subsequent studies should assess mechanisms of this association. PMID:25602887
ERIC Educational Resources Information Center
Shadish, William R.
2011-01-01
This article reviews several decades of the author's meta-analytic and experimental research on the conditions under which nonrandomized experiments can approximate the results from randomized experiments (REs). Several studies make clear that we can expect accurate effect estimates from the regression discontinuity design, though its statistical…
NASA Astrophysics Data System (ADS)
Thanos, Konstantinos-Georgios; Thomopoulos, Stelios C. A.
2016-05-01
wayGoo is a fully functional application whose main functionalities include content geolocation, event scheduling, and indoor navigation. However, significant information about events do not reach users' attention, either because of the size of this information or because some information comes from real - time data sources. The purpose of this work is to facilitate event management operations by prioritizing the presented events, based on users' interests using both, static and real - time data. Through the wayGoo interface, users select conceptual topics that are interesting for them. These topics constitute a browsing behavior vector which is used for learning users' interests implicitly, without being intrusive. Then, the system estimates user preferences and return an events list sorted from the most preferred one to the least. User preferences are modeled via a Naïve Bayesian Network which consists of: a) the `decision' random variable corresponding to users' decision on attending an event, b) the `distance' random variable, modeled by a linear regression that estimates the probability that the distance between a user and each event destination is not discouraging, ` the seat availability' random variable, modeled by a linear regression, which estimates the probability that the seat availability is encouraging d) and the `relevance' random variable, modeled by a clustering - based collaborative filtering, which determines the relevance of each event users' interests. Finally, experimental results show that the proposed system contribute essentially to assisting users in browsing and selecting events to attend.
Austin, Peter C; Lee, Douglas S; Steyerberg, Ewout W; Tu, Jack V
2012-01-01
In biomedical research, the logistic regression model is the most commonly used method for predicting the probability of a binary outcome. While many clinical researchers have expressed an enthusiasm for regression trees, this method may have limited accuracy for predicting health outcomes. We aimed to evaluate the improvement that is achieved by using ensemble-based methods, including bootstrap aggregation (bagging) of regression trees, random forests, and boosted regression trees. We analyzed 30-day mortality in two large cohorts of patients hospitalized with either acute myocardial infarction (N = 16,230) or congestive heart failure (N = 15,848) in two distinct eras (1999–2001 and 2004–2005). We found that both the in-sample and out-of-sample prediction of ensemble methods offered substantial improvement in predicting cardiovascular mortality compared to conventional regression trees. However, conventional logistic regression models that incorporated restricted cubic smoothing splines had even better performance. We conclude that ensemble methods from the data mining and machine learning literature increase the predictive performance of regression trees, but may not lead to clear advantages over conventional logistic regression models for predicting short-term mortality in population-based samples of subjects with cardiovascular disease. PMID:22777999
Taku, Kyoko; Melby, Melissa K; Kurzer, Mindy S; Mizuno, Shoichi; Watanabe, Shaw; Ishimi, Yoshiko
2010-08-01
Effects of soy isoflavone supplements on bone turnover markers remain unclear. This up-to-date systematic review and meta-analysis of randomized controlled trials (RCTs) was performed primarily to more completely and precisely clarify the effects on urinary deoxypyridinoline (DPD) and serum bone alkaline phosphatase (BAP) and secondarily to evaluate the effects on other bone turnover markers, compared with placebo in menopausal women. PubMed, CENTRAL, ICHUSHI, and CNKI were searched in June 2009 for relevant studies of RCTs. Data on study design, participants, interventions, and outcomes were extracted and methodological quality of each included trial was assessed. From 3740 identified relevant articles, 10 (887 participants), 10 (1210 participants), and 8 (380 participants) RCTs were selected for meta-analysis of effects on DPD, BAP, and serum osteocalcin (OC), respectively, using Review Manager 5.0.22. Daily ingestion of an average 56 mg soy isoflavones (aglycone equivalents) for 10 weeks to 12 months significantly decreased DPD by 14.1% (95% CI: -26.8% to -1.5%; P=0.03) compared to baseline (heterogeneity: P<0.00001; I(2)=93%; random effects model). The overall effect of soy isoflavones on DPD compared with placebo was a significant decrease of -18.0% (95% CI: -28.4% to -7.7%, P=0.0007; heterogeneity: P=0.0001; I(2)=73%; random effects model). Subgroup analyses and meta-regressions revealed that isoflavone dose and intervention duration did not significantly relate to the variable effects on DPD. Daily supplementation of about 84 mg and 73 mg of soy isoflavones for up to 12 months insignificantly increased BAP by 8.0% (95% CI: -4.2% to 20.2%, P=0.20; heterogeneity: P<0.00001; I(2)=98%) and OC by 10.3% (95% CI: -3.1% to 23.7%, P=0.13; heterogeneity: P=0.002; I(2)=69%) compared with placebo (random effects model), respectively. Soy isoflavone supplements moderately decreased the bone resorption marker DPD, but did not affect bone formation markers BAP and OC in menopausal women. The effects varied between studies, and further studies are needed to address factors relating to the observed effects of soy isoflavones on DPD and to verify effects on other bone turnover markers. Copyright 2010 Elsevier Inc. All rights reserved.
Generated effect modifiers (GEM's) in randomized clinical trials.
Petkova, Eva; Tarpey, Thaddeus; Su, Zhe; Ogden, R Todd
2017-01-01
In a randomized clinical trial (RCT), it is often of interest not only to estimate the effect of various treatments on the outcome, but also to determine whether any patient characteristic has a different relationship with the outcome, depending on treatment. In regression models for the outcome, if there is a non-zero interaction between treatment and a predictor, that predictor is called an "effect modifier". Identification of such effect modifiers is crucial as we move towards precision medicine, that is, optimizing individual treatment assignment based on patient measurements assessed when presenting for treatment. In most settings, there will be several baseline predictor variables that could potentially modify the treatment effects. This article proposes optimal methods of constructing a composite variable (defined as a linear combination of pre-treatment patient characteristics) in order to generate an effect modifier in an RCT setting. Several criteria are considered for generating effect modifiers and their performance is studied via simulations. An example from a RCT is provided for illustration. © The Author 2016. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
NASA Technical Reports Server (NTRS)
Boyce, Lola; Bast, Callie C.; Trimble, Greg A.
1992-01-01
This report presents the results of a fourth year effort of a research program, conducted for NASA-LeRC by the University of Texas at San Antonio (UTSA). The research included on-going development of methodology that provides probabilistic lifetime strength of aerospace materials via computational simulation. A probabilistic material strength degradation model, in the form of a randomized multifactor interaction equation, is postulated for strength degradation of structural components of aerospace propulsion systems subject to a number of effects or primitive variables. These primitive variables may include high temperature, fatigue or creep. In most cases, strength is reduced as a result of the action of a variable. This multifactor interaction strength degradation equation has been randomized and is included in the computer program, PROMISS. Also included in the research is the development of methodology to calibrate the above-described constitutive equation using actual experimental materials data together with regression analysis of that data, thereby predicting values for the empirical material constants for each effect or primitive variable. This regression methodology is included in the computer program, PROMISC. Actual experimental materials data were obtained from industry and the open literature for materials typically for applications in aerospace propulsion system components. Material data for Inconel 718 has been analyzed using the developed methodology.
NASA Technical Reports Server (NTRS)
Boyce, Lola; Bast, Callie C.; Trimble, Greg A.
1992-01-01
The results of a fourth year effort of a research program conducted for NASA-LeRC by The University of Texas at San Antonio (UTSA) are presented. The research included on-going development of methodology that provides probabilistic lifetime strength of aerospace materials via computational simulation. A probabilistic material strength degradation model, in the form of a randomized multifactor interaction equation, is postulated for strength degradation of structural components of aerospace propulsion systems subjected to a number of effects or primitive variables. These primitive variables may include high temperature, fatigue, or creep. In most cases, strength is reduced as a result of the action of a variable. This multifactor interaction strength degradation equation was randomized and is included in the computer program, PROMISC. Also included in the research is the development of methodology to calibrate the above-described constitutive equation using actual experimental materials data together with regression analysis of that data, thereby predicting values for the empirical material constants for each effect or primitive variable. This regression methodology is included in the computer program, PROMISC. Actual experimental materials data were obtained from industry and the open literature for materials typically for applications in aerospace propulsion system components. Material data for Inconel 718 was analyzed using the developed methodology.
Simulation of land use change in the three gorges reservoir area based on CART-CA
NASA Astrophysics Data System (ADS)
Yuan, Min
2018-05-01
This study proposes a new method to simulate spatiotemporal complex multiple land uses by using classification and regression tree algorithm (CART) based CA model. In this model, we use classification and regression tree algorithm to calculate land class conversion probability, and combine neighborhood factor, random factor to extract cellular transformation rules. The overall Kappa coefficient is 0.8014 and the overall accuracy is 0.8821 in the land dynamic simulation results of the three gorges reservoir area from 2000 to 2010, and the simulation results are satisfactory.
Singh, Gyanendra; Sachdeva, S N; Pal, Mahesh
2016-11-01
This work examines the application of M5 model tree and conventionally used fixed/random effect negative binomial (FENB/RENB) regression models for accident prediction on non-urban sections of highway in Haryana (India). Road accident data for a period of 2-6 years on different sections of 8 National and State Highways in Haryana was collected from police records. Data related to road geometry, traffic and road environment related variables was collected through field studies. Total two hundred and twenty two data points were gathered by dividing highways into sections with certain uniform geometric characteristics. For prediction of accident frequencies using fifteen input parameters, two modeling approaches: FENB/RENB regression and M5 model tree were used. Results suggest that both models perform comparably well in terms of correlation coefficient and root mean square error values. M5 model tree provides simple linear equations that are easy to interpret and provide better insight, indicating that this approach can effectively be used as an alternative to RENB approach if the sole purpose is to predict motor vehicle crashes. Sensitivity analysis using M5 model tree also suggests that its results reflect the physical conditions. Both models clearly indicate that to improve safety on Indian highways minor accesses to the highways need to be properly designed and controlled, the service roads to be made functional and dispersion of speeds is to be brought down. Copyright © 2016 Elsevier Ltd. All rights reserved.
A bioavailable strontium isoscape for Western Europe: A machine learning approach
von Holstein, Isabella C. C.; Laffoon, Jason E.; Willmes, Malte; Liu, Xiao-Ming; Davies, Gareth R.
2018-01-01
Strontium isotope ratios (87Sr/86Sr) are gaining considerable interest as a geolocation tool and are now widely applied in archaeology, ecology, and forensic research. However, their application for provenance requires the development of baseline models predicting surficial 87Sr/86Sr variations (“isoscapes”). A variety of empirically-based and process-based models have been proposed to build terrestrial 87Sr/86Sr isoscapes but, in their current forms, those models are not mature enough to be integrated with continuous-probability surface models used in geographic assignment. In this study, we aim to overcome those limitations and to predict 87Sr/86Sr variations across Western Europe by combining process-based models and a series of remote-sensing geospatial products into a regression framework. We find that random forest regression significantly outperforms other commonly used regression and interpolation methods, and efficiently predicts the multi-scale patterning of 87Sr/86Sr variations by accounting for geological, geomorphological and atmospheric controls. Random forest regression also provides an easily interpretable and flexible framework to integrate different types of environmental auxiliary variables required to model the multi-scale patterning of 87Sr/86Sr variability. The method is transferable to different scales and resolutions and can be applied to the large collection of geospatial data available at local and global levels. The isoscape generated in this study provides the most accurate 87Sr/86Sr predictions in bioavailable strontium for Western Europe (R2 = 0.58 and RMSE = 0.0023) to date, as well as a conservative estimate of spatial uncertainty by applying quantile regression forest. We anticipate that the method presented in this study combined with the growing numbers of bioavailable 87Sr/86Sr data and satellite geospatial products will extend the applicability of the 87Sr/86Sr geo-profiling tool in provenance applications. PMID:29847595
Jakubovski, Ewgeni; Varigonda, Anjali L; Freemantle, Nicholas; Taylor, Matthew J; Bloch, Michael H
2016-02-01
Previous studies suggested that the treatment response to selective serotonin reuptake inhibitors (SSRIs) in major depressive disorder follows a flat response curve within the therapeutic dose range. The present study was designed to clarify the relationship between dosage and treatment response in major depressive disorder. The authors searched PubMed for randomized placebo-controlled trials examining the efficacy of SSRIs for treating adults with major depressive disorder. Trials were also required to assess improvement in depression severity at multiple time points. Additional data were collected on treatment response and all-cause and side effect-related discontinuation. All medication doses were transformed into imipramine-equivalent doses. The longitudinal data were analyzed with a mixed-regression model. Endpoint and tolerability analyses were analyzed using meta-regression and stratified subgroup analysis by predefined SSRI dose categories in order to assess the effect of SSRI dosing on the efficacy and tolerability of SSRIs for major depressive disorder. Forty studies involving 10,039 participants were included. Longitudinal modeling (dose-by-time interaction=0.0007, 95% CI=0.0001-0.0013) and endpoint analysis (meta-regression: β=0.00053, 95% CI=0.00018-0.00088, z=2.98) demonstrated a small but statistically significant positive association between SSRI dose and efficacy. Higher doses of SSRIs were associated with an increased likelihood of dropouts due to side effects (meta-regression: β=0.00207, 95% CI=0.00071-0.00342, z=2.98) and decreased likelihood of all-cause dropout (meta-regression: β=-0.00093, 95% CI=-0.00165 to -0.00021, z=-2.54). Higher doses of SSRIs appear slightly more effective in major depressive disorder. This benefit appears to plateau at around 250 mg of imipramine equivalents (50 mg of fluoxetine). The slightly increased benefits of SSRIs at higher doses are somewhat offset by decreased tolerability at high doses.
Regression Discontinuity Design in Gifted and Talented Education Research
ERIC Educational Resources Information Center
Matthews, Michael S.; Peters, Scott J.; Housand, Angela M.
2012-01-01
This Methodological Brief introduces the reader to the regression discontinuity design (RDD), which is a method that when used correctly can yield estimates of research treatment effects that are equivalent to those obtained through randomized control trials and can therefore be used to infer causality. However, RDD does not require the random…
2012-01-01
Background A discrete choice experiment (DCE) is a preference survey which asks participants to make a choice among product portfolios comparing the key product characteristics by performing several choice tasks. Analyzing DCE data needs to account for within-participant correlation because choices from the same participant are likely to be similar. In this study, we empirically compared some commonly-used statistical methods for analyzing DCE data while accounting for within-participant correlation based on a survey of patient preference for colorectal cancer (CRC) screening tests conducted in Hamilton, Ontario, Canada in 2002. Methods A two-stage DCE design was used to investigate the impact of six attributes on participants' preferences for CRC screening test and willingness to undertake the test. We compared six models for clustered binary outcomes (logistic and probit regressions using cluster-robust standard error (SE), random-effects and generalized estimating equation approaches) and three models for clustered nominal outcomes (multinomial logistic and probit regressions with cluster-robust SE and random-effects multinomial logistic model). We also fitted a bivariate probit model with cluster-robust SE treating the choices from two stages as two correlated binary outcomes. The rank of relative importance between attributes and the estimates of β coefficient within attributes were used to assess the model robustness. Results In total 468 participants with each completing 10 choices were analyzed. Similar results were reported for the rank of relative importance and β coefficients across models for stage-one data on evaluating participants' preferences for the test. The six attributes ranked from high to low as follows: cost, specificity, process, sensitivity, preparation and pain. However, the results differed across models for stage-two data on evaluating participants' willingness to undertake the tests. Little within-patient correlation (ICC ≈ 0) was found in stage-one data, but substantial within-patient correlation existed (ICC = 0.659) in stage-two data. Conclusions When small clustering effect presented in DCE data, results remained robust across statistical models. However, results varied when larger clustering effect presented. Therefore, it is important to assess the robustness of the estimates via sensitivity analysis using different models for analyzing clustered data from DCE studies. PMID:22348526
Zhang, Zhongheng; Ni, Hongying; Xu, Xiao
2014-08-01
Propensity score (PS) analysis has been increasingly used in critical care medicine; however, its validation has not been systematically investigated. The present study aimed to compare effect sizes in PS-based observational studies vs. randomized controlled trials (RCTs) (or meta-analysis of RCTs). Critical care observational studies using PS were systematically searched in PubMed from inception to April 2013. Identified PS-based studies were matched to one or more RCTs in terms of population, intervention, comparison, and outcome. The effect sizes of experimental treatments were compared for PS-based studies vs. RCTs (or meta-analysis of RCTs) with sign test. Furthermore, ratio of odds ratio (ROR) was calculated from the interaction term of treatment × study type in a logistic regression model. A ROR < 1 indicates greater benefit for experimental treatment in RCTs compared with PS-based studies. RORs of each comparison were pooled by using meta-analytic approach with random-effects model. A total of 20 PS-based studies were identified and matched to RCTs. Twelve of the 20 comparisons showed greater beneficial effect for experimental treatment in RCTs than that in PS-based studies (sign test P = 0.503). The difference was statistically significant in four comparisons. ROR can be calculated from 13 comparisons, of which four showed significantly greater beneficial effect for experimental treatment in RCTs. The pooled ROR was 0.71 (95% CI: 0.63, 0.79; P = 0.002), suggesting that RCTs (or meta-analysis of RCTs) were more likely to report beneficial effect for the experimental treatment than PS-based studies. The result remained unchanged in sensitivity analysis and meta-regression. In critical care literature, PS-based observational study is likely to report less beneficial effect of experimental treatment compared with RCTs (or meta-analysis of RCTs). Copyright © 2014 Elsevier Inc. All rights reserved.
Hedeker, D; Flay, B R; Petraitis, J
1996-02-01
Methods are proposed and described for estimating the degree to which relations among variables vary at the individual level. As an example of the methods, M. Fishbein and I. Ajzen's (1975; I. Ajzen & M. Fishbein, 1980) theory of reasoned action is examined, which posits first that an individual's behavioral intentions are a function of 2 components: the individual's attitudes toward the behavior and the subjective norms as perceived by the individual. A second component of their theory is that individuals may weight these 2 components differently in assessing their behavioral intentions. This article illustrates the use of empirical Bayes methods based on a random-effects regression model to estimate these individual influences, estimating an individual's weighting of both of these components (attitudes toward the behavior and subjective norms) in relation to their behavioral intentions. This method can be used when an individual's behavioral intentions, subjective norms, and attitudes toward the behavior are all repeatedly measured. In this case, the empirical Bayes estimates are derived as a function of the data from the individual, strengthened by the overall sample data.
Analysis of Learning Curve Fitting Techniques.
1987-09-01
1986. 15. Neter, John and others. Applied Linear Regression Models. Homewood IL: Irwin, 19-33. 16. SAS User’s Guide: Basics, Version 5 Edition. SAS... Linear Regression Techniques (15:23-52). Random errors are assumed to be normally distributed when using -# ordinary least-squares, according to Johnston...lot estimated by the improvement curve formula. For a more detailed explanation of the ordinary least-squares technique, see Neter, et. al., Applied
Brian S. Cade; Barry R. Noon; Rick D. Scherer; John J. Keane
2017-01-01
Counts of avian fledglings, nestlings, or clutch size that are bounded below by zero and above by some small integer form a discrete random variable distribution that is not approximated well by conventional parametric count distributions such as the Poisson or negative binomial. We developed a logistic quantile regression model to provide estimates of the empirical...
Lee, Chang-Hyun; Chung, Chun Kee; Kim, Chi Heon
2017-11-01
Radiofrequency denervation is commonly used for the treatment of chronic facet joint pain that has been refractory to more conservative treatments, although the evidence supporting this treatment has been controversial. We aimed to elucidate the precise effects of radiofrequency denervation in patients with low back pain originating from the facet joints relative to those obtained using control treatments, with particular attention to consistency in the denervation protocol. A meta-analysis of randomized controlled trials was carried out. Adult patients undergoing radiofrequency denervation or control treatments (sham or epidural block) for facet joint disease of the lumbar spine comprised the patient sample. Visual analog scale (VAS) pain scores were measured and stratified by response of diagnostic block procedures. We searched PubMed, Embase, Web of Science, and the Cochrane Database for randomized controlled trials regarding radiofrequency denervation and control treatments for back pain. Changes in VAS pain scores of the radiofrequency group were compared with those of the control group as well as the minimal clinically important difference (MCID) for back pain VAS. Meta-regression model was developed to evaluate the effect of radiofrequency treatment according to responses of diagnostic block while controlling for other variables. We then calculated mean differences and 95% confidence intervals (CIs) using random-effects models. We included data from seven trials involving 454 patients who had undergone radiofrequency denervation (231 patients) and control treatments such as sham or epidural block procedures (223 patients). The radiofrequency group exhibited significantly greater improvements in back pain score when compared with the control group for 1-year follow-up. Although the average improvement in VAS scores exceeded the MCID, the lower limit of the 95% CI encompassed the MCID. A subgroup of patients who responded very well to diagnostic block procedures demonstrated significant improvements in back pain relative to the control group at all times. When placed into our meta-regression model, the response to diagnostic block procedure was responsible for a statistically significant portion of treatment effect. Studies published over the last two decades revealed that radiofrequency denervation reduced back pain significantly in patients with facet joint disease compared with the MCID and control treatments. Conventional radiofrequency denervation resulted in significant reductions in low back pain originating from the facet joints in patients showing the best response to diagnostic block over the first 12 months when compared with sham procedures or epidural nerve blocks. Copyright © 2017 Elsevier Inc. All rights reserved.
Ciardulli, Andrea; D'Antonio, Francesco; Magro-Malosso, Elena R; Manzoli, Lamberto; Anisman, Paul; Saccone, Gabriele; Berghella, Vincenzo
2018-03-07
To explore the effect of maternal fluorinated steroid therapy on fetuses affected by second-degree immune-mediated congenital atrioventricular block. Studies reporting the outcome of fetuses with second-degree immune-mediated congenital atrioventricular block diagnosed on prenatal ultrasound and treated with fluorinated steroids compared with those not treated were included. The primary outcome was the overall progression of congenital atrioventricular block to either continuous or intermittent third-degree congenital atrioventricular block at birth. Meta-analyses of proportions using random effect model and meta-analyses using individual data random-effect logistic regression were used. Five studies (71 fetuses) were included. The progression rate to congenital atrioventricular block at birth in fetuses treated with steroids was 52% (95% confidence interval 23-79) and in fetuses not receiving steroid therapy 73% (95% confidence interval 39-94). The overall rate of regression to either first-degree, intermittent first-/second-degree or sinus rhythm in fetuses treated with steroids was 25% (95% confidence interval 12-41) compared with 23% (95% confidence interval 8-44) in those not treated. Stable (constant) second-degree congenital atrioventricular block at birth was present in 11% (95% confidence interval 2-27) of cases in the treated group and in none of the newborns in the untreated group, whereas complete regression to sinus rhythm occurred in 21% (95% confidence interval 6-42) of fetuses receiving steroids vs. 9% (95% confidence interval 0-41) of those untreated. There is still limited evidence as to the benefit of administered fluorinated steroids in terms of affecting outcome of fetuses with second-degree immune-mediated congenital atrioventricular block. © 2018 Nordic Federation of Societies of Obstetrics and Gynecology.
Zhang, Xueying; Chu, Yiyi; Wang, Yuxuan; Zhang, Kai
2018-08-01
The regulatory monitoring data of particulate matter with an aerodynamic diameter <2.5μm (PM 2.5 ) in Texas have limited spatial and temporal coverage. The purpose of this study is to estimate the ground-level PM 2.5 concentrations on a daily basis using satellite-retrieved Aerosol Optical Depth (AOD) in the state of Texas. We obtained the AOD values at 1-km resolution generated through the Multi-Angle Implementation of Atmospheric Correction (MAIAC) algorithm based on the images retrieved from the Moderate Resolution Imaging Spectroradiometer (MODIS) satellites. We then developed mixed-effects models based on AODs, land use features, geographic characteristics, and weather conditions, and the day-specific as well as site-specific random effects to estimate the PM 2.5 concentrations (μg/m 3 ) in the state of Texas during the period 2008-2013. The mixed-effects models' performance was evaluated using the coefficient of determination (R 2 ) and square root of the mean squared prediction error (RMSPE) from ten-fold cross-validation, which randomly selected 90% of the observations for training purpose and 10% of the observations for assessing the models' true prediction ability. Mixed-effects regression models showed good prediction performance (R 2 values from 10-fold cross validation: 0.63-0.69). The model performance varied by regions and study years, and the East region of Texas, and year of 2009 presented relatively higher prediction precision (R 2 : 0.62 for the East region; R 2 : 0.69 for the year of 2009). The PM 2.5 concentrations generated through our developed models at 1-km grid cells in the state of Texas showed a decreasing trend from 2008 to 2013 and a higher reduction of predicted PM 2.5 in more polluted areas. Our findings suggest that mixed-effects regression models developed based on MAIAC AOD are a feasible approach to predict ground-level PM 2.5 in Texas. Predicted PM 2.5 concentrations at the 1-km resolution on a daily basis can be used for epidemiological studies to investigate short- and long-term health impact of PM 2.5 in Texas. Copyright © 2017 Elsevier B.V. All rights reserved.
Validation of test-day models for genetic evaluation of dairy goats in Norway.
Andonov, S; Ødegård, J; Boman, I A; Svendsen, M; Holme, I J; Adnøy, T; Vukovic, V; Klemetsdal, G
2007-10-01
Test-day data for daily milk yield and fat, protein, and lactose content were sampled from the years 1988 to 2003 in 17 flocks belonging to 2 genetically well-tied buck circles. In total, records from 2,111 to 2,215 goats for content traits and 2,371 goats for daily milk yield were included in the analysis, averaging 2.6 and 4.8 observations per goat for the 2 groups of traits, respectively. The data were analyzed by using 4 test-day models with different modeling of fixed effects. Model [0] (the reference model) contained a fixed effect of year-season of kidding with regression on Ali-Schaeffer polynomials nested within the year-season classes, and a random effect of flock test-day. In model [1], the lactation curve effect from model [0] was replaced by a fixed effect of days in milk (in 3-d periods), the same for all year-seasons of kidding. Models [2] and [3] were obtained from model [1] by removing the fixed year-season of kidding effect and considering the flock test-day effect as either fixed or random, respectively. The models were compared by using 2 criteria: mean-squared error of prediction and a test of bias affecting the genetic trend. The first criterion indicated a preference for model [3], whereas the second criterion preferred model [1]. Mean-squared error of prediction is based on model fit, whereas the second criterion tests the ability of the model to produce unbiased genetic evaluation (i.e., its capability of separating environmental and genetic time trends). Thus, a fixed structure with year (year, year-season, or possibly flock-year) was indicated to appropriately separate time trends. Heritability estimates for daily milk yield and milk content were 0.26 and 0.24 to 0.27, respectively.
Crane, David; Brown, Jamie; Kaner, Eileen; Beyer, Fiona; Muirhead, Colin; Hickman, Matthew; Redmore, James; de Vocht, Frank; Beard, Emma; Michie, Susan
2018-01-01
Background Applying theory to the design and evaluation of interventions is likely to increase effectiveness and improve the evidence base from which future interventions are developed, though few interventions report this. Objective The aim of this paper was to assess how digital interventions to reduce hazardous and harmful alcohol consumption report the use of theory in their development and evaluation, and whether reporting of theory use is associated with intervention effectiveness. Methods Randomized controlled trials were extracted from a Cochrane review on digital interventions for reducing hazardous and harmful alcohol consumption. Reporting of theory use within these digital interventions was investigated using the theory coding scheme (TCS). Reported theory use was analyzed by frequency counts and descriptive statistics. Associations were analyzed with meta-regression models. Results Of 41 trials involving 42 comparisons, half did not mention theory (50% [21/42]), and only 38% (16/42) used theory to select or develop the intervention techniques. Significant heterogeneity existed between studies in the effect of interventions on alcohol reduction (I2=77.6%, P<.001). No significant associations were detected between reporting of theory use and intervention effectiveness in unadjusted models, though the meta-regression was underpowered to detect modest associations. Conclusions Digital interventions offer a unique opportunity to refine and develop new dynamic, temporally sensitive theories, yet none of the studies reported refining or developing theory. Clearer selection, application, and reporting of theory use is needed to accurately assess how useful theory is in this field and to advance the field of behavior change theories. PMID:29490895
Egg production forecasting: Determining efficient modeling approaches.
Ahmad, H A
2011-12-01
Several mathematical or statistical and artificial intelligence models were developed to compare egg production forecasts in commercial layers. Initial data for these models were collected from a comparative layer trial on commercial strains conducted at the Poultry Research Farms, Auburn University. Simulated data were produced to represent new scenarios by using means and SD of egg production of the 22 commercial strains. From the simulated data, random examples were generated for neural network training and testing for the weekly egg production prediction from wk 22 to 36. Three neural network architectures-back-propagation-3, Ward-5, and the general regression neural network-were compared for their efficiency to forecast egg production, along with other traditional models. The general regression neural network gave the best-fitting line, which almost overlapped with the commercial egg production data, with an R(2) of 0.71. The general regression neural network-predicted curve was compared with original egg production data, the average curves of white-shelled and brown-shelled strains, linear regression predictions, and the Gompertz nonlinear model. The general regression neural network was superior in all these comparisons and may be the model of choice if the initial overprediction is managed efficiently. In general, neural network models are efficient, are easy to use, require fewer data, and are practical under farm management conditions to forecast egg production.
Mocking, R J T; Harmsen, I; Assies, J; Koeter, M W J; Ruhé, H G; Schene, A H
2016-03-15
Omega-3 polyunsaturated fatty acid (PUFA) supplementation has been proposed as (adjuvant) treatment for major depressive disorder (MDD). In the present meta-analysis, we pooled randomized placebo-controlled trials assessing the effects of omega-3 PUFA supplementation on depressive symptoms in MDD. Moreover, we performed meta-regression to test whether supplementation effects depended on eicosapentaenoic acid (EPA) or docosahexaenoic acid dose, their ratio, study duration, participants' age, percentage antidepressant users, baseline MDD symptom severity, publication year and study quality. To limit heterogeneity, we only included studies in adult patients with MDD assessed using standardized clinical interviews, and excluded studies that specifically studied perinatal/perimenopausal or comorbid MDD. Our PubMED/EMBASE search resulted in 1955 articles, from which we included 13 studies providing 1233 participants. After taking potential publication bias into account, meta-analysis showed an overall beneficial effect of omega-3 PUFAs on depressive symptoms in MDD (standardized mean difference=0.398 (0.114-0.682), P=0.006, random-effects model). As an explanation for significant heterogeneity (I(2)=73.36, P<0.001), meta-regression showed that higher EPA dose (β=0.00037 (0.00009-0.00065), P=0.009), higher percentage antidepressant users (β=0.0058 (0.00017-0.01144), P=0.044) and earlier publication year (β=-0.0735 (-0.143 to 0.004), P=0.04) were significantly associated with better outcome for PUFA supplementation. Additional sensitivity analyses were performed. In conclusion, present meta-analysis suggested a beneficial overall effect of omega-3 PUFA supplementation in MDD patients, especially for higher doses of EPA and in participants taking antidepressants. Future precision medicine trials should establish whether possible interactions between EPA and antidepressants could provide targets to improve antidepressant response and its prediction. Furthermore, potential long-term biochemical side effects of high-dosed add-on EPA supplementation should be carefully monitored.
Padilha, Alessandro Haiduck; Cobuci, Jaime Araujo; Costa, Cláudio Napolis; Neto, José Braccini
2016-01-01
The aim of this study was to compare two random regression models (RRM) fitted by fourth (RRM4) and fifth-order Legendre polynomials (RRM5) with a lactation model (LM) for evaluating Holstein cattle in Brazil. Two datasets with the same animals were prepared for this study. To apply test-day RRM and LMs, 262,426 test day records and 30,228 lactation records covering 305 days were prepared, respectively. The lowest values of Akaike’s information criterion, Bayesian information criterion, and estimates of the maximum of the likelihood function (−2LogL) were for RRM4. Heritability for 305-day milk yield (305MY) was 0.23 (RRM4), 0.24 (RRM5), and 0.21 (LM). Heritability, additive genetic and permanent environmental variances of test days on days in milk was from 0.16 to 0.27, from 3.76 to 6.88 and from 11.12 to 20.21, respectively. Additive genetic correlations between test days ranged from 0.20 to 0.99. Permanent environmental correlations between test days were between 0.07 and 0.99. Standard deviations of average estimated breeding values (EBVs) for 305MY from RRM4 and RRM5 were from 11% to 30% higher for bulls and around 28% higher for cows than that in LM. Rank correlations between RRM EBVs and LM EBVs were between 0.86 to 0.96 for bulls and 0.80 to 0.87 for cows. Average percentage of gain in reliability of EBVs for 305-day yield increased from 4% to 17% for bulls and from 23% to 24% for cows when reliability of EBVs from RRM models was compared to those from LM model. Random regression model fitted by fourth order Legendre polynomials is recommended for genetic evaluations of Brazilian Holstein cattle because of the higher reliability in the estimation of breeding values. PMID:26954176
Padilha, Alessandro Haiduck; Cobuci, Jaime Araujo; Costa, Cláudio Napolis; Neto, José Braccini
2016-06-01
The aim of this study was to compare two random regression models (RRM) fitted by fourth (RRM4) and fifth-order Legendre polynomials (RRM5) with a lactation model (LM) for evaluating Holstein cattle in Brazil. Two datasets with the same animals were prepared for this study. To apply test-day RRM and LMs, 262,426 test day records and 30,228 lactation records covering 305 days were prepared, respectively. The lowest values of Akaike's information criterion, Bayesian information criterion, and estimates of the maximum of the likelihood function (-2LogL) were for RRM4. Heritability for 305-day milk yield (305MY) was 0.23 (RRM4), 0.24 (RRM5), and 0.21 (LM). Heritability, additive genetic and permanent environmental variances of test days on days in milk was from 0.16 to 0.27, from 3.76 to 6.88 and from 11.12 to 20.21, respectively. Additive genetic correlations between test days ranged from 0.20 to 0.99. Permanent environmental correlations between test days were between 0.07 and 0.99. Standard deviations of average estimated breeding values (EBVs) for 305MY from RRM4 and RRM5 were from 11% to 30% higher for bulls and around 28% higher for cows than that in LM. Rank correlations between RRM EBVs and LM EBVs were between 0.86 to 0.96 for bulls and 0.80 to 0.87 for cows. Average percentage of gain in reliability of EBVs for 305-day yield increased from 4% to 17% for bulls and from 23% to 24% for cows when reliability of EBVs from RRM models was compared to those from LM model. Random regression model fitted by fourth order Legendre polynomials is recommended for genetic evaluations of Brazilian Holstein cattle because of the higher reliability in the estimation of breeding values.
Oil Extraction and Indigenous Livelihoods in the Northern Ecuadorian Amazon
Bozigar, Matthew; Gray, Clark L.; Bilsborrow, Richard E.
2015-01-01
Globally, the extraction of minerals and fossil fuels is increasingly penetrating into isolated regions inhabited by indigenous peoples, potentially undermining their livelihoods and well-being. To provide new insight to this issue, we draw on a unique longitudinal dataset collected in the Ecuadorian Amazon over an 11-year period from 484 indigenous households with varying degrees of exposure to oil extraction. Fixed and random effects regression models of the consequences of oil activities for livelihood outcomes reveal mixed and multidimensional effects. These results challenge common assumptions about these processes and are only partly consistent with hypotheses drawn from the Dutch disease literature. PMID:26543302
Motivational and mindfulness intervention for young adult female marijuana users
de Dios, Marcel A.; Herman, Debra S.; Britton, Willoughby B.; Hagerty, Claire E.; Anderson, Bradley J.; Stein, Michael D.
2011-01-01
This pilot study tested the efficacy of a brief intervention using motivational interviewing (MI) plus mindfulness meditation (MM) to reduce marijuana use among young adult female. Thirty-four female marijuana users between the ages of 18–29 were randomized to either the intervention group (n = 22), consisting of 2 sessions of MI-MM or an assessment-only control group (n = 12). Participants’ marijuana use was assessed at baseline, 1, 2, and 3 months post-treatment. Fixed-effects regression modeling was used to analyze treatment effects. Participants randomized to the intervention group were found to use marijuana on 6.15 (z = −2.42, p=.015), 7.81 (z = −2.78, p=.005), and 6.83 (z = −2.23, p=.026) fewer days at months 1, 2, and 3, respectively, than controls. Findings from this pilot study provide preliminary evidence for the feasibility and effectiveness of a brief MI-MM for young adult female marijuana users. PMID:21940136
Mahoney, Gerald; Solomon, Richard
2016-05-01
This investigation is a secondary analysis of data from a randomized control trial of the PLAY Home Consultation Intervention Program which was conducted with 112 preschool children with Autism Spectrum Disorders and their parents (Solomon et al. in J Dev Behav Pediatr 35:475-485, 2014). Subjects were randomly assigned to either a community standard (CS) treatment group or to the PLAY Project plus CS Treatment (PLAY). PLAY subjects received monthly parent-child intervention sessions for 1 year during which parents learned how to use the rationale and interactive strategies of the Developmental, Individual-differences, Relationship-based (DIR) intervention model (Greenspan and Weider in The child with special needs: encouraging intellectual and emotional growth. DeCapo Press, Cambridge, MA, 1998) to engage in more responsive, affective and less directive interactions with their children. This investigation examined whether PLAY intervention effects on parents' style of interacting with their children as well as on children's social engagement mediated the effects of PLAY on children's autism severity as measured by ADOS calibrated severity scores. Regression procedures were used to test for mediation. There were two main findings. First the effects of PLAY on children's social engagement were mediated by the increases in parental responsiveness and affect that were promoted by PLAY. Second, the effects of PLAY on the severity children's Social Affect disorders were mediated by changes in parental responsiveness and affect; however, the effects of Responsive/Affect were mediated by the impact these variables had on children's social engagement. Results are discussed in terms of contemporary models of developmental change including the developmental change model that is the foundation for DIR.
Regression-assisted deconvolution.
McIntyre, Julie; Stefanski, Leonard A
2011-06-30
We present a semi-parametric deconvolution estimator for the density function of a random variable biX that is measured with error, a common challenge in many epidemiological studies. Traditional deconvolution estimators rely only on assumptions about the distribution of X and the error in its measurement, and ignore information available in auxiliary variables. Our method assumes the availability of a covariate vector statistically related to X by a mean-variance function regression model, where regression errors are normally distributed and independent of the measurement errors. Simulations suggest that the estimator achieves a much lower integrated squared error than the observed-data kernel density estimator when models are correctly specified and the assumption of normal regression errors is met. We illustrate the method using anthropometric measurements of newborns to estimate the density function of newborn length. Copyright © 2011 John Wiley & Sons, Ltd.
Dietary interventions to prevent and manage diabetes in worksite settings: a meta-analysis
Shrestha, Archana; Karmacharya, Biraj Man; Khudyakov, Polyna; Weber, Mary Beth; Spiegelman, Donna
2017-01-01
Objectives: The translation of lifestyle intervention to improve glucose tolerance into the workplace has been rare. The objective of this meta-analysis is to summarize the evidence for the effectiveness of dietary interventions in worksite settings on lowering blood sugar levels. Methods: We searched for studies in PubMed, Embase, Econlit, Ovid, Cochrane, Web of Science, and Cumulative Index to Nursing and Allied Health Literature. Search terms were as follows: (1) Exposure-based: nutrition/diet/dietary intervention/health promotion/primary prevention/health behavior/health education/food /program evaluation; (2) Outcome-based: diabetes/hyperglycemia/glucose/HbA1c/glycated hemoglobin; and (3) Setting-based: workplace/worksite/occupational/industry/job/employee. We manually searched review articles and reference lists of articles identified from 1969 to December 2016. We tested for between-studies heterogeneity and calculated the pooled effect sizes for changes in HbA1c (%) and fasting glucose (mg/dl) using random effect models for meta-analysis in 2016. Results: A total of 17 articles out of 1663 initially selected articles were included in the meta-analysis. With a random-effects model, worksite dietary interventions led to a pooled -0.18% (95% CI, -0.29 to -0.06; P<0.001) difference in HbA1c. With the random-effects model, the interventions resulted in 2.60 mg/dl lower fasting glucose with borderline significance (95% CI: -5.27 to 0.08, P=0.06). In the multivariate meta-regression model, the interventions with high percent of female participants and that used the intervention directly delivered to individuals, rather the environment changes, were associated with more effective interventions. Conclusion: Workplace dietary interventions can improve HbA1c. The effects were larger for the interventions with greater number of female participants and with individual-level interventions. PMID:29187673
NASA Astrophysics Data System (ADS)
Chakraborty, Joheen; Banerji, Sugata
2018-03-01
Driven by a desire to control climate change and reduce the dependence on fossil fuels, governments around the world are increasing the adoption of renewable energy sources. However, among the US states, we observe a wide disparity in renewable penetration. In this study, we have identified and cleaned over a dozen datasets representing solar energy penetration in each US state, and the potentially relevant socioeconomic and other factors that may be driving the growth in solar. We have applied a number of predictive modeling approaches - including machine learning and regression - on these datasets over a 17-year period and evaluated the relative performance of the models. Our goals were: (1) identify the most important factors that are driving the growth in solar, (2) choose the most effective predictive modeling technique for solar growth, and (3) develop a model for predicting next year’s solar growth using this year’s data. We obtained very promising results with random forests (about 90% efficacy) and varying degrees of success with support vector machines and regression techniques (linear, polynomial, ridge). We also identified states with solar growth slower than expected and representing a potential for stronger growth in future.
Siegrist, Karin; Millier, Aurelie; Amri, Ikbal; Aballéa, Samuel; Toumi, Mondher
2015-12-30
The lack of social contacts may be an important element in the presumed vicious circle aggravating, or at least stabilising negative symptoms in patients with schizophrenia. A European 2-year cohort study collected negative symptom scores, psychosocial functioning scores, objective social contact frequency scores and quality of life scores every 6 months. Bivariate analyses, correlation analyses, multivariate regressions and random effects regressions were conducted to describe relations between social contact and outcomes of interest and to gain a better understanding of this relation over time. Using data from 1208 patients with schizophrenia, a link between social contact frequency and negative symptom scores, functioning and quality of life at baseline was established. Regression models confirmed the significant association between social contact and negative symptoms as well as psychosocial functioning. This study aimed at demonstrating the importance of social contact for deficient behavioural aspects of schizophrenia. Copyright © 2015 The Authors. Published by Elsevier Ireland Ltd.. All rights reserved.
Leung, Michael; Bassani, Diego G; Racine-Poon, Amy; Goldenberg, Anna; Ali, Syed Asad; Kang, Gagandeep; Premkumar, Prasanna S; Roth, Daniel E
2017-09-10
Conditioning child growth measures on baseline accounts for regression to the mean (RTM). Here, we present the "conditional random slope" (CRS) model, based on a linear-mixed effects model that incorporates a baseline-time interaction term that can accommodate multiple data points for a child while also directly accounting for RTM. In two birth cohorts, we applied five approaches to estimate child growth velocities from 0 to 12 months to assess the effect of increasing data density (number of measures per child) on the magnitude of RTM of unconditional estimates, and the correlation and concordance between the CRS and four alternative metrics. Further, we demonstrated the differential effect of the choice of velocity metric on the magnitude of the association between infant growth and stunting at 2 years. RTM was minimally attenuated by increasing data density for unconditional growth modeling approaches. CRS and classical conditional models gave nearly identical estimates with two measures per child. Compared to the CRS estimates, unconditional metrics had moderate correlation (r = 0.65-0.91), but poor agreement in the classification of infants with relatively slow growth (kappa = 0.38-0.78). Estimates of the velocity-stunting association were the same for CRS and classical conditional models but differed substantially between conditional versus unconditional metrics. The CRS can leverage the flexibility of linear mixed models while addressing RTM in longitudinal analyses. © 2017 The Authors American Journal of Human Biology Published by Wiley Periodicals, Inc.
Niël-Weise, Barbara S; Stijnen, Theo; van den Broek, Peterhans J
2010-06-01
In this systematic review, we assessed the effect of in-line filters on infusion-related phlebitis associated with peripheral IV catheters. The study was designed as a systematic review and meta-analysis of randomized controlled trials. We used MEDLINE and the Cochrane Controlled Trial Register up to August 10, 2009. Two reviewers independently assessed trial quality and extracted data. Data on phlebitis were combined when appropriate, using a random-effects model. The impact of the risk of phlebitis in the control group (baseline risk) on the effect of in-line filters was studied by using meta-regression based on the bivariate meta-analysis model. The quality of the evidence was determined by using the GRADE (Grading of Recommendations Assessment, Development, and Evaluation) method. Eleven trials (1633 peripheral catheters) were included in this review to compare the effect of in-line filters on the incidence of phlebitis in hospitalized patients. Baseline risks across trials ranged from 23% to 96%. Meta-analysis of all trials showed that in-line filters reduced the risk of infusion-related phlebitis (relative risk, 0.66; 95% confidence interval, 0.43-1.00). This benefit, however, is very uncertain, because the trials had serious methodological shortcomings and meta-analysis revealed marked unexplained statistical heterogeneity (P < 0.0000, I(2) = 90.4%). The estimated benefit did not depend on baseline risk. In-line filters in peripheral IV catheters cannot be recommended routinely, because evidence of their benefit is uncertain.
Divergence instability of pipes conveying fluid with uncertain flow velocity
NASA Astrophysics Data System (ADS)
Rahmati, Mehdi; Mirdamadi, Hamid Reza; Goli, Sareh
2018-02-01
This article deals with investigation of probabilistic stability of pipes conveying fluid with stochastic flow velocity in time domain. As a matter of fact, this study has focused on the randomness effects of flow velocity on stability of pipes conveying fluid while most of research efforts have only focused on the influences of deterministic parameters on the system stability. The Euler-Bernoulli beam and plug flow theory are employed to model pipe structure and internal flow, respectively. In addition, flow velocity is considered as a stationary random process with Gaussian distribution. Afterwards, the stochastic averaging method and Routh's stability criterion are used so as to investigate the stability conditions of system. Consequently, the effects of boundary conditions, viscoelastic damping, mass ratio, and elastic foundation on the stability regions are discussed. Results delineate that the critical mean flow velocity decreases by increasing power spectral density (PSD) of the random velocity. Moreover, by increasing PSD from zero, the type effects of boundary condition and presence of elastic foundation are diminished, while the influences of viscoelastic damping and mass ratio could increase. Finally, to have a more applicable study, regression analysis is utilized to develop design equations and facilitate further analyses for design purposes.
Analysis of a Split-Plot Experimental Design Applied to a Low-Speed Wind Tunnel Investigation
NASA Technical Reports Server (NTRS)
Erickson, Gary E.
2013-01-01
A procedure to analyze a split-plot experimental design featuring two input factors, two levels of randomization, and two error structures in a low-speed wind tunnel investigation of a small-scale model of a fighter airplane configuration is described in this report. Standard commercially-available statistical software was used to analyze the test results obtained in a randomization-restricted environment often encountered in wind tunnel testing. The input factors were differential horizontal stabilizer incidence and the angle of attack. The response variables were the aerodynamic coefficients of lift, drag, and pitching moment. Using split-plot terminology, the whole plot, or difficult-to-change, factor was the differential horizontal stabilizer incidence, and the subplot, or easy-to-change, factor was the angle of attack. The whole plot and subplot factors were both tested at three levels. Degrees of freedom for the whole plot error were provided by replication in the form of three blocks, or replicates, which were intended to simulate three consecutive days of wind tunnel facility operation. The analysis was conducted in three stages, which yielded the estimated mean squares, multiple regression function coefficients, and corresponding tests of significance for all individual terms at the whole plot and subplot levels for the three aerodynamic response variables. The estimated regression functions included main effects and two-factor interaction for the lift coefficient, main effects, two-factor interaction, and quadratic effects for the drag coefficient, and only main effects for the pitching moment coefficient.
Comparing colon cancer outcomes: The impact of low hospital case volume and case-mix adjustment.
Fischer, C; Lingsma, H F; van Leersum, N; Tollenaar, R A E M; Wouters, M W; Steyerberg, E W
2015-08-01
When comparing performance across hospitals it is essential to consider the noise caused by low hospital case volume and to perform adequate case-mix adjustment. We aimed to quantify the role of noise and case-mix adjustment on standardized postoperative mortality and anastomotic leakage (AL) rates. We studied 13,120 patients who underwent colon cancer resection in 85 Dutch hospitals. We addressed differences between hospitals in postoperative mortality and AL, using fixed (ignoring noise) and random effects (incorporating noise) logistic regression models with general and additional, disease specific, case-mix adjustment. Adding disease specific variables improved the performance of the case-mix adjustment models for postoperative mortality (c-statistic increased from 0.77 to 0.81). The overall variation in standardized mortality ratios was similar, but some individual hospitals changed considerably. For the standardized AL rates the performance of the adjustment models was poor (c-statistic 0.59 and 0.60) and overall variation was small. Most of the observed variation between hospitals was actually noise. Noise had a larger effect on hospital performance than extended case-mix adjustment, although some individual hospital outcome rates were affected by more detailed case-mix adjustment. To compare outcomes between hospitals it is crucial to consider noise due to low hospital case volume with a random effects model. Copyright © 2015 Elsevier Ltd. All rights reserved.
Molavi Vardanjani, Mehdi; Masoudi Alavi, Negin; Razavi, Narges Sadat; Aghajani, Mohammad; Azizi-Fini, Esmail; Vaghefi, Seied Morteza
2013-01-01
Background: The anxiety reduction before coronary angiography has clinical advantages and is one of the objectives of nursing. Reflexology is a non-invasive method that has been used in several clinical situations. Applying reflexology might have effect on the reduction of anxiety before coronary angiography. Objectives: The aim of this randomized clinical trial was to investigate the effect of reflexology on anxiety among patients undergoing coronary angiography. Patients and Methods: This trial was conducted in Shahid Beheshti Hospital, in Kashan, Iran. One hundred male patients who were undergoing coronary angiography were randomly enrolled into intervention and placebo groups. The intervention protocol was included 30 minutes of general foot massage and the stimulation of three reflex points including solar plexus, pituitary gland, and heart. The placebo group only received the general foot massage. Spielbergers state trait anxiety inventory was used to assess the anxiety experienced by patients. Data was analyzed using Man-Witney, Wilcoxon and Chi-square tests. The stepwise multiple regressions used to analyze the variables that are involved in anxiety reduction. Results: The mean range of anxiety decreased from 53.24 to 45.24 in reflexology group which represented 8 score reduction (P = 0.0001). The reduction in anxiety was 5.9 score in placebo group which was also significant (P = 0.0001). The anxiety reduction was significantly higher in reflexology group (P = 0.014). The stepwise multiple regression analysis showed that doing reflexology can explain the 7.5% of anxiety reduction which made a significant model. Conclusions: Reflexology can decrease the anxiety level before coronary angiography. Therefore, reflexology before coronary angiography is recommended. PMID:25414869
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tang, Kunkun, E-mail: ktg@illinois.edu; Inria Bordeaux – Sud-Ouest, Team Cardamom, 200 avenue de la Vieille Tour, 33405 Talence; Congedo, Pietro M.
The Polynomial Dimensional Decomposition (PDD) is employed in this work for the global sensitivity analysis and uncertainty quantification (UQ) of stochastic systems subject to a moderate to large number of input random variables. Due to the intimate connection between the PDD and the Analysis of Variance (ANOVA) approaches, PDD is able to provide a simpler and more direct evaluation of the Sobol' sensitivity indices, when compared to the Polynomial Chaos expansion (PC). Unfortunately, the number of PDD terms grows exponentially with respect to the size of the input random vector, which makes the computational cost of standard methods unaffordable formore » real engineering applications. In order to address the problem of the curse of dimensionality, this work proposes essentially variance-based adaptive strategies aiming to build a cheap meta-model (i.e. surrogate model) by employing the sparse PDD approach with its coefficients computed by regression. Three levels of adaptivity are carried out in this paper: 1) the truncated dimensionality for ANOVA component functions, 2) the active dimension technique especially for second- and higher-order parameter interactions, and 3) the stepwise regression approach designed to retain only the most influential polynomials in the PDD expansion. During this adaptive procedure featuring stepwise regressions, the surrogate model representation keeps containing few terms, so that the cost to resolve repeatedly the linear systems of the least-squares regression problem is negligible. The size of the finally obtained sparse PDD representation is much smaller than the one of the full expansion, since only significant terms are eventually retained. Consequently, a much smaller number of calls to the deterministic model is required to compute the final PDD coefficients.« less
Impact of Health Research Systems on Under-5 Mortality Rate: A Trend Analysis.
Yazdizadeh, Bahareh; Parsaeian, Mahboubeh; Majdzadeh, Reza; Nikooee, Sima
2016-11-26
Between 1990 and 2015, under-5 mortality rate (U5MR) declined by 53%, from an estimated rate of 91 deaths per 1000 live births to 43, globally. The aim of this study was to determine the share of health research systems in this decrease alongside other influential factors. We used random effect regression models including the 'random intercept' and 'random intercept and random slope' models to analyze the panel data from 1990 to 2010. We selected the countries with U5MRs falling between the first and third quartiles in 1990. We used both the total articles (TA) and the number of child-specific articles (CSA) as a proxy of the health research system. In order to account for the impact of other factors, measles vaccination coverage (MVC) (as a proxy of health system performance), gross domestic product (GDP), human development index (HDI), and corruption perception index (CPI) (as proxies of development), were embedded in the model. Among all the models, 'the random intercept and random slope models' had lower residuals. The same variables of CSA, HDI, and time were significant and the coefficient of CSA was estimated at -0.17; meaning, with the addition of every 100 CSA, the rate of U5MR decreased by 17 per 1000 live births. Although the number of CSA has contributed to the reduction of U5MR, the amount of its contribution is negligible compared to the countries' development. We recommend entering different types of researches into the model separately in future research and including the variable of 'exchange between knowledge generator and user.' © 2017 The Author(s); Published by Kerman University of Medical Sciences. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Dazard, Jean-Eudes; Ishwaran, Hemant; Mehlotra, Rajeev; Weinberg, Aaron; Zimmerman, Peter
2018-01-01
Unraveling interactions among variables such as genetic, clinical, demographic and environmental factors is essential to understand the development of common and complex diseases. To increase the power to detect such variables interactions associated with clinical time-to-events outcomes, we borrowed established concepts from random survival forest (RSF) models. We introduce a novel RSF-based pairwise interaction estimator and derive a randomization method with bootstrap confidence intervals for inferring interaction significance. Using various linear and nonlinear time-to-events survival models in simulation studies, we first show the efficiency of our approach: true pairwise interaction-effects between variables are uncovered, while they may not be accompanied with their corresponding main-effects, and may not be detected by standard semi-parametric regression modeling and test statistics used in survival analysis. Moreover, using a RSF-based cross-validation scheme for generating prediction estimators, we show that informative predictors may be inferred. We applied our approach to an HIV cohort study recording key host gene polymorphisms and their association with HIV change of tropism or AIDS progression. Altogether, this shows how linear or nonlinear pairwise statistical interactions of variables may be efficiently detected with a predictive value in observational studies with time-to-event outcomes. PMID:29453930
Dazard, Jean-Eudes; Ishwaran, Hemant; Mehlotra, Rajeev; Weinberg, Aaron; Zimmerman, Peter
2018-02-17
Unraveling interactions among variables such as genetic, clinical, demographic and environmental factors is essential to understand the development of common and complex diseases. To increase the power to detect such variables interactions associated with clinical time-to-events outcomes, we borrowed established concepts from random survival forest (RSF) models. We introduce a novel RSF-based pairwise interaction estimator and derive a randomization method with bootstrap confidence intervals for inferring interaction significance. Using various linear and nonlinear time-to-events survival models in simulation studies, we first show the efficiency of our approach: true pairwise interaction-effects between variables are uncovered, while they may not be accompanied with their corresponding main-effects, and may not be detected by standard semi-parametric regression modeling and test statistics used in survival analysis. Moreover, using a RSF-based cross-validation scheme for generating prediction estimators, we show that informative predictors may be inferred. We applied our approach to an HIV cohort study recording key host gene polymorphisms and their association with HIV change of tropism or AIDS progression. Altogether, this shows how linear or nonlinear pairwise statistical interactions of variables may be efficiently detected with a predictive value in observational studies with time-to-event outcomes.
Two-Part and Related Regression Models for Longitudinal Data
Farewell, V.T.; Long, D.L.; Tom, B.D.M.; Yiu, S.; Su, L.
2017-01-01
Statistical models that involve a two-part mixture distribution are applicable in a variety of situations. Frequently, the two parts are a model for the binary response variable and a model for the outcome variable that is conditioned on the binary response. Two common examples are zero-inflated or hurdle models for count data and two-part models for semicontinuous data. Recently, there has been particular interest in the use of these models for the analysis of repeated measures of an outcome variable over time. The aim of this review is to consider motivations for the use of such models in this context and to highlight the central issues that arise with their use. We examine two-part models for semicontinuous and zero-heavy count data, and we also consider models for count data with a two-part random effects distribution. PMID:28890906
Functional Additive Mixed Models
Scheipl, Fabian; Staicu, Ana-Maria; Greven, Sonja
2014-01-01
We propose an extensive framework for additive regression models for correlated functional responses, allowing for multiple partially nested or crossed functional random effects with flexible correlation structures for, e.g., spatial, temporal, or longitudinal functional data. Additionally, our framework includes linear and nonlinear effects of functional and scalar covariates that may vary smoothly over the index of the functional response. It accommodates densely or sparsely observed functional responses and predictors which may be observed with additional error and includes both spline-based and functional principal component-based terms. Estimation and inference in this framework is based on standard additive mixed models, allowing us to take advantage of established methods and robust, flexible algorithms. We provide easy-to-use open source software in the pffr() function for the R-package refund. Simulations show that the proposed method recovers relevant effects reliably, handles small sample sizes well and also scales to larger data sets. Applications with spatially and longitudinally observed functional data demonstrate the flexibility in modeling and interpretability of results of our approach. PMID:26347592
Functional Additive Mixed Models.
Scheipl, Fabian; Staicu, Ana-Maria; Greven, Sonja
2015-04-01
We propose an extensive framework for additive regression models for correlated functional responses, allowing for multiple partially nested or crossed functional random effects with flexible correlation structures for, e.g., spatial, temporal, or longitudinal functional data. Additionally, our framework includes linear and nonlinear effects of functional and scalar covariates that may vary smoothly over the index of the functional response. It accommodates densely or sparsely observed functional responses and predictors which may be observed with additional error and includes both spline-based and functional principal component-based terms. Estimation and inference in this framework is based on standard additive mixed models, allowing us to take advantage of established methods and robust, flexible algorithms. We provide easy-to-use open source software in the pffr() function for the R-package refund. Simulations show that the proposed method recovers relevant effects reliably, handles small sample sizes well and also scales to larger data sets. Applications with spatially and longitudinally observed functional data demonstrate the flexibility in modeling and interpretability of results of our approach.
Financial Time Series Prediction Using Elman Recurrent Random Neural Networks
Wang, Jie; Wang, Jun; Fang, Wen; Niu, Hongli
2016-01-01
In recent years, financial market dynamics forecasting has been a focus of economic research. To predict the price indices of stock markets, we developed an architecture which combined Elman recurrent neural networks with stochastic time effective function. By analyzing the proposed model with the linear regression, complexity invariant distance (CID), and multiscale CID (MCID) analysis methods and taking the model compared with different models such as the backpropagation neural network (BPNN), the stochastic time effective neural network (STNN), and the Elman recurrent neural network (ERNN), the empirical results show that the proposed neural network displays the best performance among these neural networks in financial time series forecasting. Further, the empirical research is performed in testing the predictive effects of SSE, TWSE, KOSPI, and Nikkei225 with the established model, and the corresponding statistical comparisons of the above market indices are also exhibited. The experimental results show that this approach gives good performance in predicting the values from the stock market indices. PMID:27293423