Linear regression metamodeling as a tool to summarize and present simulation model results.
Jalal, Hawre; Dowd, Bryan; Sainfort, François; Kuntz, Karen M
2013-10-01
Modelers lack a tool to systematically and clearly present complex model results, including those from sensitivity analyses. The objective was to propose linear regression metamodeling as a tool to increase transparency of decision analytic models and better communicate their results. We used a simplified cancer cure model to demonstrate our approach. The model computed the lifetime cost and benefit of 3 treatment options for cancer patients. We simulated 10,000 cohorts in a probabilistic sensitivity analysis (PSA) and regressed the model outcomes on the standardized input parameter values in a set of regression analyses. We used the regression coefficients to describe measures of sensitivity analyses, including threshold and parameter sensitivity analyses. We also compared the results of the PSA to deterministic full-factorial and one-factor-at-a-time designs. The regression intercept represented the estimated base-case outcome, and the other coefficients described the relative parameter uncertainty in the model. We defined simple relationships that compute the average and incremental net benefit of each intervention. Metamodeling produced outputs similar to traditional deterministic 1-way or 2-way sensitivity analyses but was more reliable since it used all parameter values. Linear regression metamodeling is a simple, yet powerful, tool that can assist modelers in communicating model characteristics and sensitivity analyses.
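As an illustration of the metamodeling step this abstract describes, here is a minimal sketch in Python, assuming the PSA draws are already in a pandas DataFrame with one column per input parameter plus a net-benefit outcome column (all names hypothetical). Standardizing the inputs makes the intercept an estimate of the base-case outcome and the coefficients comparable measures of parameter influence.

```python
import pandas as pd
import statsmodels.api as sm

# psa is a hypothetical DataFrame: one row per simulated cohort,
# input-parameter columns plus a 'net_benefit' outcome column.
def metamodel(psa: pd.DataFrame, outcome: str = "net_benefit"):
    X = psa.drop(columns=[outcome])
    Xs = (X - X.mean()) / X.std()           # standardize the inputs
    Xs = sm.add_constant(Xs)                # intercept ~ estimated base-case outcome
    return sm.OLS(psa[outcome], Xs).fit()   # coefficients ~ relative parameter influence

# Example use: fit = metamodel(psa); print(fit.params.sort_values())
```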
A general framework for the use of logistic regression models in meta-analysis.
Simmonds, Mark C; Higgins, Julian Pt
2016-12-01
Where individual participant data are available for every randomised trial in a meta-analysis of dichotomous event outcomes, "one-stage" random-effects logistic regression models have been proposed as a way to analyse these data. Such models can also be used even when individual participant data are not available and we have only summary contingency table data. One benefit of this one-stage regression model over conventional meta-analysis methods is that it maximises the correct binomial likelihood for the data and so does not require the common assumption that effect estimates are normally distributed. A second benefit of using this model is that it may be applied, with only minor modification, in a range of meta-analytic scenarios, including meta-regression, network meta-analyses and meta-analyses of diagnostic test accuracy. This single model can potentially replace the variety of often complex methods used in these areas. This paper considers, with a range of meta-analysis examples, how random-effects logistic regression models may be used in a number of different types of meta-analyses. This one-stage approach is compared with widely used meta-analysis methods including Bayesian network meta-analysis and the bivariate and hierarchical summary receiver operating characteristic (ROC) models for meta-analyses of diagnostic test accuracy. © The Author(s) 2014.
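A hedged sketch of a one-stage analysis of summary contingency-table data in Python; for simplicity it fits fixed trial effects with a common treatment effect via a binomial GLM rather than the random-effects logistic model the authors describe, and all column names are assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# trials is a hypothetical summary table: one row per trial arm with
# columns 'trial', 'treat' (0/1), 'events', 'total'.
def one_stage_logit(trials: pd.DataFrame):
    endog = np.column_stack([trials["events"], trials["total"] - trials["events"]])
    exog = pd.get_dummies(trials[["treat", "trial"]], columns=["trial"], drop_first=True)
    exog = sm.add_constant(exog.astype(float))
    fit = sm.GLM(endog, exog, family=sm.families.Binomial()).fit()
    return fit  # fit.params['treat'] is the pooled log-odds ratio
```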
Bennett, Bradley C; Husby, Chad E
2008-03-28
Botanical pharmacopoeias are non-random subsets of floras, with some taxonomic groups over- or under-represented. Moerman [Moerman, D.E., 1979. Symbols and selectivity: a statistical analysis of Native American medical ethnobotany, Journal of Ethnopharmacology 1, 111-119] introduced linear regression/residual analysis to examine these patterns. However, regression, the commonly-employed analysis, suffers from several statistical flaws. We use contingency table and binomial analyses to examine patterns of Shuar medicinal plant use (from Amazonian Ecuador). We first analyzed the Shuar data using Moerman's approach, modified to better meet requirements of linear regression analysis. Second, we assessed the exact randomization contingency table test for goodness of fit. Third, we developed a binomial model to test for non-random selection of plants in individual families. Modified regression models (which accommodated assumptions of linear regression) reduced R² from 0.59 to 0.38, but did not eliminate all problems associated with regression analyses. Contingency table analyses revealed that the entire flora departs from the null model of equal proportions of medicinal plants in all families. In the binomial analysis, only 10 angiosperm families (of 115) differed significantly from the null model. These 10 families are largely responsible for patterns seen at higher taxonomic levels. Contingency table and binomial analyses offer an easy and statistically valid alternative to the regression approach.
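The family-level binomial test described above can be sketched as follows (the counts in the example are illustrative; requires SciPy >= 1.7 for binomtest):

```python
from scipy.stats import binomtest   # SciPy >= 1.7

# Hypothetical counts for one family: medicinal species and total species in the
# family, tested against the flora-wide proportion of medicinal species.
def family_test(med_in_family: int, n_in_family: int,
                med_total: int, n_total: int, alpha: float = 0.05):
    p0 = med_total / n_total                        # null proportion (whole flora)
    res = binomtest(med_in_family, n_in_family, p0)
    return res.pvalue, res.pvalue < alpha

# e.g. family_test(12, 40, 500, 3000) asks whether a family with 12/40 medicinal
# species departs from a flora-wide rate of 500/3000.
```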
Bayesian Unimodal Density Regression for Causal Inference
ERIC Educational Resources Information Center
Karabatsos, George; Walker, Stephen G.
2011-01-01
Karabatsos and Walker (2011) introduced a new Bayesian nonparametric (BNP) regression model. Through analyses of real and simulated data, they showed that the BNP regression model outperforms other parametric and nonparametric regression models of common use, in terms of predictive accuracy of the outcome (dependent) variable. The other,…
Modelling of capital asset pricing by considering the lagged effects
NASA Astrophysics Data System (ADS)
Sukono; Hidayat, Y.; Bon, A. Talib bin; Supian, S.
2017-01-01
In this paper the problem of modelling the Capital Asset Pricing Model (CAPM) with lagged effects is discussed. It is assumed that asset returns are influenced by the market return and the return on risk-free assets. The relationship between asset returns, the market return, and the return on risk-free assets is analysed using a CAPM regression equation and a distributed-lag CAPM regression equation. Building on the distributed-lag CAPM regression, this paper also develops a Koyck-transformed CAPM regression equation. The results show that the Koyck-transformed CAPM regression has the advantage of parsimony, requiring only three parameters, compared with the distributed-lag CAPM regression equation.
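A sketch of the Koyck reduction the abstract alludes to, written for excess returns Y_t = R_{i,t} - r_{f,t} and X_t = R_{m,t} - r_{f,t} (notation assumed, not taken from the paper): the infinite distributed-lag CAPM with geometrically declining weights collapses to an equation in only three parameters (alpha, beta, lambda).

```latex
\begin{aligned}
Y_t &= \alpha + \beta \sum_{j=0}^{\infty} \lambda^{j} X_{t-j} + \varepsilon_t,
      \qquad 0 < \lambda < 1,\\
Y_t - \lambda Y_{t-1}
    &= \alpha(1-\lambda) + \beta X_t + \varepsilon_t - \lambda \varepsilon_{t-1},\\
\Rightarrow\; Y_t &= \alpha(1-\lambda) + \beta X_t + \lambda Y_{t-1} + \nu_t,
      \qquad \nu_t = \varepsilon_t - \lambda \varepsilon_{t-1}.
\end{aligned}
```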
NASA Astrophysics Data System (ADS)
Sulistianingsih, E.; Kiftiah, M.; Rosadi, D.; Wahyuni, H.
2017-04-01
Gross Domestic Product (GDP) is an indicator of economic growth in a region. GDP is panel data, consisting of cross-sectional and time series components, and panel regression is a tool that can be used to analyse such data. There are three models in panel regression, namely the Common Effect Model (CEM), the Fixed Effect Model (FEM) and the Random Effect Model (REM); the model is chosen based on the results of the Chow test, the Hausman test and the Lagrange multiplier test. This research uses panel regression to analyse the effect of palm oil production, exports, and government consumption on the GDP of five districts in West Kalimantan, namely Sanggau, Sintang, Sambas, Ketapang and Bengkayang. The analyses conclude that the REM, with an adjusted coefficient of determination of 0.823, is the best model in this case. According to this model, only exports and government consumption significantly influence the GDP of the districts.
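A minimal sketch of fitting the three candidate panel specifications with the Python linearmodels package (the GDP and palm-oil variable names are placeholders); the Chow, Hausman and Lagrange multiplier tests mentioned above would then be used to choose among the fitted models.

```python
import pandas as pd
import statsmodels.api as sm
from linearmodels.panel import PooledOLS, PanelOLS, RandomEffects

# df is a hypothetical DataFrame indexed by (district, year) with columns
# 'gdp', 'production', 'export', 'gov_cons'.
def fit_panel_models(df: pd.DataFrame):
    y = df["gdp"]
    X = sm.add_constant(df[["production", "export", "gov_cons"]])
    cem = PooledOLS(y, X).fit()                      # common effect model
    fem = PanelOLS(y, X, entity_effects=True).fit()  # fixed effect model
    rem = RandomEffects(y, X).fit()                  # random effect model
    return cem, fem, rem
```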
Variable selection and model choice in geoadditive regression models.
Kneib, Thomas; Hothorn, Torsten; Tutz, Gerhard
2009-06-01
Model choice and variable selection are issues of major concern in practical regression analyses, arising in many biometric applications such as habitat suitability analyses, where the aim is to identify the influence of potentially many environmental conditions on certain species. We describe regression models for breeding bird communities that facilitate both model choice and variable selection, by a boosting algorithm that works within a class of geoadditive regression models comprising spatial effects, nonparametric effects of continuous covariates, interaction surfaces, and varying coefficients. The major modeling components are penalized splines and their bivariate tensor product extensions. All smooth model terms are represented as the sum of a parametric component and a smooth component with one degree of freedom to obtain a fair comparison between the model terms. A generic representation of the geoadditive model allows us to devise a general boosting algorithm that automatically performs model choice and variable selection.
Regression and multivariate models for predicting particulate matter concentration level.
Nazif, Amina; Mohammed, Nurul Izma; Malakahmad, Amirhossein; Abualqumboz, Motasem S
2018-01-01
The devastating health effects of particulate matter (PM10) exposure on susceptible populations have made it necessary to evaluate PM10 pollution. Meteorological parameters and seasonal variation increase PM10 concentration levels, especially in areas with multiple anthropogenic activities. Hence, stepwise regression (SR), multiple linear regression (MLR) and principal component regression (PCR) analyses were used to analyse daily average PM10 concentration levels. The analyses were carried out using daily average PM10 concentration, temperature, humidity, wind speed and wind direction data from 2006 to 2010, obtained from an industrial air quality monitoring station in Malaysia. The SR analysis established that meteorological parameters had limited influence on PM10 concentration levels, with coefficients of determination (R²) of 23% to 29% for the seasonal and non-seasonal analyses. The prediction analysis showed that PCR models had better R² values than MLR models: for both seasonal and non-seasonal data, MLR models had R² values from 0.50 to 0.60, while PCR models had R² values from 0.66 to 0.89. In addition, a validation analysis using 2016 data confirmed that the PCR model outperformed the MLR model, with the PCR model for the seasonal analysis performing best. These analyses will aid in achieving sustainable air quality management strategies.
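A rough sketch of the MLR-versus-PCR comparison in scikit-learn, assuming X holds the daily meteorological predictors and y the daily average PM10 concentration (names and the number of retained components are illustrative):

```python
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# X: hypothetical daily predictors (temperature, humidity, wind speed, wind direction);
# y: daily average PM10 concentration.
def fit_pcr(X, y, n_components=3):
    pcr = make_pipeline(StandardScaler(), PCA(n_components=n_components), LinearRegression())
    return pcr.fit(X, y)

def fit_mlr(X, y):
    return make_pipeline(StandardScaler(), LinearRegression()).fit(X, y)

# R^2 comparison on held-out data:
# fit_pcr(Xtr, ytr).score(Xte, yte) versus fit_mlr(Xtr, ytr).score(Xte, yte)
```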
Statistical power analyses using G*Power 3.1: tests for correlation and regression analyses.
Faul, Franz; Erdfelder, Edgar; Buchner, Axel; Lang, Albert-Georg
2009-11-01
G*Power is a free power analysis program for a variety of statistical tests. We present extensions and improvements of the version introduced by Faul, Erdfelder, Lang, and Buchner (2007) in the domain of correlation and regression analyses. In the new version, we have added procedures to analyze the power of tests based on (1) single-sample tetrachoric correlations, (2) comparisons of dependent correlations, (3) bivariate linear regression, (4) multiple linear regression based on the random predictor model, (5) logistic regression, and (6) Poisson regression. We describe these new features and provide a brief introduction to their scope and handling.
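G*Power itself is a GUI program, but the overall F-test power for fixed-model multiple regression can be cross-checked with the noncentral F distribution in SciPy; this follows Cohen's f² formulation and is an assumption about the calculation, not a description of G*Power internals.

```python
from scipy.stats import f, ncf

def regression_power(r2: float, n_predictors: int, n_obs: int, alpha: float = 0.05) -> float:
    """Power of the overall F test in fixed-effects multiple regression (Cohen's f^2)."""
    f2 = r2 / (1.0 - r2)               # Cohen's effect size f^2
    u = n_predictors                   # numerator degrees of freedom
    v = n_obs - n_predictors - 1       # denominator degrees of freedom
    lam = f2 * (u + v + 1)             # noncentrality parameter (= f^2 * N)
    f_crit = f.ppf(1.0 - alpha, u, v)  # critical value under the null
    return 1.0 - ncf.cdf(f_crit, u, v, lam)

# e.g. regression_power(r2=0.13, n_predictors=5, n_obs=100)
```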
A method for fitting regression splines with varying polynomial order in the linear mixed model.
Edwards, Lloyd J; Stewart, Paul W; MacDougall, James E; Helms, Ronald W
2006-02-15
The linear mixed model has become a widely used tool for longitudinal analysis of continuous variables. The use of regression splines in these models offers the analyst additional flexibility in the formulation of descriptive analyses, exploratory analyses and hypothesis-driven confirmatory analyses. We propose a method for fitting piecewise polynomial regression splines with varying polynomial order in the fixed effects and/or random effects of the linear mixed model. The polynomial segments are explicitly constrained by side conditions for continuity and some smoothness at the points where they join. By using a reparameterization of this explicitly constrained linear mixed model, an implicitly constrained linear mixed model is constructed that simplifies implementation of fixed-knot regression splines. The proposed approach is relatively simple, handles splines in one variable or multiple variables, and can be easily programmed using existing commercial software such as SAS or S-plus. The method is illustrated using two examples: an analysis of longitudinal viral load data from a study of subjects with acute HIV-1 infection and an analysis of 24-hour ambulatory blood pressure profiles.
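A much-simplified sketch of the general idea, a fixed-knot truncated-power spline in the fixed effects of a linear mixed model with a random intercept per subject, fitted with statsmodels rather than the SAS/S-plus implementations the authors discuss; it does not reproduce their varying-order reparameterization, and the knot location and column names are illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def spline_mixed_model(df: pd.DataFrame, knot: float = 30.0):
    """df: hypothetical long-format data with columns 'id', 'time', 'y'."""
    t = df["time"].to_numpy()
    X = pd.DataFrame({
        "time": t,
        "time2": t ** 2,                           # quadratic trend before the knot
        "spline": np.maximum(t - knot, 0.0) ** 2,  # change in curvature after the knot
    })
    X = sm.add_constant(X)
    model = sm.MixedLM(df["y"], X, groups=df["id"])  # random intercept per subject
    return model.fit()
```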
Random regression analyses using B-splines to model growth of Australian Angus cattle
Meyer, Karin
2005-01-01
Regression on the basis function of B-splines has been advocated as an alternative to orthogonal polynomials in random regression analyses. Basic theory of splines in mixed model analyses is reviewed, and estimates from analyses of weights of Australian Angus cattle from birth to 820 days of age are presented. Data comprised 84 533 records on 20 731 animals in 43 herds, with a high proportion of animals with 4 or more weights recorded. Changes in weights with age were modelled through B-splines of age at recording. A total of thirteen analyses, considering different combinations of linear, quadratic and cubic B-splines and up to six knots, were carried out. Results showed good agreement for all ages with many records, but fluctuated where data were sparse. On the whole, analyses using B-splines appeared more robust against "end-of-range" problems and yielded more consistent and accurate estimates of the first eigenfunctions than previous, polynomial analyses. A model fitting quadratic B-splines, with knots at 0, 200, 400, 600 and 821 days and a total of 91 covariance components, appeared to be a good compromise between detailedness of the model, number of parameters to be estimated, plausibility of results, and fit, measured as residual mean square error. PMID:16093011
Baldi, F; Alencar, M M; Albuquerque, L G
2010-12-01
The objective of this work was to estimate covariance functions using random regression models on B-spline functions of animal age, for weights from birth to adult age in Canchim cattle. Data comprised 49,011 records on 2435 females. The model of analysis included fixed effects of contemporary groups, age of dam as a quadratic covariable and the population mean trend taken into account by a cubic regression on orthogonal polynomials of animal age. Residual variances were modelled through a step function with four classes. The direct and maternal additive genetic effects, and animal and maternal permanent environmental effects were included as random effects in the model. A total of seventeen analyses, considering linear, quadratic and cubic B-spline functions and up to seven knots, were carried out. B-spline functions of the same order were considered for all random effects. Random regression models on B-spline functions were compared to a random regression model on Legendre polynomials and to a multitrait model. Results from the different models of analysis were compared using the REML form of the Akaike Information criterion and Schwarz' Bayesian Information criterion. In addition, the variance components and genetic parameters estimated for each random regression model were also used as criteria to choose the most adequate model to describe the covariance structure of the data. A model fitting quadratic B-splines, with four knots or three segments for direct additive genetic effect and animal permanent environmental effect and two knots for maternal additive genetic effect and maternal permanent environmental effect, was the most adequate to describe the covariance structure of the data. Random regression models using B-spline functions as base functions fitted the data better than Legendre polynomials, especially at mature ages, but a larger number of parameters needs to be estimated with B-spline functions. © 2010 Blackwell Verlag GmbH.
Impact of Contextual Factors on Prostate Cancer Risk and Outcomes
2013-07-01
framework with random effects (“frailty models”) while the case-control analyses (Aim 4) will use multilevel unconditional logistic regression models...contextual-level SES on prostate cancer risk within racial/ethnic groups. The survival analyses (Aims 1-3) will utilize a proportional hazards regression
NASA Astrophysics Data System (ADS)
Öktem, H.
2012-01-01
Plastic injection molding plays a key role in the production of high-quality plastic parts. Shrinkage is one of the most significant problems of a plastic part in terms of quality in the plastic injection molding. This article focuses on the study of the modeling and analysis of the effects of process parameters on the shrinkage by evaluating the quality of the plastic part of a DVD-ROM cover made with Acrylonitrile Butadiene Styrene (ABS) polymer material. An effective regression model was developed to determine the mathematical relationship between the process parameters (mold temperature, melt temperature, injection pressure, injection time, and cooling time) and the volumetric shrinkage by utilizing the analysis data. Finite element (FE) analyses designed by Taguchi (L27) orthogonal arrays were run in the Moldflow simulation program. Analysis of variance (ANOVA) was then performed to check the adequacy of the regression model and to determine the effect of the process parameters on the shrinkage. Experiments were conducted to control the accuracy of the regression model with the FE analyses obtained from Moldflow. The results show that the regression model agrees very well with the FE analyses and the experiments. From this, it can be concluded that this study succeeded in modeling the shrinkage problem in our application.
An empirical study using permutation-based resampling in meta-regression
2012-01-01
Background In meta-regression, as the number of trials in the analyses decreases, the risk of false positives or false negatives increases. This is partly due to the assumption of normality that may not hold in small samples. Creation of a distribution from the observed trials using permutation methods to calculate P values may allow for fewer spurious findings. Permutation has not been empirically tested in meta-regression. The objective of this study was to perform an empirical investigation to explore the differences in results for meta-analyses on a small number of trials using standard large sample approaches versus permutation-based methods for meta-regression. Methods We isolated a sample of randomized controlled clinical trials (RCTs) for interventions that have a small number of trials (herbal medicine trials). Trials were then grouped by herbal species and condition and assessed for methodological quality using the Jadad scale, and data were extracted for each outcome. Finally, we performed meta-analyses on the primary outcome of each group of trials and meta-regression for methodological quality subgroups within each meta-analysis. We used large sample methods and permutation methods in our meta-regression modeling. We then compared final models and final P values between methods. Results We collected 110 trials across 5 intervention/outcome pairings and 5 to 10 trials per covariate. When applying large sample methods and permutation-based methods in our backwards stepwise regression, the covariates in the final models were identical in all cases. The P values for the covariates in the final model were larger in 78% (7/9) of the cases for permutation and identical for 22% (2/9) of the cases. Conclusions We present empirical evidence that permutation-based resampling may not change final models when using backwards stepwise regression, but may increase P values in meta-regression of multiple covariates for a relatively small number of trials. PMID:22587815
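A rough sketch of a permutation p-value for a single meta-regression covariate, using inverse-variance weighted least squares; this is not the authors' backwards stepwise procedure, and the array names and number of permutations are arbitrary.

```python
import numpy as np
import statsmodels.api as sm

def permutation_pvalue(effect, variance, covariate, n_perm=5000, seed=0):
    """effect, variance, covariate: 1-D arrays with one entry per trial."""
    rng = np.random.default_rng(seed)
    w = 1.0 / np.asarray(variance)           # inverse-variance weights

    def t_stat(x):
        X = sm.add_constant(np.asarray(x, dtype=float))
        return sm.WLS(effect, X, weights=w).fit().tvalues[1]

    t_obs = t_stat(covariate)
    t_perm = np.array([t_stat(rng.permutation(covariate)) for _ in range(n_perm)])
    return float(np.mean(np.abs(t_perm) >= abs(t_obs)))   # two-sided permutation p-value
```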
Boligon, A A; Baldi, F; Mercadante, M E Z; Lobo, R B; Pereira, R J; Albuquerque, L G
2011-06-28
We quantified the potential increase in accuracy of expected breeding value for weights of Nelore cattle, from birth to mature age, using multi-trait and random regression models on Legendre polynomials and B-spline functions. A total of 87,712 weight records from 8144 females were used, recorded every three months from birth to mature age from the Nelore Brazil Program. For random regression analyses, all female weight records from birth to eight years of age (data set I) were considered. From this general data set, a subset was created (data set II), which included only nine weight records: at birth, weaning, 365 and 550 days of age, and 2, 3, 4, 5, and 6 years of age. Data set II was analyzed using random regression and multi-trait models. The model of analysis included the contemporary group as fixed effects and age of dam as a linear and quadratic covariable. In the random regression analyses, average growth trends were modeled using a cubic regression on orthogonal polynomials of age. Residual variances were modeled by a step function with five classes. Legendre polynomials of fourth and sixth order were utilized to model the direct genetic and animal permanent environmental effects, respectively, while third-order Legendre polynomials were considered for maternal genetic and maternal permanent environmental effects. Quadratic polynomials were applied to model all random effects in random regression models on B-spline functions. Direct genetic and animal permanent environmental effects were modeled using three segments or five coefficients, and genetic maternal and maternal permanent environmental effects were modeled with one segment or three coefficients in the random regression models on B-spline functions. For both data sets (I and II), animals ranked differently according to expected breeding value obtained by random regression or multi-trait models. With random regression models, the highest gains in accuracy were obtained at ages with a low number of weight records. The results indicate that random regression models provide more accurate expected breeding values than the traditionally finite multi-trait models. Thus, higher genetic responses are expected for beef cattle growth traits by replacing a multi-trait model with random regression models for genetic evaluation. B-spline functions could be applied as an alternative to Legendre polynomials to model covariance functions for weights from birth to mature age.
Multiple linear regression models are often used to predict levels of fecal indicator bacteria (FIB) in recreational swimming waters based on independent variables (IVs) such as meteorologic, hydrodynamic, and water-quality measures. The IVs used for these analyses are traditiona...
NASA Astrophysics Data System (ADS)
Bae, Gihyun; Huh, Hoon; Park, Sungho
This paper deals with a regression model for light weight and crashworthiness enhancement design of automotive parts in a frontal car crash. The ULSAB-AVC model is employed for the crash analysis and effective parts are selected based on the amount of energy absorption during the crash behavior. Finite element analyses are carried out for designated design cases in order to investigate the crashworthiness and weight according to the material and thickness of the main energy absorption parts. Based on the simulation results, a regression analysis is performed to construct a regression model utilized for light weight and crashworthiness enhancement design of automotive parts. An example for weight reduction of the main energy absorption parts demonstrates the validity of the constructed regression model.
Survival Data and Regression Models
NASA Astrophysics Data System (ADS)
Grégoire, G.
2014-12-01
We start this chapter by introducing some basic elements for the analysis of censored survival data. Then we focus on right censored data and develop two types of regression models. The first one concerns the so-called accelerated failure time models (AFT), which are parametric models where a function of a parameter depends linearly on the covariables. The second one is a semiparametric model, where the covariables enter in a multiplicative form in the expression of the hazard rate function. The main statistical tool for analysing these regression models is the maximum likelihood methodology and, although we recall some essential results of ML theory, we refer the reader to the chapter "Logistic Regression" for a more detailed presentation.
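For readers who want to try both model classes on right-censored data, a minimal sketch with the Python lifelines package (the DataFrame layout with 'time', 'event' and covariate columns is an assumption):

```python
import pandas as pd
from lifelines import WeibullAFTFitter, CoxPHFitter

# df: hypothetical right-censored data with columns 'time', 'event' (1 = observed,
# 0 = censored) and covariate columns such as 'age' and 'treatment'.
def fit_survival_models(df: pd.DataFrame):
    aft = WeibullAFTFitter().fit(df, duration_col="time", event_col="event")  # parametric AFT
    cox = CoxPHFitter().fit(df, duration_col="time", event_col="event")       # semiparametric PH
    return aft, cox

# aft.print_summary(); cox.print_summary()
```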
Rovadoscki, Gregori A; Petrini, Juliana; Ramirez-Diaz, Johanna; Pertile, Simone F N; Pertille, Fábio; Salvian, Mayara; Iung, Laiza H S; Rodriguez, Mary Ana P; Zampar, Aline; Gaya, Leila G; Carvalho, Rachel S B; Coelho, Antonio A D; Savino, Vicente J M; Coutinho, Luiz L; Mourão, Gerson B
2016-09-01
Repeated measures from the same individual have been analyzed by using repeatability and finite dimension models under univariate or multivariate analyses. However, in the last decade, the use of random regression models for genetic studies with longitudinal data has become more common. Thus, the aim of this research was to estimate genetic parameters for body weight of four experimental chicken lines by using univariate random regression models. Body weight data from hatching to 84 days of age (n = 34,730) from four experimental free-range chicken lines (7P, Caipirão da ESALQ, Caipirinha da ESALQ and Carijó Barbado) were used. The analysis model included the fixed effects of contemporary group (gender and rearing system), fixed regression coefficients for age at measurement, and random regression coefficients for permanent environmental effects and additive genetic effects. Heterogeneous variances for residual effects were considered, and one residual variance was assigned for each of six subclasses of age at measurement. Random regression curves were modeled by using Legendre polynomials of the second and third orders, with the best model chosen based on the Akaike Information Criterion, Bayesian Information Criterion, and restricted maximum likelihood. Multivariate analyses under the same animal mixed model were also performed for the validation of the random regression models. The Legendre polynomials of second order were better for describing the growth curves of the lines studied. Moderate to high heritabilities (h² = 0.15 to 0.98) were estimated for body weight between one and 84 days of age, suggesting that body weight at all ages can be used as a selection criterion. Genetic correlations among body weight records obtained through multivariate analyses ranged from 0.18 to 0.96, 0.12 to 0.89, 0.06 to 0.96, and 0.28 to 0.96 in 7P, Caipirão da ESALQ, Caipirinha da ESALQ, and Carijó Barbado chicken lines, respectively. Results indicate that genetic gain for body weight can be achieved by selection. Also, selection for body weight at 42 days of age can be maintained as a selection criterion. © 2016 Poultry Science Association Inc.
Henrard, S; Speybroeck, N; Hermans, C
2015-11-01
Haemophilia is a rare genetic haemorrhagic disease characterized by partial or complete deficiency of coagulation factor VIII, for haemophilia A, or IX, for haemophilia B. As in any other medical research domain, the field of haemophilia research is increasingly concerned with finding factors associated with binary or continuous outcomes through multivariable models. Traditional models include multiple logistic regressions, for binary outcomes, and multiple linear regressions for continuous outcomes. Yet these regression models are at times difficult to implement, especially for non-statisticians, and can be difficult to interpret. The present paper sought to didactically explain how, why, and when to use classification and regression tree (CART) analysis for haemophilia research. The CART method is non-parametric and non-linear, based on the repeated partitioning of a sample into subgroups based on a certain criterion. Breiman developed this method in 1984. Classification trees (CTs) are used to analyse categorical outcomes and regression trees (RTs) to analyse continuous ones. The CART methodology has become increasingly popular in the medical field, yet only a few examples of studies using this methodology specifically in haemophilia have to date been published. Two examples using CART analysis and previously published in this field are didactically explained in detail. There is increasing interest in using CART analysis in the health domain, primarily due to its ease of implementation, use, and interpretation, thus facilitating medical decision-making. This method should be promoted for analysing continuous or categorical outcomes in haemophilia, when applicable. © 2015 John Wiley & Sons Ltd.
Applications of MIDAS regression in analysing trends in water quality
NASA Astrophysics Data System (ADS)
Penev, Spiridon; Leonte, Daniela; Lazarov, Zdravetz; Mann, Rob A.
2014-04-01
We discuss novel statistical methods in analysing trends in water quality. Such analysis uses complex data sets of different classes of variables, including water quality, hydrological and meteorological. We analyse the effect of rainfall and flow on trends in water quality utilising a flexible model called Mixed Data Sampling (MIDAS). This model arises because of the mixed frequency in the data collection. Typically, water quality variables are sampled fortnightly, whereas the rain data is sampled daily. The advantage of using MIDAS regression is in the flexible and parsimonious modelling of the influence of the rain and flow on trends in water quality variables. We discuss the model and its implementation on a data set from the Shoalhaven Supply System and Catchments in the state of New South Wales, Australia. Information criteria indicate that MIDAS modelling improves upon simplistic approaches that do not utilise the mixed data sampling nature of the data.
Space shuttle propulsion parameter estimation using optimal estimation techniques
NASA Technical Reports Server (NTRS)
1983-01-01
A regression analysis on tabular aerodynamic data provided a representative aerodynamic model for coefficient estimation. It also reduced the storage requirements for the 'normal' model used to check out the estimation algorithms. The results of the regression analyses are presented. The computer routines for the filter portion of the estimation algorithm were developed, and the SRB predictive program was brought up on the computer. For the filter program, approximately 54 routines were developed. The routines were highly subsegmented to facilitate overlaying program segments within the partitioned storage space on the computer.
Procedures for adjusting regional regression models of urban-runoff quality using local data
Hoos, A.B.; Sisolak, J.K.
1993-01-01
Statistical operations termed model-adjustment procedures (MAPs) can be used to incorporate local data into existing regression models to improve the prediction of urban-runoff quality. Each MAP is a form of regression analysis in which the local data base is used as a calibration data set. Regression coefficients are determined from the local data base, and the resulting 'adjusted' regression models can then be used to predict storm-runoff quality at unmonitored sites. The response variable in the regression analyses is the observed load or mean concentration of a constituent in storm runoff for a single storm. The set of explanatory variables used in the regression analyses is different for each MAP, but always includes the predicted value of load or mean concentration from a regional regression model. The four MAPs examined in this study were: single-factor regression against the regional model prediction, P (termed MAP-1F-P); regression against P (termed MAP-R-P); regression against P and additional local variables (termed MAP-R-P+nV); and a weighted combination of P and a local-regression prediction (termed MAP-W). The procedures were tested by means of split-sample analysis, using data from three cities included in the Nationwide Urban Runoff Program: Denver, Colorado; Bellevue, Washington; and Knoxville, Tennessee. The MAP that provided the greatest predictive accuracy for the verification data set differed among the three test data bases and among model types (MAP-W for Denver and Knoxville, MAP-1F-P and MAP-R-P for Bellevue load models, and MAP-R-P+nV for Bellevue concentration models) and, in many cases, was not clearly indicated by the values of standard error of estimate for the calibration data set. A scheme to guide MAP selection, based on exploratory data analysis of the calibration data set, is presented and tested. The MAPs were tested for sensitivity to the size of a calibration data set. As expected, predictive accuracy of all MAPs for the verification data set decreased as the calibration data-set size decreased, but predictive accuracy was not as sensitive for the MAPs as it was for the local regression models.
ERIC Educational Resources Information Center
Li, Spencer D.
2011-01-01
Mediation analysis in child and adolescent development research is possible using large secondary data sets. This article provides an overview of two statistical methods commonly used to test mediated effects in secondary analysis: multiple regression and structural equation modeling (SEM). Two empirical studies are presented to illustrate the…
Grieve, Richard; Nixon, Richard; Thompson, Simon G
2010-01-01
Cost-effectiveness analyses (CEA) may be undertaken alongside cluster randomized trials (CRTs) where randomization is at the level of the cluster (for example, the hospital or primary care provider) rather than the individual. Costs (and outcomes) within clusters may be correlated so that the assumption made by standard bivariate regression models, that observations are independent, is incorrect. This study develops a flexible modeling framework to acknowledge the clustering in CEA that use CRTs. The authors extend previous Bayesian bivariate models for CEA of multicenter trials to recognize the specific form of clustering in CRTs. They develop new Bayesian hierarchical models (BHMs) that allow mean costs and outcomes, and also variances, to differ across clusters. They illustrate how each model can be applied using data from a large (1732 cases, 70 primary care providers) CRT evaluating alternative interventions for reducing postnatal depression. The analyses compare cost-effectiveness estimates from BHMs with standard bivariate regression models that ignore the data hierarchy. The BHMs show high levels of cost heterogeneity across clusters (intracluster correlation coefficient, 0.17). Compared with standard regression models, the BHMs yield substantially increased uncertainty surrounding the cost-effectiveness estimates, and altered point estimates. The authors conclude that ignoring clustering can lead to incorrect inferences. The BHMs that they present offer a flexible modeling framework that can be applied more generally to CEA that use CRTs.
Schraer, S.M.; Shaw, D.R.; Boyette, M.; Coupe, R.H.; Thurman, E.M.
2000-01-01
Enzyme-linked immunosorbent assay (ELISA) data from surface water reconnaissance were compared to data from samples analyzed by gas chromatography for the pesticide residues cyanazine (2-[[4-chloro-6-(ethylamino)-1,3,5-triazin-2-yl]amino]-2-methylpropanenitrile) and metolachlor (2-chloro-N-(2-ethyl-6-methylphenyl)-N-(2-methoxy-1-methylethyl)acetamide). When ELISA analyses were duplicated, cyanazine and metolachlor detection was found to have highly reproducible results; adjusted R²s were 0.97 and 0.94, respectively. When ELISA results for cyanazine were regressed against gas chromatography results, the models effectively predicted cyanazine concentrations from ELISA analyses (adjusted R²s ranging from 0.76 to 0.81). The intercepts and slopes for these models were not different from 0 and 1, respectively. This indicates that cyanazine analysis by ELISA is expected to give the same results as analysis by gas chromatography. However, regressing ELISA analyses for metolachlor against gas chromatography data provided more variable results (adjusted R²s ranged from 0.67 to 0.94). Regression models for metolachlor analyses had two of three intercepts that were not different from 0. Slopes for all metolachlor regression models were significantly different from 1. This indicates that as metolachlor concentrations increase, ELISA will over- or under-estimate metolachlor concentration, depending on the method of comparison. ELISA can be effectively used to detect cyanazine and metolachlor in surface water samples. However, when detections of metolachlor have significant consequences or implications it may be necessary to use other analytical methods.
Intermediate and advanced topics in multilevel logistic regression analysis.
Austin, Peter C; Merlo, Juan
2017-09-10
Multilevel data occur frequently in health services, population and public health, and epidemiologic research. In such research, binary outcomes are common. Multilevel logistic regression models allow one to account for the clustering of subjects within clusters of higher-level units when estimating the effect of subject and cluster characteristics on subject outcomes. A search of the PubMed database demonstrated that the use of multilevel or hierarchical regression models is increasing rapidly. However, our impression is that many analysts simply use multilevel regression models to account for the nuisance of within-cluster homogeneity that is induced by clustering. In this article, we describe a suite of analyses that can complement the fitting of multilevel logistic regression models. These ancillary analyses permit analysts to estimate the marginal or population-average effect of covariates measured at the subject and cluster level, in contrast to the within-cluster or cluster-specific effects arising from the original multilevel logistic regression model. We describe the interval odds ratio and the proportion of opposed odds ratios, which are summary measures of effect for cluster-level covariates. We describe the variance partition coefficient and the median odds ratio which are measures of components of variance and heterogeneity in outcomes. These measures allow one to quantify the magnitude of the general contextual effect. We describe an R² measure that allows analysts to quantify the proportion of variation explained by different multilevel logistic regression models. We illustrate the application and interpretation of these measures by analyzing mortality in patients hospitalized with a diagnosis of acute myocardial infarction. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
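The two heterogeneity measures named above have simple closed forms for a random-intercept logistic model; a small sketch, where var_u is the estimated cluster-level variance on the logit scale:

```python
import math
from scipy.stats import norm

def vpc(var_u: float) -> float:
    """Variance partition coefficient: cluster variance over total latent variance."""
    return var_u / (var_u + math.pi ** 2 / 3.0)

def median_odds_ratio(var_u: float) -> float:
    """Median odds ratio from the cluster-level variance on the logit scale."""
    return math.exp(math.sqrt(2.0 * var_u) * norm.ppf(0.75))

# e.g. with a cluster variance of 0.25: vpc(0.25) ~ 0.07, median_odds_ratio(0.25) ~ 1.61
```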
An approach to checking case-crossover analyses based on equivalence with time-series methods.
Lu, Yun; Symons, James Morel; Geyh, Alison S; Zeger, Scott L
2008-03-01
The case-crossover design has been increasingly applied to epidemiologic investigations of acute adverse health effects associated with ambient air pollution. The correspondence of the design to that of matched case-control studies makes it inferentially appealing for epidemiologic studies. Case-crossover analyses generally use conditional logistic regression modeling. This technique is equivalent to time-series log-linear regression models when there is a common exposure across individuals, as in air pollution studies. Previous methods for obtaining unbiased estimates for case-crossover analyses have assumed that time-varying risk factors are constant within reference windows. In this paper, we rely on the connection between case-crossover and time-series methods to illustrate model-checking procedures from log-linear model diagnostics for time-stratified case-crossover analyses. Additionally, we compare the relative performance of the time-stratified case-crossover approach to time-series methods under 3 simulated scenarios representing different temporal patterns of daily mortality associated with air pollution in Chicago, Illinois, during 1995 and 1996. Whenever a model, be it time-series or case-crossover, fails to account appropriately for fluctuations in time that confound the exposure, the effect estimate will be biased. It is therefore important to perform model-checking in time-stratified case-crossover analyses rather than assume the estimator is unbiased.
Marston, Louise; Peacock, Janet L; Yu, Keming; Brocklehurst, Peter; Calvert, Sandra A; Greenough, Anne; Marlow, Neil
2009-07-01
Studies of prematurely born infants contain a relatively large percentage of multiple births, so the resulting data have a hierarchical structure with small clusters of size 1, 2 or 3. Ignoring the clustering may lead to incorrect inferences. The aim of this study was to compare statistical methods which can be used to analyse such data: generalised estimating equations, multilevel models, multiple linear regression and logistic regression. Four datasets which differed in total size and in percentage of multiple births (n = 254, multiple 18%; n = 176, multiple 9%; n = 10 098, multiple 3%; n = 1585, multiple 8%) were analysed. With the continuous outcome, two-level models produced similar results in the larger dataset, while generalised least squares multilevel modelling (ML GLS 'xtreg' in Stata) and maximum likelihood multilevel modelling (ML MLE 'xtmixed' in Stata) produced divergent estimates using the smaller dataset. For the dichotomous outcome, most methods, except generalised least squares multilevel modelling (ML GH 'xtlogit' in Stata) gave similar odds ratios and 95% confidence intervals within datasets. For the continuous outcome, our results suggest using multilevel modelling. We conclude that generalised least squares multilevel modelling (ML GLS 'xtreg' in Stata) and maximum likelihood multilevel modelling (ML MLE 'xtmixed' in Stata) should be used with caution when the dataset is small. Where the outcome is dichotomous and there is a relatively large percentage of non-independent data, it is recommended that these are accounted for in analyses using logistic regression with adjusted standard errors or multilevel modelling. If, however, the dataset has a small percentage of clusters greater than size 1 (e.g. a population dataset of children where there are few multiples) there appears to be less need to adjust for clustering.
ERIC Educational Resources Information Center
Leow, Christine; Wen, Xiaoli; Korfmacher, Jon
2015-01-01
This article compares regression modeling and propensity score analysis as different types of statistical techniques used in addressing selection bias when estimating the impact of two-year versus one-year Head Start on children's school readiness. The analyses were based on the national Head Start secondary dataset. After controlling for…
Saunders, Christina T; Blume, Jeffrey D
2017-10-26
Mediation analysis explores the degree to which an exposure's effect on an outcome is diverted through a mediating variable. We describe a classical regression framework for conducting mediation analyses in which estimates of causal mediation effects and their variance are obtained from the fit of a single regression model. The vector of changes in exposure pathway coefficients, which we named the essential mediation components (EMCs), is used to estimate standard causal mediation effects. Because these effects are often simple functions of the EMCs, an analytical expression for their model-based variance follows directly. Given this formula, it is instructive to revisit the performance of routinely used variance approximations (e.g., delta method and resampling methods). Requiring the fit of only one model reduces the computation time required for complex mediation analyses and permits the use of a rich suite of regression tools that are not easily implemented on a system of three equations, as would be required in the Baron-Kenny framework. Using data from the BRAIN-ICU study, we provide examples to illustrate the advantages of this framework and compare it with the existing approaches. © The Author 2017. Published by Oxford University Press.
Conditional Poisson models: a flexible alternative to conditional logistic case cross-over analysis.
Armstrong, Ben G; Gasparrini, Antonio; Tobias, Aurelio
2014-11-24
The time stratified case cross-over approach is a popular alternative to conventional time series regression for analysing associations between time series of environmental exposures (air pollution, weather) and counts of health outcomes. These are almost always analyzed using conditional logistic regression on data expanded to case-control (case crossover) format, but this has some limitations. In particular, adjusting for overdispersion and auto-correlation in the counts is not possible. It has been established that a Poisson model for counts with stratum indicators gives identical estimates to those from conditional logistic regression and does not have these limitations, but it is little used, probably because of the overheads in estimating many stratum parameters. The conditional Poisson model avoids estimating stratum parameters by conditioning on the total event count in each stratum, thus simplifying the computing and increasing the number of strata for which fitting is feasible compared with the standard unconditional Poisson model. Unlike the conditional logistic model, the conditional Poisson model does not require expanding the data, and can adjust for overdispersion and auto-correlation. It is available in Stata, R, and other packages. By applying the models to real data and using simulations, we demonstrate that conditional Poisson models are simpler to code and faster to run than conditional logistic analyses, and can be fitted to larger data sets than is possible with standard Poisson models. Allowing for overdispersion or autocorrelation was possible with the conditional Poisson model but when not required this model gave identical estimates to those from conditional logistic regression. Conditional Poisson regression models provide an alternative to case crossover analysis of stratified time series data with some advantages. The conditional Poisson model can also be used in other contexts in which primary control for confounding is by fine stratification.
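A sketch of the unconditional equivalent referred to above: a Poisson GLM with stratum indicators fitted in statsmodels (the conditional Poisson fit itself, as in the R gnm package, is not shown, and variable names are placeholders):

```python
import statsmodels.api as sm
import statsmodels.formula.api as smf

# df: hypothetical daily data with columns 'deaths', 'pm10', 'temp' and a 'stratum'
# label (e.g. year-month by day-of-week) defining the case-crossover strata.
def stratified_poisson(df):
    model = smf.glm("deaths ~ pm10 + temp + C(stratum)", data=df,
                    family=sm.families.Poisson())
    return model.fit(scale="X2")  # Pearson-based scale allows for overdispersion

# fit = stratified_poisson(df); fit.params['pm10'] is the log rate ratio per unit PM10
```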
Taljaard, Monica; McKenzie, Joanne E; Ramsay, Craig R; Grimshaw, Jeremy M
2014-06-19
An interrupted time series design is a powerful quasi-experimental approach for evaluating effects of interventions introduced at a specific point in time. To utilize the strength of this design, a modification to standard regression analysis, such as segmented regression, is required. In segmented regression analysis, the change in intercept and/or slope from pre- to post-intervention is estimated and used to test causal hypotheses about the intervention. We illustrate segmented regression using data from a previously published study that evaluated the effectiveness of a collaborative intervention to improve quality in pre-hospital ambulance care for acute myocardial infarction (AMI) and stroke. In the original analysis, a standard regression model was used with time as a continuous variable. We contrast the results from this standard regression analysis with those from segmented regression analysis. We discuss the limitations of the former and advantages of the latter, as well as the challenges of using segmented regression in analysing complex quality improvement interventions. Based on the estimated change in intercept and slope from pre- to post-intervention using segmented regression, we found insufficient evidence of a statistically significant effect on quality of care for stroke, although potential clinically important effects for AMI cannot be ruled out. Segmented regression analysis is the recommended approach for analysing data from an interrupted time series study. Several modifications to the basic segmented regression analysis approach are available to deal with challenges arising in the evaluation of complex quality improvement interventions.
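A minimal sketch of the segmented regression specification described above, with a level-change indicator and a change-in-slope term; column names are illustrative, and the autocorrelation-robust standard errors are an optional extra rather than part of the original analysis.

```python
import pandas as pd
import statsmodels.formula.api as smf

def segmented_regression(df: pd.DataFrame, intervention_time: float):
    """df: hypothetical series with columns 'time' and 'quality' (the outcome)."""
    df = df.copy()
    df["post"] = (df["time"] >= intervention_time).astype(int)         # level change
    df["time_since"] = (df["time"] - intervention_time).clip(lower=0)  # slope change
    model = smf.ols("quality ~ time + post + time_since", data=df)
    return model.fit(cov_type="HAC", cov_kwds={"maxlags": 3})          # Newey-West SEs

# Coefficients: 'post' = immediate level change, 'time_since' = change in trend.
```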
NASA Astrophysics Data System (ADS)
Pradhan, Biswajeet
2010-05-01
This paper presents the results of the cross-validation of a multivariate logistic regression model using remote sensing data and GIS for landslide hazard analysis in the Penang, Cameron, and Selangor areas in Malaysia. Landslide locations in the study areas were identified by interpreting aerial photographs and satellite images, supported by field surveys. SPOT 5 and Landsat TM satellite imagery were used to map landcover and vegetation index, respectively. Maps of topography, soil type, lineaments and land cover were constructed from the spatial datasets. Ten factors which influence landslide occurrence, i.e., slope, aspect, curvature, distance from drainage, lithology, distance from lineaments, soil type, landcover, rainfall precipitation, and normalized difference vegetation index (NDVI), were extracted from the spatial database and the logistic regression coefficient of each factor was computed. Then the landslide hazard was analysed using the multivariate logistic regression coefficients derived not only from the data for the respective area but also using the logistic regression coefficients calculated from each of the other two areas (nine hazard maps in all) as a cross-validation of the model. For verification of the model, the results of the analyses were then compared with the field-verified landslide locations. Among the three cases in which the logistic regression coefficients were applied within the same study area, the case of Selangor based on the Selangor logistic regression coefficients showed the highest accuracy (94%), whereas Penang based on the Penang coefficients showed the lowest accuracy (86%). Similarly, among the six cases of cross-application of the logistic regression coefficients to the other two areas, the case of Selangor based on the Cameron coefficients showed the highest prediction accuracy (90%), whereas the case of Penang based on the Selangor logistic regression coefficients showed the lowest accuracy (79%). Qualitatively, the cross application model yields reasonable results which can be used for preliminary landslide hazard mapping.
Prunier, J G; Colyn, M; Legendre, X; Nimon, K F; Flamand, M C
2015-01-01
Direct gradient analyses in spatial genetics provide unique opportunities to describe the inherent complexity of genetic variation in wildlife species and are the object of many methodological developments. However, multicollinearity among explanatory variables is a systemic issue in multivariate regression analyses and is likely to cause serious difficulties in properly interpreting results of direct gradient analyses, with the risk of erroneous conclusions, misdirected research and inefficient or counterproductive conservation measures. Using simulated data sets along with linear and logistic regressions on distance matrices, we illustrate how commonality analysis (CA), a detailed variance-partitioning procedure that was recently introduced in the field of ecology, can be used to deal with nonindependence among spatial predictors. By decomposing model fit indices into unique and common (or shared) variance components, CA allows identifying the location and magnitude of multicollinearity, revealing spurious correlations and thus thoroughly improving the interpretation of multivariate regressions. Despite a few inherent limitations, especially in the case of resistance model optimization, this review highlights the great potential of CA to account for complex multicollinearity patterns in spatial genetics and identifies future applications and lines of research. We strongly urge spatial geneticists to systematically investigate commonalities when performing direct gradient analyses. © 2014 John Wiley & Sons Ltd.
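For the simplest two-predictor case, the unique and common components that commonality analysis reports can be recovered from the R² of the full and reduced models; a hedged sketch (a full analysis with k predictors requires all 2^k - 1 submodels):

```python
import numpy as np
import statsmodels.api as sm

def commonality_two_predictors(y, x1, x2):
    """Unique and common (shared) R^2 components for two predictors."""
    def r2(*cols):
        X = sm.add_constant(np.column_stack(cols))
        return sm.OLS(y, X).fit().rsquared

    r2_full = r2(x1, x2)
    unique_x1 = r2_full - r2(x2)              # variance explained only by x1
    unique_x2 = r2_full - r2(x1)              # variance explained only by x2
    common = r2_full - unique_x1 - unique_x2  # shared component (negative => suppression)
    return {"unique_x1": unique_x1, "unique_x2": unique_x2, "common": common}
```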
On the equivalence of case-crossover and time series methods in environmental epidemiology.
Lu, Yun; Zeger, Scott L
2007-04-01
The case-crossover design was introduced in epidemiology 15 years ago as a method for studying the effects of a risk factor on a health event using only cases. The idea is to compare a case's exposure immediately prior to or during the case-defining event with that same person's exposure at otherwise similar "reference" times. An alternative approach to the analysis of daily exposure and case-only data is time series analysis. Here, log-linear regression models express the expected total number of events on each day as a function of the exposure level and potential confounding variables. In time series analyses of air pollution, smooth functions of time and weather are the main confounders. Time series and case-crossover methods are often viewed as competing methods. In this paper, we show that case-crossover using conditional logistic regression is a special case of time series analysis when there is a common exposure such as in air pollution studies. This equivalence provides computational convenience for case-crossover analyses and a better understanding of time series models. Time series log-linear regression accounts for overdispersion of the Poisson variance, while case-crossover analyses typically do not. This equivalence also permits model checking for case-crossover data using standard log-linear model diagnostics.
Prediction by regression and intrarange data scatter in surface-process studies
Toy, T.J.; Osterkamp, W.R.; Renard, K.G.
1993-01-01
Modeling is a major component of contemporary earth science, and regression analysis occupies a central position in the parameterization, calibration, and validation of geomorphic and hydrologic models. Although this methodology can be used in many ways, we are primarily concerned with the prediction of values for one variable from another variable. Examination of the literature reveals considerable inconsistency in the presentation of the results of regression analysis and the occurrence of patterns in the scatter of data points about the regression line. Both circumstances confound utilization and evaluation of the models. Statisticians are well aware of various problems associated with the use of regression analysis and offer improved practices; often, however, their guidelines are not followed. After a review of the aforementioned circumstances and until standard criteria for model evaluation become established, we recommend, as a minimum, inclusion of scatter diagrams, the standard error of the estimate, and sample size in reporting the results of regression analyses for most surface-process studies. © 1993 Springer-Verlag.
Habitat features and predictive habitat modeling for the Colorado chipmunk in southern New Mexico
Rivieccio, M.; Thompson, B.C.; Gould, W.R.; Boykin, K.G.
2003-01-01
Two subspecies of Colorado chipmunk (state threatened and federal species of concern) occur in southern New Mexico: Tamias quadrivittatus australis in the Organ Mountains and T. q. oscuraensis in the Oscura Mountains. We developed a GIS model of potentially suitable habitat based on vegetation and elevation features, evaluated site classifications of the GIS model, and determined vegetation and terrain features associated with chipmunk occurrence. We compared GIS model classifications with actual vegetation and elevation features measured at 37 sites. At 60 sites we measured 18 habitat variables regarding slope, aspect, tree species, shrub species, and ground cover. We used logistic regression to analyze habitat variables associated with chipmunk presence/absence. All (100%) 37 sample sites (28 predicted suitable, 9 predicted unsuitable) were classified correctly by the GIS model regarding elevation and vegetation. For 28 sites predicted suitable by the GIS model, 18 sites (64%) appeared visually suitable based on habitat variables selected from logistic regression analyses, of which 10 sites (36%) were specifically predicted as suitable habitat via logistic regression. We detected chipmunks at 70% of sites deemed suitable via the logistic regression models. Shrub cover, tree density, plant proximity, presence of logs, and presence of rock outcrop were retained in the logistic model for the Oscura Mountains; litter, shrub cover, and grass cover were retained in the logistic model for the Organ Mountains. Evaluation of predictive models illustrates the need for multi-stage analyses to best judge performance. Microhabitat analyses indicate prospective needs for different management strategies between the subspecies. Sensitivities of each population of the Colorado chipmunk to natural and prescribed fire suggest that partial burnings of areas inhabited by Colorado chipmunks in southern New Mexico may be beneficial. These partial burnings may later help avoid a fire that could substantially reduce habitat of chipmunks over a mountain range.
Sacks, Jason D; Ito, Kazuhiko; Wilson, William E; Neas, Lucas M
2012-10-01
With the advent of multicity studies, uniform statistical approaches have been developed to examine air pollution-mortality associations across cities. To assess the sensitivity of the air pollution-mortality association to different model specifications in a single and multipollutant context, the authors applied various regression models developed in previous multicity time-series studies of air pollution and mortality to data from Philadelphia, Pennsylvania (May 1992-September 1995). Single-pollutant analyses used daily cardiovascular mortality, fine particulate matter (particles with an aerodynamic diameter ≤2.5 µm; PM(2.5)), speciated PM(2.5), and gaseous pollutant data, while multipollutant analyses used source factors identified through principal component analysis. In single-pollutant analyses, risk estimates were relatively consistent across models for most PM(2.5) components and gaseous pollutants. However, risk estimates were inconsistent for ozone in all-year and warm-season analyses. Principal component analysis yielded factors with species associated with traffic, crustal material, residual oil, and coal. Risk estimates for these factors exhibited less sensitivity to alternative regression models compared with single-pollutant models. Factors associated with traffic and crustal material showed consistently positive associations in the warm season, while the coal combustion factor showed consistently positive associations in the cold season. Overall, mortality risk estimates examined using a source-oriented approach yielded more stable and precise risk estimates, compared with single-pollutant analyses.
Application of logistic regression to case-control association studies involving two causative loci.
North, Bernard V; Curtis, David; Sham, Pak C
2005-01-01
Models in which two susceptibility loci jointly influence the risk of developing disease can be explored using logistic regression analysis. Comparison of likelihoods of models incorporating different sets of disease model parameters allows inferences to be drawn regarding the nature of the joint effect of the loci. We have simulated case-control samples generated assuming different two-locus models and then analysed them using logistic regression. We show that this method is practicable and that, for the models we have used, it can be expected to allow useful inferences to be drawn from sample sizes consisting of hundreds of subjects. Interactions between loci can be explored, but interactive effects do not exactly correspond with classical definitions of epistasis. We have particularly examined the issue of the extent to which it is helpful to utilise information from a previously identified locus when investigating a second, unknown locus. We show that for some models conditional analysis can have substantially greater power while for others unconditional analysis can be more powerful. Hence we conclude that in general both conditional and unconditional analyses should be performed when searching for additional loci.
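A small illustrative sketch of the core modelling step (simulated genotypes and effect sizes, not the authors' simulation scheme): a logistic model with a two-locus interaction term is compared with a main-effects model via a likelihood ratio test, which is how evidence for a joint (non-additive) effect of the loci can be assessed.

```python
# Two-locus logistic regression with interaction, assessed by a likelihood
# ratio test; data and effect sizes are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

rng = np.random.default_rng(1)
n = 2000
g1 = rng.binomial(2, 0.3, n)          # genotype codes 0/1/2 at locus 1
g2 = rng.binomial(2, 0.2, n)          # genotype codes 0/1/2 at locus 2
eta = -2.0 + 0.4 * g1 + 0.3 * g2 + 0.25 * g1 * g2
y = rng.binomial(1, 1 / (1 + np.exp(-eta)))
df = pd.DataFrame({"y": y, "g1": g1, "g2": g2})

full = smf.logit("y ~ g1 * g2", data=df).fit(disp=False)      # main effects + interaction
reduced = smf.logit("y ~ g1 + g2", data=df).fit(disp=False)   # main effects only
lrt = 2 * (full.llf - reduced.llf)
print("LRT p-value for interaction:", stats.chi2.sf(lrt, df=1))
```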
Ngwa, Julius S; Cabral, Howard J; Cheng, Debbie M; Pencina, Michael J; Gagnon, David R; LaValley, Michael P; Cupples, L Adrienne
2016-11-03
Typical survival studies follow individuals to an event and measure explanatory variables for that event, sometimes repeatedly over the course of follow up. The Cox regression model has been used widely in the analyses of time to diagnosis or death from disease. The associations between the survival outcome and time dependent measures may be biased unless they are modeled appropriately. In this paper we explore the Time Dependent Cox Regression Model (TDCM), which quantifies the effect of repeated measures of covariates in the analysis of time to event data. This model is commonly used in biomedical research but sometimes does not explicitly adjust for the times at which time dependent explanatory variables are measured. This approach can yield different estimates of association compared to a model that adjusts for these times. In order to address the question of how different these estimates are from a statistical perspective, we compare the TDCM to Pooled Logistic Regression (PLR) and Cross Sectional Pooling (CSP), considering models that adjust and do not adjust for time in PLR and CSP. In a series of simulations we found that time adjusted CSP provided identical results to the TDCM while the PLR showed larger parameter estimates compared to the time adjusted CSP and the TDCM in scenarios with high event rates. We also observed upwardly biased estimates in the unadjusted CSP and unadjusted PLR methods. The time adjusted PLR had a positive bias in the time dependent Age effect with reduced bias when the event rate is low. The PLR methods showed a negative bias in the Sex effect, a subject level covariate, when compared to the other methods. The Cox models yielded reliable estimates for the Sex effect in all scenarios considered. We conclude that survival analyses that explicitly account in the statistical model for the times at which time dependent covariates are measured provide more reliable estimates compared to unadjusted analyses. We present results from the Framingham Heart Study in which lipid measurements and myocardial infarction data events were collected over a period of 26 years.
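A hedged sketch of the pooled logistic regression (PLR) comparison described above, on simulated person-period data (the variable names, event mechanism, and number of exam cycles are placeholders, not the Framingham data): the same discrete-time model is fitted with and without explicit adjustment for the period at which the time-dependent covariate was measured.

```python
# Pooled logistic regression on person-period records, unadjusted vs adjusted
# for measurement period; data layout and names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
records = []
for pid in range(500):
    age = rng.integers(40, 60)
    for period in range(6):                        # repeated exam cycles
        x = age + period * 2 + rng.normal(0, 1)    # time-dependent covariate
        p = 1 / (1 + np.exp(-(-6 + 0.06 * x)))     # discrete-time hazard
        event = rng.binomial(1, p)
        records.append({"id": pid, "period": period, "x": x, "event": event})
        if event:
            break                                   # stop follow-up after the event

df = pd.DataFrame(records)
plr_unadj = smf.logit("event ~ x", data=df).fit(disp=False)
plr_adj = smf.logit("event ~ x + C(period)", data=df).fit(disp=False)
print(plr_unadj.params["x"], plr_adj.params["x"])   # compare covariate estimates
```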
Campos-Filho, N; Franco, E L
1989-02-01
A frequent procedure in matched case-control studies is to report results from the multivariate unmatched analyses if they do not differ substantially from the ones obtained after conditioning on the matching variables. Although conceptually simple, this rule requires that an extensive series of logistic regression models be evaluated by both the conditional and unconditional maximum likelihood methods. Most computer programs for logistic regression employ only one maximum likelihood method, which requires that the analyses be performed in separate steps. This paper describes a Pascal microcomputer (IBM PC) program that performs multiple logistic regression by both maximum likelihood estimation methods, which obviates the need for switching between programs to obtain relative risk estimates from both matched and unmatched analyses. The program calculates most standard statistics and allows factoring of categorical or continuous variables by two distinct methods of contrast. A built-in, descriptive statistics option allows the user to inspect the distribution of cases and controls across categories of any given variable.
Artificial Neural Network for the Prediction of Chromosomal Abnormalities in Azoospermic Males.
Akinsal, Emre Can; Haznedar, Bulent; Baydilli, Numan; Kalinli, Adem; Ozturk, Ahmet; Ekmekçioğlu, Oğuz
2018-02-04
To evaluate whether an artificial neural network helps to diagnose chromosomal abnormalities in azoospermic males. The data of azoospermic males attending a tertiary academic referral center were evaluated retrospectively. Height, total testicular volume, follicle stimulating hormone, luteinising hormone, total testosterone and ejaculate volume of the patients were used for the analyses. For the artificial neural network, the data of 310 azoospermic males were used as the training set and 115 as the test set. Logistic regression analyses and discriminant analyses were performed for statistical comparison. The data were then re-analysed with the neural network. Both the logistic regression analyses and the artificial neural network predicted the presence or absence of chromosomal abnormalities with more than 95% accuracy. The artificial neural network model yielded satisfactory results in distinguishing patients with and without chromosomal abnormalities.
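For illustration only, a sketch of the two classifiers being compared, on synthetic data (the feature names echo the clinical variables but carry no clinical meaning, and the split sizes simply mimic the 310/115 design): a logistic regression and a small multilayer perceptron are trained on the same standardized features and scored on a held-out test set.

```python
# Logistic regression vs a small neural network on synthetic data; the
# features and outcome mechanism are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
X = rng.normal(size=(425, 6))                    # stand-ins for height, volumes, hormones
y = rng.binomial(1, 1 / (1 + np.exp(-X[:, 2])))  # abnormality more likely with high "FSH"

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=310, random_state=0)

logit = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_train, y_train)
ann = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                                  random_state=0)).fit(X_train, y_train)
print("logistic accuracy:", logit.score(X_test, y_test))
print("ANN accuracy:     ", ann.score(X_test, y_test))
```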
Genetic modelling of test day records in dairy sheep using orthogonal Legendre polynomials.
Kominakis, A; Volanis, M; Rogdakis, E
2001-03-01
Test day milk yields of three lactations in Sfakia sheep were analyzed by fitting a random regression (RR) model, regressing on orthogonal polynomials of the stage of the lactation period, i.e. days in milk. Univariate (UV) and multivariate (MV) analyses were also performed for four stages of the lactation period, represented by average days in milk, i.e. 15, 45, 70 and 105 days, to compare estimates obtained from RR models with estimates from UV and MV analyses. The total numbers of test day records were 790, 1314 and 1041, obtained from 214, 342 and 303 ewes in the first, second and third lactation, respectively. Error variances and covariances between regression coefficients were estimated by restricted maximum likelihood. Models were compared using likelihood ratio tests (LRTs). Log likelihoods were not significantly reduced when the rank of the orthogonal Legendre polynomials (LPs) of lactation stage was reduced from 4 to 2 and homogeneous variances for lactation stages within lactations were considered. Mean weighted heritability estimates with RR models were 0.19, 0.09 and 0.08 for first, second and third lactation, respectively. The corresponding estimates obtained from UV analyses were 0.14, 0.12 and 0.08. Mean permanent environmental variance, as a proportion of the total, was high at all stages and lactations, ranging from 0.54 to 0.71. Within lactations, genetic and permanent environmental correlations between lactation stages were in the range from 0.36 to 0.99 and 0.76 to 0.99, respectively. Genetic parameters for additive genetic and permanent environmental effects obtained from RR models were different from those obtained from UV and MV analyses.
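A rough sketch of the modelling idea with entirely hypothetical data (the paper's models partition genetic and permanent environmental effects, which is beyond this sketch): days in milk are rescaled to [-1, 1], an orthogonal Legendre basis is built, and a linear mixed model with animal-level random regression coefficients on the Legendre terms is fitted.

```python
# Random regression on Legendre polynomials of days in milk, fitted as a
# mixed model with a random intercept and random Legendre slope per ewe.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
rows = []
for ewe in range(200):
    a0, a1 = rng.normal(0, [0.3, 0.1])                       # ewe-specific deviations
    for dim in rng.choice(np.arange(5, 150), size=4, replace=False):
        t = 2 * (dim - 5) / (150 - 5) - 1                    # rescale days in milk to [-1, 1]
        leg = np.polynomial.legendre.legvander([t], 2)[0]    # Legendre basis up to order 2
        y = 1.2 + 0.4 * leg[1] - 0.2 * leg[2] + a0 + a1 * leg[1] + rng.normal(0, 0.2)
        rows.append({"ewe": ewe, "leg1": leg[1], "leg2": leg[2], "milk": y})

df = pd.DataFrame(rows)
rr = smf.mixedlm("milk ~ leg1 + leg2", data=df, groups="ewe",
                 re_formula="~leg1").fit()
print(rr.summary())
```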
Cruz, Antonio M; Barr, Cameron; Puñales-Pozo, Elsa
2008-01-01
This research's main goals were to build a predictor for a turnaround time (TAT) indicator and to use a numerical clustering technique to find possible causes of undesirable TAT values. The following stages were used: domain understanding, data characterisation and sample reduction, and insight characterisation. A multiple linear regression predictor of the TAT indicator and clustering techniques were used to improve corrective maintenance task efficiency in a clinical engineering department (CED). The indicator being studied was turnaround time (TAT). Multiple linear regression was used to build a predictive TAT model. The variables contributing to this model were clinical engineering department response time (CE(rt), 0.415 positive coefficient), stock service response time (Stock(rt), 0.734 positive coefficient), priority level (0.21 positive coefficient) and service time (0.06 positive coefficient). The regression showed heavy reliance on Stock(rt), CE(rt) and priority, in that order. Clustering techniques revealed the main causes of high TAT values. This examination has provided a means for analysing current technical service quality and effectiveness. In doing so, it has demonstrated a process for identifying areas and methods of improvement and a model against which to analyse these methods' effectiveness.
Hüls, Anke; Frömke, Cornelia; Ickstadt, Katja; Hille, Katja; Hering, Johanna; von Münchhausen, Christiane; Hartmann, Maria; Kreienbrock, Lothar
2017-01-01
Antimicrobial resistance in livestock is a matter of general concern. To develop hygiene measures and methods for resistance prevention and control, epidemiological studies on a population level are needed to detect factors associated with antimicrobial resistance in livestock holdings. In general, regression models are used to describe these relationships between environmental factors and resistance outcome. Besides the study design, the correlation structures of the different outcomes of antibiotic resistance and structural zero measurements on the resistance outcome as well as on the exposure side are challenges for the epidemiological model building process. The use of appropriate regression models that acknowledge these complexities is essential to assure valid epidemiological interpretations. The aims of this paper are (i) to explain the model building process comparing several competing models for count data (negative binomial model, quasi-Poisson model, zero-inflated model, and hurdle model) and (ii) to compare these models using data from a cross-sectional study on antibiotic resistance in animal husbandry. These goals are essential to evaluate which model is most suitable to identify potential prevention measures. The dataset used as an example in our analyses was generated initially to study the prevalence and associated factors for the appearance of cefotaxime-resistant Escherichia coli in 48 German fattening pig farms. For each farm, the outcome was the count of samples with resistant bacteria. There was almost no overdispersion and only moderate evidence of excess zeros in the data. Our analyses show that it is essential to evaluate regression models in studies analyzing the relationship between environmental factors and antibiotic resistances in livestock. After model comparison based on evaluation of model predictions, Akaike information criterion, and Pearson residuals, here the hurdle model was judged to be the most appropriate model. PMID:28620609
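A hedged sketch of the model-comparison step with simulated farm-level counts (the covariate, sample size, and zero-inflation rate are invented): Poisson, negative binomial, and zero-inflated Poisson fits are compared by AIC. The hurdle model selected in the paper is omitted here; the zero-inflated Poisson stands in as the excess-zero alternative and assumes a statsmodels version that exposes ZeroInflatedPoisson.

```python
# Compare count regression models by AIC on synthetic data with some extra zeros.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 48
x = rng.normal(size=n)                       # hypothetical farm-level exposure
mu = np.exp(1.0 + 0.5 * x)
y = rng.poisson(mu)
y[rng.random(n) < 0.15] = 0                  # inject some structural zeros

X = sm.add_constant(x)
poisson = sm.Poisson(y, X).fit(disp=False)
negbin = sm.NegativeBinomial(y, X).fit(disp=False)
zip_model = sm.ZeroInflatedPoisson(y, X).fit(disp=False)

for name, m in [("Poisson", poisson), ("NegBin", negbin), ("ZIP", zip_model)]:
    print(name, "AIC:", round(m.aic, 1))
```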
Brenn, T; Arnesen, E
1985-01-01
For comparative evaluation, discriminant analysis, logistic regression and Cox's model were used to select risk factors for total and coronary deaths among 6595 men aged 20-49 followed for 9 years. Groups with mortality between 5 and 93 per 1000 were considered. Discriminant analysis selected variable sets only marginally different from the logistic and Cox methods which always selected the same sets. A time-saving option, offered for both the logistic and Cox selection, showed no advantage compared with discriminant analysis. Analysing more than 3800 subjects, the logistic and Cox methods consumed, respectively, 80 and 10 times more computer time than discriminant analysis. When including the same set of variables in non-stepwise analyses, all methods estimated coefficients that in most cases were almost identical. In conclusion, discriminant analysis is advocated for preliminary or stepwise analysis, otherwise Cox's method should be used.
Gene set analysis using variance component tests.
Huang, Yen-Tsung; Lin, Xihong
2013-06-28
Gene set analyses have become increasingly important in genomic research, as many complex diseases are influenced jointly by alterations of numerous genes. Genes often act together as a functional repertoire, e.g., a biological pathway/network, and are highly correlated. However, most of the existing gene set analysis methods do not fully account for the correlation among the genes. Here we propose to tackle this important feature of a gene set to improve statistical power in gene set analyses. We propose to model the effects of an independent variable, e.g., exposure/biological status (yes/no), on multiple gene expression values in a gene set using a multivariate linear regression model, where the correlation among the genes is explicitly modeled using a working covariance matrix. We develop TEGS (Test for the Effect of a Gene Set), a variance component test for the gene set effects by assuming a common distribution for regression coefficients in multivariate linear regression models, and calculate the p-values using permutation and a scaled chi-square approximation. We show using simulations that type I error is protected under different choices of working covariance matrices and power is improved as the working covariance approaches the true covariance. The global test is a special case of TEGS when correlation among genes in a gene set is ignored. Using both simulation data and a published diabetes dataset, we show that our test outperforms the commonly used approaches, the global test and gene set enrichment analysis (GSEA). We develop a gene set analysis method (TEGS) under the multivariate regression framework, which directly models the interdependence of the expression values in a gene set using a working covariance. TEGS outperforms two widely used methods, GSEA and the global test, in both simulations and a diabetes microarray dataset.
Shrinkage regression-based methods for microarray missing value imputation.
Wang, Hsiuying; Chiu, Chia-Chun; Wu, Yi-Ching; Wu, Wei-Sheng
2013-01-01
Missing values commonly occur in microarray data, which usually contain more than 5% missing values, with up to 90% of genes affected. Inaccurate missing value estimation reduces the power of downstream microarray data analyses. Many types of methods have been developed to estimate missing values. Among them, the regression-based methods are very popular and have been shown to perform better than the other types of methods in many testing microarray datasets. To further improve the performance of the regression-based methods, we propose shrinkage regression-based methods. Our methods take advantage of the correlation structure in the microarray data and select genes similar to the target gene by Pearson correlation coefficients. In addition, our methods incorporate the least squares principle, utilize a shrinkage estimation approach to adjust the coefficients of the regression model, and then use the new coefficients to estimate missing values. Simulation results show that the proposed methods provide more accurate missing value estimation in six testing microarray datasets than the existing regression-based methods do. Imputation of missing values is a very important aspect of microarray data analyses because most of the downstream analyses require a complete dataset. Therefore, exploring accurate and efficient methods for estimating missing values has become an essential issue. Since our proposed shrinkage regression-based methods can provide accurate missing value estimation, they are competitive alternatives to the existing regression-based methods.
Xu, Yun; Muhamadali, Howbeer; Sayqal, Ali; Dixon, Neil; Goodacre, Royston
2016-10-28
Partial least squares (PLS) is one of the most commonly used supervised modelling approaches for analysing multivariate metabolomics data. PLS is typically employed as either a regression model (PLS-R) or a classification model (PLS-DA). However, in metabolomics studies it is common to investigate multiple, potentially interacting, factors simultaneously following a specific experimental design. Such data often cannot be considered as a "pure" regression or a classification problem. Nevertheless, these data have often still been treated as a regression or classification problem and this could lead to ambiguous results. In this study, we investigated the feasibility of designing a hybrid target matrix Y that better reflects the experimental design than simple regression or binary class membership coding commonly used in PLS modelling. The new design of Y coding was based on the same principle used by structural modelling in machine learning techniques. Two real metabolomics datasets were used as examples to illustrate how the new Y coding can improve the interpretability of the PLS model compared to classic regression/classification coding.
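An illustrative sketch of the hybrid Y-coding idea on synthetic data (the factor structure, sample sizes, and number of components are arbitrary choices, not the authors' datasets): instead of a single regression target or a binary class matrix, the PLS target matrix combines one-hot columns for a categorical factor with a continuous design column.

```python
# PLS fitted against a hybrid target matrix encoding a two-factor design.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(6)
n, p = 60, 200
group = rng.integers(0, 3, n)                  # factor 1: three biological groups
dose = rng.uniform(0, 1, n)                    # factor 2: continuous level
X = rng.normal(size=(n, p))
X[:, 0] += group                               # embed both factors in the "metabolome"
X[:, 1] += dose

# Hybrid target: one-hot group membership columns plus the continuous factor.
Y = np.column_stack([np.eye(3)[group], dose])

pls = PLSRegression(n_components=5).fit(X, Y)
print(pls.score(X, Y))                         # overall fit of the multi-column target
```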
NASA Astrophysics Data System (ADS)
Yilmaz, Isik; Keskin, Inan; Marschalko, Marian; Bednarik, Martin
2010-05-01
This study compares GIS-based collapse susceptibility mapping methods, namely conditional probability (CP), logistic regression (LR) and artificial neural networks (ANN), applied to gypsum rock masses in the Sivas basin (Turkey). A Digital Elevation Model (DEM) was first constructed using GIS software. Collapse-related factors, directly or indirectly related to the causes of collapse occurrence, such as distance from faults, slope angle and aspect, topographical elevation, distance from drainage, topographic wetness index (TWI), stream power index (SPI), Normalized Difference Vegetation Index (NDVI) as a measure of vegetation cover, and distance from roads and settlements, were used in the collapse susceptibility analyses. In the last stage of the analyses, collapse susceptibility maps were produced from the CP, LR and ANN models, and they were then compared by means of their validations. Area Under Curve (AUC) values obtained from all three methodologies showed that the map obtained from the ANN model appears more accurate than those from the other models, and the results also showed that artificial neural networks are a useful tool for preparing collapse susceptibility maps and are highly compatible with GIS operating features. Key words: Collapse; doline; susceptibility map; gypsum; GIS; conditional probability; logistic regression; artificial neural networks.
Kupek, Emil
2006-03-15
Structural equation modelling (SEM) has been increasingly used in medical statistics for solving a system of related regression equations. However, a great obstacle for its wider use has been its difficulty in handling categorical variables within the framework of generalised linear models. A large data set with a known structure among two related outcomes and three independent variables was generated to investigate the use of Yule's transformation of odds ratio (OR) into Q-metric by (OR-1)/(OR+1) to approximate Pearson's correlation coefficients between binary variables whose covariance structure can be further analysed by SEM. Percent of correctly classified events and non-events was compared with the classification obtained by logistic regression. The performance of SEM based on Q-metric was also checked on a small (N = 100) random sample of the data generated and on a real data set. SEM successfully recovered the generated model structure. SEM of real data suggested a significant influence of a latent confounding variable which would have not been detectable by standard logistic regression. SEM classification performance was broadly similar to that of the logistic regression. The analysis of binary data can be greatly enhanced by Yule's transformation of odds ratios into estimated correlation matrix that can be further analysed by SEM. The interpretation of results is aided by expressing them as odds ratios which are the most frequently used measure of effect in medical statistics.
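A direct illustration of the transformation described above: Yule's Q maps an odds ratio into the (-1, 1) range so it can stand in for a correlation between two binary variables; the example values are arbitrary.

```python
# Yule's Q transformation of an odds ratio, Q = (OR - 1) / (OR + 1).
def yules_q(odds_ratio: float) -> float:
    return (odds_ratio - 1.0) / (odds_ratio + 1.0)

for odds_ratio in (0.25, 1.0, 2.0, 9.0):
    print(odds_ratio, round(yules_q(odds_ratio), 3))
```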
Robust inference under the beta regression model with application to health care studies.
Ghosh, Abhik
2017-01-01
Data on rates, percentages, or proportions arise frequently in many different applied disciplines like medical biology, health care, psychology, and several others. In this paper, we develop a robust inference procedure for the beta regression model, which is used to describe such response variables taking values in (0, 1) through some related explanatory variables. In relation to the beta regression model, the issue of robustness has been largely ignored in the literature so far. The existing maximum likelihood-based inference has a serious lack of robustness against outliers and generates drastically different (erroneous) inferences in the presence of data contamination. Here, we develop the robust minimum density power divergence estimator and a class of robust Wald-type tests for the beta regression model along with several applications. We derive their asymptotic properties and describe their robustness theoretically through influence function analyses. Finite sample performances of the proposed estimators and tests are examined through suitable simulation studies and real data applications in the context of health care and psychology. Although we primarily focus on the beta regression models with a fixed dispersion parameter, some indications are also provided for extension to the variable dispersion beta regression models with an application.
Terza, Joseph V; Bradford, W David; Dismuke, Clara E
2008-01-01
Objective To investigate potential bias in the use of the conventional linear instrumental variables (IV) method for the estimation of causal effects in inherently nonlinear regression settings. Data Sources Smoking Supplement to the 1979 National Health Interview Survey, National Longitudinal Alcohol Epidemiologic Survey, and simulated data. Study Design Potential bias from the use of the linear IV method in nonlinear models is assessed via simulation studies and real-world data analyses in two commonly encountered regression settings: (1) models with a nonnegative outcome (e.g., a count) and a continuous endogenous regressor; and (2) models with a binary outcome and a binary endogenous regressor. Principal Findings The simulation analyses show that substantial bias in the estimation of causal effects can result from applying the conventional IV method in inherently nonlinear regression settings. Moreover, the bias is not attenuated as the sample size increases. This point is further illustrated in the survey data analyses in which IV-based estimates of the relevant causal effects diverge substantially from those obtained with appropriate nonlinear estimation methods. Conclusions We offer this research as a cautionary note to those who would opt for the use of linear specifications in inherently nonlinear settings involving endogeneity. PMID:18546544
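A simulation sketch of the first setting (count outcome, continuous endogenous regressor), with an invented data-generating process: a linear two-stage estimate is contrasted with a two-stage residual inclusion (2SRI) Poisson fit. 2SRI is one standard nonlinear alternative; it is named here explicitly and is not necessarily the exact estimator used in the paper.

```python
# Endogenous regressor in a count model: linear 2SLS-style fit vs 2SRI Poisson.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 5000
z = rng.normal(size=n)                    # instrument
u = rng.normal(size=n)                    # unobserved confounder
x = 0.8 * z + u + rng.normal(size=n)      # endogenous regressor
y = rng.poisson(np.exp(0.3 + 0.4 * x + 0.7 * u))

# Stage 1: regress the endogenous regressor on the instrument.
stage1 = sm.OLS(x, sm.add_constant(z)).fit()
xhat, resid = stage1.fittedvalues, stage1.resid

# Linear IV analogue: OLS of y on the first-stage fitted values.
lin_iv = sm.OLS(y, sm.add_constant(xhat)).fit()

# 2SRI: Poisson regression of y on x plus the first-stage residual.
tsri = sm.GLM(y, sm.add_constant(np.column_stack([x, resid])),
              family=sm.families.Poisson()).fit()

print("linear IV slope:", round(lin_iv.params[1], 3))
print("2SRI log-rate effect of x:", round(tsri.params[1], 3))   # target is 0.4
```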
Levine, Matthew E; Albers, David J; Hripcsak, George
2016-01-01
Time series analysis methods have been shown to reveal clinical and biological associations in data collected in the electronic health record. We wish to develop reliable high-throughput methods for identifying adverse drug effects that are easy to implement and produce readily interpretable results. To move toward this goal, we used univariate and multivariate lagged regression models to investigate associations between twenty pairs of drug orders and laboratory measurements. Multivariate lagged regression models exhibited higher sensitivity and specificity than univariate lagged regression in the 20 examples, and incorporating autoregressive terms for labs and drugs produced more robust signals in cases of known associations among the 20 example pairings. Moreover, including inpatient admission terms in the model attenuated the signals for some cases of unlikely associations, demonstrating how multivariate lagged regression models' explicit handling of context-based variables can provide a simple way to probe for health-care processes that confound analyses of EHR data.
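A minimal sketch of the modelling idea on synthetic data (the drug, lab, and admission variables are placeholders, not EHR variables from the study): a lab value is regressed on its own lagged value, a lagged drug-exposure indicator, and an admission indicator as a context variable.

```python
# Multivariate lagged regression with an autoregressive lab term and an
# admission indicator; data and names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(8)
n = 400
drug = rng.binomial(1, 0.2, n)
admit = rng.binomial(1, 0.1, n)
lab = np.zeros(n)
for t in range(1, n):
    lab[t] = 0.6 * lab[t - 1] - 0.8 * drug[t - 1] + 0.5 * admit[t] + rng.normal(0, 0.5)

df = pd.DataFrame({"lab": lab, "drug": drug, "admit": admit})
df["lab_lag1"] = df["lab"].shift(1)
df["drug_lag1"] = df["drug"].shift(1)

model = smf.ols("lab ~ lab_lag1 + drug_lag1 + admit", data=df.dropna()).fit()
print(model.params)
```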
Holsclaw, Tracy; Hallgren, Kevin A; Steyvers, Mark; Smyth, Padhraic; Atkins, David C
2015-12-01
Behavioral coding is increasingly used for studying mechanisms of change in psychosocial treatments for substance use disorders (SUDs). However, behavioral coding data typically include features that can be problematic in regression analyses, including measurement error in independent variables, non-normal distributions of count outcome variables, and conflation of predictor and outcome variables with third variables, such as session length. Methodological research in econometrics has shown that these issues can lead to biased parameter estimates, inaccurate standard errors, and increased Type I and Type II error rates, yet these statistical issues are not widely known within SUD treatment research, or more generally, within psychotherapy coding research. Using minimally technical language intended for a broad audience of SUD treatment researchers, the present paper illustrates the ways in which these data issues are problematic. We draw on real-world data and simulation-based examples to illustrate how these data features can bias estimation of parameters and interpretation of models. A weighted negative binomial regression is introduced as an alternative to ordinary linear regression that appropriately addresses the data characteristics common to SUD treatment behavioral coding data. We conclude by demonstrating how to use and interpret these models with data from a study of motivational interviewing. SPSS and R syntax for weighted negative binomial regression models is included in online supplemental materials. © 2016 APA, all rights reserved.
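One way to fit a weighted negative binomial regression in Python, as a sketch of the model class discussed above rather than the authors' SPSS/R syntax: the simulated behaviour counts get a log session-length offset and illustrative analysis weights, both of which are assumptions of this example.

```python
# Weighted negative binomial GLM with a session-length offset; data are synthetic.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
n = 200
x = rng.normal(size=n)                              # e.g. a therapist skill rating
session_len = rng.uniform(20, 60, n)                # minutes of coded session
mu = session_len / 40 * np.exp(0.2 + 0.5 * x)
counts = rng.negative_binomial(5, 5 / (5 + mu))     # overdispersed behaviour counts
weights = rng.uniform(0.5, 1.5, n)                  # illustrative analysis weights

X = sm.add_constant(x)
nb = sm.GLM(counts, X,
            family=sm.families.NegativeBinomial(alpha=0.2),
            offset=np.log(session_len),
            var_weights=weights).fit()
print(nb.summary())
```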
Estimating effects of limiting factors with regression quantiles
Cade, B.S.; Terrell, J.W.; Schroeder, R.L.
1999-01-01
In a recent Concepts paper in Ecology, Thomson et al. emphasized that assumptions of conventional correlation and regression analyses fundamentally conflict with the ecological concept of limiting factors, and they called for new statistical procedures to address this problem. The analytical issue is that unmeasured factors may be the active limiting constraint and may induce a pattern of unequal variation in the biological response variable through an interaction with the measured factors. Consequently, changes near the maxima, rather than at the center of response distributions, are better estimates of the effects expected when the observed factor is the active limiting constraint. Regression quantiles provide estimates for linear models fit to any part of a response distribution, including near the upper bounds, and require minimal assumptions about the form of the error distribution. Regression quantiles extend the concept of one-sample quantiles to the linear model by solving an optimization problem of minimizing an asymmetric function of absolute errors. Rank-score tests for regression quantiles provide tests of hypotheses and confidence intervals for parameters in linear models with heteroscedastic errors, conditions likely to occur in models of limiting ecological relations. We used selected regression quantiles (e.g., 5th, 10th, ..., 95th) and confidence intervals to test hypotheses that parameters equal zero for estimated changes in average annual acorn biomass due to forest canopy cover of oak (Quercus spp.) and oak species diversity. Regression quantiles also were used to estimate changes in glacier lily (Erythronium grandiflorum) seedling numbers as a function of lily flower numbers, rockiness, and pocket gopher (Thomomys talpoides fossor) activity, data that motivated the query by Thomson et al. for new statistical procedures. Both example applications showed that effects of limiting factors estimated by changes in some upper regression quantile (e.g., 90-95th) were greater than if effects were estimated by changes in the means from standard linear model procedures. Estimating a range of regression quantiles (e.g., 5-95th) provides a comprehensive description of biological response patterns for exploratory and inferential analyses in observational studies of limiting factors, especially when sampling large spatial and temporal scales.
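A hedged sketch of the approach on synthetic data with heteroscedastic error (the canopy-cover and biomass variables only echo the example, with made-up numbers): several regression quantiles are fitted and their slopes compared with the OLS (mean) slope, mimicking the situation where an unmeasured factor limits the response.

```python
# Regression quantiles vs the OLS slope under a limiting-factor pattern.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(10)
n = 300
cover = rng.uniform(0, 100, n)                        # e.g. oak canopy cover
biomass = (0.5 * cover) * rng.uniform(0, 1, n)        # limited response: unequal spread
df = pd.DataFrame({"cover": cover, "biomass": biomass})

ols_slope = smf.ols("biomass ~ cover", data=df).fit().params["cover"]
print("OLS slope:", round(ols_slope, 3))
for q in (0.50, 0.75, 0.90, 0.95):
    qr = smf.quantreg("biomass ~ cover", data=df).fit(q=q)
    print(f"{int(q * 100)}th regression quantile slope:", round(qr.params["cover"], 3))
```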
Zhu, Xiang; Stephens, Matthew
2017-01-01
Bayesian methods for large-scale multiple regression provide attractive approaches to the analysis of genome-wide association studies (GWAS). For example, they can estimate heritability of complex traits, allowing for both polygenic and sparse models; and by incorporating external genomic data into the priors, they can increase power and yield new biological insights. However, these methods require access to individual genotypes and phenotypes, which are often not easily available. Here we provide a framework for performing these analyses without individual-level data. Specifically, we introduce a “Regression with Summary Statistics” (RSS) likelihood, which relates the multiple regression coefficients to univariate regression results that are often easily available. The RSS likelihood requires estimates of correlations among covariates (SNPs), which also can be obtained from public databases. We perform Bayesian multiple regression analysis by combining the RSS likelihood with previously proposed prior distributions, sampling posteriors by Markov chain Monte Carlo. In a wide range of simulations RSS performs similarly to analyses using the individual data, both for estimating heritability and detecting associations. We apply RSS to a GWAS of human height that contains 253,288 individuals typed at 1.06 million SNPs, for which analyses of individual-level data are practically impossible. Estimates of heritability (52%) are consistent with, but more precise, than previous results using subsets of these data. We also identify many previously unreported loci that show evidence for association with height in our analyses. Software is available at https://github.com/stephenslab/rss. PMID:29399241
Mortality rates in OECD countries converged during the period 1990-2010.
Bremberg, Sven G
2017-06-01
Since the scientific revolution of the 18th century, human health has gradually improved, but there is no unifying theory that explains this improvement in health. Studies of macrodeterminants have produced conflicting results. Most studies have analysed health at a given point in time as the outcome; however, the rate of improvement in health might be a more appropriate outcome. Twenty-eight OECD member countries were selected for analysis in the period 1990-2010. The main outcomes studied, in six age groups, were the national rates of decrease in mortality in the period 1990-2010. The effects of seven potential determinants on the rates of decrease in mortality were analysed in linear multiple regression models using least squares, controlling for country-specific history constants, which represent the mortality rate in 1990. The multiple regression analyses started with models that only included mortality rates in 1990 as determinants. These models explained 87% of the intercountry variation in the children aged 1-4 years and 51% in adults aged 55-74 years. When added to the regression equations, the seven determinants did not seem to significantly increase the explanatory power of the equations. The analyses indicated a decrease in mortality in all nations and in all age groups. The development of mortality rates in the different nations demonstrated significant catch-up effects. Therefore an important objective of the national public health sector seems to be to reduce the delay between international research findings and the universal implementation of relevant innovations.
2013-01-01
Background Malnutrition is one of the principal causes of child mortality in developing countries including Bangladesh. To our knowledge, most of the available studies that addressed the issue of malnutrition among under-five children considered categorical (dichotomous/polychotomous) outcome variables and applied logistic regression (binary/multinomial) to find their predictors. In this study the malnutrition variable (i.e. the outcome) is defined as the number of under-five malnourished children in a family, which is a non-negative count variable. The purposes of the study are (i) to demonstrate the applicability of the generalized Poisson regression (GPR) model as an alternative to other statistical methods and (ii) to find some predictors of this outcome variable. Methods The data are extracted from the Bangladesh Demographic and Health Survey (BDHS) 2007. Briefly, this survey employs a nationally representative sample which is based on a two-stage stratified sample of households. A total of 4,460 under-five children are analysed using various statistical techniques, namely the Chi-square test and the GPR model. Results The GPR model (as compared to the standard Poisson regression and negative binomial regression) is found to be justified to study the above-mentioned outcome variable because of its under-dispersion (variance < mean) property. Our study also identifies several significant predictors of the outcome variable, namely mother’s education, father’s education, wealth index, sanitation status, source of drinking water, and total number of children ever born to a woman. Conclusions Consistencies of our findings in light of many other studies suggest that the GPR model is an ideal alternative to other statistical models to analyse the number of under-five malnourished children in a family. Strategies based on significant predictors may improve the nutritional status of children in Bangladesh. PMID:23297699
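A hedged sketch of fitting a generalized Poisson regression in Python (the covariates, effect sizes, and dispersion mechanism are invented, and the example assumes a statsmodels version that ships GeneralizedPoisson): the GPR dispersion parameter nests the ordinary Poisson model when it equals zero, which is how the model accommodates departures from equidispersion.

```python
# Generalized Poisson regression for a family-level count outcome; synthetic data.
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.discrete_model import GeneralizedPoisson

rng = np.random.default_rng(11)
n = 4460
educ = rng.integers(0, 3, n)                     # e.g. maternal education category
wealth = rng.normal(size=n)
mu = np.exp(-0.5 - 0.2 * educ - 0.1 * wealth)
mu = mu * rng.gamma(5.0, 0.2, n)                 # mild extra dispersion so alpha is estimable
y = rng.poisson(mu)

X = sm.add_constant(np.column_stack([educ, wealth]))
gp = GeneralizedPoisson(y, X).fit(disp=False)
print(gp.summary())                              # includes the dispersion parameter alpha
```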
Ecologists are often faced with the problems of small sample sizes, large numbers of correlated predictors, and high noise-to-signal ratios. This necessitates excluding important variables from the model when applying standard multiple or multivariate regression analyses. In ...
Mauer, Michael; Caramori, Maria Luiza; Fioretto, Paola; Najafian, Behzad
2015-06-01
Studies of structural-functional relationships have improved understanding of the natural history of diabetic nephropathy (DN). However, in order to consider structural end points for clinical trials, the robustness of the resultant models needs to be verified. This study examined whether structural-functional relationship models derived from a large cohort of type 1 diabetic (T1D) patients with a wide range of renal function are robust. The predictability of models derived from multiple regression analysis and piecewise linear regression analysis was also compared. T1D patients (n = 161) with research renal biopsies were divided into two equal groups matched for albumin excretion rate (AER). Models to explain AER and glomerular filtration rate (GFR) by classical DN lesions in one group (T1D-model, or T1D-M) were applied to the other group (T1D-test, or T1D-T) and regression analyses were performed. T1D-M-derived models explained 70 and 63% of AER variance and 32 and 21% of GFR variance in T1D-M and T1D-T, respectively, supporting the substantial robustness of the models. Piecewise linear regression analyses substantially improved predictability of the models with 83% of AER variance and 66% of GFR variance explained by classical DN glomerular lesions alone. These studies demonstrate that DN structural-functional relationship models are robust, and if appropriate models are used, glomerular lesions alone explain a major proportion of AER and GFR variance in T1D patients. © The Author 2014. Published by Oxford University Press on behalf of ERA-EDTA. All rights reserved.
Bootstrap investigation of the stability of a Cox regression model.
Altman, D G; Andersen, P K
1989-07-01
We describe a bootstrap investigation of the stability of a Cox proportional hazards regression model resulting from the analysis of a clinical trial of azathioprine versus placebo in patients with primary biliary cirrhosis. We have considered stability to refer both to the choice of variables included in the model and, more importantly, to the predictive ability of the model. In stepwise Cox regression analyses of 100 bootstrap samples using 17 candidate variables, the most frequently selected variables were those selected in the original analysis, and no other important variable was identified. Thus there was no reason to doubt the model obtained in the original analysis. For each patient in the trial, bootstrap confidence intervals were constructed for the estimated probability of surviving two years. It is shown graphically that these intervals are markedly wider than those obtained from the original model.
[How to fit and interpret multilevel models using SPSS].
Pardo, Antonio; Ruiz, Miguel A; San Martín, Rafael
2007-05-01
Hierarchical or multilevel models are used to analyse data when cases belong to known groups and sample units are selected both from the individual level and from the group level. In this work, the multilevel models most commonly discussed in the statistical literature are described, explaining how to fit these models using the SPSS program (version 11 or later) and how to interpret the outcomes of the analysis. Five particular models are described, fitted, and interpreted: (1) one-way analysis of variance with random effects, (2) regression analysis with means-as-outcomes, (3) one-way analysis of covariance with random effects, (4) regression analysis with random coefficients, and (5) regression analysis with means- and slopes-as-outcomes. All models are explained in a way intended to make them understandable to researchers in the health and behavioural sciences.
An overview of longitudinal data analysis methods for neurological research.
Locascio, Joseph J; Atri, Alireza
2011-01-01
The purpose of this article is to provide a concise, broad and readily accessible overview of longitudinal data analysis methods, aimed to be a practical guide for clinical investigators in neurology. In general, we advise that older, traditional methods, including (1) simple regression of the dependent variable on a time measure, (2) analyzing a single summary subject level number that indexes changes for each subject and (3) a general linear model approach with a fixed-subject effect, should be reserved for quick, simple or preliminary analyses. We advocate the general use of mixed-random and fixed-effect regression models for analyses of most longitudinal clinical studies. Under restrictive situations or to provide validation, we recommend: (1) repeated-measure analysis of covariance (ANCOVA), (2) ANCOVA for two time points, (3) generalized estimating equations and (4) latent growth curve/structural equation models.
Competing risks models and time-dependent covariates
Barnett, Adrian; Graves, Nick
2008-01-01
New statistical models for analysing survival data in an intensive care unit context have recently been developed. Two models that offer significant advantages over standard survival analyses are competing risks models and multistate models. Wolkewitz and colleagues used a competing risks model to examine survival times for nosocomial pneumonia and mortality. Their model was able to incorporate time-dependent covariates and so examine how risk factors that changed with time affected the chances of infection or death. We briefly explain how an alternative modelling technique (using logistic regression) can more fully exploit time-dependent covariates for this type of data. PMID:18423067
Sabes-Figuera, Ramon; McCrone, Paul; Kendricks, Antony
2013-04-01
Economic evaluation analyses can be enhanced by employing regression methods, which allow for the identification of important sub-groups, adjustment for imperfect randomisation in clinical trials, and the analysis of non-randomised data. To explore the benefits of combining regression techniques and the standard Bayesian approach to refine cost-effectiveness analyses using data from randomised clinical trials. Data from a randomised trial of anti-depressant treatment were analysed and a regression model was used to explore the factors that have an impact on the net benefit (NB) statistic, with the aim of using these findings to adjust the cost-effectiveness acceptability curves. Exploratory sub-sample analyses were carried out to explore possible differences in cost-effectiveness. The analysis found that having suffered a previous similar depression is strongly correlated with a lower NB, independent of the outcome measure or follow-up point. In patients with previous similar depression, adding a selective serotonin reuptake inhibitor (SSRI) to supportive care for mild-to-moderate depression is probably cost-effective at the level used by the English National Institute for Health and Clinical Excellence to make recommendations. This analysis highlights the need for incorporation of econometric methods into cost-effectiveness analyses using the NB approach.
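An illustrative sketch of net-benefit regression with invented trial data (treatment arm, costs, QALYs, and the willingness-to-pay threshold are all hypothetical): the individual-level net benefit NB = lambda * effect - cost is regressed on the randomised arm and a covariate such as previous depression, so subgroup cost-effectiveness can be read off the coefficients.

```python
# Net-benefit regression on simulated trial data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(12)
n = 400
ssri = rng.binomial(1, 0.5, n)                  # randomised arm
prev_dep = rng.binomial(1, 0.4, n)              # previous similar depression
qaly = 0.70 + 0.02 * ssri - 0.03 * prev_dep + rng.normal(0, 0.05, n)
cost = 600 + 150 * ssri + 200 * prev_dep + rng.normal(0, 100, n)

wtp = 20000                                      # willingness to pay per QALY (lambda)
df = pd.DataFrame({"nb": wtp * qaly - cost, "ssri": ssri, "prev_dep": prev_dep})

nb_model = smf.ols("nb ~ ssri * prev_dep", data=df).fit()
print(nb_model.params)    # incremental NB of the SSRI arm, overall and by subgroup
```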
The relationship between biomechanical variables and driving performance during the golf swing.
Chu, Yungchien; Sell, Timothy C; Lephart, Scott M
2010-09-01
Swing kinematic and ground reaction force data from 308 golfers were analysed to identify the variables important to driving ball velocity. Regression models were applied at four selected events in the swing. The models accounted for 44-74% of variance in ball velocity. Based on the regression analyses, upper torso-pelvis separation (the X-Factor), delayed release (i.e. the initiation of movement) of the arms and wrists, trunk forward and lateral tilting, and weight-shifting during the swing were significantly related to ball velocity. Our results also verify several general coaching ideas that were considered important to increased ball velocity. The results of this study may serve as both skill and strength training guidelines for golfers.
Christensen, A L; Lundbye-Christensen, S; Dethlefsen, C
2011-12-01
Several statistical methods of assessing seasonal variation are available. Brookhart and Rothman [3] proposed a second-order moment-based estimator based on the geometrical model derived by Edwards [1], and reported that this estimator is superior in estimating the peak-to-trough ratio of seasonal variation compared with Edwards' estimator with respect to bias and mean squared error. Alternatively, seasonal variation may be modelled using a Poisson regression model, which provides flexibility in modelling the pattern of seasonal variation and adjustments for covariates. Based on a Monte Carlo simulation study three estimators, one based on the geometrical model, and two based on log-linear Poisson regression models, were evaluated in regards to bias and standard deviation (SD). We evaluated the estimators on data simulated according to schemes varying in seasonal variation and presence of a secular trend. All methods and analyses in this paper are available in the R package Peak2Trough[13]. Applying a Poisson regression model resulted in lower absolute bias and SD for data simulated according to the corresponding model assumptions. Poisson regression models had lower bias and SD for data simulated to deviate from the corresponding model assumptions than the geometrical model. This simulation study encourages the use of Poisson regression models in estimating the peak-to-trough ratio of seasonal variation as opposed to the geometrical model. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
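A sketch of the Poisson-regression estimator of seasonal variation on synthetic daily counts (the amplitude, trend, and series length are made up): sine and cosine terms capture the seasonal pattern, a trend term adjusts for secular change, and the peak-to-trough ratio follows from the fitted harmonic amplitude.

```python
# Log-linear Poisson model of seasonal variation with a peak-to-trough estimate.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(13)
day = np.arange(3 * 365)
angle = 2 * np.pi * day / 365.25
counts = rng.poisson(np.exp(2.0 + 0.3 * np.cos(angle)))   # true peak-to-trough = exp(0.6)

df = pd.DataFrame({"counts": counts,
                   "s": np.sin(angle), "c": np.cos(angle), "trend": day / 365.25})
fit = smf.glm("counts ~ s + c + trend", data=df, family=sm.families.Poisson()).fit()

amplitude = np.hypot(fit.params["s"], fit.params["c"])
print("estimated peak-to-trough ratio:", round(np.exp(2 * amplitude), 2))
```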
ERIC Educational Resources Information Center
Stapleton, Laura M.
2008-01-01
This article discusses replication sampling variance estimation techniques that are often applied in analyses using data from complex sampling designs: jackknife repeated replication, balanced repeated replication, and bootstrapping. These techniques are used with traditional analyses such as regression, but are currently not used with structural…
Development and validation of prognostic models in metastatic breast cancer: a GOCS study.
Rabinovich, M; Vallejo, C; Bianco, A; Perez, J; Machiavelli, M; Leone, B; Romero, A; Rodriguez, R; Cuevas, M; Dansky, C
1992-01-01
The significance of several prognostic factors and the magnitude of their influence on response rate and survival were assessed by means of uni- and multivariate analyses in 362 patients with stage IV (UICC) breast carcinoma receiving combination chemotherapy as first systemic treatment over an 8-year period. Univariate analyses identified performance status and prior adjuvant radiotherapy as predictors of objective regression (OR), whereas performance status, prior chemotherapy and radiotherapy (adjuvants), white blood cell count, SGOT and SGPT levels, and metastatic pattern were significantly correlated with survival. In multivariate analyses, favorable characteristics associated with OR were prior adjuvant radiotherapy, no prior chemotherapy and postmenopausal status. Regarding survival, performance status and visceral involvement were selected by the Cox model. The predictive accuracy of the logistic and the proportional hazards models was retrospectively tested in the training sample, and prospectively in a new population of 126 patients also receiving combined chemotherapy as first treatment for metastatic breast cancer. Some overfitting to the data in the training sample was observed with the regression model for response. However, the discriminative ability of the Cox model for survival was clearly confirmed.
Development of Super-Ensemble techniques for ocean analyses: the Mediterranean Sea case
NASA Astrophysics Data System (ADS)
Pistoia, Jenny; Pinardi, Nadia; Oddo, Paolo; Collins, Matthew; Korres, Gerasimos; Drillet, Yann
2017-04-01
Short-term ocean analyses for sea surface temperature (SST) in the Mediterranean Sea can be improved by a statistical post-processing technique, called super-ensemble. This technique consists of a multiple linear regression algorithm applied to a Multi-Physics Multi-Model Super-Ensemble (MMSE) dataset, a collection of different operational forecasting analyses together with ad-hoc simulations produced by modifying selected numerical model parameterizations. A new linear regression algorithm based on Empirical Orthogonal Function filtering techniques is capable of preventing overfitting, although the best performance is achieved when we add correlation to the super-ensemble structure using a simple spatial filter applied after the linear regression. Our results show that super-ensemble performance depends on the selection of an unbiased operator and the length of the learning period, but the quality of the generating MMSE dataset has the largest impact on the MMSE analysis Root Mean Square Error (RMSE) evaluated with respect to observed satellite SST. Lower RMSE analysis estimates result from the following choices: a 15-day training period, an overconfident MMSE dataset (a subset with the higher-quality ensemble members), and the least squares algorithm being filtered a posteriori.
Giménez-Espert, María Del Carmen; Prado-Gascó, Vicente Javier
2018-03-01
To analyse the link between empathy and emotional intelligence as a predictor of nurses' attitudes towards communication, while comparing the contributions of emotional aspects and attitudinal elements to potential behaviour. Nurses' attitudes towards communication, empathy and emotional intelligence are key skills for nurses involved in patient care. There are currently no studies analysing this link, and its investigation is needed because attitudes may influence communication behaviours. Correlational study. To attain this goal, self-reported instruments (nurses' attitudes towards communication, trait emotional intelligence (Trait Emotional Meta-Mood Scale) and the Jefferson Scale of Nursing Empathy) were collected from 460 nurses between September 2015 and February 2016. Two different analytical methodologies were used: traditional regression models and fuzzy-set qualitative comparative analysis models. The results of the regression model suggest that cognitive dimensions of attitude are a significant and positive predictor of the behavioural dimension. The perspective-taking dimension of empathy and the emotional-clarity dimension of emotional intelligence were significant positive predictors of the dimensions of attitudes towards communication, except for the affective dimension (for which the association was negative). The results of the fuzzy-set qualitative comparative analysis models confirm that the combination of high levels of the cognitive dimension of attitudes, perspective-taking and emotional clarity explained high levels of the behavioural dimension of attitude. Empathy and emotional intelligence are predictors of nurses' attitudes towards communication, and the cognitive dimension of attitude is a good predictor of the behavioural dimension of attitudes towards communication of nurses in both regression models and fuzzy-set qualitative comparative analysis. In general, the fuzzy-set qualitative comparative analysis models appear to be better predictors than the regression models are. These findings can be used to evaluate current practices, establish intervention strategies and evaluate their effectiveness. The evaluation of these variables and their relationships is important in creating a satisfied and sustainable workforce and improving quality of care and patient health. © 2018 John Wiley & Sons Ltd.
An introduction to using Bayesian linear regression with clinical data.
Baldwin, Scott A; Larson, Michael J
2017-11-01
Statistical training in psychology focuses on frequentist methods. Bayesian methods are an alternative to standard frequentist methods. This article provides researchers with an introduction to fundamental ideas in Bayesian modeling. We use data from an electroencephalogram (EEG) and anxiety study to illustrate Bayesian models. Specifically, the models examine the relationship between error-related negativity (ERN), a particular event-related potential, and trait anxiety. Methodological topics covered include: how to set up a regression model in a Bayesian framework, specifying priors, examining convergence of the model, visualizing and interpreting posterior distributions, interval estimates, expected and predicted values, and model comparison tools. We also discuss situations where Bayesian methods can outperform frequentist methods as well as how to specify more complicated regression models. Finally, we conclude with recommendations about reporting guidelines for those using Bayesian methods in their own research. We provide data and R code for replicating our analyses. Copyright © 2017 Elsevier Ltd. All rights reserved.
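A minimal sketch of Bayesian linear regression using a conjugate normal prior with a known noise variance, written in plain NumPy (the article itself provides R code; the ERN and anxiety variables here are simulated placeholders and the prior is an assumption of the example): the closed-form posterior mean and covariance of the coefficients are computed and summarised by a credible interval.

```python
# Conjugate Bayesian linear regression: closed-form posterior for the slope.
import numpy as np

rng = np.random.default_rng(14)
n = 60
anxiety = rng.normal(0, 1, n)                      # standardised trait anxiety
ern = -1.0 - 0.5 * anxiety + rng.normal(0, 1, n)   # simulated ERN amplitude

X = np.column_stack([np.ones(n), anxiety])
sigma2 = 1.0                                       # assumed known residual variance
prior_mean = np.zeros(2)
prior_cov = np.eye(2) * 10.0                       # weakly informative prior

# Conjugate update: posterior covariance and mean for intercept and slope.
post_cov = np.linalg.inv(np.linalg.inv(prior_cov) + X.T @ X / sigma2)
post_mean = post_cov @ (np.linalg.inv(prior_cov) @ prior_mean + X.T @ ern / sigma2)

slope_sd = np.sqrt(post_cov[1, 1])
print("posterior mean slope:", round(post_mean[1], 3))
print("95% credible interval:",
      np.round(post_mean[1] + np.array([-1.96, 1.96]) * slope_sd, 3))
```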
Logistic regression for circular data
NASA Astrophysics Data System (ADS)
Al-Daffaie, Kadhem; Khan, Shahjahan
2017-05-01
This paper considers the relationship between a binary response and a circular predictor. It develops the logistic regression model by employing the linear-circular regression approach. The maximum likelihood method is used to estimate the parameters. The Newton-Raphson numerical method is used to find the estimated values of the parameters. A data set from weather records of Toowoomba city is analysed by the proposed methods. Moreover, a simulation study is considered. The R software is used for all computations and simulations.
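A sketch of the linear-circular approach described above, with simulated data rather than the Toowoomba weather records: the circular predictor (an angle in radians) enters a logistic model through its sine and cosine components, and the fitted coefficients recover the amplitude and preferred direction of the effect.

```python
# Logistic regression with a circular predictor via sine and cosine terms.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(15)
n = 500
theta = rng.uniform(0, 2 * np.pi, n)                  # circular predictor (radians)
eta = -0.5 + 1.2 * np.cos(theta - np.pi / 3)          # peak risk at direction pi/3
rain = rng.binomial(1, 1 / (1 + np.exp(-eta)))

df = pd.DataFrame({"rain": rain, "s": np.sin(theta), "c": np.cos(theta)})
fit = smf.logit("rain ~ s + c", data=df).fit(disp=False)

# Recover the amplitude and preferred direction from the fitted coefficients.
amp = np.hypot(fit.params["s"], fit.params["c"])
direction = np.arctan2(fit.params["s"], fit.params["c"])
print("amplitude:", round(amp, 2), "peak direction (rad):", round(direction, 2))
```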
Mechanisms behind the estimation of photosynthesis traits from leaf reflectance observations
NASA Astrophysics Data System (ADS)
Dechant, Benjamin; Cuntz, Matthias; Doktor, Daniel; Vohland, Michael
2016-04-01
Many studies have investigated the reflectance-based estimation of leaf chlorophyll, water and dry matter contents of plants. Only a few studies have focused on photosynthesis traits, however. The maximum potential uptake of carbon dioxide under given environmental conditions is determined mainly by RuBisCO activity, limiting carboxylation, or the speed of photosynthetic electron transport. These two main limitations are represented by the maximum carboxylation capacity, Vcmax,25, and the maximum electron transport rate, Jmax,25. These traits were estimated from leaf reflectance before but the mechanisms underlying the estimation remain rather speculative. The aim of this study was therefore to reveal the mechanisms behind reflectance-based estimation of Vcmax,25 and Jmax,25. Leaf reflectance, photosynthetic response curves as well as nitrogen content per area, Narea, and leaf mass per area, LMA, were measured on 37 deciduous tree species. Vcmax,25 and Jmax,25 were determined from the response curves. Partial Least Squares (PLS) regression models for the two photosynthesis traits Vcmax,25 and Jmax,25 as well as Narea and LMA were studied using a cross-validation approach. Analyses of linear regression models based on Narea and other leaf traits estimated via PROSPECT inversion, PLS regression coefficients and model residuals were conducted in order to reveal the mechanisms behind the reflectance-based estimation. We found that Vcmax,25 and Jmax,25 can be estimated from leaf reflectance with good to moderate accuracy for a large number of species and different light conditions. The dominant mechanism behind the estimations was the strong relationship between photosynthesis traits and leaf nitrogen content. This was concluded from very strong relationships between PLS regression coefficients, the model residuals as well as the prediction performance of Narea-based linear regression models compared to PLS regression models. While the PLS regression model for Vcmax,25 was fully based on the correlation to Narea, the PLS regression model for Jmax,25 was not entirely based on it. Analyses of the contributions of different parts of the reflectance spectrum revealed that the information contributing to the Jmax,25 PLS regression model in addition to the main source of information, Narea, was mainly located in the visible part of the spectrum (500-900 nm). Estimated chlorophyll content could be excluded as a potential source of this extra information. The PLS regression coefficients of the Jmax,25 model indicated possible contributions from chlorophyll fluorescence and cytochrome f content. In summary, we found that the main mechanism behind the estimation of Vcmax,25 and Jmax,25 from leaf reflectance observations is the correlation to Narea but that there is additional information related to Jmax,25 mainly in the visible part of the spectrum.
Spatial quantile regression using INLA with applications to childhood overweight in Malawi.
Mtambo, Owen P L; Masangwi, Salule J; Kazembe, Lawrence N M
2015-04-01
Analyses of childhood overweight have mainly used mean regression. However, quantile regression is more appropriate, as it provides the flexibility to analyse the determinants of overweight corresponding to quantiles of interest. The main objective of this study was to fit a Bayesian additive quantile regression model with structured spatial effects for childhood overweight in Malawi using the 2010 Malawi DHS data. Inference was fully Bayesian using the R-INLA package. The significant determinants of childhood overweight ranged from socio-demographic factors, such as type of residence, to child and maternal factors, such as child age and maternal BMI. We observed significant positive structured spatial effects on childhood overweight in some districts of Malawi. We recommend that childhood malnutrition policy makers consider timely interventions based on the risk factors identified in this paper, including spatially targeted interventions. Copyright © 2015 Elsevier Ltd. All rights reserved.
A regularization corrected score method for nonlinear regression models with covariate error.
Zucker, David M; Gorfine, Malka; Li, Yi; Tadesse, Mahlet G; Spiegelman, Donna
2013-03-01
Many regression analyses involve explanatory variables that are measured with error, and failing to account for this error is well known to lead to biased point and interval estimates of the regression coefficients. We present here a new general method for adjusting for covariate error. Our method consists of an approximate version of the Stefanski-Nakamura corrected score approach, using the method of regularization to obtain an approximate solution of the relevant integral equation. We develop the theory in the setting of classical likelihood models; this setting covers, for example, linear regression, nonlinear regression, logistic regression, and Poisson regression. The method is extremely general in terms of the types of measurement error models covered, and is a functional method in the sense of not involving assumptions on the distribution of the true covariate. We discuss the theoretical properties of the method and present simulation results in the logistic regression setting (univariate and multivariate). For illustration, we apply the method to data from the Harvard Nurses' Health Study concerning the relationship between physical activity and breast cancer mortality in the period following a diagnosis of breast cancer. Copyright © 2013, The International Biometric Society.
Prediction models for Arabica coffee beverage quality based on aroma analyses and chemometrics.
Ribeiro, J S; Augusto, F; Salva, T J G; Ferreira, M M C
2012-11-15
In this work, soft modeling based on chemometric analyses of coffee beverage sensory data and the chromatographic profiles of volatile roasted coffee compounds is proposed to predict the scores of acidity, bitterness, flavor, cleanliness, body, and overall quality of the coffee beverage. A partial least squares (PLS) regression method was used to construct the models. The ordered predictor selection (OPS) algorithm was applied to select the compounds for the regression model of each sensory attribute in order to take only significant chromatographic peaks into account. The prediction errors of these models, using 4 or 5 latent variables, were 0.28, 0.33, 0.35, 0.33, 0.34 and 0.41 for the respective attributes, and were comparable to the errors of the experts' mean scores. Thus, the results demonstrate the feasibility of using a similar methodology in on-line or routine applications to predict the sensory quality of Brazilian Arabica coffee. Copyright © 2012 Elsevier B.V. All rights reserved.
Linear models for calculating digestible energy for sheep diets.
Fonnesbeck, P V; Christiansen, M L; Harris, L E
1981-05-01
Equations for estimating the digestible energy (DE) content of sheep diets were generated from the chemical contents and a factorial description of diets fed to lambs in digestion trials. The diet factors were two forages (alfalfa and grass hay), harvested at three stages of maturity (late vegetative, early bloom and full bloom), fed in two ingredient combinations (all hay or a 50:50 hay and corn grain mixture) and prepared by two forage texture processes (coarsely chopped or finely chopped and pelleted). The 2 x 3 x 2 x 2 factorial arrangement produced 24 diet treatments. These were replicated twice, for a total of 48 lamb digestion trials. In model 1 regression equations, DE was calculated directly from the chemical composition of the diet. In model 2, regression equations predicted the percentage of digested nutrient from the chemical contents of the diet, and DE of the diet was then calculated as the sum of the gross energy of the digested organic components. Expanded forms of model 1 and model 2 were also developed that included diet factors as qualitative indicator variables to adjust the regression constant and regression coefficients for the diet description. The expanded forms of the equations accounted for significantly more variation in DE than did the simple models and more accurately estimated DE of the diet. Information provided by the diet description proved as useful as chemical analyses for the prediction of digestibility of nutrients. The statistics indicate that, with model 1, neutral detergent fiber and plant cell wall analyses provided as much information for the estimation of DE as did model 2 with the combined information from crude protein, available carbohydrate, total lipid, cellulose and hemicellulose. Regression equations are presented for estimating DE from the most commonly analyzed organic components, including linear and curvilinear variables and diet factors that significantly reduce the standard error of the estimate. To estimate DE of a diet, the user selects the equation that makes the most effective use of the available chemical analysis information and diet description.
Regression Models and Fuzzy Logic Prediction of TBM Penetration Rate
NASA Astrophysics Data System (ADS)
Minh, Vu Trieu; Katushin, Dmitri; Antonov, Maksim; Veinthal, Renno
2017-03-01
This paper presents statistical analyses of rock engineering properties and the measured penetration rate of a tunnel boring machine (TBM) based on the data of an actual project. The aim of this study is to analyze the influence of rock engineering properties including uniaxial compressive strength (UCS), Brazilian tensile strength (BTS), rock brittleness index (BI), the distance between planes of weakness (DPW), and the alpha angle (Alpha) between the tunnel axis and the planes of weakness on the TBM rate of penetration (ROP). Four
Staley, James R; Jones, Edmund; Kaptoge, Stephen; Butterworth, Adam S; Sweeting, Michael J; Wood, Angela M; Howson, Joanna M M
2017-06-01
Logistic regression is often used instead of Cox regression to analyse genome-wide association studies (GWAS) of single-nucleotide polymorphisms (SNPs) and disease outcomes with cohort and case-cohort designs, as it is less computationally expensive. Although Cox and logistic regression models have been compared previously in cohort studies, this work does not completely cover the GWAS setting nor extend to the case-cohort study design. Here, we evaluated Cox and logistic regression applied to cohort and case-cohort genetic association studies using simulated data and genetic data from the EPIC-CVD study. In the cohort setting, there was a modest improvement in power to detect SNP-disease associations using Cox regression compared with logistic regression, which increased as the disease incidence increased. In contrast, logistic regression had more power than (Prentice-weighted) Cox regression in the case-cohort setting. Logistic regression yielded inflated effect estimates (assuming the hazard ratio is the underlying measure of association) for both study designs, especially for SNPs with greater effect on disease. Given that logistic regression is substantially more computationally efficient than Cox regression in both settings, we propose a two-step approach to GWAS in cohort and case-cohort studies: first, analyse all SNPs with logistic regression to identify associated variants below a pre-defined P-value threshold; second, fit Cox regression (appropriately weighted in case-cohort studies) to those identified SNPs to ensure accurate estimation of association with disease.
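A hedged sketch of the proposed two-step idea on simulated data (the genotypes, effect sizes, threshold and censoring scheme are all invented, and this is not the EPIC-CVD pipeline): screen SNPs with fast logistic regression, then refit the hits with a Cox proportional hazards model.

```python
import numpy as np
import statsmodels.api as sm

# Simulated cohort: one truly associated SNP, ~20% disease incidence.
rng = np.random.default_rng(2)
n, p = 2000, 50
snps = rng.binomial(2, 0.3, size=(n, p))              # additive 0/1/2 genotype coding
hazard = np.exp(-2 + 0.4 * snps[:, 0])                # only SNP 0 affects the hazard
time = rng.exponential(1 / hazard)
cutoff = np.quantile(time, 0.2)
event = (time < cutoff).astype(int)
time = np.minimum(time, cutoff)                       # administrative censoring

# Step 1: logistic scan with a pre-defined p-value threshold.
pvals = [sm.Logit(event, sm.add_constant(snps[:, [j]])).fit(disp=0).pvalues[1]
         for j in range(p)]
hits = [j for j, pv in enumerate(pvals) if pv < 1e-3]

# Step 2: Cox regression on the identified SNPs for accurate effect estimation.
for j in hits:
    cox = sm.PHReg(time, snps[:, [j]], status=event).fit()
    print(f"SNP {j}: hazard ratio ~ {np.exp(cox.params[0]):.2f}")
```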
Bayesian models for comparative analysis integrating phylogenetic uncertainty.
de Villemereuil, Pierre; Wells, Jessie A; Edwards, Robert D; Blomberg, Simon P
2012-06-28
Uncertainty in comparative analyses can come from at least two sources: a) phylogenetic uncertainty in the tree topology or branch lengths, and b) uncertainty due to intraspecific variation in trait values, either due to measurement error or natural individual variation. Most phylogenetic comparative methods do not account for such uncertainties. Not accounting for these sources of uncertainty leads to false perceptions of precision (confidence intervals will be too narrow) and inflated significance in hypothesis testing (e.g. p-values will be too small). Although there is some application-specific software for fitting Bayesian models accounting for phylogenetic error, more general and flexible software is desirable. We developed models to directly incorporate phylogenetic uncertainty into a range of analyses that biologists commonly perform, using a Bayesian framework and Markov Chain Monte Carlo analyses. We demonstrate applications in linear regression, quantification of phylogenetic signal, and measurement error models. Phylogenetic uncertainty was incorporated by applying a prior distribution for the phylogeny, where this distribution consisted of the posterior tree sets from Bayesian phylogenetic tree estimation programs. The models were analysed using simulated data sets, and applied to a real data set on plant traits, from rainforest plant species in Northern Australia. Analyses were performed using the free and open source software OpenBUGS and JAGS. Incorporating phylogenetic uncertainty through an empirical prior distribution of trees leads to more precise estimation of regression model parameters than using a single consensus tree and enables a more realistic estimation of confidence intervals. In addition, models incorporating measurement errors and/or individual variation, in one or both variables, are easily formulated in the Bayesian framework. We show that BUGS is a useful, flexible general purpose tool for phylogenetic comparative analyses, particularly for modelling in the face of phylogenetic uncertainty and accounting for measurement error or individual variation in explanatory variables. Code for all models is provided in the BUGS model description language.
SOCR Analyses - an Instructional Java Web-based Statistical Analysis Toolkit.
Chu, Annie; Cui, Jenny; Dinov, Ivo D
2009-03-01
The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses, such as linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as the t-test in the parametric category, and the Wilcoxon rank sum test, Kruskal-Wallis test and Friedman's test in the non-parametric category. SOCR Analyses also includes several hypothesis test models, such as contingency tables, Friedman's test and Fisher's exact test. The code itself is open source (http://socr.googlecode.com/), hoping to contribute to the efforts of the statistical computing community. The code includes functionality for each specific analysis model and it has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with API (Application Programming Interface) have been implemented in statistical summary, least square solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website. In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is on-going and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for the most updated information and newly added models.
Publication bias in obesity treatment trials?
Allison, D B; Faith, M S; Gorman, B S
1996-10-01
The present investigation examined the extent of publication bias (namely, the tendency to publish significant findings and file away non-significant findings) within the obesity treatment literature. Quantitative literature synthesis of four published meta-analyses from the obesity treatment literature. Interventions in these studies included pharmacological, educational, child, and couples treatments. To assess publication bias, several regression procedures (for example, weighted least-squares, random-effects multi-level modeling, and robust regression methods) were used to regress effect sizes onto their standard errors, or proxies thereof, within each of the four meta-analyses. A significant positive beta weight in these analyses signified publication bias. There was evidence for publication bias within two of the four published meta-analyses, such that reviews of published studies were likely to overestimate clinical efficacy. The lack of evidence for publication bias within the two other meta-analyses might have been due to insufficient statistical power rather than the absence of selection bias. As in other disciplines, publication bias appears to exist in the obesity treatment literature. Suggestions are offered for managing publication bias once identified or reducing its likelihood in the first place.
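A hedged sketch of the kind of regression test described above (effect sizes regressed on their standard errors, with a positive slope taken as evidence of publication bias); the meta-analytic data below are simulated with bias deliberately built in and do not come from the four reviewed meta-analyses.

```python
import numpy as np
import statsmodels.api as sm

# Weighted least-squares regression of effect size on standard error.
rng = np.random.default_rng(3)
k = 30                                        # number of studies
se = rng.uniform(0.05, 0.5, k)                # standard errors of the effect sizes
effect = 0.3 + 1.5 * se + rng.normal(0, se)   # small studies report larger effects

fit = sm.WLS(effect, sm.add_constant(se), weights=1 / se**2).fit()
print(f"slope on SE = {fit.params[1]:.2f}, p = {fit.pvalues[1]:.3f}")
```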
Murphy, Kevin; Birn, Rasmus M.; Handwerker, Daniel A.; Jones, Tyler B.; Bandettini, Peter A.
2009-01-01
Low-frequency fluctuations in fMRI signal have been used to map several consistent resting state networks in the brain. Using the posterior cingulate cortex as a seed region, functional connectivity analyses have found not only positive correlations in the default mode network but negative correlations in another resting state network related to attentional processes. The interpretation is that the human brain is intrinsically organized into dynamic, anti-correlated functional networks. Global variations of the BOLD signal are often considered nuisance effects and are commonly removed using a general linear model (GLM) technique. This global signal regression method has been shown to introduce negative activation measures in standard fMRI analyses. The topic of this paper is whether such a correction technique could be the cause of anti-correlated resting state networks in functional connectivity analyses. Here we show that, after global signal regression, correlation values to a seed voxel must sum to a negative value. Simulations also show that small phase differences between regions can lead to spurious negative correlation values. A combination breath holding and visual task demonstrates that the relative phase of global and local signals can affect connectivity measures and that, experimentally, global signal regression leads to bell-shaped correlation value distributions, centred on zero. Finally, analyses of negatively correlated networks in resting state data show that global signal regression is most likely the cause of anti-correlations. These results call into question the interpretation of negatively correlated regions in the brain when using global signal regression as an initial processing step. PMID:18976716
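As a toy numerical check of the constraint described above (this is a simulation consistent with the argument, not the authors' derivation, and the time series are random rather than fMRI data), regressing the global mean out of every voxel and then correlating the residuals with a seed voxel yields a clearly negative sum of correlations:

```python
import numpy as np

# Global signal regression on simulated "voxel" time series.
rng = np.random.default_rng(4)
T, V = 200, 100
data = rng.normal(size=(T, V)) + rng.normal(size=(T, 1))   # voxels share a global fluctuation
g = data.mean(axis=1, keepdims=True)                        # global signal

beta = (g * data).sum(axis=0) / (g * g).sum()               # per-voxel fit of the global regressor
resid = data - g * beta                                      # residuals after global signal regression

seed = resid[:, 0]
corr = np.array([np.corrcoef(seed, resid[:, v])[0, 1] for v in range(1, V)])
print(corr.sum())   # negative: anti-correlations are introduced by the processing step
```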
Mainou, Maria; Madenidou, Anastasia-Vasiliki; Liakos, Aris; Paschos, Paschalis; Karagiannis, Thomas; Bekiari, Eleni; Vlachaki, Efthymia; Wang, Zhen; Murad, Mohammad Hassan; Kumar, Shaji; Tsapas, Apostolos
2017-06-01
We performed a systematic review and meta-regression analysis of randomized control trials to investigate the association between response to initial treatment and survival outcomes in patients with newly diagnosed multiple myeloma (MM). Response outcomes included complete response (CR) and the combined outcome of CR or very good partial response (VGPR), while survival outcomes were overall survival (OS) and progression-free survival (PFS). We used random-effect meta-regression models and conducted sensitivity analyses based on definition of CR and study quality. Seventy-two trials were included in the systematic review, 63 of which contributed data in meta-regression analyses. There was no association between OS and CR in patients without autologous stem cell transplant (ASCT) (regression coefficient: .02, 95% confidence interval [CI] -0.06, 0.10), in patients undergoing ASCT (-.11, 95% CI -0.44, 0.22) and in trials comparing ASCT with non-ASCT patients (.04, 95% CI -0.29, 0.38). Similarly, OS did not correlate with the combined metric of CR or VGPR, and no association was evident between response outcomes and PFS. Sensitivity analyses yielded similar results. This meta-regression analysis suggests that there is no association between conventional response outcomes and survival in patients with newly diagnosed MM. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Pütter, Carolin; Pechlivanis, Sonali; Nöthen, Markus M; Jöckel, Karl-Heinz; Wichmann, Heinz-Erich; Scherag, André
2011-01-01
Genome-wide association studies have identified robust associations between single nucleotide polymorphisms and complex traits. As the proportion of phenotypic variance explained is still limited for most of the traits, larger and larger meta-analyses are being conducted to detect additional associations. Here we investigate the impact of the study design and the underlying assumption about the true genetic effect in a bimodal mixture situation on the power to detect associations. We performed simulations of quantitative phenotypes analysed by standard linear regression and dichotomized case-control data sets from the extremes of the quantitative trait analysed by standard logistic regression. Using linear regression, markers with an effect in the extremes of the traits were almost undetectable, whereas analysing extremes by case-control design had superior power even for much smaller sample sizes. Two real data examples are provided to support our theoretical findings and to explore our mixture and parameter assumption. Our findings support the idea to re-analyse the available meta-analysis data sets to detect new loci in the extremes. Moreover, our investigation offers an explanation for discrepant findings when analysing quantitative traits in the general population and in the extremes. Copyright © 2011 S. Karger AG, Basel.
ERIC Educational Resources Information Center
Inbar-Furst, Hagit; Gumpel, Thomas P.
2015-01-01
Questionnaires were given to 392 elementary school teachers to examine help-seeking or help-avoidance in dealing with classroom behavioral problems. Scale validity was examined through a series of exploratory and confirmatory factor analyses. Using a series of multivariate regression analyses and structural equation modeling, we identified…
An Overview of Longitudinal Data Analysis Methods for Neurological Research
Locascio, Joseph J.; Atri, Alireza
2011-01-01
The purpose of this article is to provide a concise, broad and readily accessible overview of longitudinal data analysis methods, aimed to be a practical guide for clinical investigators in neurology. In general, we advise that older, traditional methods, including (1) simple regression of the dependent variable on a time measure, (2) analyzing a single summary subject level number that indexes changes for each subject and (3) a general linear model approach with a fixed-subject effect, should be reserved for quick, simple or preliminary analyses. We advocate the general use of mixed-random and fixed-effect regression models for analyses of most longitudinal clinical studies. Under restrictive situations or to provide validation, we recommend: (1) repeated-measure analysis of covariance (ANCOVA), (2) ANCOVA for two time points, (3) generalized estimating equations and (4) latent growth curve/structural equation models. PMID:22203825
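A hedged sketch of the mixed (random intercept and slope) regression model the overview recommends for most longitudinal clinical studies; the subjects, visits, outcome and variable names below are simulated and invented rather than drawn from any neurological dataset.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Mixed-effects model with a random intercept and random slope per subject.
rng = np.random.default_rng(5)
n_subj, n_visits = 40, 5
df = pd.DataFrame({
    "subject": np.repeat(np.arange(n_subj), n_visits),
    "time": np.tile(np.arange(n_visits), n_subj),
})
rand_int = rng.normal(0, 1.0, n_subj)[df["subject"]]
rand_slope = rng.normal(0, 0.3, n_subj)[df["subject"]]
df["score"] = 50 + rand_int + (-0.8 + rand_slope) * df["time"] + rng.normal(0, 1.0, len(df))

model = smf.mixedlm("score ~ time", df, groups=df["subject"], re_formula="~time")
print(model.fit().summary())
```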
ERIC Educational Resources Information Center
Woolley, Kristin K.
Many researchers are unfamiliar with suppressor variables and how they operate in multiple regression analyses. This paper describes the role suppressor variables play in a multiple regression model and provides practical examples that explain how they can change research results. A variable that when added as another predictor increases the total…
Erosion and soil displacement related to timber harvesting in northwestern California, U.S.A.
R.M. Rice; D.J. Furbish
1984-01-01
The relationship between measures of site disturbance and erosion resulting from timber harvest was studied by regression analyses. None of the 12 regression models developed and tested yielded a coefficient of determination (R2) greater than 0.60. The results indicated that the poor fits to the data were due, in part, to unexplained qualitative...
"Erosion and soil displacement related to timber harvesting in northwestern California, U.S.A."
R. M. Rice; D. J. Furbish
1984-01-01
The relationship between measures of site disturbance and erosion resulting from timber harvest was studied by regression analyses. None of the 12 regression models developed and tested yielded a coefficient of determination (R 2) greater than 0.60. The results indicated that the poor fits to the data were due, in part, to unexplained qualitative differences in...
Neuropsychological tests for predicting cognitive decline in older adults
Baerresen, Kimberly M; Miller, Karen J; Hanson, Eric R; Miller, Justin S; Dye, Richelin V; Hartman, Richard E; Vermeersch, David; Small, Gary W
2015-01-01
Aim: To determine neuropsychological tests likely to predict cognitive decline. Methods: A sample of nonconverters (n = 106) was compared with those who declined in cognitive status (n = 24). Significant univariate logistic regression prediction models were used to create multivariate logistic regression models to predict decline based on initial neuropsychological testing. Results: Rey–Osterrieth Complex Figure Test (RCFT) Retention predicted conversion to mild cognitive impairment (MCI), while baseline Buschke Delay predicted conversion to Alzheimer’s disease (AD). Due to group sample size differences, additional analyses were conducted using a subsample of demographically matched nonconverters. These analyses indicated that RCFT Retention predicted conversion to MCI and AD, and Buschke Delay predicted conversion to AD. Conclusion: Results suggest RCFT Retention and Buschke Delay may be useful in predicting cognitive decline. PMID:26107318
NASA Astrophysics Data System (ADS)
Masselot, Pierre; Chebana, Fateh; Bélanger, Diane; St-Hilaire, André; Abdous, Belkacem; Gosselin, Pierre; Ouarda, Taha B. M. J.
2018-01-01
In a number of environmental studies, relationships between natural processes are often assessed through regression analyses using time series data. Such data are often multi-scale and non-stationary, leading to poor accuracy of the resulting regression models and therefore to results of moderate reliability. To deal with this issue, the present paper introduces the EMD-regression methodology, which consists in applying the empirical mode decomposition (EMD) algorithm to the data series and then using the resulting components in regression models. The proposed methodology presents a number of advantages. First, it accounts for the non-stationarity of the data series. Second, it acts as a scan of the relationship between a response variable and the predictors at different time scales, providing new insights about this relationship. To illustrate the proposed methodology, it is applied to study the relationship between weather and cardiovascular mortality in Montreal, Canada. The results provide new knowledge concerning the studied relationship. For instance, they show that humidity can cause excess mortality at the monthly time scale, a scale not visible in classical models. A comparison is also conducted with state-of-the-art methods, namely generalized additive models and distributed lag models, both widely used in weather-related health studies. The comparison shows that EMD-regression achieves better prediction performance and provides more detail than classical models concerning the relationship.
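A hedged sketch of the EMD-regression idea described above, assuming the third-party PyEMD package (distributed on PyPI as "EMD-signal") for the decomposition: the exposure series is split into intrinsic mode functions (IMFs), and the response is then regressed on the IMFs so that each time scale receives its own coefficient. All series below are simulated; only the 30-day component truly drives the response.

```python
import numpy as np
import statsmodels.api as sm
from PyEMD import EMD   # assumed third-party package providing empirical mode decomposition

# EMD-regression sketch on simulated exposure and outcome series.
rng = np.random.default_rng(6)
t = np.arange(1000)
exposure = np.sin(2 * np.pi * t / 365) + 0.3 * np.sin(2 * np.pi * t / 30) + rng.normal(0, 0.2, t.size)
outcome = 5 + 0.8 * np.sin(2 * np.pi * t / 30) + rng.normal(0, 0.5, t.size)

imfs = EMD().emd(exposure)                 # rows = IMFs, ordered fine to coarse
X = sm.add_constant(imfs.T)                # one column per time scale
fit = sm.OLS(outcome, X).fit()
print(fit.params)                          # scale-specific regression coefficients
```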
Multilevel covariance regression with correlated random effects in the mean and variance structure.
Quintero, Adrian; Lesaffre, Emmanuel
2017-09-01
Multivariate regression methods generally assume a constant covariance matrix for the observations. When a heteroscedastic model is needed, the parametric and nonparametric covariance regression approaches available in the literature can be restrictive. We propose a multilevel regression model for the mean and covariance structure, including random intercepts in both components and allowing for correlation between them. The implied conditional covariance function can differ across clusters as a result of the random effect in the variance structure. In addition, allowing for correlation between the random intercepts in the mean and covariance makes the model convenient for skewed response distributions. Furthermore, it permits us to analyse directly the relation between the mean response level and the variability in each cluster. Parameter estimation is carried out via Gibbs sampling. We compare the performance of our model to other covariance modelling approaches in a simulation study. Finally, the proposed model is applied to the RN4CAST dataset to identify the variables that impact burnout of nurses in Belgium. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Ohno, Yoshiharu; Fujisawa, Yasuko; Takenaka, Daisuke; Kaminaga, Shigeo; Seki, Shinichiro; Sugihara, Naoki; Yoshikawa, Takeshi
2018-02-01
The objective of this study was to compare the capability of xenon-enhanced area-detector CT (ADCT) performed with a subtraction technique and coregistered 81mKr-ventilation SPECT/CT for the assessment of pulmonary functional loss and disease severity in smokers. Forty-six consecutive smokers (32 men and 14 women; mean age, 67.0 years) underwent prospective unenhanced and xenon-enhanced ADCT, 81mKr-ventilation SPECT/CT, and pulmonary function tests. Disease severity was evaluated according to the Global Initiative for Chronic Obstructive Lung Disease (GOLD) classification. CT-based functional lung volume (FLV), the percentage of wall area to total airway area (WA%), and ventilated FLV on xenon-enhanced ADCT and SPECT/CT were calculated for each smoker. All indexes were correlated with percentage of forced expiratory volume in 1 second (%FEV1) using step-wise regression analyses, and univariate and multivariate logistic regression analyses were performed. In addition, the diagnostic accuracy of the proposed model was compared with that of each radiologic index by means of McNemar analysis. Multivariate logistic regression showed that %FEV1 was significantly affected (r = 0.77, r2 = 0.59) by two factors: the first factor, ventilated FLV on xenon-enhanced ADCT (p < 0.0001); and the second factor, WA% (p = 0.004). Univariate logistic regression analyses indicated that all indexes significantly affected GOLD classification (p < 0.05). Multivariate logistic regression analyses revealed that ventilated FLV on xenon-enhanced ADCT and CT-based FLV significantly influenced GOLD classification (p < 0.0001). The diagnostic accuracy of the proposed model was significantly higher than that of ventilated FLV on SPECT/CT (p = 0.03) and WA% (p = 0.008). Xenon-enhanced ADCT is more effective than 81mKr-ventilation SPECT/CT for the assessment of pulmonary functional loss and disease severity.
A Regression Framework for Effect Size Assessments in Longitudinal Modeling of Group Differences
Feingold, Alan
2013-01-01
The use of growth modeling analysis (GMA)--particularly multilevel analysis and latent growth modeling--to test the significance of intervention effects has increased exponentially in prevention science, clinical psychology, and psychiatry over the past 15 years. Model-based effect sizes for differences in means between two independent groups in GMA can be expressed in the same metric (Cohen’s d) commonly used in classical analysis and meta-analysis. This article first reviews conceptual issues regarding calculation of d for findings from GMA and then introduces an integrative framework for effect size assessments that subsumes GMA. The new approach uses the structure of the linear regression model, from which effect sizes for findings from diverse cross-sectional and longitudinal analyses can be calculated with familiar statistics, such as the regression coefficient, the standard deviation of the dependent measure, and study duration. PMID:23956615
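One commonly cited form of this model-based effect size, written here from memory of the framework rather than quoted from the article (so the exact notation and any scaling conventions are an assumption), combines the GMA group-by-time coefficient, the study duration, and the raw standard deviation of the dependent measure:

```latex
d \;=\; \frac{\beta_{\text{group} \times \text{time}} \times \text{duration}}{SD_{\text{raw}}}
```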
Friedrich, Nele; Schneider, Harald J; Spielhagen, Christin; Markus, Marcello Ricardo Paulista; Haring, Robin; Grabe, Hans J; Buchfelder, Michael; Wallaschofski, Henri; Nauck, Matthias
2011-10-01
Prolactin (PRL) is involved in immune regulation and may contribute to an atherogenic phenotype. Previous results on the association of PRL with inflammatory biomarkers have been conflicting and limited by small patient studies. Therefore, we used data from a large population-based sample to assess the cross-sectional associations between serum PRL concentration and high-sensitivity C-reactive protein (hsCRP), fibrinogen, interleukin-6 (IL-6), and white blood cell (WBC) count. From the population-based Study of Health in Pomerania (SHIP), a total of 3744 subjects were available for the present analyses. PRL and inflammatory biomarkers were measured. Linear and logistic regression models adjusted for age, sex, body-mass-index, total cholesterol and glucose were analysed. Multivariable linear regression models revealed a positive association of PRL with WBC. Multivariable logistic regression analyses showed a significant association of PRL with increased IL-6 in non-smokers [highest vs lowest quintile: odds ratio 1·69 (95% confidence interval 1·10-2·58), P = 0·02] and smokers [OR 2·06 (95%-CI 1·10-3·89), P = 0·02]. Similar results were found for WBC in non-smokers [highest vs lowest quintile: OR 2·09 (95%-CI 1·21-3·61), P = 0·01)] but not in smokers. Linear and logistic regression analyses revealed no significant associations of PRL with hsCRP or fibrinogen. Serum PRL concentrations are associated with inflammatory biomarkers including IL-6 and WBC, but not hsCRP or fibrinogen. The suggested role of PRL in inflammation needs further investigation in future prospective studies. © 2011 Blackwell Publishing Ltd.
Pega, Frank; Blakely, Tony; Glymour, M Maria; Carter, Kristie N; Kawachi, Ichiro
2016-02-15
In previous studies, researchers estimated short-term relationships between financial credits and health outcomes using conventional regression analyses, but they did not account for time-varying confounders affected by prior treatment (CAPTs) or the credits' cumulative impacts over time. In this study, we examined the association between total number of years of receiving New Zealand's Family Tax Credit (FTC) and self-rated health (SRH) in 6,900 working-age parents using 7 waves of New Zealand longitudinal data (2002-2009). We conducted conventional linear regression analyses, both unadjusted and adjusted for time-invariant and time-varying confounders measured at baseline, and fitted marginal structural models (MSMs) that more fully adjusted for confounders, including CAPTs. Of all participants, 5.1%-6.8% received the FTC for 1-3 years and 1.8%-3.6% for 4-7 years. In unadjusted and adjusted conventional regression analyses, each additional year of receiving the FTC was associated with 0.033 (95% confidence interval (CI): -0.047, -0.019) and 0.026 (95% CI: -0.041, -0.010) units worse SRH (on a 5-unit scale). In the MSMs, the average causal treatment effect also reflected a small decrease in SRH (unstabilized weights: β = -0.039 unit, 95% CI: -0.058, -0.020; stabilized weights: β = -0.031 unit, 95% CI: -0.050, -0.007). Cumulatively receiving the FTC marginally reduced SRH. Conventional regression analyses and MSMs produced similar estimates, suggesting little bias from CAPTs. © The Author 2016. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Bhamidipati, Ravi Kanth; Syed, Muzeeb; Mullangi, Ramesh; Srinivas, Nuggehally
2018-02-01
1. Dalbavancin, a lipoglycopeptide, is approved for treating gram-positive bacterial infections. The area under the plasma concentration versus time curve (AUCinf) of dalbavancin is a key parameter, and the AUCinf/MIC ratio is a critical pharmacodynamic marker. 2. Using the end-of-intravenous-infusion concentration (i.e. Cmax), the Cmax versus AUCinf relationship for dalbavancin was established by regression analyses (i.e. linear, log-log, log-linear and power models) using 21 pairs of subject data. 3. Predictions of AUCinf were performed by applying the regression equations to published Cmax data. The quotient of observed/predicted values rendered the fold difference. The mean absolute error (MAE)/root mean square error (RMSE) and correlation coefficient (r) were used in the assessment. 4. MAE and RMSE values for the various models were comparable. Cmax versus AUCinf exhibited excellent correlation (r > 0.9488). The internal data evaluation showed narrow confinement (0.84-1.14-fold difference) with an RMSE < 10.3%. The external data evaluation showed that the models predicted AUCinf with an RMSE of 3.02-27.46%, with the fold difference largely contained within 0.64-1.48. 5. Regardless of the regression model, a single-time-point strategy using Cmax (i.e. end of 30-min infusion) is amenable as a prospective tool for predicting AUCinf of dalbavancin in patients.
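A hedged sketch of the single-time-point strategy described above: fit linear and power (log-log) regressions of AUCinf on Cmax and express prediction error as observed/predicted fold differences. The 21 PK values below are simulated with an invented relationship; they are not the study data.

```python
import numpy as np
import statsmodels.api as sm

# Linear and power (log-log) regressions of AUCinf on Cmax, with fold differences.
rng = np.random.default_rng(7)
cmax = rng.uniform(200, 400, 21)
auc = 25 * cmax**1.05 * np.exp(rng.normal(0, 0.05, 21))

lin = sm.OLS(auc, sm.add_constant(cmax)).fit()
pwr = sm.OLS(np.log(auc), sm.add_constant(np.log(cmax))).fit()

fold_lin = auc / lin.predict(sm.add_constant(cmax))
fold_pwr = auc / np.exp(pwr.predict(sm.add_constant(np.log(cmax))))
print("fold-difference range, linear model:", fold_lin.min().round(2), "-", fold_lin.max().round(2))
print("fold-difference range, power model: ", fold_pwr.min().round(2), "-", fold_pwr.max().round(2))
```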
Lee, Chia Ee; Vincent-Chong, Vui King; Ramanathan, Anand; Kallarakkal, Thomas George; Karen-Ng, Lee Peng; Ghani, Wan Maria Nabillah; Rahman, Zainal Ariff Abdul; Ismail, Siti Mazlipah; Abraham, Mannil Thomas; Tay, Keng Kiong; Mustafa, Wan Mahadzir Wan; Cheong, Sok Ching; Zain, Rosnah Binti
2015-01-01
BACKGROUND: Collagen Triple Helix Repeat Containing 1 (CTHRC1) is a protein often found to be over-expressed in various types of human cancers. However, the correlation between CTHRC1 expression level and clinico-pathological characteristics and prognosis in oral cancer remains unclear. Therefore, this study aimed to determine the mRNA and protein expression of CTHRC1 in oral squamous cell carcinoma (OSCC) and to evaluate the clinical and prognostic impact of CTHRC1 in OSCC. METHODS: In this study, mRNA and protein expression of CTHRC1 in OSCCs were determined by quantitative PCR and immunohistochemistry, respectively. The associations between CTHRC1 and clinico-pathological parameters were evaluated by univariate and multivariate binary logistic regression analyses. Correlations between CTHRC1 protein expression and survival were analysed using Kaplan-Meier and Cox regression models. RESULTS: The current study demonstrated that CTHRC1 was significantly overexpressed at the mRNA level in OSCC. Univariate analyses indicated that high expression of CTHRC1 was significantly associated with advanced pTNM stage, tumour size ≥ 4 cm and positive lymph node metastasis (LNM). However, only positive LNM remained significant after adjusting for other confounding factors in multivariate logistic regression analyses. Kaplan-Meier survival analyses and the Cox model demonstrated that high expression of CTHRC1 protein was associated with poor prognosis and was an independent prognostic factor in OSCC. CONCLUSION: This study indicates that over-expression of CTHRC1 is potentially an independent predictor of positive LNM and poor prognosis in OSCC. PMID:26664254
Quantile regression for the statistical analysis of immunological data with many non-detects.
Eilers, Paul H C; Röder, Esther; Savelkoul, Huub F J; van Wijk, Roy Gerth
2012-07-07
Immunological parameters are hard to measure. A well-known problem is the occurrence of values below the detection limit, the non-detects. Non-detects are a nuisance, because classical statistical analyses, like ANOVA and regression, cannot be applied. The more advanced statistical techniques currently available for the analysis of datasets with non-detects can only be used if a small percentage of the data are non-detects. Quantile regression, a generalization of percentiles to regression models, models the median or higher percentiles and tolerates very high numbers of non-detects. We present a non-technical introduction and illustrate it with an implementation to real data from a clinical trial. We show that by using quantile regression, groups can be compared and that meaningful linear trends can be computed, even if more than half of the data consists of non-detects. Quantile regression is a valuable addition to the statistical methods that can be used for the analysis of immunological datasets with non-detects.
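A hedged sketch of quantile regression in the presence of many non-detects; the biomarker, groups and detection limit below are simulated and invented, not the clinical-trial data. Because the median (or a higher percentile) depends only on the ordering of values around the chosen quantile, the fit tolerates a large share of observations fixed at the detection limit.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Quantile regression of a censored biomarker on treatment group.
rng = np.random.default_rng(8)
n = 200
group = rng.integers(0, 2, n)                                # 0 = control, 1 = treated
true_conc = np.exp(1.0 + 0.7 * group + rng.normal(0, 1, n))  # biomarker concentration
lod = 2.0                                                    # assumed detection limit
observed = np.where(true_conc < lod, lod, true_conc)         # non-detects set to the limit
df = pd.DataFrame({"conc": observed, "group": group})

for q in (0.5, 0.75):
    fit = smf.quantreg("conc ~ group", df).fit(q=q)
    print(f"q = {q}: group effect = {fit.params['group']:.2f}")
```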
Independent contrasts and PGLS regression estimators are equivalent.
Blomberg, Simon P; Lefevre, James G; Wells, Jessie A; Waterhouse, Mary
2012-05-01
We prove that the slope parameter of the ordinary least squares regression of phylogenetically independent contrasts (PICs) conducted through the origin is identical to the slope parameter of the method of generalized least squares (GLSs) regression under a Brownian motion model of evolution. This equivalence has several implications: 1. Understanding the structure of the linear model for GLS regression provides insight into when and why phylogeny is important in comparative studies. 2. The limitations of the PIC regression analysis are the same as the limitations of the GLS model. In particular, phylogenetic covariance applies only to the response variable in the regression and the explanatory variable should be regarded as fixed. Calculation of PICs for explanatory variables should be treated as a mathematical idiosyncrasy of the PIC regression algorithm. 3. Since the GLS estimator is the best linear unbiased estimator (BLUE), the slope parameter estimated using PICs is also BLUE. 4. If the slope is estimated using different branch lengths for the explanatory and response variables in the PIC algorithm, the estimator is no longer the BLUE, so this is not recommended. Finally, we discuss whether or not and how to accommodate phylogenetic covariance in regression analyses, particularly in relation to the problem of phylogenetic uncertainty. This discussion is from both frequentist and Bayesian perspectives.
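For readers who want the estimator in symbols, one standard way to write the GLS slope under a Brownian-motion covariance matrix C (notation chosen here for illustration, not taken from the paper) is given below; the equivalence stated above is that ordinary least squares regression of the contrasts through the origin recovers the same estimate.

```latex
\hat{\beta}_{\mathrm{GLS}} \;=\; \left( X^{\top} C^{-1} X \right)^{-1} X^{\top} C^{-1} \mathbf{y}
```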
Retention of community college students in online courses
NASA Astrophysics Data System (ADS)
Krajewski, Sarah
The issue of attrition in online courses at higher learning institutions remains a high priority in the United States. A recent rapid growth of online courses at community colleges has been instigated by student demand, as they meet the time constraints many nontraditional community college students have as a result of the need to work and care for dependents. Failure in an online course can cause students to become frustrated with the college experience, financially burdened, or to even give up and leave college. Attrition could be avoided by proper guidance of who is best suited for online courses. This study examined factors related to retention (i.e., course completion) and success (i.e., receiving a C or better) in an online biology course at a community college in the Midwest by operationalizing student characteristics (age, race, gender), student skills (whether or not the student met the criteria to be placed in an AFP course), and external factors (Pell recipient, full/part time status, first term) from the persistence model developed by Rovai. Internal factors from this model were not included in this study. Both univariate analyses and multivariate logistic regression were used to analyze the variables. Results suggest that race and Pell recipient were both predictive of course completion on univariate analyses. However, multivariate analyses showed that age, race, academic load and first term were predictive of completion and Pell recipient was no longer predictive. The univariate results for the C or better showed that age, race, Pell recipient, academic load, and meeting AFP criteria were predictive of success. Multivariate analyses showed that only age, race, and Pell recipient were significant predictors of success. Both regression models explained very little (<15%) of the variability within the outcome variables of retention and success. Therefore, although significant predictors were identified for course completion and retention, there are still many factors that remain unaccounted for in both regression models. Further research into the operationalization of Rovai's model, including internal factors, to predict completion and success is necessary.
NASA Astrophysics Data System (ADS)
Madhu, B.; Ashok, N. C.; Balasubramanian, S.
2014-11-01
Multinomial logistic regression analysis was used to develop a statistical model that can predict the probability of breast cancer in Southern Karnataka using breast cancer occurrence data from 2007-2011. Independent socio-economic variables describing breast cancer occurrence, such as age, education, occupation, parity, type of family, health insurance coverage, residential locality and socio-economic status, were obtained for each case. The models were developed as follows: i) spatial visualization of the urban-rural distribution of breast cancer cases obtained from the Bharat Hospital and Institute of Oncology; ii) socio-economic risk factors describing the breast cancer occurrences were compiled for each case and analysed using multinomial logistic regression in the SPSS statistical software, the relations between breast cancer occurrence across socio-economic status and the influence of other socio-economic variables were evaluated, and multinomial logistic regression models were constructed; iii) the model that best predicted the occurrence of breast cancer was identified. This multivariate logistic regression model was entered into a geographic information system, and maps showing the predicted probability of breast cancer occurrence in Southern Karnataka were created. This study demonstrates that multinomial logistic regression is a valuable tool for developing models that predict the probability of breast cancer occurrence in Southern Karnataka.
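A hedged sketch of a multinomial logistic model of the kind described above; the covariates, codings and the three-category outcome are all simulated and the variable names are invented, not taken from the study.

```python
import numpy as np
import statsmodels.api as sm

# Multinomial logit on simulated socio-economic covariates.
rng = np.random.default_rng(9)
n = 500
age = rng.integers(25, 75, n)
parity = rng.integers(0, 5, n)
ses = rng.integers(0, 3, n)                      # assumed coding: 0 = low, 1 = middle, 2 = high
X = sm.add_constant(np.column_stack([age, parity, ses]).astype(float))
outcome = rng.integers(0, 3, n)                  # e.g. three occurrence categories

fit = sm.MNLogit(outcome, X).fit(disp=0)
print(fit.predict(X[:5]))                        # one predicted-probability column per category
```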
NASA Astrophysics Data System (ADS)
de Oliveira, Isadora R. N.; Roque, Jussara V.; Maia, Mariza P.; Stringheta, Paulo C.; Teófilo, Reinaldo F.
2018-04-01
A new method was developed to determine the antioxidant properties of red cabbage extract (Brassica oleracea) by mid (MID) and near (NIR) infrared spectroscopies and partial least squares (PLS) regression. A 70% (v/v) ethanolic extract of red cabbage was concentrated to 9° Brix and further diluted (12 to 100%) in water. The dilutions were used as external standards for the building of the PLS models. For the first time, this strategy was applied to building multivariate regression models. Reference analyses and spectral data were obtained from the diluted extracts. The properties determined were total and monomeric anthocyanins, total polyphenols, and antioxidant capacity by the ABTS (2,2-azino-bis(3-ethyl-benzothiazoline-6-sulfonate)) and DPPH (2,2-diphenyl-1-picrylhydrazyl) methods. Ordered predictors selection (OPS) and a genetic algorithm (GA) were used for feature selection before PLS regression (PLS-1). In addition, a PLS-2 regression was applied to all properties simultaneously. PLS-1 models were more predictive than PLS-2 regression models. PLS-OPS and PLS-GA models presented excellent prediction results, with correlation coefficients higher than 0.98. The best models, however, were obtained using PLS with variable selection by the OPS algorithm, and the models based on NIR spectra were considered the most predictive for all properties. These models thus provide a simple, rapid and accurate method for determining the antioxidant properties of red cabbage extract and assessing its suitability for use in the food industry.
Perry, Charles A.; Wolock, David M.; Artman, Joshua C.
2004-01-01
Streamflow statistics of flow duration and peak-discharge frequency were estimated for 4,771 individual locations on streams listed on the 1999 Kansas Surface Water Register. These statistics included the flow-duration values of 90, 75, 50, 25, and 10 percent, as well as the mean flow value. Peak-discharge frequency values were estimated for the 2-, 5-, 10-, 25-, 50-, and 100-year floods. Least-squares multiple regression techniques were used, along with Tobit analyses, to develop equations for estimating flow-duration values of 90, 75, 50, 25, and 10 percent and the mean flow for uncontrolled flow stream locations. The contributing-drainage areas of 149 U.S. Geological Survey streamflow-gaging stations in Kansas and parts of surrounding States that had flow uncontrolled by Federal reservoirs and used in the regression analyses ranged from 2.06 to 12,004 square miles. Logarithmic transformations of climatic and basin data were performed to yield the best linear relation for developing equations to compute flow durations and mean flow. In the regression analyses, the significant climatic and basin characteristics, in order of importance, were contributing-drainage area, mean annual precipitation, mean basin permeability, and mean basin slope. The analyses yielded a model standard error of prediction range of 0.43 logarithmic units for the 90-percent duration analysis to 0.15 logarithmic units for the 10-percent duration analysis. The model standard error of prediction was 0.14 logarithmic units for the mean flow. Regression equations used to estimate peak-discharge frequency values were obtained from a previous report, and estimates for the 2-, 5-, 10-, 25-, 50-, and 100-year floods were determined for this report. The regression equations and an interpolation procedure were used to compute flow durations, mean flow, and estimates of peak-discharge frequency for locations along uncontrolled flow streams on the 1999 Kansas Surface Water Register. Flow durations, mean flow, and peak-discharge frequency values determined at available gaging stations were used to interpolate the regression-estimated flows for the stream locations where available. Streamflow statistics for locations that had uncontrolled flow were interpolated using data from gaging stations weighted according to the drainage area and the bias between the regression-estimated and gaged flow information. On controlled reaches of Kansas streams, the streamflow statistics were interpolated between gaging stations using only gaged data weighted by drainage area.
2014-01-01
Background: Meta-regression is becoming increasingly used to model study-level covariate effects. However, this type of statistical analysis presents many difficulties and challenges. Here two methods for calculating confidence intervals for the magnitude of the residual between-study variance in random effects meta-regression models are developed. A further suggestion for calculating credible intervals using informative prior distributions for the residual between-study variance is presented. Methods: Two recently proposed and, under the assumptions of the random effects model, exact methods for constructing confidence intervals for the between-study variance in random effects meta-analyses are extended to the meta-regression setting. The use of Generalised Cochran heterogeneity statistics is extended to the meta-regression setting and a Newton-Raphson procedure is developed to implement the Q profile method for meta-analysis and meta-regression. WinBUGS is used to implement informative priors for the residual between-study variance in the context of Bayesian meta-regressions. Results: Results are obtained for two contrasting examples, where the first example involves a binary covariate and the second involves a continuous covariate. Intervals for the residual between-study variance are wide for both examples. Conclusions: Statistical methods, and R computer software, are available to compute exact confidence intervals for the residual between-study variance under the random effects model for meta-regression. These frequentist methods are almost as easily implemented as their established counterparts for meta-analysis. Bayesian meta-regressions are also easily performed by analysts who are comfortable using WinBUGS. Estimates of the residual between-study variance in random effects meta-regressions should be routinely reported and accompanied by some measure of their uncertainty. Confidence and/or credible intervals are well-suited to this purpose. PMID:25196829
Tzavidis, Nikos; Salvati, Nicola; Schmid, Timo; Flouri, Eirini; Midouhas, Emily
2016-02-01
Multilevel modelling is a popular approach for longitudinal data analysis. Statistical models conventionally target a parameter at the centre of a distribution. However, when the distribution of the data is asymmetric, modelling other location parameters, e.g. percentiles, may be more informative. We present a new approach, M-quantile random-effects regression, for modelling multilevel data. The proposed method is used for modelling location parameters of the distribution of the strengths and difficulties questionnaire scores of children in England who participate in the Millennium Cohort Study. Quantile mixed models are also considered. The analyses offer insights to child psychologists about the differential effects of risk factors on children's outcomes.
Scale of association: hierarchical linear models and the measurement of ecological systems
Sean M. McMahon; Jeffrey M. Diez
2007-01-01
A fundamental challenge to understanding patterns in ecological systems lies in employing methods that can analyse, test and draw inference from measured associations between variables across scales. Hierarchical linear models (HLM) use advanced estimation algorithms to measure regression relationships and variance-covariance parameters in hierarchically structured...
Most analyses of daily time series epidemiology data relate mortality or morbidity counts to PM and other air pollutants by means of single-outcome regression models using multiple predictors, without taking into account the complex statistical structure of the predictor variable...
Using Generalized Additive Models to Analyze Single-Case Designs
ERIC Educational Resources Information Center
Shadish, William; Sullivan, Kristynn
2013-01-01
Many analyses for single-case designs (SCDs)--including nearly all the effect size indicators-- currently assume no trend in the data. Regression and multilevel models allow for trend, but usually test only linear trend and have no principled way of knowing if higher order trends should be represented in the model. This paper shows how Generalized…
Karabatsos, George
2017-02-01
Most of applied statistics involves regression analysis of data. In practice, it is important to specify a regression model that has minimal assumptions which are not violated by data, to ensure that statistical inferences from the model are informative and not misleading. This paper presents a stand-alone and menu-driven software package, Bayesian Regression: Nonparametric and Parametric Models, constructed from MATLAB Compiler. Currently, this package gives the user a choice from 83 Bayesian models for data analysis. They include 47 Bayesian nonparametric (BNP) infinite-mixture regression models; 5 BNP infinite-mixture models for density estimation; and 31 normal random effects models (HLMs), including normal linear models. Each of the 78 regression models handles either a continuous, binary, or ordinal dependent variable, and can handle multi-level (grouped) data. All 83 Bayesian models can handle the analysis of weighted observations (e.g., for meta-analysis), and the analysis of left-censored, right-censored, and/or interval-censored data. Each BNP infinite-mixture model has a mixture distribution assigned one of various BNP prior distributions, including priors defined by either the Dirichlet process, Pitman-Yor process (including the normalized stable process), beta (two-parameter) process, normalized inverse-Gaussian process, geometric weights prior, dependent Dirichlet process, or the dependent infinite-probits prior. The software user can mouse-click to select a Bayesian model and perform data analysis via Markov chain Monte Carlo (MCMC) sampling. After the sampling completes, the software automatically opens text output that reports MCMC-based estimates of the model's posterior distribution and model predictive fit to the data. Additional text and/or graphical output can be generated by mouse-clicking other menu options. This includes output of MCMC convergence analyses, and estimates of the model's posterior predictive distribution, for selected functionals and values of covariates. The software is illustrated through the BNP regression analysis of real data.
Analyses of non-fatal accidents in an opencast mine by logistic regression model - a case study.
Onder, Seyhan; Mutlu, Mert
2017-09-01
Accidents cause major damage to both workers and enterprises in the mining industry. To reduce the number of occupational accidents, these incidents should be properly registered and carefully analysed. This study examines the Aegean Lignite Enterprise (ELI) of Turkish Coal Enterprises (TKI) in Soma; opencast coal mine occupational accident records from 2006 to 2011 were used for the statistical analyses. A total of 231 occupational accidents were analysed for this study. The accident records were categorized into seven groups: area, reason, occupation, part of body, age, shift hour and lost days. The SPSS package program was used in this study for logistic regression analyses, which predicted the probability of accidents resulting in greater or less than 3 lost workdays for non-fatal injuries. Social facilities-area of surface installations, workshops and opencast mining areas are the areas with the highest probability for accidents with greater than 3 lost workdays for non-fatal injuries, while the reasons with the highest probability for these types of accidents are transporting and manual handling. Additionally, the model was tested on accidents reported in 2012 for the ELI in Soma and correctly estimated the probability of accidents with lost workdays in 70% of cases.
Empirical analyses of plant-climate relationships for the western United States
Gerald E. Rehfeldt; Nicholas L. Crookston; Marcus V. Warwell; Jeffrey S. Evans
2006-01-01
The Random Forests multiple-regression tree was used to model climate profiles of 25 biotic communities of the western United States and nine of their constituent species. Analyses of the communities were based on a gridded sample of ca. 140,000 points, while those for the species used presence-absence data from ca. 120,000 locations. Independent variables included 35...
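A minimal sketch of fitting a random forest to presence-absence data, in the spirit of the climate-profile modelling summarized above (using scikit-learn rather than the original Random Forests implementation; the climate predictors and data are hypothetical).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n = 2000
# Hypothetical climate predictors (e.g. mean annual temperature, precipitation, aridity)
X = rng.normal(size=(n, 3))
presence = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1, n) > 0).astype(int)

rf = RandomForestClassifier(n_estimators=300, random_state=0)
print("cross-validated accuracy:", cross_val_score(rf, X, presence, cv=5).mean())
rf.fit(X, presence)
print("variable importances:", rf.feature_importances_)
```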
ERIC Educational Resources Information Center
Petty, Barrett Wade McCoy
2015-01-01
The study examined factors that predicted the completion of programs of study at Arkansas institutions of higher education for African American males. Astin's (1993a) Input-Environment-Output (I-E-O) Model was used as the theoretical foundation. Descriptive analyses and hierarchical logistic regression analyses were performed on the data. The…
Drivers willingness to pay progressive rate for street parking.
DOT National Transportation Integrated Search
2015-01-01
This study finds willingness to pay and price elasticity for on-street parking demand using stated preference data obtained from 238 respondents. Descriptive, statistical and economic analyses including regression, generalized linear model, and f...
Lu, Lee-Jane W.; Nishino, Thomas K.; Khamapirad, Tuenchit; Grady, James J; Leonard, Morton H.; Brunder, Donald G.
2009-01-01
Breast density (the percentage of fibroglandular tissue in the breast) has been suggested to be a useful surrogate marker for breast cancer risk. It is conventionally measured using screen-film mammographic images by a labor-intensive histogram segmentation method (HSM). We have adapted and modified the HSM for measuring breast density from raw digital mammograms acquired by full-field digital mammography. Multiple regression model analyses showed that many of the instrument parameters for acquiring the screening mammograms (e.g. breast compression thickness, radiological thickness, radiation dose, compression force, etc.) and image pixel intensity statistics of the imaged breasts were strong predictors of the observed threshold values (model R2=0.93) and %-density (R2=0.84). The intra-class correlation coefficient of the %-density for duplicate images was estimated to be 0.80, using the regression model-derived threshold values, and 0.94 if estimated directly from the parameter estimates of the %-density prediction regression model. Therefore, with additional research, these mathematical models could be used to compute breast density objectively and automatically, bypassing the HSM step, and could greatly facilitate breast cancer research studies.
Piao, Hui-Hong; He, Jiajia; Zhang, Keqin; Tang, Zihui
2015-01-01
Our research aims to investigate the associations between education level and osteoporosis (OP) in Chinese postmenopausal women. A large-scale, community-based, cross-sectional study was conducted to examine the associations between education level and OP. A self-reported questionnaire was used to collect the demographic information and medical history of the participants. A total of 1905 postmenopausal women were available for data analysis in this study. Multiple regression models including education level and controlling for confounding factors were performed to investigate the relationship with OP. The prevalence of OP was 28.29% in our study sample. Multivariate linear regression analyses adjusted for relevant potential confounding factors detected significant associations between education level and T-score (β = 0.025, P-value = 0.095, 95% CI: -0.004-0.055 for model 1; and β = 0.092, P-value = 0.032, 95% CI: 0.008-0.175 for model 2). Multivariate logistic regression analyses detected significant associations between education level and OP in model 1 (P-value = 0.070 for model 1, Table 5), while no significant association was reported in model 2 (P-value = 0.131). In participants with high education levels, the OR for OP was 0.914 (95% CI: 0.830-1.007). The findings indicated that education level was independently and significantly associated with OP. OP was more prevalent in Chinese postmenopausal women with low educational status.
Adjusting for Confounding in Early Postlaunch Settings: Going Beyond Logistic Regression Models.
Schmidt, Amand F; Klungel, Olaf H; Groenwold, Rolf H H
2016-01-01
Postlaunch data on medical treatments can be analyzed to explore adverse events or relative effectiveness in real-life settings. These analyses are often complicated by the number of potential confounders and the possibility of model misspecification. We conducted a simulation study to compare the performance of logistic regression, propensity score, disease risk score, and stabilized inverse probability weighting methods to adjust for confounding. Model misspecification was induced in the independent derivation dataset. We evaluated performance using relative bias and confidence interval coverage of the true effect, among other metrics. At low events per coefficient (1.0 and 0.5), the logistic regression estimates had a large relative bias (greater than -100%). Bias of the disease risk score estimates was at most 13.48% and 18.83%, respectively. For the propensity score model, this was 8.74% and >100%, respectively. At events per coefficient of 1.0 and 0.5, inverse probability weighting frequently failed or reduced to a crude regression, resulting in biases of -8.49% and 24.55%. Coverage of logistic regression estimates became less than the nominal level at events per coefficient ≤5. For the disease risk score, inverse probability weighting, and propensity score, coverage became less than nominal at events per coefficient ≤2.5, ≤1.0, and ≤1.0, respectively. Bias of misspecified disease risk score models was 16.55%. In settings with low events/exposed subjects per coefficient, disease risk score methods can be useful alternatives to logistic regression models, especially when propensity score models cannot be used. Despite better performance of disease risk score methods than logistic regression and propensity score models in small events per coefficient settings, bias and coverage still deviated from nominal levels.
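A minimal sketch contrasting two of the confounding-adjustment strategies compared above: direct outcome (logistic) regression and a propensity score model followed by stabilized inverse probability weighting. The data are simulated; this is not the authors' simulation code.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 5000
confounder = rng.normal(size=n)
treated = rng.binomial(1, 1 / (1 + np.exp(-0.8 * confounder)))
event = rng.binomial(1, 1 / (1 + np.exp(-(-1.0 + 0.5 * treated + 0.7 * confounder))))

# 1) Outcome regression adjusting for the confounder directly
X_out = sm.add_constant(np.column_stack([treated, confounder]))
print(sm.Logit(event, X_out).fit(disp=0).params[1])   # conditional log-odds ratio

# 2) Propensity score model, then stabilized IPW in a weighted marginal model
ps = sm.Logit(treated, sm.add_constant(confounder)).fit(disp=0).predict(sm.add_constant(confounder))
w = np.where(treated == 1, treated.mean() / ps, (1 - treated.mean()) / (1 - ps))
X_msm = sm.add_constant(treated)
ipw_fit = sm.GLM(event, X_msm, family=sm.families.Binomial(), freq_weights=w).fit()
print(ipw_fit.params[1])                               # marginal log-odds ratio
```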
Alexeeff, Stacey E.; Schwartz, Joel; Kloog, Itai; Chudnovsky, Alexandra; Koutrakis, Petros; Coull, Brent A.
2016-01-01
Many epidemiological studies use predicted air pollution exposures as surrogates for true air pollution levels. These predicted exposures contain exposure measurement error, yet simulation studies have typically found negligible bias in resulting health effect estimates. However, previous studies typically assumed a statistical spatial model for air pollution exposure, which may be oversimplified. We address this shortcoming by assuming a realistic, complex exposure surface derived from fine-scale (1 km x 1 km) remote-sensing satellite data. Using simulation, we evaluate the accuracy of epidemiological health effect estimates in linear and logistic regression when using spatial air pollution predictions from kriging and land use regression models. We examined chronic (long-term) and acute (short-term) exposure to air pollution. Results varied substantially across different scenarios. Exposure models with low out-of-sample R2 yielded severe biases in the health effect estimates of some models, ranging from 60% upward bias to 70% downward bias. One land use regression exposure model with greater than 0.9 out-of-sample R2 yielded upward biases up to 13% for acute health effect estimates. Almost all models drastically underestimated the standard errors. Land use regression models performed better in chronic effects simulations. These results can help researchers when interpreting health effect estimates in these types of studies.
NASA Astrophysics Data System (ADS)
Mfumu Kihumba, Antoine; Ndembo Longo, Jean; Vanclooster, Marnik
2016-03-01
A multivariate statistical modelling approach was applied to explain the anthropogenic pressure of nitrate pollution on the Kinshasa groundwater body (Democratic Republic of Congo). Multiple regression and regression tree models were compared and used to identify major environmental factors that control the groundwater nitrate concentration in this region. The analyses were made in terms of physical attributes related to the topography, land use, geology and hydrogeology in the capture zone of different groundwater sampling stations. For the nitrate data, groundwater datasets from two different surveys were used. The statistical models identified the topography, the residential area, the service land (cemetery), and the surface-water land-use classes as major factors explaining nitrate occurrence in the groundwater. Also, groundwater nitrate pollution depends not on one single factor but on the combined influence of factors representing nitrogen loading sources and aquifer susceptibility characteristics. The groundwater nitrate pressure was better predicted with the regression tree model than with the multiple regression model. Furthermore, the results elucidated the sensitivity of the model performance towards the method of delineation of the capture zones. For pollution modelling at the monitoring points, therefore, it is better to identify capture-zone shapes based on a conceptual hydrogeological model rather than to adopt arbitrary circular capture zones.
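A minimal sketch of the model comparison described above: a multiple (linear) regression and a regression tree fitted to the same predictors and compared by cross-validated R-squared. The predictors, outcome and data-generating process are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
n = 800
X = rng.uniform(size=(n, 4))   # e.g. slope, residential fraction, service land, surface water
# Non-additive data-generating process tends to favour the tree
nitrate = 10 * (X[:, 0] > 0.5) * X[:, 1] + 2 * X[:, 2] + rng.normal(0, 1, n)

for name, model in [("multiple regression", LinearRegression()),
                    ("regression tree", DecisionTreeRegressor(max_depth=4, random_state=0))]:
    r2 = cross_val_score(model, X, nitrate, cv=5, scoring="r2").mean()
    print(f"{name}: mean cross-validated R^2 = {r2:.2f}")
```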
Estimating parasitic sea lamprey abundance in Lake Huron from heterogenous data sources
Young, Robert J.; Jones, Michael L.; Bence, James R.; McDonald, Rodney B.; Mullett, Katherine M.; Bergstedt, Roger A.
2003-01-01
The Great Lakes Fishery Commission uses time series of transformer, parasitic, and spawning population estimates to evaluate the effectiveness of its sea lamprey (Petromyzon marinus) control program. This study used an inverse variance weighting method to integrate Lake Huron sea lamprey population estimates derived from two estimation procedures: 1) prediction of the lake-wide spawning population from a regression model based on stream size and, 2) whole-lake mark and recapture estimates. In addition, we used a re-sampling procedure to evaluate the effect of trading off sampling effort between the regression and mark-recapture models. Population estimates derived from the regression model ranged from 132,000 to 377,000 while mark-recapture estimates of marked recently metamorphosed juveniles and parasitic sea lampreys ranged from 536,000 to 634,000 and 484,000 to 1,608,000, respectively. The precision of the estimates varied greatly among estimation procedures and years. The integrated estimate of the mark-recapture and spawner regression procedures ranged from 252,000 to 702,000 transformers. The re-sampling procedure indicated that the regression model is more sensitive to reduction in sampling effort than the mark-recapture model. Reliance on either the regression or mark-recapture model alone could produce misleading estimates of abundance of sea lampreys and the effect of the control program on sea lamprey abundance. These analyses indicate that the precision of the lakewide population estimate can be maximized by re-allocating sampling effort from marking sea lampreys to trapping additional streams.
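A minimal worked example of the inverse variance weighting used above to integrate two abundance estimates (regression-based and mark-recapture) into one estimate. The numbers are illustrative, not the study's data.

```python
import numpy as np

estimates = np.array([300_000.0, 550_000.0])   # regression and mark-recapture estimates
variances = np.array([4.0e9, 9.0e9])           # their sampling variances (hypothetical)

weights = 1.0 / variances
combined = np.sum(weights * estimates) / np.sum(weights)
combined_var = 1.0 / np.sum(weights)
print(f"integrated estimate: {combined:,.0f}  (SE {np.sqrt(combined_var):,.0f})")
```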
SOCR Analyses – an Instructional Java Web-based Statistical Analysis Toolkit
Chu, Annie; Cui, Jenny; Dinov, Ivo D.
2011-01-01
The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses, such as linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as the t-test in the parametric category, and the Wilcoxon rank-sum test, Kruskal-Wallis test, and Friedman's test in the non-parametric category. SOCR Analyses also includes several hypothesis test models, such as contingency tables, Friedman's test and Fisher's exact test. The code itself is open source (http://socr.googlecode.com/), with the aim of contributing to the efforts of the statistical computing community. The code includes functionality for each specific analysis model, and it has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with an API (Application Programming Interface) have been implemented for statistical summaries, least-squares solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website. In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is on-going and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for the most up-to-date information and newly added models.
Teng, Ju-Hsi; Lin, Kuan-Chia; Ho, Bin-Shenq
2007-10-01
A community-based aboriginal study was conducted and analysed to explore the application of classification trees and logistic regression. A total of 1066 aboriginal residents in Yilan County were screened during 2003-2004. The independent variables included demographic characteristics, physical examinations, geographic location, health behaviours, dietary habits and family history of hereditary diseases. Risk factors for cardiovascular diseases were selected as the dependent variables in further analysis. The completion rate for the health interview was 88.9%. The classification tree results show that if body mass index is higher than 25.72 kg/m2 and age is above 51 years, the predicted probability for a number of cardiovascular risk factors ≥3 is 73.6%, and the population is 322. If body mass index is higher than 26.35 kg/m2 and the geographical latitude of the village is lower than 24°22.8', the predicted probability for a number of cardiovascular risk factors ≥4 is 60.8%, and the population is 74. The logistic regression results indicate that body mass index, drinking habit and menopause are the top three significant independent variables. The classification tree model specifically shows the discrimination paths and interactions between the risk groups. The logistic regression model presents and analyses the statistically independent factors of cardiovascular risk. Applying both models to specific situations will provide a different angle for the design and management of future health intervention plans after a community-based study.
Palomo, M J; Quintanilla, R; Izquierdo, M D; Mogas, T; Paramio, M T
2016-12-01
This work analyses the changes that caprine spermatozoa undergo during in vitro fertilization (IVF) of in vitro matured prepubertal goat oocytes and their relationship with IVF outcome, in order to obtain an effective model that allows prediction of in vitro fertility on the basis of semen assessment. The evolution of several sperm parameters (motility, viability and acrosomal integrity) during IVF and their relationship with three IVF outcome criteria (total penetration, normal penetration and cleavage rates) were studied in a total of 56 IVF replicates. Moderate correlation coefficients between some sperm parameters and IVF outcome were observed. In addition, stepwise multiple regression analyses were conducted that considered three grouping of sperm parameters as potential explanatory variables of the three IVF outcome criteria. The proportion of IVF outcome variation that can be explained by the fitted models ranged from 0.62 to 0.86, depending upon the trait analysed and the variables considered. Seven out of 32 sperm parameters were selected as partial covariates in at least one of the nine multiple regression models. Among these, progressive sperm motility assessed immediately after swim-up, the percentage of dead sperm with intact acrosome and the incidence of acrosome reaction both determined just before the gamete co-culture, and finally the proportion of viable spermatozoa at 17 h post-insemination were the most frequently selected sperm parameters. Nevertheless, the predictive ability of these models must be confirmed in a larger sample size experiment.
NASA Astrophysics Data System (ADS)
Reis, D. S.; Stedinger, J. R.; Martins, E. S.
2005-10-01
This paper develops a Bayesian approach to analysis of a generalized least squares (GLS) regression model for regional analyses of hydrologic data. The new approach allows computation of the posterior distributions of the parameters and the model error variance using a quasi-analytic approach. Two regional skew estimation studies illustrate the value of the Bayesian GLS approach for regional statistical analysis of a shape parameter and demonstrate that regional skew models can be relatively precise with effective record lengths in excess of 60 years. With Bayesian GLS the marginal posterior distribution of the model error variance and the corresponding mean and variance of the parameters can be computed directly, thereby providing a simple but important extension of the regional GLS regression procedures popularized by Tasker and Stedinger (1989), which is sensitive to the likely values of the model error variance when it is small relative to the sampling error in the at-site estimator.
ERIC Educational Resources Information Center
Mitchell, James K.; Carter, William E.
2000-01-01
Describes using a computer statistical software package called Minitab to model the sensitivity of several microbes to the disinfectant NaOCl (Clorox) using the Kirby-Bauer technique. Each group of students collects data from one microbe, conducts regression analyses, then chooses the best-fit model based on the highest r-values obtained.…
Regression Is a Univariate General Linear Model Subsuming Other Parametric Methods as Special Cases.
ERIC Educational Resources Information Center
Vidal, Sherry
Although the concept of the general linear model (GLM) has existed since the 1960s, other univariate analyses such as the t-test and the analysis of variance models have remained popular. The GLM produces an equation that minimizes the mean differences of independent variables as they are related to a dependent variable. From a computer printout…
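A minimal sketch of the point made above: a two-sample t-test is a special case of the general linear model, obtained by regressing the outcome on a 0/1 group indicator. The data are simulated for illustration.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(5)
group = np.repeat([0, 1], 50)
y = 10 + 2 * group + rng.normal(0, 3, 100)

# Classical two-sample t-test (equal variances assumed)
t, p = stats.ttest_ind(y[group == 1], y[group == 0])

# Equivalent regression: the slope is the mean difference, with an identical t statistic
ols = sm.OLS(y, sm.add_constant(group)).fit()
print(t, ols.tvalues[1])   # same value
print(p, ols.pvalues[1])   # same value
```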
Zwetsloot, P P; Kouwenberg, L H J A; Sena, E S; Eding, J E; den Ruijter, H M; Sluijter, J P G; Pasterkamp, G; Doevendans, P A; Hoefer, I E; Chamuleau, S A J; van Hout, G P J; Jansen Of Lorkeers, S J
2017-10-27
Large animal models are essential for the development of novel therapeutics for myocardial infarction (MI). To optimize translation, we need to assess the effect of experimental design on disease outcome and model experimental design to resemble the clinical course of MI. The aim of this study is therefore to systematically investigate how experimental decisions affect outcome measurements in large animal MI models. We used control-animal data from two independent meta-analyses of large animal MI models. All variables of interest were pre-defined. We performed univariable and multivariable meta-regression to analyze whether these variables influenced infarct size and ejection fraction. Our analyses incorporated 246 relevant studies. Multivariable meta-regression revealed that infarct size and cardiac function were influenced independently by choice of species, sex, co-medication, occlusion type, occluded vessel, quantification method, ischemia duration and follow-up duration. We provide strong systematic evidence that commonly used endpoints significantly depend on study design and biological variation. This makes direct comparison of different study results difficult and calls for standardized models. Researchers should take this into account when designing large animal studies to most closely mimic the clinical course of MI and enable translational success.
Revisiting the Principle of Relative Constancy: Consumer Mass Media Expenditures in Belgium.
ERIC Educational Resources Information Center
Dupagne, Michel; Green, R. Jeffery
1996-01-01
Proposes two new econometric models for testing the principle of relative constancy (PRC). Reports on regression and cointegration analyses conducted with Belgian mass media expenditure data from 1953-91. Suggests that alternative mass media expenditure models should be developed because PRC lacks an economic foundation and sound empirical…
Developing Models to Forecast Sales of Natural Christmas Trees
Lawrence D. Garrett; Thomas H. Pendleton
1977-01-01
A study of practices for marketing Christmas trees in Winston-Salem, North Carolina, and Denver, Colorado, revealed that such factors as retail lot competition, tree price, consumer traffic, and consumer income were very important in determining a particular retailer's sales. Analyses of 4 years of market data were used in developing regression models for...
NASA Astrophysics Data System (ADS)
Duman, T. Y.; Can, T.; Gokceoglu, C.; Nefeslioglu, H. A.; Sonmez, H.
2006-11-01
As a result of industrialization, cities throughout the world have been growing rapidly over the last century. One typical example of these growing cities is Istanbul, the population of which is over 10 million. Due to rapid urbanization, new areas suitable for settlement and engineering structures are necessary. The Cekmece area located west of the Istanbul metropolitan area is studied, because the landslide activity is extensive in this area. The purpose of this study is to develop a model that can be used to characterize landslide susceptibility in map form using logistic regression analysis of an extensive landslide database. A database of landslide activity was constructed using both aerial photography and field studies. About 19.2% of the selected study area is covered by deep-seated landslides. The landslides that occur in the area are primarily located in sandstones with interbedded permeable and impermeable layers such as claystone, siltstone and mudstone. About 31.95% of the total landslide area is located in this unit. To apply logistic regression analyses, a data matrix including 37 variables was constructed. The variables used in the forward stepwise analyses are different measures of slope, aspect, elevation, stream power index (SPI), plan curvature, profile curvature, geology, geomorphology and relative permeability of lithological units. A total of 25 variables were identified as exerting strong influence on landslide occurrence and were included in the logistic regression equation. Wald statistics values indicate that lithology, SPI and slope are more important than the other parameters in the equation. Beta coefficients of the 25 variables included in the logistic regression equation provide a model for landslide susceptibility in the Cekmece area. This model is used to generate a landslide susceptibility map that correctly classified 83.8% of the landslide-prone areas.
Brügemann, K; Gernand, E; von Borstel, U U; König, S
2011-08-01
Data used in the present study included 1,095,980 first-lactation test-day records for protein yield of 154,880 Holstein cows housed on 196 large-scale dairy farms in Germany. Data were recorded between 2002 and 2009 and merged with meteorological data from public weather stations. The maximum distance between each farm and its corresponding weather station was 50 km. Hourly temperature-humidity indexes (THI) were calculated using the mean of hourly measurements of dry bulb temperature and relative humidity. On the phenotypic scale, an increase in THI was generally associated with a decrease in daily protein yield. For genetic analyses, a random regression model was applied using time-dependent (d in milk, DIM) and THI-dependent covariates. Additive genetic and permanent environmental effects were fitted with this random regression model and Legendre polynomials of order 3 for DIM and THI. In addition, the fixed curve was modeled with Legendre polynomials of order 3. Heterogeneous residuals were fitted by dividing DIM into 5 classes, and by dividing THI into 4 classes, resulting in 20 different classes. Additive genetic variances for daily protein yield decreased with increasing degrees of heat stress and were lowest at the beginning of lactation and at extreme THI. Due to higher additive genetic variances, slightly higher permanent environment variances, and similar residual variances, heritabilities were highest for low THI in combination with DIM at the end of lactation. Genetic correlations among individual values for THI were generally >0.90. These trends from the complex random regression model were verified by applying relatively simple bivariate animal models for protein yield measured in 2 THI environments; that is, defining a THI value of 60 as a threshold. These high correlations indicate the absence of any substantial genotype × environment interaction for protein yield. However, heritabilities and additive genetic variances from the random regression model tended to be slightly higher in the THI range corresponding to cows' comfort zone. Selecting such superior environments for progeny testing can contribute to an accurate genetic differentiation among selection candidates. Copyright © 2011 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
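A minimal sketch of one step in a random regression analysis like the one above: constructing order-3 Legendre polynomial covariates from days in milk (DIM) and THI after rescaling each to [-1, 1]. These columns could then enter a mixed model; the full genetic analysis is not reproduced here, and the values are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_covariates(x, order=3):
    """Rescale x to [-1, 1] and return Legendre basis columns P0..P_order."""
    z = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0
    return legendre.legvander(z, order)

dim = np.arange(5, 306)                 # days in milk 5..305
thi = np.linspace(30, 80, dim.size)     # hypothetical THI values per test day
X = np.hstack([legendre_covariates(dim), legendre_covariates(thi)])
print(X.shape)                          # (301, 8): two order-3 expansions
```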
Estimates of Median Flows for Streams on the 1999 Kansas Surface Water Register
Perry, Charles A.; Wolock, David M.; Artman, Joshua C.
2004-01-01
The Kansas State Legislature, by enacting Kansas Statute KSA 82a-2001 et seq., mandated the criteria for determining which Kansas stream segments would be subject to classification by the State. One criterion for the selection as a classified stream segment is based on the statistic of median flow being equal to or greater than 1 cubic foot per second. As specified by KSA 82a-2001 et seq., median flows were determined from U.S. Geological Survey streamflow-gaging-station data by using the most-recent 10 years of gaged data (KSA) for each streamflow-gaging station. Median flows also were determined by using gaged data from the entire period of record (all-available hydrology, AAH). Least-squares multiple regression techniques were used, along with Tobit analyses, to develop equations for estimating median flows for uncontrolled stream segments. The drainage area of the gaging stations on uncontrolled stream segments used in the regression analyses ranged from 2.06 to 12,004 square miles. A logarithmic transformation of the data was needed to develop the best linear relation for computing median flows. In the regression analyses, the significant climatic and basin characteristics, in order of importance, were drainage area, mean annual precipitation, mean basin permeability, and mean basin slope. Tobit analyses of KSA data yielded a model standard error of prediction of 0.285 logarithmic units, and the best equations using Tobit analyses of AAH data had a model standard error of prediction of 0.250 logarithmic units. These regression equations and an interpolation procedure were used to compute median flows for the uncontrolled stream segments on the 1999 Kansas Surface Water Register. Measured median flows from gaging stations were incorporated into the regression-estimated median flows along the stream segments where available. The segments that were uncontrolled were interpolated using gaged data weighted according to the drainage area and the bias between the regression-estimated and gaged flow information. On controlled segments of Kansas streams, the median flow information was interpolated between gaging stations using only gaged data weighted by drainage area. Of the 2,232 total stream segments on the Kansas Surface Water Register, 34.5 percent of the segments had an estimated median streamflow of less than 1 cubic foot per second when the KSA analysis was used. When the AAH analysis was used, 36.2 percent of the segments had an estimated median streamflow of less than 1 cubic foot per second. This report supersedes U.S. Geological Survey Water-Resources Investigations Report 02-4292.
Rugulies, Reiner; Martin, Marie H T; Garde, Anne Helene; Persson, Roger; Albertsen, Karen
2012-03-01
Exposure to deadlines at work is increasing in several countries and may affect health. We aimed to investigate cross-sectional and longitudinal associations between frequency of difficult deadlines at work and sleep quality. Study participants were knowledge workers, drawn from a representative sample of Danish employees who responded to a baseline questionnaire in 2006 (n = 363) and a follow-up questionnaire in 2007 (n = 302). Frequency of difficult deadlines was measured by self-report and categorized into low, intermediate, and high. Sleep quality was measured with a Total Sleep Quality Score and two indexes (Awakening Index and Disturbed Sleep Index) derived from the Karolinska Sleep Questionnaire. Analyses on the association between frequency of deadlines and sleep quality scores were conducted with multiple linear regression models, adjusted for potential confounders. In addition, we used multiple logistic regression models to analyze whether frequency of deadlines at baseline predicted caseness of sleep problems at follow-up among participants free of sleep problems at baseline. Frequent deadlines were cross-sectionally and longitudinally associated with poorer sleep quality on all three sleep quality measures. Associations in the longitudinal analyses were greatly attenuated when we adjusted for baseline sleep quality. The logistic regression analyses showed that frequent deadlines at baseline were associated with elevated odds ratios for caseness of sleep problems at follow-up, however, confidence intervals were wide in these analyses. Frequent deadlines at work were associated with poorer sleep quality among Danish knowledge workers. We recommend investigating the relation between deadlines and health endpoints in large-scale epidemiologic studies. Copyright © 2011 Wiley Periodicals, Inc.
Time series regression model for infectious disease and weather.
Imai, Chisato; Armstrong, Ben; Chalabi, Zaid; Mangtani, Punam; Hashizume, Masahiro
2015-10-01
Time series regression has been developed and long used to evaluate the short-term associations of air pollution and weather with mortality or morbidity of non-infectious diseases. The application of the regression approaches from this tradition to infectious diseases, however, is less well explored and raises some new issues. We discuss and present potential solutions for five issues often arising in such analyses: changes in immune population, strong autocorrelations, a wide range of plausible lag structures and association patterns, seasonality adjustments, and large overdispersion. The potential approaches are illustrated with datasets of cholera cases and rainfall from Bangladesh and influenza and temperature in Tokyo. Though this article focuses on the application of the traditional time series regression to infectious diseases and weather factors, we also briefly introduce alternative approaches, including mathematical modeling, wavelet analysis, and autoregressive integrated moving average (ARIMA) models. Modifications proposed to standard time series regression practice include using sums of past cases as proxies for the immune population, and using the logarithm of lagged disease counts to control autocorrelation due to true contagion, both of which are motivated from "susceptible-infectious-recovered" (SIR) models. The complexity of lag structures and association patterns can often be informed by biological mechanisms and explored by using distributed lag non-linear models. For overdispersed models, alternative distribution models such as quasi-Poisson and negative binomial should be considered. Time series regression can be used to investigate dependence of infectious diseases on weather, but may need modifying to allow for features specific to this context. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
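A minimal sketch of one of the modifications discussed above: an overdispersed count regression (negative binomial) for weekly case counts, with the logarithm of lagged cases as a covariate to absorb autocorrelation due to contagion. The data, lag choice and dispersion parameter are illustrative only.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(6)
weeks = 200
temperature = 20 + 8 * np.sin(2 * np.pi * np.arange(weeks) / 52) + rng.normal(0, 1, weeks)
cases = rng.poisson(lam=np.exp(1.5 + 0.05 * temperature))

df = pd.DataFrame({"cases": cases, "temperature": temperature})
df["log_lag_cases"] = np.log(df["cases"].shift(1) + 1)   # lagged log counts
df = df.dropna()

X = sm.add_constant(df[["temperature", "log_lag_cases"]])
model = sm.GLM(df["cases"], X, family=sm.families.NegativeBinomial(alpha=1.0))
print(model.fit().summary())
```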
Scheerman, Janneke F M; van Empelen, Pepijn; van Loveren, Cor; Pakpour, Amir H; van Meijel, Berno; Gholami, Maryam; Mierzaie, Zaher; van den Braak, Matheus C T; Verrips, Gijsbert H W
2017-11-01
The Health Action Process Approach (HAPA) model addresses health behaviours, but it has never been applied to model adolescents' oral hygiene behaviour during fixed orthodontic treatment. This study aimed to apply the HAPA model to explain adolescents' oral hygiene behaviour and dental plaque during orthodontic treatment with fixed appliances. In this cross-sectional study, 116 adolescents with fixed appliances from an orthodontic clinic situated in Almere (the Netherlands) completed a questionnaire assessing oral health behaviours and the psychosocial factors of the HAPA model. Linear regression analyses were performed to examine the factors associated with dental plaque, toothbrushing, and the use of a proxy brush. Stepwise regression analysis showed that lower amounts of plaque were significantly associated with higher frequency of use of a proxy brush (R2 = 45%), higher intention to use a proxy brush (R2 = 5%), female gender (R2 = 2%), and older age (R2 = 2%). The multiple regression analyses revealed that higher action self-efficacy, intention, maintenance self-efficacy, and higher education were significantly associated with the use of a proxy brush (R2 = 45%). Decreased levels of dental plaque are mainly associated with increased use of a proxy brush, which is in turn associated with a higher intention and self-efficacy to use the proxy brush. © 2017 BSPD, IAPD and John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Fridman, M; Hodgkins, P S; Kahle, J S; Erder, M H
2015-06-01
There are few approved therapies for adults with attention-deficit/hyperactivity disorder (ADHD) in Europe. Lisdexamfetamine (LDX) is an effective treatment for ADHD; however, no clinical trials examining the efficacy of LDX specifically in European adults have been conducted. Therefore, to estimate the efficacy of LDX in European adults we performed a meta-regression of existing clinical data. A systematic review identified US- and Europe-based randomized efficacy trials of LDX, atomoxetine (ATX), or osmotic-release oral system methylphenidate (OROS-MPH) in children/adolescents and adults. A meta-regression model was then fitted to the published/calculated effect sizes (Cohen's d) using medication, geographical location, and age group as predictors. The LDX effect size in European adults was extrapolated from the fitted model. Sensitivity analyses performed included using adult-only studies and adding studies with placebo designs other than a standard pill-placebo design. Twenty-two of 2832 identified articles met inclusion criteria. The model-estimated effect size of LDX for European adults was 1.070 (95% confidence interval: 0.738, 1.401), larger than the 0.8 threshold for large effect sizes. The overall model fit was adequate (80%) and stable in the sensitivity analyses. This model predicts that LDX may have a large treatment effect size in European adults with ADHD. Copyright © 2015 Elsevier Masson SAS. All rights reserved.
NASA Astrophysics Data System (ADS)
Wang, Hongjin; Hsieh, Sheng-Jen; Peng, Bo; Zhou, Xunfei
2016-07-01
A method that does not require knowledge of the thermal properties of coatings or substrates would be of interest for industrial applications. Supervised machine learning regressions may provide a possible solution to this problem. This paper compares the performances of two regression models (artificial neural networks (ANN) and support vector machines for regression (SVM)) with respect to coating thickness estimates based on surface temperature increments collected via time resolved thermography. We describe the role of SVM in coating thickness prediction. Non-dimensional analyses are conducted to illustrate the effects of coating thicknesses and various factors on surface temperature increments. It is theoretically possible to correlate coating thickness with the surface temperature increment. Based on the analyses, the laser power is selected so that, during heating, the temperature increment is high enough to resolve coating thickness variation but low enough to avoid surface melting. Sixty-one paint-coated samples with coating thicknesses varying from 63.5 μm to 571 μm are used to train models. Hyper-parameters of the models are optimized by 10-fold cross-validation. Another 28 sets of data are then collected to test the performance of the models. The study shows that SVM can provide reliable predictions of unknown data, due to its deterministic characteristics, and it works well when used for a small input data group. The SVM model generates more accurate coating thickness estimates than the ANN model.
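A minimal sketch of support vector regression with 10-fold cross-validated hyper-parameter tuning, in the spirit of the coating-thickness study above. The features (surface temperature increments at several times) and their relationship to thickness are simulated, not the authors' data.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(7)
n = 61
thickness = rng.uniform(63.5, 571.0, n)   # coating thickness in micrometres
# Temperature increments that decrease (noisily) with thickness
X = np.column_stack([1000 / thickness + rng.normal(0, 0.1, n) for _ in range(5)])

pipe = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
grid = {"svr__C": [1, 10, 100], "svr__gamma": ["scale", 0.1, 1.0], "svr__epsilon": [0.1, 1.0]}
search = GridSearchCV(pipe, grid, cv=10, scoring="neg_root_mean_squared_error")
search.fit(X, thickness)
print(search.best_params_, -search.best_score_)
```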
Geodesic least squares regression for scaling studies in magnetic confinement fusion
DOE Office of Scientific and Technical Information (OSTI.GOV)
Verdoolaege, Geert
In regression analyses for deriving scaling laws that occur in various scientific disciplines, standard regression methods have usually been applied, of which ordinary least squares (OLS) is the most popular. However, concerns have been raised with respect to several assumptions underlying OLS in its application to scaling laws. We here discuss a new regression method that is robust in the presence of significant uncertainty on both the data and the regression model. The method, which we call geodesic least squares regression (GLS), is based on minimization of the Rao geodesic distance on a probabilistic manifold. We demonstrate the superiority of the method using synthetic data and we present an application to the scaling law for the power threshold for the transition to the high confinement regime in magnetic confinement fusion devices.
Pragmatic estimation of a spatio-temporal air quality model with irregular monitoring data
NASA Astrophysics Data System (ADS)
Sampson, Paul D.; Szpiro, Adam A.; Sheppard, Lianne; Lindström, Johan; Kaufman, Joel D.
2011-11-01
Statistical analyses of health effects of air pollution have increasingly used GIS-based covariates for prediction of ambient air quality in "land use" regression models. More recently these spatial regression models have accounted for spatial correlation structure in combining monitoring data with land use covariates. We present a flexible spatio-temporal modeling framework and pragmatic, multi-step estimation procedure that accommodates essentially arbitrary patterns of missing data with respect to an ideally complete space by time matrix of observations on a network of monitoring sites. The methodology incorporates a model for smooth temporal trends with coefficients varying in space according to Partial Least Squares regressions on a large set of geographic covariates and nonstationary modeling of spatio-temporal residuals from these regressions. This work was developed to provide spatial point predictions of PM 2.5 concentrations for the Multi-Ethnic Study of Atherosclerosis and Air Pollution (MESA Air) using irregular monitoring data derived from the AQS regulatory monitoring network and supplemental short-time scale monitoring campaigns conducted to better predict intra-urban variation in air quality. We demonstrate the interpretation and accuracy of this methodology in modeling data from 2000 through 2006 in six U.S. metropolitan areas and establish a basis for likelihood-based estimation.
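A minimal sketch of the dimension-reduction step described above: regressing site-specific temporal-trend coefficients on a large set of geographic covariates via Partial Least Squares. The data are simulated; this is not the MESA Air code, and the variable names are hypothetical.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(8)
n_sites, n_geo_covariates = 100, 200            # far more covariates than monitoring sites
G = rng.normal(size=(n_sites, n_geo_covariates))
trend_coef = G[:, :5] @ rng.normal(size=5) + rng.normal(0, 0.5, n_sites)

pls = PLSRegression(n_components=3)
# Default score for PLSRegression is R^2 of the prediction
print("cross-validated R^2:", cross_val_score(pls, G, trend_coef, cv=5).mean())
```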
Botha, J; de Ridder, J H; Potgieter, J C; Steyn, H S; Malan, L
2013-10-01
A recently proposed model for waist circumference cut points (RPWC), driven by increased blood pressure, was demonstrated in an African population. We therefore aimed to validate the RPWC by comparing the RPWC and the Joint Statement Consensus (JSC) models via Logistic Regression (LR) and Neural Networks (NN) analyses. Urban African gender groups (N=171) were stratified according to the JSC and RPWC cut point models. Ultrasound carotid intima media thickness (CIMT), blood pressure (BP) and fasting bloods (glucose, high density lipoprotein (HDL) and triglycerides) were obtained in a well-controlled setting. The RPWC male model (LR ROC AUC: 0.71, NN ROC AUC: 0.71) was practically equal to the JSC model (LR ROC AUC: 0.71, NN ROC AUC: 0.69) to predict structural vascular disease. Similarly, the female RPWC model (LR ROC AUC: 0.84, NN ROC AUC: 0.82) and JSC model (LR ROC AUC: 0.82, NN ROC AUC: 0.81) equally predicted CIMT as surrogate marker for structural vascular disease. Odds ratios supported validity where prediction of CIMT revealed clinical significance, well over 1, for both the JSC and RPWC models in African males and females (OR 3.75-13.98). In conclusion, the proposed RPWC model was substantially validated utilizing linear and non-linear analyses. We therefore propose ethnic-specific WC cut points (African males, ≥90 cm; females, ≥98 cm) to predict a surrogate marker for structural vascular disease. © J. A. Barth Verlag in Georg Thieme Verlag KG Stuttgart · New York.
Karami, K; Zerehdaran, S; Barzanooni, B; Lotfi, E
2017-12-01
1. The aim of the present study was to estimate genetic parameters for average egg weight (EW) and egg number (EN) at different ages in Japanese quail using multi-trait random regression (MTRR) models. 2. A total of 8534 records from 900 quail, hatched between 2014 and 2015, were used in the study. Average weekly egg weights and egg numbers were measured from second until sixth week of egg production. 3. Nine random regression models were compared to identify the best order of the Legendre polynomials (LP). The most optimal model was identified by the Bayesian Information Criterion. A model with second order of LP for fixed effects, second order of LP for additive genetic effects and third order of LP for permanent environmental effects (MTRR23) was found to be the best. 4. According to the MTRR23 model, direct heritability for EW increased from 0.26 in the second week to 0.53 in the sixth week of egg production, whereas the ratio of permanent environment to phenotypic variance decreased from 0.48 to 0.1. Direct heritability for EN was low, whereas the ratio of permanent environment to phenotypic variance decreased from 0.57 to 0.15 during the production period. 5. For each trait, estimated genetic correlations among weeks of egg production were high (from 0.85 to 0.98). Genetic correlations between EW and EN were low and negative for the first two weeks, but they were low and positive for the rest of the egg production period. 6. In conclusion, random regression models can be used effectively for analysing egg production traits in Japanese quail. Response to selection for increased egg weight would be higher at older ages because of its higher heritability and such a breeding program would have no negative genetic impact on egg production.
glmnetLRC f/k/a lrc package: Logistic Regression Classification
DOE Office of Scientific and Technical Information (OSTI.GOV)
2016-06-09
Methods for fitting and predicting logistic regression classifiers (LRC) with an arbitrary loss function using elastic net or best subsets. This package adds additional model fitting features to the existing glmnet and bestglm R packages. This package was created to perform the analyses described in Amidan BG, Orton DJ, LaMarche BL, et al. 2014. Signatures for Mass Spectrometry Data Quality. Journal of Proteome Research. 13(4), 2215-2222. It makes the model fitting available in the glmnet and bestglm packages more general by identifying optimal model parameters via cross validation with a customizable loss function. It also identifies the optimal threshold for binary classification.
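A minimal sketch, in Python rather than R, of the idea behind such a package: an elastic-net logistic regression classifier whose penalty strength, mixing parameter and decision threshold are selected by cross-validation under a user-defined (asymmetric) loss. This is not the glmnetLRC API; all names and the loss function are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_predict

X, y = make_classification(n_samples=400, n_features=30, n_informative=5, random_state=0)

# Custom loss: false negatives cost 5 times as much as false positives
def expected_loss(y_true, prob, threshold):
    pred = (prob >= threshold).astype(int)
    return np.mean(5 * ((y_true == 1) & (pred == 0)) + 1 * ((y_true == 0) & (pred == 1)))

enet = LogisticRegression(penalty="elasticnet", solver="saga", max_iter=5000)
grid = GridSearchCV(enet, {"C": [0.1, 1.0, 10.0], "l1_ratio": [0.2, 0.5, 0.8]}, cv=5)
grid.fit(X, y)

# Choose the probability threshold minimizing the custom loss on out-of-fold predictions
prob_oof = cross_val_predict(grid.best_estimator_, X, y, cv=5, method="predict_proba")[:, 1]
thresholds = np.linspace(0.05, 0.95, 19)
best_t = thresholds[np.argmin([expected_loss(y, prob_oof, t) for t in thresholds])]
print(grid.best_params_, "threshold:", best_t)
```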
Developing a dengue forecast model using machine learning: A case study in China.
Guo, Pi; Liu, Tao; Zhang, Qin; Wang, Li; Xiao, Jianpeng; Zhang, Qingying; Luo, Ganfeng; Li, Zhihao; He, Jianfeng; Zhang, Yonghui; Ma, Wenjun
2017-10-01
In China, dengue remains an important public health issue with expanded areas and increased incidence recently. Accurate and timely forecasts of dengue incidence in China are still lacking. We aimed to use the state-of-the-art machine learning algorithms to develop an accurate predictive model of dengue. Weekly dengue cases, Baidu search queries and climate factors (mean temperature, relative humidity and rainfall) during 2011-2014 in Guangdong were gathered. A dengue search index was constructed for developing the predictive models in combination with climate factors. The observed year and week were also included in the models to control for the long-term trend and seasonality. Several machine learning algorithms, including the support vector regression (SVR) algorithm, step-down linear regression model, gradient boosted regression tree algorithm (GBM), negative binomial regression model (NBM), least absolute shrinkage and selection operator (LASSO) linear regression model and generalized additive model (GAM), were used as candidate models to predict dengue incidence. Performance and goodness of fit of the models were assessed using the root-mean-square error (RMSE) and R-squared measures. The residuals of the models were examined using the autocorrelation and partial autocorrelation function analyses to check the validity of the models. The models were further validated using dengue surveillance data from five other provinces. The epidemics during the last 12 weeks and the peak of the 2014 large outbreak were accurately forecasted by the SVR model selected by a cross-validation technique. Moreover, the SVR model had the consistently smallest prediction error rates for tracking the dynamics of dengue and forecasting the outbreaks in other areas in China. The proposed SVR model achieved a superior performance in comparison with other forecasting techniques assessed in this study. The findings can help the government and community respond early to dengue epidemics.
Zeng, Fangfang; Li, Zhongtao; Yu, Xiaoling; Zhou, Linuo
2013-01-01
Background: This study aimed to develop artificial neural network (ANN) and multivariable logistic regression (LR) analyses for prediction modeling of cardiovascular autonomic (CA) dysfunction in the general population, and to compare the prediction models built with the two approaches. Methods and Materials: We analyzed a previous dataset based on a Chinese population sample consisting of 2,092 individuals aged 30–80 years. The prediction models were derived from an exploratory set using ANN and LR analysis, and were tested in the validation set. Performances of these prediction models were then compared. Results: Univariate analysis indicated that 14 risk factors showed statistically significant association with the prevalence of CA dysfunction (P<0.05). The mean area under the receiver-operating curve was 0.758 (95% CI 0.724–0.793) for LR and 0.762 (95% CI 0.732–0.793) for ANN analysis, but a noninferiority result was found (P<0.001). Similar results were found in comparisons of sensitivity, specificity, and predictive values of the prediction models between the LR and ANN analyses. Conclusion: The prediction models for CA dysfunction were developed using ANN and LR. ANN and LR are two effective tools for developing prediction models based on our dataset.
Tools to support interpreting multiple regression in the face of multicollinearity.
Kraha, Amanda; Turner, Heather; Nimon, Kim; Zientek, Linda Reichwein; Henson, Robin K
2012-01-01
While multicollinearity may increase the difficulty of interpreting multiple regression (MR) results, it should not cause undue problems for the knowledgeable researcher. In the current paper, we argue that rather than using one technique to investigate regression results, researchers should consider multiple indices to understand the contributions that predictors make not only to a regression model, but to each other as well. Some of the techniques to interpret MR effects include, but are not limited to, correlation coefficients, beta weights, structure coefficients, all possible subsets regression, commonality coefficients, dominance weights, and relative importance weights. This article will review a set of techniques to interpret MR effects, identify the elements of the data on which the methods focus, and identify statistical software to support such analyses.
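A minimal sketch computing two of the indices discussed above for interpreting a multiple regression with correlated predictors: standardized (beta) weights and structure coefficients. The data are simulated with deliberate collinearity.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(9)
n = 300
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)   # correlated with x1
X = np.column_stack([x1, x2])
y = 1.0 * x1 + 0.5 * x2 + rng.normal(0, 1, n)

# Beta weights: coefficients from the regression on standardized variables
Xz = (X - X.mean(0)) / X.std(0)
yz = (y - y.mean()) / y.std()
beta = LinearRegression().fit(Xz, yz).coef_

# Structure coefficients: correlation of each predictor with the predicted scores
y_hat = LinearRegression().fit(X, y).predict(X)
structure = [np.corrcoef(X[:, j], y_hat)[0, 1] for j in range(X.shape[1])]
print("beta weights:", beta)
print("structure coefficients:", structure)
```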
Erdogan, Saffet
2009-10-01
The aim of the study is to describe the inter-province differences in traffic accidents and mortality on roads of Turkey. Two different risk indicators were used to evaluate the road safety performance of the provinces in Turkey. These indicators are the ratios between the number of persons killed in road traffic accidents (1) and the number of accidents (2) (numerators) and their exposure to traffic risk (denominator). Population and the number of registered motor vehicles in the provinces were used as denominators individually. Spatial analyses were performed on the mean annual rate of deaths and on the number of fatal accidents calculated for the period 2001-2006. Empirical Bayes smoothing was used to remove background noise from the raw death and accident rates because of the sparsely populated provinces and the small numbers of accidents and deaths in some provinces. Global and local spatial autocorrelation analyses were performed to show whether the provinces with high rates of deaths and accidents show clustering or are located close together by chance. The spatial distribution of provinces with high rates of deaths and accidents was nonrandom and detected as clustered with significance of P<0.05 in the spatial autocorrelation analyses. Regions with a high concentration of fatal accidents and deaths were located in the provinces that contain the roads connecting the Istanbul, Ankara, and Antalya provinces. Accident and death rates were also modeled with independent variables such as the number of motor vehicles, length of roads, and so forth using geographically weighted regression analysis with forward step-wise elimination. The level of statistical significance was taken as P<0.05. Large differences were found between the rates of deaths and accidents according to the denominators in the provinces. The geographically weighted regression analyses produced significantly better predictions for both accident rates and death rates than did ordinary least squares regressions, as indicated by adjusted R2 values. Geographically weighted regression provided adjusted R2 values of 0.89-0.99 for death and accident rates, compared with 0.88-0.95, respectively, for ordinary least squares regressions. Geographically weighted regression has the potential to reveal local patterns in the spatial distribution of rates, which would be ignored by the ordinary least squares regression approach. The application of spatial analysis and modeling of accident statistics and death rates at the provincial level in Turkey will help to identify provinces with outstandingly high accident and death rates. This could support more efficient road safety management in Turkey.
Assessment of weighted quantile sum regression for modeling chemical mixtures and cancer risk.
Czarnota, Jenna; Gennings, Chris; Wheeler, David C
2015-01-01
In evaluation of cancer risk related to environmental chemical exposures, the effect of many chemicals on disease is ultimately of interest. However, because of potentially strong correlations among chemicals that occur together, traditional regression methods suffer from collinearity effects, including regression coefficient sign reversal and variance inflation. In addition, penalized regression methods designed to remediate collinearity may have limitations in selecting the truly bad actors among many correlated components. The recently proposed method of weighted quantile sum (WQS) regression attempts to overcome these problems by estimating a body burden index, which identifies important chemicals in a mixture of correlated environmental chemicals. Our focus was on assessing through simulation studies the accuracy of WQS regression in detecting subsets of chemicals associated with health outcomes (binary and continuous) in site-specific analyses and in non-site-specific analyses. We also evaluated the performance of the penalized regression methods of lasso, adaptive lasso, and elastic net in correctly classifying chemicals as bad actors or unrelated to the outcome. We based the simulation study on data from the National Cancer Institute Surveillance Epidemiology and End Results Program (NCI-SEER) case-control study of non-Hodgkin lymphoma (NHL) to achieve realistic exposure situations. Our results showed that WQS regression had good sensitivity and specificity across a variety of conditions considered in this study. The shrinkage methods had a tendency to incorrectly identify a large number of components, especially in the case of strong association with the outcome.
Spatial patterns of species richness in New World coral snakes and the metabolic theory of ecology
NASA Astrophysics Data System (ADS)
Terribile, Levi Carina; Diniz-Filho, José Alexandre Felizola
2009-03-01
The metabolic theory of ecology (MTE) has attracted great interest because it proposes an explanation for species diversity gradients based on temperature-metabolism relationships of organisms. Here we analyse the spatial richness pattern of 73 coral snake species from the New World in the context of MTE. We first analysed the association between ln-transformed richness and environmental variables, including the inverse transformation of annual temperature (1/kT). We used eigenvector-based spatial filtering to remove the residual spatial autocorrelation in the data and geographically weighted regression (GWR) to account for non-stationarity in the data. In a model I regression (OLS), the observed slope between ln-richness and 1/kT was -0.626 (r2 = 0.413), but a model II regression generated a much steeper slope (-0.975). When we added additional environmental correlates and the spatial filters to the OLS model, the R2 increased to 0.863 and the partial regression coefficient of 1/kT was -0.676. The GWR detected highly significant non-stationarity in the data, and the median of local slopes of ln-richness against 1/kT was -0.38. Our results expose several problems regarding the assumptions needed to test MTE: although the OLS slope fell within the range predicted by the theory and the dataset complied with the assumption of temperature-independence of average body size, the fact that coral snakes constitute a restricted taxonomic group, combined with the non-stationarity of slopes across geographical space, makes MTE inadequate to explain richness in this case. It is also clear that other ecological and historical factors are important drivers of species richness patterns and must be taken into account in both theoretical modeling and data analysis.
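The core OLS fit described above (ln-richness regressed on 1/kT) can be reproduced on synthetic data as follows; the Boltzmann constant is in eV/K, while the simulated slope, intercept, and noise level are illustrative and not the study's data.

```python
import numpy as np

k = 8.617e-5                                   # Boltzmann constant in eV/K
rng = np.random.default_rng(1)
temp_C = rng.uniform(5, 28, 200)               # hypothetical mean annual temperatures (deg C)
inv_kT = 1.0 / (k * (temp_C + 273.15))         # the 1/kT predictor used by MTE
ln_rich = 30.0 - 0.65 * inv_kT + rng.normal(0, 0.4, 200)   # simulate a slope inside the MTE range

slope, intercept = np.polyfit(inv_kT, ln_rich, 1)
r2 = np.corrcoef(inv_kT, ln_rich)[0, 1] ** 2
print(f"OLS slope = {slope:.3f}, r2 = {r2:.3f}")  # MTE predicts slopes roughly between -0.6 and -0.7
```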
Logistic Regression in the Identification of Hazards in Construction
NASA Astrophysics Data System (ADS)
Drozd, Wojciech
2017-10-01
The construction site and its elements create circumstances conducive to safety risks during the execution of works. Analysis indicates the critical importance of these factors in the set of characteristics that describe the causes of accidents in the construction industry. This article attempts to analyse the characteristics related to the construction site in order to indicate their importance in defining the circumstances of accidents at work. The study includes sites inspected between 2014 and 2016 by employees of the District Labour Inspectorate in Krakow (Poland). The analysed set of detailed (disaggregated) data includes both quantitative and qualitative characteristics. The substantive task focused on classification modelling for the identification of hazards in construction and on identifying which of the analysed characteristics are important in an accident. Methodologically, the data were analysed with statistical classifiers in the form of logistic regression.
About the Pace of Climate Change: Write a Report to the President
ERIC Educational Resources Information Center
Khadjavi, Lily
2013-01-01
This project allows students to better understand the scope and pace of climate change by conducting their own analyses. Using data readily available from NASA and NOAA, students can apply their knowledge of regression models (or of the modeling of rates of change). The results lend themselves to a writing assignment in which students demonstrate…
ERIC Educational Resources Information Center
Gordon, Rachel A.; Fujimoto, Ken; Kaestner, Robert; Korenman, Sanders; Abner, Kristin
2013-01-01
The Early Childhood Environment Rating Scale-Revised (ECERS-R) is widely used to associate child care quality with child development, but its validity for this purpose is not well established. We examined the validity of the ECERS-R using the multidimensional Rasch partial credit model (PCM), factor analyses, and regression analyses with data from…
NASA Astrophysics Data System (ADS)
Gürcan, Eser Kemal
2017-04-01
The most commonly used methods for analyzing time-dependent data are multivariate analysis of variance (MANOVA) and nonlinear regression models. The aim of this study was to compare some MANOVA techniques with a nonlinear mixed modeling approach for investigating growth differentiation in female and male Japanese quail. Weekly individual body weight data of 352 male and 335 female quail from hatch to 8 weeks of age were used in the analyses. Taking all of the analyses together, nonlinear mixed modeling is superior to the other techniques because it also reveals the individual variation. In addition, the profile analysis also provides important information.
NASA Astrophysics Data System (ADS)
O'Connor, J. E.; Wise, D. R.; Mangano, J.; Jones, K.
2015-12-01
Empirical analyses of suspended sediment and bedload transport give estimates of sediment flux for western Oregon and northwestern California. The estimates of both bedload and suspended load are from regression models relating measured annual sediment yield to geologic, physiographic, and climatic properties of contributing basins. The best models include generalized geology and either slope or precipitation. The best-fit suspended-sediment model is based on basin geology, precipitation, and area of recent wildfire. It explains 65% of the variance for 68 suspended sediment measurement sites within the model area. Predicted suspended sediment yields range from no yield in the High Cascades geologic province to 200 tonnes/km2-yr in the northern Oregon Coast Range and 1000 tonnes/km2-yr in recently burned areas of the northern Klamath terrane. Bed-material yield is similarly estimated from a regression model based on 22 sites of measured bed-material transport, mostly from reservoir accumulation analyses but also from several bedload measurement programs. The resulting best-fit regression is based on basin slope and the presence or absence of the Klamath geologic terrane. For the Klamath terrane, bed-material yield is twice that of the other geologic provinces. This model explains more than 80% of the variance of the better-quality measurements. Predicted bed-material yields range up to 350 tonnes/km2-yr in steep areas of the Klamath terrane. Applying these regressions to small individual watersheds (mean size: 66 km2 for bed-material, 3 km2 for suspended sediment) and accumulating totals down the hydrologic network (while decreasing the bed-material flux by experimentally determined attrition rates) gives spatially explicit estimates of both bed-material and suspended sediment flux. This enables assessment of several management issues, including the effects of dams on bedload transport, instream gravel mining, habitat formation processes, and water quality. The combined fluxes can also be compared to long-term rock uplift and cosmogenically determined landscape erosion rates.
He, Liru; Chapple, Andrew; Liao, Zhongxing; Komaki, Ritsuko; Thall, Peter F; Lin, Steven H
2016-10-01
To evaluate radiation modality effects on pericardial effusion (PCE), pleural effusion (PE) and survival in esophageal cancer (EC) patients. We analyzed data from 470 EC patients treated with definitive concurrent chemoradiotherapy (CRT). Bayesian semi-competing risks (SCR) regression models were fit to assess effects of radiation modality and prognostic covariates on the risks of PCE and PE, and death either with or without these preceding events. Bayesian piecewise exponential regression models were fit for overall survival, the time to PCE or death, and the time to PE or death. All models included propensity score as a covariate to correct for potential selection bias. Median times to onset of PCE and PE after RT were 7.1 and 6.1 months for IMRT, and 6.5 and 5.4 months for 3DCRT, respectively. Compared to 3DCRT, the IMRT group had significantly lower risks of PE, PCE, and death. The respective probabilities of a patient being alive without either PCE or PE at 3 and 5 years were 0.29 and 0.21 for IMRT, compared to 0.13 and 0.08 for 3DCRT. In the SCR regression analyses, IMRT was associated with significantly lower risks of PCE (HR=0.26) and PE (HR=0.49), and greater overall survival (probability of beneficial effect (pbe)>0.99), after controlling for known clinical prognostic factors. IMRT reduces the incidence and postpones the onset of PCE and PE, and increases survival probability, compared to 3DCRT. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Dyer, Betsey D.; Kahn, Michael J.; LeBlanc, Mark D.
2008-01-01
Classification and regression tree (CART) analysis was applied to genome-wide tetranucleotide frequencies (genomic signatures) of 195 archaea and bacteria. Although genomic signatures have typically been used to classify evolutionary divergence, in this study, convergent evolution was the focus. Temperature optima for most of the organisms examined could be distinguished by CART analyses of tetranucleotide frequencies. This suggests that pervasive (nonlinear) qualities of genomes may reflect certain environmental conditions (such as temperature) in which those genomes evolved. The predominant use of GAGA and AGGA as the discriminating tetramers in CART models suggests that purine-loading and codon biases of thermophiles may explain some of the results. PMID:19054742
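A hedged sketch of the same kind of analysis on simulated data: a decision tree (here scikit-learn's CART implementation) is grown on genome-wide tetranucleotide frequencies to separate thermophiles from mesophiles. The GAGA enrichment is injected artificially, purely to illustrate how a discriminating tetramer would surface in the fitted tree; nothing here reproduces the study's genomes.

```python
import numpy as np
from itertools import product
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(2)
tetramers = ["".join(t) for t in product("ACGT", repeat=4)]   # 256 possible tetranucleotides
n = 195
X = rng.dirichlet(np.ones(len(tetramers)), size=n)            # genome-wide tetramer frequencies
is_thermophile = rng.binomial(1, 0.3, n)
gaga = tetramers.index("GAGA")
X[is_thermophile == 1, gaga] += 0.01                          # synthetic GAGA enrichment in thermophiles
X = X / X.sum(axis=1, keepdims=True)                          # renormalise rows to frequencies

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, is_thermophile)
print(export_text(tree, feature_names=tetramers, max_depth=2))  # splits on the discriminating tetramer
```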
Interdependency of the maximum range of flexion-extension of hand metacarpophalangeal joints.
Gracia-Ibáñez, V; Vergara, M; Sancho-Bru, J-L
2016-12-01
Mobility of the finger metacarpophalangeal (MCP) joints depends on the posture of the adjacent ones. Current biomechanical hand models consider fixed ranges of movement at the joints regardless of posture, thus allowing non-realistic postures and generating wrong results in reach studies and forward dynamic analyses. This study provides data for more realistic hand models. The maximum voluntary extension (MVE) and flexion (MVF) of different combinations of MCP joints were measured, covering their range of motion. Dependency of the MVF and MVE on the posture of the adjacent MCP joints was confirmed, and mathematical models were obtained through regression analyses (RMSE 7.7°).
Schnitzler, Caroline E; von Ranson, Kristin M; Wallace, Laurel M
2012-08-01
This study evaluated the cognitive-behavioral (CB) model of bulimia nervosa and an extension that included two additional maintaining factors - thin-ideal internalization and impulsiveness - in 327 undergraduate women. Participants completed measures of demographics, self-esteem, concern about shape and weight, dieting, bulimic symptoms, thin-ideal internalization, and impulsiveness. Both the original CB model and the extended model provided good fits to the data. Although structural equation modeling analyses suggested that the original CB model was most parsimonious, hierarchical regression analyses indicated that the additional variables accounted for significantly more variance. Additional analyses showed that the model fit could be improved by adding a path from concern about shape and weight, and deleting the path from dieting, to bulimic symptoms. Expanding upon the factors considered in the model may better capture the scope of variables maintaining bulimic symptoms in young women with a range of severity of bulimic symptoms. Copyright © 2012 Elsevier Ltd. All rights reserved.
Schnittger, Rebecca I B; Wherton, Joseph; Prendergast, David; Lawlor, Brian A
2012-01-01
To develop biopsychosocial models of loneliness and social support, thereby identifying their key risk factors in an Irish sample of community-dwelling older adults. Additionally, to investigate indirect effects of social support on loneliness through mediating risk factors. A total of 579 participants (400 females; 179 males) were given a battery of biopsychosocial assessments, with the primary measures being the De Jong Gierveld Loneliness Scale and the Lubben Social Network Scale, along with a broad range of secondary measures. Bivariate correlation analyses identified items to be included in separate psychosocial, cognitive, biological and demographic multiple regression analyses. The resulting model items were then entered into further multiple regression analyses to obtain overall models. Following this, bootstrapping mediation analyses were conducted to examine indirect effects of social support on the subtypes (emotional and social) of loneliness. The overall models for (1) emotional loneliness included depression, neuroticism, perceived stress, living alone and accommodation type; (2) social loneliness included neuroticism, perceived stress, animal naming and number of grandchildren; and (3) social support included extraversion, executive functioning (Trail Making Test B-time), history of falls, age and whether the participant drives or not. Social support influenced emotional loneliness predominantly through indirect means, while its effect on social loneliness was more direct. These results characterise the biopsychosocial risk factors of emotional loneliness, social loneliness and social support and identify key pathways by which social support influences emotional and social loneliness. These findings highlight issues with the potential for consideration in the development of targeted interventions.
Modeling time-to-event (survival) data using classification tree analysis.
Linden, Ariel; Yarnold, Paul R
2017-12-01
Time to the occurrence of an event is often studied in health research. Survival analysis differs from other designs in that follow-up times for individuals who do not experience the event by the end of the study (called censored) are accounted for in the analysis. Cox regression is the standard method for analysing censored data, but the assumptions required of these models are easily violated. In this paper, we introduce classification tree analysis (CTA) as a flexible alternative for modelling censored data. Classification tree analysis is a "decision-tree"-like classification model that provides parsimonious, transparent (ie, easy to visually display and interpret) decision rules that maximize predictive accuracy, derives exact P values via permutation tests, and evaluates model cross-generalizability. Using empirical data, we identify all statistically valid, reproducible, longitudinally consistent, and cross-generalizable CTA survival models and then compare their predictive accuracy to estimates derived via Cox regression and an unadjusted naïve model. Model performance is assessed using integrated Brier scores and a comparison between estimated survival curves. The Cox regression model best predicts average incidence of the outcome over time, whereas CTA survival models best predict either relatively high, or low, incidence of the outcome over time. Classification tree analysis survival models offer many advantages over Cox regression, such as explicit maximization of predictive accuracy, parsimony, statistical robustness, and transparency. Therefore, researchers interested in accurate prognoses and clear decision rules should consider developing models using the CTA-survival framework. © 2017 John Wiley & Sons, Ltd.
Gaskin, Cadeyrn J; Happell, Brenda
2014-05-01
To (a) assess the statistical power of nursing research to detect small, medium, and large effect sizes; (b) estimate the experiment-wise Type I error rate in these studies; and (c) assess the extent to which (i) a priori power analyses, (ii) effect sizes (and interpretations thereof), and (iii) confidence intervals were reported. Statistical review. Papers published in the 2011 volumes of the 10 highest ranked nursing journals, based on their 5-year impact factors. Papers were assessed for statistical power, control of experiment-wise Type I error, reporting of a priori power analyses, reporting and interpretation of effect sizes, and reporting of confidence intervals. The analyses were based on 333 papers, from which 10,337 inferential statistics were identified. The median power to detect small, medium, and large effect sizes was .40 (interquartile range [IQR]=.24-.71), .98 (IQR=.85-1.00), and 1.00 (IQR=1.00-1.00), respectively. The median experiment-wise Type I error rate was .54 (IQR=.26-.80). A priori power analyses were reported in 28% of papers. Effect sizes were routinely reported for Spearman's rank correlations (100% of papers in which this test was used), Poisson regressions (100%), odds ratios (100%), Kendall's tau correlations (100%), Pearson's correlations (99%), logistic regressions (98%), structural equation modelling/confirmatory factor analyses/path analyses (97%), and linear regressions (83%), but were reported less often for two-proportion z tests (50%), analyses of variance/analyses of covariance/multivariate analyses of variance (18%), t tests (8%), Wilcoxon's tests (8%), Chi-squared tests (8%), and Fisher's exact tests (7%), and not reported for sign tests, Friedman's tests, McNemar's tests, multi-level models, and Kruskal-Wallis tests. Effect sizes were infrequently interpreted. Confidence intervals were reported in 28% of papers. The use, reporting, and interpretation of inferential statistics in nursing research need substantial improvement. Most importantly, researchers should abandon the misleading practice of interpreting the results from inferential tests based solely on whether they are statistically significant (or not) and, instead, focus on reporting and interpreting effect sizes, confidence intervals, and significance levels. Nursing researchers also need to conduct and report a priori power analyses, and to address the issue of Type I experiment-wise error inflation in their studies. Crown Copyright © 2013. Published by Elsevier Ltd. All rights reserved.
Crane, Paul K; Gibbons, Laura E; Jolley, Lance; van Belle, Gerald
2006-11-01
We present an ordinal logistic regression model for identification of items with differential item functioning (DIF) and apply this model to a Mini-Mental State Examination (MMSE) dataset. We employ item response theory ability estimation in our models. Three nested ordinal logistic regression models are applied to each item. Model testing begins with examination of the statistical significance of the interaction term between ability and the group indicator, consistent with nonuniform DIF. Then we turn our attention to the coefficient of the ability term in models with and without the group term. If including the group term has a marked effect on that coefficient, we declare that it has uniform DIF. We examined DIF related to language of test administration in addition to self-reported race, Hispanic ethnicity, age, years of education, and sex. We used PARSCALE for IRT analyses and STATA for ordinal logistic regression approaches. We used an iterative technique for adjusting IRT ability estimates on the basis of DIF findings. Five items were found to have DIF related to language. These same items also had DIF related to other covariates. The ordinal logistic regression approach to DIF detection, when combined with IRT ability estimates, provides a reasonable alternative for DIF detection. There appear to be several items with significant DIF related to language of test administration in the MMSE. More attention needs to be paid to the specific criteria used to determine whether an item has DIF, not just the technique used to identify DIF.
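A minimal sketch of the uniform-DIF check described above, assuming statsmodels' OrderedModel is available (recent statsmodels versions); the item responses and ability values are simulated, the change-in-coefficient criterion is left as a user-chosen threshold, and the nonuniform-DIF step (an ability-by-group interaction) is not shown.

```python
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(3)
n = 1000
ability = rng.normal(0, 1, n)                        # IRT-style ability estimate
group = rng.binomial(1, 0.5, n)                      # e.g., language of test administration
latent = 1.2 * ability + 0.6 * group + rng.logistic(0, 1, n)
item = pd.Series(pd.cut(latent, bins=[-np.inf, -1, 1, np.inf], labels=[0, 1, 2]))  # ordinal item score

m1 = OrderedModel(item, pd.DataFrame({"ability": ability}),
                  distr="logit").fit(method="bfgs", disp=False)
m2 = OrderedModel(item, pd.DataFrame({"ability": ability, "group": group}),
                  distr="logit").fit(method="bfgs", disp=False)

change = abs(m2.params["ability"] - m1.params["ability"]) / abs(m1.params["ability"])
print(f"ability coefficient without group term: {m1.params['ability']:.3f}")
print(f"ability coefficient with group term:    {m2.params['ability']:.3f}")
print(f"relative change: {change:.1%} (flag uniform DIF if it exceeds a pre-chosen threshold)")
```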
Gender differences in body consciousness and substance use among high-risk adolescents.
Black, David Scott; Sussman, Steve; Unger, Jennifer; Pokhrel, Pallav; Sun, Ping
2010-08-01
This study explores the association between private and public body consciousness and past 30-day cigarette, alcohol, marijuana, and hard drug use among adolescents. Self-reported data from alternative high school students in California were analyzed (N = 976) using multilevel regression models to account for student clustering within schools. Separate regression analyses were conducted for males and females. Both cross-sectional baseline data and one-year longitudinal prediction models indicated that body consciousness is associated with specific drug use categories differentially by gender. Findings suggest that body consciousness accounts for additional variance in substance use etiology not explained by previously recognized dispositional variables.
Application of conditional moment tests to model checking for generalized linear models.
Pan, Wei
2002-06-01
Generalized linear models (GLMs) are increasingly being used in daily data analysis. However, model checking for GLMs with correlated discrete response data remains difficult. In this paper, through a case study on marginal logistic regression using a real data set, we illustrate the flexibility and effectiveness of using conditional moment tests (CMTs), along with other graphical methods, to do model checking for generalized estimating equation (GEE) analyses. Although CMTs provide an array of powerful diagnostic tests for model checking, they were originally proposed in the econometrics literature and, to our knowledge, have never been applied to GEE analyses. CMTs cover many existing tests, including the (generalized) score test for an omitted covariate, as special cases. In summary, we believe that CMTs provide a class of useful model checking tools.
NASA Astrophysics Data System (ADS)
Yilmaz, Işık
2009-06-01
The purpose of this study is to compare the landslide susceptibility mapping methods of frequency ratio (FR), logistic regression and artificial neural networks (ANN) applied in Kat County (Tokat, Turkey). A digital elevation model (DEM) was first constructed using GIS software. Landslide-related factors such as geology, faults, drainage system, topographical elevation, slope angle, slope aspect, topographic wetness index (TWI) and stream power index (SPI) were used in the landslide susceptibility analyses. Landslide susceptibility maps were produced from the frequency ratio, logistic regression and neural network models and were then compared by means of their validations. The accuracies of the susceptibility maps for all three models were obtained by comparing them with the known landslide locations. Respective area under curve (AUC) values of 0.826, 0.842 and 0.852 for frequency ratio, logistic regression and artificial neural networks showed that the map obtained from the ANN model is the most accurate, although the accuracies of all three models can be considered relatively similar. The results obtained in this study also showed that the frequency ratio model can be used as a simple tool in the assessment of landslide susceptibility when a sufficient amount of data is available. Input, calculations and output are very simple and readily understood in the frequency ratio model, whereas logistic regression and neural networks require the conversion of data to ASCII or other formats, and it is also very hard to process large amounts of data in the statistical package.
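The frequency ratio calculation itself is simple; a sketch on synthetic data (one categorical factor, illustrative class labels and probabilities) is shown below. The FR of a class is the share of landslide cells falling in that class divided by the share of all cells in that class, with values above 1 indicating susceptibility.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
n = 20000
slope_class = rng.choice(["0-10", "10-20", "20-30", ">30"], size=n, p=[0.4, 0.3, 0.2, 0.1])
p_slide = pd.Series({"0-10": 0.01, "10-20": 0.02, "20-30": 0.05, ">30": 0.08})  # synthetic rates
landslide = rng.binomial(1, p_slide.loc[slope_class].to_numpy())

cells = pd.crosstab(slope_class, landslide)                  # rows: slope classes, cols: 0/1
share_slide = cells[1] / cells[1].sum()                      # share of landslide cells per class
share_all = cells.sum(axis=1) / cells.to_numpy().sum()       # share of all cells per class
fr = (share_slide / share_all).rename("frequency_ratio")
print(fr.round(2))                                           # FR > 1 marks susceptible classes
```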
Perfectionism, Shame, and Depressive Symptoms
ERIC Educational Resources Information Center
Ashby, Jeffrey S.; Rice, Kenneth G.; Martin, James L.
2006-01-01
The authors examined the relationship between depression, maladaptive perfectionism, and shame. Regression analyses were used to replicate a model in which maladaptive perfectionism was negatively associated with self-esteem and positively associated with symptoms of depression, with self-esteem mediating the effects of maladaptive perfectionism…
A Clinical Decision Support System for Breast Cancer Patients
NASA Astrophysics Data System (ADS)
Fernandes, Ana S.; Alves, Pedro; Jarman, Ian H.; Etchells, Terence A.; Fonseca, José M.; Lisboa, Paulo J. G.
This paper proposes a Web clinical decision support system for clinical oncologists and for breast cancer patients making prognostic assessments, using the particular characteristics of the individual patient. This system comprises three different prognostic modelling methodologies: the clinically widely used Nottingham prognostic index (NPI), Cox regression modelling, and a partial logistic artificial neural network with automatic relevance determination (PLANN-ARD). Each model yields a different prognostic index, and the indices can be analysed together to obtain a more accurate prognostic assessment of the patient. Missing data, a common issue in medical data, are incorporated in these models using multiple imputation techniques. Risk group assignments are also provided through a methodology based on regression trees, in which Boolean rules expressed in terms of patient characteristics can be obtained.
Demand analysis of flood insurance by using logistic regression model and genetic algorithm
NASA Astrophysics Data System (ADS)
Sidi, P.; Mamat, M. B.; Sukono; Supian, S.; Putra, A. S.
2018-03-01
Floods of the Citarum River in the South Bandung area of Indonesia often damage buildings belonging to the people living in the vicinity. One way to alleviate the risk of building damage is to have flood insurance, but not all people in the Citarum basin decide to buy it. In this paper, we analyse the decision to buy flood insurance. It is assumed that eight variables influence the purchase decision: income level, education level, distance of the house from the river, elevation of the building relative to the road, experienced flood frequency, flood prediction, perception of the insurance company, and perception of government efforts in handling floods. The analysis was done using a logistic regression model, and the model parameters were estimated with a genetic algorithm. The results of the analysis show that the eight variables analysed significantly influence the demand for flood insurance. These results are expected to help insurance companies influence the decision of the community to buy flood insurance.
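A hedged sketch of the estimation strategy on simulated data: logistic regression coefficients are sought by a simple genetic algorithm whose fitness function is the Bernoulli log-likelihood. The number of predictors, population size, and GA settings are illustrative and not taken from the paper.

```python
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(5)
n, p = 400, 3                                   # e.g., income, distance to river, flood experience
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta_true = np.array([-0.5, 0.8, -0.6, 0.4])
y = rng.binomial(1, expit(X @ beta_true))

def loglik(beta):
    eta = X @ beta
    return np.sum(y * eta - np.log1p(np.exp(eta)))        # Bernoulli log-likelihood (fitness)

pop = rng.normal(0, 1, size=(60, p + 1))                  # initial population of coefficient vectors
for gen in range(300):
    fit = np.array([loglik(b) for b in pop])
    parents = pop[np.argsort(fit)[-30:]]                  # keep the fittest half
    idx = rng.integers(0, 30, size=(60, 2))
    mask = rng.random((60, p + 1)) < 0.5                  # uniform crossover between parent pairs
    children = np.where(mask, parents[idx[:, 0]], parents[idx[:, 1]])
    children += rng.normal(0, 0.05, children.shape)       # mutation
    pop = children

best = pop[np.argmax([loglik(b) for b in pop])]
print("GA estimate:", np.round(best, 2), " true:", beta_true)
```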
Random regression analyses using B-spline functions to model growth of Nellore cattle.
Boligon, A A; Mercadante, M E Z; Lôbo, R B; Baldi, F; Albuquerque, L G
2012-02-01
The objective of this study was to estimate (co)variance components using random regression on B-spline functions for weight records obtained from birth to adulthood. A total of 82,064 weight records of 8145 females, obtained from the data bank of the Nellore Breeding Program (PMGRN/Nellore Brazil), which started in 1987, were used. The models included direct additive and maternal genetic effects and animal and maternal permanent environmental effects as random. Contemporary group and dam age at calving (linear and quadratic effect) were included as fixed effects, and orthogonal Legendre polynomials of age (cubic regression) were considered as random covariates. The random effects were modeled using B-spline functions considering linear, quadratic and cubic polynomials for each individual segment. Residual variances were grouped in five age classes. Direct additive genetic and animal permanent environmental effects were modeled using up to seven knots (six segments). A single segment with two knots at the end points of the curve was used for the estimation of maternal genetic and maternal permanent environmental effects. A total of 15 models were studied, with the number of parameters ranging from 17 to 81. The models that used B-splines were compared with multi-trait analyses with nine weight traits and with a random regression model that used orthogonal Legendre polynomials. A model fitting quadratic B-splines, with four knots or three segments for the direct additive genetic and animal permanent environmental effects and two knots for the maternal additive genetic and maternal permanent environmental effects, was the most appropriate and parsimonious model to describe the covariance structure of the data. Selection for higher weight, such as at young ages, should be performed taking into account an increase in mature cow weight. This is particularly important in most Nellore beef cattle production systems, where the cow herd is maintained on range conditions. There is limited scope to modify the growth curve of Nellore cattle with respect to the aim of selecting for rapid growth at young ages while maintaining constant adult weight.
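For readers unfamiliar with the basis functions involved, the sketch below builds a quadratic B-spline design matrix over age with four knots (three segments), the structure selected above for the direct genetic and animal permanent environmental effects; the knot placement, age grid, and use of SciPy are illustrative assumptions rather than the study's setup.

```python
import numpy as np
from scipy.interpolate import BSpline

age = np.linspace(1, 96, 200)                  # months from birth to adulthood (synthetic grid)
k = 2                                          # quadratic B-splines, as in the selected model
knots = np.linspace(0, 100, 4)                 # 4 knots -> 3 segments
t = np.r_[[knots[0]] * k, knots, [knots[-1]] * k]   # clamped knot vector
n_basis = len(t) - k - 1

# evaluate each basis function by giving it a unit coefficient vector
Z = np.column_stack([BSpline(t, np.eye(n_basis)[j], k)(age) for j in range(n_basis)])
print(Z.shape)                                 # design matrix: one column per quadratic B-spline
```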
NASA Astrophysics Data System (ADS)
Jokar Arsanjani, Jamal; Helbich, Marco; Kainz, Wolfgang; Darvishi Boloorani, Ali
2013-04-01
This research analyses the suburban expansion in the metropolitan area of Tehran, Iran. A hybrid model consisting of a logistic regression model, Markov chain (MC), and cellular automata (CA) was designed to improve the performance of the standard logistic regression model. Environmental and socio-economic variables dealing with urban sprawl were operationalised to create a probability surface of spatiotemporal states of built-up land use for the years 2006, 2016, and 2026. For validation, the model was evaluated by means of relative operating characteristic values for different sets of variables. The approach was calibrated for 2006 by cross-comparison of actual and simulated land use maps. The achieved outcomes represent a match of 89% between simulated and actual maps of 2006, which was satisfactory to approve the calibration process. Thereafter, the calibrated hybrid approach was implemented for forthcoming years. Finally, future land use maps for 2016 and 2026 were predicted by means of this hybrid approach. The simulated maps illustrate a new wave of suburban development in the vicinity of Tehran at the western border of the metropolis during the next decades.
Weeks, Justin W
2015-01-01
Wang, Hsu, Chiu, and Liang (2012, Journal of Anxiety Disorders, 26, 215-224) recently proposed a hierarchical model of social interaction anxiety and depression to account for both the commonalities and distinctions between these conditions. In the present paper, this model was extended to more broadly encompass the symptoms of social anxiety disorder, and replicated in a large unselected, undergraduate sample (n = 585). Structural equation modeling (SEM) and hierarchical regression analyses were employed. Negative affect and positive affect were conceptualized as general factors shared by social anxiety and depression; fear of negative evaluation (FNE) and disqualification of positive social outcomes were operationalized as specific factors, and fear of positive evaluation (FPE) was operationalized as a factor unique to social anxiety. This extended hierarchical model explicates structural relationships among these factors, in which the higher-level, general factors (i.e., high negative affect and low positive affect) represent vulnerability markers of both social anxiety and depression, and the lower-level factors (i.e., FNE, disqualification of positive social outcomes, and FPE) are the dimensions of specific cognitive features. Results from SEM and hierarchical regression analyses converged in support of the extended model. FPE is further supported as a key symptom that differentiates social anxiety from depression.
Vieira, Rute; McDonald, Suzanne; Araújo-Soares, Vera; Sniehotta, Falko F; Henderson, Robin
2017-09-01
N-of-1 studies are based on repeated observations within an individual or unit over time and are acknowledged as an important research method for generating scientific evidence about the health or behaviour of an individual. Statistical analyses of n-of-1 data require accurate modelling of the outcome while accounting for its distribution, time-related trend and error structures (e.g., autocorrelation) as well as reporting readily usable contextualised effect sizes for decision-making. A number of statistical approaches have been documented but no consensus exists on which method is most appropriate for which type of n-of-1 design. We discuss the statistical considerations for analysing n-of-1 studies and briefly review some currently used methodologies. We describe dynamic regression modelling as a flexible and powerful approach, adaptable to different types of outcomes and capable of dealing with the different challenges inherent to n-of-1 statistical modelling. Dynamic modelling borrows ideas from longitudinal and event history methodologies which explicitly incorporate the role of time and the influence of past on future. We also present an illustrative example of the use of dynamic regression on monitoring physical activity during the retirement transition. Dynamic modelling has the potential to expand researchers' access to robust and user-friendly statistical methods for individualised studies.
Dorazio, Robert M.
2012-01-01
Several models have been developed to predict the geographic distribution of a species by combining measurements of covariates of occurrence at locations where the species is known to be present with measurements of the same covariates at other locations where species occurrence status (presence or absence) is unknown. In the absence of species detection errors, spatial point-process models and binary-regression models for case-augmented surveys provide consistent estimators of a species’ geographic distribution without prior knowledge of species prevalence. In addition, these regression models can be modified to produce estimators of species abundance that are asymptotically equivalent to those of the spatial point-process models. However, if species presence locations are subject to detection errors, neither class of models provides a consistent estimator of covariate effects unless the covariates of species abundance are distinct and independently distributed from the covariates of species detection probability. These analytical results are illustrated using simulation studies of data sets that contain a wide range of presence-only sample sizes. Analyses of presence-only data of three avian species observed in a survey of landbirds in western Montana and northern Idaho are compared with site-occupancy analyses of detections and nondetections of these species.
Non-proportional odds multivariate logistic regression of ordinal family data.
Zaloumis, Sophie G; Scurrah, Katrina J; Harrap, Stephen B; Ellis, Justine A; Gurrin, Lyle C
2015-03-01
Methods to examine whether genetic and/or environmental sources can account for the residual variation in ordinal family data usually assume proportional odds. However, standard software to fit the non-proportional odds model to ordinal family data is limited because the correlation structure of family data is more complex than for other types of clustered data. To perform these analyses we propose the non-proportional odds multivariate logistic regression model and take a simulation-based approach to model fitting using Markov chain Monte Carlo methods, such as partially collapsed Gibbs sampling and the Metropolis algorithm. We applied the proposed methodology to male pattern baldness data from the Victorian Family Heart Study. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Tu, Yu-Kang; Krämer, Nicole; Lee, Wen-Chung
2012-07-01
In the analysis of trends in health outcomes, an ongoing issue is how to separate and estimate the effects of age, period, and cohort. As these 3 variables are perfectly collinear by definition, regression coefficients in a general linear model are not unique. In this tutorial, we review why identification is a problem, and how this problem may be tackled using partial least squares and principal components regression analyses. Both methods produce regression coefficients that fulfill the same collinearity constraint as the variables age, period, and cohort. We show that, because the constraint imposed by partial least squares and principal components regression is inherent in the mathematical relation among the 3 variables, this leads to more interpretable results. We use one dataset from a Taiwanese health-screening program to illustrate how to use partial least squares regression to analyze the trends in body heights with 3 continuous variables for age, period, and cohort. We then use another dataset of hepatocellular carcinoma mortality rates for Taiwanese men to illustrate how to use partial least squares regression to analyze tables with aggregated data. We use the second dataset to show the relation between the intrinsic estimator, a recently proposed method for the age-period-cohort analysis, and partial least squares regression. We also show that the inclusion of all indicator variables provides a more consistent approach. R code for our analyses is provided in the eAppendix.
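A minimal sketch of the principal components regression route on synthetic age-period-cohort data: the exactly collinear design is decomposed, the null component is dropped, and the component coefficients are mapped back to the age, period, and cohort scale, yielding estimates that fulfil the collinearity constraint. Data and effect sizes are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 3000
age = rng.integers(20, 80, n).astype(float)
period = rng.integers(1990, 2011, n).astype(float)
cohort = period - age                           # exact collinearity: cohort = period - age
y = 150 + 0.3 * age + 0.1 * (period - 1990) + 0.05 * (cohort - 1950) + rng.normal(0, 2, n)

X = np.column_stack([age, period, cohort])
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
keep = s > 1e-8 * s[0]                          # drop the null component caused by collinearity
scores = Xc @ Vt[keep].T
gamma = np.linalg.lstsq(scores, y - y.mean(), rcond=None)[0]
beta = Vt[keep].T @ gamma                       # constrained APC coefficients on the original scale
print(np.round(beta, 3))                        # age, period, cohort coefficients under the constraint
```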
ERIC Educational Resources Information Center
Luna, Andrew L.
2007-01-01
This study used two multiple regression analyses to develop an explanatory model to determine which model might best explain faculty salaries. The central purpose of the study was to determine if using a single market ratio variable was a stronger predictor for faculty salaries than the use of dummy variables representing various disciplines.…
Fossati, Andrea; Widiger, Thomas A; Borroni, Serena; Maffei, Cesare; Somma, Antonella
2017-06-01
To extend the evidence on the reliability and construct validity of the Five-Factor Model Rating Form (FFMRF) in its self-report version, two independent samples of Italian participants, which were composed of 510 adolescent high school students and 457 community-dwelling adults, respectively, were administered the FFMRF in its Italian translation. Adolescent participants were also administered the Italian translation of the Borderline Personality Features Scale for Children-11 (BPFSC-11), whereas adult participants were administered the Italian translation of the Triarchic Psychopathy Measure (TriPM). Cronbach α values were consistent with previous findings; in both samples, average interitem r values indicated acceptable internal consistency for all FFMRF scales. A multidimensional graded item response theory model indicated that the majority of FFMRF items had adequate discrimination parameters; information indices supported the reliability of the FFMRF scales. Both categorical (i.e., item-level) and scale-level regression analyses suggested that the FFMRF scores may predict a nonnegligible amount of variance in the BPFSC-11 total score in adolescent participants, and in the TriPM scale scores in adult participants.
On the use of log-transformation vs. nonlinear regression for analyzing biological power laws.
Xiao, Xiao; White, Ethan P; Hooten, Mevin B; Durham, Susan L
2011-10-01
Power-law relationships are among the most well-studied functional relationships in biology. Recently the common practice of fitting power laws using linear regression (LR) on log-transformed data has been criticized, calling into question the conclusions of hundreds of studies. It has been suggested that nonlinear regression (NLR) is preferable, but no rigorous comparison of these two methods has been conducted. Using Monte Carlo simulations, we demonstrate that the error distribution determines which method performs better, with NLR better characterizing data with additive, homoscedastic, normal error and LR better characterizing data with multiplicative, heteroscedastic, lognormal error. Analysis of 471 biological power laws shows that both forms of error occur in nature. While previous analyses based on log-transformation appear to be generally valid, future analyses should choose methods based on a combination of biological plausibility and analysis of the error distribution. We provide detailed guidelines and associated computer code for doing so, including a model averaging approach for cases where the error structure is uncertain.
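A short sketch of the two fitting strategies the comparison is about, on simulated power-law data: linear regression on log-log data suits multiplicative lognormal error, while nonlinear least squares on the original scale suits additive normal error. Parameter values and noise levels are illustrative.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(7)
x = np.exp(rng.uniform(0, 4, 300))
a, b = 2.0, 0.75

# multiplicative lognormal error: linear regression on the log-log scale is appropriate
y_mult = a * x**b * np.exp(rng.normal(0, 0.3, x.size))
b_lr, loga_lr = np.polyfit(np.log(x), np.log(y_mult), 1)

# additive normal error: nonlinear regression on the original scale is appropriate
y_add = a * x**b + rng.normal(0, 2.0, x.size)
(a_nlr, b_nlr), _ = curve_fit(lambda x, a, b: a * x**b, x, y_add, p0=(1.0, 1.0))

print(f"LR on logs: a = {np.exp(loga_lr):.2f}, b = {b_lr:.2f}")
print(f"NLR       : a = {a_nlr:.2f}, b = {b_nlr:.2f}")
```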
Siegrist, Karin; Millier, Aurelie; Amri, Ikbal; Aballéa, Samuel; Toumi, Mondher
2015-12-30
The lack of social contacts may be an important element in the presumed vicious circle aggravating, or at least stabilising, negative symptoms in patients with schizophrenia. A European 2-year cohort study collected negative symptom scores, psychosocial functioning scores, objective social contact frequency scores and quality of life scores every 6 months. Bivariate analyses, correlation analyses, multivariate regressions and random effects regressions were conducted to describe relations between social contact and outcomes of interest and to gain a better understanding of this relation over time. Using data from 1208 patients with schizophrenia, a link between social contact frequency and negative symptom scores, functioning and quality of life at baseline was established. Regression models confirmed the significant association between social contact and negative symptoms as well as psychosocial functioning. This study aimed at demonstrating the importance of social contact for deficient behavioural aspects of schizophrenia. Copyright © 2015 The Authors. Published by Elsevier Ireland Ltd. All rights reserved.
Role of subdural electrocorticography in prediction of long-term seizure outcome in epilepsy surgery
Juhász, Csaba; Shah, Aashit; Sood, Sandeep; Chugani, Harry T.
2009-01-01
Since prediction of long-term seizure outcome using preoperative diagnostic modalities remains suboptimal in epilepsy surgery, we evaluated whether interictal spike frequency measures obtained from extraoperative subdural electrocorticography (ECoG) recording could predict long-term seizure outcome. This study included 61 young patients (age 0.4–23.0 years) who underwent extraoperative ECoG recording prior to cortical resection for alleviation of uncontrolled focal seizures. Patient age, frequency of preoperative seizures, neuroimaging findings, and ictal and interictal ECoG measures were obtained preoperatively. Seizure outcome was measured prospectively [follow-up period: 2.5–6.4 years (mean 4.6 years)]. Univariate and multivariate logistic regression analyses determined how well preoperative demographic and diagnostic measures predicted long-term seizure outcome. Following the initial cortical resection, Engel Class I, II, III and IV outcomes were noted in 35, 6, 12 and 7 patients, respectively. One child died due to disseminated intravascular coagulation associated with pseudomonas sepsis 2 days after surgery. Univariate regression analyses revealed that incomplete removal of the seizure onset zone, higher interictal spike frequency in the preserved cortex and incomplete removal of cortical abnormalities on neuroimaging were associated with a greater risk of failing to obtain Class I outcome. Multivariate logistic regression analysis revealed that incomplete removal of the seizure onset zone was the only independent predictor of failure to obtain Class I outcome. The goodness of fit and the predictive ability of the regression model were greatest in the full model incorporating both ictal and interictal measures [R2 0.44; area under the receiver operating characteristic (ROC) curve: 0.81], slightly smaller in the reduced model incorporating ictal but not interictal measures (R2 0.40; area under the ROC curve: 0.79) and slightly smaller again in the reduced model incorporating interictal but not ictal measures (R2 0.27; area under the ROC curve: 0.77). Seizure onset zone and interictal spike frequency measures on subdural ECoG recording may both be useful in predicting the long-term seizure outcome of epilepsy surgery. Yet, the additive clinical impact of interictal spike frequency measures for predicting long-term surgical outcome may be modest in the presence of ictal ECoG and neuroimaging data. PMID:19286694
Silva, F G; Torres, R A; Brito, L F; Euclydes, R F; Melo, A L P; Souza, N O; Ribeiro, J I; Rodrigues, M T
2013-12-11
The objective of this study was to identify the best random regression model using Legendre orthogonal polynomials for the genetic evaluation of Alpine goats and to estimate parameters for test-day milk yield. We analyzed 20,710 test-day milk yield records of 667 goats from the Goat Sector of the Universidade Federal de Viçosa. The evaluated models combined distinct fitting orders for the fixed curve (2-5), the random genetic curve (1-7), and the permanent environmental curve (1-7), and different numbers of residual variance classes (2, 4, 5, and 6). WOMBAT software was used for all genetic analyses. The best random regression model using Legendre orthogonal polynomials for genetic evaluation of test-day milk yield of Alpine goats considered a fixed curve of order 4, a curve of additive genetic effects of order 2, a curve of permanent environmental effects of order 7, and a minimum of 5 classes of residual variance, because it was the most economical model among those that were equivalent to the complete model by the likelihood ratio test. Phenotypic variance and heritability were higher at the end of the lactation period, indicating that the length of lactation has more genetic components relative to the production peak and persistence. It is very important that the evaluation uses the best combination of fixed, additive genetic and permanent environmental regressions and the best number of classes of heterogeneous residual variance for genetic evaluation using random regression models, thereby enhancing the precision and accuracy of the parameter estimates and the prediction of genetic values.
Assessing Spurious Interaction Effects in Structural Equation Modeling
ERIC Educational Resources Information Center
Harring, Jeffrey R.; Weiss, Brandi A.; Li, Ming
2015-01-01
Several studies have stressed the importance of simultaneously estimating interaction and quadratic effects in multiple regression analyses, even if theory only suggests an interaction effect should be present. Specifically, past studies suggested that failing to simultaneously include quadratic effects when testing for interaction effects could…
Moeyaert, Mariola; Ugille, Maaike; Ferron, John M; Beretvas, S Natasha; Van den Noortgate, Wim
2014-09-01
The quantitative methods for analyzing single-subject experimental data have expanded during the last decade, including the use of regression models to statistically analyze the data, but still a lot of questions remain. One question is how to specify predictors in a regression model to account for the specifics of the design and estimate the effect size of interest. These quantitative effect sizes are used in retrospective analyses and allow synthesis of single-subject experimental study results which is informative for evidence-based decision making, research and theory building, and policy discussions. We discuss different design matrices that can be used for the most common single-subject experimental designs (SSEDs), namely, the multiple-baseline designs, reversal designs, and alternating treatment designs, and provide empirical illustrations. The purpose of this article is to guide single-subject experimental data analysts interested in analyzing and meta-analyzing SSED data. © The Author(s) 2014.
Zhang, Jingyi; Li, Bin; Chen, Yumin; Chen, Meijie; Fang, Tao; Liu, Yongfeng
2018-06-11
This paper proposes a regression model using the Eigenvector Spatial Filtering (ESF) method to estimate ground PM2.5 concentrations. Covariates are derived from remotely sensed data including aerosol optical depth, normalized difference vegetation index, surface temperature, air pressure, relative humidity, height of the planetary boundary layer and a digital elevation model. In addition, cultural variables such as factory densities and road densities are also used in the model. With the Yangtze River Delta region as the study area, we constructed ESF-based Regression (ESFR) models at different time scales, using data for the period between December 2015 and November 2016. We found that the ESFR models effectively filtered spatial autocorrelation in the OLS residuals and resulted in increases in the goodness-of-fit metrics as well as reductions in residual standard errors and cross-validation errors, compared to the classic OLS models. The annual ESFR model explained 70% of the variability in PM2.5 concentrations, 16.7% more than the non-spatial OLS model. With the ESFR models, we performed detailed analyses of the spatial and temporal distributions of PM2.5 concentrations in the study area. The model predictions are lower than ground observations but match the general trend. The experiment shows that ESFR provides a promising approach to PM2.5 analysis and prediction.
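A hedged sketch of the eigenvector spatial filtering idea on synthetic data: eigenvectors of the doubly centred spatial weights matrix (Moran eigenvectors) are added as covariates so that spatially structured residual variation is absorbed, raising R2 relative to plain OLS. The weights matrix, covariate name (aerosol optical depth), and effect sizes are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 100
coords = rng.uniform(0, 1, size=(n, 2))                        # monitoring sites
d = np.linalg.norm(coords[:, None] - coords[None, :], axis=2)
W = (d < 0.2).astype(float)
np.fill_diagonal(W, 0.0)                                       # binary neighbourhood weights

M = np.eye(n) - np.ones((n, n)) / n
evals, evecs = np.linalg.eigh(M @ W @ M)                       # Moran eigenvectors
order = np.argsort(evals)[::-1]
E = evecs[:, order[:10]]                                       # candidate filters: largest eigenvalues

aod = rng.normal(0.5, 0.1, n)                                  # e.g., aerosol optical depth
pm25 = 30 + 40 * aod + 5 * E[:, 0] + rng.normal(0, 2, n)       # spatially structured residual (synthetic)

X_ols = np.column_stack([np.ones(n), aod])
X_esf = np.column_stack([X_ols, E])
for name, X in [("OLS", X_ols), ("ESFR", X_esf)]:
    beta = np.linalg.lstsq(X, pm25, rcond=None)[0]
    r2 = 1 - np.sum((pm25 - X @ beta) ** 2) / np.sum((pm25 - pm25.mean()) ** 2)
    print(f"{name}: R^2 = {r2:.3f}")
```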
Straub, David E.; Ebner, Andrew D.
2011-01-01
The USGS, in cooperation with the Chippewa Subdistrict of the Muskingum Watershed Conservancy District, performed hydrologic and hydraulic analyses for selected reaches of three streams in Medina, Wayne, Stark, and Summit Counties in northeast Ohio: Chippewa Creek, Little Chippewa Creek, and River Styx. This study was done to facilitate assessment of various alternatives for mitigating flood hazards in the Chippewa Creek basin. StreamStats regional regression equations were used to estimate instantaneous peak discharges approximately corresponding to bankfull flows. Explanatory variables used in the regression equations were drainage area, main-channel slope, and storage area. Hydraulic models were developed to determine water-surface profiles along the three stream reaches studied for the bankfull discharges established in the hydrologic analyses. The HEC-RAS step-backwater hydraulic analysis model was used to determine water-surface profiles for the three streams. Starting water-surface elevations for all streams were established using normal depth computations in the HEC-RAS models. Cross-sectional elevation data, hydraulic-structure geometries, and roughness coefficients were collected in the field and (along with peak-discharge estimates) used as input for the models. Reach-averaged reductions in water-surface elevations ranged from 0.11 to 1.29 feet over the four roughness coefficient reduction scenarios.
Harris, Katherine M.; Koenig, Harold G.; Han, Xiaotong; Sullivan, Greer; Mattox, Rhonda; Tang, Lingqi
2009-01-01
Objective The negative association between religiosity (religious beliefs and church attendance) and the likelihood of substance use disorders is well established, but the mechanism(s) remain poorly understood. We investigated whether this association was mediated by social support or mental health status. Method We utilized cross-sectional data from the 2002 National Survey on Drug Use and Health (n = 36,370). We first used logistic regression to regress any alcohol use in the past year on sociodemographic and religiosity variables. Then, among individuals who drank in the past year, we regressed past year alcohol abuse/dependence on sociodemographic and religiosity variables. To investigate whether social support mediated the association between religiosity and alcohol use and alcohol abuse/dependence we repeated the above models, adding the social support variables. To the extent that these added predictors modified the magnitude of the effect of the religiosity variables, we interpreted social support as a possible mediator. We also formally tested for mediation using path analysis. We investigated the possible mediating role of mental health status analogously. Parallel sets of analyses were conducted for any drug use, and drug abuse/dependence among those using any drugs as the dependent variables. Results The addition of social support and mental health status variables to logistic regression models had little effect on the magnitude of the religiosity coefficients in any of the models. While some of the tests of mediation were significant in the path analyses, the results were not always in the expected direction, and the magnitude of the effects was small. Conclusions The association between religiosity and decreased likelihood of a substance use disorder does not appear to be substantively mediated by either social support or mental health status. PMID:19714282
Demographic responses of Pinguicula ionantha to prescribed fire: a regression-design LTRE approach.
Kesler, Herbert C; Trusty, Jennifer L; Hermann, Sharon M; Guyer, Craig
2008-06-01
This study describes the use of periodic matrix analysis and regression-design life table response experiments (LTRE) to investigate the effects of prescribed fire on demographic responses of Pinguicula ionantha, a federally listed plant endemic to the herb bog/savanna community in north Florida. Multi-state mark-recapture models with dead recoveries were used to estimate survival and transition probabilities for over 2,300 individuals in 12 populations of P. ionantha. These estimates were applied to parameterize matrix models used in further analyses. P. ionantha demographics were found to be strongly dependent on prescribed fire events. Periodic matrix models were used to evaluate season of burn (either growing or dormant season) for fire return intervals ranging from 1 to 20 years. Annual growing and biannual dormant season fires maximized population growth rates for this species. A regression design LTRE was used to evaluate the effect of number of days since last fire on population growth. Maximum population growth rates calculated using standard asymptotic analysis were realized shortly following a burn event (<2 years), and a regression design LTRE showed that short-term fire-mediated changes in vital rates translated into observed increases in population growth. The LTRE identified fecundity and individual growth as contributing most to increases in post-fire population growth. Our analyses found that the current four-year prescribed fire return intervals used at the study sites can be significantly shortened to increase the population growth rates of this rare species. Understanding the role of fire frequency and season in creating and maintaining appropriate habitat for this species may aid in the conservation of this and other rare herb bog/savanna inhabitants.
Life Satisfaction and Violent Behaviors among Middle School Students
ERIC Educational Resources Information Center
Valois, Robert F.; Paxton, Raheem J.; Zullig, Keith J.; Huebner, E. Scott
2006-01-01
We explored relationships between violent behaviors and perceived life satisfaction among 2,138 middle school students in a southern state using the CDC Middle School Youth Risk Behavior Survey (MSYRBS) and the Brief Multidimensional Student Life Satisfaction Scale (BMSLSS). Logistic regression analyses and multivariate models constructed…
Paranormal belief, experience, and the Keirsey Temperament Sorter.
Fox, J; Williams, C
2000-06-01
121 college students completed the Anomalous Experience Inventory and the Keirsey Temperament Sorter. Multiple regression analyses provided significant models predicting both Paranormal Experience and Belief; the main predictors were the other subscales of the Anomalous Experience Inventory with the Keirsey variables playing only a minor role.
Effect of motivational interviewing on rates of early childhood caries: a randomized trial.
Harrison, Rosamund; Benton, Tonya; Everson-Stewart, Siobhan; Weinstein, Phil
2007-01-01
The purposes of this randomized controlled trial were to: (1) test motivational interviewing (MI) to prevent early childhood caries; and (2) use Poisson regression for data analysis. A total of 240 South Asian children 6 to 18 months old were enrolled and randomly assigned to either the MI or control condition. Children had a dental exam, and their mothers completed pretested instruments at baseline and 1 and 2 years postintervention. Other covariates that might explain outcomes over and above treatment differences were modeled using Poisson regression. Hazard ratios were produced. Analyses included all participants whenever possible. Poisson regression supported a protective effect of MI (hazard ratio [HR]=0.54; 95% CI=0.35-0.84); that is, the MI group had about a 46% lower rate of dmfs at 2 years than did control children. Similar treatment effect estimates were obtained from models that included, as alternative outcomes, ds, dms, and dmfs, including "white spot lesions." Exploratory analyses revealed that rates of dmfs were higher in children whose mothers had: (1) prechewed their food; (2) been raised in a rural environment; and (3) a higher family income (P<.05). A motivational interviewing-style intervention shows promise to promote preventive behaviors in mothers of young children at high risk for caries.
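A small sketch of the analysis model on simulated data, assuming statsmodels: dmfs counts are modelled by Poisson regression with follow-up time as the exposure, and the exponentiated treatment coefficient plays the role of the rate-ratio effect reported above. Sample size and rates are illustrative.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
n = 240
mi = rng.binomial(1, 0.5, n)                              # 1 = motivational interviewing group
followup = rng.uniform(1.5, 2.0, n)                       # years at risk
rate = 2.0 * np.exp(-0.6 * mi)                            # MI group given a lower dmfs rate (synthetic)
dmfs = rng.poisson(rate * followup)

X = sm.add_constant(mi)
model = sm.GLM(dmfs, X, family=sm.families.Poisson(), exposure=followup).fit()
print("rate ratio for MI vs control:", round(float(np.exp(model.params[1])), 2))
```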
The Application of Censored Regression Models in Low Streamflow Analyses
NASA Astrophysics Data System (ADS)
Kroll, C.; Luz, J.
2003-12-01
Estimation of low streamflow statistics at gauged and ungauged river sites is often a daunting task. This process is further confounded by the presence of intermittent streamflows, where streamflow is sometimes reported as zero, within a region. Streamflows recorded as zero may truly be zero or may be less than the measurement detection limit; such data are often referred to as censored data. Numerous methods have been developed to characterize intermittent streamflow series. Logit regression has been proposed to develop regional models of the probability that annual lowflow series (such as 7-day lowflows) are zero. In addition, Tobit regression, a method of regression that allows for censored dependent variables, has been proposed for regional lowflow regression models in regions where the lowflow statistic of interest is estimated as zero at some sites. While these methods have been proposed, their use in practice has been limited. Here a delete-one jackknife simulation is presented to examine the performance of Logit and Tobit models of 7-day annual minimum flows in 6 USGS water resource regions in the United States. For the Logit model, an assessment is made of whether sites are correctly classified as having at least 10% of 7-day annual lowflows equal to zero. In such a situation, the 7-day, 10-year lowflow (Q710), a commonly employed low streamflow statistic, would be reported as zero. For the Tobit model, a comparison is made between results from the Tobit model and from performing either ordinary least squares (OLS) or principal component regression (PCR) after the zero sites are dropped from the analysis. Initial results indicate that the Logit model has a high probability of correctly classifying sites into groups with Q710 values of zero and non-zero. Initial results also indicate that the Tobit model produces better results than PCR and OLS when more than 5% of the sites in the region have Q710 values calculated as zero.
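A hedged sketch of a Tobit fit for left-censored lowflow data: the likelihood combines normal density terms for observed values with normal CDF terms for sites reported as zero. The single standardized predictor and the parameter values are invented for illustration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(10)
n = 300
drainage = rng.normal(0, 1, n)                            # standardized basin characteristic
latent = 0.5 + 1.2 * drainage + rng.normal(0, 1.0, n)
q710 = np.maximum(latent, 0.0)                            # lowflows reported as zero when censored
cens = q710 == 0

def negloglik(theta):
    b0, b1, log_s = theta
    s = np.exp(log_s)
    mu = b0 + b1 * drainage
    ll_obs = norm.logpdf(q710[~cens], mu[~cens], s)       # uncensored contribution
    ll_cens = norm.logcdf(-mu[cens] / s)                  # P(latent <= 0) for censored sites
    return -(ll_obs.sum() + ll_cens.sum())

fit = minimize(negloglik, x0=np.array([0.0, 0.0, 0.0]), method="BFGS")
print("Tobit estimates (b0, b1, sigma):",
      np.round([fit.x[0], fit.x[1], np.exp(fit.x[2])], 2))
```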
High-flow oxygen therapy: pressure analysis in a pediatric airway model.
Urbano, Javier; del Castillo, Jimena; López-Herce, Jesús; Gallardo, José A; Solana, María J; Carrillo, Ángel
2012-05-01
The mechanism of high-flow oxygen therapy and the pressures reached in the airway have not been defined. We hypothesized that the flow would generate a low continuous positive pressure, and that elevated flow rates in this model could produce moderate pressures. The objective of this study was to analyze the pressure generated by a high-flow oxygen therapy system in an experimental model of the pediatric airway. An experimental in vitro study was performed. A high-flow oxygen therapy system was connected to 3 types of interface (nasal cannulae, nasal mask, and oronasal mask) and applied to 2 types of pediatric manikin (infant and neonatal). The pressures generated in the circuit, in the airway, and in the pharynx were measured at different flow rates (5, 10, 15, and 20 L/min). The experiment was conducted with and without a leak (mouth sealed and unsealed). Linear regression analyses were performed for each set of measurements. The pressures generated with the different interfaces were very similar. The maximum pressure recorded was 4 cm H(2)O with a flow of 20 L/min via nasal cannulae or nasal mask. When the mouth of the manikin was held open, the pressures reached in the airway and pharynxes were undetectable. Linear regression analyses showed a similar linear relationship between flow and pressures measured in the pharynx (pressure = -0.375 + 0.138 × flow) and in the airway (pressure = -0.375 + 0.158 × flow) with the closed mouth condition. According to our hypothesis, high-flow oxygen therapy systems produced a low-level CPAP in an experimental pediatric model, even with the use of very high flow rates. Linear regression analyses showed similar linear relationships between flow and pressures measured in the pharynx and in the airway. This finding suggests that, at least in part, the effects may be due to other mechanisms.
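As an illustration of the reported flow-pressure fits, here is a short sketch that refits a line of the form pressure = intercept + slope × flow to hypothetical bench readings generated around the published closed-mouth coefficients.

```python
# Sketch: refit the closed-mouth flow-pressure relationship reported above
# (pressure ≈ -0.375 + 0.14-0.16 × flow) from hypothetical bench measurements.
import numpy as np

flow = np.array([5, 5, 10, 10, 15, 15, 20, 20], dtype=float)   # L/min
pharynx_p = -0.375 + 0.138 * flow + np.random.default_rng(2).normal(0, 0.1, flow.size)

slope, intercept = np.polyfit(flow, pharynx_p, 1)
pred = intercept + slope * flow
r2 = 1 - np.sum((pharynx_p - pred) ** 2) / np.sum((pharynx_p - pharynx_p.mean()) ** 2)
print(f"pressure = {intercept:.3f} + {slope:.3f} x flow, R^2 = {r2:.3f}")
```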
Random regression models using different functions to model milk flow in dairy cows.
Laureano, M M M; Bignardi, A B; El Faro, L; Cardoso, V L; Tonhati, H; Albuquerque, L G
2014-09-12
We analyzed 75,555 test-day milk flow records from 2175 primiparous Holstein cows that calved between 1997 and 2005. Milk flow was obtained by dividing the mean milk yield (kg) of the 3 daily milkings by the total milking time (min) and was expressed as kg/min. Milk flow was grouped into 43 weekly classes. The analyses were performed using single-trait random regression models that included direct additive genetic, permanent environmental, and residual random effects. In addition, the contemporary group and linear and quadratic effects of cow age at calving were included as fixed effects. A fourth-order orthogonal Legendre polynomial of days in milk was used to model the mean trend in milk flow. The additive genetic and permanent environmental covariance functions were estimated using random regression on Legendre polynomials and B-spline functions of days in milk. The model using a third-order Legendre polynomial for additive genetic effects and a sixth-order polynomial for permanent environmental effects, which contained 7 residual classes, proved to be the most adequate to describe variations in milk flow, and was also the most parsimonious. The heritability of milk flow estimated by the most parsimonious model was of moderate to high magnitude.
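The fixed mean-trend part of such a model can be illustrated with an orthogonal Legendre basis. The sketch below (hypothetical milk-flow values) builds a fourth-order basis over days in milk and fits a population curve by least squares; the genetic and permanent environmental random regressions require dedicated mixed-model software and are not reproduced here.

```python
# Sketch: fourth-order Legendre basis over days in milk (weekly classes 1-43),
# as used for the fixed mean trend above. Hypothetical milk-flow data.
import numpy as np
from numpy.polynomial import legendre

dim = np.arange(7, 7 * 43 + 1, 7, dtype=float)           # days in milk, weekly classes
z = 2 * (dim - dim.min()) / (dim.max() - dim.min()) - 1  # standardize to [-1, 1]
Phi = legendre.legvander(z, 4)                            # columns: P0..P4 evaluated at z

rng = np.random.default_rng(3)
milk_flow = 2.0 + 0.5 * z - 0.3 * z**2 + rng.normal(0, 0.1, z.size)  # kg/min, simulated
coefs, *_ = np.linalg.lstsq(Phi, milk_flow, rcond=None)
print("Legendre coefficients for the mean trend:", np.round(coefs, 3))
```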
Yilmaz, Banu; Aras, Egemen; Nacar, Sinan; Kankal, Murat
2018-05-23
The functional life of a dam is often determined by the rate of sediment delivery to its reservoir. Therefore, an accurate estimate of the sediment load in rivers with dams is essential for designing and predicting a dam's useful lifespan. The most credible method is direct measurement of sediment input, but this can be very costly and it cannot always be implemented at all gauging stations. In this study, we tested various regression models to estimate suspended sediment load (SSL) at two gauging stations on the Çoruh River in Turkey, including artificial bee colony (ABC), teaching-learning-based optimization algorithm (TLBO), and multivariate adaptive regression splines (MARS). These models were also compared with one another and with classical regression analyses (CRA). Streamflow values and previously collected data of SSL were used as model inputs with predicted SSL data as output. Two different training and testing dataset configurations were used to reinforce the model accuracy. For the MARS method, the root mean square error value was found to range between 35% and 39% for the two test gauging stations, which was lower than the errors of the other models. Error values were even lower (7% to 15%) using another dataset. Our results indicate that simultaneous measurements of streamflow with SSL provide the most effective parameter for obtaining accurate predictive models and that MARS is the most accurate model for predicting SSL. Copyright © 2017 Elsevier B.V. All rights reserved.
Estimating the exceedance probability of rain rate by logistic regression
NASA Technical Reports Server (NTRS)
Chiu, Long S.; Kedem, Benjamin
1990-01-01
Recent studies have shown that the fraction of an area with rain intensity above a fixed threshold is highly correlated with the area-averaged rain rate. To estimate the fractional rainy area, a logistic regression model, which estimates the conditional probability that rain rate over an area exceeds a fixed threshold given the values of related covariates, is developed. The problem of dependency in the data in the estimation procedure is bypassed by the method of partial likelihood. Analyses of simulated scanning multichannel microwave radiometer and observed electrically scanning microwave radiometer data during the Global Atlantic Tropical Experiment period show that the use of logistic regression in pixel classification is superior to multiple regression in predicting whether rain rate at each pixel exceeds a given threshold, even in the presence of noisy data. The potential of the logistic regression technique in satellite rain rate estimation is discussed.
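A minimal sketch of the kind of exceedance-probability model described above, with hypothetical covariates standing in for the radiometer-derived predictors; the partial-likelihood treatment of dependence is not reproduced.

```python
# Sketch: logistic regression for P(rain rate > threshold | covariates),
# with hypothetical covariates and simulated exceedance indicators.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 1000
area_avg_rain = rng.gamma(2.0, 1.0, n)            # area-averaged rain rate (mm/h), hypothetical
brightness_t = rng.normal(250, 15, n)             # brightness-temperature proxy
X = sm.add_constant(np.column_stack([area_avg_rain, brightness_t]))
p = 1 / (1 + np.exp(-(-2.0 + 1.2 * area_avg_rain - 0.01 * (brightness_t - 250))))
exceeds = rng.binomial(1, p)                      # does the pixel exceed the fixed threshold?

fit = sm.Logit(exceeds, X).fit(disp=0)
print(np.round(fit.params, 3))                    # exceedance-probability model coefficients
print("predicted fractional rainy area:", round(fit.predict(X).mean(), 3))
```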
Developing a dengue forecast model using machine learning: A case study in China
Zhang, Qin; Wang, Li; Xiao, Jianpeng; Zhang, Qingying; Luo, Ganfeng; Li, Zhihao; He, Jianfeng; Zhang, Yonghui; Ma, Wenjun
2017-01-01
Background In China, dengue remains an important public health issue with expanded areas and increased incidence recently. Accurate and timely forecasts of dengue incidence in China are still lacking. We aimed to use the state-of-the-art machine learning algorithms to develop an accurate predictive model of dengue. Methodology/Principal findings Weekly dengue cases, Baidu search queries and climate factors (mean temperature, relative humidity and rainfall) during 2011–2014 in Guangdong were gathered. A dengue search index was constructed for developing the predictive models in combination with climate factors. The observed year and week were also included in the models to control for the long-term trend and seasonality. Several machine learning algorithms, including the support vector regression (SVR) algorithm, step-down linear regression model, gradient boosted regression tree algorithm (GBM), negative binomial regression model (NBM), least absolute shrinkage and selection operator (LASSO) linear regression model and generalized additive model (GAM), were used as candidate models to predict dengue incidence. Performance and goodness of fit of the models were assessed using the root-mean-square error (RMSE) and R-squared measures. The residuals of the models were examined using the autocorrelation and partial autocorrelation function analyses to check the validity of the models. The models were further validated using dengue surveillance data from five other provinces. The epidemics during the last 12 weeks and the peak of the 2014 large outbreak were accurately forecasted by the SVR model selected by a cross-validation technique. Moreover, the SVR model had the consistently smallest prediction error rates for tracking the dynamics of dengue and forecasting the outbreaks in other areas in China. Conclusion and significance The proposed SVR model achieved a superior performance in comparison with other forecasting techniques assessed in this study. The findings can help the government and community respond early to dengue epidemics. PMID:29036169
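A hedged sketch of an SVR forecast scored by RMSE, as in the model comparison above; the weekly series, covariates, and hyperparameters are invented for illustration and do not reproduce the Guangdong data or the selected model.

```python
# Sketch: SVR forecast of weekly dengue counts from lagged cases, a search index
# and a climate covariate (all simulated), scored by RMSE on a hold-out period.
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(5)
weeks = 200
temp = 22 + 6 * np.sin(2 * np.pi * np.arange(weeks) / 52) + rng.normal(0, 1, weeks)
search_idx = rng.gamma(2, 1, weeks)
lag_cases = rng.poisson(10, weeks).astype(float)
y = 5 + 0.8 * lag_cases + 2.0 * search_idx + 0.5 * (temp - 22) + rng.normal(0, 2, weeks)
X = np.column_stack([lag_cases, search_idx, temp])

train, test = slice(0, 150), slice(150, weeks)
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.5))
model.fit(X[train], y[train])
rmse = mean_squared_error(y[test], model.predict(X[test])) ** 0.5
print(f"test RMSE: {rmse:.2f}")
```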
[New method of mixed gas infrared spectrum analysis based on SVM].
Bai, Peng; Xie, Wen-Jun; Liu, Jun-Hua
2007-07-01
A new method of infrared spectrum analysis based on support vector machine (SVM) for mixture gas was proposed. The kernel function in SVM was used to map the seriously overlapping absorption spectrum into a high-dimensional space; after this transformation, the high-dimensional data could be processed in the original space, so a regression calibration model was established, and this model was then applied to analyze the concentration of each component gas. Meanwhile, it was shown that the SVM regression calibration model could also be used for component recognition of the mixture gas. The method was applied to the analysis of different data samples. Factors that affect the model, such as scan interval, wavelength range, kernel function, and penalty coefficient C, were discussed. Experimental results show that the maximal mean absolute error of component concentration is 0.132%, and the component recognition accuracy is higher than 94%. The problems of overlapping absorption spectra, of using the same method for both qualitative and quantitative analysis, and of the limited number of training samples were solved. The method could be used in other mixture gas infrared spectrum analyses, and shows promising theoretical and practical value.
Reddy, M Srinivasa; Basha, Shaik; Joshi, H V; Sravan Kumar, V G; Jha, B; Ghosh, P K
2005-01-01
Alang-Sosiya is the largest ship-scrapping yard in the world, established in 1982. Every year, an average of 171 ships with a mean weight of 2.10 x 10(6) (+/-7.82 x 10(5)) light dead weight tonnage (LDT) are scrapped. Apart from scrapped metals, this yard generates a massive amount of combustible solid waste in the form of waste wood, plastic, insulation material, paper, glass wool, thermocol pieces (polyurethane foam material), sponge, oiled rope, cotton waste, rubber, etc. In this study, multiple regression analysis was used to develop predictive models for the energy content of combustible ship-scrapping solid wastes. The scope of work comprised qualitative and quantitative estimation of solid waste samples and a sequential selection procedure for isolating variables. Three regression models were developed to correlate the energy content (net calorific value (LHV)) with variables derived from material composition, proximate and ultimate analyses. The performance of these models for this particular waste complies well with the equations developed by other researchers (Dulong, Steuer, Scheurer-Kestner and Bento) for estimating the energy content of municipal solid waste.
ERIC Educational Resources Information Center
O'Dwyer, Laura M.; Parker, Caroline E.
2014-01-01
Analyzing data that possess some form of nesting is often challenging for applied researchers or district staff who are involved in or in charge of conducting data analyses. This report provides a description of the challenges for analyzing nested data and provides a primer of how multilevel regression modeling may be used to resolve these…
Li, Min; Zhang, Lu; Yao, Xiaolong; Jiang, Xingyu
2017-01-01
The emerging membrane introduction mass spectrometry technique has been successfully used to detect benzene, toluene, ethyl benzene and xylene (BTEX), while overlapped spectra have unfortunately hindered its further application to the analysis of mixtures. Multivariate calibration, an efficient method to analyze mixtures, has been widely applied. In this paper, we compared univariate and multivariate analyses for quantification of the individual components of mixture samples. The results showed that the univariate analysis creates poor models with regression coefficients of 0.912, 0.867, 0.440 and 0.351 for BTEX, respectively. For multivariate analysis, a comparison to the partial-least squares (PLS) model shows that the orthogonal partial-least squares (OPLS) regression exhibits an optimal performance with regression coefficients of 0.995, 0.999, 0.980 and 0.976, favorable calibration parameters (RMSEC and RMSECV) and a favorable validation parameter (RMSEP). Furthermore, the OPLS exhibits a good recovery of 73.86 - 122.20% and relative standard deviation (RSD) of the repeatability of 1.14 - 4.87%. Thus, MIMS coupled with the OPLS regression provides an optimal approach for a quantitative BTEX mixture analysis in monitoring and predicting water pollution.
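scikit-learn provides PLS but not OPLS, so the sketch below shows only the PLS baseline for a multivariate calibration of four analytes from overlapped spectra; the spectra, concentrations, and component count are simulated.

```python
# Sketch: PLS calibration of BTEX-like concentrations from simulated overlapped spectra,
# reporting per-analyte R2 and RMSEP (OPLS is not available in scikit-learn).
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.default_rng(6)
n_samples, n_channels = 120, 300
conc = rng.uniform(0, 1, (n_samples, 4))                  # benzene, toluene, ethylbenzene, xylene
pure = rng.normal(0, 1, (4, n_channels)).cumsum(axis=1)   # hypothetical pure-component spectra
spectra = conc @ pure + rng.normal(0, 0.05, (n_samples, n_channels))

X_cal, X_val, y_cal, y_val = train_test_split(spectra, conc, random_state=0)
pls = PLSRegression(n_components=6).fit(X_cal, y_cal)
print("R2 per analyte:",
      np.round(r2_score(y_val, pls.predict(X_val), multioutput="raw_values"), 3))
print("RMSEP:", round(mean_squared_error(y_val, pls.predict(X_val)) ** 0.5, 3))
```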
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wenzel, Tom P.
2016-05-20
Previous analyses have indicated that mass reduction is associated with an increase in crash frequency (crashes per VMT), but a decrease in fatality or casualty risk once a crash has occurred, across all types of light-duty vehicles. These results are counter-intuitive: one would expect that lighter, and perhaps smaller, vehicles have better handling and shorter braking distances, and thus should be able to avoid crashes that heavier vehicles cannot. And one would expect that heavier vehicles would have lower risk once a crash has occurred than lighter vehicles. However, these trends occur under several alternative regression model specifications. This report tests whether these results continue to hold after accounting for crash severity, by excluding crashes that result in relatively minor damage to the vehicle(s) involved in the crash. Excluding non-severe crashes from the initial LBNL Phase 2 and simultaneous two-stage regression models for the most part has little effect on the unexpected relationships observed in the baseline regression models. This finding suggests that other subtle differences in vehicles and/or their drivers, or perhaps biases in the data reported in state crash databases, are causing the unexpected results from the regression models.
Evolahti, Annika; Hultcrantz, Malou; Collins, Aila
2006-11-01
The aim of the present study was to investigate whether there is an association between serum cortisol and work-related stress, as defined by the demand-control model in a longitudinal design. One hundred ten women aged 47-53 years completed a health questionnaire, including the Swedish version of the Job Content Scale, and participated in a psychological interview at baseline and in a follow-up session 2 years later. Morning blood samples were drawn for analyses of cortisol. Multiple stepwise regression analyses and logistic regression analyses showed that work demands and lack of social support were significantly associated with cortisol. The results of this study showed that negative work characteristics in terms of high demands and low social support contributed significantly to the biological stress levels in middle-aged women. Participation in the study may have served as an intervention, increasing the women's awareness and thus improving their health profiles on follow-up.
NASA Astrophysics Data System (ADS)
Berger, Lukas; Kleinheinz, Konstantin; Attili, Antonio; Bisetti, Fabrizio; Pitsch, Heinz; Mueller, Michael E.
2018-05-01
Modelling unclosed terms in partial differential equations typically involves two steps: First, a set of known quantities needs to be specified as input parameters for a model, and second, a specific functional form needs to be defined to model the unclosed terms by the input parameters. Both steps involve a certain modelling error, with the former known as the irreducible error and the latter referred to as the functional error. Typically, only the total modelling error, which is the sum of functional and irreducible error, is assessed, but the concept of the optimal estimator enables the separate analysis of the total and the irreducible errors, yielding a systematic modelling error decomposition. In this work, attention is paid to the techniques themselves required for the practical computation of irreducible errors. Typically, histograms are used for optimal estimator analyses, but this technique is found to add a non-negligible spurious contribution to the irreducible error if models with multiple input parameters are assessed. Thus, the error decomposition of an optimal estimator analysis becomes inaccurate, and misleading conclusions concerning modelling errors may be drawn. In this work, numerically accurate techniques for optimal estimator analyses are identified and a suitable evaluation of irreducible errors is presented. Four different computational techniques are considered: a histogram technique, artificial neural networks, multivariate adaptive regression splines, and an additive model based on a kernel method. For multiple input parameter models, only artificial neural networks and multivariate adaptive regression splines are found to yield satisfactorily accurate results. Beyond a certain number of input parameters, the assessment of models in an optimal estimator analysis even becomes practically infeasible if histograms are used. The optimal estimator analysis in this paper is applied to modelling the filtered soot intermittency in large eddy simulations using a dataset of a direct numerical simulation of a non-premixed sooting turbulent flame.
NASA Astrophysics Data System (ADS)
Jones, William I.
This study examined the understanding of nature of science among participants in their final year of a 4-year undergraduate teacher education program at a Midwest liberal arts university. The Logic Model Process was used as an integrative framework to focus the collection, organization, analysis, and interpretation of the data for the purpose of (1) describing participant understanding of NOS and (2) to identify participant characteristics and teacher education program features related to those understandings. The Views of Nature of Science Questionnaire form C (VNOS-C) was used to survey participant understanding of 7 target aspects of Nature of Science (NOS). A rubric was developed from a review of the literature to categorize and score participant understanding of the target aspects of NOS. Participants' high school and college transcripts, planning guides for their respective teacher education program majors, and science content and science teaching methods course syllabi were examined to identify and categorize participant characteristics and teacher education program features. The R software (R Project for Statistical Computing, 2010) was used to conduct an exploratory analysis to determine correlations of the antecedent and transaction predictor variables with participants' scores on the 7 target aspects of NOS. Fourteen participant characteristics and teacher education program features were moderately and significantly ( p < .01) correlated with participant scores on the target aspects of NOS. The 6 antecedent predictor variables were entered into multiple regression analyses to determine the best-fit model of antecedent predictor variables for each target NOS aspect. The transaction predictor variables were entered into separate multiple regression analyses to determine the best-fit model of transaction predictor variables for each target NOS aspect. Variables from the best-fit antecedent and best-fit transaction models for each target aspect of NOS were then combined. A regression analysis for each of the combined models was conducted to determine the relative effect of these variables on the target aspects of NOS. Findings from the multiple regression analyses revealed that each of the fourteen predictor variables was present in the best-fit model for at least 1 of the 7 target aspects of NOS. However, not all of the predictor variables were statistically significant (p < .007) in the models and their effect (beta) varied. Participants in the teacher education program who had higher ACT Math scores, completed more high school science credits, and were enrolled either in the Middle Childhood with a science concentration program major or in the Adolescent/Young Adult Science Education program major were more likely to have an informed understanding on each of the 7 target aspects of NOS. Analyses of the planning guides and the course syllabi in each teacher education program major revealed differences between the program majors that may account for the results.
Survival Regression Modeling Strategies in CVD Prediction.
Barkhordari, Mahnaz; Padyab, Mojgan; Sardarinia, Mahsa; Hadaegh, Farzad; Azizi, Fereidoun; Bozorgmanesh, Mohammadreza
2016-04-01
A fundamental part of prevention is prediction. Potential predictors are the sine qua non of prediction models. However, whether incorporating novel predictors to prediction models could be directly translated to added predictive value remains an area of dispute. The difference between the predictive power of a predictive model with (enhanced model) and without (baseline model) a certain predictor is generally regarded as an indicator of the predictive value added by that predictor. Indices such as discrimination and calibration have long been used in this regard. Recently, the use of added predictive value has been suggested while comparing the predictive performances of the predictive models with and without novel biomarkers. User-friendly statistical software capable of implementing novel statistical procedures is conspicuously lacking. This shortcoming has restricted implementation of such novel model assessment methods. We aimed to construct Stata commands to help researchers obtain the aforementioned statistical indices. We have written Stata commands that are intended to help researchers obtain the following: 1, the Nam-D'Agostino chi-square goodness-of-fit test; 2, cut point-free and cut point-based net reclassification improvement index (NRI), relative and absolute integrated discriminatory improvement index (IDI), and survival-based regression analyses. We applied the commands to real data on women participating in the Tehran lipid and glucose study (TLGS) to examine if information relating to a family history of premature cardiovascular disease (CVD), waist circumference, and fasting plasma glucose can improve the predictive performance of Framingham's general CVD risk algorithm. The command is adpredsurv for survival models. Herein we have described the Stata package "adpredsurv" for calculation of the Nam-D'Agostino chi-square goodness-of-fit test as well as cut point-free and cut point-based NRI, relative and absolute IDI, and survival-based regression analyses. We hope this work encourages the use of novel methods in examining the predictive capacity of the emerging plethora of novel biomarkers.
ERIC Educational Resources Information Center
Everson, Howard T.; And Others
This paper explores the feasibility of neural computing methods such as artificial neural networks (ANNs) and abductory induction mechanisms (AIM) for use in educational measurement. ANN and AIM methods are contrasted with more traditional statistical techniques, such as multiple regression and discriminant function analyses, for making…
Framing the Future: Revisiting the Place of Educational Expectations in Status Attainment
ERIC Educational Resources Information Center
Bozick, Robert; Alexander, Karl; Entwisle, Doris; Dauber, Susan; Kerr, Kerri
2010-01-01
This study revisits the Wisconsin model of status attainment from a life course developmental perspective. Fixed-effects regression analyses lend strong support to the Wisconsin framework's core proposition that academic performance and significant others' influence shape educational expectations. However, investigating the process of expectation…
Sex differences in the effect of aging on dry eye disease.
Ahn, Jong Ho; Choi, Yoon-Hyeong; Paik, Hae Jung; Kim, Mee Kum; Wee, Won Ryang; Kim, Dong Hyun
2017-01-01
Aging is a major risk factor in dry eye disease (DED), and understanding sexual differences is very important in biomedical research. However, there is little information about sex differences in the effect of aging on DED. We investigated sex differences in the effect of aging and other risk factors for DED. This study included data of 16,824 adults from the Korea National Health and Nutrition Examination Survey (2010-2012), which is a population-based cross-sectional survey. DED was defined as the presence of frequent ocular dryness or a previous diagnosis by an ophthalmologist. Basic sociodemographic factors and previously known risk factors for DED were included in the analyses. Linear regression modeling and multivariate logistic regression modeling were used to compare the sex differences in the effect of risk factors for DED; we additionally performed tests for interactions between sex and other risk factors for DED in logistic regression models. In our linear regression models, the prevalence of DED symptoms in men increased with age (R=0.311, P=0.012); however, there was no association between aging and DED in women (P>0.05). Multivariate logistic regression analyses showed that aging in men was not associated with DED (DED symptoms/diagnosis: odds ratio [OR]=1.01/1.04, each P>0.05), while aging in women was protectively associated with DED (DED symptoms/diagnosis: OR=0.94/0.91, P=0.011/0.003). Previous ocular surgery was significantly associated with DED in both men and women (men/women: OR=2.45/1.77 [DED symptoms] and 3.17/2.05 [DED diagnosis], each P<0.001). Tests for interactions of sex revealed significantly different aging × sex and previous ocular surgery × sex interactions (P for interaction of sex: DED symptoms/diagnosis = 0.044/0.011 [age] and 0.012/0.006 [previous ocular surgery]). There were distinct sex differences in the effect of aging on DED in the Korean population. DED following ocular surgery also showed sexually different patterns. Age matching and sex matching are strongly recommended in further studies about DED, especially DED following ocular surgery.
Hollier, John M; Czyzewski, Danita I; Self, Mariella M; Weidler, Erica M; Smith, E O'Brian; Shulman, Robert J
2017-03-01
This study evaluates whether certain patient or parental characteristics are associated with gastroenterology (GI) referral versus primary pediatrics care for pediatric irritable bowel syndrome (IBS). A retrospective clinical trial sample of patients meeting pediatric Rome III IBS criteria was assembled from a single metropolitan health care system. Baseline socioeconomic status (SES) and clinical symptom measures were gathered. Various instruments measured participant and parental psychosocial traits. Study outcomes were stratified by GI referral versus primary pediatrics care. Two separate analyses of SES measures and GI clinical symptoms and psychosocial measures identified key factors by univariate and multiple logistic regression analyses. For each analysis, identified factors were placed in unadjusted and adjusted multivariate logistic regression models to assess their impact in predicting GI referral. Of the 239 participants, 152 were referred to pediatric GI, and 87 were managed in primary pediatrics care. Of the SES and clinical symptom factors, child self-assessment of abdominal pain duration and lower percentage of people living in poverty were the strongest predictors of GI referral. Among the psychosocial measures, parental assessment of their child's functional disability was the sole predictor of GI referral. In multivariate logistic regression models, all selected factors continued to predict GI referral in each model. Socioeconomic environment, clinical symptoms, and functional disability are associated with GI referral. Future interventions designed to ameliorate the effect of these identified factors could reduce unnecessary specialty consultations and health care overutilization for IBS.
Hughes, James P.; Haley, Danielle F.; Frew, Paula M.; Golin, Carol E.; Adimora, Adaora A; Kuo, Irene; Justman, Jessica; Soto-Torres, Lydia; Wang, Jing; Hodder, Sally
2015-01-01
Purpose Reductions in risk behaviors are common following enrollment in HIV prevention studies. We develop methods to quantify the proportion of change in risk behaviors that can be attributed to regression to the mean versus study participation and other factors. Methods A novel model that incorporates both regression to the mean and study participation effects is developed for binary measures. The model is used to estimate the proportion of change in the prevalence of “unprotected sex in the past 6 months” that can be attributed to study participation versus regression to the mean in a longitudinal cohort of women at risk for HIV infection who were recruited from ten US communities with high rates of HIV and poverty. HIV risk behaviors were evaluated using audio computer-assisted self-interviews at baseline and every 6 months for up to 12 months. Results The prevalence of “unprotected sex in the past 6 months” declined from 96% at baseline to 77% at 12 months. However, this change could be almost completely explained by regression to the mean. Conclusions Analyses that examine changes over time in cohorts selected for high or low risk behaviors should account for regression to the mean effects. PMID:25883065
Li, Li; Nguyen, Kim-Huong; Comans, Tracy; Scuffham, Paul
2018-04-01
Several utility-based instruments have been applied in cost-utility analysis to assess health state values for people with dementia. Nevertheless, concerns and uncertainty regarding their performance for people with dementia have been raised. To assess the performance of available utility-based instruments for people with dementia by comparing their psychometric properties and to explore factors that cause variations in the reported health state values generated from those instruments by conducting meta-regression analyses. A literature search was conducted and psychometric properties were synthesized to demonstrate the overall performance of each instrument. When available, health state values and variables such as the type of instrument and cognitive impairment levels were extracted from each article. A meta-regression analysis was undertaken and available covariates were included in the models. A total of 64 studies providing preference-based values were identified and included. The EuroQol five-dimension questionnaire demonstrated the best combination of feasibility, reliability, and validity. Meta-regression analyses suggested that significant differences exist between instruments, type of respondents, and mode of administration, and that the variations in estimated utility values influence incremental quality-adjusted life-year calculations. This review finds that the EuroQol five-dimension questionnaire is the most valid utility-based instrument for people with dementia, but that it should be replaced by others under certain circumstances. Although no utility estimates are reported in this article, the meta-regression analyses show that variations in the utility estimates produced by different instruments have an impact on cost-utility analysis, potentially altering the decision-making process in some circumstances. Copyright © 2018 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.
Li, Michael Jonathan; Distefano, Anthony; Mouttapa, Michele; Gill, Jasmeet K
2014-02-01
The present study aimed to determine whether the experience of bias-motivated bullying was associated with behaviors known to increase the risk of HIV infection among young men who have sex with men (YMSM) aged 18-29, and to assess whether the psychosocial problems moderated this relationship. Using an Internet-based direct marketing approach in sampling, we recruited 545 YMSM residing in the USA to complete an online questionnaire. Multiple linear regression analyses tested three regression models where we controlled for sociodemographics. The first model indicated that bullying during high school was associated with unprotected receptive anal intercourse within the past 12 months, while the second model indicated that bullying after high school was associated with engaging in anal intercourse while under the influence of drugs or alcohol in the past 12 months. In the final regression model, our composite measure of HIV risk behavior was found to be associated with lifetime verbal harassment. None of the psychosocial problems measured in this study - depression, low self-esteem, and internalized homonegativity - moderated any of the associations between bias-motivated bullying victimization and HIV risk behaviors in our regression models. Still, these findings provide novel evidence that bullying prevention programs in schools and communities should be included in comprehensive approaches to HIV prevention among YMSM.
Determinants of The Grade A Embryos in Infertile Women; Zero-Inflated Regression Model.
Almasi-Hashiani, Amir; Ghaheri, Azadeh; Omani Samani, Reza
2017-10-01
In assisted reproductive technology, it is important to choose high quality embryos for embryo transfer. The aim of the present study was to determine the grade A embryo count and factors related to it in infertile women. This historical cohort study included 996 infertile women. The main outcome was the number of grade A embryos. Zero-Inflated Poisson (ZIP) regression and Zero-Inflated Negative Binomial (ZINB) regression were used to model the count data as it contained excessive zeros. Stata software, version 13 (Stata Corp, College Station, TX, USA) was used for all statistical analyses. After adjusting for potential confounders, results from the ZINB model show that each unit increase in the number of 2-pronuclear (2PN) zygotes multiplies the expected grade A embryo count by 1.45 (incidence rate ratio; 95% confidence interval (CI): 1.23-1.69, P=0.001), and each one-day increase in cleavage day multiplies the expected count by 0.35 (95% CI: 0.20-0.61, P=0.001). There is a significant association between both the number of 2PN zygotes and cleavage day and the number of grade A embryos in both ZINB and ZIP regression models. The estimated coefficients are more plausible than values found in earlier studies using less relevant models. Copyright© by Royan Institute. All rights reserved.
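A sketch of a zero-inflated count model in Python (the study itself used Stata): statsmodels' ZeroInflatedPoisson is shown, and ZeroInflatedNegativeBinomialP follows the same calling pattern. Covariate effects and simulated values are hypothetical.

```python
# Sketch: zero-inflated Poisson model for grade A embryo counts with covariates for
# the number of 2PN zygotes and cleavage day (simulated data, intercept-only inflation).
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedPoisson

rng = np.random.default_rng(7)
n = 996
two_pn = rng.poisson(5, n)                        # number of 2PN zygotes
cleavage_day = rng.choice([2, 3], n)              # day of embryo cleavage
mu = np.exp(-0.5 + 0.37 * two_pn - 1.05 * (cleavage_day - 2))
structural_zero = rng.binomial(1, 0.25, n)        # source of the excess zeros
y = np.where(structural_zero == 1, 0, rng.poisson(mu))

X = sm.add_constant(np.column_stack([two_pn, cleavage_day]).astype(float))
fit = ZeroInflatedPoisson(y, X, exog_infl=np.ones((n, 1))).fit(
    method="bfgs", maxiter=500, disp=0)
# First element: inflation (logit) intercept; the remaining count-model coefficients
# exponentiate to incidence rate ratios.
print(np.round(np.exp(fit.params), 3))
```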
Baqué, Michèle; Amendt, Jens
2013-01-01
Developmental data of juvenile blow flies (Diptera: Calliphoridae) are typically used to calculate the age of immature stages found on or around a corpse and thus to estimate a minimum post-mortem interval (PMI(min)). However, many of those data sets don't take into account that immature blow flies grow in a non-linear fashion. Linear models do not provide sufficiently reliable age estimates and may even lead to an erroneous determination of the PMI(min). According to the Daubert standard and the need for improvements in forensic science, new statistical tools like smoothing methods and mixed models allow the modelling of non-linear relationships and expand the field of statistical analyses. The present study introduces the background and application of these statistical techniques by analysing a model which describes the development of the forensically important blow fly Calliphora vicina at different temperatures. The comparison of three statistical methods (linear regression, generalised additive modelling and generalised additive mixed modelling) clearly demonstrates that only the latter provided regression parameters that reflect the data adequately. We focus explicitly on both the exploration of the data (to assure their quality and to show the importance of checking them carefully prior to conducting the statistical tests) and the validation of the resulting models. Hence, we present a common method for evaluating and testing forensic entomological data sets by using, for the first time, generalised additive mixed models.
Veldhuijzen van Zanten, Sophie E M; Lane, Adam; Heymans, Martijn W; Baugh, Joshua; Chaney, Brooklyn; Hoffman, Lindsey M; Doughman, Renee; Jansen, Marc H A; Sanchez, Esther; Vandertop, William P; Kaspers, Gertjan J L; van Vuurden, Dannis G; Fouladi, Maryam; Jones, Blaise V; Leach, James
2017-08-01
We aimed to perform external validation of the recently developed survival prediction model for diffuse intrinsic pontine glioma (DIPG), and discuss its utility. The DIPG survival prediction model was developed in a cohort of patients from the Netherlands, United Kingdom and Germany, registered in the SIOPE DIPG Registry, and includes age <3 years, longer symptom duration and receipt of chemotherapy as favorable predictors, and presence of ring-enhancement on MRI as unfavorable predictor. Model performance was evaluated by analyzing the discrimination and calibration abilities. External validation was performed using an unselected cohort from the International DIPG Registry, including patients from United States, Canada, Australia and New Zealand. Basic comparison with the results of the original study was performed using descriptive statistics, and univariate- and multivariable regression analyses in the validation cohort. External validation was assessed following a variety of analyses described previously. Baseline patient characteristics and results from the regression analyses were largely comparable. Kaplan-Meier curves of the validation cohort reproduced separated groups of standard (n = 39), intermediate (n = 125), and high-risk (n = 78) patients. This discriminative ability was confirmed by similar values for the hazard ratios across these risk groups. The calibration curve in the validation cohort showed a symmetric underestimation of the predicted survival probabilities. In this external validation study, we demonstrate that the DIPG survival prediction model has acceptable cross-cohort calibration and is able to discriminate patients with short, average, and increased survival. We discuss how this clinico-radiological model may serve a useful role in current clinical practice.
Suzuki, Hideaki; Tabata, Takahisa; Koizumi, Hiroki; Hohchi, Nobusuke; Takeuchi, Shoko; Kitamura, Takuro; Fujino, Yoshihisa; Ohbuchi, Toyoaki
2014-12-01
This study aimed to create a multiple regression model for predicting hearing outcomes of idiopathic sudden sensorineural hearing loss (ISSNHL). The participants were 205 consecutive patients (205 ears) with ISSNHL (hearing level ≥ 40 dB, interval between onset and treatment ≤ 30 days). They received systemic steroid administration combined with intratympanic steroid injection. Data were examined by simple and multiple regression analyses. Three hearing indices (percentage hearing improvement, hearing gain, and posttreatment hearing level [HLpost]) and 7 prognostic factors (age, days from onset to treatment, initial hearing level, initial hearing level at low frequencies, initial hearing level at high frequencies, presence of vertigo, and contralateral hearing level) were included in the multiple regression analysis as dependent and explanatory variables, respectively. In the simple regression analysis, the percentage hearing improvement, hearing gain, and HLpost showed significant correlation with 2, 5, and 6 of the 7 prognostic factors, respectively. The multiple correlation coefficients were 0.396, 0.503, and 0.714 for the percentage hearing improvement, hearing gain, and HLpost, respectively. Predicted values of HLpost calculated by the multiple regression equation were reliable with 70% probability with a 40-dB-width prediction interval. Prediction of HLpost by the multiple regression model may be useful to estimate the hearing prognosis of ISSNHL. © The Author(s) 2014.
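A sketch of a multiple regression for HLpost with per-patient prediction intervals, analogous to the 40-dB-wide, 70%-probability interval reported; the predictors and data below are simulated stand-ins for the prognostic factors listed above.

```python
# Sketch: OLS model of post-treatment hearing level (HLpost) on a subset of prognostic
# factors (hypothetical data), with 70% prediction intervals for individual patients.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(8)
n = 205
df = pd.DataFrame({
    "age": rng.integers(20, 80, n),
    "initial_hl": rng.normal(75, 15, n),          # initial hearing level (dB)
    "days_to_treatment": rng.integers(1, 31, n),
    "vertigo": rng.integers(0, 2, n),
})
df["hl_post"] = (5 + 0.15 * df.age + 0.55 * df.initial_hl
                 + 0.4 * df.days_to_treatment + 6 * df.vertigo + rng.normal(0, 12, n))

fit = smf.ols("hl_post ~ age + initial_hl + days_to_treatment + vertigo", data=df).fit()
print("multiple R:", round(np.sqrt(fit.rsquared), 3))
pred = fit.get_prediction(df.iloc[:5]).summary_frame(alpha=0.30)  # 70% prediction interval
print(pred[["mean", "obs_ci_lower", "obs_ci_upper"]])
```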
Sampson, Maureen L; Gounden, Verena; van Deventer, Hendrik E; Remaley, Alan T
2016-02-01
The main drawback of the periodic analysis of quality control (QC) material is that test performance is not monitored in time periods between QC analyses, potentially leading to the reporting of faulty test results. The objective of this study was to develop a patient based QC procedure for the more timely detection of test errors. Results from a Chem-14 panel measured on the Beckman LX20 analyzer were used to develop the model. Each test result was predicted from the other 13 members of the panel by multiple regression, which resulted in correlation coefficients between the predicted and measured result of >0.7 for 8 of the 14 tests. A logistic regression model, which utilized the measured test result, the predicted test result, the day of the week and time of day, was then developed for predicting test errors. The output of the logistic regression was tallied by a daily CUSUM approach and used to predict test errors, with a fixed specificity of 90%. The mean average run length (ARL) before error detection by CUSUM-Logistic Regression (CSLR) was 20 with a mean sensitivity of 97%, which was considerably shorter than the mean ARL of 53 (sensitivity 87.5%) for a simple prediction model that only used the measured result for error detection. A CUSUM-Logistic Regression analysis of patient laboratory data can be an effective approach for the rapid and sensitive detection of clinical laboratory errors. Published by Elsevier Inc.
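A loose sketch of the patient-based QC idea (not the published model): predict one analyte from the other panel members, feed the measured and predicted values plus time of day into a logistic error model, and accumulate the scores in a one-sided CUSUM. The reference value, decision limit, error prevalence, and the use of a single training set are all assumptions made for illustration.

```python
# Sketch: CUSUM of logistic-regression error scores for patient-based QC (simulated data).
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(9)
n = 2000
panel = rng.normal(0, 1, (n, 13))                       # 13 other Chem-14 results (standardized)
analyte = panel @ rng.normal(0.2, 0.1, 13) + rng.normal(0, 0.5, n)   # analyte to monitor
hour = rng.integers(0, 24, n)

predicted = LinearRegression().fit(panel, analyte).predict(panel)    # regression prediction

# Logistic error model trained on data with known (here: simulated) error episodes
error = rng.binomial(1, 0.05, n)
analyte_obs = analyte + error * rng.normal(3, 1, n)     # shifted results during error episodes
features = np.column_stack([analyte_obs, predicted, analyte_obs - predicted, hour])
clf = LogisticRegression(max_iter=1000).fit(features, error)

# One-sided CUSUM of logistic scores; reference value k and decision limit h are assumed
score = clf.predict_proba(features)[:, 1] - 0.10
cusum, flags = 0.0, []
for i, s in enumerate(score):
    cusum = max(0.0, cusum + s)
    if cusum > 2.0:
        flags.append(i)                                  # flag a possible error episode
        cusum = 0.0
print("flagged result indices:", flags[:10])
```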
Analyzing industrial energy use through ordinary least squares regression models
NASA Astrophysics Data System (ADS)
Golden, Allyson Katherine
Extensive research has been performed using regression analysis and calibrated simulations to create baseline energy consumption models for residential buildings and commercial institutions. However, few attempts have been made to discuss the applicability of these methodologies to establish baseline energy consumption models for industrial manufacturing facilities. In the few studies of industrial facilities, the presented linear change-point and degree-day regression analyses illustrate ideal cases. It follows that there is a need in the established literature to discuss the methodologies and to determine their applicability for establishing baseline energy consumption models of industrial manufacturing facilities. The thesis determines the effectiveness of simple inverse linear statistical regression models when establishing baseline energy consumption models for industrial manufacturing facilities. Ordinary least squares change-point and degree-day regression methods are used to create baseline energy consumption models for nine different case studies of industrial manufacturing facilities located in the southeastern United States. The influence of ambient dry-bulb temperature and production on total facility energy consumption is observed. The energy consumption behavior of industrial manufacturing facilities is only sometimes sufficiently explained by temperature, production, or a combination of the two variables. This thesis also provides methods for generating baseline energy models that are straightforward and accessible to anyone in the industrial manufacturing community. The methods outlined in this thesis may be easily replicated by anyone that possesses basic spreadsheet software and general knowledge of the relationship between energy consumption and weather, production, or other influential variables. With the help of simple inverse linear regression models, industrial manufacturing facilities may better understand their energy consumption and production behavior, and identify opportunities for energy and cost savings. This thesis study also utilizes change-point and degree-day baseline energy models to disaggregate facility annual energy consumption into separate industrial end-user categories. The baseline energy model provides a suitable and economical alternative to sub-metering individual manufacturing equipment. One case study describes the conjoined use of baseline energy models and facility information gathered during a one-day onsite visit to perform an end-point energy analysis of an injection molding facility conducted by the Alabama Industrial Assessment Center. Applying baseline regression model results to the end-point energy analysis allowed the AIAC to better approximate the annual energy consumption of the facility's HVAC system.
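A sketch of a simple change-point (three-parameter) baseline model fit by ordinary least squares over a grid of candidate balance-point temperatures; the monthly data are hypothetical rather than from the thesis case studies.

```python
# Sketch: three-parameter change-point model of monthly energy use vs. outdoor temperature,
# fit by OLS at each candidate change point and selecting the minimum-SSE fit.
import numpy as np

rng = np.random.default_rng(10)
temp = rng.uniform(0, 35, 36)                                 # monthly mean dry-bulb temp (C)
energy = 500 + 20 * np.maximum(temp - 18, 0) + rng.normal(0, 25, temp.size)

def fit_change_point(t, e, candidates):
    best = None
    for cp in candidates:
        X = np.column_stack([np.ones_like(t), np.maximum(t - cp, 0)])
        beta, *_ = np.linalg.lstsq(X, e, rcond=None)
        sse = np.sum((e - X @ beta) ** 2)
        if best is None or sse < best[0]:
            best = (sse, cp, beta)
    return best

sse, cp, (base_load, slope) = fit_change_point(temp, energy, np.arange(10, 26, 0.5))
print(f"change point ~ {cp:.1f} C, base load ~ {base_load:.0f}, slope ~ {slope:.1f}")
```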
Selapa, N W; Nephawe, K A; Maiwashe, A; Norris, D
2012-02-08
The aim of this study was to estimate genetic parameters for body weights of individually fed beef bulls measured at centralized testing stations in South Africa using random regression models. Weekly body weights of Bonsmara bulls (N = 2919) tested between 1999 and 2003 were available for the analyses. The model included a fixed regression of the body weights on fourth-order orthogonal Legendre polynomials of the actual days on test (7, 14, 21, 28, 35, 42, 49, 56, 63, 70, 77, and 84) for starting age and contemporary group effects. Random regressions on fourth-order orthogonal Legendre polynomials of the actual days on test were included for additive genetic effects and additional uncorrelated random effects of the weaning-herd-year and the permanent environment of the animal. Residual effects were assumed to be independently distributed with heterogeneous variance for each test day. Variance ratios for additive genetic, permanent environment and weaning-herd-year for weekly body weights at different test days ranged from 0.26 to 0.29, 0.37 to 0.44 and 0.26 to 0.34, respectively. The weaning-herd-year was found to have a significant effect on the variation of body weights of bulls despite a 28-day adjustment period. Genetic correlations amongst body weights at different test days were high, ranging from 0.89 to 1.00. Heritability estimates were comparable to literature using multivariate models. Therefore, random regression model could be applied in the genetic evaluation of body weight of individually fed beef bulls in South Africa.
ERIC Educational Resources Information Center
Luna, Andrew L.
2007-01-01
The purpose of this study was to determine if a market ratio factor was a better predictor of faculty salaries than the use of k-1 dummy variables representing the various disciplines. This study used two multiple regression analyses to develop an explanatory model to determine which model might best explain faculty salaries. A total of 20 out of…
Keshavarzi, Sareh; Ayatollahi, Seyyed Mohammad Taghi; Zare, Najaf; Pakfetrat, Maryam
2012-01-01
BACKGROUND. In many studies with longitudinal data, time-dependent covariates can only be measured intermittently (not at all observation times), and this presents difficulties for standard statistical analyses. This situation is common in medical studies, and methods that deal with this challenge would be useful. METHODS. In this study, we performed the seemingly unrelated regression (SUR) based models, with respect to each observation time in longitudinal data with intermittently observed time-dependent covariates and further compared these models with mixed-effect regression models (MRMs) under three classic imputation procedures. Simulation studies were performed to compare the sample size properties of the estimated coefficients for different modeling choices. RESULTS. In general, the proposed models in the presence of intermittently observed time-dependent covariates showed a good performance. However, when we considered only the observed values of the covariate without any imputations, the resulted biases were greater. The performances of the proposed SUR-based models in comparison with MRM using classic imputation methods were nearly similar with approximately equal amounts of bias and MSE. CONCLUSION. The simulation study suggests that the SUR-based models work as efficiently as MRM in the case of intermittently observed time-dependent covariates. Thus, it can be used as an alternative to MRM.
Maloney, Kelly O.; Schmid, Matthias; Weller, Donald E.
2012-01-01
Issues with ecological data (e.g. non-normality of errors, nonlinear relationships and autocorrelation of variables) and modelling (e.g. overfitting, variable selection and prediction) complicate regression analyses in ecology. Flexible models, such as generalized additive models (GAMs), can address data issues, and machine learning techniques (e.g. gradient boosting) can help resolve modelling issues. Gradient boosted GAMs do both. Here, we illustrate the advantages of this technique using data on benthic macroinvertebrates and fish from 1573 small streams in Maryland, USA.
Length bias correction in gene ontology enrichment analysis using logistic regression.
Mi, Gu; Di, Yanming; Emerson, Sarah; Cumbie, Jason S; Chang, Jeff H
2012-01-01
When assessing differential gene expression from RNA sequencing data, commonly used statistical tests tend to have greater power to detect differential expression of genes encoding longer transcripts. This phenomenon, called "length bias", will influence subsequent analyses such as Gene Ontology enrichment analysis. In the presence of length bias, Gene Ontology categories that include longer genes are more likely to be identified as enriched. These categories, however, are not necessarily biologically more relevant. We show that one can effectively adjust for length bias in Gene Ontology analysis by including transcript length as a covariate in a logistic regression model. The logistic regression model makes the statistical issue underlying length bias more transparent: transcript length becomes a confounding factor when it correlates with both the Gene Ontology membership and the significance of the differential expression test. The inclusion of the transcript length as a covariate allows one to investigate the direct correlation between the Gene Ontology membership and the significance of testing differential expression, conditional on the transcript length. We present both real and simulated data examples to show that the logistic regression approach is simple, effective, and flexible.
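The adjustment described above reduces to adding transcript length as a covariate in a logistic regression of category membership on the differential-expression indicator. A sketch with simulated gene-level data (no true enrichment once length is accounted for, so the DE coefficient should be near zero):

```python
# Sketch: length-adjusted GO enrichment test via logistic regression (simulated genes).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
n_genes = 5000
log_len = rng.normal(7.5, 0.8, n_genes)                                    # log transcript length
in_go = rng.binomial(1, 1 / (1 + np.exp(-(-3 + 0.3 * (log_len - 7.5)))))   # longer genes in category
de = rng.binomial(1, 1 / (1 + np.exp(-(-2 + 0.5 * (log_len - 7.5)))))      # length bias in DE calls

X = sm.add_constant(np.column_stack([de, log_len]))
fit = sm.Logit(in_go, X).fit(disp=0)
# The DE coefficient tests enrichment conditional on transcript length
print(np.round(fit.params, 3), np.round(fit.pvalues, 3))
```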
Szekér, Szabolcs; Vathy-Fogarassy, Ágnes
2018-01-01
Logistic regression based propensity score matching is a widely used method in case-control studies to select the individuals of the control group. This method creates a suitable control group if all factors affecting the output variable are known. However, if relevant latent variables exist as well, which are not taken into account during the calculations, the quality of the control group is uncertain. In this paper, we present statistics-based research in which we try to determine the relationship between the accuracy of the logistic regression model and the uncertainty of the dependent variable of the control group defined by propensity score matching. Our analyses show that there is a linear correlation between the fit of the logistic regression model and the uncertainty of the output variable. In certain cases, a latent binary explanatory variable can result in a relative error of up to 70% in the prediction of the outcome variable. The observed phenomenon calls the attention of analysts to an important point, which must be taken into account when drawing conclusions.
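A sketch of logistic-regression propensity scores with 1:1 nearest-neighbour matching, in which a latent variable is deliberately omitted from the propensity model to mirror the scenario studied above; the data and coefficients are hypothetical.

```python
# Sketch: propensity score matching with an unobserved latent variable left out of the model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(12)
n = 3000
age = rng.normal(50, 10, n)
sex = rng.integers(0, 2, n)
latent = rng.normal(0, 1, n)                             # not available to the analyst
p_case = 1 / (1 + np.exp(-(-1 + 0.03 * (age - 50) + 0.5 * sex + 0.8 * latent)))
case = rng.binomial(1, p_case)

X_obs = np.column_stack([age, sex])                      # observed covariates only
ps = LogisticRegression(max_iter=1000).fit(X_obs, case).predict_proba(X_obs)[:, 1]
logit_ps = np.log(ps / (1 - ps))

cases, controls = np.where(case == 1)[0], np.where(case == 0)[0]
nn = NearestNeighbors(n_neighbors=1).fit(logit_ps[controls].reshape(-1, 1))
_, idx = nn.kneighbors(logit_ps[cases].reshape(-1, 1))   # match each case to nearest control
matched_controls = controls[idx.ravel()]
print("distinct matched controls:", np.unique(matched_controls).size)
```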
Azagba, Sunday; Asbridge, Mark; Langille, Donald B
2014-12-01
School connectedness (SC) is associated with decreased student risk behavior and better health and social outcomes. While a considerable body of research has examined the factors associated with SC, there is limited evidence about the particular role of religiosity in shaping levels of SC. Employing data reported by junior and senior high school students from Atlantic Canada, this study examines whether religiosity is positively associated with SC and whether such associations differ by gender. We tested the association between SC and religiosity using a random intercept multilevel logistic regression. The between-school variability in SC was first determined by our estimating a null or empty model; three different model specifications that included covariates were estimated: in Model 1 we adjusted for gender, age, academic performance, parental education, and living arrangement; in Model 2 for sensation seeking and subjective social status in addition to Model 1 variables; and in Model 3 we added substance use to the analysis. Our multilevel regression analyses showed that religiosity was protectively associated with lower SC across the three model specifications when both genders were examined together. In gender-stratified analyses we found similar protective associations of religiosity, with lower SC for both males and females in all three models. Given the overwhelming positive impact of SC on a range of health, social and school outcomes, it is important to understand the role of religiosity, among other factors, that may be modified to enhance student's connectedness to school.
Riley, Richard D; Ensor, Joie; Jackson, Dan; Burke, Danielle L
2017-01-01
Many meta-analysis models contain multiple parameters, for example due to multiple outcomes, multiple treatments or multiple regression coefficients. In particular, meta-regression models may contain multiple study-level covariates, and one-stage individual participant data meta-analysis models may contain multiple patient-level covariates and interactions. Here, we propose how to derive percentage study weights for such situations, in order to reveal the (otherwise hidden) contribution of each study toward the parameter estimates of interest. We assume that studies are independent, and utilise a decomposition of Fisher's information matrix to decompose the total variance matrix of parameter estimates into study-specific contributions, from which percentage weights are derived. This approach generalises how percentage weights are calculated in a traditional, single parameter meta-analysis model. Application is made to one- and two-stage individual participant data meta-analyses, meta-regression and network (multivariate) meta-analysis of multiple treatments. These reveal percentage study weights toward clinically important estimates, such as summary treatment effects and treatment-covariate interactions, and are especially useful when some studies are potential outliers or at high risk of bias. We also derive percentage study weights toward methodologically interesting measures, such as the magnitude of ecological bias (difference between within-study and across-study associations) and the amount of inconsistency (difference between direct and indirect evidence in a network meta-analysis).
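For the special case of a fixed-effect meta-regression with known within-study variances, the decomposition described above can be sketched directly: with V = (Σ_i I_i)^-1, study i's percentage weight toward parameter j is 100·[V I_i V]_jj / V_jj. The paper covers more general one- and two-stage models; this is only the simplest instance, with hypothetical study data.

```python
# Sketch: percentage study weights from a decomposition of the Fisher information
# in a two-parameter (intercept + covariate) fixed-effect meta-regression.
import numpy as np

rng = np.random.default_rng(13)
k = 8
covariate = rng.uniform(0, 1, k)                 # study-level covariate (e.g. mean age)
var_i = rng.uniform(0.02, 0.2, k)                # within-study variances of the effect estimates

X = np.column_stack([np.ones(k), covariate])
info = [np.outer(X[i], X[i]) / var_i[i] for i in range(k)]   # per-study Fisher information
V = np.linalg.inv(sum(info))                                 # variance of (intercept, slope)

weights = np.array([np.diag(V @ info_i @ V) / np.diag(V) for info_i in info]) * 100
print("percentage weights (rows = studies, cols = intercept, covariate effect):")
print(np.round(weights, 1))
print("column sums:", np.round(weights.sum(axis=0), 1))      # each column sums to 100
```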
On the use of log-transformation vs. nonlinear regression for analyzing biological power laws
Xiao, X.; White, E.P.; Hooten, M.B.; Durham, S.L.
2011-01-01
Power-law relationships are among the most well-studied functional relationships in biology. Recently the common practice of fitting power laws using linear regression (LR) on log-transformed data has been criticized, calling into question the conclusions of hundreds of studies. It has been suggested that nonlinear regression (NLR) is preferable, but no rigorous comparison of these two methods has been conducted. Using Monte Carlo simulations, we demonstrate that the error distribution determines which method performs better, with NLR better characterizing data with additive, homoscedastic, normal error and LR better characterizing data with multiplicative, heteroscedastic, lognormal error. Analysis of 471 biological power laws shows that both forms of error occur in nature. While previous analyses based on log-transformation appear to be generally valid, future analyses should choose methods based on a combination of biological plausibility and analysis of the error distribution. We provide detailed guidelines and associated computer code for doing so, including a model averaging approach for cases where the error structure is uncertain. © 2011 by the Ecological Society of America.
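The two fitting strategies being compared can be written in a few lines; the sketch below simulates data with multiplicative lognormal error (the case favouring log-transformation) and fits y = a·x^b by both routes.

```python
# Sketch: power-law fit by linear regression on log-log data vs. nonlinear least squares.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(14)
x = rng.uniform(1, 100, 200)
y = 2.5 * x ** 0.75 * np.exp(rng.normal(0, 0.3, x.size))   # multiplicative lognormal error

# Linear regression on logs
b_lr, log_a_lr = np.polyfit(np.log(x), np.log(y), 1)

# Nonlinear regression on the original scale
(a_nlr, b_nlr), _ = curve_fit(lambda x, a, b: a * x ** b, x, y, p0=(1.0, 1.0))

print(f"LR : a = {np.exp(log_a_lr):.2f}, b = {b_lr:.3f}")
print(f"NLR: a = {a_nlr:.2f}, b = {b_nlr:.3f}")
```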
When ab ≠ c - c': published errors in the reports of single-mediator models.
Petrocelli, John V; Clarkson, Joshua J; Whitmire, Melanie B; Moon, Paul E
2013-06-01
Accurate reports of mediation analyses are critical to the assessment of inferences related to causality, since these inferences are consequential for both the evaluation of previous research (e.g., meta-analyses) and the progression of future research. However, upon reexamination, approximately 15% of published articles in psychology contain at least one incorrect statistical conclusion (Bakker & Wicherts, Behavior research methods, 43, 666-678 2011), disparities that beget the question of inaccuracy in mediation reports. To quantify this question of inaccuracy, articles reporting standard use of single-mediator models in three high-impact journals in personality and social psychology during 2011 were examined. More than 24% of the 156 models coded failed an equivalence test (i.e., ab = c - c'), suggesting that one or more regression coefficients in mediation analyses are frequently misreported. The authors cite common sources of errors, provide recommendations for enhanced accuracy in reports of single-mediator models, and discuss implications for alternative methods.
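The identity being checked is exact for OLS single-mediator models estimated on the same sample: the product of the X→M coefficient and the adjusted M→Y coefficient equals the drop in the X coefficient, ab = c - c'. A sketch with simulated data:

```python
# Sketch: verifying ab = c - c' for an OLS single-mediator model (simulated data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(15)
n = 300
x = rng.normal(0, 1, n)
m = 0.5 * x + rng.normal(0, 1, n)            # mediator
y = 0.4 * m + 0.3 * x + rng.normal(0, 1, n)

c = sm.OLS(y, sm.add_constant(x)).fit().params[1]                           # total effect
a = sm.OLS(m, sm.add_constant(x)).fit().params[1]                           # X -> M
b, c_prime = sm.OLS(y, sm.add_constant(np.column_stack([m, x]))).fit().params[[1, 2]]

print(f"a*b = {a * b:.6f}, c - c' = {c - c_prime:.6f}")   # identical up to rounding
```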
Lamborn, Susie D; Felbab, Amanda J
2003-10-01
This study evaluated both the parenting styles and family ecologies models with interview responses from 93 14- and 15-year-old African-American adolescents. The parenting styles model was more strongly represented in both open-ended and structured interview responses. Using variables from the structured interview as independent variables, regression analyses contrasted each model with a joint model for predicting self-esteem, self-reliance, work orientation, and ethnic identity. Overall, the findings suggest that a joint model that combines elements from both models provides a richer understanding of African-American families.
Jacob, Benjamin J; Krapp, Fiorella; Ponce, Mario; Gottuzzo, Eduardo; Griffith, Daniel A; Novak, Robert J
2010-05-01
Spatial autocorrelation is problematic for classical hierarchical cluster detection tests commonly used in multi-drug resistant tuberculosis (MDR-TB) analyses as considerable random error can occur. Therefore, when MDR-TB clusters are spatially autocorrelated, the assumption that the clusters are independently random is invalid. In this research, a product moment correlation coefficient (i.e., the Moran's coefficient) was used to quantify local spatial variation in multiple clinical and environmental predictor variables sampled in San Juan de Lurigancho, Lima, Peru. Initially, QuickBird 0.61 m data, encompassing visible bands and the near infra-red bands, were selected to synthesize images of land cover attributes of the study site. Data on residential addresses of individual patients with smear-positive MDR-TB were geocoded, prevalence rates calculated and then digitally overlaid onto the satellite data within a 2 km buffer of 31 georeferenced health centers, using a 10 m² grid-based algorithm. Geographical information system (GIS)-gridded measurements of each health center were generated based on preliminary base maps of the georeferenced data aggregated to block groups and census tracts within each buffered area. A three-dimensional model of the study site was constructed based on a digital elevation model (DEM) to determine terrain covariates associated with the sampled MDR-TB covariates. Pearson's correlation was used to evaluate the linear relationship between the DEM and the sampled MDR-TB data. A SAS/GIS® module was then used to calculate univariate statistics and to perform linear and non-linear regression analyses using the sampled predictor variables. The estimates generated from a global autocorrelation analysis were then spatially decomposed into empirical orthogonal bases using a negative binomial regression with a non-homogeneous mean. Results of the DEM analyses indicated a statistically non-significant, linear relationship between georeferenced health centers and the sampled covariate elevation. The data exhibited positive spatial autocorrelation and the decomposition of Moran's coefficient into uncorrelated, orthogonal map pattern components revealed global spatial heterogeneities necessary to capture latent autocorrelation in the MDR-TB model. It was thus shown that Poisson regression analyses and spatial eigenvector mapping can elucidate the mechanics of MDR-TB transmission by prioritizing clinical and environmental-sampled predictor variables for identifying high risk populations.
Time Advice and Learning Questions in Computer Simulations
ERIC Educational Resources Information Center
Rey, Gunter Daniel
2011-01-01
Students (N = 101) used an introductory text and a computer simulation to learn fundamental concepts about statistical analyses (e.g., analysis of variance, regression analysis and General Linear Model). Each learner was randomly assigned to one cell of a 2 (with or without time advice) x 3 (with learning questions and corrective feedback, with…
ERIC Educational Resources Information Center
Fulmer, Gavin W.; Polikoff, Morgan S.
2014-01-01
An essential component in school accountability efforts is for assessments to be well-aligned with the standards or curriculum they are intended to measure. However, relatively little prior research has explored methods to determine statistical significance of alignment or misalignment. This study explores analyses of alignment as a special case…
Using Faculty Characteristics to Predict Attitudes toward Developmental Education
ERIC Educational Resources Information Center
Sides, Meredith Louise Carr
2017-01-01
The study adapted Astin's I-E-O model and utilized multiple regression analyses to predict faculty attitudes toward developmental education. The study utilized a cross-sectional survey design to survey faculty members at 27 different higher education institutions in the state of Alabama. The survey instrument was a self-designed questionnaire that…
ERIC Educational Resources Information Center
Reeve, Charlie L.; Basalik, Debra
2011-01-01
The current study examines the degree to which state intellectual capital, state religiosity and reproductive health form a meaningful nexus of ecological relations. Though the specific magnitude of effects vary across outcomes, results from hierarchical regression analyses were consistent with the hypothesized path model indicating that a state's…
Do Nondomestic Undergraduates Choose a Major Field in Order to Maximize Grade Point Averages?
ERIC Educational Resources Information Center
Bergman, Matthew E.; Fass-Holmes, Barry
2016-01-01
The authors investigated whether undergraduates attending an American West Coast public university who were not U.S. citizens (nondomestic) maximized their grade point averages (GPA) through their choice of major field. Multiple regression hierarchical linear modeling analyses showed that major field's effect size was small for these…
Determinants of Student Attitudes toward Team Exams
ERIC Educational Resources Information Center
Reinig, Bruce A.; Horowitz, Ira; Whittenburg, Gene
2014-01-01
We examine how student attitudes toward their group, learning method, and perceived development of professional skills are initially shaped and subsequently evolve through multiple uses of team exams. Using a Tobit regression model to analyse a sequence of 10 team quizzes given in a graduate-level tax accounting course, we show that there is an…
Predicting postfire Douglas-fir beetle attacks and tree mortality in the northern Rocky Mountains
Sharon Hood; Barbara Bentz
2007-01-01
Douglas-fir (Pseudotsuga menziesii (Mirb.) Franco) were monitored for 4 years following three wildfires. Logistic regression analyses were used to develop models predicting the probability of attack by Douglas-fir beetle (Dendroctonus pseudotsugae Hopkins, 1905) and the probability of Douglas-fir mortality within 4 years following...
Effects of Morphological Family Size for Young Readers
ERIC Educational Resources Information Center
Perdijk, Kors; Schreuder, Robert; Baayen, R. Harald; Verhoeven, Ludo
2012-01-01
Dutch children, from the second and fourth grade of primary school, were each given a visual lexical decision test on 210 Dutch monomorphemic words. After removing words not recognized by a majority of the younger group, (lexical) decisions were analysed by mixed-model regression methods to see whether morphological Family Size influenced decision…
Relocation and Alienation: Support for Stokols' Social Psychological Theory.
ERIC Educational Resources Information Center
Hwalek, Melanie; Firestone, Ira
The extent to which Stokols' model of alienation could be applied to the relocation of elderly into age-homogeneous communal residences was investigated by interviewing 50 residents of two homes for the aged and one senior citizen apartment complex. Three types of regression analyses were performed to test the hypothesis that simultaneously…
What Is the Relationship between Teacher Quality and Student Achievement? An Exploratory Study
ERIC Educational Resources Information Center
Stronge, James H.; Ward, Thomas J.; Tucker, Pamela D.; Hindman, Jennifer L.
2007-01-01
The major purpose of the study was to examine what constitutes effective teaching as defined by measured increases in student learning with a focus on the instructional behaviors and practices. Ordinary least squares (OLS) regression analyses and hierarchical linear modeling (HLM) were used to identify teacher effectiveness levels while…
ERIC Educational Resources Information Center
McCarthy, Christopher J.; Fouladi, Rachel T.; Juncker, Brian D.; Matheny, Kenneth B.
2006-01-01
The association of protective resources, personality variables, life events, and gender with anxiety and depression was examined with university students. Building on regression analyses, a structural equation model was generated with good fit, indicating that with respect to both anxiety and depression, negative life events and coping resources…
Attitudes towards Participation in Business Development Programmes: An Ethnic Comparison in Sweden
ERIC Educational Resources Information Center
Abbasian, Saeid; Yazdanfar, Darush
2015-01-01
Purpose: The aim of the study is to investigate whether there are any differences between the attitudes towards participation in development programmes of entrepreneurs who are immigrants and those who are native-born. Design/methodology/approach: Several statistical methods, including a binary logistic regression model, were used to analyse a…
Playing the Recording Once or Twice: Effects on Listening Test Performances
ERIC Educational Resources Information Center
Ruhm, Richard; Leitner-Jones, Claire; Kulmhofer, Andrea; Kiefer, Thomas; Mlakar, Heike; Itzlinger-Bruneforth, Ursula
2016-01-01
Much debate surrounds the issue of whether allowing candidates to listen to recordings twice is more desirable in language tests than offering just one opportunity. Using regression models, this study investigates, analyses and interconnects both item difficulty and stimulus length in relation to the frequency of stimulus presentation and its…
The Use of Structure Coefficients to Address Multicollinearity in Sport and Exercise Science
ERIC Educational Resources Information Center
Yeatts, Paul E.; Barton, Mitch; Henson, Robin K.; Martin, Scott B.
2017-01-01
A common practice in general linear model (GLM) analyses is to interpret regression coefficients (e.g., standardized β weights) as indicators of variable importance. However, focusing solely on standardized beta weights may provide limited or erroneous information. For example, β weights become increasingly unreliable when predictor variables are…
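A minimal illustration of the contrast drawn in this abstract (synthetic data; the setup and variable names are my own): with two nearly collinear predictors, standardized beta weights split credit arbitrarily, while each predictor's structure coefficient, its correlation with the model's predicted scores, remains large.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 300
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)          # nearly collinear with x1
y = x1 + x2 + rng.normal(size=n)

X = np.column_stack([x1, x2])
Xz = (X - X.mean(0)) / X.std(0)                    # standardize predictors
yz = (y - y.mean()) / y.std()                      # standardize outcome
beta = np.linalg.lstsq(Xz, yz, rcond=None)[0]      # standardized beta weights
y_hat = Xz @ beta
structure = [np.corrcoef(Xz[:, j], y_hat)[0, 1] for j in range(2)]  # r(X_j, y_hat)
print("beta weights:          ", np.round(beta, 3))
print("structure coefficients:", np.round(structure, 3))
```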
Adaptive variation in Pinus ponderosa from Intermountain regions. II. Middle Columbia River system
Gerald Rehfeldt
1986-01-01
Seedling populations were grown and compared in common environments. Statistical analyses detected genetic differences between populations for numerous traits reflecting growth potential and periodicity of shoot elongation. Multiple regression models described an adaptive landscape in which populations from low elevations have a high growth potential while those from...
Family and Cultural Predictors of Depression among Samoan American Middle and High School Students
ERIC Educational Resources Information Center
Yeh, Christine J.; Borrero, Noah E.; Tito, Patsy
2013-01-01
This study investigated family intergenerational conflict and collective self-esteem as predictors of depression in a sample of 128 Samoan middle and high school students. Simultaneous regression analyses revealed that each independent variable significantly contributed to an overall model that accounted for 13% of the variance in depression.…
ERIC Educational Resources Information Center
Musekamp, Frank; Pearce, Jacob
2016-01-01
The goal of this paper is to examine the relationship of student motivation and achievement in low-stakes assessment contexts. Using Pearson product-moment correlations and hierarchical linear regression modelling to analyse data on 794 tertiary students who undertook a low-stakes engineering mechanics assessment (along with the questionnaire of…
1981-04-28
on initial doses. Residual doses are determined through an automated procedure that utilizes raw data in regression analyses to fit space-time models...show their relationship to the observer positions. The computer-calculated doses do not reflect the presence of the human body in the radiological
Predictors of Service Utilization among Youth Diagnosed with Mood Disorders
ERIC Educational Resources Information Center
Mendenhall, Amy N.
2012-01-01
In this study, I investigated patterns and predictors of service utilization for children with mood disorders. The Behavioral Model for Health Care Utilization was used as an organizing framework for identifying predictors of the number and quality of services utilized. Hierarchical regression was used in secondary data analyses of the…
ERIC Educational Resources Information Center
Eren, Altay
2012-01-01
This study aimed to examine the mediating role of prospective teachers' academic optimism in the relationship between their future time perspective and professional plans about teaching. A total of 396 prospective teachers voluntarily participated in the study. Correlation, regression, and structural equation modeling analyses were conducted in…
Wood, Jeffrey J.; Lynne, Sarah D.; Langer, David A.; Wood, Patricia A.; Clark, Shaunna L.; Eddy, J. Mark; Ialongo, Nicholas
2011-01-01
This study tests a model of reciprocal influences between absenteeism and youth psychopathology using three longitudinal datasets (Ns= 20745, 2311, and 671). Participants in 1st through 12th grades were interviewed annually or bi-annually. Measures of psychopathology include self-, parent-, and teacher-report questionnaires. Structural cross-lagged regression models were tested. In a nationally representative dataset (Add Health), middle school students with relatively greater absenteeism at study year 1 tended towards increased depression and conduct problems in study year 2, over and above the effects of autoregressive associations and demographic covariates. The opposite direction of effects was found for both middle and high school students. Analyses with two regionally representative datasets were also partially supportive. Longitudinal links were more evident in adolescence than in childhood. PMID:22188462
Khanfar, Mohammad A; Banat, Fahmy; Alabed, Shada; Alqtaishat, Saja
2017-02-01
High expression of NEK2 has been detected in several types of cancer and it represents a novel target in human cancer. In the current study, structure-based pharmacophore modeling combined with multiple linear regression (MLR)-based QSAR analyses was applied to disclose the structural requirements for NEK2 inhibition. Generated pharmacophoric models were initially validated with a receiver operating characteristic (ROC) curve, and optimum models were subsequently implemented in QSAR modeling with other physicochemical descriptors. QSAR-selected models were employed as 3D search filters to mine the National Cancer Institute (NCI) database for novel NEK2 inhibitors, whereas the associated QSAR model prioritized the bioactivities of captured hits for in vitro evaluation. Experimental validation identified several potent NEK2 inhibitors of novel structural scaffolds. The most potent captured hit exhibited an [Formula: see text] value of 237 nM.
Martelo-Vidal, M J; Vázquez, M
2014-09-01
Spectral analysis is a quick and non-destructive method to analyse wine. In this work, trans-resveratrol, oenin, malvin, catechin, epicatechin, quercetin and syringic acid were determined in commercial red wines from DO Rías Baixas and DO Ribeira Sacra (Spain) by UV-VIS-NIR spectroscopy. Calibration models were developed using principal component regression (PCR) or partial least squares (PLS) regression. HPLC was used as the reference method. The results showed that reliable PLS models were obtained to quantify all polyphenols for Rías Baixas wines. For Ribeira Sacra, feasible models were obtained to determine quercetin, epicatechin, oenin and syringic acid. PCR calibration models showed worse prediction reliability than PLS models. For red wines from mencía grapes, feasible models were obtained for catechin and oenin, regardless of the geographical origin. The results obtained demonstrate that UV-VIS-NIR spectroscopy can be used to determine individual polyphenolic compounds in red wines. Copyright © 2014 Elsevier Ltd. All rights reserved.
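A hedged sketch of the PCR-versus-PLS comparison described above, using scikit-learn on made-up "spectra" (the dimensions, component counts and data are illustrative, not from the study):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n_samples, n_wavelengths = 120, 200
spectra = rng.normal(size=(n_samples, n_wavelengths))
# concentration depends on only two "wavelengths" out of many
concentration = spectra[:, 50] - 0.5 * spectra[:, 150] + rng.normal(scale=0.1, size=n_samples)

pls = PLSRegression(n_components=5)
pcr = make_pipeline(PCA(n_components=5), LinearRegression())

for name, model in [("PLS", pls), ("PCR", pcr)]:
    r2 = cross_val_score(model, spectra, concentration, cv=5, scoring="r2")
    print(name, "cross-validated R2:", r2.mean().round(3))
```

Because PLS chooses its components using the response while PCA does not, PLS tends to recover the few informative wavelengths that PCA-based PCR misses, which mirrors the study's finding that PCR calibrations predicted less reliably than PLS.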
John W. Edwards; Susan C. Loeb; David C. Guynn
1994-01-01
Multiple regression and use-availability analyses are two methods for examining habitat selection. Use-availability analysis is commonly used to evaluate macrohabitat selection whereas multiple regression analysis can be used to determine microhabitat selection. We compared these techniques using behavioral observations (n = 5534) and telemetry locations (n = 2089) of...
Multidrug-resistant pulmonary tuberculosis in Los Altos, Selva and Norte regions, Chiapas, Mexico.
Sánchez-Pérez, H J; Díaz-Vázquez, A; Nájera-Ortiz, J C; Balandrano, S; Martín-Mateo, M
2010-01-01
To analyse the proportion of multidrug-resistant tuberculosis (MDR-TB) in cultures performed during the period 2000-2002 in Los Altos, Selva and Norte regions, Chiapas, Mexico, and to analyse MDR-TB in terms of clinical and sociodemographic indicators. Cross-sectional study of patients with pulmonary tuberculosis (PTB) from the above regions. Drug susceptibility testing results from two research projects were analysed, as were those of routine sputum samples sent in by health personnel for processing (n = 114). MDR-TB was analysed in terms of the various variables of interest using bivariate tests of association and logistic regression. The proportion of primary MDR-TB was 4.6% (2 of 43), that of secondary MDR-TB was 29.2% (7/24), while among those whose history of treatment was unknown the proportion was 14.3% (3/21). According to the logistic regression model, the variables most highly associated with MDR-TB were as follows: having received anti-tuberculosis treatment previously, cough of >3 years' duration and not being indigenous. The high proportion of MDR cases found in the regions studied shows that it is necessary to significantly improve the control and surveillance of PTB.
Kim, Joohan; Seok, Jeong-Ho; Choi, Kang; Jon, Duk-In; Hong, Hyun Ju; Hong, Narei; Lee, Eunjeong
2015-11-01
Early life stress (ELS) may induce long-lasting psychological complications in adulthood. The protective role of resilience against the development of psychopathology is also important. The purpose of this study was to investigate the relationships among ELS, resilience, depression, anxiety, and aggression in young adults. Four hundred sixty-one army inductees gave written informed consent and participated in this study. We assessed psychopathology using the Korea Military Personality Test, ELS using the Childhood Abuse Experience Scale, and resilience with the resilience scale. Analyses of variance, correlation analyses, and hierarchical multiple linear regression analyses were conducted. The regression models explained 35.8%, 41.0%, and 23.3% of the total variance in the depression, anxiety, and aggression indices, respectively. Although ELS experience was positively associated with depression, anxiety, and aggression, resilience had a significant attenuating effect on the severity of these psychopathologies. Among the resilience factors, emotion regulation showed the most beneficial effect in reducing the severity of psychopathology. To improve mental health in young adults, ELS assessment and resilience enhancement programs should be considered.
CyTOF workflow: differential discovery in high-throughput high-dimensional cytometry datasets
Nowicka, Malgorzata; Krieg, Carsten; Weber, Lukas M.; Hartmann, Felix J.; Guglietta, Silvia; Becher, Burkhard; Levesque, Mitchell P.; Robinson, Mark D.
2017-01-01
High dimensional mass and flow cytometry (HDCyto) experiments have become a method of choice for high throughput interrogation and characterization of cell populations. Here, we present an R-based pipeline for differential analyses of HDCyto data, largely based on Bioconductor packages. We computationally define cell populations using FlowSOM clustering, and facilitate an optional but reproducible strategy for manual merging of algorithm-generated clusters. Our workflow offers different analysis paths, including association of cell type abundance with a phenotype or changes in signaling markers within specific subpopulations, or differential analyses of aggregated signals. Importantly, the differential analyses we show are based on regression frameworks where the HDCyto data is the response; thus, we are able to model arbitrary experimental designs, such as those with batch effects, paired designs and so on. In particular, we apply generalized linear mixed models to analyses of cell population abundance or cell-population-specific analyses of signaling markers, allowing overdispersion in cell count or aggregated signals across samples to be appropriately modeled. To support the formal statistical analyses, we encourage exploratory data analysis at every step, including quality control (e.g. multi-dimensional scaling plots), reporting of clustering results (dimensionality reduction, heatmaps with dendrograms) and differential analyses (e.g. plots of aggregated signals). PMID:28663787
Damman, Olga C; Stubbe, Janine H; Hendriks, Michelle; Arah, Onyebuchi A; Spreeuwenberg, Peter; Delnoij, Diana M J; Groenewegen, Peter P
2009-04-01
Ratings on the quality of healthcare from the consumer's perspective need to be adjusted for consumer characteristics to ensure fair and accurate comparisons between healthcare providers or health plans. Although multilevel analysis is already considered an appropriate method for analyzing healthcare performance data, it has rarely been used to assess case-mix adjustment of such data. The purpose of this article is to investigate whether multilevel regression analysis is a useful tool to detect case-mix adjusters in consumer assessment of healthcare. We used data on 11,539 consumers from 27 Dutch health plans, which were collected using the Dutch Consumer Quality Index health plan instrument. We conducted multilevel regression analyses of consumers' responses nested within health plans to assess the effects of consumer characteristics on consumer experience. We compared our findings to the results of another methodology: the impact factor approach, which combines the predictive effect of each case-mix variable with its heterogeneity across health plans. Both multilevel regression and impact factor analyses showed that age and education were the most important case-mix adjusters for consumer experience and ratings of health plans. With the exception of age, case-mix adjustment had little impact on the ranking of health plans. On both theoretical and practical grounds, multilevel modeling is useful for adequate case-mix adjustment and analysis of performance ratings.
Immortal time bias in observational studies of time-to-event outcomes.
Jones, Mark; Fowler, Robert
2016-12-01
The purpose of the study is to show, through simulation and example, the magnitude and direction of immortal time bias when an inappropriate analysis is used. We compare 4 methods of analysis for observational studies of time-to-event outcomes: logistic regression, standard Cox model, landmark analysis, and time-dependent Cox model using an example data set of patients critically ill with influenza and a simulation study. For the example data set, logistic regression, standard Cox model, and landmark analysis all showed some evidence that treatment with oseltamivir provides protection from mortality in patients critically ill with influenza. However, when the time-dependent nature of treatment exposure is taken account of using a time-dependent Cox model, there is no longer evidence of a protective effect of treatment. The simulation study showed that, under various scenarios, the time-dependent Cox model consistently provides unbiased treatment effect estimates, whereas standard Cox model leads to bias in favor of treatment. Logistic regression and landmark analysis may also lead to bias. To minimize the risk of immortal time bias in observational studies of survival outcomes, we strongly suggest time-dependent exposures be included as time-dependent variables in hazard-based analyses. Copyright © 2016 Elsevier Inc. All rights reserved.
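A minimal sketch of the recommended analysis, a hazard model with treatment entered as a time-dependent covariate, using the lifelines package; the tiny long-format data set below is invented purely to show the layout (one row per interval of follow-up, with the treatment indicator switching on only when treatment actually starts, so pre-treatment person-time is not credited to treatment):

```python
import pandas as pd
from lifelines import CoxTimeVaryingFitter

# Toy long-format data: id, interval (start, stop], time-varying treatment, event indicator
long_data = pd.DataFrame({
    "id":      [1, 1, 2, 3, 3, 4, 5],
    "start":   [0, 3, 0, 0, 5, 0, 0],
    "stop":    [3, 9, 7, 5, 12, 6, 11],
    "treated": [0, 1, 0, 0, 1, 0, 0],
    "event":   [0, 1, 1, 0, 0, 1, 0],
})

ctv = CoxTimeVaryingFitter()
ctv.fit(long_data, id_col="id", start_col="start", stop_col="stop", event_col="event")
ctv.print_summary()   # hazard ratio for `treated`, free of immortal time bias
```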
Vesicular stomatitis forecasting based on Google Trends
Lu, Yi; Zhou, GuangYa; Chen, Qin
2018-01-01
Background Vesicular stomatitis (VS) is an important viral disease of livestock. The main feature of VS is irregular blisters that occur on the lips, tongue, oral mucosa, hoof crown and nipple. Humans can also be infected with vesicular stomatitis and develop meningitis. This study analyses the 2014 American VS outbreaks in order to accurately predict vesicular stomatitis outbreak trends. Methods American VS outbreak data were collected from the OIE. The data for VS keywords were obtained by inputting 24 disease-related keywords into Google Trends. After calculating the Pearson and Spearman correlation coefficients, it was found that there was a relationship between outbreaks and keywords derived from Google Trends. Finally, predictive models were constructed based on qualitative classification and quantitative regression. Results For the regression model, the Pearson correlation coefficients between the predicted outbreaks and actual outbreaks are 0.953 and 0.948, respectively. For the qualitative classification model, we constructed five classification predictive models and chose the best one as the result. The SN (sensitivity), SP (specificity) and ACC (prediction accuracy) values of the best classification predictive model are 78.52%, 72.5% and 77.14%, respectively. Conclusion This study applied Google search data to construct a qualitative classification model and a quantitative regression model. The results show that the method is effective and that these two models yield more accurate forecasts. PMID:29385198
Yu, Peigen; Low, Mei Yin; Zhou, Weibiao
2018-01-01
In order to develop products that would be preferred by consumers, the effects of the chemical compositions of ready-to-drink green tea beverages on consumer liking were studied through regression analyses. Green tea model systems were prepared by dosing solutions of 0.1% green tea extract with differing concentrations of eight flavour keys deemed to be important for green tea aroma and taste, based on a D-optimal experimental design, before undergoing commercial sterilisation. Sensory evaluation of the green tea model system was carried out using an untrained consumer panel to obtain hedonic liking scores of the samples. Regression models were subsequently trained to objectively predict the consumer liking scores of the green tea model systems. A linear partial least squares (PLS) regression model was developed to describe the effects of the eight flavour keys on consumer liking, with a coefficient of determination (R²) of 0.733, and a root-mean-square error (RMSE) of 3.53%. The PLS model was further augmented with an artificial neural network (ANN) to establish a PLS-ANN hybrid model. The established hybrid model was found to give a better prediction of consumer liking scores, based on its R² (0.875) and RMSE (2.41%). Copyright © 2017 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Abunama, Taher; Othman, Faridah
2017-06-01
Analysing the fluctuations of wastewater inflow rates in sewage treatment plants (STPs) is essential to guarantee a sufficient treatment of wastewater before discharging it to the environment. The main objectives of this study are to statistically analyze and forecast the wastewater inflow rates into the Bandar Tun Razak STP in Kuala Lumpur, Malaysia. A time series analysis of three years' weekly influent data (156 weeks) has been conducted using the Auto-Regressive Integrated Moving Average (ARIMA) model. Various combinations of ARIMA orders (p, d, q) have been tried to select the best-fitting model, which was utilized to forecast the wastewater inflow rates. Linear regression analysis was applied to test the correlation between the observed and predicted influents. The ARIMA (3, 1, 3) model was selected as having the highest significant R-square and the lowest normalized Bayesian Information Criterion (BIC) value, and accordingly the wastewater inflow rates were forecast for an additional 52 weeks. The linear regression analysis between the observed and predicted values of the wastewater inflow rates showed a positive linear correlation with a coefficient of 0.831.
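A short sketch of the workflow described above using statsmodels (the synthetic weekly inflow series stands in for the 156 weeks of influent data; the search over orders and the BIC criterion follow the abstract, but everything else is illustrative):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
weeks = pd.date_range("2014-01-05", periods=156, freq="W")
inflow = pd.Series(5000 + np.cumsum(rng.normal(0, 50, 156)), index=weeks)  # synthetic weekly inflows

best = None
for p in range(4):
    for q in range(4):
        fit = ARIMA(inflow, order=(p, 1, q)).fit()
        if best is None or fit.bic < best.bic:     # keep the lowest-BIC model, as in the study
            best = fit

print("selected order:", best.model.order, "BIC:", round(best.bic, 1))
forecast = best.forecast(steps=52)                 # extend the series by roughly one more year
```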
Environmental, Spatial, and Sociodemographic Factors Associated with Nonfatal Injuries in Indonesia.
Irianti, Sri; Prasetyoputra, Puguh
2017-01-01
Background. The determinants of injuries and their reoccurrence in Indonesia are not well understood, despite their importance in the prevention of injuries. Therefore, this study seeks to investigate the environmental, spatial, and sociodemographic factors associated with the reoccurrence of injuries among Indonesian people. Methods. Data from the 2013 round of the Indonesia Baseline Health Research (IBHR 2013) were analysed using a two-part hurdle regression model. A logit regression model was chosen for the zero-hurdle part, while a zero-truncated negative binomial regression model was selected for the counts part. Odds ratio (OR) and incidence rate ratio (IRR) were the measures of association, respectively. Results. The results suggest that living in a household with a distant drinking water source, residing in slum areas, residing in Eastern Indonesia, having low educational attainment, being men, and being poorer are positively related to the likelihood of experiencing injury. Moreover, being a farmer or fisherman, having low educational attainment, and being men are positively associated with the frequency of injuries. Conclusion. This study would be useful to prioritise injury prevention programs in Indonesia based on the environmental, spatial, and sociodemographic characteristics.
Estelles-Lopez, Lucia; Ropodi, Athina; Pavlidis, Dimitris; Fotopoulou, Jenny; Gkousari, Christina; Peyrodie, Audrey; Panagou, Efstathios; Nychas, George-John; Mohareb, Fady
2017-09-01
Over the past decade, analytical approaches based on vibrational spectroscopy, hyperspectral/multispectral imaging and biomimetic sensors started gaining popularity as rapid and efficient methods for assessing food quality, safety and authentication, as a sensible alternative to the expensive and time-consuming conventional microbiological techniques. Due to the multi-dimensional nature of the data generated from such analyses, the output needs to be coupled with a suitable statistical approach or machine-learning algorithms before the results can be interpreted. Choosing the optimum pattern recognition or machine learning approach for a given analytical platform is often challenging and involves a comparative analysis between various algorithms in order to achieve the best possible prediction accuracy. In this work, "MeatReg", a web-based application, is presented that automates the procedure of identifying the best machine learning method for data from several analytical techniques, to predict the counts of microorganisms responsible for meat spoilage regardless of the packaging system applied. In particular, up to 7 regression methods were applied: ordinary least squares regression, stepwise linear regression, partial least squares regression, principal component regression, support vector regression, random forest and k-nearest neighbours. "MeatReg" was tested with minced beef samples stored under aerobic and modified atmosphere packaging and analysed with electronic nose, HPLC, FT-IR, GC-MS and a multispectral imaging instrument. Populations of total viable counts, lactic acid bacteria, pseudomonads, Enterobacteriaceae and B. thermosphacta were predicted. As a result, recommendations of which analytical platforms are suitable to predict each type of bacteria and which machine learning methods to use in each case were obtained. The developed system is accessible via the link: www.sorfml.com. Copyright © 2017 Elsevier Ltd. All rights reserved.
Association between Personality Traits and Sleep Quality in Young Korean Women
Kim, Han-Na; Cho, Juhee; Chang, Yoosoo; Ryu, Seungho
2015-01-01
Personality is a trait that affects behavior and lifestyle, and sleep quality is an important component of a healthy life. We analyzed the association between personality traits and sleep quality in a cross-section of 1,406 young women (from 18 to 40 years of age) who were not reporting clinically meaningful depression symptoms. Surveys were carried out from December 2011 to February 2012, using the Revised NEO Personality Inventory and the Pittsburgh Sleep Quality Index (PSQI). All analyses were adjusted for demographic and behavioral variables. We considered beta weights, structure coefficients, unique effects, and common effects when evaluating the importance of sleep quality predictors in multiple linear regression models. Neuroticism was the most important contributor to PSQI global scores in the multiple regression models. By contrast, despite being strongly correlated with sleep quality, conscientiousness had a near-zero beta weight in linear regression models, because most variance was shared with other personality traits. However, conscientiousness was the most noteworthy predictor of poor sleep quality status (PSQI≥6) in logistic regression models and individuals high in conscientiousness were least likely to have poor sleep quality, which is consistent with an OR of 0.813, with conscientiousness being protective against poor sleep quality. Personality may be a factor in poor sleep quality and should be considered in sleep interventions targeting young women. PMID:26030141
A multicentre study of air pollution exposure and childhood asthma prevalence: the ESCAPE project.
Mölter, Anna; Simpson, Angela; Berdel, Dietrich; Brunekreef, Bert; Custovic, Adnan; Cyrys, Josef; de Jongste, Johan; de Vocht, Frank; Fuertes, Elaine; Gehring, Ulrike; Gruzieva, Olena; Heinrich, Joachim; Hoek, Gerard; Hoffmann, Barbara; Klümper, Claudia; Korek, Michal; Kuhlbusch, Thomas A J; Lindley, Sarah; Postma, Dirkje; Tischer, Christina; Wijga, Alet; Pershagen, Göran; Agius, Raymond
2015-03-01
The aim of this study was to determine the effect of six traffic-related air pollution metrics (nitrogen dioxide, nitrogen oxides, particulate matter with an aerodynamic diameter <10 μm (PM10), PM2.5, coarse particulate matter and PM2.5 absorbance) on childhood asthma and wheeze prevalence in five European birth cohorts: MAAS (England, UK), BAMSE (Sweden), PIAMA (the Netherlands), GINI and LISA (both Germany, divided into north and south areas). Land-use regression models were developed for each study area and used to estimate outdoor air pollution exposure at the home address of each child. Information on asthma and current wheeze prevalence at the ages of 4-5 and 8-10 years was collected using validated questionnaires. Multiple logistic regression was used to analyse the association between pollutant exposure and asthma within each cohort. Random-effects meta-analyses were used to combine effect estimates from individual cohorts. The meta-analyses showed no significant association between asthma prevalence and air pollution exposure (e.g. adjusted OR (95% CI) for asthma at age 8-10 years and exposure at the birth address (n=10,377): 1.10 (0.81-1.49) per 10 μg·m⁻³ nitrogen dioxide; 0.88 (0.63-1.24) per 10 μg·m⁻³ PM10; 1.23 (0.78-1.95) per 5 μg·m⁻³ PM2.5). This result was consistently found in initial crude models, adjusted models and further sensitivity analyses. This study found no significant association between air pollution exposure and childhood asthma prevalence in five European birth cohorts. Copyright © ERS 2015.
Causes of coal-miner absenteeism. Information Circular/1987
DOE Office of Scientific and Technical Information (OSTI.GOV)
Peters, R.H.; Randolph, R.F.
The Bureau of Mines report describes several significant problems associated with absenteeism among underground coal miners. The vast empirical literature on employee absenteeism is reviewed, and a conceptual model of the factors that cause absenteeism among miners is presented. Portions of the model were empirically tested by performing correlational and multiple regression analyses on data collected from a group of 64 underground coal miners. The results of these tests are presented and discussed.
Unintended pregnancy and sex education in Chile: a behavioural model.
Herold, J M; Thompson, N J; Valenzuela, M S; Morris, L
1994-10-01
This study analysed factors associated with unintended pregnancy among adolescent and young adult women in Santiago, Chile. Three variations of a behavioural model were developed. Logistic regression showed that the effect of sex education on unintended pregnancy works through the use of contraception. Other significant effects were found for variables reflecting socioeconomic status and a woman's acceptance of her sexuality. The results also suggested that labelling affects measurement of 'unintended' pregnancy.
Comparison of random regression test-day models for Polish Black and White cattle.
Strabel, T; Szyda, J; Ptak, E; Jamrozik, J
2005-10-01
Test-day milk yields of first-lactation Black and White cows were used to select the model for routine genetic evaluation of dairy cattle in Poland. The population of Polish Black and White cows is characterized by small herd size, low level of production, and relatively early peak of lactation. Several random regression models for first-lactation milk yield were initially compared using the "percentage of squared bias" criterion and the correlations between true and predicted breeding values. Models with random herd-test-date effects, fixed age-season and herd-year curves, and random additive genetic and permanent environmental curves (Legendre polynomials of different orders were used for all regressions) were chosen for further studies. Additional comparisons included analyses of the residuals and shapes of variance curves in days in milk. The low production level and early peak of lactation of the breed required the use of Legendre polynomials of order 5 to describe age-season lactation curves. For the other curves, Legendre polynomials of order 3 satisfactorily described daily milk yield variation. Fitting third-order polynomials for the permanent environmental effect made it possible to adequately account for heterogeneous residual variance at different stages of lactation.
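As a small illustration of the building block used throughout this comparison (not the evaluation model itself), a lactation curve can be fit with Legendre polynomials after rescaling days in milk to [-1, 1]; the Wood-type curve and noise below are invented stand-ins for real test-day records:

```python
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(3)
dim = np.arange(5, 306)                                    # days in milk
x = 2 * (dim - dim.min()) / (dim.max() - dim.min()) - 1    # rescaled to [-1, 1]
true_curve = 15 * dim**0.2 * np.exp(-0.004 * dim)          # Wood-type lactation shape (illustrative)
milk = true_curve + rng.normal(scale=1.5, size=dim.size)   # noisy test-day yields

coef3 = legendre.legfit(x, milk, deg=3)    # order-3 curve, as used for the genetic/permanent environment effects
coef5 = legendre.legfit(x, milk, deg=5)    # order-5 curve, as used for the age-season effect
fitted5 = legendre.legval(x, coef5)
print("order-3 coefficients:", np.round(coef3, 2))
print("residual SD of the order-5 fit:", np.round((milk - fitted5).std(), 2))
```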
Asquith, William H.; Roussel, Meghan C.
2007-01-01
Estimation of representative hydrographs from design storms, which are known as design hydrographs, provides for cost-effective, risk-mitigated design of drainage structures such as bridges, culverts, roadways, and other infrastructure. During 2001–07, the U.S. Geological Survey (USGS), in cooperation with the Texas Department of Transportation, investigated runoff hydrographs, design storms, unit hydrographs, and watershed-loss models to enhance design hydrograph estimation in Texas. Design hydrographs ideally should mimic the general volume, peak, and shape of observed runoff hydrographs. Design hydrographs commonly are estimated in part by unit hydrographs. A unit hydrograph is defined as the runoff hydrograph that results from a unit pulse of excess rainfall uniformly distributed over the watershed at a constant rate for a specific duration. A time-distributed, watershed-loss model is required for modeling by unit hydrographs. This report develops a specific time-distributed, watershed-loss model known as an initial-abstraction, constant-loss model. For this watershed-loss model, a watershed is conceptualized to have the capacity to store or abstract an absolute depth of rainfall at and near the beginning of a storm. Depths of total rainfall less than this initial abstraction do not produce runoff. The watershed also is conceptualized to have the capacity to remove rainfall at a constant rate (loss) after the initial abstraction is satisfied. Additional rainfall inputs after the initial abstraction is satisfied contribute to runoff if the rainfall rate (intensity) is larger than the constant loss. The initial-abstraction, constant-loss model thus is a two-parameter model. The initial-abstraction, constant-loss model is investigated through detailed computational and statistical analysis of observed rainfall and runoff data for 92 USGS streamflow-gaging stations (watersheds) in Texas with contributing drainage areas from 0.26 to 166 square miles. The analysis is limited to a previously described, watershed-specific, gamma distribution model of the unit hydrograph. In particular, the initial-abstraction, constant-loss model is tuned to the gamma distribution model of the unit hydrograph. A complex computational analysis of observed rainfall and runoff for the 92 watersheds was done to determine, by storm, optimal values of initial abstraction and constant loss. Optimal parameter values for a given storm were defined as those values that produced a modeled runoff hydrograph with volume equal to the observed runoff hydrograph and also minimized the residual sum of squares of the two hydrographs. Subsequently, the means of the optimal parameters were computed on a watershed-specific basis. These means for each watershed are considered the most representative, are tabulated, and are used in further statistical analyses. Statistical analyses of watershed-specific, initial abstraction and constant loss include documentation of the distribution of each parameter using the generalized lambda distribution. The analyses show that watershed development has substantial influence on initial abstraction and limited influence on constant loss. The means and medians of the 92 watershed-specific parameters are tabulated with respect to watershed development; although they have considerable uncertainty, these parameters can be used for parameter prediction for ungaged watersheds.
The statistical analyses of watershed-specific, initial abstraction and constant loss also include development of predictive procedures for estimation of each parameter for ungaged watersheds. Both regression equations and regression trees for estimation of initial abstraction and constant loss are provided. The watershed characteristics included in the regression analyses are (1) main-channel length, (2) a binary factor representing watershed development, (3) a binary factor representing watersheds with an abundance of rocky and thin-soiled terrain, and (4) curve number.
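A small sketch of the two-parameter watershed-loss model described in this report (the storm depths and parameter values below are placeholders, not the report's estimates): rainfall first fills the initial abstraction, and afterwards produces excess only when it exceeds the constant loss rate.

```python
def excess_rainfall(rainfall, initial_abstraction, constant_loss):
    """rainfall: depths per time step; constant_loss: loss per time step.
    Returns the excess (runoff-producing) depth for each time step."""
    remaining_ia = initial_abstraction
    excess = []
    for depth in rainfall:
        abstracted = min(depth, remaining_ia)                  # fill the initial abstraction first
        remaining_ia -= abstracted
        after_ia = depth - abstracted
        excess.append(max(0.0, after_ia - constant_loss))      # then subtract the constant loss
    return excess

# Hypothetical storm: depths (inches) in successive 15-minute intervals
storm = [0.05, 0.20, 0.40, 0.60, 0.30, 0.10]
print(excess_rainfall(storm, initial_abstraction=0.5, constant_loss=0.05))
```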
Association of tRNA methyltransferase NSUN2/IGF-II molecular signature with ovarian cancer survival.
Yang, Jia-Cheng; Risch, Eric; Zhang, Meiqin; Huang, Chan; Huang, Huatian; Lu, Lingeng
2017-09-01
To investigate the association between NSUN2/IGF-II signature and ovarian cancer survival. Using a publicly accessible dataset of RNA sequencing and clinical follow-up data, we performed Classification and Regression Tree and survival analyses. Patients with NSUN2-high/IGF-II-low had significantly superior overall and disease progression-free survival, followed by NSUN2-low/IGF-II-low, NSUN2-high/IGF-II-high and NSUN2-low/IGF-II-high (p < 0.0001 for overall, p = 0.0024 for progression-free survival, respectively). The associations of NSUN2/IGF-II signature with the risks of death and relapse remained significant in multivariate Cox regression models. Random-effects meta-analyses show the upregulated NSUN2 and IGF-II expression in ovarian cancer versus normal tissues. The NSUN2/IGF-II signature associates with heterogeneous outcome and may have clinical implications in managing ovarian cancer.
Fleischer, Adam E; Hshieh, Shenche; Crews, Ryan T; Waverly, Brett J; Jones, Jacob M; Klein, Erin E; Weil, Lowell; Weil, Lowell Scott
2018-05-01
Metatarsal length is believed to play a role in plantar plate dysfunction, although the mechanism through which progressive injury occurs is still uncertain. We aimed to clarify whether length of the second metatarsal was associated with increased plantar pressure measurements in the forefoot while walking. Weightbearing radiographs and corresponding pedobarographic data from 100 patients in our practice walking without a limp were retrospectively reviewed. Radiographs were assessed for several anatomic relationships, including metatarsal length, by a single rater. Pearson correlation analyses and multiple linear regression models were used to determine whether metatarsal length was associated with forefoot loading parameters. The relative length of the second to first metatarsal was positively associated with the ratio of peak pressure beneath the respective metatarsophalangeal joints (r = 0.243, P = .015). The relative length of the second to third metatarsal was positively associated with the ratios of peak pressure (r = 0.292, P = .003), pressure-time integral (r = 0.249, P = .013), and force-time integral (r = 0.221, P = .028) beneath the respective metatarsophalangeal joints. Although the variability in loading predicted by the various regression analyses was not large (4%-14%), the relative length of the second metatarsal (to the first and to the third) was maintained in each of the multiple regression models and remained the strongest predictor (highest standardized β-coefficient) in each of the models. Patients with longer second metatarsals exhibited relatively higher loads beneath the second metatarsophalangeal joint during barefoot walking. These findings provide a mechanism through which elongated second metatarsals may contribute to plantar plate injuries. Level III, comparative study.
Nguyen, Quynh C.; Osypuk, Theresa L.; Schmidt, Nicole M.; Glymour, M. Maria; Tchetgen Tchetgen, Eric J.
2015-01-01
Despite the recent flourishing of mediation analysis techniques, many modern approaches are difficult to implement or applicable to only a restricted range of regression models. This report provides practical guidance for implementing a new technique utilizing inverse odds ratio weighting (IORW) to estimate natural direct and indirect effects for mediation analyses. IORW takes advantage of the odds ratio's invariance property and condenses information on the odds ratio for the relationship between the exposure (treatment) and multiple mediators, conditional on covariates, by regressing exposure on mediators and covariates. The inverse of the covariate-adjusted exposure-mediator odds ratio association is used to weight the primary analytical regression of the outcome on treatment. The treatment coefficient in such a weighted regression estimates the natural direct effect of treatment on the outcome, and indirect effects are identified by subtracting direct effects from total effects. Weighting renders treatment and mediators independent, thereby deactivating indirect pathways of the mediators. This new mediation technique accommodates multiple discrete or continuous mediators. IORW is easily implemented and is appropriate for any standard regression model, including quantile regression and survival analysis. An empirical example is given using data from the Moving to Opportunity (1994–2002) experiment, testing whether neighborhood context mediated the effects of a housing voucher program on obesity. Relevant Stata code (StataCorp LP, College Station, Texas) is provided. PMID:25693776
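A simplified sketch of the IORW recipe summarized above for one binary exposure, one continuous mediator and a continuous outcome (simulated data; the paper supplies Stata code, so the Python below is only an illustration): the exposure is regressed on the mediator and covariates, exposed subjects are weighted by the inverse of the fitted exposure-mediator odds ratio, the exposure coefficient in the weighted outcome model estimates the natural direct effect, and the indirect effect is obtained by subtracting it from the total effect.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2000
covariate = rng.normal(size=n)
exposure = rng.binomial(1, 0.5, size=n)
mediator = 0.8 * exposure + 0.3 * covariate + rng.normal(size=n)
outcome = 0.5 * exposure + 0.7 * mediator + 0.2 * covariate + rng.normal(size=n)

# Step 1: logistic regression of exposure on mediator and covariates
X_exp = sm.add_constant(np.column_stack([mediator, covariate]))
beta_mediator = sm.Logit(exposure, X_exp).fit(disp=0).params[1]

# Step 2: inverse odds ratio weights (exposed subjects only; unexposed get weight 1)
weights = np.where(exposure == 1, np.exp(-beta_mediator * mediator), 1.0)

# Step 3: total effect (unweighted) and natural direct effect (weighted)
X_out = sm.add_constant(np.column_stack([exposure, covariate]))
total = sm.OLS(outcome, X_out).fit().params[1]
direct = sm.WLS(outcome, X_out, weights=weights).fit().params[1]
print(f"total={total:.3f}  direct={direct:.3f}  indirect={total - direct:.3f}")
```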
Hughes, James P; Haley, Danielle F; Frew, Paula M; Golin, Carol E; Adimora, Adaora A; Kuo, Irene; Justman, Jessica; Soto-Torres, Lydia; Wang, Jing; Hodder, Sally
2015-06-01
Reductions in risk behaviors are common following enrollment in human immunodeficiency virus (HIV) prevention studies. We develop methods to quantify the proportion of change in risk behaviors that can be attributed to regression to the mean versus study participation and other factors. A novel model that incorporates both regression to the mean and study participation effects is developed for binary measures. The model is used to estimate the proportion of change in the prevalence of "unprotected sex in the past 6 months" that can be attributed to study participation versus regression to the mean in a longitudinal cohort of women at risk for HIV infection who were recruited from ten U.S. communities with high rates of HIV and poverty. HIV risk behaviors were evaluated using audio computer-assisted self-interviews at baseline and every 6 months for up to 12 months. The prevalence of "unprotected sex in the past 6 months" declined from 96% at baseline to 77% at 12 months. However, this change could be almost completely explained by regression to the mean. Analyses that examine changes over time in cohorts selected for high- or low- risk behaviors should account for regression to the mean effects. Copyright © 2015 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Muller, Sybrand Jacobus; van Niekerk, Adriaan
2016-07-01
Soil salinity often leads to reduced crop yield and quality and can render soils barren. Irrigated areas are particularly at risk due to intensive cultivation and secondary salinization caused by waterlogging. Regular monitoring of salt accumulation in irrigation schemes is needed to keep its negative effects under control. The dynamic spatial and temporal characteristics of remote sensing can provide a cost-effective solution for monitoring salt accumulation at irrigation scheme level. This study evaluated a range of pan-fused SPOT-5 derived features (spectral bands, vegetation indices, image textures and image transformations) for classifying salt-affected areas in two distinctly different irrigation schemes in South Africa, namely Vaalharts and Breede River. The relationship between the input features and electro conductivity measurements were investigated using regression modelling (stepwise linear regression, partial least squares regression, curve fit regression modelling) and supervised classification (maximum likelihood, nearest neighbour, decision tree analysis, support vector machine and random forests). Classification and regression trees and random forest were used to select the most important features for differentiating salt-affected and unaffected areas. The results showed that the regression analyses produced weak models (<0.4 R squared). Better results were achieved using the supervised classifiers, but the algorithms tend to over-estimate salt-affected areas. A key finding was that none of the feature sets or classification algorithms stood out as being superior for monitoring salt accumulation at irrigation scheme level. This was attributed to the large variations in the spectral responses of different crops types at different growing stages, coupled with their individual tolerances to saline conditions.
NASA Astrophysics Data System (ADS)
Solimun
2017-05-01
The aim of this research is to model survival data from kidney-transplant patients using partial least squares (PLS)-Cox regression, which can be applied whether or not the no-multicollinearity assumption is met. The secondary data were obtained from research entitled "Factors affecting the survival of kidney-transplant patients". The research subjects comprised 250 patients. The predictor variables consisted of: age (X1), sex (X2; two categories), prior hemodialysis duration (X3), diabetes (X4; two categories), prior transplantation number (X5), number of blood transfusions (X6), discrepancy score (X7), and use of antilymphocyte globulin (ALG) (X8; two categories), while the response variable was patient survival time (in months). Partial least squares regression is a model that connects the predictor variables X and the response variable y, and it initially aims to determine the relationship between them. Results of the above analyses suggest that the survival of kidney-transplant recipients ranged from 0 to 55 months, with 62% of the patients surviving until they received treatment that lasted for 55 months. The PLS-Cox regression analysis revealed that patients' age and the use of ALG significantly affected the survival time of patients. The factor of patients' age (X1) in the PLS-Cox regression model affected the failure probability by 1.201; this indicates that the probability of dying for elderly patients with a kidney transplant is 1.152 times higher than that for younger patients.
The challenge of staying happier: testing the Hedonic Adaptation Prevention model.
Sheldon, Kennon M; Lyubomirsky, Sonja
2012-05-01
The happiness that comes from a particular success or change in fortune abates with time. The Hedonic Adaptation Prevention (HAP) model specifies two routes by which the well-being gains derived from a positive life change are eroded--the first involving bottom-up processes (i.e., declining positive emotions generated by the positive change) and the second involving top-down processes (i.e., increased aspirations for even more positivity). The model also specifies two moderators that can forestall these processes--continued appreciation of the original life change and continued variety in change-related experiences. The authors formally tested the predictions of the HAP model in a 3-month three-wave longitudinal study of 481 students. Temporal path analyses and moderated regression analyses provided good support for the model. Implications for the stability of well-being, the feasibility of "the pursuit of happiness," and the appeal of overconsumption are discussed.
Andrews, Arthur R.; Bridges, Ana J.; Gomez, Debbie
2014-01-01
Purpose The aims of the study were to evaluate the orthogonality of acculturation for Latinos. Design Regression analyses were used to examine acculturation in two Latino samples (N = 77; N = 40). In a third study (N = 673), confirmatory factor analyses compared unidimensional and bidimensional models. Method Acculturation was assessed with the ARSMA-II (Studies 1 and 2), and language proficiency items from the Children of Immigrants Longitudinal Study (Study 3). Results In Studies 1 and 2, the bidimensional model accounted for slightly more variance (R² = .11 in Study 1; R² = .21 in Study 2) than the unidimensional model (R² = .10 in Study 1; R² = .19 in Study 2). In Study 3, the bidimensional model evidenced better fit (Akaike information criterion = 167.36) than the unidimensional model (Akaike information criterion = 1204.92). Discussion/Conclusions Acculturation is multidimensional. Implications for Practice Care providers should examine acculturation as a bidimensional construct. PMID:23361579
Sternberg, Maya R; Schleicher, Rosemary L; Pfeiffer, Christine M
2013-06-01
The collection of articles in this supplement issue provides insight into the association of various covariates with concentrations of biochemical indicators of diet and nutrition (biomarkers), beyond age, race, and sex, using linear regression. We studied 10 specific sociodemographic and lifestyle covariates in combination with 29 biomarkers from NHANES 2003-2006 for persons aged ≥ 20 y. The covariates were organized into 2 sets or "chunks": sociodemographic (age, sex, race-ethnicity, education, and income) and lifestyle (dietary supplement use, smoking, alcohol consumption, BMI, and physical activity) and fit in hierarchical fashion by using each category or set of related variables to determine how covariates, jointly, are related to biomarker concentrations. In contrast to many regression modeling applications, all variables were retained in a full regression model regardless of significance to preserve the interpretation of the statistical properties of β coefficients, P values, and CIs and to keep the interpretation consistent across a set of biomarkers. The variables were preselected before data analysis, and the data analysis plan was designed at the outset to minimize the reporting of false-positive findings by limiting the amount of preliminary hypothesis testing. Although we generally found that demographic differences seen in biomarkers were over- or underestimated when ignoring other key covariates, the demographic differences generally remained significant after adjusting for sociodemographic and lifestyle variables. These articles are intended to provide a foundation to researchers to help them generate hypotheses for future studies or data analyses and/or develop predictive regression models using the wealth of NHANES data.
The arcsine is asinine: the analysis of proportions in ecology.
Warton, David I; Hui, Francis K C
2011-01-01
The arcsine square root transformation has long been standard procedure when analyzing proportional data in ecology, with applications in data sets containing binomial and non-binomial response variables. Here, we argue that the arcsine transform should not be used in either circumstance. For binomial data, logistic regression has greater interpretability and higher power than analyses of transformed data. However, it is important to check the data for additional unexplained variation, i.e., overdispersion, and to account for it via the inclusion of random effects in the model if found. For non-binomial data, the arcsine transform is undesirable on the grounds of interpretability, and because it can produce nonsensical predictions. The logit transformation is proposed as an alternative approach to address these issues. Examples are presented in both cases to illustrate these advantages, comparing various methods of analyzing proportions including untransformed, arcsine- and logit-transformed linear models and logistic regression (with or without random effects). Simulations demonstrate that logistic regression usually provides a gain in power over other methods.
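The contrast drawn above can be reproduced on simulated binomial data: the sketch below (Python with statsmodels assumed; the data are synthetic) fits an OLS model to arcsine-square-root-transformed proportions and a binomial GLM (logistic regression) to the raw counts, the latter yielding coefficients on the interpretable log-odds scale.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n, trials = 200, 30
x = rng.uniform(-2, 2, n)
p = 1 / (1 + np.exp(-(0.5 + 1.2 * x)))          # true success probability
successes = rng.binomial(trials, p)

# Arcsine-square-root transform of the observed proportion, analysed with OLS
y_arcsine = np.arcsin(np.sqrt(successes / trials))
ols_fit = sm.OLS(y_arcsine, sm.add_constant(x)).fit()

# Binomial GLM (logistic regression) on the raw successes/failures
glm_fit = sm.GLM(np.column_stack([successes, trials - successes]),
                 sm.add_constant(x), family=sm.families.Binomial()).fit()

print(ols_fit.params)   # slope on the arcsine scale: hard to interpret directly
print(glm_fit.params)   # slope is a log odds ratio, directly interpretable
```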
Perl-speaks-NONMEM (PsN)--a Perl module for NONMEM related programming.
Lindbom, Lars; Ribbing, Jakob; Jonsson, E Niclas
2004-08-01
The NONMEM program is the most widely used nonlinear regression software in population pharmacokinetic/pharmacodynamic (PK/PD) analyses. In this article we describe a programming library, Perl-speaks-NONMEM (PsN), intended for programmers who aim to use the computational capability of NONMEM in external applications. The library is object oriented and written in the programming language Perl. The classes of the library are built around NONMEM's data, model and output files. The specification of the NONMEM model is easily set or changed through the model and data file classes while the output from a model fit is accessed through the output file class. The classes have methods that help the programmer perform common repetitive tasks, e.g., summarising the output from a NONMEM run, setting the initial estimates of a model based on a previous run or truncating values over a certain threshold in the data file. PsN creates a basis for the development of high-level software using NONMEM as the regression tool.
Empirical Likelihood in Nonignorable Covariate-Missing Data Problems.
Xie, Yanmei; Zhang, Biao
2017-04-20
Missing covariate data occurs often in regression analysis, which frequently arises in the health and social sciences as well as in survey sampling. We study methods for the analysis of a nonignorable covariate-missing data problem in an assumed conditional mean function when some covariates are completely observed but other covariates are missing for some subjects. We adopt the semiparametric perspective of Bartlett et al. (Improving upon the efficiency of complete case analysis when covariates are MNAR. Biostatistics 2014;15:719-30) on regression analyses with nonignorable missing covariates, in which they have introduced the use of two working models, the working probability model of missingness and the working conditional score model. In this paper, we study an empirical likelihood approach to nonignorable covariate-missing data problems with the objective of effectively utilizing the two working models in the analysis of covariate-missing data. We propose a unified approach to constructing a system of unbiased estimating equations, where there are more equations than unknown parameters of interest. One useful feature of these unbiased estimating equations is that they naturally incorporate the incomplete data into the data analysis, making it possible to seek efficient estimation of the parameter of interest even when the working regression function is not specified to be the optimal regression function. We apply the general methodology of empirical likelihood to optimally combine these unbiased estimating equations. We propose three maximum empirical likelihood estimators of the underlying regression parameters and compare their efficiencies with other existing competitors. We present a simulation study to compare the finite-sample performance of various methods with respect to bias, efficiency, and robustness to model misspecification. The proposed empirical likelihood method is also illustrated by an analysis of a data set from the US National Health and Nutrition Examination Survey (NHANES).
Hydrologic and Hydraulic Analyses of Selected Streams in Lorain County, Ohio, 2003
Jackson, K. Scott; Ostheimer, Chad J.; Whitehead, Matthew T.
2003-01-01
Hydrologic and hydraulic analyses were done for selected reaches of nine streams in Lorain County, Ohio. To assess the alternatives for flood-damage mitigation, the Lorain County Engineer and the U.S. Geological Survey (USGS) initiated a cooperative study to investigate aspects of the hydrology and hydraulics of the nine streams. Historical streamflow data and regional regression equations were used to estimate instantaneous peak discharges for floods having recurrence intervals of 2, 5, 10, 25, 50, and 100 years. Explanatory variables used in the regression equations were drainage area, main-channel slope, and storage area. Drainage areas of the nine stream reaches studied ranged from 1.80 to 19.3 square miles. The step-backwater model HEC-RAS was used to determine water-surface-elevation profiles for the 10-year-recurrence-interval (10-year) flood along a selected reach of each stream. The water-surface-profile information was then used to generate digital mapping of flood-plain boundaries. The analyses indicate that at the 10-year flood elevation, road overflow occurs at numerous hydraulic structures along the nine streams.
Use of probabilistic weights to enhance linear regression myoelectric control
NASA Astrophysics Data System (ADS)
Smith, Lauren H.; Kuiken, Todd A.; Hargrove, Levi J.
2015-12-01
Objective. Clinically available prostheses for transradial amputees do not allow simultaneous myoelectric control of degrees of freedom (DOFs). Linear regression methods can provide simultaneous myoelectric control, but frequently also result in difficulty with isolating individual DOFs when desired. This study evaluated the potential of using probabilistic estimates of categories of gross prosthesis movement, which are commonly used in classification-based myoelectric control, to enhance linear regression myoelectric control. Approach. Gaussian models were fit to electromyogram (EMG) feature distributions for three movement classes at each DOF (no movement, or movement in either direction) and used to weight the output of linear regression models by the probability that the user intended the movement. Eight able-bodied and two transradial amputee subjects worked in a virtual Fitts’ law task to evaluate differences in controllability between linear regression and probability-weighted regression for an intramuscular EMG-based three-DOF wrist and hand system. Main results. Real-time and offline analyses in able-bodied subjects demonstrated that probability weighting improved performance during single-DOF tasks (p < 0.05) by preventing extraneous movement at additional DOFs. Similar results were seen in experiments with two transradial amputees. Though goodness-of-fit evaluations suggested that the EMG feature distributions showed some deviations from the Gaussian, equal-covariance assumptions used in this experiment, the assumptions were sufficiently met to provide improved performance compared to linear regression control. Significance. Use of probability weights can improve the ability to isolate individual DOFs during linear regression myoelectric control, while maintaining the ability to simultaneously control multiple DOFs.
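A minimal sketch of the probability-weighting idea, not the authors' implementation: Gaussian class likelihoods over a feature vector (all means, covariances, and regression weights below are made up for illustration) are used to scale a linear regression output by the posterior probability that movement was intended.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(2)

# Hypothetical 4-dimensional EMG feature vectors for one DOF with three classes:
# no movement, flexion, extension (class means are invented for illustration).
means = {"rest": np.zeros(4),
         "flex": np.array([1.5, 1.0, 0.5, 0.2]),
         "ext": np.array([-1.2, -0.8, -0.4, -0.1])}
cov = np.eye(4)                       # equal-covariance assumption, as in the study

w = rng.normal(0, 0.3, 4)             # hypothetical linear regression weights for this DOF

def weighted_output(features):
    """Scale the regression output by 1 - P(no movement | features)."""
    like = {c: multivariate_normal.pdf(features, mean=m, cov=cov) for c, m in means.items()}
    p_move = 1.0 - like["rest"] / sum(like.values())   # posterior probability of intended movement
    return p_move * (w @ features)

x = rng.normal(0, 1, 4)               # a simulated feature sample
print(weighted_output(x))
```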
Instructional Advice, Time Advice and Learning Questions in Computer Simulations
ERIC Educational Resources Information Center
Rey, Gunter Daniel
2010-01-01
Undergraduate students (N = 97) used an introductory text and a computer simulation to learn fundamental concepts about statistical analyses (e.g., analysis of variance, regression analysis and General Linear Model). Each learner was randomly assigned to one cell of a 2 (with or without instructional advice) x 2 (with or without time advice) x 2…
ERIC Educational Resources Information Center
Pratt, Charlotte; Webber, Larry S.; Baggett, Chris D.; Ward, Dianne; Pate, Russell R.; Murray, David; Lohman, Timothy; Lytle, Leslie; Elder, John P.
2008-01-01
This study describes the relationships between sedentary activity and body composition in 1,458 sixth-grade girls from 36 middle schools across the United States. Multivariate associations between sedentary activity and body composition were examined with regression analyses using general linear mixed models. Mean age, body mass index, and…
Comparison of the Incremental Validity of the Old and New MCAT.
ERIC Educational Resources Information Center
Wolf, Fredric M.; And Others
The predictive and incremental validity of both the Old and New Medical College Admission Test (MCAT) was examined and compared with a sample of over 300 medical students. Results of zero-order and incremental validity coefficients, as well as prediction models resulting from all possible subsets regression analyses using Mallows' Cp criterion,…
ERIC Educational Resources Information Center
Wragg, Regina E.
2013-01-01
This dissertation presents my explorations in both molecular biology and science education research. In study one, we determined the "ADIPOQ" and "ADIPOR1" genotypes of 364 White and 148 Black BrCa patients and used dominant model univariate logistic regression analyses to determine individual SNP and haplotype associations…
Adaptive variation in Pinus ponderosa from Intermountain Regions. I. Snake and Salmon River Basins
Gerald Rehfeldt
1986-01-01
Genetic differentiation of 64 populations from central Idaho was studied in field, greenhouse and laboratory tests. Analyses of variables reflecting growth potential, phenology, morphology, cold hardiness and periodicity of shoot elongation revealed population differentiation for a variety of traits. Regression models related as much as 61 percent of the variance among...
W. Mark Ford; Steven B. Castleberry; Michael T. Mengak; Jane L. Rodrigue; Daniel J. Feller; Kevin R. Russell
2006-01-01
We examined a suite of macro-habitat and landscape variables around active and inactive Allegheny woodrat Neotoma magister colony sites in the Appalachian Mountains of the mid-Atlantic Highlands of Maryland, Virginia, and West Virginia using an information-theoretic modeling approach. Logistic regression analyses suggested that Allegheny woodrat presence was related...
ERIC Educational Resources Information Center
Kieffer, Kevin M.; Schinka, John A.; Curtiss, Glenn
2004-01-01
This study examined the contributions of the 5-Factor Model (FFM; P. T. Costa & R. R. McCrae, 1992) and RIASEC (J. L. Holland, 1994) constructs of consistency, differentiation, and person-environment congruence in predicting job performance ratings in a large sample (N = 514) of employees. Hierarchical regression analyses conducted separately by…
ERIC Educational Resources Information Center
Suurland, Jill; van der Heijden, Kristiaan B.; Huijbregts, Stephan C. J.; Smaling, Hanneke J. A.; de Sonneville, Leo M. J.; Van Goozen, Stephanie H. M.; Swaab, Hanna
2016-01-01
Inhibitory control (IC) and negative emotionality (NE) are both linked to aggressive behavior, but their interplay has not yet been clarified. This study examines different NE × IC interaction models in relation to aggressive behavior in 855 preschoolers (aged 2-5 years) using parental questionnaires. Hierarchical regression analyses revealed that…
Predicting Negative Discipline in Traditional Families: A Multi-Dimensional Stress Model.
ERIC Educational Resources Information Center
Fisher, Philip A.
An attempt is made to integrate existing theories of family violence by introducing the concept of family role stress. Role stressors may be defined as factors inhibiting the enactment of family roles. Multiple regression analyses were performed on data from 190 families to test a hypothesis involving the prediction of negative discipline at…
General Education Development (GED) Recipients' Life Course Experiences: Humanizing the Findings
ERIC Educational Resources Information Center
Hartigan, Lacey A.
2017-01-01
This study examines a range of GED recipients' life course contexts and experiences and their relationship with long-term outcomes. Using descriptive comparisons, bivariate tests, and propensity-score matched regression models to analyze data from rounds 1-15 of the National Longitudinal Survey of Youth, 1997, analyses aim to examine: (1)…
Ecological and Topographic Features of Volcanic Ash-Influenced Forest Soils
Mark Kimsey; Brian Gardner; Alan Busacca
2007-01-01
Volcanic ash distribution and thickness were determined for a forested region of north-central Idaho. Mean ash thickness and multiple linear regression analyses were used to model the effect of environmental variables on ash thickness. Slope and slope curvature relationships with volcanic ash thickness varied on a local spatial scale across the study area. Ash...
Meta-analyses of the 5-HTTLPR polymorphisms and post-traumatic stress disorder.
Navarro-Mateu, Fernando; Escámez, Teresa; Koenen, Karestan C; Alonso, Jordi; Sánchez-Meca, Julio
2013-01-01
To conduct a meta-analysis of all published genetic association studies of 5-HTTLPR polymorphisms performed in PTSD cases. Potential studies were identified through PubMed/MEDLINE, EMBASE, Web of Science databases (Web of Knowledge, WoK), PsycINFO, PsycARTICLES and HuGeNet (Human Genome Epidemiology Network) up until December 2011. Published observational studies reporting genotype or allele frequencies of this genetic factor in PTSD cases and in non-PTSD controls were all considered eligible for inclusion in this systematic review. Two reviewers selected studies for possible inclusion and extracted data independently following a standardized protocol. A biallelic and a triallelic meta-analysis, including the total S and S' frequencies, the dominant (S+/LL and S'+/L'L') and the recessive model (SS/L+ and S'S'/L'+), were performed with a random-effect model to calculate the pooled OR and its corresponding 95% CI. Forest plots and Cochran's Q statistic and I(2) index were calculated to check for heterogeneity. Subgroup analyses and meta-regression were carried out to analyze potential moderators. Publication bias and quality of reporting were also analyzed. Thirteen studies met our inclusion criteria, providing a total sample of 1874 patients with PTSD and 7785 controls in the biallelic meta-analyses and 627 and 3524, respectively, in the triallelic. None of the meta-analyses showed evidence of an association between 5-HTTLPR and PTSD but several characteristics (exposure to the same principal stressor for PTSD cases and controls, adjustment for potential confounding variables, blind assessment, study design, type of PTSD, ethnic distribution and Total Quality Score) influenced the results in subgroup analyses and meta-regression. There was no evidence of potential publication bias. Current evidence does not support a direct effect of 5-HTTLPR polymorphisms on PTSD. Further analyses of gene-environment interactions, epigenetic modulation and new studies with large samples and/or meta-analyses are required.
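For readers unfamiliar with the pooling step, the sketch below implements a standard DerSimonian-Laird random-effects pooled odds ratio with an I(2) heterogeneity index on hypothetical 2x2 tables; it is a generic illustration of the method, not a reanalysis of the studies above.

```python
import numpy as np

# Hypothetical 2x2 tables (a, b, c, d) = (exposed cases, unexposed cases,
# exposed controls, unexposed controls) for a handful of studies.
tables = np.array([[20, 30, 80, 70],
                   [15, 25, 60, 65],
                   [10, 12, 50, 55],
                   [30, 28, 90, 95]], dtype=float)

a, b, c, d = tables.T
log_or = np.log((a * d) / (b * c))                 # per-study log odds ratios
var = 1 / a + 1 / b + 1 / c + 1 / d                # their large-sample variances

# DerSimonian-Laird estimate of the between-study variance tau^2
w_fixed = 1 / var
q = np.sum(w_fixed * (log_or - np.sum(w_fixed * log_or) / np.sum(w_fixed)) ** 2)
c_dl = np.sum(w_fixed) - np.sum(w_fixed ** 2) / np.sum(w_fixed)
tau2 = max(0.0, (q - (len(log_or) - 1)) / c_dl)

# Random-effects pooled OR with 95% CI
w_re = 1 / (var + tau2)
pooled = np.sum(w_re * log_or) / np.sum(w_re)
se = np.sqrt(1 / np.sum(w_re))
print(np.exp(pooled), np.exp(pooled - 1.96 * se), np.exp(pooled + 1.96 * se))

# I(2) heterogeneity index, in percent
print(max(0.0, (q - (len(log_or) - 1)) / q) * 100)
```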
NASA Technical Reports Server (NTRS)
Duda, David P.; Minnis, Patrick
2009-01-01
Previous studies have shown that probabilistic forecasting may be a useful method for predicting persistent contrail formation. A probabilistic forecast to accurately predict contrail formation over the contiguous United States (CONUS) is created by using meteorological data based on hourly meteorological analyses from the Advanced Regional Prediction System (ARPS) and from the Rapid Update Cycle (RUC) as well as GOES water vapor channel measurements, combined with surface and satellite observations of contrails. Two groups of logistic models were created. The first group of models (SURFACE models) is based on surface-based contrail observations supplemented with satellite observations of contrail occurrence. The second group of models (OUTBREAK models) is derived from a selected subgroup of satellite-based observations of widespread persistent contrails. The mean accuracies for both the SURFACE and OUTBREAK models typically exceeded 75 percent when based on the RUC or ARPS analysis data, but decreased when the logistic models were derived from ARPS forecast data.
Green, Kimberly T.; Beckham, Jean C.; Youssef, Nagy; Elbogen, Eric B.
2013-01-01
Objective The present study sought to investigate the longitudinal effects of psychological resilience against alcohol misuse adjusting for socio-demographic factors, trauma-related variables, and self-reported history of alcohol abuse. Methodology Data were from National Post-Deployment Adjustment Study (NPDAS) participants who completed both a baseline and one-year follow-up survey (N=1090). Survey questionnaires measured combat exposure, probable posttraumatic stress disorder (PTSD), psychological resilience, and alcohol misuse, all of which were measured at two discrete time periods (baseline and one-year follow-up). Baseline resilience and change in resilience (increased or decreased) were utilized as independent variables in separate models evaluating alcohol misuse at the one-year follow-up. Results Multiple linear regression analyses controlled for age, gender, level of educational attainment, combat exposure, PTSD symptom severity, and self-reported alcohol abuse. Accounting for these covariates, findings revealed that lower baseline resilience, younger age, male gender, and self-reported alcohol abuse were related to alcohol misuse at the one-year follow-up. A separate regression analysis, adjusting for the same covariates, revealed a relationship between change in resilience (from baseline to the one-year follow-up) and alcohol misuse at the one-year follow-up. The regression model evaluating these variables in a subset of the sample in which all the participants had been deployed to Iraq and/or Afghanistan was consistent with findings involving the overall era sample. Finally, logistic regression analyses of the one-year follow-up data yielded similar results to the baseline and resilience change models. Conclusions These findings suggest that increased psychological resilience is inversely related to alcohol misuse and is protective against alcohol misuse over time. Additionally, it supports the conceptualization of resilience as a process which evolves over time. Moreover, our results underscore the importance of assessing resilience as part of alcohol use screening for preventing alcohol misuse in Iraq and Afghanistan era military veterans. PMID:24090625
Suvak, Michael K; Walling, Sherry M; Iverson, Katherine M; Taft, Casey T; Resick, Patricia A
2009-12-01
Multilevel modeling is a powerful and flexible framework for analyzing nested data structures (e.g., repeated measures or longitudinal designs). The authors illustrate a series of multilevel regression procedures that can be used to elucidate the nature of the relationship between two variables across time. The goal is to help trauma researchers become more aware of the utility of multilevel modeling as a tool for increasing the field's understanding of posttraumatic adaptation. These procedures are demonstrated by examining the relationship between two posttraumatic symptoms, intrusion and avoidance, across five assessment points in a sample of rape and robbery survivors (n = 286). Results revealed that changes in intrusion were highly correlated with changes in avoidance over the 18-month posttrauma period.
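A compact sketch of the kind of multilevel regression described above, assuming Python with statsmodels and simulated repeated-measures data (variable names are hypothetical): intrusion is regressed on avoidance and time, with random intercepts and random avoidance slopes across persons.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n_subj, n_waves = 100, 5

rows = []
for subj in range(n_subj):
    u0, u1 = rng.normal(0, [0.5, 0.2])            # subject-specific intercept/slope deviations
    for wave in range(n_waves):
        avoidance = rng.normal(2.0 - 0.1 * wave, 0.5)
        intrusion = (1.0 + u0) + (0.6 + u1) * avoidance - 0.05 * wave + rng.normal(0, 0.3)
        rows.append({"subj": subj, "wave": wave, "avoidance": avoidance, "intrusion": intrusion})
df = pd.DataFrame(rows)

# Level-1 model: intrusion as a function of avoidance and time within person;
# random intercepts and random avoidance slopes across persons.
model = smf.mixedlm("intrusion ~ avoidance + wave", df,
                    groups=df["subj"], re_formula="~avoidance")
fit = model.fit()
print(fit.summary())
```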
Webster, R J; Williams, A; Marchetti, F; Yauk, C L
2018-07-01
Mutations in germ cells pose potential genetic risks to offspring. However, de novo mutations are rare events that are spread across the genome and are difficult to detect. Thus, studies in this area have generally been under-powered, and no human germ cell mutagen has been identified. Whole Genome Sequencing (WGS) of human pedigrees has been proposed as an approach to overcome these technical and statistical challenges. WGS enables analysis of a much wider breadth of the genome than traditional approaches. Here, we performed power analyses to determine the feasibility of using WGS in human families to identify germ cell mutagens. Different statistical models were compared in the power analyses (ANOVA and multiple regression for one-child families, and mixed effect model sampling between two and four siblings per family). Assumptions were made based on parameters from the existing literature, such as the mutation-by-paternal age effect. We explored two scenarios: a constant effect due to an exposure that occurred in the past, and an accumulating effect where the exposure is continuing. Our analysis revealed the importance of modeling inter-family variability of the mutation-by-paternal age effect. Statistical power was improved by models accounting for the family-to-family variability. Our power analyses suggest that sufficient statistical power can be attained with 4-28 four-sibling families per treatment group, when the increase in mutations ranges from 40% down to 10%, respectively. Modeling family variability using mixed effect models provided a reduction in sample size compared to a multiple regression approach. Much larger sample sizes were required to detect an interaction effect between environmental exposures and paternal age. These findings inform study design and statistical modeling approaches to improve power and reduce sequencing costs for future studies in this area. Crown Copyright © 2018. Published by Elsevier B.V. All rights reserved.
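The simulation-based logic of such a power analysis can be sketched as follows (Python assumed; all parameter values are placeholders, and for brevity the per-replicate fit is a plain OLS on log counts rather than the mixed model used in the study):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)

def simulate_power(n_fam=20, sibs=4, effect=0.4, reps=500, alpha=0.05):
    """Crude power estimate for detecting an exposure effect on de novo mutation
    counts in the presence of a family-level random effect. The data-generating
    parameters (baseline count, paternal-age slope, family SD) are illustrative."""
    hits = 0
    for _ in range(reps):
        rows, exposed = [], []
        for fam in range(2 * n_fam):                  # half exposed, half control families
            is_exposed = fam < n_fam
            age = rng.uniform(25, 45)                 # paternal age at conception
            fam_effect = rng.normal(0, 0.1)           # family-to-family variability
            for _ in range(sibs):
                lam = np.exp(np.log(45) + 0.02 * (age - 30) + fam_effect
                             + (effect if is_exposed else 0.0))
                rows.append([age, rng.poisson(lam)])
                exposed.append(1.0 if is_exposed else 0.0)
        rows = np.asarray(rows)
        X = sm.add_constant(np.column_stack([exposed, rows[:, 0]]))
        fit = sm.OLS(np.log(rows[:, 1] + 1), X).fit()
        hits += fit.pvalues[1] < alpha                # test of the exposure coefficient
    return hits / reps

print(simulate_power())
```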
Coping Styles in Heart Failure Patients with Depressive Symptoms
Trivedi, Ranak B.; Blumenthal, James A.; O'Connor, Christopher; Adams, Kirkwood; Hinderliter, Alan; Sueta-Dupree, Carla; Johnson, Kristy; Sherwood, Andrew
2009-01-01
Objective Elevated depressive symptoms have been linked to poorer prognosis in heart failure (HF) patients. Our objective was to identify coping styles associated with depressive symptoms in HF patients. Methods 222 stable HF patients (32.75% female, 45.4% non-Hispanic Black) completed multiple questionnaires. Beck Depression Inventory (BDI) assessed depressive symptoms, Life Orientation Test (LOT-R) assessed optimism, ENRICHD Social Support Inventory (ESSI) and Perceived Social Support Scale (PSSS) assessed social support, and COPE assessed coping styles. Linear regression analyses were employed to assess the association of coping styles with continuous BDI scores. Logistic regression analyses were performed using BDI scores dichotomized into BDI<10 versus BDI≥10, to identify coping styles accompanying clinically significant depressive symptoms. Results In linear regression models, higher BDI scores were associated with lower scores on the acceptance (β=-.14), humor (β=-.15), planning (β=-.15), and emotional support (β=-.14) subscales of the COPE, and higher scores on the behavioral disengagement (β=.41), denial (β=.33), venting (β=.25), and mental disengagement (β=.22) subscales. Higher PSSS and ESSI scores were associated with lower BDI scores (β=-.32 and -.25, respectively). Higher LOT-R scores were associated with higher BDI scores (β=.39, p<.001). In logistic regression models, BDI≥10 was associated with greater likelihood of behavioral disengagement (OR=1.3), denial (OR=1.2), mental disengagement (OR=1.3), venting (OR=1.2), and pessimism (OR=1.2), and lower perceived social support measured by PSSS (OR=.92) and ESSI (OR=.92). Conclusion Depressive symptoms in HF patients are associated with avoidant coping, lower perceived social support, and pessimism. Results raise the possibility that interventions designed to improve coping may reduce depressive symptoms. PMID:19773027
The effects of climate change on harp seals (Pagophilus groenlandicus).
Johnston, David W; Bowers, Matthew T; Friedlaender, Ari S; Lavigne, David M
2012-01-01
Harp seals (Pagophilus groenlandicus) have evolved life history strategies to exploit seasonal sea ice as a breeding platform. As such, individuals are prepared to deal with fluctuations in the quantity and quality of ice in their breeding areas. It remains unclear, however, how shifts in climate may affect seal populations. The present study assesses the effects of climate change on harp seals through three linked analyses. First, we tested the effects of short-term climate variability on young-of-the-year harp seal mortality using a linear regression of sea ice cover in the Gulf of St. Lawrence against stranding rates of dead harp seals in the region during 1992 to 2010. A similar regression of stranding rates and North Atlantic Oscillation (NAO) index values was also conducted. These analyses revealed negative correlations between both ice cover and NAO conditions and seal mortality, indicating that lighter ice cover and lower NAO values result in higher mortality. A retrospective cross-correlation analysis of NAO conditions and sea ice cover from 1978 to 2011 revealed that NAO-related changes in sea ice may have contributed to the depletion of seals on the east coast of Canada during 1950 to 1972, and to their recovery during 1973 to 2000. This historical retrospective also reveals opposite links between neonatal mortality in harp seals in the Northeast Atlantic and NAO phase. Finally, an assessment of the long-term trends in sea ice cover in the breeding regions of harp seals across the entire North Atlantic during 1979 through 2011 using multiple linear regression models and mixed effects linear regression models revealed that sea ice cover in all harp seal breeding regions has been declining by as much as 6 percent per decade over the time series of available satellite data.
Coping styles in heart failure patients with depressive symptoms.
Trivedi, Ranak B; Blumenthal, James A; O'Connor, Christopher; Adams, Kirkwood; Hinderliter, Alan; Dupree, Carla; Johnson, Kristy; Sherwood, Andrew
2009-10-01
Elevated depressive symptoms have been linked to poorer prognosis in heart failure (HF) patients. Our objective was to identify coping styles associated with depressive symptoms in HF patients. A total of 222 stable HF patients (32.75% female, 45.4% non-Hispanic black) completed multiple questionnaires. Beck Depression Inventory (BDI) assessed depressive symptoms, Life Orientation Test (LOT-R) assessed optimism, ENRICHD Social Support Inventory (ESSI) and Perceived Social Support Scale (PSSS) assessed social support, and COPE assessed coping styles. Linear regression analyses were employed to assess the association of coping styles with continuous BDI scores. Logistic regression analyses were performed using BDI scores dichotomized into BDI<10 vs. BDI≥10, to identify coping styles accompanying clinically significant depressive symptoms. In linear regression models, higher BDI scores were associated with lower scores on the acceptance (beta=-.14), humor (beta=-.15), planning (beta=-.15), and emotional support (beta=-.14) subscales of the COPE, and higher scores on the behavioral disengagement (beta=.41), denial (beta=.33), venting (beta=.25), and mental disengagement (beta=.22) subscales. Higher PSSS and ESSI scores were associated with lower BDI scores (beta=-.32 and -.25, respectively). Higher LOT-R scores were associated with higher BDI scores (beta=.39, P<.001). In logistic regression models, BDI≥10 was associated with greater likelihood of behavioral disengagement (OR=1.3), denial (OR=1.2), mental disengagement (OR=1.3), venting (OR=1.2), and pessimism (OR=1.2), and lower perceived social support measured by PSSS (OR=.92) and ESSI (OR=.92). Depressive symptoms in HF patients are associated with avoidant coping, lower perceived social support, and pessimism. Results raise the possibility that interventions designed to improve coping may reduce depressive symptoms.
McCool, Judith P; Cameron, Linda D; Petrie, Keith J
2005-06-01
To assess a theoretical model of adolescents' exposure to films, perceptions of smoking imagery in film, and smoking intentions. A structured questionnaire was completed by 3041 Year 8 (aged 12 years) and Year 12 (aged 16 years) students from 25 schools in Auckland, New Zealand. The survey assessed the relationships among exposure to films, attitudes about smoking imagery, perceptions of smoking prevalence and its acceptability, and expectations of smoking in the future. Measures included exposure to films, perceived pervasiveness of, and nonchalant attitudes about smoking imagery, identification of positive smoker stereotypes in films, perceived smoking prevalence, judgment of smoking acceptability, and smoking expectations. Path analytic techniques, using multiple regression analyses, were used to test the pattern of associations identified by the media interpretation model. Hierarchical regression analyses revealed that film exposure predicted higher levels of perceived smoking prevalence, perceived imagery pervasiveness, and nonchalant attitudes about smoking imagery. Nonchalant attitudes, identification of positive smoker stereotypes, and perceived smoking prevalence predicted judgments of smoking acceptability. Acceptability judgments, identification of positive stereotypes, and perceived smoking prevalence were all positively associated with smoking expectations. The media interpretation model accounted for 24% of the variance in smoking expectations within the total sample. Smoking imagery in film may play a role in the development of smoking intentions through inflating the perception of smoking prevalence and presenting socially attractive images.
Custer, Christine M.; Custer, Thomas W.; Dummer, Paul; Etterson, Matthew A.; Thogmartin, Wayne E.; Wu, Qian; Kannan, Kurunthachalam; Trowbridge, Annette; McKann, Patrick C.
2013-01-01
The exposure and effects of perfluoroalkyl substances (PFASs) were studied at eight locations in Minnesota and Wisconsin between 2007 and 2011 using tree swallows (Tachycineta bicolor). Concentrations of PFASs were quantified as were reproductive success end points. The sample egg method was used wherein an egg sample is collected, and the hatching success of the remaining eggs in the nest is assessed. The association between PFAS exposure and reproductive success was assessed by site comparisons, logistic regression analysis, and multistate modeling, a technique not previously used in this context. There was a negative association between concentrations of perfluorooctane sulfonate (PFOS) in eggs and hatching success. The concentration at which effects became evident (150–200 ng/g wet weight) was far lower than effect levels found in laboratory feeding trials or egg-injection studies of other avian species. This discrepancy was likely because behavioral effects and other extrinsic factors are not accounted for in these laboratory studies and the possibility that tree swallows are unusually sensitive to PFASs. The results from multistate modeling and simple logistic regression analyses were nearly identical. Multistate modeling provides a better method to examine possible effects of additional covariates and assessment of models using Akaike information criteria analyses. There was a credible association between PFOS concentrations in plasma and eggs, so extrapolation between these two commonly sampled tissues can be performed.
Nideröst, Sibylle; Gredig, Daniel; Roulin, Christophe; Rickenbach, Martin
2011-07-01
This prospective study applies an extended Information-Motivation-Behavioural Skills (IMB) model to establish predictors of HIV-protection behaviour among HIV-positive men who have sex with men (MSM) during sex with casual partners. Data have been collected from anonymous, self-administered questionnaires and analysed by using descriptive and backward elimination regression analyses. In a sample of 165 HIV-positive MSM, 82 participants between the ages of 23 and 78 (M=46.4, SD=9.0) had sex with casual partners during the three-month period under investigation. About 62% (n=51) have always used a condom when having sex with casual partners. From the original IMB model, only subjective norm predicted condom use. More important predictors that increased condom use were low consumption of psychotropics, high satisfaction with sexuality, numerous changes in sexual behaviour after diagnosis, low social support from friends, alcohol use before sex and habitualised condom use with casual partner(s). The explanatory power of the calculated regression model was 49% (p<0.001). The study reveals the importance of personal and social resources and of routines for condom use, and provides information for the research-based conceptualisation of prevention offers addressing especially people living with HIV ("positive prevention").
Techniques for estimating flood-peak discharges of rural, unregulated streams in Ohio
Koltun, G.F.
2003-01-01
Regional equations for estimating 2-, 5-, 10-, 25-, 50-, 100-, and 500-year flood-peak discharges at ungaged sites on rural, unregulated streams in Ohio were developed by means of ordinary and generalized least-squares (GLS) regression techniques. One-variable, simple equations and three-variable, full-model equations were developed on the basis of selected basin characteristics and flood-frequency estimates determined for 305 streamflow-gaging stations in Ohio and adjacent states. The average standard errors of prediction ranged from about 39 to 49 percent for the simple equations, and from about 34 to 41 percent for the full-model equations. Flood-frequency estimates determined by means of log-Pearson Type III analyses are reported along with weighted flood-frequency estimates, computed as a function of the log-Pearson Type III estimates and the regression estimates. Values of explanatory variables used in the regression models were determined from digital spatial data sets by means of a geographic information system (GIS), with the exception of drainage area, which was determined by digitizing the area within basin boundaries manually delineated on topographic maps. Use of GIS-based explanatory variables represents a major departure in methodology from that described in previous reports on estimating flood-frequency characteristics of Ohio streams. Examples are presented illustrating application of the regression equations to ungaged sites on ungaged and gaged streams. A method is provided to adjust regression estimates for ungaged sites by use of weighted and regression estimates for a gaged site on the same stream. A region-of-influence method, which employs a computer program to estimate flood-frequency characteristics for ungaged sites based on data from gaged sites with similar characteristics, was also tested and compared to the GLS full-model equations. For all recurrence intervals, the GLS full-model equations had superior prediction accuracy relative to the simple equations and therefore are recommended for use.
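A schematic of the regional-regression idea, using ordinary least squares only (the report also used GLS) and entirely simulated basin characteristics: log peak discharge is regressed on log drainage area, log channel slope, and log storage, and the full model is compared with a one-variable equation.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 120                                               # hypothetical gaged basins

area = rng.lognormal(mean=2.0, sigma=1.0, size=n)     # drainage area, mi^2
slope = rng.lognormal(mean=2.5, sigma=0.5, size=n)    # main-channel slope, ft/mi
storage = rng.uniform(0.1, 10.0, size=n)              # storage area, percent

# Hypothetical "true" relation used to simulate 100-year peak discharges (cfs)
q100 = 80 * area**0.8 * slope**0.3 * (storage + 1)**-0.2 * np.exp(rng.normal(0, 0.3, n))

# Full-model regional equation: log-linear in the three basin characteristics
X_full = sm.add_constant(np.column_stack([np.log(area), np.log(slope), np.log(storage + 1)]))
full = sm.OLS(np.log(q100), X_full).fit()

# Simple one-variable equation using drainage area only
simple = sm.OLS(np.log(q100), sm.add_constant(np.log(area))).fit()

# Compare residual standard errors (log space), mirroring the simple vs. full comparison
print(np.sqrt(full.mse_resid), np.sqrt(simple.mse_resid))
```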
NASA Astrophysics Data System (ADS)
Visser, H.; Molenaar, J.
1995-05-01
The detection of trends in climatological data has become central to the discussion on climate change due to the enhanced greenhouse effect. To prove detection, a method is needed (i) to make inferences on significant rises or declines in trends, (ii) to take into account natural variability in climate series, and (iii) to compare output from GCMs with the trends in observed climate data. To meet these requirements, flexible mathematical tools are needed. A structural time series model is proposed with which a stochastic trend, a deterministic trend, and regression coefficients can be estimated simultaneously. The stochastic trend component is described using the class of ARIMA models. The regression component is assumed to be linear. However, the regression coefficients corresponding to the explanatory variables may be allowed to be time dependent in order to validate this assumption. The mathematical technique used to estimate this trend-regression model is the Kalman filter. The main features of the filter are discussed. Examples of trend estimation are given using annual mean temperatures at a single station in the Netherlands (1706-1990) and annual mean temperatures at Northern Hemisphere land stations (1851-1990). The inclusion of explanatory variables is shown by regressing the latter temperature series on four variables: Southern Oscillation index (SOI), volcanic dust index (VDI), sunspot numbers (SSN), and a simulated temperature signal, induced by increasing greenhouse gases (GHG). In all analyses, the influence of SSN on global temperatures is found to be negligible. The correlations between temperatures and SOI and VDI appear to be negative. For SOI, this correlation is significant, but for VDI it is not, probably because of a lack of volcanic eruptions during the sample period. The relation between temperatures and GHG is positive, which is in agreement with the hypothesis of a warming climate because of increasing levels of greenhouse gases. The prediction performance of the model is rather poor, and possible explanations are discussed.
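A comparable trend-plus-regression decomposition can be sketched with the unobserved-components (structural time series) machinery in Python's statsmodels, which estimates the model by the Kalman filter; the series and explanatory variables below are simulated stand-ins, not the temperature and SOI/VDI/SSN/GHG data.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 150                                               # hypothetical annual series

# Simulated explanatory variables (stand-ins for climate indices and a GHG signal)
exog = np.column_stack([rng.normal(size=n), rng.normal(size=n), np.linspace(0, 1, n)])
trend = np.cumsum(rng.normal(0, 0.02, n))             # slowly wandering stochastic trend
temp = trend + exog @ np.array([-0.1, -0.05, 0.5]) + rng.normal(0, 0.1, n)

# Structural (unobserved components) model: stochastic local linear trend plus
# a linear regression on the explanatory variables, estimated by the Kalman filter.
mod = sm.tsa.UnobservedComponents(temp, level="local linear trend", exog=exog)
res = mod.fit(disp=False)

print(res.summary())
print(res.smoothed_state[0, -5:])                     # smoothed trend level, last five periods
```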
System dynamic modeling: an alternative method for budgeting.
Srijariya, Witsanuchai; Riewpaiboon, Arthorn; Chaikledkaew, Usa
2008-03-01
To construct, validate, and simulate a system dynamic financial model and compare it against the conventional method. The study was a cross-sectional analysis of secondary data retrieved from the National Health Security Office (NHSO) in the fiscal year 2004. The sample consisted of all emergency patients who received emergency services outside their registered hospital catchment area. The dependent variable used was the amount of reimbursed money. Two types of model were constructed, namely, the system dynamic model using the STELLA software and the multiple linear regression model. The outputs of both methods were compared. The study covered 284,716 patients from various levels of providers. The system dynamic model had the capability of producing various types of outputs, for example, financial and graphical analyses. For the regression analysis, statistically significant predictors were composed of service types (outpatient or inpatient), operating procedures, length of stay, illness types (accident or not), hospital characteristics, age, and hospital location (adjusted R(2) = 0.74). The total budget arrived at from using the system dynamic model and regression model was US$12,159,614.38 and US$7,301,217.18, respectively, whereas the actual NHSO reimbursement cost was US$12,840,805.69. The study illustrated that the system dynamic model is a useful financial management tool, although it is not easy to construct. The model is not only more accurate in prediction but is also more capable of analyzing large and complex real-world situations than the conventional method.
Zhang, Nana; Yang, Xin; Zhu, Xiaolin; Zhao, Bin; Huang, Tianyi; Ji, Qiuhe
2017-04-01
Objectives To determine whether the associations with key risk factors in patients with diagnosed and undiagnosed type 2 diabetes mellitus (T2DM) are different using data from the National Health and Nutrition Examination Survey (NHANES) from 1999 to 2010. Methods The study analysed the prevalence and association with risk factors of undiagnosed and diagnosed T2DM using a regression model and a multinomial logistic regression model. Data from the NHANES 1999-2010 were used for the analyses. Results The study analysed data from 10 570 individuals. The overall prevalence of diagnosed and undiagnosed T2DM increased significantly from 1999 to 2010. The prevalence of undiagnosed T2DM was significantly higher in non-Hispanic whites, in individuals <30 years old and in those with near optimal (130-159 mg/dl) or very high (≥220 mg/dl) non-high-density lipoprotein cholesterol levels compared with diagnosed T2DM. Body mass index, low economic status or low educational level had no effect on T2DM diagnosis rates. Though diagnosed T2DM was associated with favourable diet/carbohydrate intake behavioural changes, it had no effect on physical activity levels. Conclusion The overall T2DM prevalence increased between 1999 and 2010, particularly for undiagnosed T2DM in patients that were formerly classified as low risk.
Zhang, Nana; Yang, Xin; Zhu, Xiaolin; Zhao, Bin; Huang, Tianyi
2017-01-01
Objectives To determine whether the associations with key risk factors in patients with diagnosed and undiagnosed type 2 diabetes mellitus (T2DM) are different using data from the National Health and Nutrition Examination Survey (NHANES) from 1999 to 2010. Methods The study analysed the prevalence and association with risk factors of undiagnosed and diagnosed T2DM using a regression model and a multinomial logistic regression model. Data from the NHANES 1999–2010 were used for the analyses. Results The study analysed data from 10 570 individuals. The overall prevalence of diagnosed and undiagnosed T2DM increased significantly from 1999 to 2010. The prevalence of undiagnosed T2DM was significantly higher in non-Hispanic whites, in individuals <30 years old and in those with near optimal (130–159 mg/dl) or very high (≥220 mg/dl) non-high-density lipoprotein cholesterol levels compared with diagnosed T2DM. Body mass index, low economic status or low educational level had no effect on T2DM diagnosis rates. Though diagnosed T2DM was associated with favourable diet/carbohydrate intake behavioural changes, it had no effect on physical activity levels. Conclusion The overall T2DM prevalence increased between 1999 and 2010, particularly for undiagnosed T2DM in patients that were formerly classified as low risk. PMID:28415936
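A generic multinomial logistic regression of a three-category outcome (no diabetes, diagnosed, undiagnosed) on two risk factors can be sketched as follows (Python with statsmodels assumed; the data are simulated, not NHANES):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 2000

age = rng.integers(20, 80, n)
bmi = rng.normal(28, 5, n)

# Outcome coded 0 = no diabetes, 1 = diagnosed T2DM, 2 = undiagnosed T2DM (hypothetical)
lin_diag = -6 + 0.05 * age + 0.08 * bmi
lin_undiag = -7 + 0.06 * age + 0.05 * bmi
denom = 1 + np.exp(lin_diag) + np.exp(lin_undiag)
p = np.column_stack([1 / denom, np.exp(lin_diag) / denom, np.exp(lin_undiag) / denom])
y = np.array([rng.choice(3, p=row) for row in p])

X = sm.add_constant(np.column_stack([age, bmi]))
fit = sm.MNLogit(y, X).fit(disp=False)
print(fit.summary())                 # one set of coefficients per non-reference category
print(np.exp(fit.params))            # relative-risk ratios vs. the "no diabetes" category
```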
The relative effect of noise at different times of day: An analysis of existing survey data
NASA Technical Reports Server (NTRS)
Fields, J. M.
1986-01-01
This report examines survey evidence on the relative impact of noise at different times of day and assesses the survey methodology which produces that evidence. Analyses of the regression of overall (24-hour) annoyance on noise levels in different time periods can provide direct estimates of the value of the parameters in human reaction models which are used in environmental noise indices such as LDN and CNEL. In this report these analyses are based on the original computer tapes containing the responses of 22,000 respondents from ten studies of response to noise in residential areas. The estimates derived from these analyses are found to be so inaccurate that they do not provide useful information for policy or scientific purposes. The possibility that the type of questionnaire item could be biasing the estimates of the time-of-day weightings is considered but not supported by the data. Two alternatives to the conventional noise reaction model (adjusted energy model) are considered but not supported by the data.
Lawrence, Stephen J.
2012-01-01
Regression analyses show that E. coli density in samples was strongly related to turbidity, streamflow characteristics, and season at both sites. The regression equation chosen for the Norcross data showed that 78 percent of the variability in E. coli density (in log base 10 units) was explained by the variability in turbidity values (in log base 10 units), streamflow event (dry-weather flow or stormflow), season (cool or warm), and an interaction term that is the cross product of streamflow event and turbidity. The regression equation chosen for the Atlanta data showed that 76 percent of the variability in E. coli density (in log base 10 units) was explained by the variability in turbidity values (in log base 10 units), water temperature, streamflow event, and an interaction term that is the cross product of streamflow event and turbidity. Residual analysis and model confirmation using new data indicated the regression equations selected at both sites predicted E. coli density within the 90 percent prediction intervals of the equations and could be used to predict E. coli density in real time at both sites.
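The structure of such an equation, log E. coli density regressed on log turbidity, streamflow event, season, and a streamflow-by-turbidity interaction, can be sketched on synthetic data (Python with statsmodels assumed; the coefficients below are arbitrary):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(8)
n = 400

df = pd.DataFrame({
    "log_turb": rng.normal(1.0, 0.5, n),          # log10 turbidity
    "storm": rng.integers(0, 2, n),               # 0 = dry-weather flow, 1 = stormflow
    "warm": rng.integers(0, 2, n),                # 0 = cool season, 1 = warm season
})
df["log_ecoli"] = (1.2 + 0.8 * df.log_turb + 0.5 * df.storm + 0.3 * df.warm
                   + 0.4 * df.storm * df.log_turb + rng.normal(0, 0.3, n))

# log10 E. coli on log10 turbidity, streamflow event, season, and the
# streamflow-by-turbidity interaction, mirroring the form of the Norcross equation
fit = smf.ols("log_ecoli ~ log_turb * storm + warm", data=df).fit()
print(fit.rsquared)
print(fit.params)
```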
NASA Astrophysics Data System (ADS)
Wang, Wei; Zhong, Ming; Cheng, Ling; Jin, Lu; Shen, Si
2018-02-01
Against the background of building a global energy internet, forecasting and analysing the ratio of electric energy to terminal energy consumption has both theoretical and practical significance. This paper first analyses the factors that influence the ratio of electric energy to terminal energy and then uses a combination method to forecast and analyse the global proportion of electric energy. A cointegration model for the proportion of electric energy is then constructed using influencing factors such as the electricity price index, GDP, economic structure, energy use efficiency, and total population. Finally, the paper derives a projection of the proportion of electric energy from a combination-forecasting model based on multiple linear regression, trend analysis, and the variance-covariance method. The projection describes the development trend of the proportion of electric energy over 2017-2050, and the 2050 proportion is analysed in detail using scenario analysis.
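The variance-covariance combination step can be sketched directly: minimum-variance weights are computed from the covariance of the individual methods' historical errors and applied to their forecasts (all numbers below are hypothetical):

```python
import numpy as np

# Hypothetical individual forecasts of the electricity share (%) from three methods
# for the same set of future years, plus their historical one-step-ahead errors.
forecasts = np.array([[24.1, 24.8, 25.6],        # multiple linear regression
                      [23.7, 24.2, 24.9],        # trend analysis
                      [24.5, 25.1, 25.9]])       # a third hypothetical method
errors = np.array([[0.4, -0.2, 0.3, -0.1],
                   [0.6, 0.5, -0.4, 0.2],
                   [-0.3, 0.2, 0.1, 0.4]])       # past forecast errors per method

# Variance-covariance (minimum-variance) combination weights:
# w = Sigma^{-1} 1 / (1' Sigma^{-1} 1), where Sigma is the error covariance matrix.
sigma = np.cov(errors)
ones = np.ones(3)
w = np.linalg.solve(sigma, ones)
w /= ones @ w

combined = w @ forecasts                          # combined forecast for each year
print(np.round(w, 3), np.round(combined, 2))
```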
ERIC Educational Resources Information Center
Shafiq, M. Najeeb
2011-01-01
Using quantile regression analyses, this study examines gender gaps in mathematics, science, and reading in Azerbaijan, Indonesia, Jordan, the Kyrgyz Republic, Qatar, Tunisia, and Turkey among 15 year-old students. The analyses show that girls in Azerbaijan achieve as well as boys in mathematics and science and overachieve in reading. In Jordan,…
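A minimal quantile-regression sketch of a gender gap that differs across the score distribution (Python with statsmodels assumed; the scores are simulated, not the assessment data analysed above):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(9)
n = 1000

girl = rng.integers(0, 2, n)
# Hypothetical reading scores: girls do better on average and the noise is
# heteroscedastic, so the gap differs across quantiles.
score = 500 + 15 * girl + rng.normal(0, 80 - 20 * girl, n)
df = pd.DataFrame({"score": score, "girl": girl})

for q in (0.1, 0.5, 0.9):
    fit = smf.quantreg("score ~ girl", df).fit(q=q)
    print(q, round(fit.params["girl"], 1))        # estimated gender gap at each quantile
```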
Efficient parameter estimation in longitudinal data analysis using a hybrid GEE method.
Leung, Denis H Y; Wang, You-Gan; Zhu, Min
2009-07-01
The method of generalized estimating equations (GEEs) provides consistent estimates of the regression parameters in a marginal regression model for longitudinal data, even when the working correlation model is misspecified (Liang and Zeger, 1986). However, the efficiency of a GEE estimate can be seriously affected by the choice of the working correlation model. This study addresses this problem by proposing a hybrid method that combines multiple GEEs based on different working correlation models, using the empirical likelihood method (Qin and Lawless, 1994). Analyses show that this hybrid method is more efficient than a GEE using a misspecified working correlation model. Furthermore, if one of the working correlation structures correctly models the within-subject correlations, then this hybrid method provides the most efficient parameter estimates. In simulations, the hybrid method's finite-sample performance is superior to a GEE under any of the commonly used working correlation models and is almost fully efficient in all scenarios studied. The hybrid method is illustrated using data from a longitudinal study of the respiratory infection rates in 275 Indonesian children.
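The sensitivity of GEE to the working correlation model can be seen by fitting the same marginal model under two correlation structures (Python with statsmodels assumed; the data are simulated):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(10)
n_subj, n_visits = 150, 4

rows = []
for i in range(n_subj):
    b = rng.normal(0, 0.5)                        # subject effect inducing within-subject correlation
    for t in range(n_visits):
        x = rng.normal()
        lam = np.exp(0.2 + 0.3 * x + b)
        rows.append({"id": i, "x": x, "y": rng.poisson(lam)})
df = pd.DataFrame(rows)

# The same marginal Poisson regression fitted under two working correlation models;
# GEE point estimates remain consistent either way, but efficiency can differ.
for cov in (sm.cov_struct.Independence(), sm.cov_struct.Exchangeable()):
    fit = sm.GEE.from_formula("y ~ x", groups="id", data=df,
                              family=sm.families.Poisson(), cov_struct=cov).fit()
    print(type(cov).__name__, fit.params["x"], fit.bse["x"])
```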
Developmental trajectories of paediatric headache - sex-specific analyses and predictors.
Isensee, Corinna; Fernandez Castelao, Carolin; Kröner-Herwig, Birgit
2016-01-01
Headache is the most common pain disorder in children and adolescents and is associated with diverse dysfunctions and psychological symptoms. Several studies evidenced sex-specific differences in headache frequency. Until now, no study has examined sex-specific patterns of change in paediatric headache across time while including pain-related somatic and (socio-)psychological predictors. Latent Class Growth Analysis (LCGA) was used in order to identify different trajectory classes of headache across four annual time points in a population-based sample (n = 3,227; mean age 11.34 years; 51.2% girls). In multinomial logistic regression analyses the influence of several predictors on the class membership was examined. For girls, a four-class model was identified as the best fitting model. While the majority of girls reported no (30.5%) or moderate headache frequencies (32.5%) across time, one class with a high level of headache days (20.8%) and a class with an increasing headache frequency across time (16.2%) were identified. For boys, a two-class model with a 'no headache class' (48.6%) and a 'moderate headache class' (51.4%) showed the best model fit. Regarding logistic regression analyses, migraine and parental headache proved to be stable predictors across sexes. Depression/anxiety was a significant predictor for all pain classes in girls. Life events, dysfunctional stress coping and school burden were also able to differentiate at least between some classes in both sexes. The identified trajectories reflect sex-specific differences in paediatric headache, as seen in the number and type of classes extracted. The documented risk factors can deliver ideas for preventive actions and considerations for treatment programmes.
Daigneault, B W; McNamara, K A; Purdy, P H; Krisher, R L; Knox, R V; Rodriguez-Zas, S L; Miller, D J
2015-05-01
Due to reduced fertility, cryopreserved semen is seldom used for commercial porcine artificial insemination (AI). Predicting the fertility of individual frozen ejaculates for selection of higher quality semen prior to AI would increase overall success. Our objective was to test novel and traditional laboratory analyses to identify characteristics of cryopreserved spermatozoa that are related to boar fertility. Traditional post-thaw analyses of motility, viability, and acrosome integrity were performed on each ejaculate. In vitro fertilization, cleavage, and blastocyst development were also determined. Finally, spermatozoa-oviduct binding and competitive zona-binding assays were applied to assess sperm adhesion to these two matrices. Fertility of the same ejaculates subjected to laboratory assays was determined for each boar by multi-sire AI and defined as (i) the mean percentage of the litter sired and (ii) the mean number of piglets sired in each litter. Means of each laboratory evaluation were calculated for each boar and those values were applied to multiple linear regression analyses to determine which sperm traits could collectively estimate fertility in the simplest model. The regression model to predict the percent of litter sired by each boar was highly effective (p < 0.001, r(2) = 0.87) and included five traits; acrosome-compromised spermatozoa, percent live spermatozoa (0 and 60 min post-thaw), percent total motility, and the number of zona-bound spermatozoa. A second model to predict the number of piglets sired by boar was also effective (p < 0.05, r(2) = 0.57). These models indicate that the fertility of cryopreserved boar spermatozoa can be predicted effectively by including traditional and novel laboratory assays that consider functions of spermatozoa. © 2015 American Society of Andrology and European Academy of Andrology.
Biondi-Zoccai, Giuseppe; Mastrangeli, Simona; Romagnoli, Enrico; Peruzzi, Mariangela; Frati, Giacomo; Roever, Leonardo; Giordano, Arturo
2018-01-17
Atherosclerosis has major morbidity and mortality implications globally. While it has often been considered an irreversible degenerative process, recent evidence provides compelling proof that atherosclerosis can be reversed. Plaque regression is however difficult to appraise and quantify, with competing diagnostic methods available. Given the potential of evidence synthesis to provide clinical guidance, we aimed to review recent meta-analyses on diagnostic methods for atherosclerotic plaque regression. We identified 8 meta-analyses published between 2015 and 2017, including 79 studies and 14,442 patients, followed for a median of 12 months. They reported on atherosclerotic plaque regression appraised with carotid duplex ultrasound, coronary computed tomography, carotid magnetic resonance, coronary intravascular ultrasound, and coronary optical coherence tomography. Overall, all meta-analyses showed significant atherosclerotic plaque regression with lipid-lowering therapy, with the most notable effects on echogenicity, lipid-rich necrotic core volume, wall/plaque volume, dense calcium volume, and fibrous cap thickness. Significant interactions were found with concomitant changes in low density lipoprotein cholesterol, high density lipoprotein cholesterol, and C-reactive protein levels, and with ethnicity. Atherosclerotic plaque regression and conversion to a stable phenotype is possible with intensive medical therapy and can be demonstrated in patients using a variety of non-invasive and invasive imaging modalities.
Osmani, M G; Thornton, R N; Dhand, N K; Hoque, M A; Milon, Sk M A; Kalam, M A; Hossain, M; Yamage, M
2014-12-01
A case-control study conducted during 2011 involved 90 randomly selected commercial layer farms infected with highly pathogenic avian influenza type A subtype H5N1 (HPAI) and 175 control farms randomly selected from within 5 km of infected farms. A questionnaire was designed to obtain information about potential risk factors for contracting HPAI and was administered to farm owners or managers. Logistic regression analyses were conducted to identify significant risk factors. A total of 20 of 43 risk factors for contracting HPAI were identified after univariable logistic regression analysis. A multivariable logistic regression model was derived by forward stepwise selection. Both unmatched and matched analyses were performed. The key risk factors identified were numbers of staff, frequency of veterinary visits, presence of village chickens roaming on the farm and staff trading birds. Aggregating these findings with those from other studies resulted in a list of 16 key risk factors identified in Bangladesh. Most of these related to biosecurity. It is considered feasible for Bangladesh to achieve a very low incidence of HPAI. Using the cumulative list of risk factors to enhance biosecurity pertaining to commercial farms would facilitate this objective. © 2013 Blackwell Verlag GmbH.
High dimensional linear regression models under long memory dependence and measurement error
NASA Astrophysics Data System (ADS)
Kaul, Abhishek
This dissertation consists of three chapters. The first chapter introduces the models under consideration and motivates problems of interest. A brief literature review is also provided in this chapter. The second chapter investigates the properties of Lasso under long range dependent model errors. Lasso is a computationally efficient approach to model selection and estimation, and its properties are well studied when the regression errors are independent and identically distributed. We study the case where the regression errors form a long memory moving average process. We establish a finite sample oracle inequality for the Lasso solution. We then show the asymptotic sign consistency in this setup. These results are established in the high dimensional setup (p > n) where p can be increasing exponentially with n. Finally, we show the n^(1/2-d)-consistency of Lasso, along with the oracle property of adaptive Lasso, in the case where p is fixed. Here d is the memory parameter of the stationary error sequence. The performance of Lasso is also analysed in the present setup with a simulation study. The third chapter proposes and investigates the properties of a penalized quantile based estimator for measurement error models. Standard formulations of prediction problems in high dimension regression models assume the availability of fully observed covariates and sub-Gaussian and homogeneous model errors. This makes these methods inapplicable to measurement error models where covariates are unobservable and observations are possibly non sub-Gaussian and heterogeneous. We propose weighted penalized corrected quantile estimators for the regression parameter vector in linear regression models with additive measurement errors, where unobservable covariates are nonrandom. The proposed estimators forgo the need for the above mentioned model assumptions. We study these estimators in both the fixed dimension and high dimensional sparse setups; in the latter setup, the dimensionality can grow exponentially with the sample size. In the fixed dimensional setting we provide the oracle properties associated with the proposed estimators. In the high dimensional setting, we provide bounds for the statistical error associated with the estimation that hold with asymptotic probability 1, thereby providing the ℓ1-consistency of the proposed estimator. We also establish the model selection consistency in terms of the correctly estimated zero components of the parameter vector. A simulation study that investigates the finite sample accuracy of the proposed estimator is also included in this chapter.
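The high-dimensional Lasso setting of the second chapter (p possibly much larger than n, sparse true coefficient vector) is easy to illustrate numerically. The sketch below uses scikit-learn on synthetic data with a simple AR(1) error process as a stand-in for dependent noise; it illustrates the estimator and its support recovery, not the chapter's long-memory assumptions or theoretical results.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
n, p = 100, 400                           # high-dimensional: p > n
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]    # sparse truth: 5 active coefficients

# AR(1) errors as a simple stand-in for dependent (non-i.i.d.) noise;
# the dissertation's long-memory moving-average errors decay much more slowly
eps = np.zeros(n)
for t in range(1, n):
    eps[t] = 0.6 * eps[t - 1] + rng.standard_normal()
y = X @ beta + eps

lasso = Lasso(alpha=0.25).fit(X, y)       # alpha would normally be tuned, e.g. by CV
support = np.flatnonzero(lasso.coef_)
print("estimated support:", support)       # ideally indices 0..4
print("estimated coefficients:", np.round(lasso.coef_[support], 2))
```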
Fernández, Leónides; Mediano, Pilar; García, Ricardo; Rodríguez, Juan M; Marín, María
2016-09-01
Objectives Lactational mastitis frequently leads to a premature abandonment of breastfeeding; its development has been associated with several risk factors. This study aims to use a decision tree (DT) approach to establish the main risk factors involved in mastitis and to compare its performance for predicting this condition with a stepwise logistic regression (LR) model. Methods Data from 368 cases (breastfeeding women with mastitis) and 148 controls were collected by a questionnaire about risk factors related to medical history of mother and infant, pregnancy, delivery, postpartum, and breastfeeding practices. The performance of the DT and LR analyses was compared using the area under the receiver operating characteristic (ROC) curve. Sensitivity, specificity and accuracy of both models were calculated. Results Cracked nipples, antibiotics and antifungal drugs during breastfeeding, infant age, breast pumps, familial history of mastitis and throat infection were significant risk factors associated with mastitis in both analyses. Bottle-feeding and milk supply were related to mastitis for certain subgroups in the DT model. The areas under the ROC curves were similar for LR and DT models (0.870 and 0.835, respectively). The LR model had better classification accuracy and sensitivity than the DT model, but the last one presented better specificity at the optimal threshold of each curve. Conclusions The DT and LR models constitute useful and complementary analytical tools to assess the risk of lactational infectious mastitis. The DT approach identifies high-risk subpopulations that need specific mastitis prevention programs and, therefore, it could be used to make the most of public health resources.
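The model comparison reported here, a decision tree against a logistic regression judged by the area under the ROC curve, can be reproduced in outline with scikit-learn. Everything below (the synthetic data, the tree depth, the train/test split) is illustrative rather than the authors' protocol.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for the case-control data (368 cases vs 148 controls)
X, y = make_classification(n_samples=516, n_features=12, n_informative=6,
                           weights=[0.29, 0.71], random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=3)

lr = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
dt = DecisionTreeClassifier(max_depth=4, random_state=3).fit(X_tr, y_tr)

# Compare the two models on held-out data via the area under the ROC curve
for name, model in [("logistic regression", lr), ("decision tree", dt)]:
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```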
Predicting story goodness performance from cognitive measures following traumatic brain injury.
Lê, Karen; Coelho, Carl; Mozeiko, Jennifer; Krueger, Frank; Grafman, Jordan
2012-05-01
This study examined the prediction of performance on measures of the Story Goodness Index (SGI; Lê, Coelho, Mozeiko, & Grafman, 2011) from executive function (EF) and memory measures following traumatic brain injury (TBI). It was hypothesized that EF and memory measures would significantly predict SGI outcomes. One hundred sixty-seven individuals with TBI participated in the study. Story retellings were analyzed using the SGI protocol. Three cognitive measures--Delis-Kaplan Executive Function System (D-KEFS; Delis, Kaplan, & Kramer, 2001) Sorting Test, Wechsler Memory Scale--Third Edition (WMS-III; Wechsler, 1997) Working Memory Primary Index (WMI), and WMS-III Immediate Memory Primary Index (IMI)--were entered into a multiple linear regression model for each discourse measure. Two sets of regression analyses were performed, the first with the Sorting Test as the first predictor and the second with it as the last. The first set of regression analyses identified the Sorting Test and IMI as the only significant predictors of performance on measures of the SGI. The second set identified all measures as significant predictors when evaluating each step of the regression function. The cognitive variables predicted performance on the SGI measures, although there were differences in the amount of explained variance. The results (a) suggest that storytelling ability draws on a number of underlying skills and (b) underscore the importance of using discrete cognitive tasks rather than broad cognitive indices to investigate the cognitive substrates of discourse.
Meta-analysis of the effect of road work zones on crash occurrence.
Theofilatos, Athanasios; Ziakopoulos, Apostolos; Papadimitriou, Eleonora; Yannis, George; Diamandouros, Konstantinos
2017-11-01
There is strong evidence that work zones pose increased risk of crashes and injuries. The two most common risk factors associated with increased crash frequencies are work zone duration and length. However, relevant research on the topic is relatively limited. For that reason, this paper presents formal meta-analyses of studies that have estimated the relationship between the number of crashes and work zone duration and length, in order to provide overall estimates of those effects on crash frequencies. All studies presented in this paper are crash prediction models with similar specifications. According to the meta-analyses, and after correcting for publication bias when it was considered appropriate, the summary estimates of regression coefficients were found to be 0.1703 for duration and 0.862 for length. These effects were significant for length but not for duration. However, the overall estimate of duration was significant before correcting for publication bias. A separate meta-analysis of the studies examining both duration and length was also carried out in order to obtain rough estimates of the combined effects. The estimate for duration was found to be 0.953, while that for length was 0.847. Similar to previous meta-analyses, the effect of duration after correcting for publication bias is not significant, while the effect of length was significant at the 95% level. Meta-regression findings indicate that the main factors influencing the overall estimates of the beta coefficients are study year and region for duration, and study year and model specification for length. Copyright © 2017 Elsevier Ltd. All rights reserved.
Association between month of birth and melanoma risk: fact or fiction?
Fiessler, Cornelia; Pfahlberg, Annette B; Keller, Andrea K; Radespiel-Tröger, Martin; Uter, Wolfgang; Gefeller, Olaf
2017-04-01
Evidence on the effect of ultraviolet radiation (UVR) exposure in infancy on melanoma risk in later life is scarce. Three recent studies suggest that people born in spring carry a higher melanoma risk. Our study aimed at verifying whether such a seasonal pattern of melanoma risk actually exists. Data from the population-based Cancer Registry Bavaria (CRB) on the birth months of 28 374 incident melanoma cases between 2002 and 2012 were analysed and compared with data from the Bavarian State Office for Statistics and Data Processing on the birth month distribution in the Bavarian population. Crude and adjusted analyses using negative binomial regression models were performed in the total study group and supplemented by several subgroup analyses. In the crude analysis, the birth months March-May were over-represented among melanoma cases. Negative binomial regression models adjusted only for sex and birth year revealed a seasonal association between melanoma risk and birth month with 13-21% higher relative incidence rates for March, April and May compared with the reference December. However, after additionally adjusting for the birth month distribution of the Bavarian population, these risk estimates decreased markedly and no association with the birth month was observed any more. Similar results emerged in all subgroup analyses. Our large registry-based study provides no evidence that people born in spring carry a higher risk for developing melanoma in later life and thus lends no support to the hypothesis of higher UVR susceptibility during the first months of life. © The Author 2016; all rights reserved. Published by Oxford University Press on behalf of the International Epidemiological Association
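The key adjustment in this study, accounting for the population's birth-month distribution rather than treating raw case counts as evidence of seasonality, corresponds to a count regression with the population at risk entering as an exposure offset. A hedged sketch with a statsmodels negative binomial GLM follows; the strata (month by sex by birth cohort), counts, and the dispersion parameter are all invented for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(4)
# Invented strata: 12 birth months x 2 sexes x 5 birth-year groups
grid = pd.MultiIndex.from_product(
    [range(1, 13), ["m", "f"], range(5)], names=["month", "sex", "cohort"]
).to_frame(index=False)
grid["population"] = rng.integers(30_000, 40_000, len(grid))
grid["cases"] = rng.poisson(0.001 * grid["population"])  # no true month effect

# Month dummies with December as the reference category (as in the abstract),
# plus sex and birth cohort as adjustment covariates
X = pd.get_dummies(grid["month"].astype("category"), prefix="m").drop(columns="m_12")
X["male"] = (grid["sex"] == "m").astype(int)
X["cohort"] = grid["cohort"]
X = sm.add_constant(X.astype(float))

# The offset log(population) adjusts for the population's birth-month distribution;
# without it, the month coefficients simply mirror how many people were born then
fit = sm.GLM(grid["cases"], X,
             family=sm.families.NegativeBinomial(alpha=0.05),
             offset=np.log(grid["population"])).fit()
print(np.exp(fit.params.filter(like="m_")).round(3))  # rate ratios vs December
```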
Advanced Computational Framework for Environmental Management ZEM, Version 1.x
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vesselinov, Velimir V.; O'Malley, Daniel; Pandey, Sachin
2016-11-04
Typically environmental management problems require analysis of large and complex data sets originating from concurrent data streams with different data collection frequencies and pedigree. These big data sets require on-the-fly integration into a series of models with different complexity for various types of model analyses where the data are applied as soft and hard model constraints. This is needed to provide fast iterative model analyses based on the latest available data to guide decision-making. Furthermore, the data and model are associated with uncertainties. The uncertainties are probabilistic (e.g. measurement errors) and non-probabilistic (unknowns, e.g. alternative conceptual models characterizing site conditions). To address all of these issues, we have developed an integrated framework for real-time data and model analyses for environmental decision-making called ZEM. The framework allows for seamless and on-the-fly integration of data and modeling results for robust and scientifically-defensible decision-making applying advanced decision analyses tools such as Bayesian-Information-Gap Decision Theory (BIG-DT). The framework also includes advanced methods for optimization that are capable of dealing with a large number of unknown model parameters, and surrogate (reduced order) modeling capabilities based on support vector regression techniques. The framework is coded in Julia, a state-of-the-art high-performance programming language (http://julialang.org). The ZEM framework is open-source, released under the GPL V3 license, and can be applied to any environmental management site.
A critical re-evaluation of the regression model specification in the US D1 EQ-5D value function.
Rand-Hendriksen, Kim; Augestad, Liv A; Dahl, Fredrik A
2012-01-13
The EQ-5D is a generic health-related quality of life instrument (five dimensions with three levels, 243 health states), used extensively in cost-utility/cost-effectiveness analyses. EQ-5D health states are assigned values on a scale anchored in perfect health (1) and death (0). The dominant procedure for defining values for EQ-5D health states involves regression modeling. These regression models have typically included a constant term, interpreted as the utility loss associated with any movement away from perfect health. The authors of the United States EQ-5D valuation study replaced this constant with a variable, D1, which corresponds to the number of impaired dimensions beyond the first. The aim of this study was to illustrate how the use of the D1 variable in place of a constant is problematic. We compared the original D1 regression model with a mathematically equivalent model with a constant term. Comparisons included implications for the magnitude and statistical significance of the coefficients, multicollinearity (variance inflation factors, or VIFs), number of calculation steps needed to determine tariff values, and consequences for tariff interpretation. Using the D1 variable in place of a constant shifted all dummy variable coefficients away from zero by the value of the constant, greatly increased the multicollinearity of the model (maximum VIF of 113.2 vs. 21.2), and increased the mean number of calculation steps required to determine health state values. Using the D1 variable in place of a constant constitutes an unnecessary complication of the model, obscures the fact that at least two of the main effect dummy variables are statistically nonsignificant, and complicates and biases interpretation of the tariff algorithm.
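The multicollinearity argument in this abstract is expressed through variance inflation factors. The sketch below shows how VIFs can be compared between two specifications of the same dummy-coded design, one with a constant and one with a D1-style "impaired dimensions beyond the first" count. The simulated profiles and the resulting VIF values are purely illustrative and will not reproduce the paper's 113.2 vs. 21.2.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(5)
n = 500
# Synthetic 5-dimension, 3-level profiles, loosely EQ-5D shaped (values invented)
levels = pd.DataFrame(rng.integers(1, 4, size=(n, 5)),
                      columns=[f"dim{i}" for i in range(1, 6)])

# Main-effect dummies: indicators for level 2 and level 3 of each dimension
dummies = pd.concat(
    [pd.get_dummies(levels[c], prefix=c, drop_first=True) for c in levels], axis=1
).astype(float)

# Specification A: constant plus main-effect dummies
X_const = sm.add_constant(dummies)
# Specification B: no constant, but a D1-style count of impaired dimensions beyond the first
d1 = np.maximum((levels > 1).sum(axis=1) - 1, 0)
X_d1 = dummies.assign(D1=d1)

for name, X in [("with constant", X_const), ("with D1 variable", X_d1)]:
    vifs = {col: variance_inflation_factor(X.values, i)
            for i, col in enumerate(X.columns) if col != "const"}
    worst = max(vifs, key=vifs.get)
    print(f"{name}: max VIF = {vifs[worst]:.1f} ({worst})")
```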
Clary, Christelle; Lewis, Daniel J; Flint, Ellen; Smith, Neil R; Kestens, Yan; Cummins, Steven
2016-12-01
Studies that explore associations between the local food environment and diet routinely use global regression models, which assume that relationships are invariant across space, yet such stationarity assumptions have been little tested. We used global and geographically weighted regression models to explore associations between the residential food environment and fruit and vegetable intake. Analyses were performed in 4 boroughs of London, United Kingdom, using data collected between April 2012 and July 2012 from 969 adults in the Olympic Regeneration in East London Study. Exposures were assessed both as absolute densities of healthy and unhealthy outlets, taken separately, and as a relative measure (proportion of total outlets classified as healthy). Overall, local models performed better than global models (lower Akaike information criterion). Locally estimated coefficients varied across space, regardless of the type of exposure measure, although changes of sign were observed only when absolute measures were used. Despite findings from global models showing significant associations between the relative measure and fruit and vegetable intake (β = 0.022; P < 0.01) only, geographically weighted regression models using absolute measures outperformed models using relative measures. This study suggests that greater attention should be given to nonstationary relationships between the food environment and diet. It further challenges the idea that a single measure of exposure, whether relative or absolute, can reflect the many ways the food environment may shape health behaviors. © The Author 2016. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Muhlestein, Whitney E; Akagi, Dallin S; Kallos, Justiss A; Morone, Peter J; Weaver, Kyle D; Thompson, Reid C; Chambless, Lola B
2018-04-01
Objective Machine learning (ML) algorithms are powerful tools for predicting patient outcomes. This study pilots a novel approach to algorithm selection and model creation using prediction of discharge disposition following meningioma resection as a proof of concept. Materials and Methods A diversity of ML algorithms were trained on a single-institution database of meningioma patients to predict discharge disposition. Algorithms were ranked by predictive power and top performers were combined to create an ensemble model. The final ensemble was internally validated on never-before-seen data to demonstrate generalizability. The predictive power of the ensemble was compared with a logistic regression. Further analyses were performed to identify how important variables impact the ensemble. Results Our ensemble model predicted disposition significantly better than a logistic regression (area under the curve of 0.78 and 0.71, respectively, p = 0.01). Tumor size, presentation at the emergency department, body mass index, convexity location, and preoperative motor deficit most strongly influence the model, though the independent impact of individual variables is nuanced. Conclusion Using a novel ML technique, we built a guided ML ensemble model that predicts discharge destination following meningioma resection with greater predictive power than a logistic regression, and that provides greater clinical insight than a univariate analysis. These techniques can be extended to predict many other patient outcomes of interest.
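In outline, the guided-ensemble idea, train a diversity of learners, rank them by cross-validated performance, combine the best by soft voting, and compare against a plain logistic regression on held-out data, can be sketched with scikit-learn. The base learners, the data, and the tuning choices below are assumptions made for illustration, not the authors' pipeline.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import (RandomForestClassifier, GradientBoostingClassifier,
                              VotingClassifier)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for a single-institution surgical cohort
X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           weights=[0.7, 0.3], random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          stratify=y, random_state=7)

# Rank a diversity of algorithms by cross-validated AUC on the training data
candidates = {
    "rf": RandomForestClassifier(n_estimators=300, random_state=7),
    "gb": GradientBoostingClassifier(random_state=7),
    "knn": KNeighborsClassifier(n_neighbors=15),
    "lr": LogisticRegression(max_iter=1000),
}
scores = {k: cross_val_score(m, X_tr, y_tr, cv=5, scoring="roc_auc").mean()
          for k, m in candidates.items()}
top = sorted(scores, key=scores.get, reverse=True)[:3]

# Combine the top performers into a soft-voting ensemble and compare with a plain LR
ensemble = VotingClassifier([(k, candidates[k]) for k in top], voting="soft")
ensemble.fit(X_tr, y_tr)
baseline = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

for name, model in [("ensemble", ensemble), ("logistic regression", baseline)]:
    print(name, "AUC =",
          round(roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]), 3))
```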
Wathes, D C; Bourne, N; Cheng, Z; Mann, G E; Taylor, V J; Coffey, M P
2007-03-01
Results from 4 studies were combined (representing a total of 500 lactations) to investigate the relationships between metabolic parameters and fertility in dairy cows. Information was collected on blood metabolic traits and body condition score at 1 to 2 wk prepartum and at 2, 4, and 7 wk postpartum. Fertility traits were days to commencement of luteal activity, days to first service, days to conception, and failure to conceive. Primiparous and multiparous cows were considered separately. Initial linear regression analyses were used to determine relationships among fertility, metabolic, and endocrine traits at each time point. All metabolic and endocrine traits significantly related to fertility were included in stepwise multiple regression analyses alone (model 1), including peak milk yield and interval to commencement of luteal activity (model 2), and with the further addition of dietary group (model 3). In multiparous cows, extended calving to conception intervals were associated prepartum with greater concentrations of leptin and lesser concentrations of nonesterified fatty acids and urea, and postpartum with reduced insulin-like growth factor-I at 2 wk, greater urea at 7 wk, and greater peak milk yield. In primiparous cows, extended calving to conception intervals were associated with more body condition and more urea prepartum, elevated urea postpartum, and more body condition loss by 7 wk. In conclusion, some metabolic measurements were associated with poorer fertility outcomes. Relationships between fertility and metabolic and endocrine traits varied both according to the lactation number of the cow and with the time relative to calving.
Cashman, Kevin D.; Ritz, Christian; Kiely, Mairead
2017-01-01
Dietary Reference Values (DRVs) for vitamin D have a key role in the prevention of vitamin D deficiency. However, despite adopting similar risk assessment protocols, estimates from authoritative agencies over the last 6 years have been diverse. This may have arisen from diverse approaches to data analysis. Modelling strategies for pooling of individual subject data from cognate vitamin D randomized controlled trials (RCTs) are likely to provide the most appropriate DRV estimates. Thus, the objective of the present work was to undertake the first-ever individual participant data (IPD)-level meta-regression, which is increasingly recognized as best practice, from seven winter-based RCTs (with 882 participants ranging in age from 4 to 90 years) of the vitamin D intake–serum 25-hydroxyvitamin D (25(OH)D) dose-response. Our IPD-derived estimates of vitamin D intakes required to maintain 97.5% of 25(OH)D concentrations >25, 30, and 50 nmol/L across the population are 10, 13, and 26 µg/day, respectively. In contrast, standard meta-regression analyses with aggregate data (as used by several agencies in recent years) from the same RCTs estimated that a vitamin D intake requirement of 14 µg/day would maintain 97.5% of 25(OH)D >50 nmol/L. These first IPD-derived estimates offer improved dietary recommendations for vitamin D because the underpinning modeling captures the between-person variability in response of serum 25(OH)D to vitamin D intake. PMID:28481259
Sugihara, Toru; Yasunaga, Hideo; Horiguchi, Hiromasa; Fujimura, Tetsuya; Fushimi, Kiyohide; Yu, Changhong; Kattan, Michael W; Homma, Yukio
2014-12-01
Little is known about the disparity of choices between three urinary diversions after radical cystectomy, focusing on patient and institutional factors. We identified urothelial carcinoma patients who received radical cystectomy with cutaneous ureterostomy, ileal conduit or continent reservoir using the Japanese Diagnosis Procedure Combination database from 2007 to 2012. Data comprised age, sex, comorbidities (converted into the Charlson index), TNM classification (converted into oncological stage), hospitals' academic status, hospital volume, bed volume and geographical region. Multivariate ordinal logistic regression analyses fitted with the proportional odds model were performed to analyze factors affecting urinary diversion choices. For dependent variables, the three diversions were converted into an ordinal variable in order of complexity: cutaneous ureterostomy (reference), ileal conduit and continent reservoir. Geographical variations were also examined by multivariate logistic regression models. A total of 4790 patients (1131 cutaneous ureterostomies [23.6 %], 2970 ileal conduits [62.0 %] and 689 continent reservoirs [14.4 %]) were included. Ordinal logistic regression analyses showed that male sex, lower age, lower Charlson index, early tumor stage, higher hospital volume (≥3.4 cases/year) and larger bed volume (≥450 beds) were significantly associated with the preference of more complex urinary diversion. Significant geographical disparity was also found. Good patient condition and early oncological status, as well as institutional factors, including high hospital volume, large bed volume and specific geographical regions, are independently related to the likelihood of choosing complex diversions. Recognizing this disparity would help reinforce the need for clinical practice uniformity.
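The proportional-odds (ordinal logistic) model used here treats the three diversions as an ordered outcome and models the cumulative odds of receiving a more complex one. statsmodels exposes this as OrderedModel; the covariates, the data, and the cut points below are invented placeholders.

```python
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(8)
n = 1000
df = pd.DataFrame({
    "male": rng.integers(0, 2, n),
    "age": rng.normal(68, 9, n),
    "high_volume_hospital": rng.integers(0, 2, n),
})
# Invented ordinal outcome, ordered by complexity: cutaneous ureterostomy <
# ileal conduit < continent reservoir; younger patients at high-volume hospitals
# receive more complex diversions more often in this synthetic data
latent = (0.03 * (70 - df["age"]) + 0.8 * df["high_volume_hospital"]
          + 0.3 * df["male"] + rng.logistic(size=n))
df["diversion"] = pd.cut(latent, bins=[-np.inf, -0.5, 1.5, np.inf],
                         labels=["ureterostomy", "ileal_conduit", "continent"])

mod = OrderedModel(df["diversion"],
                   df[["male", "age", "high_volume_hospital"]],
                   distr="logit")
res = mod.fit(method="bfgs", disp=False)
print(res.summary())
# Exponentiated slope coefficients (excluding threshold parameters) as odds ratios
print("odds ratios:", np.exp(res.params[:3]).round(2))
```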
The relationships between empathy, stress and social support among medical students
Kim, Dong-hee; Kim, Seok Kyoung; Yi, Young Hoon; Jeong, Jae Hoon; Chae, Jiun; Hwang, Jiyeon; Roh, HyeRin
2015-01-01
Objectives To examine the relationship between stress, social support, and empathy among medical students. Methods We evaluated the relationships between stress and empathy, and social support and empathy among medical students. The respondents completed a questionnaire including demographic information, the Jefferson Scale of Empathy, the Perceived Stress Scale, and the Multidimensional Scale of Perceived Social Support. Correlation and linear regression analyses were conducted, along with sub-analyses according to gender, admission system, and study year. Results In total, 2,692 questionnaires were analysed. Empathy and social support positively correlated, and empathy and stress negatively correlated. Similar correlation patterns were detected in the sub-analyses; the correlation between empathy and stress among female students was negligible. In the regression model, stress and social support predicted empathy among all the samples. In the sub-analysis, stress was not a significant predictor among female and first-year students. Conclusions Stress and social support were significant predictors of empathy among all the students. Medical educators should provide means to foster resilience against stress or stress alleviation, and to ameliorate social support, so as to increase or maintain empathy in the long term. Furthermore, stress management should be emphasised, particularly among female and first-year students. PMID:26342190
Spirituality and Resilience Among Mexican American IPV Survivors.
de la Rosa, Iván A; Barnett-Queen, Timothy; Messick, Madeline; Gurrola, Maria
2016-12-01
Women with abusive partners use a variety of coping strategies. This study examined the correlation between spirituality, resilience, and intimate partner violence using a cross-sectional survey of 54 Mexican American women living along the U.S.-Mexico border. The meaning-making coping model provides the conceptual framework to explore how spirituality is used as a coping strategy. Multiple ordinary least squares (OLS) regression results indicate that women who score higher on spirituality also report greater resilient characteristics. Poisson regression analyses revealed that an increase in the level of spirituality is associated with a lower number of types of abuse experienced. Clinical, programmatic, and research implications are discussed. © The Author(s) 2015.
ERIC Educational Resources Information Center
Wiest, Dudley J.; Wong, Eugene H.; Kreil, Dennis A.
1998-01-01
The ability of measures of perceived competence, control, and autonomy support to predict self-worth and academic performance was studied across groups of high school students. Stepwise regression analyses indicate that these variables in the model predict self-worth and grade point average. In addition, levels of school status and depression predict…
ERIC Educational Resources Information Center
Lee, Jaekyung; Reeves, Todd
2012-01-01
This study examines the impact of high-stakes school accountability, capacity, and resources under NCLB on reading and math achievement outcomes through comparative interrupted time-series analyses of 1990-2009 NAEP state assessment data. Through hierarchical linear modeling latent variable regression with inverse probability of treatment…
Factors Affecting Success in the Professional Entry Exam for Accountants in Brazil
ERIC Educational Resources Information Center
Lima Rodrigues, Lúcia; Pinho, Carlos; Bugarim, Maria Clara; Craig, Russell; Machado, Diego
2018-01-01
This paper explores factors that have affected the success of candidates in the professional entry exam conducted by Brazil's Federal Council of Accounting. We analyse results of 18,948 candidates who sat for the exam in 2012, using a logistic regression model and the key indicators used by government to monitor the performance of higher education…
ERIC Educational Resources Information Center
Leos-Urbel, Jacob
2015-01-01
This article examines the relationship between after-school program quality, program attendance, and academic outcomes for a sample of low-income after-school program participants. Regression and hierarchical linear modeling analyses use a unique longitudinal data set including 29 after-school programs that served 5,108 students in Grades 4 to 8…
ERIC Educational Resources Information Center
Obasaju, Mayowa A.; Palin, Frances L.; Jacobs, Carli; Anderson, Page; Kaslow, Nadine J.
2009-01-01
An ecological model is used to explore the moderating effects of community-level variables on the relation between childhood sexual, physical, and emotional abuse and adult intimate partner violence (IPV) within a sample of 98 low-income African American women. Results from hierarchical binary logistic regression analyses show that…
ERIC Educational Resources Information Center
Bauermeister, Jose J.; Barkley, Russell A.; Bauermeister, Jose A.; Martinez, Jose V.; McBurnett, Keith
2012-01-01
This study examined the latent structure and validity of inattention, hyperactivity-impulsivity, and sluggish cognitive tempo (SCT) symptomatology. We evaluated mother and teacher ratings of ADHD and SCT symptoms in 140 Puerto Rican children (55.7% males), ages 6 to 11 years, via factor and regression analyses. A three-factor model (inattention,…
ERIC Educational Resources Information Center
Begley, Kim; McLaws, Mary-Louise; Ross, Michael W.; Gold, Julian
2008-01-01
This cross-sectional study identified variables associated with protease inhibitor (PI) non-adherence in 179 patients taking anti-retroviral therapy. Univariate analyses identified 11 variables associated with PI non-adherence. Multiple logistic regression modelling identified three predictors of PI non-adherence: low adherence self-efficacy and…
ERIC Educational Resources Information Center
Arnold, Carolyn L.; Kaufman, Phillip D.
This report examines the effects of both student and school characteristics on mathematics and science achievement levels in the third, seventh, and eleventh grades using data from the 1985-86 National Assessment of Educational Progress (NAEP). Analyses feature hierarchical linear models (HLM), a regression-like statistical technique that…
ERIC Educational Resources Information Center
Deering, Pamela Rose
2014-01-01
This research compares and contrasts two approaches to predictive analysis of three years' of school district data to investigate relationships between student and teacher characteristics and math achievement as measured by the state-mandated Maryland School Assessment mathematics exam. The sample for the study consisted of 3,514 students taught…
ERIC Educational Resources Information Center
Kandiko, C. B.
2008-01-01
To compare college and university student engagement in two countries with different responses to global forces, Canada and the United States (US), a series of hierarchical linear regression (HLM) models were developed to analyse data from the 2006 administration of the National Survey of Student Engagement (NSSE). Overall, students in the U.S.…
A method for estimating current attendance on sets of campgrounds...a pilot study
Richard L. Bury; Ruth Margolies
1964-01-01
Statistical models were devised for estimating both daily and seasonal attendance (and corresponding precision of estimates) through correlation-regression and ratio analyses. Total daily attendance for a test set of 23 campgrounds could be estimated from attendance measured in only one of them. The chances were that estimates would be within 10 percent of true...
Accounting for standard errors of vision-specific latent trait in regression models.
Wong, Wan Ling; Li, Xiang; Li, Jialiang; Wong, Tien Yin; Cheng, Ching-Yu; Lamoureux, Ecosse L
2014-07-11
To demonstrate the effectiveness of Hierarchical Bayesian (HB) approach in a modeling framework for association effects that accounts for SEs of vision-specific latent traits assessed using Rasch analysis. A systematic literature review was conducted in four major ophthalmic journals to evaluate Rasch analysis performed on vision-specific instruments. The HB approach was used to synthesize the Rasch model and multiple linear regression model for the assessment of the association effects related to vision-specific latent traits. The effectiveness of this novel HB one-stage "joint-analysis" approach allows all model parameters to be estimated simultaneously and was compared with the frequently used two-stage "separate-analysis" approach in our simulation study (Rasch analysis followed by traditional statistical analyses without adjustment for SE of latent trait). Sixty-six reviewed articles performed evaluation and validation of vision-specific instruments using Rasch analysis, and 86.4% (n = 57) performed further statistical analyses on the Rasch-scaled data using traditional statistical methods; none took into consideration SEs of the estimated Rasch-scaled scores. The two models on real data differed for effect size estimations and the identification of "independent risk factors." Simulation results showed that our proposed HB one-stage "joint-analysis" approach produces greater accuracy (average of 5-fold decrease in bias) with comparable power and precision in estimation of associations when compared with the frequently used two-stage "separate-analysis" procedure despite accounting for greater uncertainty due to the latent trait. Patient-reported data, using Rasch analysis techniques, do not take into account the SE of latent trait in association analyses. The HB one-stage "joint-analysis" is a better approach, producing accurate effect size estimations and information about the independent association of exposure variables with vision-specific latent traits. Copyright 2014 The Association for Research in Vision and Ophthalmology, Inc.
Villarrasa-Sapiña, Israel; Álvarez-Pitti, Julio; Cabeza-Ruiz, Ruth; Redón, Pau; Lurbe, Empar; García-Massó, Xavier
2018-02-01
Excess body weight during childhood causes reduced motor functionality and problems in postural control, a negative influence which has been reported in the literature. Nevertheless, no information regarding the effect of body composition on the postural control of overweight and obese children is available. The objective of this study was therefore to establish these relationships. A cross-sectional design was used to establish relationships between body composition and postural control variables obtained in bipedal eyes-open and eyes-closed conditions in twenty-two children. Centre of pressure signals were analysed in the temporal and frequency domains. Pearson correlations were applied to establish relationships between variables. Principal component analysis was applied to the body composition variables to avoid potential multicollinearity in the regression models. These principal components were used to perform a multiple linear regression analysis, from which regression models were obtained to predict postural control. Height and leg mass were the body composition variables that showed the highest correlation with postural control. Multiple regression models were also obtained and several of these models showed a higher correlation coefficient in predicting postural control than simple correlations. These models revealed that leg and trunk mass were good predictors of postural control. More equations were found in the eyes-open than eyes-closed condition. Body weight and height are negatively correlated with postural control. However, leg and trunk mass are better postural control predictors than arm or body mass. Finally, body composition variables are more useful in predicting postural control when the eyes are open. Copyright © 2017 Elsevier Ltd. All rights reserved.
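The two-stage strategy described, principal component analysis of the collinear body-composition measures followed by linear regression of the postural-control outcome on the leading components, can be sketched as follows. Variable names, the number of retained components, and all values are illustrative assumptions rather than the study's data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(9)
n = 22  # the study's sample size; the data here are synthetic
height = rng.normal(150, 10, n)
leg_mass = 0.12 * height + rng.normal(0, 1.5, n)     # deliberately collinear
trunk_mass = 0.20 * height + rng.normal(0, 2.5, n)
arm_mass = 0.05 * height + rng.normal(0, 1.0, n)
X = np.column_stack([height, leg_mass, trunk_mass, arm_mass])

# Invented centre-of-pressure outcome (e.g. sway area), worse with more mass
cop_sway = 5 + 0.08 * leg_mass + 0.05 * trunk_mass + rng.normal(0, 0.5, n)

# Standardize, keep the components explaining most of the variance, then regress
model = make_pipeline(StandardScaler(), PCA(n_components=2), LinearRegression())
model.fit(X, cop_sway)
pca = model.named_steps["pca"]
print("explained variance ratio:", np.round(pca.explained_variance_ratio_, 2))
print("R^2 on components:", round(model.score(X, cop_sway), 2))
```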
Mothers' education and childhood mortality in Ghana.
Buor, Daniel
2003-06-01
The significant extent to which maternal education affects child health has been advanced in much of the sociodemographic and medical literature, but not much has been done to analyse the spatial dimension of the problem, or to represent it using graphic and linear regression models. In Ghana, very little has been done to relate the two variables and offer pragmatic explanations. The need to correlate the two, using a regression model, which is rarely applied in previous studies, is a methodological necessity. The paper examines the impact of mothers' education on childhood mortality in Ghana using, primarily, Ghana Demographic and Health Survey data of 1998 and World Bank data of 2000. The survey has emphatically established that there is an inverse relationship between mothers' education and childhood mortality. The use of basic health facilities that relate to childhood survival shows a direct relationship with mothers' education. Recommendations for policy initiatives to simultaneously emphasise the education of the girl-child, and to ensure adequate access to maternal and child health services, have been made. The need for an experimental project of integrating maternal education and child health services has also been recommended. A linear regression model that illustrates the relationship between maternal education and childhood survival has emerged.
Incorporating Measurement Error from Modeled Air Pollution Exposures into Epidemiological Analyses.
Samoli, Evangelia; Butland, Barbara K
2017-12-01
Outdoor air pollution exposures used in epidemiological studies are commonly predicted from spatiotemporal models incorporating limited measurements, temporal factors, geographic information system variables, and/or satellite data. Measurement error in these exposure estimates leads to imprecise estimation of health effects and their standard errors. We reviewed methods for measurement error correction that have been applied in epidemiological studies that use model-derived air pollution data. We identified seven cohort studies and one panel study that have employed measurement error correction methods. These methods included regression calibration, risk set regression calibration, regression calibration with instrumental variables, the simulation extrapolation approach (SIMEX), and methods under the non-parametric or parameter bootstrap. Corrections resulted in small increases in the absolute magnitude of the health effect estimate and its standard error under most scenarios. Limited application of measurement error correction methods in air pollution studies may be attributed to the absence of exposure validation data and the methodological complexity of the proposed methods. Future epidemiological studies should consider in their design phase the requirements for the measurement error correction method to be later applied, while methodological advances are needed under the multi-pollutants setting.
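Of the correction methods reviewed, regression calibration is the simplest to illustrate: in a validation subset where both the error-prone and a reference ("true") exposure are available, regress the reference on the error-prone measure and substitute the calibrated predictions into the health model. The sketch below is generic, with an invented Poisson health outcome and validation subsample, and is not tied to any of the reviewed cohorts.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
n = 2000
true_exposure = rng.normal(10, 3, n)                 # e.g. long-term pollutant level
measured = true_exposure + rng.normal(0, 2, n)       # model-derived, with error
# Health outcome generated from the true exposure (log-linear count model)
y = rng.poisson(np.exp(0.5 + 0.05 * true_exposure))

# Naive analysis: the error-prone exposure goes straight into the health model
naive = sm.GLM(y, sm.add_constant(measured), family=sm.families.Poisson()).fit()

# Regression calibration: use a validation subsample (here the first 300 subjects)
# to model E[true | measured], then replace the exposure by its calibrated value
val = slice(0, 300)
calib = sm.OLS(true_exposure[val], sm.add_constant(measured[val])).fit()
calibrated = calib.predict(sm.add_constant(measured))
corrected = sm.GLM(y, sm.add_constant(calibrated),
                   family=sm.families.Poisson()).fit()

print("true log-rate slope:      0.050")
print("naive estimate:          ", round(naive.params[1], 3))      # attenuated
print("calibrated estimate:     ", round(corrected.params[1], 3))  # closer to truth
```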
Stamey, Timothy C.
1998-01-01
Simple and reliable methods for estimating hourly streamflow are needed for the calibration and verification of a Chattahoochee River basin model between Buford Dam and Franklin, Ga. The river basin model is being developed by Georgia Department of Natural Resources, Environmental Protection Division, as part of their Chattahoochee River Modeling Project. Concurrent streamflow data collected at 19 continuous-record, and 31 partial-record streamflow stations, were used in ordinary least-squares linear regression analyses to define estimating equations, and in verifying drainage-area prorations. The resulting regression or drainage-area ratio estimating equations were used to compute hourly streamflow at the partial-record stations. The coefficients of determination (r-squared values) for the regression estimating equations ranged from 0.90 to 0.99. Observed and estimated hourly and daily streamflow data were computed for May 1, 1995, through October 31, 1995. Comparisons of observed and estimated daily streamflow data for 12 continuous-record tributary stations, that had available streamflow data for all or part of the period from May 1, 1995, to October 31, 1995, indicate that the mean error of estimate for the daily streamflow was about 25 percent.
Flexible Meta-Regression to Assess the Shape of the Benzene–Leukemia Exposure–Response Curve
Vlaanderen, Jelle; Portengen, Lützen; Rothman, Nathaniel; Lan, Qing; Kromhout, Hans; Vermeulen, Roel
2010-01-01
Background Previous evaluations of the shape of the benzene–leukemia exposure–response curve (ERC) were based on a single set or on small sets of human occupational studies. Integrating evidence from all available studies that are of sufficient quality combined with flexible meta-regression models is likely to provide better insight into the functional relation between benzene exposure and risk of leukemia. Objectives We used natural splines in a flexible meta-regression method to assess the shape of the benzene–leukemia ERC. Methods We fitted meta-regression models to 30 aggregated risk estimates extracted from nine human observational studies and performed sensitivity analyses to assess the impact of a priori assessed study characteristics on the predicted ERC. Results The natural spline showed a supralinear shape at cumulative exposures less than 100 ppm-years, although this model fitted the data only marginally better than a linear model (p = 0.06). Stratification based on study design and jackknifing indicated that the cohort studies had a considerable impact on the shape of the ERC at high exposure levels (> 100 ppm-years) but that predicted risks for the low exposure range (< 50 ppm-years) were robust. Conclusions Although limited by the small number of studies and the large heterogeneity between studies, the inclusion of all studies of sufficient quality combined with a flexible meta-regression method provides the most comprehensive evaluation of the benzene–leukemia ERC to date. The natural spline based on all data indicates a significantly increased risk of leukemia [relative risk (RR) = 1.14; 95% confidence interval (CI), 1.04–1.26] at an exposure level as low as 10 ppm-years. PMID:20064779
A conceptual disease model for adult Pompe disease.
Kanters, Tim A; Redekop, W Ken; Rutten-Van Mölken, Maureen P M H; Kruijshaar, Michelle E; Güngör, Deniz; van der Ploeg, Ans T; Hakkaart, Leona
2015-09-15
Studies in orphan diseases are, by nature, confronted with small patient populations, meaning that randomized controlled trials will have limited statistical power. In order to estimate the effectiveness of treatments in orphan diseases and extrapolate effects into the future, alternative models might be needed. The purpose of this study is to develop a conceptual disease model for Pompe disease in adults (an orphan disease). This conceptual model describes the associations between the most important levels of health concepts for Pompe disease in adults, from biological parameters via physiological parameters, symptoms and functional indicators to health perceptions and final health outcomes as measured in terms of health-related quality of life. The structure of the Wilson-Cleary health outcomes model was used as a blueprint, and filled with clinically relevant aspects for Pompe disease based on literature and expert opinion. Multiple observations per patient from a Dutch cohort study in untreated patients were used to quantify the relationships between the different levels of health concepts in the model by means of regression analyses. Enzyme activity, muscle strength, respiratory function, fatigue, level of handicap, general health perceptions, mental and physical component scales and utility described the different levels of health concepts in the Wilson-Cleary model for Pompe disease. Regression analyses showed that functional status was affected by fatigue, muscle strength and respiratory function. Health perceptions were affected by handicap. In turn, self-reported quality of life was affected by health perceptions. We conceptualized a disease model that incorporated the mechanisms believed to be responsible for impaired quality of life in Pompe disease. The model provides a comprehensive overview of various aspects of Pompe disease in adults, which can be useful for both clinicians and policymakers to support their multi-faceted decision making.
Brown, C. Erwin
1993-01-01
Correlation analysis, in conjunction with principal-component and multiple-regression analyses, was applied to laboratory chemical and petrographic data to assess the usefulness of these techniques in evaluating selected physical and hydraulic properties of carbonate-rock aquifers in central Pennsylvania. Correlation and principal-component analyses were used to establish relations and associations among variables, to determine dimensions of property variation of samples, and to filter the variables containing similar information. Principal-component and correlation analyses showed that porosity is related to other measured variables and that permeability is most related to porosity and grain size. Four principal components are found to be significant in explaining the variance of data. Stepwise multiple-regression analysis was used to see how well the measured variables could predict porosity and (or) permeability for this suite of rocks. The variation in permeability and porosity is not totally predicted by the other variables, but the regression is significant at the 5% significance level. © 1993.
Malosetti, Marcos; Ribaut, Jean-Marcel; van Eeuwijk, Fred A.
2013-01-01
Genotype-by-environment interaction (GEI) is an important phenomenon in plant breeding. This paper presents a series of models for describing, exploring, understanding, and predicting GEI. All models depart from a two-way table of genotype by environment means. First, a series of descriptive and explorative models/approaches are presented: Finlay–Wilkinson model, AMMI model, GGE biplot. All of these approaches have in common that they merely try to group genotypes and environments and do not use other information than the two-way table of means. Next, factorial regression is introduced as an approach to explicitly introduce genotypic and environmental covariates for describing and explaining GEI. Finally, QTL modeling is presented as a natural extension of factorial regression, where marker information is translated into genetic predictors. Tests for regression coefficients corresponding to these genetic predictors are tests for main effect QTL expression and QTL by environment interaction (QEI). QTL models for which QEI depends on environmental covariables form an interesting model class for predicting GEI for new genotypes and new environments. For realistic modeling of genotypic differences across multiple environments, sophisticated mixed models are necessary to allow for heterogeneity of genetic variances and correlations across environments. The use and interpretation of all models is illustrated by an example data set from the CIMMYT maize breeding program, containing environments differing in drought and nitrogen stress. To help readers to carry out the statistical analyses, GenStat® programs, 15th Edition and Discovery® version, are presented as “Appendix.” PMID:23487515
Predicting stress in pre-registration nursing students.
Pryjmachuk, Steven; Richards, David A
2007-02-01
To determine which variables from a pool of potential predictors predict General Health Questionnaire 'caseness' in pre-registration nursing students. Cross-sectional survey, utilizing self-report measures of sources of stress, stress (psychological distress) and coping, together with pertinent demographic measures such as sex, ethnicity, educational programme and nursing specialty being pursued, and age, social class and highest qualifications on entry to the programme. Questionnaire packs were distributed to all pre-registration nursing students (N=1,362) in a large English university. Completed packs were coded, entered into statistical software and subjected to a series of logistic regression analyses. Of the questionnaire packs 1,005 (74%) were returned, of which up to 973 were available for the regression analyses undertaken. Four logistic regression models were considered and, on the principle of parsimony, a single model was chosen for discussion. This model suggested that the key predictors of caseness in the population studied were self-report of pressure, whether or not respondents had children (specifically, whether these children were pre-school or school-age), scores on a 'personal problems' scale and the type of coping employed. The overall caseness rate among the population was around one-third. Since self-report and personal, rather than academic, concerns predict stress, personal teachers need to play a key role in supporting students through 'active listening', especially when students self-report high levels of stress and where personal/social problems are evident. The work-life balance of students, especially those with child-care responsibilities, should be a central tenet in curriculum design in nurse education (and, indeed, the education of other professional and occupational groups). There may be some benefit in offering stress management (coping skills) training to nursing students and, indeed, students of other disciplines.
Use and misuse of motor-vehicle crash death rates in assessing highway-safety performance.
O'Neill, Brian; Kyrychenko, Sergey Y
2006-12-01
The objectives of the article are to assess the extent to which comparisons of motor-vehicle crash death rates can be used to determine the effectiveness of highway-safety policies over time in a country or to compare policy effectiveness across countries. Motor-vehicle crash death rates per mile traveled in the 50 U.S. states from 1980 to 2003 are used to show the influence on these rates of factors independent of highway-safety interventions. Multiple regression models relating state death rates to various measures related to urbanization and demographics are used. The analyses demonstrate strong relationships between state death rates and urbanization and demographics. Almost 60% of the variability among the state death rates can be explained by the independent variables in the multiple regression models. When the death rates for passenger vehicle occupants (i.e., excluding motorcycle, pedestrian, and other deaths) are used in the regression models, almost 70% of the variability in the rates can be explained by urbanization and demographics. The analyses presented in the article demonstrate that motor-vehicle crash death rates are strongly influenced by factors unrelated to highway-safety countermeasures. Overall death rates should not be used as a basis for judging the effectiveness (or ineffectiveness) of specific highway-safety countermeasures or to assess overall highway-safety policies, especially across jurisdictions. There can be no substitute for the use of carefully designed scientific evaluations of highway-safety interventions that use outcome measures directly related to the intervention; e.g., motorcyclist deaths should be used to assess the effectiveness of motorcycle helmet laws. While this may seem obvious, there are numerous examples in the literature of death rates from all crashes being used to assess the effectiveness of interventions aimed at specific subsets of crashes.
NASA Astrophysics Data System (ADS)
Camera, Corrado; Bruggeman, Adriana; Hadjinicolaou, Panos; Pashiardis, Stelios; Lange, Manfred A.
2014-01-01
High-resolution gridded daily data sets are essential for natural resource management and the analyses of climate changes and their effects. This study aims to evaluate the performance of 15 simple or complex interpolation techniques in reproducing daily precipitation at a resolution of 1 km2 over topographically complex areas. Methods are tested considering two different sets of observation densities and different rainfall amounts. We used rainfall data that were recorded at 74 and 145 observational stations, respectively, spread over the 5760 km2 of the Republic of Cyprus, in the Eastern Mediterranean. Regression analyses utilizing geographical copredictors and neighboring interpolation techniques were evaluated both in isolation and combined. Linear multiple regression (LMR) and geographically weighted regression methods (GWR) were tested. These included a step-wise selection of covariables, as well as inverse distance weighting (IDW), kriging, and 3D-thin plate splines (TPS). The relative rank of the different techniques changes with different station density and rainfall amounts. Our results indicate that TPS performs well for low station density and large-scale events and also when coupled with regression models. It performs poorly for high station density. The opposite is observed when using IDW. Simple IDW performs best for local events, while a combination of step-wise GWR and IDW proves to be the best method for large-scale events and high station density. This study indicates that the use of step-wise regression with a variable set of geographic parameters can improve the interpolation of large-scale events because it facilitates the representation of local climate dynamics.
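Among the neighbouring techniques compared, inverse distance weighting is the simplest to write down: each grid cell receives a weighted mean of station values, with weights decaying as a power of distance. A minimal NumPy sketch with synthetic station locations and rainfall follows; a power parameter of 2 is assumed, and the study's actual IDW settings are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(11)
# Synthetic stations: x, y coordinates in km and a daily rainfall value in mm
stations_xy = rng.uniform(0, 80, size=(74, 2))
rain = rng.gamma(2.0, 4.0, size=74)

def idw(targets_xy, stations_xy, values, power=2.0, eps=1e-9):
    """Inverse distance weighting: weights proportional to 1 / distance**power."""
    d = np.linalg.norm(targets_xy[:, None, :] - stations_xy[None, :, :], axis=2)
    w = 1.0 / np.maximum(d, eps) ** power
    # A target that (nearly) coincides with a station effectively inherits its value
    return (w * values).sum(axis=1) / w.sum(axis=1)

# Interpolate onto a 1 km grid
gx, gy = np.meshgrid(np.arange(0, 80, 1.0), np.arange(0, 80, 1.0))
grid_xy = np.column_stack([gx.ravel(), gy.ravel()])
grid_rain = idw(grid_xy, stations_xy, rain).reshape(gx.shape)
print("interpolated field:", grid_rain.shape, "mean =", round(grid_rain.mean(), 2))
```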
Genome-wide regression and prediction with the BGLR statistical package.
Pérez, Paulino; de los Campos, Gustavo
2014-10-01
Many modern genomic data analyses require implementing regressions where the number of parameters (p, e.g., the number of marker effects) exceeds sample size (n). Implementing these large-p-with-small-n regressions poses several statistical and computational challenges, some of which can be confronted using Bayesian methods. This approach allows integrating various parametric and nonparametric shrinkage and variable selection procedures in a unified and consistent manner. The BGLR R-package implements a large collection of Bayesian regression models, including parametric variable selection and shrinkage methods and semiparametric procedures (Bayesian reproducing kernel Hilbert spaces regressions, RKHS). The software was originally developed for genomic applications; however, the methods implemented are useful for many nongenomic applications as well. The response can be continuous (censored or not) or categorical (either binary or ordinal). The algorithm is based on a Gibbs sampler with scalar updates and the implementation takes advantage of efficient compiled C and Fortran routines. In this article we describe the methods implemented in BGLR, present examples of the use of the package, and discuss practical issues emerging in real-data analysis. Copyright © 2014 by the Genetics Society of America.
NASA Astrophysics Data System (ADS)
Bernales, A. M.; Antolihao, J. A.; Samonte, C.; Campomanes, F.; Rojas, R. J.; dela Serna, A. M.; Silapan, J.
2016-06-01
The threat of the ailments related to urbanization like heat stress is very prevalent. There are a lot of things that can be done to lessen the effect of urbanization to the surface temperature of the area like using green roofs or planting trees in the area. So land use really matters in both increasing and decreasing surface temperature. It is known that there is a relationship between land use land cover (LULC) and land surface temperature (LST). Quantifying this relationship in terms of a mathematical model is very important so as to provide a way to predict LST based on the LULC alone. This study aims to examine the relationship between LST and LULC as well as to create a model that can predict LST using class-level spatial metrics from LULC. LST was derived from a Landsat 8 image and LULC classification was derived from LiDAR and Orthophoto datasets. Class-level spatial metrics were created in FRAGSTATS with the LULC and LST as inputs and these metrics were analysed using a statistical framework. Multi linear regression was done to create models that would predict LST for each class and it was found that the spatial metric "Effective mesh size" was a top predictor for LST in 6 out of 7 classes. The model created can still be refined by adding a temporal aspect by analysing the LST of another farming period (for rural areas) and looking for common predictors between LSTs of these two different farming periods.
A New Global Regression Analysis Method for the Prediction of Wind Tunnel Model Weight Corrections
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert Manfred; Bridge, Thomas M.; Amaya, Max A.
2014-01-01
A new global regression analysis method is discussed that predicts wind tunnel model weight corrections for strain-gage balance loads during a wind tunnel test. The method determines corrections by combining "wind-on" model attitude measurements with least squares estimates of the model weight and center of gravity coordinates that are obtained from "wind-off" data points. The method treats the least squares fit of the model weight separately from the fit of the center of gravity coordinates. Therefore, it performs two fits of "wind-off" data points and uses the least squares estimator of the model weight as an input for the fit of the center of gravity coordinates. Explicit equations for the least squares estimators of the weight and center of gravity coordinates are derived that simplify the implementation of the method in the data system software of a wind tunnel. In addition, recommendations for sets of "wind-off" data points are made that take typical model support system constraints into account. Explicit equations for the confidence intervals on the model weight and center of gravity coordinates and two different error analyses of the model weight prediction are also discussed in the appendices of the paper.
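As a rough illustration of the wind-off least squares step, the sketch below fits a single unknown model weight to hypothetical tare loads measured at several pitch angles. The two-component decomposition (N = W·cosθ, A = W·sinθ) and all numbers are simplifying assumptions for the example, not the paper's balance-load formulation.

```python
import numpy as np

# Hypothetical "wind-off" points: pitch angles (rad) and measured
# normal/axial balance loads (N). In a simple 2-D view the model weight W
# appears as N = W*cos(theta) and A = W*sin(theta).
theta = np.deg2rad(np.array([-10.0, -5.0, 0.0, 5.0, 10.0]))
N = np.array([492.0, 498.0, 500.5, 497.8, 491.9])
A = np.array([-87.0, -43.3, 0.4, 43.9, 86.5])

# Stack both sets of equations and solve for the single unknown W by least squares.
design = np.concatenate([np.cos(theta), np.sin(theta)])[:, None]
loads = np.concatenate([N, A])
W, *_ = np.linalg.lstsq(design, loads, rcond=None)
print("estimated model weight:", W[0])
```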
A flexible count data regression model for risk analysis.
Guikema, Seth D; Coffelt, Jeremy P
2008-02-01
In many cases, risk and reliability analyses involve estimating the probabilities of discrete events such as hardware failures and occurrences of disease or death. There is often additional information in the form of explanatory variables that can be used to help estimate the likelihood of different numbers of events in the future through the use of an appropriate regression model, such as a generalized linear model. However, existing generalized linear models (GLM) are limited in their ability to handle the types of variance structures often encountered in using count data in risk and reliability analysis. In particular, standard models cannot handle both underdispersed data (variance less than the mean) and overdispersed data (variance greater than the mean) in a single coherent modeling framework. This article presents a new GLM based on a reformulation of the Conway-Maxwell Poisson (COM) distribution that is useful for both underdispersed and overdispersed count data and demonstrates this model by applying it to the assessment of electric power system reliability. The results show that the proposed COM GLM fits overdispersed data sets as well as the commonly used existing models do, while outperforming those models on underdispersed data sets.
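As a minimal sketch of the underlying distribution, the snippet below fits the standard two-parameter Conway-Maxwell-Poisson density to a small set of hypothetical underdispersed counts by maximum likelihood. The article's GLM would additionally link the rate parameter to explanatory variables, and the parameterization here is the textbook one, not necessarily the authors' reformulation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def com_poisson_negloglik(params, y, jmax=200):
    """Negative log-likelihood of the Conway-Maxwell-Poisson distribution."""
    log_lam, log_nu = params
    lam, nu = np.exp(log_lam), np.exp(log_nu)
    j = np.arange(jmax)
    logZ = np.logaddexp.reduce(j * np.log(lam) - nu * gammaln(j + 1))  # normalizing constant
    ll = y * np.log(lam) - nu * gammaln(y + 1) - logZ
    return -ll.sum()

# Hypothetical underdispersed counts (variance < mean)
y = np.array([2, 3, 3, 2, 4, 3, 2, 3, 3, 4, 2, 3])
fit = minimize(com_poisson_negloglik, x0=[np.log(y.mean()), 0.0], args=(y,))
lam_hat, nu_hat = np.exp(fit.x)
print("lambda:", lam_hat, "nu:", nu_hat, "(nu > 1 indicates underdispersion)")
```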
Optimizing methods for linking cinematic features to fMRI data.
Kauttonen, Janne; Hlushchuk, Yevhen; Tikka, Pia
2015-04-15
One of the challenges of naturalistic neuroscience using movie-viewing experiments is how to interpret observed brain activations in relation to the multiplicity of time-locked stimulus features. As previous studies have shown less inter-subject synchronization across viewers of random video footage than of story-driven films, new methods need to be developed for analysis of less story-driven contents. To optimize the linkage between our fMRI data, collected during viewing of the deliberately non-narrative silent film 'At Land' by Maya Deren (1944), and its annotated content, we combined the method of elastic-net regularization with model-driven linear regression and the well-established data-driven independent component analysis (ICA) and inter-subject correlation (ISC) methods. In the linear regression analysis, both IC and region-of-interest (ROI) time series were fitted with the time series of a total of 36 binary-valued and one real-valued tactile annotations of film features. Elastic-net regularization and cross-validation were applied in the ordinary least-squares linear regression to avoid over-fitting due to the multicollinearity of regressors; the results were compared against both partial least-squares (PLS) regression and the un-regularized full-model regression. A non-parametric permutation testing scheme was applied to evaluate the statistical significance of the regressions. We found statistically significant correlations between the annotation model and 9 of the 40 ICs. The regression analysis was also repeated for a large set of cubic ROIs covering the grey matter. Both IC- and ROI-based regression analyses revealed activations in parietal and occipital regions, with additional smaller clusters in the frontal lobe. Furthermore, we found elastic-net-based regression more sensitive than PLS and un-regularized regression, since it detected a larger number of significant ICs and ROIs. Along with the ISC ranking methods, our regression analysis proved to be a feasible method for ordering the ICs based on their functional relevance to the annotated cinematic features. The novelty of our method, in comparison with the hypothesis-driven manual pre-selection and observation of individual regressors (which is biased by choice), lies in applying a data-driven approach to all content features simultaneously. We found the combination of regularized regression and ICA especially useful when analyzing fMRI data obtained using a non-narrative movie stimulus with a large set of complex and correlated features. Copyright © 2015. Published by Elsevier Inc.
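A minimal sketch of the regularized regression step: cross-validated elastic net applied to a simulated ROI time series with deliberately collinear feature regressors. The dimensions, feature construction and omission of HRF convolution are all simplifying assumptions for illustration, not the study's pipeline.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(1)
n_vols, n_feats = 300, 37          # fMRI volumes x annotated film features (hypothetical sizes)
X = rng.normal(size=(n_vols, n_feats))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=n_vols)                # deliberately collinear regressors
roi_ts = 0.8 * X[:, 0] - 0.5 * X[:, 5] + rng.normal(size=n_vols)  # one ROI/IC time series

# Cross-validated elastic net guards against over-fitting under multicollinearity.
enet = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=5).fit(X, roi_ts)
print("non-zero feature weights:", np.flatnonzero(enet.coef_))
```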
Howard, Elizabeth J; Harville, Emily; Kissinger, Patricia; Xiong, Xu
2013-07-01
There is growing interest in the application of propensity scores (PS) in epidemiologic studies, especially within the field of reproductive epidemiology. This retrospective cohort study assesses the impact of a short interpregnancy interval (IPI) on preterm birth and compares the results of the conventional logistic regression analysis with analyses utilizing a PS. The study included 96,378 singleton infants from Louisiana birth certificate data (1995-2007). Five regression models designed for methods comparison are presented. Ten percent (10.17%) of all births were preterm; 26.83% of births were from a short IPI. The PS-adjusted model produced a more conservative estimate of the exposure variable compared to the conventional logistic regression method (β-coefficient: 0.21 vs. 0.43), as well as a smaller standard error (0.024 vs. 0.028), odds ratio, and 95% confidence interval [1.15 (1.09, 1.20) vs. 1.23 (1.17, 1.30)]. The inclusion of more covariate and interaction terms in the PS did not change the estimates of the exposure variable. This analysis indicates that PS-adjusted regression may be appropriate for validation of conventional methods in a large dataset with a fairly common outcome. PSs may be beneficial in producing more precise estimates, especially for models with many confounders and effect modifiers and where conventional adjustment with logistic regression is unsatisfactory. Short intervals between pregnancies are associated with preterm birth in this population, according to either technique. Birth spacing is an issue that women have some control over. Educational interventions, including birth control, should be applied during prenatal visits and following delivery.
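The sketch below contrasts the two adjustment strategies on simulated data: a conventional covariate-adjusted logistic model versus a model adjusted for an estimated propensity score. All variable names, effect sizes and the single small confounder set are hypothetical and only illustrate the workflow, not the Louisiana analysis.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 5000
age = rng.normal(27, 6, n)
smoker = rng.binomial(1, 0.2, n).astype(float)
# Exposure: short interpregnancy interval, more likely for younger smokers (hypothetical)
p_short = 1 / (1 + np.exp(-(-1.0 - 0.05 * (age - 27) + 0.6 * smoker)))
short_ipi = rng.binomial(1, p_short).astype(float)
# Outcome: preterm birth, depending on exposure and confounders (hypothetical)
p_pre = 1 / (1 + np.exp(-(-2.2 + 0.3 * short_ipi + 0.4 * smoker - 0.02 * (age - 27))))
preterm = rng.binomial(1, p_pre).astype(float)

# Conventional adjustment: outcome ~ exposure + covariates
conv = sm.Logit(preterm, sm.add_constant(np.column_stack([short_ipi, age, smoker]))).fit(disp=0)
# Propensity score: exposure ~ covariates, then adjust the outcome model for the PS
ps = sm.Logit(short_ipi, sm.add_constant(np.column_stack([age, smoker]))).fit(disp=0).predict()
ps_adj = sm.Logit(preterm, sm.add_constant(np.column_stack([short_ipi, ps]))).fit(disp=0)
print("exposure coefficient, conventional vs PS-adjusted:", conv.params[1], ps_adj.params[1])
```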
Lorenz, Alyson; Dhingra, Radhika; Chang, Howard H; Bisanzio, Donal; Liu, Yang; Remais, Justin V
2014-01-01
Extrapolating landscape regression models for use in assessing vector-borne disease risk and other applications requires thoughtful evaluation of fundamental model choice issues. To examine implications of such choices, an analysis was conducted to explore the extent to which disparate landscape models agree in their epidemiological and entomological risk predictions when extrapolated to new regions. Agreement between six literature-drawn landscape models was examined by comparing predicted county-level distributions of either Lyme disease or Ixodes scapularis vector using Spearman ranked correlation. AUC analyses and multinomial logistic regression were used to assess the ability of these extrapolated landscape models to predict observed national data. Three models based on measures of vegetation, habitat patch characteristics, and herbaceous landcover emerged as effective predictors of observed disease and vector distribution. An ensemble model containing these three models improved precision and predictive ability over individual models. A priori assessment of qualitative model characteristics effectively identified models that subsequently emerged as better predictors in quantitative analysis. Both a methodology for quantitative model comparison and a checklist for qualitative assessment of candidate models for extrapolation are provided; both tools aim to improve collaboration between those producing models and those interested in applying them to new areas and research questions.
Stata Modules for Calculating Novel Predictive Performance Indices for Logistic Models.
Barkhordari, Mahnaz; Padyab, Mojgan; Hadaegh, Farzad; Azizi, Fereidoun; Bozorgmanesh, Mohammadreza
2016-01-01
Prediction is a fundamental part of the prevention of cardiovascular diseases (CVD). The development of prediction algorithms based on multivariate regression models began several decades ago. In parallel with predictive model development, biomarker research has emerged on an impressively large scale. The key question is how best to assess and quantify the improvement in risk prediction offered by new biomarkers or, more basically, how to assess the performance of a risk prediction model. Discrimination, calibration, and added predictive value have recently been suggested for comparing the predictive performance of models with and without novel biomarkers. A lack of user-friendly statistical software has restricted the implementation of novel model assessment methods when examining novel biomarkers. We therefore intended to develop user-friendly software that could be used by researchers with few programming skills. We have written a Stata command that is intended to help researchers obtain cut-point-free and cut-point-based net reclassification improvement (NRI) indices and relative and absolute integrated discrimination improvement (IDI) indices for logistic regression analyses. We applied the command to real data on women participating in the Tehran Lipid and Glucose Study (TLGS) to examine whether information on a family history of premature CVD, waist circumference, and fasting plasma glucose can improve the predictive performance of the Framingham "general CVD risk" algorithm. The command is addpred for logistic regression models. The Stata package provided herein can encourage the use of novel methods in examining the predictive capacity of the ever-emerging plethora of novel biomarkers.
NASA Astrophysics Data System (ADS)
Kutzbach, L.; Schneider, J.; Sachs, T.; Giebels, M.; Nykänen, H.; Shurpali, N. J.; Martikainen, P. J.; Alm, J.; Wilmking, M.
2007-11-01
Closed (non-steady state) chambers are widely used for quantifying carbon dioxide (CO2) fluxes between soils or low-stature canopies and the atmosphere. It is well recognised that covering soil or vegetation with a closed chamber inherently disturbs the natural CO2 fluxes by altering the concentration gradients between the soil, the vegetation and the overlying air. Thus, the driving factors of CO2 fluxes are not constant during the closed chamber experiment, and no linear increase or decrease of CO2 concentration over time within the chamber headspace can be expected. Nevertheless, linear regression has been applied for calculating CO2 fluxes in many recent, partly influential, studies. This approach has been justified by keeping the closure time short and assuming the concentration change over time to be in the linear range. Here, we test whether the application of linear regression is really appropriate for estimating CO2 fluxes using closed chambers over short closure times and whether the application of nonlinear regression is necessary. We developed a nonlinear exponential regression model from diffusion and photosynthesis theory. This exponential model was tested with four different datasets of CO2 flux measurements (total number: 1764) conducted at three peatland sites in Finland and a tundra site in Siberia. Thorough analyses of residuals demonstrated that linear regression was frequently not appropriate for the determination of CO2 fluxes by closed-chamber methods, even if closure times were kept short. The developed exponential model was well suited for nonlinear regression of the concentration evolution over time, c(t), in the chamber headspace and estimation of the initial CO2 fluxes at closure time for the majority of experiments. However, a rather large percentage of the exponential regression functions showed curvatures not consistent with the theoretical model, which is considered to be caused by violations of the underlying model assumptions. In particular, the effects of turbulence and of pressure disturbances caused by chamber deployment are suspected to have produced these unexplainable curvatures. CO2 flux estimates by linear regression can be as low as 40% of the flux estimates of exponential regression for closure times of only two minutes. The degree of underestimation increased with increasing CO2 flux strength and was dependent on soil and vegetation conditions, which can disturb not only the quantitative but also the qualitative evaluation of CO2 flux dynamics. The underestimation effect by linear regression was observed to be different for CO2 uptake and release situations, which can lead to stronger bias in the daily, seasonal and annual CO2 balances than in the individual fluxes. To avoid serious bias of CO2 flux estimates based on closed chamber experiments, we suggest further tests using published datasets and recommend the use of nonlinear regression models for future closed chamber studies.
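A minimal sketch of the contrast the authors describe: fitting a saturating-exponential concentration curve with scipy and comparing its initial slope with a straight-line fit over the same short closure. The functional form, constants and noise level are simplifying assumptions for illustration, not the paper's exact diffusion-based parameterization, and the conversion from concentration change to an areal flux is omitted.

```python
import numpy as np
from scipy.optimize import curve_fit

def chamber_conc(t, c0, cs, k):
    """Saturating exponential: headspace concentration approaches cs over time."""
    return cs + (c0 - cs) * np.exp(-k * t)

# Hypothetical 2-minute closure: time (s) and CO2 concentration (ppm)
t = np.arange(0, 121, 10.0)
c = chamber_conc(t, 380.0, 520.0, 0.01) + np.random.default_rng(3).normal(0, 0.5, t.size)

popt, _ = curve_fit(chamber_conc, t, c, p0=[c[0], c[-1] + 50, 0.005])
c0, cs, k = popt
flux_exp = k * (cs - c0)                     # initial slope dc/dt at t = 0 (ppm/s)
flux_lin = np.polyfit(t, c, 1)[0]            # slope of a straight-line fit (ppm/s)
print("initial concentration change rate, exponential vs linear:", flux_exp, flux_lin)
```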
No association between posture and musculoskeletal complaints in a professional bassist sample.
Woldendorp, K H; Boonstra, A M; Tijsma, A; Arendzen, J H; Reneman, M F
2016-03-01
Professional musicians receive little attention in pain medicine despite reports of a high prevalence of musculoskeletal complaints. This study aims to investigate the association between work-related postures and musculoskeletal complaints of professional bass players. Participants were 141 professional and professional-student double bassists and bass guitarists. Data about self-reported functioning, general and mental health status, location and intensity of musculoskeletal complaints and psychosocial distress were collected online with self-constructed and existing questionnaires. Logistic regression analyses were performed to analyse associations between work-related postural stress (including type of instrument and accompanying specific exposures) and physical complaints, adjusted for potential confounders. Logistic regression analyses revealed no association between complaints and the playing position of the left shoulder area in double bassists (p = 0.30), the right wrist area in bass guitarists (p = 0.70), or the right wrist area for the German versus French bowing style (p = 0.59). All three hypotheses were rejected. This study shows that, in this sample of professional bass players, long-lasting exposure to postural stress was not associated with musculoskeletal complaints. This challenges a dominant model in pain medicine that focuses on ergonomic postures. © 2015 European Pain Federation - EFIC®
O'Dwyer, Jean; Morris Downes, Margaret; Adley, Catherine C
2016-02-01
This study analyses the relationship between meteorological phenomena and waterborne outbreaks of vero cytotoxin-producing Escherichia coli (VTEC) in the Republic of Ireland over an 8-year period (2005-2012). Data pertaining to the notification of waterborne VTEC outbreaks were extracted from the Computerised Infectious Disease Reporting system, which is administered through the national Health Protection Surveillance Centre as part of the Health Service Executive. Rainfall and temperature data were obtained from the national meteorological office and categorised as cumulative rainfall, heavy rainfall events in the previous 7 days, and mean temperature. The analysis was performed using logistic regression (LR). The LR model was significant (p < 0.001), with all independent variables (cumulative rainfall, heavy rainfall and mean temperature) making a statistically significant contribution to the model. The study found that rainfall, particularly heavy rainfall in the 7 days preceding an outbreak, is a strong statistical indicator of a waterborne outbreak and that temperature also affects waterborne VTEC outbreak occurrence.
Predictors of effects of lifestyle intervention on diabetes mellitus type 2 patients.
Jacobsen, Ramune; Vadstrup, Eva; Røder, Michael; Frølich, Anne
2012-01-01
The main aim of the study was to identify predictors of the effects of lifestyle intervention on type 2 diabetes mellitus patients by means of multivariate analysis. Data from a previously published randomised clinical trial, which compared the effects of a rehabilitation programme including standardised education and physical training sessions in the municipality's health care centre with the same duration of individual counseling in the diabetes outpatient clinic, were used. Data from 143 diabetes patients were analysed. The merged lifestyle intervention resulted in statistically significant improvements in patients' systolic blood pressure, waist circumference, exercise capacity, glycaemic control, and some aspects of general health-related quality of life. The linear multivariate regression models explained 45% to 80% of the variance in these improvements. In accordance with the logic of the regression-to-the-mean phenomenon, the baseline outcomes were the only statistically significant and robust predictors in all regression models. These results are important from a clinical point of view as they highlight the more urgent need for, and better outcomes following, lifestyle intervention for those patients who have worse general and disease-specific health.
Everyday attention lapses and memory failures: the affective consequences of mindlessness.
Carriere, Jonathan S A; Cheyne, J Allan; Smilek, Daniel
2008-09-01
We examined the affective consequences of everyday attention lapses and memory failures. Significant associations were found between self-report measures of attention lapses (MAAS-LO), attention-related cognitive errors (ARCES), and memory failures (MFS), on the one hand, and boredom (BPS) and depression (BDI-II), on the other. Regression analyses confirmed previous findings that the ARCES partially mediates the relation between the MAAS-LO and MFS. Further regression analyses also indicated that the association between the ARCES and BPS was entirely accounted for by the MAAS-LO and MFS, as was that between the ARCES and BDI-II. Structural modeling revealed the associations to be optimally explained by the MAAS-LO and MFS influencing the BPS and BDI-II, contrary to current conceptions of attention and memory problems as consequences of affective dysfunction. A lack of conscious awareness of one's actions, signaled by the propensity to experience brief lapses of attention and related memory failures, is thus seen as having significant consequences in terms of long-term affective well-being.
Love, Trust, and HIV Risk Among Female Sex Workers and Their Intimate Male Partners.
Syvertsen, Jennifer L; Bazzi, Angela Robertson; Martinez, Gustavo; Rangel, M Gudelia; Ulibarri, Monica D; Fergus, Kirkpatrick B; Amaro, Hortensia; Strathdee, Steffanie A
2015-08-01
We examined correlates of love and trust among female sex workers and their noncommercial male partners along the Mexico-US border. From 2011 to 2012, 322 partners in Tijuana and Ciudad Juárez, Mexico, completed assessments of love and trust. Cross-sectional dyadic regression analyses identified associations of relationship characteristics and HIV risk behaviors with love and trust. Within 161 couples, love and trust scores were moderately high (median 70/95 and 29/40 points, respectively) and correlated with relationship satisfaction. In regression analyses of HIV risk factors, men and women who used methamphetamine reported lower love scores, whereas women who used heroin reported slightly higher love. In an alternate model, men with concurrent sexual partners had lower love scores. For both partners, relationship conflict was associated with lower trust. Love and trust are associated with relationship quality, sexual risk, and drug use patterns that shape intimate partners' HIV risk. HIV interventions should consider the emotional quality of sex workers' intimate relationships.
An Analysis of Advertising Effectiveness for U.S. Navy Recruiting
1997-09-01
This thesis estimates the effect of Navy television advertising on enlistment rates of high quality male recruits (Armed Forces Qualification Test...Joint advertising is for all Armed Forces), Joint journal, and Joint direct mail advertising are explored. Enlistments are modeled as a function of...several factors including advertising, recruiters, and economic. Regression analyses (Ordinary Least Squares and Two Stage Least Squares) explore the
ERIC Educational Resources Information Center
Hansen, Pernille
2017-01-01
This article analyses how a set of psycholinguistic factors may account for children's lexical development. Age of acquisition is compared to a measure of lexical development based on vocabulary size rather than age, and robust regression models are used to assess the individual and joint effects of word class, frequency, imageability and…
ERIC Educational Resources Information Center
Swygert, Kimberly A.
In this study, data from an operational computerized adaptive test (CAT) were examined in order to gather information concerning item response times in a CAT environment. The CAT under study included multiple-choice items measuring verbal, quantitative, and analytical reasoning. The analyses included the fitting of regression models describing the…
ERIC Educational Resources Information Center
Kaljee, Linda M.; Green, Mackenzie S.; Zhan, Min; Riel, Rosemary; Lerdboon, Porntip; Lostutter, Ty W.; Tho, Le Huu; Luong, Vo Van; Minh, Truong Tan
2011-01-01
A randomly selected cross-sectional survey was conducted with 880 youth (16 to 24 years) in Nha Trang City to assess relationships between alcohol consumption and sexual behaviors. A timeline followback method was employed. Chi-square, generalized logit modeling and logistic regression analyses were performed. Of the sample, 78.2% male and 56.1%…
Hunter, Paul R
2009-12-01
Household water treatment (HWT) is being widely promoted as an appropriate intervention for reducing the burden of waterborne disease in poor communities in developing countries. A recent study has raised concerns about the effectiveness of HWT, in part because of concerns over the lack of blinding and in part because of considerable heterogeneity in the reported effectiveness of randomized controlled trials. This study set out to investigate the causes of this heterogeneity and so identify factors associated with good health gains. Studies identified in an earlier systematic review and meta-analysis were supplemented with more recently published randomized controlled trials. A total of 28 separate studies of randomized controlled trials of HWT with 39 intervention arms were included in the analysis. Heterogeneity was studied using the "metareg" command in Stata. Initial analyses with single candidate predictors were undertaken and all variables significant at the P < 0.2 level were included in a final regression model. Further analyses were done to estimate the effect of the interventions over time by Monte Carlo modeling using @Risk and the parameter estimates from the final regression model. The overall effect size of all unblinded studies was relative risk = 0.56 (95% confidence intervals 0.51-0.63), but after adjusting for bias due to lack of blinding the effect size was much lower (RR = 0.85, 95% CI = 0.76-0.97). Four main variables were significant predictors of intervention effectiveness in the multipredictor meta-regression model: log duration of study follow-up (regression coefficient of log effect size = 0.186, standard error (SE) = 0.072), whether or not the study was blinded (coefficient 0.251, SE 0.066), being conducted in an emergency setting (coefficient -0.351, SE 0.076), and intervention type. Compared to the ceramic filter, all other interventions were much less effective (Biosand 0.247, SE 0.073; chlorine and safe water storage 0.295, 0.061; combined coagulant-chlorine 0.2349, 0.067; SODIS 0.302, 0.068). A Monte Carlo model predicted that over 12 months ceramic filters were likely to be still effective at reducing disease, whereas SODIS, chlorination, and coagulation-chlorination had little if any benefit. Indeed, these three interventions are predicted to have the same or less effect than what may be expected due purely to reporting bias in unblinded studies. With the currently available evidence, ceramic filters are the most effective form of HWT in the long term; disinfection-only interventions, including SODIS, appear to have poor if any long-term public health benefit.
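As a rough sketch of the meta-regression step, the snippet below runs an inverse-variance-weighted regression of hypothetical log relative risks on study-level covariates with statsmodels. This is a simple fixed-effect weighting, not the random-effects "metareg" routine used in the study, and every number is invented.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical per-study data: log relative risk, its standard error,
# log follow-up duration (months), blinded (0/1) and emergency setting (0/1).
log_rr = np.array([-0.65, -0.40, -0.80, -0.10, -0.55, -0.25, -0.70, -0.15])
se = np.array([0.15, 0.20, 0.25, 0.18, 0.22, 0.17, 0.30, 0.19])
log_dur = np.log([3, 12, 2, 18, 6, 12, 3, 24])
blinded = np.array([0, 0, 0, 1, 0, 1, 0, 1])
emergency = np.array([1, 0, 1, 0, 0, 0, 1, 0])

X = sm.add_constant(np.column_stack([log_dur, blinded, emergency]))
# Fixed-effect meta-regression: weight each study by the inverse of its variance.
fit = sm.WLS(log_rr, X, weights=1.0 / se**2).fit()
print(fit.params)   # coefficients on log duration, blinding and emergency setting
```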
Is complex allometry in field metabolic rates of mammals a statistical artifact?
Packard, Gary C
2017-01-01
Recent reports indicate that field metabolic rates (FMRs) of mammals conform to a pattern of complex allometry in which the exponent in a simple, two-parameter power equation increases steadily as a dependent function of body mass. The reports were based, however, on indirect analyses performed on logarithmic transformations of the original data. I re-examined values for FMR and body mass for 114 species of mammal by the conventional approach to allometric analysis (to illustrate why the approach is unreliable) and by linear and nonlinear regression on untransformed variables (to illustrate the power and versatility of newer analytical methods). The best of the regression models fitted directly to untransformed observations is a three-parameter power equation with multiplicative, lognormal, heteroscedastic error and an allometric exponent of 0.82. The mean function is a good fit to data in graphical display. The significant intercept in the model may simply have gone undetected in prior analyses because conventional allometry assumes implicitly that the intercept is zero; or the intercept may be a spurious finding resulting from bias introduced by the haphazard sampling that underlies "exploratory" analyses like the one reported here. The aforementioned issues can be resolved only by gathering new data specifically intended to address the question of scaling of FMR with body mass in mammals. However, there is no support for the concept of complex allometry in the relationship between FMR and body size in mammals. Copyright © 2016 Elsevier Inc. All rights reserved.
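A minimal sketch of the contrast the author draws: fitting a three-parameter power function directly to untransformed (simulated) data with scipy versus the conventional log-log ordinary least squares fit, which forces a two-parameter model. The simulated error structure is simplified (multiplicative lognormal noise, but an additive-error fit), so this only illustrates the workflow, not the paper's exact estimation.

```python
import numpy as np
from scipy.optimize import curve_fit

def power3(mass, a, b, c):
    """Three-parameter power function: FMR = a * mass**b + c."""
    return a * mass**b + c

rng = np.random.default_rng(4)
mass = np.exp(rng.uniform(np.log(10), np.log(1e5), 120))                # body mass (g)
fmr = power3(mass, 5.0, 0.82, 40.0) * np.exp(rng.normal(0, 0.2, 120))   # multiplicative error

# Direct nonlinear fit on the untransformed data
popt, _ = curve_fit(power3, mass, fmr, p0=[1.0, 0.75, 0.0], maxfev=10000)
# Conventional allometry: OLS on log-log transformed data (implicitly a zero intercept)
slope, intercept = np.polyfit(np.log(mass), np.log(fmr), 1)
print("exponent, nonlinear fit vs log-log OLS:", popt[1], slope)
```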
Meta-analysis of the effect of road safety campaigns on accidents.
Phillips, Ross Owen; Ulleberg, Pål; Vaa, Truls
2011-05-01
A meta-analysis of 67 studies evaluating the effect of road safety campaigns on accidents is reported. A total of 119 results were extracted from the studies, which were reported in 12 different countries between 1975 and 2007. After allowing for publication bias and heterogeneity of effects, the weighted average effect of road safety campaigns is a 9% reduction in accidents (with 95% confidence that the weighted average is between -12 and -6%). To account for the variability of effects measured across studies, data were collected to characterise aspects of the campaign and evaluation design associated with each effect, and analysed to identify a model of seven campaign factors for testing by meta-regression. The model was tested using both fixed and random effect meta-regression, and dependency among effects was accounted for by aggregation. These analyses suggest positive associations between accident reduction and the use of personal communication or roadside media as part of a campaign delivery strategy. Campaigns with a drink-driving theme were also associated with greater accident reductions, while some of the analyses suggested that accompanying enforcement and short campaign duration (less than one month) are beneficial. Overall the results are consistent with the idea that campaigns can be more effective in the short term if the message is delivered with personal communication in a way that is proximal in space and time to the behaviour targeted by the campaign. Copyright © 2011 Elsevier Ltd. All rights reserved.
Valeri, A; Briollais, L; Azzouzi, R; Fournier, G; Mangin, P; Berthon, P; Cussenot, O; Demenais, F
2003-03-01
Four segregation analyses concerning prostate cancer (CaP), three conducted in the United States and one in Northern Europe, have shown evidence for a dominant major gene but with different parameter estimates. A recent segregation analysis of Australian pedigrees has found a better fit of a two-locus model than single-locus models. This model included a dominantly inherited increased risk that was greater at younger ages and a recessively inherited or X-linked increased risk that was greater at older ages. Recent linkage analyses have led to the detection of at least 8 CaP predisposing genes, suggesting a complex inheritance and genetic heterogeneity. To assess the nature of familial aggregation of prostate cancer in France, segregation analysis was conducted in 691 families ascertained through 691 CaP patients, recruited from three French hospitals and unselected with respect to age at diagnosis, clinical stage or family history. This mode of family inclusion, without any particular selection of the probands, is unique, as probands from all previous analyses were selected according to various criteria. Segregation analysis was carried out using the logistic hazard regressive model, as incorporated in the REGRESS program, which can accommodate a major gene effect, residual familial dependences of any origin (genetic and/or environmental), and covariates, while including survival analysis concepts. Segregation analysis showed evidence for the segregation of an autosomal dominant gene (allele frequency of 0.03%) with an additional brother-brother dependence. The estimated cumulative risks of prostate cancer by age 85 years, among subjects with the at-risk genotype, were 86% in the fathers' generation and 99% in the probands' generation. This study supports the model of Mendelian transmission of a rare autosomal dominant gene with high penetrance, and demonstrates that additional genetic and/or common sibling environmental factors are involved to account for the familial clustering of CaP.
Temporal trends in sperm count: a systematic review and meta-regression analysis.
Levine, Hagai; Jørgensen, Niels; Martino-Andrade, Anderson; Mendiola, Jaime; Weksler-Derri, Dan; Mindlis, Irina; Pinotti, Rachel; Swan, Shanna H
2017-11-01
Reported declines in sperm counts remain controversial today and recent trends are unknown. A definitive meta-analysis is critical given the predictive value of sperm count for fertility, morbidity and mortality. To provide a systematic review and meta-regression analysis of recent trends in sperm counts as measured by sperm concentration (SC) and total sperm count (TSC), and their modification by fertility and geographic group. PubMed/MEDLINE and EMBASE were searched for English language studies of human SC published in 1981-2013. Following a predefined protocol 7518 abstracts were screened and 2510 full articles reporting primary data on SC were reviewed. A total of 244 estimates of SC and TSC from 185 studies of 42 935 men who provided semen samples in 1973-2011 were extracted for meta-regression analysis, as well as information on years of sample collection and covariates [fertility group ('Unselected by fertility' versus 'Fertile'), geographic group ('Western', including North America, Europe Australia and New Zealand versus 'Other', including South America, Asia and Africa), age, ejaculation abstinence time, semen collection method, method of measuring SC and semen volume, exclusion criteria and indicators of completeness of covariate data]. The slopes of SC and TSC were estimated as functions of sample collection year using both simple linear regression and weighted meta-regression models and the latter were adjusted for pre-determined covariates and modification by fertility and geographic group. Assumptions were examined using multiple sensitivity analyses and nonlinear models. SC declined significantly between 1973 and 2011 (slope in unadjusted simple regression models -0.70 million/ml/year; 95% CI: -0.72 to -0.69; P < 0.001; slope in adjusted meta-regression models = -0.64; -1.06 to -0.22; P = 0.003). The slopes in the meta-regression model were modified by fertility (P for interaction = 0.064) and geographic group (P for interaction = 0.027). There was a significant decline in SC between 1973 and 2011 among Unselected Western (-1.38; -2.02 to -0.74; P < 0.001) and among Fertile Western (-0.68; -1.31 to -0.05; P = 0.033), while no significant trends were seen among Unselected Other and Fertile Other. Among Unselected Western studies, the mean SC declined, on average, 1.4% per year with an overall decline of 52.4% between 1973 and 2011. Trends for TSC and SC were similar, with a steep decline among Unselected Western (-5.33 million/year, -7.56 to -3.11; P < 0.001), corresponding to an average decline in mean TSC of 1.6% per year and overall decline of 59.3%. Results changed minimally in multiple sensitivity analyses, and there was no statistical support for the use of a nonlinear model. In a model restricted to data post-1995, the slope both for SC and TSC among Unselected Western was similar to that for the entire period (-2.06 million/ml, -3.38 to -0.74; P = 0.004 and -8.12 million, -13.73 to -2.51, P = 0.006, respectively). This comprehensive meta-regression analysis reports a significant decline in sperm counts (as measured by SC and TSC) between 1973 and 2011, driven by a 50-60% decline among men unselected by fertility from North America, Europe, Australia and New Zealand. Because of the significant public health implications of these results, research on the causes of this continuing decline is urgently needed. © The Author 2017. Published by Oxford University Press on behalf of the European Society of Human Reproduction and Embryology. All rights reserved. 
Azeez, Adeboye; Obaromi, Davies; Odeyemi, Akinwumi; Ndege, James; Muntabayi, Ruffin
2016-07-26
Tuberculosis (TB) is a deadly infectious disease caused by Mycobacterium tuberculosis. Tuberculosis, a chronic and highly infectious disease, is prevalent in almost every part of the globe. More than 95% of TB mortality occurs in low/middle income countries. In 2014, approximately 10 million people were diagnosed with active TB and two million died from the disease. In this study, our aim is to compare the predictive powers of the seasonal autoregressive integrated moving average (SARIMA) model and a combined SARIMA-neural network auto-regression (SARIMA-NNAR) model of TB incidence, and to analyse its seasonality in South Africa. TB incidence case data from January 2010 to December 2015 were extracted from the Eastern Cape Health facility report of the electronic Tuberculosis Register (ERT.Net). A SARIMA model and a combined SARIMA and neural network auto-regression (SARIMA-NNAR) model were used in analysing and predicting the TB data from 2010 to 2015. The performance measures mean square error (MSE), root mean square error (RMSE), mean absolute error (MAE), mean percent error (MPE), mean absolute scaled error (MASE) and mean absolute percentage error (MAPE) were used to compare the predictive performance of the models. Although both models could predict TB incidence in practice, the combined model displayed better performance. For the combined model, the Akaike information criterion (AIC), second-order AIC (AICc) and Bayesian information criterion (BIC) are 288.56, 308.31 and 299.09, respectively, which are lower than the corresponding values for the SARIMA model of 329.02, 327.20 and 341.99. The SARIMA-NNAR model forecast a slightly stronger seasonal trend in TB incidence than the single model. The combined model provided better TB incidence forecasting, with a lower AICc. The model also indicates the need for resolute interventions to reduce infectious disease transmission, including among patients co-infected with HIV and other concomitant diseases, and at festival peak periods.
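As a minimal sketch of the seasonal ARIMA component, the snippet below fits a SARIMAX model with a 12-month seasonal period to simulated monthly counts using statsmodels and produces a 12-month forecast. The counts, model orders and trend are invented for illustration, and the neural-network auto-regression part of the hybrid is not shown.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Hypothetical monthly TB notification counts, 2010-2015 (72 months)
rng = np.random.default_rng(5)
months = pd.date_range("2010-01-01", periods=72, freq="MS")
season = 30 * np.sin(2 * np.pi * months.month / 12)
counts = 400 + season + np.arange(72) * 0.5 + rng.normal(0, 15, 72)
y = pd.Series(counts, index=months)

# Seasonal ARIMA with a 12-month seasonal period
model = SARIMAX(y, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12)).fit(disp=False)
print("AIC:", model.aic, "BIC:", model.bic)
print(model.forecast(steps=12))    # 12-month-ahead forecast
```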
Chung, Sang M; Lee, David J; Hand, Austin; Young, Philip; Vaidyanathan, Jayabharathi; Sahajwalla, Chandrahas
2015-12-01
The study evaluated whether the renal function decline rate per year with age in adults varies based on two primary statistical analyses: cross-section (CS), using one observation per subject, and longitudinal (LT), using multiple observations per subject over time. A total of 16628 records (3946 subjects; age range 30-92 years) of creatinine clearance and relevant demographic data were used. On average, four samples per subject were collected for up to 2364 days (mean: 793 days). A simple linear regression and random coefficient models were selected for CS and LT analyses, respectively. The renal function decline rates per year were 1.33 and 0.95 ml/min/year for CS and LT analyses, respectively, and were slower when the repeated individual measurements were considered. The study confirms that the estimated rates differ depending on the statistical analysis, and that a statistically robust longitudinal model with a proper sampling design provides reliable individual as well as population estimates of the renal function decline rate per year with age in adults. In conclusion, our findings indicate that one should be cautious in interpreting the rate of renal function decline with age because its estimate depends strongly on the statistical analysis used. Based on our analyses, a population longitudinal analysis (e.g. a random coefficient model) is recommended if individualization is critical, such as for dose adjustment based on renal function during chronic therapy. Copyright © 2015 John Wiley & Sons, Ltd.
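A small sketch of the two approaches on simulated repeated creatinine-clearance data: a cross-sectional fit using one record per subject versus a longitudinal random-intercept model fitted with statsmodels (a random slope could be added via re_formula). The simulated decline rates and sample sizes are arbitrary and chosen only to show how the two estimates can differ.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n_subj, n_obs = 300, 4
subj = np.repeat(np.arange(n_subj), n_obs)
age0 = rng.uniform(30, 85, n_subj)                    # baseline age per subject
years = np.tile(np.arange(n_obs) * 2.0, n_subj)       # visits every 2 years
# Hypothetical creatinine clearance: between-subject slope -0.9, within-subject about -1.5 ml/min/year
slope_i = -1.5 + rng.normal(0, 0.3, n_subj)
crcl = 140 - 0.9 * age0[subj] + slope_i[subj] * years + rng.normal(0, 4, subj.size)
df = pd.DataFrame({"crcl": crcl, "age": age0[subj] + years, "subj": subj})

cs = smf.ols("crcl ~ age", df.groupby("subj").first()).fit()   # cross-sectional: one record each
lt = smf.mixedlm("crcl ~ age", df, groups=df["subj"]).fit()    # longitudinal: all records
print("decline per year, CS vs LT:", -cs.params["age"], -lt.params["age"])
```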
Numeric promoter description - A comparative view on concepts and general application.
Beier, Rico; Labudde, Dirk
2016-01-01
Nucleic acid molecules play a key role in a variety of biological processes. Starting from storage and transfer tasks, this also comprises the triggering of biological processes, regulatory effects and the active influence gained by target binding. Based on the experimental output (in this case promoter sequences), further in silico analyses aid in gaining new insights into these processes and interactions. The numerical description of nucleic acids thereby constitutes a bridge between the concrete biological issues and the analytical methods. Hence, this study compares 26 descriptor sets obtained by applying well-known numerical description concepts to an established dataset of 38 DNA promoter sequences. The suitability of the description sets was evaluated by computing partial least squares regression models and assessing the model accuracy. We conclude that the descriptive power derives mainly from positional information rather than from explicitly incorporated physico-chemical information, since a sufficient amount of implicit physico-chemical information is already encoded in the nucleobase classification. The regression models especially benefited from employing the information that is encoded in the sequential and structural neighborhood of the nucleobases. Thus, the analyses of n-grams (short fragments of length n) suggested that they are valuable descriptors for DNA target interactions. A mixed n-gram descriptor set thereby yielded the best description of the promoter sequences. The corresponding regression model was checked and found to be plausible, as it was able to reproduce the characteristic binding motifs of promoter sequences to a reasonable degree. As most functional nucleic acids are based on the principle of molecular recognition, the findings are not restricted to promoter sequences, but can rather be transferred to other kinds of functional nucleic acids. Thus, the concepts presented in this study could provide advantages for future nucleic acid-based technologies, such as biosensing, therapeutics and molecular imaging. Copyright © 2015 Elsevier Inc. All rights reserved.
Coker, Freya; Williams, Cylie M; Taylor, Nicholas F; Caspers, Kirsten; McAlinden, Fiona; Wilton, Anita; Shields, Nora; Haines, Terry P
2018-05-10
This protocol considers three allied health staffing models across public health subacute hospitals. This quasi-experimental mixed-methods study, including qualitative process evaluation, aims to evaluate the impact of additional allied health services in subacute care, in rehabilitation and geriatric evaluation management settings, on patient, health service and societal outcomes. This health services research will analyse outcomes of patients exposed to different allied health models of care at three health services. Each health service will have a control ward (routine care) and an intervention ward (additional allied health). This project has two parts. Part 1: a whole-of-site data extraction for the included wards. Outcome measures will include: length of stay, rate of readmissions, discharge destinations, community referrals, patient feedback and staff perspectives. Part 2: Functional Independence Measure scores will be collected every 2-3 days for the duration of 60 patient admissions. Data from part 1 will be analysed by linear regression analysis for continuous outcomes using patient-level data and logistic regression analysis for binary outcomes. Qualitative data will be analysed using a deductive thematic approach. For part 2, a linear mixed model analysis will be conducted using therapy service delivery and days since admission to subacute care as fixed factors in the model and individual participant as a random factor. Graphical analysis will be used to examine the growth curve of the model and transformations. The days since admission factor will be used to examine non-linear growth trajectories to determine whether they lead to better model fit. Findings will be disseminated through local reports and to the Department of Health and Human Services Victoria. Results will be presented at conferences and submitted to peer-reviewed journals. The Monash Health Human Research Ethics committee approved this multisite research (HREC/17/MonH/144 and HREC/17/MonH/547). © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Orsini, C; Binnie, V; Wilson, S; Villegas, M J
2018-05-01
The aim of this study was to test the mediating role of the satisfaction of dental students' basic psychological needs of autonomy, competence and relatedness on the association between learning climate, feedback and student motivation. The latter was based on the self-determination theory's concepts of differentiation of autonomous motivation, controlled motivation and amotivation. A cross-sectional correlational study was conducted where 924 students completed self-reported questionnaires measuring motivation, perception of the learning climate, feedback and basic psychological needs satisfaction. Descriptive statistics, Cronbach's alpha scores and bivariate correlations were computed. Mediation of basic needs on each predictor-outcome association was tested based on a series of regression analyses. Finally, all variables were integrated into one structural equation model, controlling for the effects of age, gender and year of study. Cronbach's alpha scores were acceptable (.655 to .905). Correlation analyses showed positive and significant associations between both an autonomy-supportive learning climate and the quantity and quality of feedback received, and students' autonomous motivation, which decreased and became negative when correlated with controlled motivation and amotivation, respectively. Regression analyses revealed that these associations were indirect and mediated by how these predictors satisfied students' basic psychological needs. These results were corroborated by the structural equation analysis, in which data fit the model well and regression paths were in the expected direction. An autonomy-supportive learning climate and the quantity and quality of feedback were positive predictors of students' autonomous motivation and negative predictors of amotivation. However, this was an indirect association mediated by the satisfaction of students' basic psychological needs. Consequently, supporting students' needs of autonomy, competence and relatedness might lead to optimal types of motivation, which has an important influence on dental education. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Li, Huixia; Luo, Miyang; Zheng, Jianfei; Luo, Jiayou; Zeng, Rong; Feng, Na; Du, Qiyun; Fang, Junqun
2017-02-01
An artificial neural network (ANN) model was developed to predict the risks of congenital heart disease (CHD) in pregnant women. This hospital-based case-control study involved 119 CHD cases and 239 controls, all recruited from birth defect surveillance hospitals in Hunan Province between July 2013 and June 2014. All subjects were interviewed face-to-face to fill in a questionnaire that covered 36 CHD-related variables. The 358 subjects were randomly divided into a training set and a testing set at the ratio of 85:15. The training set was used to identify the significant predictors of CHD by univariate logistic regression analyses and develop a standard feed-forward back-propagation neural network (BPNN) model for the prediction of CHD. The testing set was used to test and evaluate the performance of the ANN model. Univariate logistic regression analyses were performed on SPSS 18.0. The ANN models were developed on Matlab 7.1. The univariate logistic regression identified 15 predictors that were significantly associated with CHD, including education level (odds ratio = 0.55), gravidity (1.95), parity (2.01), history of abnormal reproduction (2.49), family history of CHD (5.23), maternal chronic disease (4.19), maternal upper respiratory tract infection (2.08), environmental pollution around maternal dwelling place (3.63), maternal exposure to occupational hazards (3.53), maternal mental stress (2.48), paternal chronic disease (4.87), paternal exposure to occupational hazards (2.51), intake of vegetable/fruit (0.45), intake of fish/shrimp/meat/egg (0.59), and intake of milk/soymilk (0.55). After many trials, we selected a 3-layer BPNN model with 15, 12, and 1 neuron in the input, hidden, and output layers, respectively, as the best prediction model. The prediction model has accuracies of 0.91 and 0.86 on the training and testing sets, respectively. The sensitivity, specificity, and Youden index on the testing set (training set) are 0.78 (0.83), 0.90 (0.95), and 0.68 (0.78), respectively. The areas under the receiver operating characteristic curve on the testing and training sets are 0.87 and 0.97, respectively. This study suggests that the BPNN model could be used to predict the risk of CHD in individuals. This model should be further improved by large-sample-size research.
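As a rough sketch of a 15-12-1 feed-forward classifier, the snippet below trains scikit-learn's MLPClassifier on simulated binary risk-factor data with an 85:15 split. The data generation, hyperparameters and use of ReLU units (rather than the authors' Matlab back-propagation setup) are assumptions for illustration only.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(7)
n, n_feat = 358, 15                      # sizes echo the abstract; the data are simulated
X = rng.binomial(1, 0.3, size=(n, n_feat)).astype(float)
logit = -1.5 + X[:, :5] @ np.array([1.2, 0.9, 1.0, 0.8, 1.1])
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.15, random_state=0)
# Feed-forward network with one hidden layer of 12 units (a 15-12-1 architecture)
net = MLPClassifier(hidden_layer_sizes=(12,), max_iter=2000, random_state=0).fit(X_tr, y_tr)
print("test AUC:", roc_auc_score(y_te, net.predict_proba(X_te)[:, 1]))
```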
Covariate Imbalance and Adjustment for Logistic Regression Analysis of Clinical Trial Data
Ciolino, Jody D.; Martin, Reneé H.; Zhao, Wenle; Jauch, Edward C.; Hill, Michael D.; Palesch, Yuko Y.
2014-01-01
In logistic regression analysis for binary clinical trial data, adjusted treatment effect estimates are often not equivalent to unadjusted estimates in the presence of influential covariates. This paper uses simulation to quantify the benefit of covariate adjustment in logistic regression. However, International Conference on Harmonization guidelines suggest that covariate adjustment be pre-specified. Unplanned adjusted analyses should be considered secondary. Results suggest that if adjustment is not possible or unplanned in a logistic setting, balance in continuous covariates can alleviate some (but never all) of the shortcomings of unadjusted analyses. The case of log binomial regression is also explored. PMID:24138438
Empirical and semi-analytical models for predicting peak outflows caused by embankment dam failures
NASA Astrophysics Data System (ADS)
Wang, Bo; Chen, Yunliang; Wu, Chao; Peng, Yong; Song, Jiajun; Liu, Wenjun; Liu, Xin
2018-07-01
Prediction of the peak discharge of floods has attracted great attention from researchers and engineers. In the present study, nine typical nonlinear mathematical models are established based on a database of 40 historical dam failures. The first eight models, which were developed with a series of regression analyses, are purely empirical, while the last one is a semi-analytical approach derived from an analytical solution of dam-break floods in a trapezoidal channel. Water depth above breach invert (Hw), volume of water stored above breach invert (Vw), embankment length (El), and average embankment width (Ew) are used as independent variables to develop empirical formulas for estimating the peak outflow from breached embankment dams. The multiple regression analysis indicates that a function using the former two variables (i.e., Hw and Vw) produces considerably more accurate results than one using the latter two variables (i.e., El and Ew). It is shown that the semi-analytical approach works best in terms of both prediction accuracy and uncertainty, and the established empirical models produce reasonable results except for the model using only El. Moreover, the present models have been compared with other models available in the literature for estimating peak discharge.
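A minimal sketch of the kind of regression such empirical formulas rest on: a power-law relation Qp = a·Hw^b·Vw^c estimated as a linear model in log space. The eight failure records below are invented placeholders, not the study's database of 40 historical dam failures.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical dam-failure records: water depth above breach invert Hw (m),
# stored volume Vw (10^6 m^3) and observed peak outflow Qp (m^3/s).
Hw = np.array([6.1, 9.5, 12.2, 21.3, 8.0, 31.1, 14.0, 5.5])
Vw = np.array([0.5, 3.9, 22.1, 310.0, 1.2, 660.0, 38.0, 0.3])
Qp = np.array([95.0, 620.0, 2050.0, 29000.0, 230.0, 78000.0, 4300.0, 60.0])

# Power-law form Qp = a * Hw**b * Vw**c, estimated as a linear model in log space
X = sm.add_constant(np.column_stack([np.log(Hw), np.log(Vw)]))
fit = sm.OLS(np.log(Qp), X).fit()
a, b, c = np.exp(fit.params[0]), fit.params[1], fit.params[2]
print(f"Qp ~ {a:.2f} * Hw^{b:.2f} * Vw^{c:.2f}")
```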
Spatial distribution of psychotic disorders in an urban area of France: an ecological study.
Pignon, Baptiste; Schürhoff, Franck; Baudin, Grégoire; Ferchiou, Aziz; Richard, Jean-Romain; Saba, Ghassen; Leboyer, Marion; Kirkbride, James B; Szöke, Andrei
2016-05-18
Previous analyses of neighbourhood variations of non-affective psychotic disorders (NAPD) have focused mainly on incidence. However, prevalence studies provide important insights into factors associated with disease evolution, as well as information for healthcare resource allocation. This study aimed to investigate the distribution of prevalent NAPD cases in an urban area in France. The number of cases in each neighbourhood was modelled as a function of potential confounders and ecological variables, namely: migrant density, economic deprivation and social fragmentation. This was modelled using statistical models of increasing complexity: frequentist models (using Poisson and negative binomial regressions), and several Bayesian models. For each model, the validity of the assumptions was checked and goodness of fit was compared, in order to test for possible spatial variation in prevalence. Data showed significant overdispersion (invalidating the Poisson regression model) and residual autocorrelation (suggesting the need to use Bayesian models). The best Bayesian model was Leroux's model (i.e. a model with both strong correlation between neighbouring areas and weaker correlation between areas further apart), with economic deprivation as an explanatory variable (OR = 1.13, 95% CI [1.02-1.25]). In comparison with frequentist methods, the Bayesian model showed a better fit. The number of cases showed non-random spatial distribution and was linked to economic deprivation.
Secure Logistic Regression Based on Homomorphic Encryption: Design and Evaluation
Song, Yongsoo; Wang, Shuang; Xia, Yuhou; Jiang, Xiaoqian
2018-01-01
Background Learning a model without accessing raw data has been an intriguing idea to security and machine learning researchers for years. In an ideal setting, we want to encrypt sensitive data to store them on a commercial cloud and run certain analyses without ever decrypting the data to preserve privacy. Homomorphic encryption technique is a promising candidate for secure data outsourcing, but it is a very challenging task to support real-world machine learning tasks. Existing frameworks can only handle simplified cases with low-degree polynomials such as linear means classifier and linear discriminative analysis. Objective The goal of this study is to provide a practical support to the mainstream learning models (eg, logistic regression). Methods We adapted a novel homomorphic encryption scheme optimized for real numbers computation. We devised (1) the least squares approximation of the logistic function for accuracy and efficiency (ie, reduce computation cost) and (2) new packing and parallelization techniques. Results Using real-world datasets, we evaluated the performance of our model and demonstrated its feasibility in speed and memory consumption. For example, it took approximately 116 minutes to obtain the training model from the homomorphically encrypted Edinburgh dataset. In addition, it gives fairly accurate predictions on the testing dataset. Conclusions We present the first homomorphically encrypted logistic regression outsourcing model based on the critical observation that the precision loss of classification models is sufficiently small so that the decision plan stays still. PMID:29666041
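The least-squares polynomial surrogate for the logistic function, which the Methods describe, can be sketched in a few lines; the interval [-8, 8] and the degrees tried below are arbitrary choices for illustration, not the parameters used in the paper.

```python
import numpy as np

# Least-squares polynomial approximation of the logistic function on [-8, 8],
# the kind of low-degree surrogate needed because homomorphic schemes can only
# evaluate additions and multiplications.
x = np.linspace(-8, 8, 801)
sigmoid = 1 / (1 + np.exp(-x))

for degree in (3, 5, 7):
    coeffs = np.polyfit(x, sigmoid, degree)
    err = np.max(np.abs(np.polyval(coeffs, x) - sigmoid))
    print(f"degree {degree}: max abs error on [-8, 8] = {err:.4f}")
```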
QSAR modeling of flotation collectors using principal components extracted from topological indices.
Natarajan, R; Nirdosh, Inderjit; Basak, Subhash C; Mills, Denise R
2002-01-01
Several topological indices were calculated for substituted-cupferrons that were tested as collectors for the froth flotation of uranium. The principal component analysis (PCA) was used for data reduction. Seven principal components (PC) were found to account for 98.6% of the variance among the computed indices. The principal components thus extracted were used in stepwise regression analyses to construct regression models for the prediction of separation efficiencies (Es) of the collectors. A two-parameter model with a correlation coefficient of 0.889 and a three-parameter model with a correlation coefficient of 0.913 were formed. PCs were found to be better than partition coefficient to form regression equations, and inclusion of an electronic parameter such as Hammett sigma or quantum mechanically derived electronic charges on the chelating atoms did not improve the correlation coefficient significantly. The method was extended to model the separation efficiencies of mercaptobenzothiazoles (MBT) and aminothiophenols (ATP) used in the flotation of lead and zinc ores, respectively. Five principal components were found to explain 99% of the data variability in each series. A three-parameter equation with correlation coefficient of 0.985 and a two-parameter equation with correlation coefficient of 0.926 were obtained for MBT and ATP, respectively. The amenability of separation efficiencies of chelating collectors to QSAR modeling using PCs based on topological indices might lead to the selection of collectors for synthesis and testing from a virtual database.
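A small sketch of the PCA-then-regression workflow on simulated descriptor data with scikit-learn; the collector counts, index construction and the choice to regress on the three leading components (rather than a stepwise selection) are all assumptions made for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(8)
n_collectors, n_indices = 25, 40                        # hypothetical sizes
topo = rng.normal(size=(n_collectors, n_indices))       # correlated topological indices
topo[:, 10:] = topo[:, :10] @ rng.normal(size=(10, 30)) + 0.1 * topo[:, 10:]
es = 60 + 5 * topo[:, 0] - 3 * topo[:, 3] + rng.normal(0, 2, n_collectors)  # separation efficiency

pca = PCA(n_components=7)                               # data reduction step
pcs = pca.fit_transform(topo)
print("variance explained by 7 PCs:", pca.explained_variance_ratio_.sum())
model = LinearRegression().fit(pcs[:, :3], es)          # regression on the leading PCs
print("R^2 with 3 PCs:", model.score(pcs[:, :3], es))
```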
An Empirical Study of Eight Nonparametric Tests in Hierarchical Regression.
ERIC Educational Resources Information Center
Harwell, Michael; Serlin, Ronald C.
When normality does not hold, nonparametric tests represent an important data-analytic alternative to parametric tests. However, the use of nonparametric tests in educational research has been limited by the absence of easily performed tests for complex experimental designs and analyses, such as factorial designs and multiple regression analyses,…
How Many Subjects Does It Take to Do a Regression Analysis?
ERIC Educational Resources Information Center
Green, Samuel B.
1991-01-01
An evaluation of the rules-of-thumb used to determine the minimum number of subjects required to conduct multiple regression analyses suggests that researchers who use a rule of thumb rather than power analyses trade simplicity of use for accuracy and specificity of response. Insufficient power is likely to result. (SLD)
Tenero, David; Green, Justin A; Goyal, Navin
2015-10-01
Tafenoquine (TQ), a new 8-aminoquinoline with activity against all stages of the Plasmodium vivax life cycle, is being developed for the radical cure of acute P. vivax malaria in combination with chloroquine. The efficacy and exposure data from a pivotal phase 2b dose-ranging study were used to conduct exposure-response analyses for TQ after administration to subjects with P. vivax malaria. TQ exposure (i.e., area under the concentration-time curve [AUC]) and region (Thailand compared to Peru and Brazil) were found to be statistically significant predictors of clinical response based on multivariate logistic regression analyses. After accounting for region/country, the odds of being relapse free at 6 months increased by approximately 51% (95% confidence intervals [CI], 25%, 82%) for each 25-U increase in AUC above the median value of 54.5 μg · h/ml. TQ exposure was also a significant predictor of the time to relapse of the infection. The final parametric, time-to-event model for the time to relapse, included a Weibull distribution hazard function, AUC, and country as covariates. Based on the model, the risk of relapse decreased by 30% (95% CI, 17% to 42%) for every 25-U increase in AUC. Monte Carlo simulations indicated that the 300-mg dose of TQ would provide an AUC greater than the clinically relevant breakpoint obtained in a classification and regression tree (CART) analysis (56.4 μg · h/ml) in more than 90% of subjects and consequently result in a high probability of being relapse free at 6 months. This model-based approach was critical in selecting an appropriate phase 3 dose. (This study has been registered at ClinicalTrials.gov under registration no. NCT01376167.). Copyright © 2015, American Society for Microbiology. All Rights Reserved.
Hydrology and trout populations of cold-water rivers of Michigan and Wisconsin
Hendrickson, G.E.; Knutilla, R.L.
1974-01-01
Statistical multiple-regression analyses showed significant relationships between trout populations and hydrologic parameters. Parameters showing the highest levels of significance were temperature, hardness of water, percentage of gravel bottom, percentage of bottom vegetation, variability of streamflow, and discharge per unit drainage area. Trout populations increase with lower annual maximum water temperatures, with increasing water hardness, and with increasing percentages of gravel bottom and bottom vegetation. Trout populations also increase with decreasing variability of streamflow and with increasing discharge per unit drainage area. Most hydrologic parameters were significant when evaluated collectively, but no parameter, by itself, showed a high degree of correlation with trout populations in regression analyses that included all the streams sampled. Regression analyses of stream segments that were restricted to certain limits of hardness, temperature, or percentage of gravel bottom showed improvements in correlation. Analyses of trout populations (in pounds per acre and in pounds per mile) against hydrologic parameters resulted in regression equations from which trout populations could be estimated with standard errors of 89 and 84 percent, respectively.
Redick, Thomas S; Shipstead, Zach; Meier, Matthew E; Montroy, Janelle J; Hicks, Kenny L; Unsworth, Nash; Kane, Michael J; Hambrick, D Zachary; Engle, Randall W
2016-11-01
Previous research has identified several cognitive abilities that are important for multitasking, but few studies have attempted to measure a general multitasking ability using a diverse set of multitasks. In the final dataset, 534 young adult subjects completed measures of working memory (WM), attention control, fluid intelligence, and multitasking. Correlations, hierarchical regression analyses, confirmatory factor analyses, structural equation models, and relative weight analyses revealed several key findings. First, although the complex tasks used to assess multitasking differed greatly in their task characteristics and demands, a coherent construct specific to multitasking ability was identified. Second, the cognitive ability predictors accounted for substantial variance in the general multitasking construct, with WM and fluid intelligence accounting for the most multitasking variance compared to attention control. Third, the magnitude of the relationships among the cognitive abilities and multitasking varied as a function of the complexity and structure of the various multitasks assessed. Finally, structural equation models based on a multifaceted model of WM indicated that attention control and capacity fully mediated the WM and multitasking relationship. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Kim, Minjung; Lamont, Andrea E.; Jaki, Thomas; Feaster, Daniel; Howe, George; Van Horn, M. Lee
2015-01-01
Regression mixture models are a novel approach for modeling heterogeneous effects of predictors on an outcome. In the model building process residual variances are often disregarded and simplifying assumptions made without thorough examination of the consequences. This simulation study investigated the impact of an equality constraint on the residual variances across latent classes. We examine the consequence of constraining the residual variances on class enumeration (finding the true number of latent classes) and parameter estimates under a number of different simulation conditions meant to reflect the type of heterogeneity likely to exist in applied analyses. Results showed that bias in class enumeration increased as the difference in residual variances between the classes increased. Also, an inappropriate equality constraint on the residual variances greatly impacted estimated class sizes and showed the potential to greatly impact parameter estimates in each class. Results suggest that it is important to make assumptions about residual variances with care and to carefully report what assumptions were made. PMID:26139512
The Persistence of the Gender Gap in Introductory Physics
NASA Astrophysics Data System (ADS)
Kost, Lauren E.; Pollock, Steven J.; Finkelstein, Noah D.
2008-10-01
We previously showed [1] that despite teaching with interactive engagement techniques, the gap in performance between males and females on conceptual learning surveys persisted from pre- to posttest at our institution. Such findings were counter to previously published work [2]. Our current work analyzes factors that may influence the observed gender gap in our courses. Posttest conceptual assessment data are modeled using both multiple regression and logistic regression analyses to estimate the gender gap in posttest scores after controlling for background factors that vary by gender. We find that at our institution the gender gap persists in interactive physics classes, but is largely due to differences in physics and math preparation and incoming attitudes and beliefs.
An analysis of first-time blood donors return behaviour using regression models.
Kheiri, S; Alibeigi, Z
2015-08-01
Blood products have a vital role in saving many patients' lives. The aim of this study was to analyse blood donor return behaviour. Using a cross-sectional follow-up design of 5-year duration, 864 first-time donors who had donated blood were selected using systematic sampling. Donor behaviour was analysed via three response variables (return to donation, frequency of return to donation, and the time interval between donations) using logistic regression, negative binomial regression and Cox's shared frailty model for recurrent events, respectively. The rate of successful return to donation was 49·1%, and the deferral rate was 13·3%. There was a significant inverse relationship between the frequency of return to donation and the time interval between donations. Sex, body weight and job had an effect on return to donation; weight and frequency of donation during the first year had a direct effect on the total frequency of donations. Age, weight and job had a significant effect on the time intervals between donations. Aging decreases the chances of return to donation and increases the time interval between donations. Body weight affects all three response variables, i.e. the higher the weight, the greater the chances of return to donation and the shorter the time interval between donations. There is a positive correlation between the frequency of donations in the first year and the total number of returns to donation. Also, the shorter the time interval between donations, the higher the frequency of donations. © 2015 British Blood Transfusion Society.
Social capital, political trust, and health locus of control: a population-based study.
Lindström, Martin
2011-02-01
To investigate the association between political trust in the Riksdag and lack of belief in the possibility of influencing one's own health (external locus of control), taking horizontal trust into account. The 2008 public health survey in Skåne is a cross-sectional postal questionnaire study with a 55% participation rate. A random sample of 28,198 persons aged 18-80 years participated. Logistic regression models were used to investigate the associations between political trust in the Riksdag (an aspect of vertical trust) and lack of belief in the possibility of influencing one's own health (external locus of control). The multiple regression analyses included age, country of birth, education, and horizontal trust in other people. Overall, 33.7% of men and 31.8% of women lacked an internal locus of control. Low (external) health locus of control is more common in higher age groups, among people born outside Sweden, and among those with lower education, low horizontal trust, low political trust, and no opinion concerning political trust. Respondents with not particularly strong political trust, no political trust at all, or no opinion have significantly higher odds ratios of external locus of control throughout the multiple regression analyses. Low political trust in the Riksdag seems to be independently associated with external health locus of control.
Henry, Stephen G.; Jerant, Anthony; Iosif, Ana-Maria; Feldman, Mitchell D.; Cipri, Camille; Kravitz, Richard L.
2015-01-01
Objective: To identify factors associated with participant consent to record visits and to estimate effects of recording on patient-clinician interactions. Methods: Secondary analysis of data from a randomized trial studying communication about depression; participants were asked for optional consent to audio record study visits. Multiple logistic regression was used to model likelihood of patient and clinician consent. Multivariable regression and propensity score analyses were used to estimate effects of audio recording on 6 dependent variables: discussion of depressive symptoms, preventive health, and depression diagnosis; depression treatment recommendations; visit length; and visit difficulty. Results: Of 867 visits involving 135 primary care clinicians, 39% were recorded. For clinicians, only working in academic settings (P=0.003) and having worked longer at their current practice (P=0.02) were associated with increased likelihood of consent. For patients, white race (P=0.002) and diabetes (P=0.03) were associated with increased likelihood of consent. Neither multivariable regression nor propensity score analyses revealed any significant effects of recording on the variables examined. Conclusion: Few clinician or patient characteristics were significantly associated with consent. Audio recording had no significant effect on any dependent variables. Practice Implications: Benefits of recording clinic visits likely outweigh the risks of bias in this setting. PMID:25837372
Visscher, Corine M; van Wesemael-Suijkerbuijk, Erin A; Lobbezoo, Frank
2016-10-01
The aim of this study was to explore the association between the presence of comorbidities and the pain experience in individual patients with temporomandibular disorder (TMD). This clinical trial comprised 112 patients with TMD pain. For all participants the presence of the following comorbid factors was assessed: pain in the neck; somatization; impaired sleep; and depression. Pain experience was evaluated using the McGill Pain Questionnaire (MPQ). For each subject the TMD-pain experience was assessed for three dimensions - sensory, affective, and evaluative - as specified in the MPQ. The association between comorbid factors and these three dimensions of TMD-pain experience was then evaluated using linear regression models. Univariable regression analyses showed that all comorbid factors, except for one factor, were positively associated with the level of pain, as rated by the sensory description of pain, the affective component of pain, and the evaluative experience of pain. The multivariable regression analyses showed that for all MPQ dimensions, depression showed the strongest associations with pain experience. It was found that in the presence of comorbid disorders, patients with TMD experience elevated levels of TMD pain. This information should be taken into consideration in the diagnostic process, as well as in the choice of treatment. © 2016 Eur J Oral Sci.
Nguyen, Quynh C; Osypuk, Theresa L; Schmidt, Nicole M; Glymour, M Maria; Tchetgen Tchetgen, Eric J
2015-03-01
Despite the recent flourishing of mediation analysis techniques, many modern approaches are difficult to implement or applicable to only a restricted range of regression models. This report provides practical guidance for implementing a new technique utilizing inverse odds ratio weighting (IORW) to estimate natural direct and indirect effects for mediation analyses. IORW takes advantage of the odds ratio's invariance property and condenses information on the odds ratio for the relationship between the exposure (treatment) and multiple mediators, conditional on covariates, by regressing exposure on mediators and covariates. The inverse of the covariate-adjusted exposure-mediator odds ratio association is used to weight the primary analytical regression of the outcome on treatment. The treatment coefficient in such a weighted regression estimates the natural direct effect of treatment on the outcome, and indirect effects are identified by subtracting direct effects from total effects. Weighting renders treatment and mediators independent, thereby deactivating indirect pathways of the mediators. This new mediation technique accommodates multiple discrete or continuous mediators. IORW is easily implemented and is appropriate for any standard regression model, including quantile regression and survival analysis. An empirical example is given using data from the Moving to Opportunity (1994-2002) experiment, testing whether neighborhood context mediated the effects of a housing voucher program on obesity. Relevant Stata code (StataCorp LP, College Station, Texas) is provided. © The Author 2015. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
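The weighting recipe described above can be sketched in a few regression calls. The following Python example (simulated data; the names treat, mediator, covariate and outcome are placeholders, not the Moving to Opportunity variables) illustrates the IORW steps: regress exposure on the mediator and covariates, weight exposed observations by the inverse of their fitted exposure odds, and read the natural direct effect off the weighted outcome regression.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hedged sketch of inverse odds (ratio) weighting for mediation, following the
# logic described above. Simulated data; the weight used here is the unstabilised
# inverse of the fitted exposure odds for exposed observations.
rng = np.random.default_rng(1)
n = 2000
df = pd.DataFrame({"covariate": rng.normal(size=n)})
df["treat"] = rng.binomial(1, 0.5, size=n)
df["mediator"] = 0.5 * df["treat"] + 0.3 * df["covariate"] + rng.normal(size=n)
df["outcome"] = 0.4 * df["treat"] + 0.6 * df["mediator"] + rng.normal(size=n)

# Step 1: logistic regression of exposure on the mediator(s) and covariates.
exp_model = smf.logit("treat ~ mediator + covariate", data=df).fit(disp=0)

# Step 2: exposed observations get weight 1 / fitted odds of exposure; unexposed get 1.
p = exp_model.predict(df)
df["w"] = np.where(df["treat"] == 1, (1 - p) / p, 1.0)

# Step 3: weighted outcome regression; the treatment coefficient estimates the
# natural direct effect (NDE). The total effect comes from the unweighted model,
# and the natural indirect effect (NIE) is total minus direct.
nde = smf.wls("outcome ~ treat + covariate", data=df, weights=df["w"]).fit().params["treat"]
total = smf.ols("outcome ~ treat + covariate", data=df).fit().params["treat"]
print(f"NDE = {nde:.3f}, NIE = {total - nde:.3f}, total = {total:.3f}")
```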
A site specific model and analysis of the neutral somatic mutation rate in whole-genome cancer data.
Bertl, Johanna; Guo, Qianyun; Juul, Malene; Besenbacher, Søren; Nielsen, Morten Muhlig; Hornshøj, Henrik; Pedersen, Jakob Skou; Hobolth, Asger
2018-04-19
Detailed modelling of the neutral mutational process in cancer cells is crucial for identifying driver mutations and understanding the mutational mechanisms that act during cancer development. The neutral mutational process is very complex: whole-genome analyses have revealed that the mutation rate differs between cancer types, between patients and along the genome depending on the genetic and epigenetic context. Therefore, methods that predict the number of different types of mutations in regions or specific genomic elements must consider local genomic explanatory variables. A major drawback of most methods is the need to average the explanatory variables across the entire region or genomic element. This procedure is particularly problematic if the explanatory variable varies dramatically in the element under consideration. To take into account the fine scale of the explanatory variables, we model the probabilities of different types of mutations for each position in the genome by multinomial logistic regression. We analyse 505 cancer genomes from 14 different cancer types and compare the performance in predicting mutation rate for both regional based models and site-specific models. We show that for 1000 randomly selected genomic positions, the site-specific model predicts the mutation rate much better than regional based models. We use a forward selection procedure to identify the most important explanatory variables. The procedure identifies site-specific conservation (phyloP), replication timing, and expression level as the best predictors for the mutation rate. Finally, our model confirms and quantifies certain well-known mutational signatures. We find that our site-specific multinomial regression model outperforms the regional based models. The possibility of including genomic variables on different scales and patient specific variables makes it a versatile framework for studying different mutational mechanisms. Our model can serve as the neutral null model for the mutational process; regions that deviate from the null model are candidates for elements that drive cancer development.
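A bare-bones version of the site-specific idea, fitting a multinomial logistic regression of per-position mutation type on local explanatory variables, might look like the following Python sketch; the simulated features (conservation, replication timing, expression) and mutation classes are placeholders, not the study's data or covariate set.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Bare-bones sketch of a site-specific multinomial model: every genomic position
# contributes one row of local explanatory variables, and the response is the
# mutation type at that position (0 = no mutation, 1-3 = mutation classes).
# Simulated data; the feature names and class probabilities are assumptions.
rng = np.random.default_rng(7)
n_sites = 5000
X = np.column_stack([
    rng.normal(size=n_sites),    # conservation score (e.g. phyloP)
    rng.uniform(size=n_sites),   # replication timing
    rng.normal(size=n_sites),    # expression level
])
y = rng.choice(4, size=n_sites, p=[0.97, 0.01, 0.01, 0.01])

model = LogisticRegression(max_iter=1000)      # lbfgs fits a multinomial model for >2 classes
model.fit(X, y)
print(np.round(model.predict_proba(X[:5]), 4)) # per-site probabilities of each mutation type
```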
Ross, Elsie Gyang; Shah, Nigam H; Dalman, Ronald L; Nead, Kevin T; Cooke, John P; Leeper, Nicholas J
2016-11-01
A key aspect of the precision medicine effort is the development of informatics tools that can analyze and interpret "big data" sets in an automated and adaptive fashion while providing accurate and actionable clinical information. The aims of this study were to develop machine learning algorithms for the identification of disease and the prognostication of mortality risk and to determine whether such models perform better than classical statistical analyses. Focusing on peripheral artery disease (PAD), patient data were derived from a prospective, observational study of 1755 patients who presented for elective coronary angiography. We employed multiple supervised machine learning algorithms and used diverse clinical, demographic, imaging, and genomic information in a hypothesis-free manner to build models that could identify patients with PAD and predict future mortality. Comparison was made to standard stepwise logistic regression models. Our machine-learned models outperformed stepwise logistic regression models both for the identification of patients with PAD (area under the curve, 0.87 vs 0.76, respectively; P = .03) and for the prediction of future mortality (area under the curve, 0.76 vs 0.65, respectively; P = .10). Both machine-learned models were markedly better calibrated than the stepwise logistic regression models, thus providing more accurate disease and mortality risk estimates. Machine learning approaches can produce more accurate disease classification and prediction models. These tools may prove clinically useful for the automated identification of patients with highly morbid diseases for which aggressive risk factor management can improve outcomes. Copyright © 2016 Society for Vascular Surgery. Published by Elsevier Inc. All rights reserved.
How is the weather? Forecasting inpatient glycemic control
Saulnier, George E; Castro, Janna C; Cook, Curtiss B; Thompson, Bithika M
2017-01-01
Aim: Apply methods of damped trend analysis to forecast inpatient glycemic control. Method: Observed and calculated point-of-care blood glucose data trends were determined over 62 weeks. Mean absolute percent error was used to calculate differences between observed and forecasted values. Comparisons were drawn between model results and linear regression forecasting. Results: The forecasted mean glucose trends observed during the first 24 and 48 weeks of projections compared favorably to the results provided by linear regression forecasting. However, in some scenarios, the damped trend method changed inferences compared with linear regression. In all scenarios, mean absolute percent error values remained below the 10% accepted by demand industries. Conclusion: Results indicate that forecasting methods historically applied within demand industries can project future inpatient glycemic control. Additional study is needed to determine if forecasting is useful in the analyses of other glucometric parameters and, if so, how to apply the techniques to quality improvement. PMID:29134125
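For readers unfamiliar with damped-trend forecasting, the sketch below shows one common implementation (damped additive-trend exponential smoothing via statsmodels) applied to a simulated weekly glucose series, with MAPE computed on held-out weeks; the series, the 48/14-week split and the model settings are illustrative assumptions, not the study's data or method code.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Hedged sketch of damped-trend exponential smoothing applied to a simulated weekly
# mean glucose series, with mean absolute percent error (MAPE) computed on held-out
# weeks. The 62-week length mirrors the study period, but the values, the 48/14
# train/test split and the additive damped trend are illustrative assumptions.
rng = np.random.default_rng(3)
glucose = pd.Series(180 - 0.3 * np.arange(62) + rng.normal(0, 4, size=62))

train, test = glucose.iloc[:48], glucose.iloc[48:]
fit = ExponentialSmoothing(train, trend="add", damped_trend=True).fit()
forecast = fit.forecast(len(test))

mape = np.mean(np.abs((test.values - forecast.values) / test.values)) * 100
print(f"MAPE over the 14 hold-out weeks: {mape:.1f}%")
```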
Traffic-related air pollution and spectacles use in schoolchildren
Nieuwenhuijsen, Mark J.; Basagaña, Xavier; Alvarez-Pedrerol, Mar; Dalmau-Bueno, Albert; Cirach, Marta; Rivas, Ioar; Brunekreef, Bert; Querol, Xavier; Morgan, Ian G.; Sunyer, Jordi
2017-01-01
Purpose: To investigate the association between exposure to traffic-related air pollution and use of spectacles (as a surrogate measure for myopia) in schoolchildren. Methods: We analyzed the impact of exposure to NO2 and PM2.5 light absorbance at home (predicted by land-use regression models) and exposure to NO2 and black carbon (BC) at school (measured by monitoring campaigns) on the use of spectacles in a cohort of 2727 schoolchildren (7–10 years old) in Barcelona (2012–2015). We conducted cross-sectional analyses based on lifelong exposure to air pollution and prevalent cases of spectacles use at the baseline data collection campaign, as well as longitudinal analyses based on incident cases of spectacles use and exposure to air pollution during the three-year period between the baseline and last data collection campaigns. Logistic regression models were developed to quantify the association between spectacles use and each of the air pollutants, adjusted for relevant covariates. Results: An interquartile range increase in exposure to NO2 and PM2.5 absorbance at home was associated with odds ratios (95% confidence intervals (CIs)) for spectacles use of 1.16 (1.03, 1.29) and 1.13 (0.99, 1.28), respectively, in cross-sectional analyses and 1.15 (1.00, 1.33) and 1.23 (1.03, 1.46) in longitudinal analyses. Similarly, the odds ratios (95% CIs) of spectacles use associated with an interquartile range increase in exposure to NO2 and black carbon at school were 1.32 (1.09, 1.59) and 1.13 (0.97, 1.32), respectively, in cross-sectional analyses and 1.12 (0.84, 1.50) and 1.27 (1.03, 1.56) in longitudinal analyses. These findings were robust to a range of sensitivity analyses that we conducted. Conclusion: We observed increased risk of spectacles use associated with exposure to traffic-related air pollution. These findings require further confirmation by future studies applying more refined outcome measures, such as quantified visual acuity, and separating different types of refractive errors. PMID:28369072
Gallucci, Andrew; Martin, Ryan; Beaujean, Alex; Usdan, Stuart
2015-01-01
The misuse of prescription stimulants (MPS) is an emergent adverse health behavior among undergraduate college students. However, current research on MPS is largely atheoretical. The purpose of this study was to validate a survey to assess MPS-related theory of planned behavior (TPB) constructs (i.e. attitudes, subjective norms, and perceived behavioral control) and determine the relationship between these constructs, MPS-related risk factors (e.g. gender and class status), and current MPS (i.e. past 30 days use) among college students. Participants (N = 978, 67.8% female and 82.9% Caucasian) at a large public university in the southeastern USA completed a survey assessing MPS and MPS-related TPB constructs during fall 2010. To examine the relationship between MPS-related TPB constructs and current MPS, we conducted (1) confirmatory factor analyses to validate that our survey items assessed MPS-related TPB constructs and (2) a series of regression analyses to examine associations between MPS-related TPB constructs, potential MPS-related risk factors, and MPS in this sample. Our factor analyses indicated that the survey items assessed MPS-related TPB constructs and our multivariate logistic regression analysis indicated that perceived behavioral control was significantly associated with current MPS. In addition, analyses found that having a prescription stimulant was a protective factor against MPS when the model included MPS-related TPB variables.
Meta-regression analysis of the effect of trans fatty acids on low-density lipoprotein cholesterol.
Allen, Bruce C; Vincent, Melissa J; Liska, DeAnn; Haber, Lynne T
2016-12-01
We conducted a meta-regression of controlled clinical trial data to investigate quantitatively the relationship between dietary intake of industrial trans fatty acids (iTFA) and increased low-density lipoprotein cholesterol (LDL-C). Previous regression analyses included insufficient data to determine the nature of the dose response in the low-dose region and have nonetheless assumed a linear relationship between iTFA intake and LDL-C levels. This work contributes to the previous work by 1) including additional studies examining low-dose intake (identified using an evidence mapping procedure); 2) investigating a range of curve shapes, including both linear and nonlinear models; and 3) using Bayesian meta-regression to combine results across trials. We found that, contrary to previous assumptions, the linear model does not acceptably fit the data, while the nonlinear, S-shaped Hill model fits the data well. Based on a conservative estimate of the degree of intra-individual variability in LDL-C (0.1 mmol/L), as an estimate of a change in LDL-C that is not adverse, a change in iTFA intake of 2.2% of energy intake (%en) (corresponding to a total iTFA intake of 2.2-2.9%en) does not cause adverse effects on LDL-C. The iTFA intake associated with this change in LDL-C is substantially higher than the average iTFA intake (0.5%en). Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.
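The Hill curve referred to above has the form E(d) = Emax · d^h / (EC50^h + d^h). As a hedged illustration only (the paper's analysis was a Bayesian meta-regression on trial-level data), the following Python sketch fits that functional form to made-up dose-response points by ordinary least squares.

```python
import numpy as np
from scipy.optimize import curve_fit

# Least-squares fit of an S-shaped Hill dose-response curve to made-up points
# relating iTFA intake (% of energy) to the change in LDL-C (mmol/L). This is an
# illustration of the curve family only; the study itself used Bayesian
# meta-regression of trial-level data.
def hill(dose, emax, ec50, h):
    return emax * dose**h / (ec50**h + dose**h)

dose = np.array([0.5, 1.0, 2.0, 3.0, 4.0, 6.0, 8.0])                 # synthetic intakes, %en
delta_ldl = np.array([0.01, 0.03, 0.08, 0.18, 0.28, 0.38, 0.41])     # synthetic LDL-C changes

params, _ = curve_fit(hill, dose, delta_ldl, p0=[0.45, 3.0, 2.0], maxfev=10000)
print(dict(zip(["Emax", "EC50", "Hill coefficient"], np.round(params, 3))))
```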
Emery, Clifton R; Jolley, Jennifer M; Wu, Shali
2011-12-01
This paper examined the relationship between reported Intimate Partner Violence (IPV) desistance and neighborhood concentrated disadvantage, ethnic heterogeneity, residential instability, collective efficacy and legal cynicism. Data from the Project on Human Development in Chicago Neighborhoods (PHDCN) Longitudinal survey were used to identify 599 cases of IPV in Wave 1 eligible for reported desistance in Wave 2. A Generalized Boosting Model was used to determine the best proximal predictors of IPV desistance from the longitudinal data. Controlling for these predictors, logistic regression of neighborhood characteristics from the PHDCN community survey was used to predict reported IPV desistance in Wave 2. The paper finds that participants living in neighborhoods high in legal cynicism have lower odds of reporting IPV desistance, controlling for other variables in the logistic regression model. Analyses did not find that IPV desistance was related to neighborhood concentrated disadvantage, ethnic heterogeneity, residential instability and collective efficacy.
Self-efficacy and physical activity in adolescent and parent dyads.
Rutkowski, Elaine M; Connelly, Cynthia D
2012-01-01
The study examined the relationships between self-efficacy and physical activity in adolescent and parent dyads. A cross-sectional, correlational design was used to explore the relationships among levels of parent physical activity, parent-adolescent self-efficacy, and adolescent physical activity. Descriptive and multivariate regression analyses were conducted in a purposive sample of 94 adolescent/parent dyads. Regression results indicated the overall model significantly predicted adolescent physical activity (R² = .20, adjusted R² = .14, F[5, 70] = 3.28, p = .01). Only one of the five predictor variables significantly contributed to the model. Higher levels of adolescent self-efficacy were positively related to greater levels of adolescent physical activity (β = .29, p = .01). Practitioners are encouraged to examine the level of self-efficacy and physical activity in families in an effort to develop strategies that impact these areas and ultimately to mediate obesity-related challenges in families seeking care. © 2011, Wiley Periodicals, Inc.
Explaining cross-national differences in marriage, cohabitation, and divorce in Europe, 1990-2000.
Kalmijn, Matthijs
2007-11-01
European countries differ considerably in their marriage patterns. The study presented in this paper describes these differences for the 1990s and attempts to explain them from a macro-level perspective. We find that different indicators of marriage (i.e., marriage rate, age at marriage, divorce rate, and prevalence of unmarried cohabitation) cannot be seen as indicators of an underlying concept such as the 'strength of marriage'. Multivariate ordinary least squares (OLS) regression analyses are estimated with countries as units and panel regression models are estimated in which annual time series for multiple countries are pooled. Using these models, we find that popular explanations of trends in the indicators - explanations that focus on gender roles, secularization, unemployment, and educational expansion - are also important for understanding differences among countries. We also find evidence for the role of historical continuity and societal disintegration in understanding cross-national differences.
Sampling and sensitivity analyses tools (SaSAT) for computational modelling
Hoare, Alexander; Regan, David G; Wilson, David P
2008-01-01
SaSAT (Sampling and Sensitivity Analysis Tools) is a user-friendly software package for applying uncertainty and sensitivity analyses to mathematical and computational models of arbitrary complexity and context. The toolbox is built in Matlab®, a numerical mathematical software package, and utilises algorithms contained in the Matlab® Statistics Toolbox. However, Matlab® is not required to use SaSAT as the software package is provided as an executable file with all the necessary supplementary files. The SaSAT package is also designed to work seamlessly with Microsoft Excel but no functionality is forfeited if that software is not available. A comprehensive suite of tools is provided to enable the following tasks to be easily performed: efficient and equitable sampling of parameter space by various methodologies; calculation of correlation coefficients; regression analysis; factor prioritisation; and graphical output of results, including response surfaces, tornado plots, and scatterplots. Use of SaSAT is exemplified by application to a simple epidemic model. To our knowledge, a number of the methods available in SaSAT for performing sensitivity analyses have not previously been used in epidemiological modelling and their usefulness in this context is demonstrated. PMID:18304361
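SaSAT itself is a Matlab toolbox, but the core workflow it automates can be sketched in Python as follows: Latin hypercube sampling of the parameter space, one model evaluation per sample, and rank correlation coefficients as a simple sensitivity measure. The toy epidemic summary function and parameter ranges below are assumptions for illustration.

```python
import numpy as np
from scipy.stats import qmc, spearmanr

# A small Python analogue of the SaSAT workflow (the toolbox itself is Matlab):
# Latin hypercube sampling of parameter space, one model evaluation per sample,
# and rank correlation coefficients as a simple sensitivity measure. The toy
# epidemic summary function and parameter ranges are assumptions for illustration.
sampler = qmc.LatinHypercube(d=3, seed=0)
unit = sampler.random(n=500)
lower, upper = [0.1, 0.01, 0.001], [0.5, 0.2, 0.05]
params = qmc.scale(unit, lower, upper)        # columns: beta, gamma, initial prevalence

def peak_prevalence_proxy(beta, gamma, i0):
    # crude stand-in for an SIR-type model output
    r0 = beta / gamma
    return 0.0 if r0 <= 1 else i0 + (1 - 1 / r0) * 0.5

outputs = np.array([peak_prevalence_proxy(*p) for p in params])

for name, column in zip(["beta", "gamma", "i0"], params.T):
    rho, _ = spearmanr(column, outputs)
    print(f"{name}: Spearman rank correlation with output = {rho:+.2f}")
```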
Comparison of methods for estimating flood magnitudes on small streams in Georgia
Hess, Glen W.; Price, McGlone
1989-01-01
The U.S. Geological Survey has collected flood data for small, natural streams at many sites throughout Georgia during the past 20 years. Flood-frequency relations were developed for these data using four methods: (1) observed (log-Pearson Type III analysis) data, (2) rainfall-runoff model, (3) regional regression equations, and (4) map-model combination. The results of the latter three methods were compared to the analyses of the observed data in order to quantify the differences in the methods and determine if the differences are statistically significant.
Stability of Early EEG Background Patterns After Pediatric Cardiac Arrest.
Abend, Nicholas S; Xiao, Rui; Kessler, Sudha Kilaru; Topjian, Alexis A
2018-05-01
We aimed to determine whether EEG background characteristics remain stable across discrete time periods during the acute period after resuscitation from pediatric cardiac arrest. Children resuscitated from cardiac arrest underwent continuous conventional EEG monitoring. The EEG was scored in 12-hour epochs for up to 72 hours after return of circulation by an electroencephalographer using a Background Category with 4 levels (normal, slow-disorganized, discontinuous/burst-suppression, or attenuated-featureless) or 2 levels (normal/slow-disorganized or discontinuous/burst-suppression/attenuated-featureless). Survival analyses and mixed-effects ordinal logistic regression models evaluated whether the EEG remained stable across epochs. EEG monitoring was performed in 89 consecutive children. When EEG was assessed as the 4-level Background Category, 30% of subjects changed category over time. Based on initial Background Category, one quarter of the subjects changed EEG category by 24 hours if the initial EEG was attenuated-featureless, by 36 hours if the initial EEG was discontinuous or burst-suppression, by 48 hours if the initial EEG was slow-disorganized, and never if the initial EEG was normal. However, regression modeling for the 4-level Background Category indicated that the EEG did not change over time (odds ratio = 1.06, 95% confidence interval = 0.96-1.17, P = 0.26). Similarly, when EEG was assessed as the 2-level Background Category, 8% of subjects changed EEG category over time. However, regression modeling for the 2-level category indicated that the EEG did not change over time (odds ratio = 1.02, 95% confidence interval = 0.91-1.13, P = 0.75). The EEG Background Category changes over time whether analyzed as 4 levels (30% of subjects) or 2 levels (8% of subjects), although regression analyses indicated that no significant changes occurred over time for the full cohort. These data indicate that the Background Category is often stable during the acute 72 hours after pediatric cardiac arrest and thus may be a useful EEG assessment metric in future studies, but that some subjects do have EEG changes over time and therefore serial EEG assessments may be informative.
Logistic regression applied to natural hazards: rare event logistic regression with replications
NASA Astrophysics Data System (ADS)
Guns, M.; Vanacker, V.
2012-06-01
Statistical analysis of natural hazards needs particular attention, as most of these phenomena are rare events. This study shows that ordinary rare event logistic regression, as it is now commonly used in geomorphologic studies, does not always lead to a robust detection of controlling factors, as the results can be strongly sample-dependent. In this paper, we introduce some concepts of Monte Carlo simulation into rare event logistic regression. This technique, called rare event logistic regression with replications, combines the strengths of probabilistic and statistical methods and overcomes some of the limitations of previous developments through robust variable selection. The technique was developed here for the analysis of landslide controlling factors, but the concept is widely applicable to statistical analyses of natural hazards.
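One way to read "rare event logistic regression with replications" is as repeated re-sampling of the abundant non-event observations around the fixed set of rare events, with factor stability judged across replications. The Python sketch below implements that reading on simulated landslide-style data; the replication count, the balanced re-sampling scheme and the p < 0.05 criterion are illustrative assumptions rather than the authors' exact algorithm.

```python
import numpy as np
import statsmodels.api as sm

# Hedged sketch of rare event logistic regression with replications: the scarce
# event observations are kept in every replication while an equal-sized subset of
# the abundant non-event observations is drawn anew, a logistic model is fitted
# each time, and the stability of each candidate controlling factor is summarised
# across replications. Simulated data; settings are illustrative assumptions.
rng = np.random.default_rng(42)
n_events, n_nonevents, n_vars = 80, 5000, 4
X_event = rng.normal(size=(n_events, n_vars))
X_event[:, :2] += 0.8                       # the first two factors truly differ
X_nonevent = rng.normal(size=(n_nonevents, n_vars))

n_rep = 200
selected = np.zeros(n_vars)
for _ in range(n_rep):
    idx = rng.choice(n_nonevents, size=n_events, replace=False)
    X = sm.add_constant(np.vstack([X_event, X_nonevent[idx]]))
    y = np.r_[np.ones(n_events), np.zeros(n_events)]
    fit = sm.Logit(y, X).fit(disp=0)
    selected += (np.asarray(fit.pvalues)[1:] < 0.05)

print("selection frequency per candidate factor:", selected / n_rep)
```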
DOE Office of Scientific and Technical Information (OSTI.GOV)
Friddle, Carl J; Koga, Teiichiro; Rubin, Edward M.
2000-03-15
While cardiac hypertrophy has been the subject of intensive investigation, regression of hypertrophy has been significantly less studied, precluding large-scale analysis of the relationship between these processes. In the present study, using pharmacological models of hypertrophy in mice, expression profiling was performed with fragments of more than 3,000 genes to characterize and contrast expression changes during induction and regression of hypertrophy. Administration of angiotensin II and isoproterenol by osmotic minipump produced increases in heart weight (15% and 40%, respectively) that returned to pre-induction size following drug withdrawal. From multiple expression analyses of left ventricular RNA isolated at daily time-points during cardiac hypertrophy and regression, we identified sets of genes whose expression was altered at specific stages of this process. While confirming the participation of 25 genes or pathways previously known to be altered by hypertrophy, a larger set of 30 genes was identified whose expression had not previously been associated with cardiac hypertrophy or regression. Of the 55 genes that showed reproducible changes during the time course of induction and regression, 32 genes were altered only during induction and 8 were altered only during regression. This study identified both known and novel genes whose expression is affected at different stages of cardiac hypertrophy and regression and demonstrates that cardiac remodeling during regression utilizes a set of genes that are distinct from those used during induction of hypertrophy.
Asano, Junichi; Hirakawa, Akihiro
2017-01-01
The Cox proportional hazards cure model is a survival model incorporating a cure rate with the assumption that the population contains both uncured and cured individuals. It contains a logistic regression for the cure rate, and a Cox regression to estimate the hazard for uncured patients. A single predictive model for both the cure and hazard can be developed by using a cure model that simultaneously predicts the cure rate and hazards for uncured patients; however, model selection is a challenge because of the lack of a measure for quantifying the predictive accuracy of a cure model. Recently, we developed an area under the receiver operating characteristic curve (AUC) for determining the cure rate in a cure model (Asano et al., 2014), but the hazards measure for uncured patients was not resolved. In this article, we propose novel C-statistics that are weighted by the patients' cure status (i.e., cured, uncured, or censored cases) for the cure model. The operating characteristics of the proposed C-statistics and their confidence interval were examined by simulation analyses. We also illustrate methods for predictive model selection and for further interpretation of variables using the proposed AUCs and C-statistics via application to breast cancer data.
A Tutorial on Multilevel Survival Analysis: Methods, Models and Applications.
Austin, Peter C
2017-08-01
Data that have a multilevel structure occur frequently across a range of disciplines, including epidemiology, health services research, public health, education and sociology. We describe three families of regression models for the analysis of multilevel survival data. First, Cox proportional hazards models with mixed effects incorporate cluster-specific random effects that modify the baseline hazard function. Second, piecewise exponential survival models partition the duration of follow-up into mutually exclusive intervals and fit a model that assumes that the hazard function is constant within each interval. This is equivalent to a Poisson regression model that incorporates the duration of exposure within each interval. By incorporating cluster-specific random effects, generalised linear mixed models can be used to analyse these data. Third, after partitioning the duration of follow-up into mutually exclusive intervals, one can use discrete time survival models that use a complementary log-log generalised linear model to model the occurrence of the outcome of interest within each interval. Random effects can be incorporated to account for within-cluster homogeneity in outcomes. We illustrate the application of these methods using data consisting of patients hospitalised with a heart attack. We illustrate the application of these methods using three statistical programming languages (R, SAS and Stata).
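The second family above (piecewise exponential models fitted as Poisson regression with a log-exposure offset) can be illustrated with a short Python sketch; the simulated survival data, the quarterly cut points and the single covariate are assumptions, and a cluster-specific random effect (e.g. hospital) would be added with a mixed-effects extension.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Sketch of a piecewise exponential survival model fitted as a Poisson regression
# with log exposure time as an offset. The survival data, the quarterly cut points
# and the single covariate (age) are simulated assumptions.
rng = np.random.default_rng(5)
n = 500
age = rng.normal(65, 10, n)
time = rng.exponential(scale=np.exp(5 - 0.02 * (age - 65)), size=n)
event = (time < 365).astype(int)           # death observed within one year
time = np.minimum(time, 365)               # administrative censoring at 365 days

cuts = [0, 90, 180, 270, 365]              # follow-up split into quarterly intervals
rows = []
for t, d, a in zip(time, event, age):
    for k in range(len(cuts) - 1):
        start, stop = cuts[k], cuts[k + 1]
        if t <= start:
            break
        rows.append({"interval": k,
                     "exposure": min(t, stop) - start,   # time at risk in this interval
                     "died": int(d and t <= stop),       # event occurs in this interval
                     "age": a})
long_df = pd.DataFrame(rows)

# One rate per interval (piecewise-constant hazard) plus the covariate effect.
fit = smf.glm("died ~ C(interval) + age", data=long_df,
              family=sm.families.Poisson(),
              offset=np.log(long_df["exposure"])).fit()
print(fit.params)
```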
Lanfredi, Mariangela; Candini, Valentina; Buizza, Chiara; Ferrari, Clarissa; Boero, Maria E; Giobbio, Gian M; Goldschmidt, Nicoletta; Greppo, Stefania; Iozzino, Laura; Maggi, Paolo; Melegari, Anna; Pasqualetti, Patrizio; Rossi, Giuseppe; de Girolamo, Giovanni
2014-05-15
Quality of life (QOL) has been considered an important outcome measure in psychiatric research and determinants of QOL have been widely investigated. We aimed at detecting predictors of QOL at baseline and at testing the longitudinal interrelations of the baseline predictors with QOL scores at a 1-year follow-up in a sample of patients living in Residential Facilities (RFs). Logistic regression models were adopted to evaluate the association between WHOQoL-Bref scores and potential determinants of QOL. In addition, all variables significantly associated with QOL domains in the final logistic regression model were included by using the Structural Equation Modeling (SEM). We included 139 patients with a diagnosis of schizophrenia spectrum. In the final logistic regression model level of activity, social support, age, service satisfaction, spiritual well-being and symptoms' severity were identified as predictors of QOL scores at baseline. Longitudinal analyses carried out by SEM showed that 40% of QOL follow-up variability was explained by QOL at baseline, and significant indirect effects toward QOL at follow-up were found for satisfaction with services and for social support. Rehabilitation plans for people with schizophrenia living in RFs should also consider mediators of change in subjective QOL such as satisfaction with mental health services. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Newman, J; Egan, T; Harbourne, N; O'Riordan, D; Jacquier, J C; O'Sullivan, M
2014-08-01
Sensory evaluation can be problematic for ingredients with a bitter taste during the research and development phase of new food products. In this study, 19 dairy protein hydrolysates (DPH) were analysed with an electronic tongue and characterised physicochemically; the data obtained from these methods were correlated with bitterness intensity as scored by a trained sensory panel, and each model was assessed for its predictive capability. The physicochemical characteristics of the DPHs investigated were degree of hydrolysis (DH%) and data relating to peptide size and relative hydrophobicity from size exclusion chromatography (SEC) and reverse phase (RP) HPLC. Partial least squares (PLS) regression was used to construct the prediction models. All PLS regressions had good correlations (0.78 to 0.93), with the strongest being the combination of data obtained from SEC and RP HPLC. However, the PLS model with the strongest predictive power was based on the e-tongue, which had the lowest root mean predicted residual error sum of squares (PRESS) in the study. The results show that the PLS models constructed with the e-tongue and with the combination of SEC and RP-HPLC have potential to be used for prediction of bitterness, thus reducing the reliance on sensory analysis of DPHs in future food research. Copyright © 2014 Elsevier B.V. All rights reserved.
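A minimal sketch of the PLS-with-cross-validation workflow, assuming simulated stand-ins for the e-tongue signals and panel bitterness scores (the channel count and component number are guesses, not the study's settings):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict

# Minimal sketch of a PLS bitterness-prediction workflow: a block of predictors
# (standing in for e-tongue channels or chromatographic descriptors) is regressed
# on panel bitterness scores, and predictive ability is judged from cross-validated
# predictions (a PRESS-style criterion). The 19 samples echo the study, but the
# 12 channels, the data values and the 3 components are assumptions.
rng = np.random.default_rng(11)
n_samples, n_channels = 19, 12
X = rng.normal(size=(n_samples, n_channels))
bitterness = X[:, :3].sum(axis=1) + rng.normal(scale=0.5, size=n_samples)

pls = PLSRegression(n_components=3)
pred = cross_val_predict(pls, X, bitterness, cv=5).ravel()
press = np.sum((bitterness - pred) ** 2)          # predicted residual error sum of squares
corr = np.corrcoef(bitterness, pred)[0, 1]
print(f"cross-validated correlation = {corr:.2f}, PRESS = {press:.2f}")
```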
Wu, Robert; Glen, Peter; Ramsay, Tim; Martel, Guillaume
2014-06-28
Observational studies dominate the surgical literature. Statistical adjustment is an important strategy to account for confounders in observational studies. Research has shown that published articles are often poor in statistical quality, which may jeopardize their conclusions. The Statistical Analyses and Methods in the Published Literature (SAMPL) guidelines have been published to help establish standards for statistical reporting. This study will seek to determine whether the quality of statistical adjustment and the reporting of these methods are adequate in surgical observational studies. We hypothesize that incomplete reporting will be found in all surgical observational studies, and that the quality and reporting of these methods will be lower in surgical journals when compared with medical journals. Finally, this work will seek to identify predictors of high-quality reporting. This work will examine the top five general surgical and medical journals, based on a 5-year impact factor (2007-2012). All observational studies investigating an intervention related to an essential component area of general surgery (defined by the American Board of Surgery), with an exposure, outcome, and comparator, will be included in this systematic review. Essential elements related to statistical reporting and quality were extracted from the SAMPL guidelines and include domains such as intent of analysis, primary analysis, multiple comparisons, numbers and descriptive statistics, association and correlation analyses, linear regression, logistic regression, Cox proportional hazard analysis, analysis of variance, survival analysis, propensity analysis, and independent and correlated analyses. Each article will be scored as a proportion based on fulfilling criteria in relevant analyses used in the study. A logistic regression model will be built to identify variables associated with high-quality reporting. A comparison will be made between the scores of surgical observational studies published in medical versus surgical journals. Secondary outcomes will pertain to individual domains of analysis. Sensitivity analyses will be conducted. This study will explore the reporting and quality of statistical analyses in surgical observational studies published in the most referenced surgical and medical journals in 2013 and examine whether variables (including the type of journal) can predict high-quality reporting.
Family and school environmental predictors of sleep bruxism in children.
Rossi, Debora; Manfredini, Daniele
2013-01-01
To identify potential predictors of self-reported sleep bruxism (SB) within children's family and school environments. A total of 65 primary school children (55.4% males, mean age 9.3 ± 1.9 years) were administered a 10-item questionnaire investigating the prevalence of self-reported SB as well as nine family and school-related potential bruxism predictors. Regression analyses were performed to assess the correlation between the potential predictors and SB. A positive answer to the self-reported SB item was endorsed by 18.8% of subjects, with no sex differences. Multiple variable regression analysis identified a final model showing that having divorced parents and not falling asleep easily were the only two weak predictors of self-reported SB. The percentage of explained variance for SB by the final multiple regression model was 13.3% (Nagelkerke's R² = 0.133). While having a high specificity and a good negative predictive value, the model showed unacceptable sensitivity and positive predictive values. The resulting accuracy to predict the presence of self-reported SB was 73.8%. The present investigation suggested that, among family and school-related matters, having divorced parents and not falling asleep easily were two predictors, even if weak, of a child's self-report of SB.
Lin, Lixin; Wang, Yunjia; Teng, Jiyao; Wang, Xuchen
2016-02-01
Hyperspectral estimation of soil organic matter (SOM) in coal mining regions is an important tool for enhancing fertilization in soil restoration programs. The correlation-partial least squares regression (PLSR) method effectively solves the information loss problem of correlation-multiple linear stepwise regression, but results of the correlation analysis must be optimized to improve precision. This study considers the relationship between spectral reflectance and SOM based on spectral reflectance curves of soil samples collected from coal mining regions. Based on the major absorption troughs in the 400-1006 nm spectral range, PLSR analysis was performed using 289 independent bands of the second derivative (SDR) with three levels and measured SOM values. A wavelet-correlation-PLSR (W-C-PLSR) model was then constructed. By amplifying useful information that was previously obscured by noise, the W-C-PLSR model was optimal for estimating SOM content, with smaller prediction errors in both calibration (R² = 0.970, root mean square error (RMSEC) = 3.10, and mean relative error (MREC) = 8.75) and validation (RMSEV = 5.85 and MREV = 14.32) analyses, as compared with other models. Results indicate that W-C-PLSR has great potential to estimate SOM in coal mining regions.
Price, Weather, and 'Acreage Abandonment' in Western Great Plains Wheat Culture.
NASA Astrophysics Data System (ADS)
Michaels, Patrick J.
1983-07-01
Multivariate analyses of acreage abandonment patterns in the U.S. Great Plains winter wheat region indicate that the major mode of variation is an in-phase oscillation confined to the western half of the overall area, which is also the area with lowest average yields. This is one of the more agroclimatically marginal environments in the United States, with wide interannual fluctuations in both climate and profitability. We developed a multiple regression model to determine the relative roles of weather and expected price in the decision not to harvest. The overall model explained 77% of the spatial and temporal variation in abandonment. Of the non-spatial variation, 36.5% was explained by two simple transformations of climatic data from three monthly aggregates: September-October, November-February, and March-April. Price factors, expressed as indexed future delivery quotations, were barely significant, with only between 3 and 5% of the non-spatial variation explained, depending upon the model. The model was based upon weather, climate and price data from 1932 through 1975. It was tested by sequentially withholding three-year blocks of data, and using the respecified regression coefficients, along with observed weather and price, to estimate abandonment in the withheld years. Error analyses indicate no loss of model fidelity in the test mode. Also, prediction errors in the 1970-75 period, characterized by widely fluctuating prices, were not different from those in the rest of the model. The overall results suggest that the perceived quality of the crop, as influenced by weather, is a much more important determinant of the abandonment decision than are expected returns based upon price considerations.
Wagle, Jørgen; Farner, Lasse; Flekkøy, Kjell; Bruun Wyller, Torgeir; Sandvik, Leiv; Fure, Brynjar; Stensrød, Brynhild; Engedal, Knut
2011-01-01
To identify prognostic factors associated with functional outcome at 13 months in a sample of stroke rehabilitation patients. Specifically, we hypothesized that cognitive functioning early after stroke would predict long-term functional outcome independently of other factors. 163 stroke rehabilitation patients underwent a structured neuropsychological examination 2-3 weeks after hospital admittance, and their functional status was subsequently evaluated 13 months later with the modified Rankin Scale (mRS) as outcome measure. Three predictive models were built using linear regression analyses: a biological model (sociodemographics, apolipoprotein E genotype, prestroke vascular factors, lesion characteristics and neurological stroke-related impairment); a functional model (pre- and early post-stroke cognitive functioning, personal and instrumental activities of daily living, ADL, and depressive symptoms), and a combined model (including significant variables, with p value <0.05, from the biological and functional models). A combined model of 4 variables best predicted long-term functional outcome with explained variance of 49%: neurological impairment (National Institute of Health Stroke Scale; β = 0.402, p < 0.001), age (β = 0.233, p = 0.001), post-stroke cognitive functioning (Repeatable Battery of Neuropsychological Status, RBANS; β = -0.248, p = 0.001) and prestroke personal ADL (Barthel Index; β = -0.217, p = 0.002). Further linear regression analyses of which RBANS indexes and subtests best predicted long-term functional outcome showed that Coding (β = -0.484, p < 0.001) and Figure Copy (β = -0.233, p = 0.002) raw scores at baseline explained 42% of the variance in mRS scores at follow-up. Early post-stroke cognitive functioning as measured by the RBANS is a significant and independent predictor of long-term functional post-stroke outcome. Copyright © 2011 S. Karger AG, Basel.
Estimating Interaction Effects With Incomplete Predictor Variables
Enders, Craig K.; Baraldi, Amanda N.; Cham, Heining
2014-01-01
The existing missing data literature does not provide a clear prescription for estimating interaction effects with missing data, particularly when the interaction involves a pair of continuous variables. In this article, we describe maximum likelihood and multiple imputation procedures for this common analysis problem. We outline 3 latent variable model specifications for interaction analyses with missing data. These models apply procedures from the latent variable interaction literature to analyses with a single indicator per construct (e.g., a regression analysis with scale scores). We also discuss multiple imputation for interaction effects, emphasizing an approach that applies standard imputation procedures to the product of 2 raw score predictors. We thoroughly describe the process of probing interaction effects with maximum likelihood and multiple imputation. For both missing data handling techniques, we outline centering and transformation strategies that researchers can implement in popular software packages, and we use a series of real data analyses to illustrate these methods. Finally, we use computer simulations to evaluate the performance of the proposed techniques. PMID:24707955
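The "product as just another variable" imputation strategy described above can be sketched as follows in Python with statsmodels' MICE: compute the raw-score product before imputation, let it be imputed alongside its components, and then fit the analysis model with the interaction term. The simulated data, missingness rate and MICE settings are illustrative assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.imputation import mice

# Hedged sketch of the "impute the product" strategy: the raw-score product xz is
# computed before imputation and treated as just another variable in the imputation
# model, so imputed datasets preserve the interaction. Simulated data; the 25%
# missingness, burn-in and number of imputations are illustrative assumptions.
rng = np.random.default_rng(8)
n = 400
x = rng.normal(size=n)
z = rng.normal(size=n)
y = 0.3 * x + 0.2 * z + 0.4 * x * z + rng.normal(size=n)

df = pd.DataFrame({"y": y, "x": x, "z": z})
df.loc[rng.random(n) < 0.25, "x"] = np.nan        # impose missingness on x
df["xz"] = df["x"] * df["z"]                      # product term, imputed in its own right

imp = mice.MICEData(df)
fit = mice.MICE("y ~ x + z + xz", sm.OLS, imp).fit(n_burnin=10, n_imputations=20)
print(fit.summary())
```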
Holtschlag, David J.; Shively, Dawn; Whitman, Richard L.; Haack, Sheridan K.; Fogarty, Lisa R.
2008-01-01
Regression analyses and hydrodynamic modeling were used to identify environmental factors and flow paths associated with Escherichia coli (E. coli) concentrations at Memorial and Metropolitan Beaches on Lake St. Clair in Macomb County, Mich. Lake St. Clair is part of the binational waterway between the United States and Canada that connects Lake Huron with Lake Erie in the Great Lakes Basin. Linear regression, regression-tree, and logistic regression models were developed from E. coli concentration and ancillary environmental data. Linear regression models on log10 E. coli concentrations indicated that rainfall prior to sampling, water temperature, and turbidity were positively associated with bacteria concentrations at both beaches. Flow from Clinton River, changes in water levels, wind conditions, and log10 E. coli concentrations 2 days before or after the target bacteria concentrations were statistically significant at one or both beaches. In addition, various interaction terms were significant at Memorial Beach. Linear regression models for both beaches explained only about 30 percent of the variability in log10 E. coli concentrations. Regression-tree models were developed from data from both Memorial and Metropolitan Beaches but were found to have limited predictive capability in this study. The results indicate that too few observations were available to develop reliable regression-tree models. Linear logistic models were developed to estimate the probability of E. coli concentrations exceeding 300 most probable number (MPN) per 100 milliliters (mL). Rainfall amounts before bacteria sampling were positively associated with exceedance probabilities at both beaches. Flow of Clinton River, turbidity, and log10 E. coli concentrations measured before or after the target E. coli measurements were related to exceedances at one or both beaches. The linear logistic models were effective in estimating bacteria exceedances at both beaches. A receiver operating characteristic (ROC) analysis was used to determine cut points for maximizing the true positive rate prediction while minimizing the false positive rate. A two-dimensional hydrodynamic model was developed to simulate horizontal current patterns on Lake St. Clair in response to wind, flow, and water-level conditions at model boundaries. Simulated velocity fields were used to track hypothetical massless particles backward in time from the beaches along flow paths toward source areas. Reverse particle tracking for idealized steady-state conditions shows changes in expected flow paths and traveltimes with wind speeds and directions from 24 sectors. The results indicate that three to four sets of contiguous wind sectors have similar effects on flow paths in the vicinity of the beaches. In addition, reverse particle tracking was used for transient conditions to identify expected flow paths for 10 E. coli sampling events in 2004. These results demonstrate the ability to track hypothetical particles from the beaches, backward in time, to likely source areas. This ability, coupled with a greater frequency of bacteria sampling, may provide insight into changes in bacteria concentrations between source and sink areas.
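The two statistical steps described above, a logistic model for the probability of exceeding 300 MPN/100 mL and an ROC-based cut point that maximizes the true-positive rate while minimizing the false-positive rate, can be sketched as follows. This is an illustrative reconstruction only; the predictor names (rainfall, turbidity, river_flow) and all data are invented, not the study's variables.

```python
# Hedged sketch of the exceedance model plus ROC cut-point selection.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve

rng = np.random.default_rng(1)
n = 300
rainfall = rng.gamma(2.0, 5.0, n)
turbidity = rng.lognormal(1.0, 0.5, n)
river_flow = rng.lognormal(3.0, 0.4, n)
logit = -4.0 + 0.08 * rainfall + 0.3 * np.log10(turbidity) + 0.01 * river_flow
exceed = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))   # 1 = exceeds 300 MPN/100 mL

X = np.column_stack([rainfall, np.log10(turbidity), river_flow])
model = LogisticRegression().fit(X, exceed)
prob = model.predict_proba(X)[:, 1]

fpr, tpr, thresholds = roc_curve(exceed, prob)
best = np.argmax(tpr - fpr)        # Youden-style balance of TPR against FPR
print("threshold probability:", thresholds[best])
```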
ERIC Educational Resources Information Center
Wang, Tien-ni; Wu, Ching-yi; Chen, Chia-ling; Shieh, Jeng-yi; Lu, Lu; Lin, Keh-chung
2013-01-01
Given the growing evidence for the effects of constraint-induced therapy (CIT) in children with cerebral palsy (CP), there is a need for investigating the characteristics of potential participants who may benefit most from this intervention. This study aimed to establish predictive models for the effects of pediatric CIT on motor and functional…
The impact of alcohol taxation on liver cirrhosis mortality.
Ponicki, William R; Gruenewald, Paul J
2006-11-01
The objective of this study is to investigate the impact of distilled spirits, wine, and beer taxes on cirrhosis mortality using a large-panel data set and statistical models that control for various other factors that may affect that mortality. The analyses were performed on a panel of 30 U.S. license states during the period 1971-1998 (N = 840 state-by-year observations). Exogenous measures included current and lagged versions of beverage taxes and income, as well as controls for states' age distribution, religion, race, health care availability, urbanity, tourism, and local bans on alcohol sales. Regression analyses were performed using random-effects models with corrections for serial autocorrelation and heteroscedasticity among states. Cirrhosis rates were found to be significantly related to taxes on distilled spirits but not to taxation of wine and beer. Consistent results were found using different statistical models and model specifications. Consistent with prior research, cirrhosis mortality in the United States appears more closely linked to consumption of distilled spirits than to that of other alcoholic beverages.
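A minimal sketch of a state random-intercept panel model in this spirit is shown below; it is not the authors' implementation, all data are simulated, and the published corrections for serial autocorrelation and heteroscedasticity are omitted for brevity.

```python
# Illustrative random-effects (state random-intercept) panel regression of
# cirrhosis mortality on beverage taxes, using statsmodels.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
states, years = 30, 28
df = pd.DataFrame({
    "state": np.repeat(np.arange(states), years),
    "spirits_tax": rng.normal(5, 1, states * years),
    "wine_tax": rng.normal(1, 0.3, states * years),
    "beer_tax": rng.normal(0.5, 0.2, states * years),
    "income": rng.normal(40, 5, states * years),
})
state_effect = rng.normal(0, 2, states)[df["state"]]
df["cirrhosis_rate"] = 20 - 1.5 * df["spirits_tax"] + state_effect + rng.normal(0, 1, len(df))

model = smf.mixedlm("cirrhosis_rate ~ spirits_tax + wine_tax + beer_tax + income",
                    data=df, groups=df["state"]).fit()
print(model.summary())
```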
Fernández-Mayoralas, Gloria; Rojo-Pérez, Fermina; Martínez-Martín, Pablo; Prieto-Flores, Maria-Eugenia; Rodríguez-Blázquez, Carmen; Martín-García, Salomé; Rojo-Abuín, José-Manuel; Forjaz, Maria-Joao
2015-01-01
Active ageing, considered from the perspective of participation in leisure activities, promotes life satisfaction and personal well-being. The aims of this work are to define and explain leisure activity profiles among institutionalized older adults, considering their sociodemographic characteristics and objective and subjective conditions in relation to their quality of life. Two samples of institutionalized people aged 60 and over were analysed together: 234 older adults without dementia and 525 with dementia. Sociodemographic, economic, family and social network, and health and functioning variables were selected. Cluster analysis was applied to obtain activity profiles according to the leisure activities, and ordinal regression models were fitted to analyse factors associated with activity level. The sample was clustered into three groups: active (27%), moderately active (35%) and inactive (38%). In the final regression model (Nagelkerke pseudo R(2) = 0.500), a higher level of activity was associated with better cognitive function (Pfeiffer scale), self-perceived health status and functional ability, as well as with a higher frequency of gathering with family and friends, and higher educational level. The decline in physical and mental health, the loss of functional capabilities and the weakening of family and social ties represent a significant barrier to active ageing in a context of institutionalization.
Adair, T; Hoy, D; Dettrick, Z; Lopez, A D
2012-12-01
Global studies of the long-term association between tobacco consumption and chronic obstructive pulmonary disease (COPD) have relied upon descriptions of trends. To statistically analyse the relationship of tobacco consumption with data on mortality due to COPD over the past 100 years in Australia. Tobacco consumption was reconstructed back to 1887. Log-linear Poisson regression models were used to analyse cumulative cohort and lagged time-specific smoking data and its relationship with COPD mortality. Age-standardised COPD mortality, although likely misclassified with other diseases, decreased for males and females from 1907 until the start of the Second World War in contrast to steadily rising tobacco consumption. Thereafter, COPD mortality rose sharply in line with trends in smoking, peaking in the early 1970s for males and over 20 years later for females, before falling again. Regression models revealed both cumulative and time-specific tobacco consumption to be strongly predictive of COPD mortality, with a time lag of 15 years for males and 20 years for females. Sharp falls in COPD mortality before the Second World War were unrelated to tobacco consumption. Smoking was the primary driver of post-War trends, and the success of anti-smoking campaigns has sharply reduced COPD mortality levels.
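The lagged log-linear Poisson approach described here can be sketched as follows. This is not the study's series or code; the lag (15 years), the invented consumption and population figures, and the GLM offset for person-years are illustrative assumptions only.

```python
# Hedged sketch: Poisson regression of COPD deaths on tobacco consumption
# lagged by 15 years, with log population as an offset.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(3)
years = np.arange(1907, 2008)
tobacco = np.clip(np.linspace(1.0, 4.0, years.size) + rng.normal(0, 0.2, years.size), 0.1, None)

df = pd.DataFrame({"year": years, "tobacco": tobacco})
df["tobacco_lag15"] = df["tobacco"].shift(15)        # exposure 15 years earlier
df["pop"] = 1e6 * np.linspace(4, 20, years.size)     # person-years at risk
df = df.dropna()
rate = 5e-5 * np.exp(0.4 * df["tobacco_lag15"])
df["copd_deaths"] = rng.poisson(rate * df["pop"])

X = sm.add_constant(df[["tobacco_lag15"]])
fit = sm.GLM(df["copd_deaths"], X, family=sm.families.Poisson(),
             offset=np.log(df["pop"])).fit()
print(fit.summary())
```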
Association between adolescent marriage and marital violence among young adult women in India
Raj, Anita; Saggurti, Niranjan; Lawrence, Danielle; Balaiah, Donta; Silverman, Jay G.
2010-01-01
Objective To assess whether a history of adolescent marriage (<18 years) places women in young adulthood in India at increased risk of physical or sexual marital violence. Methods Cross-sectional analysis was performed on data from a nationally representative household study of 124 385 Indian women aged 15–49 years collected in 2005–2006. The analyses were restricted to married women aged 20–24 years who participated in the marital violence (MV) survey module (n=10 514). Simple regression models and models adjusted for participant demographics were constructed to estimate the odds ratios (ORs) and 95% confidence intervals (CIs) for the associations between adolescent marriage and MV. Results Over half (58%) of the participants were married before 18 years of age; 35% of the women had experienced physical or sexual violence in their marriage; and 27% reported such abuse in the last year. Adjusted regression analyses revealed that women married as minors were significantly more likely than those married as adults to report ever experiencing MV (adjusted OR 1.77; 95% CI, 1.61–1.95) and in the last 12 months (adjusted OR 1.51; 95% CI, 1.36–1.67). Conclusions Women who were married as adolescents remain at increased risk of MV into young adulthood. PMID:20347089
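The adjusted odds ratios and confidence intervals reported above come from standard logistic regression; a hedged sketch of that calculation on simulated data (with an invented covariate set, not the survey's demographic adjustments) looks like this:

```python
# Illustrative extraction of an adjusted OR and 95% CI from a logistic model.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(15)
n = 5000
child_marriage = rng.random(n) < 0.58
education_years = rng.integers(0, 15, n)
logit = -1.2 + 0.57 * child_marriage - 0.05 * education_years
violence = rng.random(n) < 1 / (1 + np.exp(-logit))

df = pd.DataFrame({"violence": violence.astype(int),
                   "child_marriage": child_marriage.astype(int),
                   "education_years": education_years})
fit = smf.logit("violence ~ child_marriage + education_years", df).fit(disp=0)
print("adjusted OR:", round(float(np.exp(fit.params["child_marriage"])), 2))
print("95% CI:", np.exp(fit.conf_int().loc["child_marriage"]).round(2).tolist())
```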
Integrative eQTL analysis of tumor and host omics data in individuals with bladder cancer.
Pineda, Silvia; Van Steen, Kristel; Malats, Núria
2017-09-01
Integrative analyses of several omics data are emerging. The data are usually generated from the same source material (i.e., tumor sample) representing one level of regulation. However, integrating different regulatory levels (i.e., blood) with those from tumor may also reveal important knowledge about the human genetic architecture. To model this multilevel structure, an integrative-expression quantitative trait loci (eQTL) analysis applying two-stage regression (2SR) was proposed. This approach first regressed tumor gene expression levels on tumor markers, and the adjusted residuals from that model were then regressed on the germline genotypes measured in blood. Previously, we demonstrated that penalized regression methods in combination with a permutation-based MaxT method (Global-LASSO) is a promising tool to fix some of the challenges that high-throughput omics data analysis imposes. Here, we assessed whether Global-LASSO can also be applied when tumor and blood omics data are integrated. We further compared our strategy with two 2SR-approaches, one using multiple linear regression (2SR-MLR) and other using LASSO (2SR-LASSO). We applied the three models to integrate genomic, epigenomic, and transcriptomic data from tumor tissue with blood germline genotypes from 181 individuals with bladder cancer included in the TCGA Consortium. Global-LASSO provided a larger list of eQTLs than the 2SR methods, identified a previously reported eQTL in prostate stem cell antigen (PSCA), and provided further clues on the complexity of the APOBEC3B locus, with a minimal false-positive rate not achieved by 2SR-MLR. It also represents an important contribution for omics integrative analysis because it is easy to apply and adaptable to any type of data. © 2017 WILEY PERIODICALS, INC.
Stopka, Thomas J; Goulart, Michael A; Meyers, David J; Hutcheson, Marga; Barton, Kerri; Onofrey, Shauna; Church, Daniel; Donahue, Ashley; Chui, Kenneth K H
2017-04-20
Hepatitis C virus (HCV) infections have increased during the past decade but little is known about geographic clustering patterns. We used a unique analytical approach, combining geographic information systems (GIS), spatial epidemiology, and statistical modeling to identify and characterize HCV hotspots, statistically significant clusters of census tracts with elevated HCV counts and rates. We compiled sociodemographic and HCV surveillance data (n = 99,780 cases) for Massachusetts census tracts (n = 1464) from 2002 to 2013. We used a five-step spatial epidemiological approach, calculating incremental spatial autocorrelations and Getis-Ord Gi* statistics to identify clusters. We conducted logistic regression analyses to determine factors associated with the HCV hotspots. We identified nine HCV clusters, with the largest in Boston, New Bedford/Fall River, Worcester, and Springfield (p < 0.05). In multivariable analyses, we found that HCV hotspots were independently and positively associated with the percent of the population that was Hispanic (adjusted odds ratio [AOR]: 1.07; 95% confidence interval [CI]: 1.04, 1.09) and the percent of households receiving food stamps (AOR: 1.83; 95% CI: 1.22, 2.74). HCV hotspots were independently and negatively associated with the percent of the population that were high school graduates or higher (AOR: 0.91; 95% CI: 0.89, 0.93) and the percent of the population in the "other" race/ethnicity category (AOR: 0.88; 95% CI: 0.85, 0.91). We identified locations where HCV clusters were a concern, and where enhanced HCV prevention, treatment, and care can help combat the HCV epidemic in Massachusetts. GIS, spatial epidemiological and statistical analyses provided a rigorous approach to identify hotspot clusters of disease, which can inform public health policy and intervention targeting. Further studies that incorporate spatiotemporal cluster analyses, Bayesian spatial and geostatistical models, spatially weighted regression analyses, and assessment of associations between HCV clustering and the built environment are needed to expand upon our combined spatial epidemiological and statistical methods.
Authoritative school climate and high school dropout rates.
Jia, Yuane; Konold, Timothy R; Cornell, Dewey
2016-06-01
This study tested the association between school-wide measures of an authoritative school climate and high school dropout rates in a statewide sample of 315 high schools. Regression models at the school level of analysis used teacher and student measures of disciplinary structure, student support, and academic expectations to predict overall high school dropout rates. Analyses controlled for school demographics of school enrollment size, percentage of low-income students, percentage of minority students, and urbanicity. Consistent with authoritative school climate theory, moderation analyses found that when students perceive their teachers as supportive, high academic expectations are associated with lower dropout rates. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Røislien, Jo; Lossius, Hans Morten; Kristiansen, Thomas
2015-01-01
Background Trauma is a leading global cause of death. Trauma mortality rates are higher in rural areas, constituting a challenge for quality and equality in trauma care. The aim of the study was to explore population density and transport time to hospital care as possible predictors of geographical differences in mortality rates, and to what extent choice of statistical method might affect the analytical results and accompanying clinical conclusions. Methods Using data from the Norwegian Cause of Death registry, deaths from external causes 1998–2007 were analysed. Norway consists of 434 municipalities, and municipality population density and travel time to hospital care were entered as predictors of municipality mortality rates in univariate and multiple regression models of increasing model complexity. We fitted linear regression models with continuous and categorised predictors, as well as piecewise linear and generalised additive models (GAMs). Models were compared using Akaike's information criterion (AIC). Results Population density was an independent predictor of trauma mortality rates, while the contribution of transport time to hospital care was highly dependent on choice of statistical model. A multiple GAM or piecewise linear model was superior, and similar, in terms of AIC. However, while transport time was statistically significant in multiple models with piecewise linear or categorised predictors, it was not in GAM or standard linear regression. Conclusions Population density is an independent predictor of trauma mortality rates. The added explanatory value of transport time to hospital care is marginal and model-dependent, highlighting the importance of exploring several statistical models when studying complex associations in observational data. PMID:25972600
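The core of the comparison above is fitting alternative functional forms for the same predictor and comparing them with AIC. A hedged sketch on synthetic data follows; the 60-minute breakpoint, the hinge specification, and all variables are illustrative assumptions rather than the study's models.

```python
# Sketch: linear versus piecewise-linear (hinge) model for mortality rate
# against transport time, compared by AIC.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 434
transport_time = rng.uniform(5, 240, n)                  # minutes to hospital care
pop_density = rng.lognormal(3, 1, n)
mortality = (60 - 4 * np.log(pop_density)
             + 0.05 * np.maximum(transport_time - 60, 0)  # effect only beyond 60 min
             + rng.normal(0, 3, n))
df = pd.DataFrame({"mortality": mortality, "time": transport_time,
                   "log_density": np.log(pop_density)})
df["time_hinge"] = np.maximum(df["time"] - 60, 0)

linear = smf.ols("mortality ~ log_density + time", df).fit()
piecewise = smf.ols("mortality ~ log_density + time + time_hinge", df).fit()
print("AIC linear:", round(linear.aic, 1), " AIC piecewise:", round(piecewise.aic, 1))
```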
Austin, Peter C; Reeves, Mathew J
2013-03-01
Hospital report cards, in which outcomes following the provision of medical or surgical care are compared across health care providers, are being published with increasing frequency. Essential to the production of these reports is risk-adjustment, which allows investigators to account for differences in the distribution of patient illness severity across different hospitals. Logistic regression models are frequently used for risk adjustment in hospital report cards. Many applied researchers use the c-statistic (equivalent to the area under the receiver operating characteristic curve) of the logistic regression model as a measure of the credibility and accuracy of hospital report cards. To determine the relationship between the c-statistic of a risk-adjustment model and the accuracy of hospital report cards. Monte Carlo simulations were used to examine this issue. We examined the influence of 3 factors on the accuracy of hospital report cards: the c-statistic of the logistic regression model used for risk adjustment, the number of hospitals, and the number of patients treated at each hospital. The parameters used to generate the simulated datasets came from analyses of patients hospitalized with a diagnosis of acute myocardial infarction in Ontario, Canada. The c-statistic of the risk-adjustment model had, at most, a very modest impact on the accuracy of hospital report cards, whereas the number of patients treated at each hospital had a much greater impact. The c-statistic of a risk-adjustment model should not be used to assess the accuracy of a hospital report card.
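A rough Monte Carlo sketch of this design is given below: simulate hospitals with true quality effects, risk-adjust with a logistic model, and check how well observed/expected mortality ratios recover the true hospital ranking. All parameters, the severity measure, and the rank-correlation accuracy metric are invented for illustration; they are not the paper's simulation settings.

```python
# Hedged sketch of the simulation logic: c-statistic of the risk model versus
# how well hospital O/E ratios track true hospital quality.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from scipy.stats import spearmanr

rng = np.random.default_rng(5)
n_hosp, n_per_hosp = 100, 500
hosp_effect = rng.normal(0, 0.3, n_hosp)              # true hospital quality
hosp = np.repeat(np.arange(n_hosp), n_per_hosp)
severity = rng.normal(0, 1, n_hosp * n_per_hosp)      # patient illness severity
beta = 1.0                                            # controls the c-statistic
logit = -2.0 + beta * severity + hosp_effect[hosp]
died = rng.random(hosp.size) < 1 / (1 + np.exp(-logit))

risk_model = LogisticRegression().fit(severity.reshape(-1, 1), died)
expected = risk_model.predict_proba(severity.reshape(-1, 1))[:, 1]
print("c-statistic:", round(roc_auc_score(died, expected), 3))

oe = np.array([died[hosp == h].sum() / expected[hosp == h].sum() for h in range(n_hosp)])
print("rank correlation with true quality:",
      round(spearmanr(oe, hosp_effect).correlation, 3))
```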
Changes in the timing of snowmelt and streamflow in Colorado: A response to recent warming
Clow, David W.
2010-01-01
Trends in the timing of snowmelt and associated runoff in Colorado were evaluated for the 1978-2007 water years using the regional Kendall test (RKT) on daily snow-water equivalent (SWE) data from snowpack telemetry (SNOTEL) sites and daily streamflow data from headwater streams. The RKT is a robust, nonparametric test that provides an increased power of trend detection by grouping data from multiple sites within a given geographic region. The RKT analyses indicated strong, pervasive trends in snowmelt and streamflow timing, which have shifted toward earlier in the year by a median of 2-3 weeks over the 29-yr study period. In contrast, relatively few statistically significant trends were detected using simple linear regression. RKT analyses also indicated that November-May air temperatures increased by a median of 0.9°C per decade, while 1 April SWE and maximum SWE declined by medians of 4.1 and 3.6 cm per decade, respectively. Multiple linear regression models were created, using monthly air temperatures, snowfall, latitude, and elevation as explanatory variables to identify major controlling factors on snowmelt timing. The models accounted for 45% of the variance in snowmelt onset, and 78% of the variance in the snowmelt center of mass (when half the snowpack had melted). Variations in springtime air temperature and SWE explained most of the interannual variability in snowmelt timing. Regression coefficients for air temperature were negative, indicating that warm temperatures promote early melt. Regression coefficients for SWE, latitude, and elevation were positive, indicating that abundant snowfall tends to delay snowmelt, and snowmelt tends to occur later at northern latitudes and high elevations. Results from this study indicate that even the mountains of Colorado, with their high elevations and cold snowpacks, are experiencing substantial shifts in the timing of snowmelt and snowmelt runoff toward earlier in the year.
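The multiple linear regression step described above can be sketched as follows; the predictors, coefficients, and synthetic data are illustrative only and will not reproduce the study's variance explained.

```python
# Illustrative multiple regression of snowmelt timing (day of year) on spring
# temperature, SWE, latitude, and elevation.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n = 300
df = pd.DataFrame({
    "spring_temp": rng.normal(2, 2, n),       # degrees C
    "max_swe": rng.normal(40, 10, n),         # cm
    "latitude": rng.uniform(37, 41, n),
    "elevation": rng.normal(3000, 300, n),    # metres
})
df["melt_doy"] = (150 - 3 * df["spring_temp"] + 0.4 * df["max_swe"]
                  + 2 * (df["latitude"] - 39) + 0.01 * df["elevation"]
                  + rng.normal(0, 5, n))

fit = smf.ols("melt_doy ~ spring_temp + max_swe + latitude + elevation", df).fit()
print(fit.params.round(3), "\nR-squared:", round(fit.rsquared, 2))
```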
Zimmerman, Tammy M.
2006-01-01
The Lake Erie shoreline in Pennsylvania spans nearly 40 miles and is a valuable recreational resource for Erie County. Nearly 7 miles of the Lake Erie shoreline lies within Presque Isle State Park in Erie, Pa. Concentrations of Escherichia coli (E. coli) bacteria at permitted Presque Isle beaches occasionally exceed the single-sample bathing-water standard, resulting in unsafe swimming conditions and closure of the beaches. E. coli concentrations and other water-quality and environmental data collected at Presque Isle Beach 2 during the 2004 and 2005 recreational seasons were used to develop models using tobit regression analyses to predict E. coli concentrations. All variables statistically related to E. coli concentrations were included in the initial regression analyses, and after several iterations, only those explanatory variables that made the models significantly better at predicting E. coli concentrations were included in the final models. Regression models were developed using data from 2004, 2005, and the combined 2-year dataset. Variables in the 2004 model and the combined 2004-2005 model were log10 turbidity, rain weight, wave height (calculated), and wind direction. Variables in the 2005 model were log10 turbidity and wind direction. Explanatory variables not included in the final models were water temperature, streamflow, wind speed, and current speed; model results indicated these variables did not meet significance criteria at the 95-percent confidence level (probabilities were greater than 0.05). The predicted E. coli concentrations produced by the models were used to develop probabilities that concentrations would exceed the single-sample bathing-water standard for E. coli of 235 colonies per 100 milliliters. Analysis of the exceedence probabilities helped determine a threshold probability for each model, chosen such that the correct number of exceedences and nonexceedences was maximized and the number of false positives and false negatives was minimized. Future samples with computed exceedence probabilities higher than the selected threshold probability, as determined by the model, will likely exceed the E. coli standard and a beach advisory or closing may need to be issued; computed exceedence probabilities lower than the threshold probability will likely indicate the standard will not be exceeded. Additional data collected each year can be used to test and possibly improve the model. This study will aid beach managers in more rapidly determining when waters are not safe for recreational use and, subsequently, when to issue beach advisories or closings.
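Tobit regression handles outcomes that are censored at a limit. The sketch below is a minimal left-censored tobit fitted by maximum likelihood with SciPy; it is not the USGS implementation, and the detection limit, single predictor, and simulated data are assumptions for illustration.

```python
# Hedged sketch: censored-normal (tobit) log-likelihood for log10 E. coli
# regressed on log10 turbidity, with values below a detection limit censored.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(7)
n = 200
turb = rng.lognormal(1.0, 0.6, n)
x = np.log10(turb)
y_true = 1.0 + 0.8 * x + rng.normal(0, 0.4, n)   # latent log10 concentration
limit = 1.0                                      # log10 detection limit
censored = y_true < limit
y = np.where(censored, limit, y_true)

def negloglik(theta):
    b0, b1, log_sigma = theta
    sigma = np.exp(log_sigma)
    mu = b0 + b1 * x
    ll_obs = norm.logpdf(y[~censored], mu[~censored], sigma)     # observed values
    ll_cens = norm.logcdf((limit - mu[censored]) / sigma)        # censored values
    return -(ll_obs.sum() + ll_cens.sum())

fit = minimize(negloglik, x0=[0.0, 0.0, 0.0], method="Nelder-Mead")
print("intercept, slope, sigma:", fit.x[0], fit.x[1], np.exp(fit.x[2]))
```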
Barnett, Adrian Gerard
2016-01-01
Objective Foodborne illnesses in Australia, including salmonellosis, are estimated to cost over $A1.25 billion annually. The weather has been identified as being influential on salmonellosis incidence, as cases increase during summer, however time series modelling of salmonellosis is challenging because outbreaks cause strong autocorrelation. This study assesses whether switching models is an improved method of estimating weather–salmonellosis associations. Design We analysed weather and salmonellosis in South-East Queensland between 2004 and 2013 using 2 common regression models and a switching model, each with 21-day lags for temperature and precipitation. Results The switching model best fit the data, as judged by its substantial improvement in deviance information criterion over the regression models, less autocorrelated residuals and control of seasonality. The switching model estimated a 5°C increase in mean temperature and 10 mm precipitation were associated with increases in salmonellosis cases of 45.4% (95% CrI 40.4%, 50.5%) and 24.1% (95% CrI 17.0%, 31.6%), respectively. Conclusions Switching models improve on traditional time series models in quantifying weather–salmonellosis associations. A better understanding of how temperature and precipitation influence salmonellosis may identify where interventions can be made to lower the health and economic costs of salmonellosis. PMID:26916693
NASA Astrophysics Data System (ADS)
Grotti, Marco; Abelmoschi, Maria Luisa; Soggia, Francesco; Tiberiade, Christian; Frache, Roberto
2000-12-01
The multivariate effects of Na, K, Mg and Ca as nitrates on the electrothermal atomisation of manganese, cadmium and iron were studied by multiple linear regression modelling. Since the models proved to efficiently predict the effects of the considered matrix elements in a wide range of concentrations, they were applied to correct the interferences occurring in the determination of trace elements in seawater after pre-concentration of the analytes. In order to obtain a statistically significant number of samples, a large volume of the certified seawater reference materials CASS-3 and NASS-3 was treated with Chelex-100 resin; then, the chelating resin was separated from the solution, divided into several sub-samples, each of them was eluted with nitric acid and analysed by electrothermal atomic absorption spectrometry (for trace element determinations) and inductively coupled plasma optical emission spectrometry (for matrix element determinations). To minimise any other systematic error besides that due to matrix effects, accuracy of the pre-concentration step and contamination levels of the procedure were checked by inductively coupled plasma mass spectrometric measurements. Analytical results obtained by applying the multiple linear regression models were compared with those obtained with other calibration methods, such as external calibration using acid-based standards, external calibration using matrix-matched standards and the analyte addition technique. Empirical models proved to efficiently reduce interferences occurring in the analysis of real samples, allowing an improvement of accuracy better than for other calibration methods.
Seligman, D A; Pullinger, A G
2000-01-01
Confusion about the relationship of occlusion to temporomandibular disorders (TMD) persists. This study attempted to identify occlusal and attrition factors plus age that would characterize asymptomatic normal female subjects. A total of 124 female patients with intracapsular TMD were compared with 47 asymptomatic female controls for associations to 9 occlusal factors, 3 attrition severity measures, and age using classification tree, multiple stepwise logistic regression, and univariate analyses. Models were tested for accuracy (sensitivity and specificity) and total contribution to the variance. The classification tree model had 4 terminal nodes that used only anterior attrition and age. "Normals" were mainly characterized by low attrition levels, whereas patients had higher attrition and tended to be younger. The tree model was only moderately useful (sensitivity 63%, specificity 94%) in predicting normals. The logistic regression model incorporated unilateral posterior crossbite and mediotrusive attrition severity in addition to the 2 factors in the tree, but was slightly less accurate than the tree (sensitivity 51%, specificity 90%). When only occlusal factors were considered in the analysis, normals were additionally characterized by a lack of anterior open bite, smaller overjet, and smaller RCP-ICP slides. The log likelihood accounted for was similar for both the tree (pseudo R(2) = 29.38%; mean deviance = 0.95) and the multiple logistic regression (Cox Snell R(2) = 30.3%, mean deviance = 0.84) models. The occlusal and attrition factors studied were only moderately useful in differentiating normals from TMD patients.
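The tree-versus-logistic comparison can be illustrated with a brief sketch; the two predictors, the fabricated data, and the in-sample sensitivity/specificity calculation are assumptions for demonstration only.

```python
# Hedged sketch: small classification tree versus logistic regression for
# separating patients from asymptomatic controls, scored by sensitivity and
# specificity.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(8)
n = 171
patient = np.concatenate([np.ones(124), np.zeros(47)]).astype(int)
anterior_attrition = rng.normal(1.5, 0.8, n) + 0.8 * patient
age = rng.normal(35, 10, n) - 3 * patient
X = np.column_stack([anterior_attrition, age])

def report(name, pred):
    tn, fp, fn, tp = confusion_matrix(patient, pred).ravel()
    print(name, "sensitivity:", round(tp / (tp + fn), 2),
          "specificity:", round(tn / (tn + fp), 2))

report("tree    ", DecisionTreeClassifier(max_leaf_nodes=4, random_state=0).fit(X, patient).predict(X))
report("logistic", LogisticRegression().fit(X, patient).predict(X))
```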
Leffondré, Karen; Abrahamowicz, Michal; Siemiatycki, Jack
2003-12-30
Case-control studies are typically analysed using the conventional logistic model, which does not directly account for changes in the covariate values over time. Yet, many exposures may vary over time. The most natural alternative to handle such exposures would be to use the Cox model with time-dependent covariates. However, its application to case-control data opens the question of how to manipulate the risk sets. Through a simulation study, we investigate how the accuracy of the estimates of Cox's model depends on the operational definition of risk sets and/or on some aspects of the time-varying exposure. We also assess the estimates obtained from conventional logistic regression. The lifetime experience of a hypothetical population is first generated, and a matched case-control study is then simulated from this population. We control the frequency, the age at initiation, and the total duration of exposure, as well as the strengths of their effects. All models considered include a fixed-in-time covariate and one or two time-dependent covariate(s): the indicator of current exposure and/or the exposure duration. Simulation results show that none of the models always performs well. The discrepancies between the odds ratios yielded by logistic regression and the 'true' hazard ratio depend on both the type of the covariate and the strength of its effect. In addition, it seems that logistic regression has difficulty separating the effects of inter-correlated time-dependent covariates. By contrast, each of the two versions of Cox's model systematically induces either a serious under-estimation or a moderate over-estimation bias. The magnitude of the latter bias is proportional to the true effect, suggesting that an improved manipulation of the risk sets may eliminate, or at least reduce, the bias. Copyright 2003 John Wiley & Sons, Ltd.
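To make the Cox alternative concrete, the sketch below fits a Cox model with a time-dependent exposure indicator on synthetic long-format data, using the lifelines package. This is not the authors' code; the exposure process, hazard ratio of 2, and 20-year censoring are invented assumptions.

```python
# Hedged sketch: Cox regression with a time-varying "currently exposed" covariate.
import numpy as np
import pandas as pd
from lifelines import CoxTimeVaryingFitter

rng = np.random.default_rng(9)
rows = []
for i in range(400):
    t_exp = rng.uniform(0, 10)                    # time exposure starts
    rate = 0.05
    t_event = rng.exponential(1 / rate)
    if t_event > t_exp:                           # exposure doubles the hazard
        t_event = t_exp + rng.exponential(1 / (2 * rate))
    t_end = min(t_event, 20.0)
    event = t_event <= 20.0
    if t_end <= t_exp:
        rows.append(dict(id=i, start=0.0, stop=t_end, exposed=0, event=event))
    else:
        rows.append(dict(id=i, start=0.0, stop=t_exp, exposed=0, event=False))
        rows.append(dict(id=i, start=t_exp, stop=t_end, exposed=1, event=event))

df = pd.DataFrame(rows)
ctv = CoxTimeVaryingFitter()
ctv.fit(df, id_col="id", event_col="event", start_col="start", stop_col="stop")
ctv.print_summary()
```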
Liu, Xinyang; Qin, Shukui; Wang, Zhichao; Xu, Jianming; Xiong, Jianping; Bai, Yuxian; Wang, Zhehai; Yang, Yan; Sun, Guoping; Wang, Liwei; Zheng, Leizhen; Xu, Nong; Cheng, Ying; Guo, Weijian; Yu, Hao; Liu, Tianshu; Lagiou, Pagona; Li, Jin
2017-09-05
Reliable biomarkers of apatinib response in gastric cancer (GC) are lacking. We investigated the association between early presence of common adverse events (AEs) and clinical outcomes in metastatic GC patients. We conducted a retrospective cohort study using data on 269 apatinib-treated GC patients in two clinical trials. AEs were assessed at baseline until 28 days after the last dose of apatinib. Clinical outcomes were compared between patients with and without hypertension (HTN), proteinuria, or hand and foot syndrome (HFS) in the first 4 weeks. Time-to-event variables were assessed using Kaplan-Meier methods and Cox proportional hazard regression models. Binary endpoints were assessed using logistic regression models. Landmark analyses were performed as sensitivity analyses. Predictive model was analyzed, and risk scores were calculated to predict overall survival. Presence of AEs in the first 4 weeks was associated with prolonged median overall survival (169 vs. 103 days, log-rank p = 0.0039; adjusted hazard ratio (HR) 0.64, 95% confidence interval [CI] 0.64-0.84, p = 0.001), prolonged median progression-free survival (86.5 vs. 62 days, log-rank p = 0.0309; adjusted HR 0.69, 95% CI 0.53-0.91, p = 0.007), and increased disease control rate (54.67 vs. 32.77%; adjusted odds ratio 2.67, p < 0.001). Results remained significant in landmark analyses. The onset of any single AE or any combinations of the AEs were all statistically significantly associated with prolonged OS, except for the presence of proteinuria. An AE-based prediction model and subsequently derived scoring system showed high calibration and discrimination in predicting overall survival. Presence of HTN, proteinuria, or HFS during the first cycle of apatinib treatment was a viable biomarker of antitumor efficacy in metastatic GC patients.
Using Dual Regression to Investigate Network Shape and Amplitude in Functional Connectivity Analyses
Nickerson, Lisa D.; Smith, Stephen M.; Öngür, Döst; Beckmann, Christian F.
2017-01-01
Independent Component Analysis (ICA) is one of the most popular techniques for the analysis of resting state FMRI data because it has several advantageous properties when compared with other techniques. Most notably, in contrast to a conventional seed-based correlation analysis, it is model-free and multivariate, thus switching the focus from evaluating the functional connectivity of single brain regions identified a priori to evaluating brain connectivity in terms of all brain resting state networks (RSNs) that simultaneously engage in oscillatory activity. Furthermore, typical seed-based analysis characterizes RSNs in terms of spatially distributed patterns of correlation (typically by means of simple Pearson's coefficients) and thereby confounds together amplitude information of oscillatory activity and noise. ICA and other regression techniques, on the other hand, retain magnitude information and therefore can be sensitive to both changes in the spatially distributed nature of correlations (differences in the spatial pattern or “shape”) as well as the amplitude of the network activity. Furthermore, motion can mimic amplitude effects so it is crucial to use a technique that retains such information to ensure that connectivity differences are accurately localized. In this work, we investigate the dual regression approach that is frequently applied with group ICA to assess group differences in resting state functional connectivity of brain networks. We show how ignoring amplitude effects and how excessive motion corrupts connectivity maps and results in spurious connectivity differences. We also show how to implement the dual regression to retain amplitude information and how to use dual regression outputs to identify potential motion effects. Two key findings are that using a technique that retains magnitude information, e.g., dual regression, and using strict motion criteria are crucial for controlling both network amplitude and motion-related amplitude effects, respectively, in resting state connectivity analyses. We illustrate these concepts using realistic simulated resting state FMRI data and in vivo data acquired in healthy subjects and patients with bipolar disorder and schizophrenia. PMID:28348512
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ho, Clifford Kuofei
Chemical transport through human skin can play a significant role in human exposure to toxic chemicals in the workplace, as well as to chemical/biological warfare agents in the battlefield. The viability of transdermal drug delivery also relies on chemical transport processes through the skin. Models of percutaneous absorption are needed for risk-based exposure assessments and drug-delivery analyses, but previous mechanistic models have been largely deterministic. A probabilistic, transient, three-phase model of percutaneous absorption of chemicals has been developed to assess the relative importance of uncertain parameters and processes that may be important to risk-based assessments. Penetration routes through the skin that were modeled include the following: (1) intercellular diffusion through the multiphase stratum corneum; (2) aqueous-phase diffusion through sweat ducts; and (3) oil-phase diffusion through hair follicles. Uncertainty distributions were developed for the model parameters, and a Monte Carlo analysis was performed to simulate probability distributions of mass fluxes through each of the routes. Sensitivity analyses using stepwise linear regression were also performed to identify model parameters that were most important to the simulated mass fluxes at different times. This probabilistic analysis of percutaneous absorption (PAPA) method has been developed to improve risk-based exposure assessments and transdermal drug-delivery analyses, where parameters and processes can be highly uncertain.
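A toy sketch of the probabilistic workflow follows: sample uncertain parameters, run a (here greatly simplified, single-membrane) flux model, and rank parameter importance by regression on the Monte Carlo sample. The report used stepwise linear regression; plain standardised OLS coefficients are shown for brevity, and all parameter distributions are invented.

```python
# Hedged sketch: Monte Carlo sampling plus regression-based sensitivity ranking.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(10)
n = 2000
params = pd.DataFrame({
    "diffusivity": rng.lognormal(np.log(1e-9), 0.5, n),    # m^2/s
    "thickness": rng.lognormal(np.log(2e-5), 0.2, n),      # m
    "partition_coef": rng.lognormal(0.0, 0.4, n),
    "concentration": rng.lognormal(np.log(10.0), 0.3, n),  # g/m^3
})
# Greatly simplified steady-state flux through one membrane: J = Kp * D * C / L
flux = (params["partition_coef"] * params["diffusivity"]
        * params["concentration"] / params["thickness"])

z = (params - params.mean()) / params.std()
zf = (np.log(flux) - np.log(flux).mean()) / np.log(flux).std()
fit = sm.OLS(zf, sm.add_constant(z)).fit()
print(fit.params.drop("const").abs().sort_values(ascending=False))
```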
Preliminary Survey on TRY Forest Traits and Growth Index Relations - New Challenges
NASA Astrophysics Data System (ADS)
Lyubenova, Mariyana; Kattge, Jens; van Bodegom, Peter; Chikalanov, Alexandre; Popova, Silvia; Zlateva, Plamena; Peteva, Simona
2016-04-01
Forest ecosystems provide critical ecosystem goods and services, including food, fodder, water, shelter, nutrient cycling, and cultural and recreational value. Forests also store carbon, provide habitat for a wide range of species and help alleviate land degradation and desertification. Thus they have a potentially significant role to play in climate change adaptation planning through maintaining ecosystem services and providing livelihood options. The study of forest traits is therefore an important issue not just for individual countries but for the planet as a whole. We need to know which functional relations between forest traits the TRY database can express, and how significant these will be for global modelling and IPBES. The study of biodiversity characteristics at all levels, and of the functional links between them, is extremely important for selecting key indicators to assess biodiversity and ecosystem services for sustainable natural capital control. By comparing the information available in the tree databases TRY, ITR (International Tree Ring) and SP-PAM, 42 tree species were selected for the trait analyses. The dependence between location characteristics (latitude, longitude, altitude, annual precipitation, annual temperature and soil type) and forest traits (specific leaf area, leaf weight ratio, wood density and growth index) was studied by multiple regression analyses (RDA) using the statistical software package Canoco 4.5. The Pearson correlation coefficient (a measure of linear correlation), the Kendall rank correlation coefficient (a non-parametric measure of statistical dependence) and the Spearman correlation coefficient (a measure of monotonic relationships between two variables) were calculated for each pair of variables (indexes) and species. Based on these correlation coefficients, one-dimensional linear regression models, multidimensional linear and nonlinear regression models, and multidimensional neural network models were built. The strongest dependence was obtained between It and WD. The research will support the work on the Strategic Plan for Biodiversity 2011-2020 and the modelling and implementation of ecosystem-based approaches to climate change adaptation and disaster risk reduction. Key words: Specific leaf area (SLA), Leaf weight ratio (LWR), Wood density (WD), Growth index (It)
Li, Shi; Batterman, Stuart; Wasilevich, Elizabeth; Wahl, Robert; Wirth, Julie; Su, Feng-Chiao; Mukherjee, Bhramar
2011-11-01
Asthma morbidity has been associated with ambient air pollutants in time-series and case-crossover studies. In such study designs, threshold effects of air pollutants on asthma outcomes have been relatively unexplored, which are of potential interest for exploring concentration-response relationships. This study analyzes daily data on the asthma morbidity experienced by the pediatric Medicaid population (ages 2-18 years) of Detroit, Michigan and concentrations of pollutants fine particles (PM2.5), CO, NO2 and SO2 for the 2004-2006 period, using both time-series and case-crossover designs. We use a simple, testable and readily implementable profile likelihood-based approach to estimate threshold parameters in both designs. Evidence of significant increases in daily acute asthma events was found for SO2 and PM2.5, and a significant threshold effect was estimated for PM2.5 at 13 and 11 μg m(-3) using generalized additive models and conditional logistic regression models, respectively. Stronger effect sizes above the threshold were typically noted compared to standard linear relationship, e.g., in the time series analysis, an interquartile range increase (9.2 μg m(-3)) in PM2.5 (5-day-moving average) had a risk ratio of 1.030 (95% CI: 1.001, 1.061) in the generalized additive models, and 1.066 (95% CI: 1.031, 1.102) in the threshold generalized additive models. The corresponding estimates for the case-crossover design were 1.039 (95% CI: 1.013, 1.066) in the conditional logistic regression, and 1.054 (95% CI: 1.023, 1.086) in the threshold conditional logistic regression. This study indicates that the associations of SO2 and PM2.5 concentrations with asthma emergency department visits and hospitalizations, as well as the estimated PM2.5 threshold were fairly consistent across time-series and case-crossover analyses, and suggests that effect estimates based on linear models (without thresholds) may underestimate the true risk. Copyright © 2011 Elsevier Inc. All rights reserved.
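The profile-likelihood threshold search described above can be sketched as follows: for each candidate threshold, fit a Poisson model with a hinge term max(PM2.5 - t, 0) and keep the threshold with the highest log-likelihood. The data, the true threshold of 12 µg/m³, and the coarse grid are assumptions for illustration, not the study's implementation.

```python
# Hedged sketch: profile likelihood over a grid of candidate PM2.5 thresholds.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
n = 1000
pm25 = rng.gamma(4, 4, n)                                  # daily PM2.5, ug/m3
true_t = 12.0
mu = np.exp(1.0 + 0.03 * np.maximum(pm25 - true_t, 0))
visits = rng.poisson(mu)                                   # daily asthma ED visits

def loglik_at(t):
    X = sm.add_constant(np.maximum(pm25 - t, 0))
    return sm.GLM(visits, X, family=sm.families.Poisson()).fit().llf

grid = np.arange(5, 25, 0.5)
profile = [loglik_at(t) for t in grid]
print("estimated threshold:", grid[int(np.argmax(profile))], "ug/m3")
```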
Validity of Treadmill-Derived Critical Speed on Predicting 5000-Meter Track-Running Performance.
Nimmerichter, Alfred; Novak, Nina; Triska, Christoph; Prinz, Bernhard; Breese, Brynmor C
2017-03-01
Nimmerichter, A, Novak, N, Triska, C, Prinz, B, and Breese, BC. Validity of treadmill-derived critical speed on predicting 5,000-meter track-running performance. J Strength Cond Res 31(3): 706-714, 2017-To evaluate 3 models of critical speed (CS) for the prediction of 5,000-m running performance, 16 trained athletes completed an incremental test on a treadmill to determine maximal aerobic speed (MAS) and 3 randomly ordered runs to exhaustion at the Δ70% intensity, at 110% and 98% of MAS. Critical speed and the distance covered above CS (D') were calculated using the hyperbolic speed-time (HYP), the linear distance-time (LIN), and the linear speed inverse-time model (INV). Five thousand meter performance was determined on a 400-m running track. Individual predictions of 5,000-m running time (t = [5,000-D']/CS) and speed (s = D'/t + CS) were calculated across the 3 models in addition to multiple regression analyses. Prediction accuracy was assessed with the standard error of estimate (SEE) from linear regression analysis and the mean difference expressed in units of measurement and coefficient of variation (%). Five thousand meter running performance (speed: 4.29 ± 0.39 m·s(-1); time: 1,176 ± 117 seconds) was significantly better than the predictions from all 3 models (p < 0.0001). The mean difference was 65-105 seconds (5.7-9.4%) for time and -0.22 to -0.34 m·s(-1) (-5.0 to -7.5%) for speed. Predictions from multiple regression analyses with CS and D' as predictor variables were not significantly different from actual running performance (-1.0 to 1.1%). The SEE across all models and predictions was approximately 65 seconds or 0.20 m·s(-1) and is therefore considered as moderate. The results of this study have shown the importance of aerobic and anaerobic energy system contribution to predict 5,000-m running performance. Using estimates of CS and D' is valuable for predicting performance over race distances of 5,000 m.
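A quick arithmetic illustration of the quoted prediction formulas, t = (5,000 - D')/CS and s = D'/t + CS, with invented values of CS and D' for one athlete (not data from the study):

```python
# Worked example of the CS/D' prediction formulas with illustrative inputs.
cs = 4.10          # critical speed, m/s (illustrative)
d_prime = 180.0    # distance reserve above CS, m (illustrative)

t_pred = (5000 - d_prime) / cs     # predicted 5,000-m time, s
s_pred = d_prime / t_pred + cs     # predicted mean speed, m/s
print(f"predicted time: {t_pred:.0f} s, predicted speed: {s_pred:.2f} m/s")
```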
Morfeld, Peter; Spallek, Michael
2015-01-01
Vermeulen et al. 2014 published a meta-regression analysis of three relevant epidemiological US studies (Steenland et al. 1998, Garshick et al. 2012, Silverman et al. 2012) that estimated the association between occupational diesel engine exhaust (DEE) exposure and lung cancer mortality. The DEE exposure was measured as cumulative exposure to estimated respirable elemental carbon in μg/m(3)-years. Vermeulen et al. 2014 found a statistically significant dose-response association and described elevated lung cancer risks even at very low exposures. We performed an extended re-analysis using different modelling approaches (fixed and random effects regression analyses, Greenland/Longnecker method) and explored the impact of varying input data (modified coefficients of Garshick et al. 2012, results from Crump et al. 2015 replacing Silverman et al. 2012, modified analysis of Moehner et al. 2013). We reproduced the individual and main meta-analytical results of Vermeulen et al. 2014. However, our analysis demonstrated a heterogeneity of the baseline relative risk levels between the three studies. This heterogeneity was reduced after the coefficients of Garshick et al. 2012 were modified while the dose coefficient dropped by an order of magnitude for this study and was far from being significant (P = 0.6). A (non-significant) threshold estimate for the cumulative DEE exposure was found at 150 μg/m(3)-years when extending the meta-analyses of the three studies by hockey-stick regression modelling (including the modified coefficients for Garshick et al. 2012). The data used by Vermeulen and colleagues led to the highest relative risk estimate across all sensitivity analyses performed. The lowest relative risk estimate was found after exclusion of the explorative study by Steenland et al. 1998 in a meta-regression analysis of Garshick et al. 2012 (modified), Silverman et al. 2012 (modified according to Crump et al. 2015) and Möhner et al. 2013. The meta-coefficient was estimated to be about 10-20 % of the main effect estimate in Vermeulen et al. 2014 in this analysis. The findings of Vermeulen et al. 2014 should not be used without reservations in any risk assessments. This is particularly true for the low end of the exposure scale.
Can Emotional and Behavioral Dysregulation in Youth Be Decoded from Functional Neuroimaging?
Portugal, Liana C L; Rosa, Maria João; Rao, Anil; Bebko, Genna; Bertocci, Michele A; Hinze, Amanda K; Bonar, Lisa; Almeida, Jorge R C; Perlman, Susan B; Versace, Amelia; Schirda, Claudiu; Travis, Michael; Gill, Mary Kay; Demeter, Christine; Diwadkar, Vaibhav A; Ciuffetelli, Gary; Rodriguez, Eric; Forbes, Erika E; Sunshine, Jeffrey L; Holland, Scott K; Kowatch, Robert A; Birmaher, Boris; Axelson, David; Horwitz, Sarah M; Arnold, Eugene L; Fristad, Mary A; Youngstrom, Eric A; Findling, Robert L; Pereira, Mirtes; Oliveira, Leticia; Phillips, Mary L; Mourao-Miranda, Janaina
2016-01-01
High comorbidity among pediatric disorders characterized by behavioral and emotional dysregulation poses problems for diagnosis and treatment, and suggests that these disorders may be better conceptualized as dimensions of abnormal behaviors. Furthermore, identifying neuroimaging biomarkers related to dimensional measures of behavior may provide targets to guide individualized treatment. We aimed to use functional neuroimaging and pattern regression techniques to determine whether patterns of brain activity could accurately decode individual-level severity on a dimensional scale measuring behavioural and emotional dysregulation at two different time points. A sample of fifty-seven youth (mean age: 14.5 years; 32 males) was selected from a multi-site study of youth with parent-reported behavioral and emotional dysregulation. Participants performed a block-design reward paradigm during functional Magnetic Resonance Imaging (fMRI). Pattern regression analyses consisted of Relevance Vector Regression (RVR) and two cross-validation strategies implemented in the Pattern Recognition for Neuroimaging toolbox (PRoNTo). Medication was treated as a binary confounding variable. Decoded and actual clinical scores were compared using Pearson's correlation coefficient (r) and mean squared error (MSE) to evaluate the models. Permutation test was applied to estimate significance levels. Relevance Vector Regression identified patterns of neural activity associated with symptoms of behavioral and emotional dysregulation at the initial study screen and close to the fMRI scanning session. The correlation and the mean squared error between actual and decoded symptoms were significant at the initial study screen and close to the fMRI scanning session. However, after controlling for potential medication effects, results remained significant only for decoding symptoms at the initial study screen. Neural regions with the highest contribution to the pattern regression model included cerebellum, sensory-motor and fronto-limbic areas. The combination of pattern regression models and neuroimaging can help to determine the severity of behavioral and emotional dysregulation in youth at different time points.
NASA Astrophysics Data System (ADS)
Erener, Arzu; Sivas, A. Abdullah; Selcuk-Kestel, A. Sevtap; Düzgün, H. Sebnem
2017-07-01
All quantitative landslide susceptibility mapping (QLSM) methods require two basic data types, namely a landslide inventory and factors that influence landslide occurrence (landslide influencing factors, LIF). Depending on the type of landslides, the nature of the triggers and the LIF, the accuracy of QLSM methods differs. Moreover, how to balance the number of 0s (non-occurrence) and 1s (occurrence) in the training set obtained from the landslide inventory, and how to select which of the 1s and 0s to include in QLSM models, play a critical role in the accuracy of the QLSM. Although the performance of various QLSM methods is largely investigated in the literature, the challenge of training set construction is not adequately investigated for the QLSM methods. In order to tackle this challenge, in this study three different training set selection strategies, along with the original data set, are used for testing the performance of three different regression methods, namely Logistic Regression (LR), Bayesian Logistic Regression (BLR) and Fuzzy Logistic Regression (FLR). The first sampling strategy is proportional random sampling (PRS), which takes into account a weighted selection of landslide occurrences in the sample set. The second method, namely non-selective nearby sampling (NNS), includes randomly selected sites and their surrounding neighboring points at certain preselected distances to include the impact of clustering. Selective nearby sampling (SNS) is the third method, which concentrates on the group of 1s and their surrounding neighborhood. A randomly selected group of landslide sites and their neighborhood are considered in the analyses similar to NNS parameters. It is found that the LR-PRS, FLR-PRS and BLR-whole-data set-ups, in that order, yield the best fits among the alternatives. The results indicate that in QLSM based on regression models, avoidance of spatial correlation in the data set is critical for the model's performance.
Sperm Retrieval in Patients with Klinefelter Syndrome: A Skewed Regression Model Analysis.
Chehrazi, Mohammad; Rahimiforoushani, Abbas; Sabbaghian, Marjan; Nourijelyani, Keramat; Sadighi Gilani, Mohammad Ali; Hoseini, Mostafa; Vesali, Samira; Yaseri, Mehdi; Alizadeh, Ahad; Mohammad, Kazem; Samani, Reza Omani
2017-01-01
The most common chromosomal abnormality underlying non-obstructive azoospermia (NOA) is Klinefelter syndrome (KS), which occurs in 1-1.72 out of 500-1000 male infants. The probability of retrieving sperm could be asymmetrically distributed between patients with and without KS, and therefore logistic regression analysis is not a well-qualified test for this type of data. This study was designed to evaluate skewed regression model analysis for data collected from microsurgical testicular sperm extraction (micro-TESE) among azoospermic patients with and without non-mosaic KS. This cohort study compared the micro-TESE outcome between 134 men with classic KS and 537 men with NOA and a normal karyotype who were referred to Royan Institute between 2009 and 2011. In addition to our main outcome, which was sperm retrieval, we also used logistic and skewed regression analyses to compare the following demographic and hormonal factors between the two groups: age and levels of follicle stimulating hormone (FSH), luteinizing hormone (LH) and testosterone. A comparison of micro-TESE between the KS and control groups showed a success rate of 28.4% (38/134) for the KS group and 22.2% (119/537) for the control group. In the KS group, a significant difference (P<0.001) existed between testosterone levels in the successful sperm retrieval group (3.4 ± 0.48 mg/mL) and the unsuccessful sperm retrieval group (2.33 ± 0.23 mg/mL). The quasi Akaike information criterion (QAIC) indicated a better fit for the skewed model (QAIC=74) than for logistic regression (QAIC=85). According to the results, skewed regression is more efficient in estimating sperm retrieval success when the data from patients with KS are analyzed. This finding should be investigated by conducting additional studies with different data structures.
ERIC Educational Resources Information Center
Shafiq, M. Najeeb
2013-01-01
Using quantile regression analyses, this study examines gender gaps in mathematics, science, and reading in Azerbaijan, Indonesia, Jordan, the Kyrgyz Republic, Qatar, Tunisia, and Turkey among 15-year-old students. The analyses show that girls in Azerbaijan achieve as well as boys in mathematics and science and overachieve in reading. In Jordan,…
Naya, Hugo; Urioste, Jorge I; Chang, Yu-Mei; Rodrigues-Motta, Mariana; Kremer, Roberto; Gianola, Daniel
2008-01-01
Dark spots in the fleece area are often associated with dark fibres in wool, which limits its competitiveness with other textile fibres. Field data from a sheep experiment in Uruguay revealed an excess number of zeros for dark spots. We compared the performance of four Poisson and zero-inflated Poisson (ZIP) models under four simulation scenarios. All models performed reasonably well under the same scenario for which the data were simulated. The deviance information criterion favoured a Poisson model with residual, while the ZIP model with a residual gave estimates closer to their true values under all simulation scenarios. Both Poisson and ZIP models with an error term at the regression level performed better than their counterparts without such an error. Field data from Corriedale sheep were analysed with Poisson and ZIP models with residuals. Parameter estimates were similar for both models. Although the posterior distribution of the sire variance was skewed due to a small number of rams in the dataset, the median of this variance suggested a scope for genetic selection. The main environmental factor was the age of the sheep at shearing. In summary, age related processes seem to drive the number of dark spots in this breed of sheep. PMID:18558072
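Although the study above fitted Bayesian hierarchical models, the basic contrast between a Poisson and a zero-inflated Poisson fit to counts with excess zeros can be sketched with statsmodels (ZeroInflatedPoisson is available in recent versions); the single covariate (age at shearing) and all data are invented for illustration.

```python
# Hedged sketch: Poisson versus zero-inflated Poisson fits compared by AIC.
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedPoisson

rng = np.random.default_rng(12)
n = 1000
age = rng.uniform(1, 8, n)
lam = np.exp(-0.5 + 0.3 * age)
spots = rng.poisson(lam)
spots[rng.random(n) < 0.4] = 0          # structural zeros on top of the Poisson counts

X = sm.add_constant(age)
poisson_fit = sm.Poisson(spots, X).fit(disp=0)
zip_fit = ZeroInflatedPoisson(spots, X, exog_infl=np.ones((n, 1))).fit(disp=0)
print("AIC Poisson:", round(poisson_fit.aic, 1), " AIC ZIP:", round(zip_fit.aic, 1))
```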
The use of modelling to evaluate and adapt strategies for animal disease control.
Saegerman, C; Porter, S R; Humblet, M F
2011-08-01
Disease is often associated with debilitating clinical signs, disorders or production losses in animals and/or humans, leading to severe socio-economic repercussions. This explains the high priority that national health authorities and international organisations give to selecting control strategies for and the eradication of specific diseases. When a control strategy is selected and implemented, an effective method of evaluating its efficacy is through modelling. To illustrate the usefulness of models in evaluating control strategies, the authors describe several examples in detail, including three examples of classification and regression tree modelling to evaluate and improve the early detection of disease: West Nile fever in equids, bovine spongiform encephalopathy (BSE) and multifactorial diseases, such as colony collapse disorder (CCD) in the United States. Also examined are regression modelling to evaluate skin test practices and the efficacy of an awareness campaign for bovine tuberculosis (bTB); mechanistic modelling to monitor the progress of a control strategy for BSE; and statistical nationwide modelling to analyse the spatio-temporal dynamics of bTB and search for potential risk factors that could be used to target surveillance measures more effectively. In the accurate application of models, an interdisciplinary rather than a multidisciplinary approach is required, with the fewest assumptions possible.
Zhang, Xiang; Faries, Douglas E; Boytsov, Natalie; Stamey, James D; Seaman, John W
2016-09-01
Observational studies are frequently used to assess the effectiveness of medical interventions in routine clinical practice. However, the use of observational data for comparative effectiveness is challenged by selection bias and the potential of unmeasured confounding. This is especially problematic for analyses using a health care administrative database, in which key clinical measures are often not available. This paper provides an approach to conducting sensitivity analyses to investigate the impact of unmeasured confounding in observational studies. In a real-world osteoporosis comparative effectiveness study, the bone mineral density (BMD) score, an important predictor of fracture risk and a factor in the selection of osteoporosis treatments, is unavailable in the database, and the lack of baseline BMD could potentially lead to significant selection bias. We implemented Bayesian twin-regression models, which simultaneously model both the observed outcome and the unobserved unmeasured confounder, using information from external sources. A sensitivity analysis was also conducted to assess the robustness of our conclusions to changes in such external data. The use of Bayesian modeling in this study suggests that the lack of baseline BMD did have a strong impact on the analysis, reversing the direction of the estimated effect (odds ratio of fracture incidence at 24 months: 0.40 vs. 1.36, with/without adjusting for unmeasured baseline BMD). The Bayesian twin-regression models provide a flexible sensitivity analysis tool to quantitatively assess the impact of unmeasured confounding in observational studies. Copyright © 2016 John Wiley & Sons, Ltd.
Separation in Logistic Regression: Causes, Consequences, and Control.
Mansournia, Mohammad Ali; Geroldinger, Angelika; Greenland, Sander; Heinze, Georg
2018-04-01
Separation is encountered in regression models with a discrete outcome (such as logistic regression) where the covariates perfectly predict the outcome. It is most frequent under the same conditions that lead to small-sample and sparse-data bias, such as presence of a rare outcome, rare exposures, highly correlated covariates, or covariates with strong effects. In theory, separation will produce infinite estimates for some coefficients. In practice, however, separation may be unnoticed or mishandled because of software limits in recognizing and handling the problem and in notifying the user. We discuss causes of separation in logistic regression and describe how common software packages deal with it. We then describe methods that remove separation, focusing on the same penalized-likelihood techniques used to address more general sparse-data problems. These methods improve accuracy, avoid software problems, and allow interpretation as Bayesian analyses with weakly informative priors. We discuss likelihood penalties, including some that can be implemented easily with any software package, and their relative advantages and disadvantages. We provide an illustration of ideas and methods using data from a case-control study of contraceptive practices and urinary tract infection.
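A minimal sketch of the phenomenon described above: a toy dataset with complete separation, where unpenalized maximum likelihood pushes the slope toward infinity, contrasted with a penalized fit. A ridge (L2) penalty is used here as a simple stand-in for the penalized-likelihood remedies discussed in the abstract; it is not Firth's correction itself.

```python
# Hedged sketch: complete separation vs an L2-penalized logistic fit.
import numpy as np
from sklearn.linear_model import LogisticRegression

x = np.array([[0.0], [0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])      # x >= 2 predicts y perfectly

# Nearly unpenalized fit: the slope grows very large (heading toward infinity)
loose = LogisticRegression(penalty="l2", C=1e6, max_iter=10000).fit(x, y)
# Penalized fit: the slope is pulled back to a finite, usable value
ridge = LogisticRegression(penalty="l2", C=1.0).fit(x, y)

print("near-MLE slope:", loose.coef_[0][0])
print("ridge slope   :", ridge.coef_[0][0])
```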
Spatial Autocorrelation Approaches to Testing Residuals from Least Squares Regression.
Chen, Yanguang
2016-01-01
In geo-statistics, the Durbin-Watson test is frequently employed to detect the presence of residual serial correlation from least squares regression analyses. However, the Durbin-Watson statistic is only suitable for ordered time or spatial series. If the variables comprise cross-sectional data coming from spatial random sampling, the test will be ineffectual because the value of the Durbin-Watson statistic depends on the sequence of data points. This paper develops two new statistics for testing serial correlation of residuals from least squares regression based on spatial samples. By analogy with the new form of Moran's index, an autocorrelation coefficient is defined with a standardized residual vector and a normalized spatial weight matrix. Then, by analogy with the Durbin-Watson statistic, two types of new serial correlation indices are constructed. As a case study, the two newly presented statistics are applied to a spatial sample of 29 Chinese regions. These results show that the new spatial autocorrelation models can be used to test the serial correlation of residuals from regression analysis. In practice, the new statistics can make up for the deficiencies of the Durbin-Watson test.
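A minimal sketch of the basic ingredient described above: a Moran-type autocorrelation coefficient computed from a standardized OLS residual vector and a row-normalized spatial weight matrix. The coordinates, weights, and regression are illustrative assumptions; the paper's two Durbin-Watson-like indices are not reproduced here.

```python
# Hedged sketch: Moran-type spatial autocorrelation of regression residuals.
import numpy as np

rng = np.random.default_rng(4)
n = 29                                   # e.g. a sample of 29 regions
coords = rng.uniform(0, 10, size=(n, 2))
x = rng.normal(size=n)
y = 1.0 + 0.8 * x + rng.normal(scale=0.5, size=n)

# OLS residuals, standardized
X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta
z = (e - e.mean()) / e.std()

# inverse-distance spatial weights, zero diagonal, row-normalized
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
W = np.where(d > 0, 1.0 / d, 0.0)
W = W / W.sum(axis=1, keepdims=True)

moran_like = (z @ W @ z) / (z @ z)       # near 0 when residuals show no spatial pattern
print("residual spatial autocorrelation:", round(moran_like, 3))
```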
Hwang, Bosun; Han, Jonghee; Choi, Jong Min; Park, Kwang Suk
2008-11-01
The purpose of this study was to develop an unobtrusive energy expenditure (EE) measurement system using an infrared (IR) sensor-based activity monitoring system to measure indoor activities and to estimate individual quantitative EE. IR-sensor activation counts were measured with a Bluetooth-based monitoring system and the standard EE was calculated using an established regression equation. Ten male subjects participated in the experiment and three different EE measurement systems (gas analyzer, accelerometer, IR sensor) were used simultaneously in order to determine the regression equation and evaluate the performance. As a standard measurement, oxygen consumption was simultaneously measured by a portable metabolic system (Metamax 3X, Cortex, Germany). A single room experiment was performed to develop a regression model of the standard EE measurement from the proposed IR sensor-based measurement system. In addition, correlation and regression analyses were done to compare the performance of the IR system with that of the Actigraph system. We determined that our proposed IR-based EE measurement system correlates with the standard measurement about as well as the Actigraph system does.
[The influence of perceived discrimination on health in migrants].
Igel, Ulrike; Brähler, Elmar; Grande, Gesine
2010-05-01
The aim of the study was to investigate the influence of racial discrimination on subjective health in migrants. The sample included 1,844 migrants from the SOEP. Discrimination was assessed by two items. Socioeconomic status, country of origin, and health behavior were included in multivariate regression models to control for effects on health. Differential models with regard to gender and origin were analysed. Migrants who experienced discrimination report a worse health status. Discrimination affects the mental and physical health of migrants. There are differences in the models due to gender and origin. In addition to socioeconomic factors, experienced discrimination should be taken into account as a psycho-social stressor for migrants.
NASA Astrophysics Data System (ADS)
Kutzbach, L.; Schneider, J.; Sachs, T.; Giebels, M.; Nykänen, H.; Shurpali, N. J.; Martikainen, P. J.; Alm, J.; Wilmking, M.
2007-07-01
Closed (non-steady state) chambers are widely used for quantifying carbon dioxide (CO2) fluxes between soils or low-stature canopies and the atmosphere. It is well recognised that covering a soil or vegetation by a closed chamber inherently disturbs the natural CO2 fluxes by altering the concentration gradients between the soil, the vegetation and the overlying air. Thus, the driving factors of CO2 fluxes are not constant during the closed chamber experiment, and no linear increase or decrease of CO2 concentration over time within the chamber headspace can be expected. Nevertheless, linear regression has been applied for calculating CO2 fluxes in many recent, partly influential, studies. This approach was justified by keeping the closure time short and assuming the concentration change over time to be in the linear range. Here, we test whether linear regression is really appropriate for estimating CO2 fluxes using closed chambers over short closure times and whether nonlinear regression is necessary. We developed a nonlinear exponential regression model from diffusion and photosynthesis theory. This exponential model was tested with four different datasets of CO2 flux measurements (total number: 1764) conducted at three peatland sites in Finland and a tundra site in Siberia. The flux measurements were performed using transparent chambers on vegetated surfaces and opaque chambers on bare peat surfaces. Thorough analyses of residuals demonstrated that linear regression was frequently not appropriate for the determination of CO2 fluxes by closed-chamber methods, even if closure times were kept short. The developed exponential model was well suited for nonlinear regression of the evolution of concentration over time, c(t), in the chamber headspace and estimation of the initial CO2 fluxes at closure time for the majority of experiments. CO2 flux estimates by linear regression can be as low as 40% of the flux estimates of exponential regression for closure times of only two minutes and even lower for longer closure times. The degree of underestimation increased with increasing CO2 flux strength and depended on soil and vegetation conditions, which can disturb not only the quantitative but also the qualitative evaluation of CO2 flux dynamics. The underestimation effect by linear regression was observed to be different for CO2 uptake and release situations, which can lead to stronger bias in the daily, seasonal and annual CO2 balances than in the individual fluxes. To avoid serious bias of CO2 flux estimates based on closed chamber experiments, we suggest further tests using published datasets and recommend the use of nonlinear regression models for future closed chamber studies.
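A minimal sketch of the general fitting idea described above: an exponential concentration model of the form c(t) = c_eq + (c0 - c_eq)·exp(-k·t) is fitted to chamber headspace data, and the initial slope dc/dt(0) = k·(c_eq - c0) is taken as the flux-relevant quantity, then compared with a naive linear slope. The numbers, chamber geometry, and unit conversions are illustrative assumptions, not the paper's exact model or data.

```python
# Hedged sketch: exponential vs linear regression for a closed-chamber CO2 record.
import numpy as np
from scipy.optimize import curve_fit

def c_model(t, c0, c_eq, k):
    return c_eq + (c0 - c_eq) * np.exp(-k * t)

rng = np.random.default_rng(5)
t = np.linspace(0, 120, 61)                          # 2-minute closure, seconds
true = c_model(t, c0=380.0, c_eq=520.0, k=0.015)     # ppm CO2, hypothetical
obs = true + rng.normal(scale=1.0, size=t.size)

popt, _ = curve_fit(c_model, t, obs, p0=[obs[0], obs[-1], 0.01])
c0_hat, c_eq_hat, k_hat = popt
initial_slope = k_hat * (c_eq_hat - c0_hat)          # ppm s^-1 at closure time

# Linear regression over the same window underestimates the initial slope
lin_slope = np.polyfit(t, obs, 1)[0]
print(f"exponential dc/dt(0): {initial_slope:.3f} ppm/s, linear slope: {lin_slope:.3f} ppm/s")
```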
Zijlema, W L; Wolf, K; Emeny, R; Ladwig, K H; Peters, A; Kongsgård, H; Hveem, K; Kvaløy, K; Yli-Tuomi, T; Partonen, T; Lanki, T; Eeftens, M; de Hoogh, K; Brunekreef, B; Stolk, R P; Rosmalen, J G M
2016-03-01
Exposure to ambient air pollution may be associated with impaired mental health, including depression. However, evidence originates mainly from animal studies and epidemiological studies in specific subgroups. We investigated the association between air pollution and depressed mood in four European general population cohorts. Data were obtained from LifeLines (the Netherlands), KORA (Germany), HUNT (Norway), and FINRISK (Finland). Residential exposure to particles (PM2.5, PM2.5 absorbance, PM10) and nitrogen dioxide (NO2) was estimated using land use regression (LUR) models developed for the European Study of Cohorts for Air Pollution Effects (ESCAPE) and using Europe-wide LUR models. Depressed mood was assessed with interviews and questionnaires. Logistic regression analyses were used to investigate the cohort-specific associations between air pollution and depressed mood. A total of 70,928 participants were included in our analyses. The prevalence of depressed mood ranged from 1.6% (KORA) to 11.3% (FINRISK). Cohort-specific associations of the air pollutants and depressed mood showed heterogeneous results. For example, positive associations were found for NO2 in LifeLines (odds ratio [OR]=1.34; 95% CI: 1.17, 1.53 per 10 μg/m3 increase in NO2), whereas negative associations were found in HUNT (OR=0.79; 95% CI: 0.66, 0.94 per 10 μg/m3 increase in NO2). Our analyses of four European general population cohorts found no consistent evidence for an association between ambient air pollution and depressed mood. Copyright © 2015 Elsevier GmbH. All rights reserved.
Villarrasa-Sapiña, Israel; Serra-Añó, Pilar; Pardo-Ibáñez, Alberto; Gonzalez, Luis-Millán; García-Massó, Xavier
2017-01-01
Obesity is now a serious worldwide challenge, especially in children. This condition can cause a number of different health problems, including musculoskeletal disorders, some of which are due to mechanical stress caused by excess body weight. The aim of this study was to determine the association between body composition and the vertical ground reaction force produced during walking in obese children. Sixteen children participated in the study, six females and ten males [11.5 (1.2) years old, 69.8 (15.5) kg, 1.56 (0.09) m, and a body mass index (BMI) of 28.36 (3.74) kg/m2]. Total weight, lean mass and fat mass were measured by dual-energy X-ray absorptiometry and vertical forces while walking were obtained by a force platform. The vertical force variables analysed were impact and propulsive forces, and the rate of development of both. Multiple regression models for each vertical force parameter were calculated using the body composition variables as input. The impact force regression model was found to be positively related to the weight of obese children and negatively related to lean mass. The regression model showed lean mass was positively related to the propulsive rate. Finally, regression models for impact and propulsive force showed a direct relationship with body weight. Impact force is positively related to the weight of obese children, but lean mass helps to reduce the impact force in this population. Exercise could help obese persons to reduce their total body weight and increase their lean mass, thus reducing impact forces during sports and other activities. Copyright © 2016 Elsevier Ltd. All rights reserved.
Sources of Biased Inference in Alcohol and Drug Services Research: An Instrumental Variable Approach
Schmidt, Laura A.; Tam, Tammy W.; Larson, Mary Jo
2012-01-01
Objective: This study examined the potential for biased inference due to endogeneity when using standard approaches for modeling the utilization of alcohol and drug treatment. Method: Results from standard regression analysis were compared with those that controlled for endogeneity using instrumental variables estimation. Comparable models predicted the likelihood of receiving alcohol treatment based on the widely used Aday and Andersen medical care–seeking model. Data were from the National Epidemiologic Survey on Alcohol and Related Conditions and included a representative sample of adults in households and group quarters throughout the contiguous United States. Results: Findings suggested that standard approaches for modeling treatment utilization are prone to bias because of uncontrolled reverse causation and omitted variables. Compared with instrumental variables estimation, standard regression analyses produced downwardly biased estimates of the impact of alcohol problem severity on the likelihood of receiving care. Conclusions: Standard approaches for modeling service utilization are prone to underestimating the true effects of problem severity on service use. Biased inference could lead to inaccurate policy recommendations, for example, by suggesting that people with milder forms of substance use disorder are more likely to receive care than is actually the case. PMID:22152672
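A minimal sketch of the instrumental-variables idea behind the comparison above: OLS is biased when problem severity is endogenous, while a simple two-stage least squares (2SLS) fit using an instrument recovers the true effect. The data-generating process and instrument are simulated assumptions; the study's actual instruments and survey design are not reproduced, and the naive second-stage standard errors are not corrected here.

```python
# Hedged sketch: OLS vs manual 2SLS with an endogenous regressor.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 5000
z = rng.normal(size=n)                   # instrument: affects severity, not the outcome
u = rng.normal(size=n)                   # unobserved confounder
severity = 0.8 * z + 0.9 * u + rng.normal(size=n)
treatment_use = 0.5 * severity - 0.7 * u + rng.normal(size=n)   # true effect = 0.5

print("OLS estimate :",
      round(sm.OLS(treatment_use, sm.add_constant(severity)).fit().params[1], 3))

stage1 = sm.OLS(severity, sm.add_constant(z)).fit()
severity_hat = stage1.fittedvalues
stage2 = sm.OLS(treatment_use, sm.add_constant(severity_hat)).fit()
print("2SLS estimate:", round(stage2.params[1], 3))
```

With the confounder pushing severity up and treatment use down, the OLS slope is biased downward, while the 2SLS slope sits near the true 0.5, matching the direction of bias the abstract reports.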
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ye, Sheng; Li, Hongyi; Huang, Maoyi
2014-07-21
Subsurface stormflow is an important component of the rainfall–runoff response, especially in steep terrain. Its contribution to total runoff is, however, poorly represented in the current generation of land surface models. The lack of physical basis of these common parameterizations precludes a priori estimation of the stormflow (i.e. without calibration), which is a major drawback for prediction in ungauged basins, or for use in global land surface models. This paper is aimed at deriving regionalized parameterizations of the storage–discharge relationship relating to subsurface stormflow from a top–down empirical data analysis of streamflow recession curves extracted from 50 eastern United States catchments. Detailed regression analyses were performed between parameters of the empirical storage–discharge relationships and the controlling climate, soil and topographic characteristics. The regression analyses performed on empirical recession curves at catchment scale indicated that the coefficient of the power-law form storage–discharge relationship is closely related to the catchment hydrologic characteristics, which is consistent with the hydraulic theory derived mainly at the hillslope scale. As for the exponent, besides the role of field scale soil hydraulic properties as suggested by hydraulic theory, it is found to be more strongly affected by climate (aridity) at the catchment scale. At a fundamental level these results point to the need for more detailed exploration of the co-dependence of soil, vegetation and topography with climate.
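A minimal sketch of a recession-curve analysis consistent with a power-law storage–discharge relationship: points on the falling limb are used to fit log(-dQ/dt) = log(a) + b·log(Q), yielding the coefficient and exponent the abstract regresses against catchment characteristics. The streamflow series below is synthetic; the 50-catchment dataset and regionalization step are not reproduced.

```python
# Hedged sketch: fitting -dQ/dt = a * Q^b to a synthetic recession curve.
import numpy as np

rng = np.random.default_rng(7)
t = np.arange(0, 60.0)                        # days after a storm peak
Q = (0.05 * t + 1.2) ** (-2.0)                # synthetic recession (mm/day)
Q = Q * np.exp(rng.normal(scale=0.02, size=t.size))

dQdt = np.diff(Q)                             # daily change (negative on recession)
Qmid = 0.5 * (Q[1:] + Q[:-1])
mask = dQdt < 0                               # keep recession points only

b, log_a = np.polyfit(np.log(Qmid[mask]), np.log(-dQdt[mask]), 1)
print(f"-dQ/dt ~= {np.exp(log_a):.3f} * Q^{b:.2f}")
```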
Coull, Brent A.; Just, Allan C.; Maxwell, Sarah L.; Schwartz, Joel; Gryparis, Alexandros; Kloog, Itai; Wright, Rosalind J.; Wright, Robert O.
2015-01-01
Background Prenatal traffic-related air pollution exposure is linked to adverse birth outcomes. However, modifying effects of maternal body mass index (BMI) and infant sex remain virtually unexplored. Objectives We examined whether associations between prenatal air pollution and birth weight differed by sex and maternal BMI in 670 urban ethnically mixed mother-child pairs. Methods Black carbon (BC) levels were estimated using a validated spatio-temporal land-use regression (LUR) model; fine particulate matter (PM2.5) was estimated using a hybrid LUR model incorporating satellite-derived Aerosol Optical Depth measures. Using stratified multivariable-adjusted regression analyses, we examined whether associations between prenatal air pollution and calculated birth weight for gestational age (BWGA) z-scores varied by sex and maternal pre-pregnancy BMI. Results Median birth weight was 3.3±0.6 kg; 33% of mothers were obese (BMI ≥30 kg/m2). In stratified analyses, the association between higher PM2.5 and lower birth weight was significant in males of obese mothers (−0.42 unit of BWGA z-score change per IQR increase in PM2.5, 95% CI: −0.79 to −0.06) (PM2.5 × sex × obesity Pinteraction=0.02). Results were similar for BC models (Pinteraction=0.002). Conclusions Associations of prenatal exposure to traffic-related air pollution with reduced birth weight were most evident in males born to obese mothers. PMID:25601728
Visual abilities distinguish pitchers from hitters in professional baseball.
Klemish, David; Ramger, Benjamin; Vittetoe, Kelly; Reiter, Jerome P; Tokdar, Surya T; Appelbaum, Lawrence Gregory
2018-01-01
This study aimed to evaluate the possibility that differences in sensorimotor abilities exist between hitters and pitchers in a large cohort of baseball players of varying levels of experience. Secondary data analysis was performed on 9 sensorimotor tasks comprising the Nike Sensory Station assessment battery. Bayesian hierarchical regression modelling was applied to test for differences between pitchers and hitters in data from 566 baseball players (112 high school, 85 college, 369 professional) collected at 20 testing centres. Explanatory variables including height, handedness, eye dominance, concussion history, and player position were modelled along with age curves using basis regression splines. Regression analyses revealed better performance for hitters relative to pitchers at the professional level in the visual clarity and depth perception tasks, but these differences did not exist at the high school or college levels. No significant differences were observed in the other 7 measures of sensorimotor capabilities included in the test battery, and no systematic biases were found between the testing centres. These findings, indicating that professional-level hitters have better visual acuity and depth perception than professional-level pitchers, affirm the notion that highly experienced athletes have differing perceptual skills. Findings are discussed in relation to deliberate practice theory.
Beyond Reading Alone: The Relationship Between Aural Literacy And Asthma Management
Rosenfeld, Lindsay; Rudd, Rima; Emmons, Karen M.; Acevedo-García, Dolores; Martin, Laurie; Buka, Stephen
2010-01-01
Objectives To examine the relationship between literacy and asthma management with a focus on the oral exchange. Methods Study participants, all of whom reported asthma, were drawn from the New England Family Study (NEFS), an examination of links between education and health. NEFS data included reading, oral (speaking), and aural (listening) literacy measures. An additional survey was conducted with this group of study participants related to asthma issues, particularly asthma management. Data analysis focused on bivariate and multivariable logistic regression. Results In bivariate logistic regression models exploring aural literacy, there was a statistically significant association between those participants with lower aural literacy skills and less successful asthma management (OR: 4.37; 95% CI: 1.11, 17.32). In multivariable logistic regression analyses, controlling for gender, income, and race in separate models (one-at-a-time), there remained a statistically significant association between those participants with lower aural literacy skills and less successful asthma management. Conclusion Lower aural literacy skills seem to complicate asthma management capabilities. Practice Implications Greater attention to the oral exchange, in particular the listening skills highlighted by aural literacy, as well as other related literacy skills may help us develop strategies for clear communication related to asthma management. PMID:20399060
Keithley, Richard B; Wightman, R Mark
2011-06-07
Principal component regression is a multivariate data analysis approach routinely used to predict neurochemical concentrations from in vivo fast-scan cyclic voltammetry measurements. This mathematical procedure can rapidly be employed with present day computer programming languages. Here, we evaluate several methods that can be used to evaluate and improve multivariate concentration determination. The cyclic voltammetric representation of the calculated regression vector is shown to be a valuable tool in determining whether the calculated multivariate model is chemically appropriate. The use of Cook's distance successfully identified outliers contained within in vivo fast-scan cyclic voltammetry training sets. This work also presents the first direct interpretation of a residual color plot and demonstrated the effect of peak shifts on predicted dopamine concentrations. Finally, separate analyses of smaller increments of a single continuous measurement could not be concatenated without substantial error in the predicted neurochemical concentrations due to electrode drift. Taken together, these tools allow for the construction of more robust multivariate calibration models and provide the first approach to assess the predictive ability of a procedure that is inherently impossible to validate because of the lack of in vivo standards.
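A minimal sketch of two of the ingredients highlighted above: principal component regression (PCA followed by least squares on the retained scores) and Cook's distance for flagging influential training observations. The simulated "voltammogram" matrix stands in for real fast-scan cyclic voltammetry training sets; the residual color plot and drift analyses are not reproduced.

```python
# Hedged sketch: principal component regression plus Cook's distance diagnostics.
import numpy as np
import statsmodels.api as sm
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(8)
n_obs, n_potentials = 40, 200
basis = np.sin(np.linspace(0, np.pi, n_potentials))        # one "chemical" component
conc = rng.uniform(0.1, 2.0, size=n_obs)                   # known training concentrations
V = np.outer(conc, basis) + rng.normal(scale=0.05, size=(n_obs, n_potentials))

pca = PCA(n_components=3).fit(V)
scores = pca.transform(V)
pcr = LinearRegression().fit(scores, conc)
print("R^2 on training scores:", round(pcr.score(scores, conc), 3))

# Cook's distance on the same regression, via statsmodels OLS influence measures
ols = sm.OLS(conc, sm.add_constant(scores)).fit()
cooks_d = ols.get_influence().cooks_distance[0]
print("largest Cook's distance:", round(cooks_d.max(), 3))
```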
The relationship between severity of violence in the home and dating violence.
Sims, Eva Nowakowski; Dodd, Virginia J Noland; Tejeda, Manuel J
2008-01-01
This study used propositions from social learning theory to explore the combined influences of child maltreatment, childhood witness to parental violence, sibling violence, and gender on dating violence perpetration, using a modified version of the Conflict Tactics Scale 2 (CTS2). A weighted scoring method was utilized to determine how severity of violence in the home impacts dating violence perpetration. Bivariate correlations and linear regression models indicate significant associations between child maltreatment, sibling violence perpetration, childhood witness to parental violence, gender, and subsequent dating violence perpetration. Multiple regression analyses indicate that for men, a history of severe violence victimization (i.e., child maltreatment and childhood witness to parental violence) and severe perpetration (sibling violence) significantly predict dating violence perpetration.
Standardized Regression Coefficients as Indices of Effect Sizes in Meta-Analysis
ERIC Educational Resources Information Center
Kim, Rae Seon
2011-01-01
When conducting a meta-analysis, it is common to find many collected studies that report regression analyses, because multiple regression analysis is widely used in many fields. Meta-analysis uses effect sizes drawn from individual studies as a means of synthesizing a collection of results. However, indices of effect size from regression analyses…
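A minimal sketch of the conversion behind the effect-size question above: an unstandardized slope b is rescaled by the predictor and outcome standard deviations to give a standardized coefficient, beta = b · sd(x) / sd(y), which in simple regression equals the Pearson correlation. The data are simulated for illustration only.

```python
# Hedged sketch: standardized regression coefficient from an unstandardized slope.
import numpy as np

rng = np.random.default_rng(9)
x = rng.normal(50, 10, size=500)
y = 3.0 + 0.8 * x + rng.normal(scale=12, size=500)

b = np.polyfit(x, y, 1)[0]                 # unstandardized slope
beta = b * x.std(ddof=1) / y.std(ddof=1)   # standardized coefficient
r = np.corrcoef(x, y)[0, 1]                # equals beta in simple regression
print(round(beta, 3), round(r, 3))
```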
Stata Modules for Calculating Novel Predictive Performance Indices for Logistic Models
Barkhordari, Mahnaz; Padyab, Mojgan; Hadaegh, Farzad; Azizi, Fereidoun; Bozorgmanesh, Mohammadreza
2016-01-01
Background Prediction is a fundamental part of prevention of cardiovascular diseases (CVD). The development of prediction algorithms based on multivariate regression models began several decades ago. In parallel with predictive model development, biomarker research emerged on an impressively large scale. The key question is how best to assess and quantify the improvement in risk prediction offered by new biomarkers, or more basically, how to assess the performance of a risk prediction model. Discrimination, calibration, and added predictive value have recently been suggested for use when comparing the predictive performance of models with and without novel biomarkers. Objectives Lack of user-friendly statistical software has restricted implementation of novel model assessment methods while examining novel biomarkers. We intended, thus, to develop user-friendly software that could be used by researchers with few programming skills. Materials and Methods We have written a Stata command that is intended to help researchers obtain the cut point-free and cut point-based net reclassification improvement (NRI) index and the relative and absolute integrated discrimination improvement (IDI) index for logistic-based regression analyses. We applied the command to real data on women participating in the Tehran Lipid and Glucose Study (TLGS) to examine whether information on a family history of premature CVD, waist circumference, and fasting plasma glucose can improve the predictive performance of the Framingham "general CVD risk" algorithm. Results The command is addpred for logistic regression models. Conclusions The Stata package provided herein can encourage the use of novel methods in examining the predictive capacity of the ever-emerging plethora of novel biomarkers. PMID:27279830
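A minimal sketch of the two cut-point-free indices the macro reports, computed here in Python rather than Stata: the continuous net reclassification improvement (NRI) and the absolute integrated discrimination improvement (IDI), derived from predicted risks under a base and an extended logistic model. The data and the extra biomarker are simulated assumptions, not the TLGS cohort, and this is not the addpred command itself.

```python
# Hedged sketch: continuous NRI and absolute IDI for a base vs extended model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(10)
n = 3000
age = rng.normal(50, 10, size=n)
marker = rng.normal(size=n)                       # hypothetical new biomarker
lin = -6.0 + 0.08 * age + 0.8 * marker
y = rng.binomial(1, 1 / (1 + np.exp(-lin)))

base = LogisticRegression().fit(age.reshape(-1, 1), y)
ext = LogisticRegression().fit(np.column_stack([age, marker]), y)
p_old = base.predict_proba(age.reshape(-1, 1))[:, 1]
p_new = ext.predict_proba(np.column_stack([age, marker]))[:, 1]

ev, ne = y == 1, y == 0
idi = (p_new[ev].mean() - p_old[ev].mean()) - (p_new[ne].mean() - p_old[ne].mean())
up = p_new > p_old                                # risk moved up under the new model
nri = (up[ev].mean() - (~up)[ev].mean()) + ((~up)[ne].mean() - up[ne].mean())
print(f"absolute IDI = {idi:.3f}, continuous NRI = {nri:.3f}")
```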
Secure Logistic Regression Based on Homomorphic Encryption: Design and Evaluation.
Kim, Miran; Song, Yongsoo; Wang, Shuang; Xia, Yuhou; Jiang, Xiaoqian
2018-04-17
Learning a model without accessing raw data has been an intriguing idea for security and machine learning researchers for years. In an ideal setting, we want to encrypt sensitive data, store them on a commercial cloud, and run certain analyses without ever decrypting the data, to preserve privacy. Homomorphic encryption is a promising technique for secure data outsourcing, but supporting real-world machine learning tasks with it is very challenging. Existing frameworks can only handle simplified cases with low-degree polynomials, such as the linear means classifier and linear discriminant analysis. The goal of this study is to provide practical support for mainstream learning models (e.g., logistic regression). We adapted a novel homomorphic encryption scheme optimized for computation on real numbers. We devised (1) a least squares approximation of the logistic function for accuracy and efficiency (i.e., to reduce computation cost) and (2) new packing and parallelization techniques. Using real-world datasets, we evaluated the performance of our model and demonstrated its feasibility in speed and memory consumption. For example, it took approximately 116 minutes to obtain the training model from the homomorphically encrypted Edinburgh dataset. In addition, it gives fairly accurate predictions on the testing dataset. We present the first homomorphically encrypted logistic regression outsourcing model, based on the critical observation that the precision loss of the classification model is sufficiently small that the decision plane stays essentially unchanged. ©Miran Kim, Yongsoo Song, Shuang Wang, Yuhou Xia, Xiaoqian Jiang. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 17.04.2018.
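A minimal sketch of the approximation idea mentioned above: fit a low-degree polynomial to the logistic (sigmoid) function by least squares over a bounded interval, since homomorphic schemes evaluate polynomials rather than exp(). The degree and interval here are illustrative assumptions, not the paper's exact choices, and no encryption is performed.

```python
# Hedged sketch: least squares polynomial approximation of the sigmoid.
import numpy as np

x = np.linspace(-8, 8, 2001)
sigmoid = 1.0 / (1.0 + np.exp(-x))

coeffs = np.polyfit(x, sigmoid, deg=3)            # least squares fit, degree 3
approx = np.polyval(coeffs, x)

print("coefficients (highest degree first):", np.round(coeffs, 5))
print("max abs error on [-8, 8]:", round(np.max(np.abs(approx - sigmoid)), 4))
```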
Tumour ischaemia by interferon-γ resembles physiological blood vessel regression.
Kammertoens, Thomas; Friese, Christian; Arina, Ainhoa; Idel, Christian; Briesemeister, Dana; Rothe, Michael; Ivanov, Andranik; Szymborska, Anna; Patone, Giannino; Kunz, Severine; Sommermeyer, Daniel; Engels, Boris; Leisegang, Matthias; Textor, Ana; Fehling, Hans Joerg; Fruttiger, Marcus; Lohoff, Michael; Herrmann, Andreas; Yu, Hua; Weichselbaum, Ralph; Uckert, Wolfgang; Hübner, Norbert; Gerhardt, Holger; Beule, Dieter; Schreiber, Hans; Blankenstein, Thomas
2017-05-04
The relative contribution of the effector molecules produced by T cells to tumour rejection is unclear, but interferon-γ (IFNγ) is critical in most of the analysed models. Although IFNγ can impede tumour growth by acting directly on cancer cells, it must also act on the tumour stroma for effective rejection of large, established tumours. However, which stroma cells respond to IFNγ and by which mechanism IFNγ contributes to tumour rejection through stromal targeting have remained unknown. Here we use a model of IFNγ induction and an IFNγ-GFP fusion protein in large, vascularized tumours growing in mice that express the IFNγ receptor exclusively in defined cell types. Responsiveness to IFNγ by myeloid cells and other haematopoietic cells, including T cells or fibroblasts, was not sufficient for IFNγ-induced tumour regression, whereas responsiveness of endothelial cells to IFNγ was necessary and sufficient. Intravital microscopy revealed IFNγ-induced regression of the tumour vasculature, resulting in arrest of blood flow and subsequent collapse of tumours, similar to non-haemorrhagic necrosis in ischaemia and unlike haemorrhagic necrosis induced by tumour necrosis factor. The early events of IFNγ-induced tumour ischaemia resemble non-apoptotic blood vessel regression during development, wound healing or IFNγ-mediated, pregnancy-induced remodelling of uterine arteries. A better mechanistic understanding of how solid tumours are rejected may aid the design of more effective protocols for adoptive T-cell therapy.
Tzeng, Jung-Ying; Zhang, Daowen; Pongpanich, Monnat; Smith, Chris; McCarthy, Mark I.; Sale, Michèle M.; Worrall, Bradford B.; Hsu, Fang-Chi; Thomas, Duncan C.; Sullivan, Patrick F.
2011-01-01
Genomic association analyses of complex traits demand statistical tools that are capable of detecting small effects of common and rare variants and modeling complex interaction effects and yet are computationally feasible. In this work, we introduce a similarity-based regression method for assessing the main genetic and interaction effects of a group of markers on quantitative traits. The method uses genetic similarity to aggregate information from multiple polymorphic sites and integrates adaptive weights that depend on allele frequencies to accommodate common and uncommon variants. Collapsing information at the similarity level instead of the genotype level avoids canceling signals that have opposite etiological effects and is applicable to any class of genetic variants without the need for dichotomizing the allele types. To assess gene-trait associations, we regress trait similarities for pairs of unrelated individuals on their genetic similarities and assess association by using a score test whose limiting distribution is derived in this work. The proposed regression framework allows for covariates, has the capacity to model both main and interaction effects, can be applied to a mixture of different polymorphism types, and is computationally efficient. These features make it an ideal tool for evaluating associations between phenotype and marker sets defined by linkage disequilibrium (LD) blocks, genes, or pathways in whole-genome analysis. PMID:21835306
[Associations between dormitory environment/other factors and sleep quality of medical students].
Zheng, Bang; Wang, Kailu; Pan, Ziqi; Li, Man; Pan, Yuting; Liu, Ting; Xu, Dan; Lyu, Jun
2016-03-01
To investigate sleep quality and related factors among medical students in China, examine the association between dormitory environment and sleep quality, and provide evidence and recommendations for sleep hygiene interventions. A total of 555 undergraduate students were selected from a medical school of a university in Beijing through stratified-cluster random sampling to complete a questionnaire survey using the Chinese version of the Pittsburgh Sleep Quality Index (PSQI) and a self-designed questionnaire. Analyses were performed using a multiple logistic regression model as well as a multilevel linear regression model. The prevalence of sleep disorder was 29.1% (149/512), and 39.1% (200/512) of the students reported that their sleep quality was influenced by the dormitory environment. PSQI score was negatively correlated with self-reported rating of the dormitory environment (rs=-0.310, P<0.001). Logistic regression analysis showed that factors related to sleep disorder included grade, sleep regularity, self-rated health status, pressures of school work and employment, as well as dormitory environment. Results of the multilevel regression analysis also indicated that perception of the dormitory environment (individual level) was associated with sleep quality after controlling for dormitory-level random effects (b=-0.619, P<0.001). The prevalence of sleep disorder was high in medical students and was associated with multiple factors. The dormitory environment should be taken into consideration when interventions are undertaken to improve the sleep quality of students.
PSHREG: A SAS macro for proportional and nonproportional subdistribution hazards regression
Kohl, Maria; Plischke, Max; Leffondré, Karen; Heinze, Georg
2015-01-01
We present a new SAS macro %pshreg that can be used to fit a proportional subdistribution hazards model for survival data subject to competing risks. Our macro first modifies the input data set appropriately and then applies SAS's standard Cox regression procedure, PROC PHREG, using weights and counting-process style of specifying survival times to the modified data set. The modified data set can also be used to estimate cumulative incidence curves for the event of interest. The application of PROC PHREG has several advantages, e.g., it directly enables the user to apply the Firth correction, which has been proposed as a solution to the problem of undefined (infinite) maximum likelihood estimates in Cox regression, frequently encountered in small sample analyses. Deviation from proportional subdistribution hazards can be detected by both inspecting Schoenfeld-type residuals and testing correlation of these residuals with time, or by including interactions of covariates with functions of time. We illustrate application of these extended methods for competing risk regression using our macro, which is freely available at: http://cemsiis.meduniwien.ac.at/en/kb/science-research/software/statistical-software/pshreg, by means of analysis of a real chronic kidney disease study. We discuss differences in features and capabilities of %pshreg and the recent (January 2014) SAS PROC PHREG implementation of proportional subdistribution hazards modelling. PMID:25572709
Kumar, Rajesh; Dogra, Vishal; Rani, Khushbu; Sahu, Kanti
2017-01-01
Understanding district-level determinants of the total fertility rate (TFR) in the Empowered Action Group states of India can help ongoing population stabilization programs. The present study assesses the role of district-level determinants in predicting the total fertility rate among districts of the Empowered Action Group states of India. Data from the Annual Health Survey (2011-12) were analysed using the STATA and R software packages. Multiple linear regression models were built and evaluated using the Akaike Information Criterion. For further understanding, recursive partitioning was used to build a regression tree. Female married illiteracy was positively associated with the total fertility rate and explained more than half (53%) of the variance. In the multiple linear regression model, married illiteracy, infant mortality rate, antenatal care registration, household size, median age at live birth and sex ratio explained 70% of the total variance in the total fertility rate. In the regression tree, female married illiteracy was the root node: a split at 42% identified districts with TFR <= 2.7, and a further split on married illiteracy at 23% along the left branch identified districts with TFR <= 2.1. We conclude that female married illiteracy is one of the most important determinants of the total fertility rate among the districts of the Empowered Action Group states. A focus on female literacy is required to stabilize population growth in the long run.
Ye, Jiang-Feng; Zhao, Yu-Xin; Ju, Jian; Wang, Wei
2017-10-01
To assess the value of the Bedside Index for Severity in Acute Pancreatitis (BISAP), the Modified Early Warning Score (MEWS), serum Ca2+ and red cell distribution width (RDW) for predicting the severity grade of acute pancreatitis (AP), and to develop and verify a more accurate scoring system to predict the severity of AP. In 302 patients with AP, we calculated BISAP and MEWS scores and conducted single-factor logistic regression analyses on the relationships of BISAP, RDW, MEWS, and serum Ca2+ with the severity of AP. The variables with statistical significance in the single-factor logistic regression were entered into a multi-factor logistic regression model; forward stepwise regression was used to screen variables and build a multi-factor prediction model. A receiver operating characteristic (ROC) curve was constructed, and the significance of the multi- and single-factor prediction models in predicting the severity of AP was evaluated using the area under the ROC curve (AUC). The internal validity of the model was verified through bootstrapping. Among the 302 patients with AP, 209 had mild acute pancreatitis (MAP) and 93 had severe acute pancreatitis (SAP). According to the single-factor logistic regression analysis, BISAP, MEWS and serum Ca2+ are predictors of the severity of AP (P<0.001), whereas RDW is not (P>0.05). The multi-factor logistic regression analysis showed that BISAP and serum Ca2+ are independent predictors of AP severity (P<0.001), and MEWS is not (P>0.05); BISAP is negatively correlated with serum Ca2+ (r=-0.330, P<0.001). The constructed model is: ln(p/(1-p)) = 7.306 + 1.151*BISAP - 4.516*serum Ca2+, where p is the predicted probability of SAP. The predictive ability for SAP follows the order: combined BISAP and serum Ca2+ model > serum Ca2+ > BISAP. The difference in predictive ability between BISAP and serum Ca2+ alone was not statistically significant (P>0.05), whereas the newly built prediction model was significantly better than either BISAP or serum Ca2+ alone (P<0.01). Bootstrapping supported the internal validity of the models. BISAP and serum Ca2+ have high predictive value for the severity of AP, but the model combining BISAP and serum Ca2+ is markedly superior to either alone. Furthermore, this model is simple, practical and appropriate for clinical use. Copyright © 2016. Published by Elsevier Masson SAS.
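A minimal sketch applying the fitted equation reported above, logit(p) = 7.306 + 1.151·BISAP − 4.516·Ca2+, to convert a patient's BISAP score and serum calcium into a predicted probability of severe acute pancreatitis. The calcium unit (mmol/L) and the example inputs are assumptions for illustration; no clinical thresholds are implied.

```python
# Hedged sketch: predicted SAP probability from the reported BISAP + serum Ca2+ model.
import numpy as np

def sap_probability(bisap: float, serum_ca: float) -> float:
    """Predicted probability of SAP from the combined BISAP + serum Ca2+ model."""
    logit = 7.306 + 1.151 * bisap - 4.516 * serum_ca
    return 1.0 / (1.0 + np.exp(-logit))

for bisap, ca in [(1, 2.4), (3, 2.1), (4, 1.8)]:
    print(f"BISAP={bisap}, Ca2+={ca} mmol/L -> P(SAP) = {sap_probability(bisap, ca):.2f}")
```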