Sample records for regression models assuming

  1. A SEMIPARAMETRIC BAYESIAN MODEL FOR CIRCULAR-LINEAR REGRESSION

    EPA Science Inventory

    We present a Bayesian approach to regress a circular variable on a linear predictor. The regression coefficients are assumed to have a nonparametric distribution with a Dirichlet process prior. The semiparametric Bayesian approach gives added flexibility to the model and is usefu...

  2. Ridge Regression for Interactive Models.

    ERIC Educational Resources Information Center

    Tate, Richard L.

    1988-01-01

    An exploratory study of the value of ridge regression for interactive models is reported. Assuming that the linear terms in a simple interactive model are centered to eliminate non-essential multicollinearity, a variety of common models, representing both ordinal and disordinal interactions, are shown to have "orientations" that are…
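
    As a rough illustration of the setup described above (not Tate's analysis), the sketch below centers the linear terms before forming the product term and then fits a ridge regression to the interactive model; the data, penalty value, and coefficients are invented for the example.

    ```python
    # Sketch: center linear terms to remove non-essential multicollinearity,
    # then fit a ridge regression to the interactive model (simulated data).
    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(0)
    x1 = rng.normal(50, 10, 200)
    x2 = rng.normal(5, 2, 200)
    y = 1.0 + 0.3 * x1 + 0.8 * x2 + 0.05 * x1 * x2 + rng.normal(0, 1, 200)

    # Center the linear terms, then build the interaction from the centered terms
    x1c, x2c = x1 - x1.mean(), x2 - x2.mean()
    X = np.column_stack([x1c, x2c, x1c * x2c])

    model = Ridge(alpha=1.0).fit(X, y)   # alpha plays the role of the ridge constant k
    print(model.intercept_, model.coef_)
    ```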

  3. A Comparison of Alternative Approaches to the Analysis of Interrupted Time-Series.

    ERIC Educational Resources Information Center

    Harrop, John W.; Velicer, Wayne F.

    1985-01-01

    Computer-generated data representative of 16 Autoregressive Integrated Moving Average (ARIMA) models were used to compare the results of interrupted time-series analysis using: (1) the known model identification, (2) an assumed (1,0,0) model, and (3) an assumed (3,0,0) model as an approximation to the General Transformation approach. (Author/BW)
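
    A minimal sketch of fitting an assumed (1,0,0) model to an interrupted series, using statsmodels with a simulated series and a hypothetical intervention point; the General Transformation approach compared in the study is not reproduced here.

    ```python
    # Interrupted time-series analysis under an assumed ARIMA(1,0,0) model
    # (illustrative only; series and intervention point are simulated).
    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(1)
    n, t0 = 120, 60                      # series length and intervention point
    y = np.empty(n)
    y[0] = 0.0
    for t in range(1, n):                # AR(1) errors
        y[t] = 0.5 * y[t - 1] + rng.normal()
    y[t0:] += 2.0                        # level shift of 2.0 at the intervention

    step = (np.arange(n) >= t0).astype(float)   # intervention indicator
    fit = ARIMA(y, exog=step, order=(1, 0, 0)).fit()
    print(fit.params)                    # includes the estimated level-shift effect
    ```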

  4. Reliability Analysis of the Gradual Degradation of Semiconductor Devices.

    DTIC Science & Technology

    1983-07-20

    under the heading of linear models or linear statistical models. We have not used this material in this report. Assuming catastrophic failure when...assuming a catastrophic model. In this treatment we first modify our system loss formula and then proceed to the actual analysis. II. ANALYSIS OF...[table of unit indices and failure times T1, T2, ..., Tn omitted]...and are easily analyzed by simple linear regression. Since we have assumed a log normal/Arrhenius activation

  5. Estimation of variance in Cox's regression model with shared gamma frailties.

    PubMed

    Andersen, P K; Klein, J P; Knudsen, K M; Tabanera y Palacios, R

    1997-12-01

    The Cox regression model with a shared frailty factor allows for unobserved heterogeneity or for statistical dependence between the observed survival times. Estimation in this model when the frailties are assumed to follow a gamma distribution is reviewed, and we address the problem of obtaining variance estimates for regression coefficients, frailty parameter, and cumulative baseline hazards using the observed nonparametric information matrix. A number of examples are given comparing this approach with fully parametric inference in models with piecewise constant baseline hazards.

  6. Minimizing bias in biomass allometry: Model selection and log transformation of data

    Treesearch

    Joseph Mascaro; Flint Hughes; Amanda Uowolo; Stefan A. Schnitzer

    2011-01-01

    Nonlinear regression is increasingly used to develop allometric equations for forest biomass estimation (i.e., as opposed to the traditional approach of log-transformation followed by linear regression). Most statistical software packages, however, assume additive errors by default, violating a key assumption of allometric theory and possibly producing spurious models....
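
    The contrast the authors describe can be sketched as follows, assuming a power-law allometry M = a·D^b with simulated data; nonlinear least squares implicitly assumes additive error, while the traditional log-log fit assumes multiplicative error.

    ```python
    # Sketch (not the paper's analysis): fit a power-law allometry two ways.
    import numpy as np
    from scipy.optimize import curve_fit

    rng = np.random.default_rng(2)
    D = rng.uniform(5, 60, 150)                          # e.g., stem diameter
    M = 0.1 * D**2.4 * np.exp(rng.normal(0, 0.3, 150))   # multiplicative error

    # (a) Nonlinear least squares: implicitly assumes additive, constant-variance error
    powerlaw = lambda d, a, b: a * d**b
    (a_nls, b_nls), _ = curve_fit(powerlaw, D, M, p0=(0.1, 2.5))

    # (b) Traditional approach: log-transform, then ordinary least squares
    b_log, log_a = np.polyfit(np.log(D), np.log(M), 1)
    print(a_nls, b_nls, np.exp(log_a), b_log)
    ```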

  7. Consequences of kriging and land use regression for PM2.5 predictions in epidemiologic analyses: Insights into spatial variability using high-resolution satellite data

    PubMed Central

    Alexeeff, Stacey E.; Schwartz, Joel; Kloog, Itai; Chudnovsky, Alexandra; Koutrakis, Petros; Coull, Brent A.

    2016-01-01

    Many epidemiological studies use predicted air pollution exposures as surrogates for true air pollution levels. These predicted exposures contain exposure measurement error, yet simulation studies have typically found negligible bias in resulting health effect estimates. However, previous studies typically assumed a statistical spatial model for air pollution exposure, which may be oversimplified. We address this shortcoming by assuming a realistic, complex exposure surface derived from fine-scale (1km x 1km) remote-sensing satellite data. Using simulation, we evaluate the accuracy of epidemiological health effect estimates in linear and logistic regression when using spatial air pollution predictions from kriging and land use regression models. We examined chronic (long-term) and acute (short-term) exposure to air pollution. Results varied substantially across different scenarios. Exposure models with low out-of-sample R2 yielded severe biases in the health effect estimates of some models, ranging from 60% upward bias to 70% downward bias. One land use regression exposure model with greater than 0.9 out-of-sample R2 yielded upward biases up to 13% for acute health effect estimates. Almost all models drastically underestimated the standard errors. Land use regression models performed better in chronic effects simulations. These results can help researchers when interpreting health effect estimates in these types of studies. PMID:24896768
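
    A toy version of the simulation logic (not the authors' code or data): the health outcome is generated from a true exposure, the regression uses an imperfect exposure prediction, and the bias in the health effect estimate is inspected.

    ```python
    # Toy exposure-measurement-error simulation with simulated, not real, data.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    n, beta_true = 5000, 0.10
    x_true = rng.gamma(shape=4, scale=3, size=n)          # "true" PM2.5 exposure
    x_pred = 0.8 * x_true + rng.normal(0, 3, n) + 2.0     # imperfect exposure model
    y = 1.0 + beta_true * x_true + rng.normal(0, 2, n)    # continuous health outcome

    fit = sm.OLS(y, sm.add_constant(x_pred)).fit()        # regress on the prediction
    print("estimated effect:", fit.params[1], "vs true:", beta_true)
    ```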

  8. A Goal Programming/Constrained Regression Review of the Bell System Breakup.

    DTIC Science & Technology

    1985-05-01

    characteristically employ. 2. MULTI-PRODUCT COST MODEL AND DATA DETAILS When technical efficiency (i.e., zero waste) can be assumed...assuming, but we believe that it was probably technical (= zero waste) efficiency by virtue of the following reasons. Scale efficiency was a

  9. Estimating Causal Effects with Ancestral Graph Markov Models

    PubMed Central

    Malinsky, Daniel; Spirtes, Peter

    2017-01-01

    We present an algorithm for estimating bounds on causal effects from observational data which combines graphical model search with simple linear regression. We assume that the underlying system can be represented by a linear structural equation model with no feedback, and we allow for the possibility of latent variables. Under assumptions standard in the causal search literature, we use conditional independence constraints to search for an equivalence class of ancestral graphs. Then, for each model in the equivalence class, we perform the appropriate regression (using causal structure information to determine which covariates to include in the regression) to estimate a set of possible causal effects. Our approach is based on the “IDA” procedure of Maathuis et al. (2009), which assumes that all relevant variables have been measured (i.e., no unmeasured confounders). We generalize their work by relaxing this assumption, which is often violated in applied contexts. We validate the performance of our algorithm on simulated data and demonstrate improved precision over IDA when latent variables are present. PMID:28217244

  10. Analyzing Seasonal Variations in Suicide With Fourier Poisson Time-Series Regression: A Registry-Based Study From Norway, 1969-2007.

    PubMed

    Bramness, Jørgen G; Walby, Fredrik A; Morken, Gunnar; Røislien, Jo

    2015-08-01

    Seasonal variation in the number of suicides has long been acknowledged. It has been suggested that this seasonality has declined in recent years, but studies have generally used statistical methods incapable of confirming this. We examined all suicides occurring in Norway during 1969-2007 (more than 20,000 suicides in total) to establish whether seasonality decreased over time. Fitting of additive Fourier Poisson time-series regression models allowed for formal testing of a possible linear decrease in seasonality, or a reduction at a specific point in time, while adjusting for a possible smooth nonlinear long-term change without having to categorize time into discrete yearly units. The models were compared using Akaike's Information Criterion and analysis of variance. A model with a seasonal pattern was significantly superior to a model without one. There was a reduction in seasonality during the period. The model assuming a linear decrease in seasonality and the model assuming a change at a specific point in time were both superior to a model assuming constant seasonality, thus confirming by formal statistical testing that the magnitude of the seasonality in suicides has diminished. The additive Fourier Poisson time-series regression model would also be useful for studying other temporal phenomena with seasonal components. © The Author 2015. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
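
    A minimal sketch of a Fourier Poisson regression for seasonal counts, assuming monthly simulated data; the actual study additionally modelled long-term trend and a time-varying seasonal amplitude.

    ```python
    # Fourier Poisson regression: seasonal counts modelled with sine/cosine terms.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    months = np.arange(12 * 30)                              # 30 years of monthly data
    season = 0.3 * np.cos(2 * np.pi * months / 12)
    counts = rng.poisson(np.exp(3.0 + season))               # simulated monthly counts

    X = sm.add_constant(np.column_stack([np.sin(2 * np.pi * months / 12),
                                         np.cos(2 * np.pi * months / 12)]))
    fit = sm.GLM(counts, X, family=sm.families.Poisson()).fit()
    print(fit.params)    # the sine/cosine coefficients capture the seasonal component
    ```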

  11. Modelling of capital asset pricing by considering the lagged effects

    NASA Astrophysics Data System (ADS)

    Sukono; Hidayat, Y.; Bon, A. Talib bin; Supian, S.

    2017-01-01

    In this paper, the problem of modelling the Capital Asset Pricing Model (CAPM) with lagged effects is discussed. Asset returns are assumed to be influenced by the market return and the return on risk-free assets. The relationship between asset returns, the market return, and the return on risk-free assets is analysed using a CAPM regression equation and a lagged distributed CAPM regression equation. Building on the lagged distributed CAPM regression, this paper also develops a Koyck-transformation CAPM regression equation. The results show that the Koyck-transformation CAPM regression has the advantage of simplicity, requiring only three parameters, compared with the lagged distributed CAPM regression.
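
    A hedged illustration of the Koyck-transformed regression referred to above: the geometrically declining distributed lag collapses to a regression of the return on the current market return and the lagged return, leaving three parameters to estimate (simulated returns and invented parameter values, not the paper's data).

    ```python
    # Koyck-transformed CAPM sketch: r_t = alpha*(1-lambda) + beta*m_t + lambda*r_{t-1} + u_t
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(5)
    T = 500
    m = rng.normal(0.01, 0.04, T)            # market excess returns (simulated)
    r = np.empty(T)
    r[0] = 0.0
    for t in range(1, T):                    # Koyck structure with lambda = 0.4
        r[t] = 0.002 + 0.9 * m[t] + 0.4 * r[t - 1] + rng.normal(0, 0.01)

    X = sm.add_constant(np.column_stack([m[1:], r[:-1]]))
    fit = sm.OLS(r[1:], X).fit()
    print(fit.params)                        # [alpha*(1-lambda), beta, lambda]
    ```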

  12. Estimation of the Regression Effect Using a Latent Trait Model.

    ERIC Educational Resources Information Center

    Quinn, Jimmy L.

    A logistic model was used to generate data to serve as a proxy for an immediate retest from item responses to a fourth grade standardized reading comprehension test of 45 items. Assuming that the actual test may be considered a pretest and the proxy data may be considered a retest, the effect of regression was investigated using a percentage of…

  13. Quantile Regression Models for Current Status Data

    PubMed Central

    Ou, Fang-Shu; Zeng, Donglin; Cai, Jianwen

    2016-01-01

    Current status data arise frequently in demography, epidemiology, and econometrics where the exact failure time cannot be determined but is only known to have occurred before or after a known observation time. We propose a quantile regression model to analyze current status data, because it does not require distributional assumptions and the coefficients can be interpreted as direct regression effects on the distribution of failure time in the original time scale. Our model assumes that the conditional quantile of failure time is a linear function of covariates. We assume conditional independence between the failure time and observation time. An M-estimator is developed for parameter estimation which is computed using the concave-convex procedure and its confidence intervals are constructed using a subsampling method. Asymptotic properties for the estimator are derived and proven using modern empirical process theory. The small sample performance of the proposed method is demonstrated via simulation studies. Finally, we apply the proposed method to analyze data from the Mayo Clinic Study of Aging. PMID:27994307

  14. Accounting for measurement error in log regression models with applications to accelerated testing.

    PubMed

    Richardson, Robert; Tolley, H Dennis; Evenson, William E; Lunt, Barry M

    2018-01-01

    In regression settings, parameter estimates will be biased when the explanatory variables are measured with error. This bias can significantly affect modeling goals. In particular, accelerated lifetime testing involves an extrapolation of the fitted model, and a small amount of bias in parameter estimates may result in a significant increase in the bias of the extrapolated predictions. Additionally, bias may arise when the stochastic component of a log regression model is assumed to be multiplicative when the actual underlying stochastic component is additive. To account for these possible sources of bias, a log regression model with measurement error and additive error is approximated by a weighted regression model which can be estimated using Iteratively Re-weighted Least Squares. Using the reduced Eyring equation in an accelerated testing setting, the model is compared to previously accepted approaches to modeling accelerated testing data with both simulations and real data.
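
    The paper's weighted-regression approximation is not reproduced here, but a generic iteratively re-weighted least squares loop of the same flavour looks like the following sketch, with an assumed (hypothetical) variance model re-estimated from the residuals at each pass.

    ```python
    # Generic IRLS sketch: alternate between estimating a variance model from
    # residuals and re-fitting a weighted regression (simulated heteroskedastic data).
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(6)
    x = rng.uniform(1, 10, 200)
    y = 1.6 + 1.5 * x + rng.normal(0, 0.2 + 0.05 * x, 200)   # heteroskedastic noise
    X = sm.add_constant(x)

    beta = sm.OLS(y, X).fit().params
    for _ in range(20):
        resid = y - X @ beta
        # assumed variance model: log residual variance linear in the predictors
        var_fit = sm.OLS(np.log(resid**2 + 1e-12), X).fit()
        weights = 1.0 / np.exp(var_fit.fittedvalues)
        beta_new = sm.WLS(y, X, weights=weights).fit().params
        if np.allclose(beta, beta_new, atol=1e-8):
            break
        beta = beta_new
    print(beta)
    ```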

  15. Shrinkage Estimation of Varying Covariate Effects Based On Quantile Regression

    PubMed Central

    Peng, Limin; Xu, Jinfeng; Kutner, Nancy

    2013-01-01

    Varying covariate effects often manifest meaningful heterogeneity in covariate-response associations. In this paper, we adopt a quantile regression model that assumes linearity at a continuous range of quantile levels as a tool to explore such data dynamics. The consideration of potential non-constancy of covariate effects necessitates a new perspective for variable selection, which, under the assumed quantile regression model, is to retain variables that have effects on all quantiles of interest as well as those that influence only part of quantiles considered. Current work on l1-penalized quantile regression either does not concern varying covariate effects or may not produce consistent variable selection in the presence of covariates with partial effects, a practical scenario of interest. In this work, we propose a shrinkage approach by adopting a novel uniform adaptive LASSO penalty. The new approach enjoys easy implementation without requiring smoothing. Moreover, it can consistently identify the true model (uniformly across quantiles) and achieve the oracle estimation efficiency. We further extend the proposed shrinkage method to the case where responses are subject to random right censoring. Numerical studies confirm the theoretical results and support the utility of our proposals. PMID:25332515

  16. Should metacognition be measured by logistic regression?

    PubMed

    Rausch, Manuel; Zehetleitner, Michael

    2017-03-01

    Are logistic regression slopes suitable to quantify metacognitive sensitivity, i.e. the efficiency with which subjective reports differentiate between correct and incorrect task responses? We analytically show that logistic regression slopes are independent from rating criteria in one specific model of metacognition, which assumes (i) that rating decisions are based on sensory evidence generated independently of the sensory evidence used for primary task responses and (ii) that the distributions of evidence are logistic. Given a hierarchical model of metacognition, logistic regression slopes depend on rating criteria. According to all considered models, regression slopes depend on the primary task criterion. A reanalysis of previous data revealed that massive numbers of trials are required to distinguish between hierarchical and independent models with tolerable accuracy. It is argued that researchers who wish to use logistic regression as measure of metacognitive sensitivity need to control the primary task criterion and rating criteria. Copyright © 2017 Elsevier Inc. All rights reserved.
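
    The quantity under debate can be computed as below: the slope from a logistic regression of response accuracy on confidence ratings, shown here with simulated ratings rather than the authors' models or data.

    ```python
    # Logistic regression slope as a candidate metacognitive sensitivity measure.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(7)
    n = 1000
    correct = rng.integers(0, 2, n)                 # 1 = correct primary-task response
    # confidence ratings on a 1-6 scale, higher on average for correct trials
    conf = np.clip(np.round(3 + 1.2 * correct + rng.normal(0, 1, n)), 1, 6)

    fit = sm.Logit(correct, sm.add_constant(conf)).fit(disp=0)
    print("logistic regression slope:", fit.params[1])
    ```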

  17. Examining the influence of link function misspecification in conventional regression models for developing crash modification factors.

    PubMed

    Wu, Lingtao; Lord, Dominique

    2017-05-01

    This study further examined the use of regression models for developing crash modification factors (CMFs), specifically focusing on the misspecification in the link function. The primary objectives were to validate the accuracy of CMFs derived from the commonly used regression models (i.e., generalized linear models or GLMs with additive linear link functions) when some of the variables have nonlinear relationships and quantify the amount of bias as a function of the nonlinearity. Using the concept of artificial realistic data, various linear and nonlinear crash modification functions (CM-Functions) were assumed for three variables. Crash counts were randomly generated based on these CM-Functions. CMFs were then derived from regression models for three different scenarios. The results were compared with the assumed true values. The main findings are summarized as follows: (1) when some variables have nonlinear relationships with crash risk, the CMFs for these variables derived from the commonly used GLMs are all biased, especially around areas away from the baseline conditions (e.g., boundary areas); (2) with the increase in nonlinearity (i.e., nonlinear relationship becomes stronger), the bias becomes more significant; (3) the quality of CMFs for other variables having linear relationships can be influenced when mixed with those having nonlinear relationships, but the accuracy may still be acceptable; and (4) the misuse of the link function for one or more variables can also lead to biased estimates for other parameters. This study raised the importance of the link function when using regression models for developing CMFs. Copyright © 2017 Elsevier Ltd. All rights reserved.

  18. Modeling containment of large wildfires using generalized linear mixed-model analysis

    Treesearch

    Mark Finney; Isaac C. Grenfell; Charles W. McHugh

    2009-01-01

    Billions of dollars are spent annually in the United States to contain large wildland fires, but the factors contributing to suppression success remain poorly understood. We used a regression model (generalized linear mixed-model) to model containment probability of individual fires, assuming that containment was a repeated-measures problem (fixed effect) and...

  19. Logistic regression for dichotomized counts.

    PubMed

    Preisser, John S; Das, Kalyan; Benecha, Habtamu; Stamm, John W

    2016-12-01

    Sometimes there is interest in a dichotomized outcome indicating whether a count variable is positive or zero. Under this scenario, the application of ordinary logistic regression may result in efficiency loss, which is quantifiable under an assumed model for the counts. In such situations, a shared-parameter hurdle model is investigated for more efficient estimation of regression parameters relating to overall effects of covariates on the dichotomous outcome, while handling count data with many zeroes. One model part provides a logistic regression containing marginal log odds ratio effects of primary interest, while an ancillary model part describes the mean count of a Poisson or negative binomial process in terms of nuisance regression parameters. Asymptotic efficiency of the logistic model parameter estimators of the two-part models is evaluated with respect to ordinary logistic regression. Simulations are used to assess the properties of the models with respect to power and Type I error, the latter investigated under both misspecified and correctly specified models. The methods are applied to data from a randomized clinical trial of three toothpaste formulations to prevent incident dental caries in a large population of Scottish schoolchildren. © The Author(s) 2014.

  20. The Mapping Model: A Cognitive Theory of Quantitative Estimation

    ERIC Educational Resources Information Center

    von Helversen, Bettina; Rieskamp, Jorg

    2008-01-01

    How do people make quantitative estimations, such as estimating a car's selling price? Traditionally, linear-regression-type models have been used to answer this question. These models assume that people weight and integrate all information available to estimate a criterion. The authors propose an alternative cognitive theory for quantitative…

  1. Incorporation of prior information on parameters into nonlinear regression groundwater flow models: 1. Theory

    USGS Publications Warehouse

    Cooley, Richard L.

    1982-01-01

    Prior information on the parameters of a groundwater flow model can be used to improve parameter estimates obtained from nonlinear regression solution of a modeling problem. Two scales of prior information can be available: (1) prior information having known reliability (that is, bias and random error structure) and (2) prior information consisting of best available estimates of unknown reliability. A regression method that incorporates the second scale of prior information assumes the prior information to be fixed for any particular analysis to produce improved, although biased, parameter estimates. Approximate optimization of two auxiliary parameters of the formulation is used to help minimize the bias, which is almost always much smaller than that resulting from standard ridge regression. It is shown that if both scales of prior information are available, then a combined regression analysis may be made.

  2. Markovian prediction of future values for food grains in the economic survey

    NASA Astrophysics Data System (ADS)

    Sathish, S.; Khadar Babu, S. K.

    2017-11-01

    Nowadays, prediction and forecasting play a vital role in research. For prediction, regression is useful to predict future and current values of a production process. In this paper, we assume that food grain production exhibits Markov chain dependency and time homogeneity. The economic generative performance evaluation the balance time artificial fertilization different level in estrus detection using a daily Markov chain model. Finally, Markov process prediction gives better performance compared with the regression model.

  3. Large-area forest inventory regression modeling: spatial scale considerations

    Treesearch

    James A. Westfall

    2015-01-01

    In many forest inventories, statistical models are employed to predict values for attributes that are difficult and/or time-consuming to measure. In some applications, models are applied across a large geographic area, which assumes the relationship between the response variable and predictors is ubiquitously invariable within the area. The extent to which this...

  4. A hierarchical linear model for tree height prediction.

    Treesearch

    Vicente J. Monleon

    2003-01-01

    Measuring tree height is a time-consuming process. Often, tree diameter is measured and height is estimated from a published regression model. Trees used to develop these models are clustered into stands, but this structure is ignored and independence is assumed. In this study, hierarchical linear models that account explicitly for the clustered structure of the data...
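
    A minimal mixed-model sketch of the idea above, assuming a random intercept per stand so the clustered structure is modelled rather than ignored; the height-diameter form and data are illustrative only.

    ```python
    # Hierarchical (mixed) model: tree height on log diameter with a stand random intercept.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(8)
    stands = np.repeat(np.arange(20), 30)                 # 20 stands, 30 trees each
    stand_eff = rng.normal(0, 2, 20)[stands]              # stand-level random intercepts
    dbh = rng.uniform(10, 80, stands.size)
    height = 5 + 8 * np.log(dbh) + stand_eff + rng.normal(0, 1.5, stands.size)

    df = pd.DataFrame({"height": height, "dbh": dbh, "stand": stands})
    fit = smf.mixedlm("height ~ np.log(dbh)", df, groups=df["stand"]).fit()
    print(fit.summary())
    ```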

  5. Functional mixture regression.

    PubMed

    Yao, Fang; Fu, Yuejiao; Lee, Thomas C M

    2011-04-01

    In functional linear models (FLMs), the relationship between the scalar response and the functional predictor process is often assumed to be identical for all subjects. Motivated by both practical and methodological considerations, we relax this assumption and propose a new class of functional regression models that allow the regression structure to vary for different groups of subjects. By projecting the predictor process onto its eigenspace, the new functional regression model is simplified to a framework that is similar to classical mixture regression models. This leads to the proposed approach named as functional mixture regression (FMR). The estimation of FMR can be readily carried out using existing software implemented for functional principal component analysis and mixture regression. The practical necessity and performance of FMR are illustrated through applications to a longevity analysis of female medflies and a human growth study. Theoretical investigations concerning the consistent estimation and prediction properties of FMR along with simulation experiments illustrating its empirical properties are presented in the supplementary material available at Biostatistics online. Corresponding results demonstrate that the proposed approach could potentially achieve substantial gains over traditional FLMs.

  6. Models of subjective response to in-flight motion data

    NASA Technical Reports Server (NTRS)

    Rudrapatna, A. N.; Jacobson, I. D.

    1973-01-01

    Mathematical relationships between subjective comfort and environmental variables in an air transportation system are investigated. As a first step in model building, only the motion variables are incorporated and sensitivities are obtained using stepwise multiple regression analysis. The data for these models have been collected from commercial passenger flights. Two models are considered. In the first, subjective comfort is assumed to depend on rms values of the six-degrees-of-freedom accelerations. The second assumes a Rustenburg type human response function in obtaining frequency weighted rms accelerations, which are used in a linear model. The form of the human response function is examined and the results yield a human response weighting function for different degrees of freedom.

  7. Using Generalized Additive Models to Analyze Single-Case Designs

    ERIC Educational Resources Information Center

    Shadish, William; Sullivan, Kristynn

    2013-01-01

    Many analyses for single-case designs (SCDs)--including nearly all the effect size indicators-- currently assume no trend in the data. Regression and multilevel models allow for trend, but usually test only linear trend and have no principled way of knowing if higher order trends should be represented in the model. This paper shows how Generalized…

  8. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Butler, W.J.; Kalasinski, L.A.

    In this paper, a generalized logistic regression model for correlated observations is used to analyze epidemiologic data on the frequency of spontaneous abortion among a group of women office workers. The results are compared to those obtained from the use of the standard logistic regression model that assumes statistical independence among all the pregnancies contributed by one woman. In this example, the correlation among pregnancies from the same woman is fairly small and did not have a substantial impact on the magnitude of estimates of parameters of the model. This is due at least partly to the small average number of pregnancies contributed by each woman.
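
    One standard way to relax the independence assumption discussed above is a logistic regression fitted by generalized estimating equations with an exchangeable working correlation within woman; this is a stand-in sketch with simulated data, not the model used in the paper.

    ```python
    # GEE logistic regression accounting for correlation among pregnancies of the same woman.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(9)
    n_women, n_preg = 300, 4
    woman = np.repeat(np.arange(n_women), n_preg)
    exposed = np.repeat(rng.integers(0, 2, n_women), n_preg)
    woman_eff = np.repeat(rng.normal(0, 0.3, n_women), n_preg)   # induces within-woman correlation
    p = 1 / (1 + np.exp(-(-1.5 + 0.4 * exposed + woman_eff)))
    abortion = rng.binomial(1, p)

    fit = sm.GEE(abortion, sm.add_constant(exposed), groups=woman,
                 family=sm.families.Binomial(),
                 cov_struct=sm.cov_struct.Exchangeable()).fit()
    print(fit.params)
    ```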

  9. Analysis of Learning Curve Fitting Techniques.

    DTIC Science & Technology

    1987-09-01

    1986. 15. Neter, John and others. Applied Linear Regression Models. Homewood IL: Irwin, 19-33. 16. SAS User's Guide: Basics, Version 5 Edition. SAS...Linear Regression Techniques (15:23-52). Random errors are assumed to be normally distributed when using ordinary least-squares, according to Johnston...lot estimated by the improvement curve formula. For a more detailed explanation of the ordinary least-squares technique, see Neter, et al., Applied

  10. Modeling health survey data with excessive zero and K responses.

    PubMed

    Lin, Ting Hsiang; Tsai, Min-Hsiao

    2013-04-30

    Zero-inflated Poisson regression is a popular tool used to analyze data with excessive zeros. Although much work has already been performed to fit zero-inflated data, most models heavily depend on special features of the individual data. To be specific, this means that there is a sizable group of respondents who endorse the same answers, making the data have peaks. In this paper, we propose a new model with the flexibility to model excessive counts other than zero, and the model is a mixture of multinomial logistic and Poisson regression, in which the multinomial logistic component models the occurrence of excessive counts, including zeros, K (where K is a positive integer) and all other values. The Poisson regression component models the counts that are assumed to follow a Poisson distribution. Two examples are provided to illustrate our models when the data have counts containing many ones and sixes. As a result, the zero-inflated and K-inflated models exhibit a better fit than the zero-inflated Poisson and standard Poisson regressions. Copyright © 2012 John Wiley & Sons, Ltd.

  11. An Analysis of San Diego's Housing Market Using a Geographically Weighted Regression Approach

    NASA Astrophysics Data System (ADS)

    Grant, Christina P.

    San Diego County real estate transaction data was evaluated with a set of linear models calibrated by ordinary least squares and geographically weighted regression (GWR). The goal of the analysis was to determine whether the spatial effects assumed to be in the data are best studied globally with no spatial terms, globally with a fixed effects submarket variable, or locally with GWR. 18,050 single-family residential sales which closed in the six months between April 2014 and September 2014 were used in the analysis. Diagnostic statistics including AICc, R2, Global Moran's I, and visual inspection of diagnostic plots and maps indicate superior model performance by GWR as compared to both global regressions.

  12. Identifying Autocorrelation Generated by Various Error Processes in Interrupted Time-Series Regression Designs: A Comparison of AR1 and Portmanteau Tests

    ERIC Educational Resources Information Center

    Huitema, Bradley E.; McKean, Joseph W.

    2007-01-01

    Regression models used in the analysis of interrupted time-series designs assume statistically independent errors. Four methods of evaluating this assumption are the Durbin-Watson (D-W), Huitema-McKean (H-M), Box-Pierce (B-P), and Ljung-Box (L-B) tests. These tests were compared with respect to Type I error and power under a wide variety of error…
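
    Two of the four tests mentioned, Durbin-Watson and the Ljung-Box portmanteau, can be applied to interrupted time-series regression residuals as sketched below (simulated series; the Huitema-McKean and Box-Pierce variants are not shown).

    ```python
    # Checking the independent-errors assumption of an interrupted time-series regression.
    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.stattools import durbin_watson
    from statsmodels.stats.diagnostic import acorr_ljungbox

    rng = np.random.default_rng(10)
    n, t0 = 100, 50
    t = np.arange(n)
    y = 10 + 0.05 * t + 2.0 * (t >= t0) + rng.normal(0, 1, n)   # trend + level shift

    X = sm.add_constant(np.column_stack([t, (t >= t0).astype(float)]))
    resid = sm.OLS(y, X).fit().resid

    print("Durbin-Watson:", durbin_watson(resid))
    print(acorr_ljungbox(resid, lags=[10]))     # Ljung-Box portmanteau test
    ```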

  13. Linear regression analysis of survival data with missing censoring indicators.

    PubMed

    Wang, Qihua; Dinse, Gregg E

    2011-04-01

    Linear regression analysis has been studied extensively in a random censorship setting, but typically all of the censoring indicators are assumed to be observed. In this paper, we develop synthetic data methods for estimating regression parameters in a linear model when some censoring indicators are missing. We define estimators based on regression calibration, imputation, and inverse probability weighting techniques, and we prove all three estimators are asymptotically normal. The finite-sample performance of each estimator is evaluated via simulation. We illustrate our methods by assessing the effects of sex and age on the time to non-ambulatory progression for patients in a brain cancer clinical trial.

  14. Issues and Importance of "Good" Starting Points for Nonlinear Regression for Mathematical Modeling with Maple: Basic Model Fitting to Make Predictions with Oscillating Data

    ERIC Educational Resources Information Center

    Fox, William

    2012-01-01

    The purpose of our modeling effort is to predict future outcomes. We assume the data collected are both accurate and relatively precise. For our oscillating data, we examined several mathematical modeling forms for predictions. We also examined both ignoring the oscillations as an important feature and including the oscillations as an important…
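
    The same starting-point issue can be illustrated in Python rather than Maple: a sinusoidal model fit to oscillating data converges reliably only when rough initial estimates of amplitude, frequency, and mean are supplied (assumed model form and simulated data).

    ```python
    # Nonlinear fit of oscillating data with "good" starting points read off the data.
    import numpy as np
    from scipy.optimize import curve_fit

    rng = np.random.default_rng(11)
    t = np.linspace(0, 10, 200)
    y = 3.0 + 2.0 * np.sin(2 * np.pi * 0.8 * t + 0.5) + rng.normal(0, 0.3, 200)

    model = lambda t, a, f, phi, c: a * np.sin(2 * np.pi * f * t + phi) + c

    # Rough starting points: amplitude from the range, frequency from the FFT peak, mean level
    amp0 = (y.max() - y.min()) / 2
    freq0 = np.argmax(np.abs(np.fft.rfft(y - y.mean()))) / (t[-1] - t[0])
    p0 = (amp0, freq0, 0.0, y.mean())

    params, _ = curve_fit(model, t, y, p0=p0)
    print(params)
    ```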

  15. Genomic-Enabled Prediction of Ordinal Data with Bayesian Logistic Ordinal Regression.

    PubMed

    Montesinos-López, Osval A; Montesinos-López, Abelardo; Crossa, José; Burgueño, Juan; Eskridge, Kent

    2015-08-18

    Most genomic-enabled prediction models developed so far assume that the response variable is continuous and normally distributed. The exception is the probit model, developed for ordered categorical phenotypes. In statistical applications, because of the easy implementation of the Bayesian probit ordinal regression (BPOR) model, Bayesian logistic ordinal regression (BLOR) is implemented rarely in the context of genomic-enabled prediction [sample size (n) is much smaller than the number of parameters (p)]. For this reason, in this paper we propose a BLOR model using the Pólya-Gamma data augmentation approach that produces a Gibbs sampler with similar full conditional distributions of the BPOR model and with the advantage that the BPOR model is a particular case of the BLOR model. We evaluated the proposed model by using simulation and two real data sets. Results indicate that our BLOR model is a good alternative for analyzing ordinal data in the context of genomic-enabled prediction with the probit or logit link. Copyright © 2015 Montesinos-López et al.

  16. Penalized spline estimation for functional coefficient regression models.

    PubMed

    Cao, Yanrong; Lin, Haiqun; Wu, Tracy Z; Yu, Yan

    2010-04-01

    The functional coefficient regression models assume that the regression coefficients vary with some "threshold" variable, providing appreciable flexibility in capturing the underlying dynamics in data and avoiding the so-called "curse of dimensionality" in multivariate nonparametric estimation. We first investigate the estimation, inference, and forecasting for the functional coefficient regression models with dependent observations via penalized splines. The P-spline approach, as a direct ridge regression shrinkage type global smoothing method, is computationally efficient and stable. With established fixed-knot asymptotics, inference is readily available. Exact inference can be obtained for fixed smoothing parameter λ, which is most appealing for finite samples. Our penalized spline approach gives an explicit model expression, which also enables multi-step-ahead forecasting via simulations. Furthermore, we examine different methods of choosing the important smoothing parameter λ: modified multi-fold cross-validation (MCV), generalized cross-validation (GCV), and an extension of empirical bias bandwidth selection (EBBS) to P-splines. In addition, we implement smoothing parameter selection using mixed model framework through restricted maximum likelihood (REML) for P-spline functional coefficient regression models with independent observations. The P-spline approach also easily allows different smoothness for different functional coefficients, which is enabled by assigning different penalty λ accordingly. We demonstrate the proposed approach by both simulation examples and a real data application.

  17. Multilevel covariance regression with correlated random effects in the mean and variance structure.

    PubMed

    Quintero, Adrian; Lesaffre, Emmanuel

    2017-09-01

    Multivariate regression methods generally assume a constant covariance matrix for the observations. In case a heteroscedastic model is needed, the parametric and nonparametric covariance regression approaches can be restrictive in the literature. We propose a multilevel regression model for the mean and covariance structure, including random intercepts in both components and allowing for correlation between them. The implied conditional covariance function can be different across clusters as a result of the random effect in the variance structure. In addition, allowing for correlation between the random intercepts in the mean and covariance makes the model convenient for skewedly distributed responses. Furthermore, it permits us to analyse directly the relation between the mean response level and the variability in each cluster. Parameter estimation is carried out via Gibbs sampling. We compare the performance of our model to other covariance modelling approaches in a simulation study. Finally, the proposed model is applied to the RN4CAST dataset to identify the variables that impact burnout of nurses in Belgium. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  18. Regression-assisted deconvolution.

    PubMed

    McIntyre, Julie; Stefanski, Leonard A

    2011-06-30

    We present a semi-parametric deconvolution estimator for the density function of a random variable X that is measured with error, a common challenge in many epidemiological studies. Traditional deconvolution estimators rely only on assumptions about the distribution of X and the error in its measurement, and ignore information available in auxiliary variables. Our method assumes the availability of a covariate vector statistically related to X by a mean-variance function regression model, where regression errors are normally distributed and independent of the measurement errors. Simulations suggest that the estimator achieves a much lower integrated squared error than the observed-data kernel density estimator when models are correctly specified and the assumption of normal regression errors is met. We illustrate the method using anthropometric measurements of newborns to estimate the density function of newborn length. Copyright © 2011 John Wiley & Sons, Ltd.

  19. Regression mixture models: Does modeling the covariance between independent variables and latent classes improve the results?

    PubMed Central

    Lamont, Andrea E.; Vermunt, Jeroen K.; Van Horn, M. Lee

    2016-01-01

    Regression mixture models are increasingly used as an exploratory approach to identify heterogeneity in the effects of a predictor on an outcome. In this simulation study, we test the effects of violating an implicit assumption often made in these models – i.e., independent variables in the model are not directly related to latent classes. Results indicated that the major risk of failing to model the relationship between predictor and latent class was an increase in the probability of selecting additional latent classes and biased class proportions. Additionally, this study tests whether regression mixture models can detect a piecewise relationship between a predictor and outcome. Results suggest that these models are able to detect piecewise relations, but only when the relationship between the latent class and the predictor is included in model estimation. We illustrate the implications of making this assumption through a re-analysis of applied data examining heterogeneity in the effects of family resources on academic achievement. We compare previous results (which assumed no relation between independent variables and latent class) to the model where this assumption is lifted. Implications and analytic suggestions for conducting regression mixture based on these findings are noted. PMID:26881956

  20. The use of cognitive ability measures as explanatory variables in regression analysis.

    PubMed

    Junker, Brian; Schofield, Lynne Steuerle; Taylor, Lowell J

    2012-12-01

    Cognitive ability measures are often taken as explanatory variables in regression analysis, e.g., as a factor affecting a market outcome such as an individual's wage, or a decision such as an individual's education acquisition. Cognitive ability is a latent construct; its true value is unobserved. Nonetheless, researchers often assume that a test score, constructed via standard psychometric practice from individuals' responses to test items, can be safely used in regression analysis. We examine problems that can arise, and suggest that an alternative approach, a "mixed effects structural equations" (MESE) model, may be more appropriate in many circumstances.

  1. Characterizing nonconstant instrumental variance in emerging miniaturized analytical techniques.

    PubMed

    Noblitt, Scott D; Berg, Kathleen E; Cate, David M; Henry, Charles S

    2016-04-07

    Measurement variance is a crucial aspect of quantitative chemical analysis. Variance directly affects important analytical figures of merit, including detection limit, quantitation limit, and confidence intervals. Most reported analyses for emerging analytical techniques implicitly assume constant variance (homoskedasticity) by using unweighted regression calibrations. Despite the assumption of constant variance, it is known that most instruments exhibit heteroskedasticity, where variance changes with signal intensity. Ignoring nonconstant variance results in suboptimal calibrations, invalid uncertainty estimates, and incorrect detection limits. Three techniques where homoskedasticity is often assumed were covered in this work to evaluate if heteroskedasticity had a significant quantitative impact: naked-eye, distance-based detection using paper-based analytical devices (PADs), cathodic stripping voltammetry (CSV) with disposable carbon-ink electrode devices, and microchip electrophoresis (MCE) with conductivity detection. Despite these techniques representing a wide range of chemistries and precision, heteroskedastic behavior was confirmed for each. The general variance forms were analyzed, and recommendations for accounting for nonconstant variance discussed. Monte Carlo simulations of instrument responses were performed to quantify the benefits of weighted regression, and the sensitivity to uncertainty in the variance function was tested. Results show that heteroskedasticity should be considered during development of new techniques; even moderate uncertainty (30%) in the variance function still results in weighted regression outperforming unweighted regressions. We recommend utilizing the power model of variance because it is easy to apply, requires little additional experimentation, and produces higher-precision results and more reliable uncertainty estimates than assuming homoskedasticity. Copyright © 2016 Elsevier B.V. All rights reserved.
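
    A sketch of the recommendation above, assuming simulated calibration data: estimate a power model for the variance from replicate statistics, then use its reciprocal as weights in a weighted regression calibration.

    ```python
    # Power model of variance (var = a * mean^b) feeding a weighted calibration regression.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(12)
    conc = np.repeat(np.array([1.0, 2, 5, 10, 20, 50]), 10)          # replicate standards
    signal = 3.0 * conc + rng.normal(0, 0.1 * (3.0 * conc) ** 0.8)   # heteroskedastic response

    # Estimate the power variance model from replicate means and variances
    levels = np.unique(conc)
    means = np.array([signal[conc == c].mean() for c in levels])
    variances = np.array([signal[conc == c].var(ddof=1) for c in levels])
    b, log_a = np.polyfit(np.log(means), np.log(variances), 1)

    # Weight by the inverse of the modelled variance at the fitted signal level
    pred = sm.OLS(signal, sm.add_constant(conc)).fit().fittedvalues
    weights = 1.0 / (np.exp(log_a) * np.clip(pred, 1e-6, None) ** b)
    fit = sm.WLS(signal, sm.add_constant(conc), weights=weights).fit()
    print(fit.params)
    ```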

  2. Multinomial logistic regression modelling of obesity and overweight among primary school students in a rural area of Negeri Sembilan

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ghazali, Amirul Syafiq Mohd; Ali, Zalila; Noor, Norlida Mohd

    Multinomial logistic regression is widely used to model the outcomes of a polytomous response variable, a categorical dependent variable with more than two categories. The model assumes that the conditional mean of the dependent categorical variables is the logistic function of an affine combination of predictor variables. Its procedure gives a number of logistic regression models that make specific comparisons of the response categories. When there are q categories of the response variable, the model consists of q-1 logit equations which are fitted simultaneously. The model is validated by variable selection procedures, tests of regression coefficients, a significant test of the overall model, goodness-of-fit measures, and validation of predicted probabilities using odds ratio. This study used the multinomial logistic regression model to investigate obesity and overweight among primary school students in a rural area on the basis of their demographic profiles, lifestyles and on the diet and food intake. The results indicated that obesity and overweight of students are related to gender, religion, sleep duration, time spent on electronic games, breakfast intake in a week, with whom meals are taken, protein intake, and also, the interaction between breakfast intake in a week with sleep duration, and the interaction between gender and protein intake.

  3. Multinomial logistic regression modelling of obesity and overweight among primary school students in a rural area of Negeri Sembilan

    NASA Astrophysics Data System (ADS)

    Ghazali, Amirul Syafiq Mohd; Ali, Zalila; Noor, Norlida Mohd; Baharum, Adam

    2015-10-01

    Multinomial logistic regression is widely used to model the outcomes of a polytomous response variable, a categorical dependent variable with more than two categories. The model assumes that the conditional mean of the dependent categorical variables is the logistic function of an affine combination of predictor variables. Its procedure gives a number of logistic regression models that make specific comparisons of the response categories. When there are q categories of the response variable, the model consists of q-1 logit equations which are fitted simultaneously. The model is validated by variable selection procedures, tests of regression coefficients, a significant test of the overall model, goodness-of-fit measures, and validation of predicted probabilities using odds ratio. This study used the multinomial logistic regression model to investigate obesity and overweight among primary school students in a rural area on the basis of their demographic profiles, lifestyles and on the diet and food intake. The results indicated that obesity and overweight of students are related to gender, religion, sleep duration, time spent on electronic games, breakfast intake in a week, with whom meals are taken, protein intake, and also, the interaction between breakfast intake in a week with sleep duration, and the interaction between gender and protein intake.
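
    A minimal multinomial logit sketch matching the description (q categories, q-1 logit equations fitted simultaneously); the predictors and data below are invented stand-ins, not the study's survey variables.

    ```python
    # Multinomial logistic regression: q-1 logit equations fitted simultaneously.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(13)
    n = 600
    sleep = rng.uniform(5, 10, n)                 # hours of sleep (hypothetical predictor)
    games = rng.uniform(0, 4, n)                  # hours of electronic games (hypothetical)
    # 0 = normal weight, 1 = overweight, 2 = obese (simulated outcome)
    score = -1.0 - 0.3 * sleep + 0.6 * games + rng.logistic(0, 1, n)
    y = np.digitize(score, [-3.0, -1.5])

    X = sm.add_constant(np.column_stack([sleep, games]))
    fit = sm.MNLogit(y, X).fit(disp=0)
    print(fit.params)        # one column of coefficients per non-reference category
    ```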

  4. An approach to checking case-crossover analyses based on equivalence with time-series methods.

    PubMed

    Lu, Yun; Symons, James Morel; Geyh, Alison S; Zeger, Scott L

    2008-03-01

    The case-crossover design has been increasingly applied to epidemiologic investigations of acute adverse health effects associated with ambient air pollution. The correspondence of the design to that of matched case-control studies makes it inferentially appealing for epidemiologic studies. Case-crossover analyses generally use conditional logistic regression modeling. This technique is equivalent to time-series log-linear regression models when there is a common exposure across individuals, as in air pollution studies. Previous methods for obtaining unbiased estimates for case-crossover analyses have assumed that time-varying risk factors are constant within reference windows. In this paper, we rely on the connection between case-crossover and time-series methods to illustrate model-checking procedures from log-linear model diagnostics for time-stratified case-crossover analyses. Additionally, we compare the relative performance of the time-stratified case-crossover approach to time-series methods under 3 simulated scenarios representing different temporal patterns of daily mortality associated with air pollution in Chicago, Illinois, during 1995 and 1996. Whenever a model, be it time-series or case-crossover, fails to account appropriately for fluctuations in time that confound the exposure, the effect estimate will be biased. It is therefore important to perform model-checking in time-stratified case-crossover analyses rather than assume the estimator is unbiased.

  5. Seemingly unrelated intervention time series models for effectiveness evaluation of large scale environmental remediation.

    PubMed

    Ip, Ryan H L; Li, W K; Leung, Kenneth M Y

    2013-09-15

    Large scale environmental remediation projects applied to sea water always involve large amount of capital investments. Rigorous effectiveness evaluations of such projects are, therefore, necessary and essential for policy review and future planning. This study aims at investigating effectiveness of environmental remediation using three different Seemingly Unrelated Regression (SUR) time series models with intervention effects, including Model (1) assuming no correlation within and across variables, Model (2) assuming no correlation across variable but allowing correlations within variable across different sites, and Model (3) allowing all possible correlations among variables (i.e., an unrestricted model). The results suggested that the unrestricted SUR model is the most reliable one, consistently having smallest variations of the estimated model parameters. We discussed our results with reference to marine water quality management in Hong Kong while bringing managerial issues into consideration. Copyright © 2013 Elsevier Ltd. All rights reserved.

  6. Models for predicting the mass of lime fruits by some engineering properties.

    PubMed

    Miraei Ashtiani, Seyed-Hassan; Baradaran Motie, Jalal; Emadi, Bagher; Aghkhani, Mohammad-Hosein

    2014-11-01

    Grading fruits based on mass is important in packaging; it reduces waste and increases the marketing value of agricultural produce. The aim of this study was mass modeling of two major cultivars of Iranian limes based on engineering attributes. Models were classified into three groups: (1) single and multiple variable regressions of lime mass and dimensional characteristics; (2) single and multiple variable regressions of lime mass and projected areas; (3) single regression of lime mass based on its actual volume and the volumes calculated assuming ellipsoid and prolate spheroid shapes. All properties considered in the current study were found to be statistically significant (P < 0.01). The results indicated that mass modeling of lime based on minor diameter and first projected area are the most appropriate models in the first and second classifications, respectively. In the third classification, the best model was obtained on the basis of the prolate spheroid volume. It was finally concluded that the suitable grading system of lime mass is based on prolate spheroid volume.
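
    The third model class can be made concrete as follows: compute the prolate-spheroid volume from the major diameter L and minor diameter D, V = (π/6)·L·D², and regress fruit mass on that volume (the dimensions below are simulated, not measured limes).

    ```python
    # Mass-volume regression using the prolate spheroid approximation of fruit shape.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(16)
    L = rng.uniform(45, 65, 100)                   # major diameter, mm
    D = rng.uniform(40, 55, 100)                   # minor diameter, mm
    V = (np.pi / 6) * L * D**2 / 1000              # prolate spheroid volume, cm^3
    mass = 0.95 * V + rng.normal(0, 2, 100)        # fruit mass, g (simulated)

    fit = sm.OLS(mass, sm.add_constant(V)).fit()
    print(fit.params)
    ```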

  7. A simple linear regression method for quantitative trait loci linkage analysis with censored observations.

    PubMed

    Anderson, Carl A; McRae, Allan F; Visscher, Peter M

    2006-07-01

    Standard quantitative trait loci (QTL) mapping techniques commonly assume that the trait is both fully observed and normally distributed. When considering survival or age-at-onset traits these assumptions are often incorrect. Methods have been developed to map QTL for survival traits; however, they are both computationally intensive and not available in standard genome analysis software packages. We propose a grouped linear regression method for the analysis of continuous survival data. Using simulation we compare this method to both the Cox and Weibull proportional hazards models and a standard linear regression method that ignores censoring. The grouped linear regression method is of equivalent power to both the Cox and Weibull proportional hazards methods and is significantly better than the standard linear regression method when censored observations are present. The method is also robust to the proportion of censored individuals and the underlying distribution of the trait. On the basis of linear regression methodology, the grouped linear regression model is computationally simple and fast and can be implemented readily in freely available statistical software.

  8. Development of a Multiple Linear Regression Model to Forecast Facility Electrical Consumption at an Air Force Base.

    DTIC Science & Technology

    1981-09-01

    corresponds to the same square footage that consumed the electrical energy. 3. The basic assumptions of multiple linear regression, as enumerated in...7. Data related to the sample of bases are assumed to be representative of bases in the population. Limitations Basic limitations on this research were...Ratemaking -- Overview. Rand Report R-5894, Santa Monica CA, May 1977. Chatterjee, Samprit, and Bertram Price. Regression Analysis by Example. New York: John

  9. Accounting for groundwater in stream fish thermal habitat responses to climate change

    USGS Publications Warehouse

    Snyder, Craig D.; Hitt, Nathaniel P.; Young, John A.

    2015-01-01

    Forecasting climate change effects on aquatic fauna and their habitat requires an understanding of how water temperature responds to changing air temperature (i.e., thermal sensitivity). Previous efforts to forecast climate effects on brook trout habitat have generally assumed uniform air-water temperature relationships over large areas that cannot account for groundwater inputs and other processes that operate at finer spatial scales. We developed regression models that accounted for groundwater influences on thermal sensitivity from measured air-water temperature relationships within forested watersheds in eastern North America (Shenandoah National Park, USA, 78 sites in 9 watersheds). We used these reach-scale models to forecast climate change effects on stream temperature and brook trout thermal habitat, and compared our results to previous forecasts based upon large-scale models. Observed stream temperatures were generally less sensitive to air temperature than previously assumed, and we attribute this to the moderating effect of shallow groundwater inputs. Predicted groundwater temperatures from air-water regression models corresponded well to observed groundwater temperatures elsewhere in the study area. Predictions of brook trout future habitat loss derived from our fine-grained models were far less pessimistic than those from prior models developed at coarser spatial resolutions. However, our models also revealed spatial variation in thermal sensitivity within and among catchments resulting in a patchy distribution of thermally suitable habitat. Habitat fragmentation due to thermal barriers therefore may have an increasingly important role for trout population viability in headwater streams. Our results demonstrate that simple adjustments to air-water temperature regression models can provide a powerful and cost-effective approach for predicting future stream temperatures while accounting for effects of groundwater.

  10. Notes on power of normality tests of error terms in regression models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Střelec, Luboš

    2015-03-10

    Normality is one of the basic assumptions in applying statistical procedures. For example, in linear regression most of the inferential procedures are based on the assumption of normality, i.e. the disturbance vector is assumed to be normally distributed. Failure to assess non-normality of the error terms may lead to incorrect results of usual statistical inference techniques such as t-test or F-test. Thus, error terms should be normally distributed in order to allow us to make exact inferences. As a consequence, normally distributed stochastic errors are necessary in order to make inferences that are not misleading, which explains the necessity and importance of robust tests of normality. Therefore, the aim of this contribution is to discuss normality testing of error terms in regression models. In this contribution, we introduce the general RT class of robust tests for normality, and present and discuss the trade-off between power and robustness of selected classical and robust normality tests of error terms in regression models.
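
    The paper's RT class of robust tests is not available in standard libraries, but the basic residual-normality check it builds on can be sketched with Shapiro-Wilk and Jarque-Bera tests applied to regression residuals (simulated heavy-tailed errors).

    ```python
    # Normality tests on the residuals of a fitted linear regression.
    import numpy as np
    import statsmodels.api as sm
    from scipy import stats

    rng = np.random.default_rng(14)
    x = rng.uniform(0, 10, 200)
    y = 2.0 + 1.5 * x + rng.standard_t(df=3, size=200)   # heavy-tailed error terms

    resid = sm.OLS(y, sm.add_constant(x)).fit().resid
    print("Shapiro-Wilk:", stats.shapiro(resid))
    print("Jarque-Bera:", stats.jarque_bera(resid))
    ```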

  11. Regional regression of flood characteristics employing historical information

    USGS Publications Warehouse

    Tasker, Gary D.; Stedinger, J.R.

    1987-01-01

    Streamflow gauging networks provide hydrologic information for use in estimating the parameters of regional regression models. The regional regression models can be used to estimate flood statistics, such as the 100 yr peak, at ungauged sites as functions of drainage basin characteristics. A recent innovation in regional regression is the use of a generalized least squares (GLS) estimator that accounts for unequal station record lengths and sample cross correlation among the flows. However, this technique does not account for historical flood information. A method is proposed here to adjust this generalized least squares estimator to account for possible information about historical floods available at some stations in a region. The historical information is assumed to be in the form of observations of all peaks above a threshold during a long period outside the systematic record period. A Monte Carlo simulation experiment was performed to compare the GLS estimator adjusted for historical floods with the unadjusted GLS estimator and the ordinary least squares estimator. Results indicate that using the GLS estimator adjusted for historical information significantly improves the regression model. © 1987.

  12. Quantifying the statistical importance of utilizing regression over classic energy intensity calculations for tracking efficiency improvements in industry

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nimbalkar, Sachin U.; Wenning, Thomas J.; Guo, Wei

    In the United States, manufacturing facilities account for about 32% of total domestic energy consumption in 2014. Robust energy tracking methodologies are critical to understanding energy performance in manufacturing facilities. Due to its simplicity and intuitiveness, the classic energy intensity method (i.e. the ratio of total energy use over total production) is the most widely adopted. However, the classic energy intensity method does not take into account the variation of other relevant parameters (i.e. product type, feed stock type, weather, etc.). Furthermore, the energy intensity method assumes that the facilities’ base energy consumption (energy use at zero production) is zero, which rarely holds true. Therefore, it is commonly recommended to utilize regression models rather than the energy intensity approach for tracking improvements at the facility level. Unfortunately, many energy managers have difficulties understanding why regression models are statistically better than utilizing the classic energy intensity method. While anecdotes and qualitative information may convince some, many have major reservations about the accuracy of regression models and whether it is worth the time and effort to gather data and build quality regression models. This paper will explain why regression models are theoretically and quantitatively more accurate for tracking energy performance improvements. Based on the analysis of data from 114 manufacturing plants over 12 years, this paper will present quantitative results on the importance of utilizing regression models over the energy intensity methodology. This paper will also document scenarios where regression models do not have significant relevance over the energy intensity method.
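
    A toy comparison of the two tracking approaches described above, with simulated facility data: the classic intensity ratio forces the fit through the origin, while a regression with an intercept captures the non-zero base load.

    ```python
    # Classic energy intensity vs. regression with an intercept (base load).
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(15)
    production = rng.uniform(200, 1000, 36)                      # monthly production units
    energy = 5000 + 4.0 * production + rng.normal(0, 300, 36)    # non-zero base load

    intensity = energy.sum() / production.sum()                  # classic energy intensity
    reg = sm.OLS(energy, sm.add_constant(production)).fit()

    print("energy intensity (energy per unit):", intensity)
    print("regression base load and marginal rate:", reg.params)
    ```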

  13. The Heteroscedastic Graded Response Model with a Skewed Latent Trait: Testing Statistical and Substantive Hypotheses Related to Skewed Item Category Functions

    ERIC Educational Resources Information Center

    Molenaar, Dylan; Dolan, Conor V.; de Boeck, Paul

    2012-01-01

    The Graded Response Model (GRM; Samejima, "Estimation of ability using a response pattern of graded scores," Psychometric Monograph No. 17, Richmond, VA: The Psychometric Society, 1969) can be derived by assuming a linear regression of a continuous variable, Z, on the trait, [theta], to underlie the ordinal item scores (Takane & de Leeuw in…

  14. A quantile regression model for failure-time data with time-dependent covariates

    PubMed Central

    Gorfine, Malka; Goldberg, Yair; Ritov, Ya’acov

    2017-01-01

    Summary Since survival data occur over time, often important covariates that we wish to consider also change over time. Such covariates are referred to as time-dependent covariates. Quantile regression offers flexible modeling of survival data by allowing the covariates to vary with quantiles. This article provides a novel quantile regression model accommodating time-dependent covariates, for analyzing survival data subject to right censoring. Our simple estimation technique assumes the existence of instrumental variables. In addition, we present a doubly-robust estimator in the sense of Robins and Rotnitzky (1992, Recovery of information and adjustment for dependent censoring using surrogate markers. In: Jewell, N. P., Dietz, K. and Farewell, V. T. (editors), AIDS Epidemiology. Boston: Birkhäuser, pp. 297–331.). The asymptotic properties of the estimators are rigorously studied. Finite-sample properties are demonstrated by a simulation study. The utility of the proposed methodology is demonstrated using the Stanford heart transplant dataset. PMID:27485534

  15. The use of cognitive ability measures as explanatory variables in regression analysis

    PubMed Central

    Junker, Brian; Schofield, Lynne Steuerle; Taylor, Lowell J

    2015-01-01

    Cognitive ability measures are often taken as explanatory variables in regression analysis, e.g., as a factor affecting a market outcome such as an individual’s wage, or a decision such as an individual’s education acquisition. Cognitive ability is a latent construct; its true value is unobserved. Nonetheless, researchers often assume that a test score, constructed via standard psychometric practice from individuals’ responses to test items, can be safely used in regression analysis. We examine problems that can arise, and suggest that an alternative approach, a “mixed effects structural equations” (MESE) model, may be more appropriate in many circumstances. PMID:26998417

  16. Goodness-of-fit tests and model diagnostics for negative binomial regression of RNA sequencing data.

    PubMed

    Mi, Gu; Di, Yanming; Schafer, Daniel W

    2015-01-01

    This work is about assessing model adequacy for negative binomial (NB) regression, particularly (1) assessing the adequacy of the NB assumption, and (2) assessing the appropriateness of models for NB dispersion parameters. Tools for the first are appropriate for NB regression generally; those for the second are primarily intended for RNA sequencing (RNA-Seq) data analysis. The typically small number of biological samples and large number of genes in RNA-Seq analysis motivate us to address the trade-offs between robustness and statistical power using NB regression models. One widely-used power-saving strategy, for example, is to assume some commonalities of NB dispersion parameters across genes via simple models relating them to mean expression rates, and many such models have been proposed. As RNA-Seq analysis is becoming ever more popular, it is appropriate to make more thorough investigations into power and robustness of the resulting methods, and into practical tools for model assessment. In this article, we propose simulation-based statistical tests and diagnostic graphics to address model adequacy. We provide simulated and real data examples to illustrate that our proposed methods are effective for detecting the misspecification of the NB mean-variance relationship as well as judging the adequacy of fit of several NB dispersion models.

  17. Non-proportional odds multivariate logistic regression of ordinal family data.

    PubMed

    Zaloumis, Sophie G; Scurrah, Katrina J; Harrap, Stephen B; Ellis, Justine A; Gurrin, Lyle C

    2015-03-01

    Methods to examine whether genetic and/or environmental sources can account for the residual variation in ordinal family data usually assume proportional odds. However, standard software to fit the non-proportional odds model to ordinal family data is limited because the correlation structure of family data is more complex than for other types of clustered data. To perform these analyses we propose the non-proportional odds multivariate logistic regression model and take a simulation-based approach to model fitting using Markov chain Monte Carlo methods, such as partially collapsed Gibbs sampling and the Metropolis algorithm. We applied the proposed methodology to male pattern baldness data from the Victorian Family Heart Study. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  18. Predicting spatio-temporal failure in large scale observational and micro scale experimental systems

    NASA Astrophysics Data System (ADS)

    de las Heras, Alejandro; Hu, Yong

    2006-10-01

    Forecasting has become an essential part of modern thought, but the practical limitations are still manifold. We addressed future rates of change by comparing models that take into account time, and models that focus more on space. Cox regression confirmed that linear change can be safely assumed in the short-term. Spatially explicit Poisson regression provided a ceiling value for the number of deforestation spots. With several observed and estimated rates, it was decided to forecast using the more robust assumptions. A Markov-chain cellular automaton thus projected 5-year deforestation in the Amazonian Arc of Deforestation, showing that even a stable rate of change would largely deplete the forest area. More generally, resolution and implementation of the existing models could explain many of the modelling difficulties still affecting forecasting.

  19. “Smooth” Semiparametric Regression Analysis for Arbitrarily Censored Time-to-Event Data

    PubMed Central

    Zhang, Min; Davidian, Marie

    2008-01-01

    Summary A general framework for regression analysis of time-to-event data subject to arbitrary patterns of censoring is proposed. The approach is relevant when the analyst is willing to assume that distributions governing model components that are ordinarily left unspecified in popular semiparametric regression models, such as the baseline hazard function in the proportional hazards model, have densities satisfying mild “smoothness” conditions. Densities are approximated by a truncated series expansion that, for fixed degree of truncation, results in a “parametric” representation, which makes likelihood-based inference coupled with adaptive choice of the degree of truncation, and hence flexibility of the model, computationally and conceptually straightforward with data subject to any pattern of censoring. The formulation allows popular models, such as the proportional hazards, proportional odds, and accelerated failure time models, to be placed in a common framework; provides a principled basis for choosing among them; and renders useful extensions of the models straightforward. The utility and performance of the methods are demonstrated via simulations and by application to data from time-to-event studies. PMID:17970813

  20. Practical application of cure mixture model for long-term censored survivor data from a withdrawal clinical trial of patients with major depressive disorder.

    PubMed

    Arano, Ichiro; Sugimoto, Tomoyuki; Hamasaki, Toshimitsu; Ohno, Yuko

    2010-04-23

    Survival analysis methods such as the Kaplan-Meier method, log-rank test, and Cox proportional hazards regression (Cox regression) are commonly used to analyze data from randomized withdrawal studies in patients with major depressive disorder. However, unfortunately, such common methods may be inappropriate when a long-term censored relapse-free time appears in the data, as the methods assume that if complete follow-up were possible for all individuals, each would eventually experience the event of interest. In this paper, to analyse data including such a long-term censored relapse-free time, we discuss a semi-parametric cure regression (Cox cure regression), which combines a logistic formulation for the probability of occurrence of an event with a Cox proportional hazards specification for the time of occurrence of the event. In specifying the treatment's effect on disease-free survival, we consider the fraction of long-term survivors and the risks associated with a relapse of the disease. In addition, we develop a tree-based method for the time to event data to identify groups of patients with differing prognoses (cure survival CART). Although analysis methods typically adapt the log-rank statistic for recursive partitioning procedures, the method applied here used a likelihood ratio (LR) test statistic from a fitting of cure survival regression assuming exponential and Weibull distributions for the latency time of relapse. The method is illustrated using data from a sertraline randomized withdrawal study in patients with major depressive disorder. We concluded that Cox cure regression reveals who may be cured and how the treatment and other factors affect the cure incidence and the relapse time of uncured patients, and that cure survival CART output provides easily understandable and interpretable information, useful both in identifying groups of patients with differing prognoses and in utilizing Cox cure regression models leading to meaningful interpretations.

  1. Security of statistical data bases: invasion of privacy through attribute correlational modeling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Palley, M.A.

    This study develops, defines, and applies a statistical technique for the compromise of confidential information in a statistical data base. Attribute Correlational Modeling (ACM) recognizes that the information contained in a statistical data base represents real world statistical phenomena. As such, ACM assumes correlational behavior among the database attributes. ACM proceeds to compromise confidential information through creation of a regression model, where the confidential attribute is treated as the dependent variable. The typical statistical data base may preclude the direct application of regression. In this scenario, the research introduces the notion of a synthetic data base, created through legitimate queries of the actual data base, and through proportional random variation of responses to these queries. The synthetic data base is constructed to resemble the actual data base as closely as possible in a statistical sense. ACM then applies regression analysis to the synthetic data base, and utilizes the derived model to estimate confidential information in the actual database.

  2. Mixed effect Poisson log-linear models for clinical and epidemiological sleep hypnogram data

    PubMed Central

    Swihart, Bruce J.; Caffo, Brian S.; Crainiceanu, Ciprian; Punjabi, Naresh M.

    2013-01-01

    Bayesian Poisson log-linear multilevel models scalable to epidemiological studies are proposed to investigate population variability in sleep state transition rates. Hierarchical random effects are used to account for pairings of subjects and repeated measures within those subjects, as comparing diseased to non-diseased subjects while minimizing bias is of importance. Essentially, non-parametric piecewise constant hazards are estimated and smoothed, allowing for time-varying covariates and segment of the night comparisons. The Bayesian Poisson regression is justified through a re-derivation of a classical algebraic likelihood equivalence of Poisson regression with a log(time) offset and survival regression assuming exponentially distributed survival times. Such re-derivation allows synthesis of two methods currently used to analyze sleep transition phenomena: stratified multi-state proportional hazards models and log-linear models with GEE for transition counts. An example data set from the Sleep Heart Health Study is analyzed. Supplementary material includes the analyzed data set as well as the code for a reproducible analysis. PMID:22241689
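
    A brief sketch of the algebraic equivalence the abstract builds on, assuming statsmodels is available: a Poisson GLM of event indicators with a log(time) offset reproduces the exponential-survival maximum likelihood estimate of the log rate ratio. The simulated data and single binary covariate are illustrative only.

```python
import numpy as np
import statsmodels.api as sm

# Simulate exponential survival times whose rate depends on a binary covariate,
# with independent censoring, then fit a Poisson GLM with a log(time) offset.
rng = np.random.default_rng(2)

n = 2000
x = rng.integers(0, 2, n)                      # binary covariate (e.g. diseased vs not)
rate = np.exp(-1.0 + 0.7 * x)                  # true log-rate-ratio = 0.7
t_event = rng.exponential(1.0 / rate)
t_cens = rng.exponential(5.0, n)               # independent censoring
time = np.minimum(t_event, t_cens)
event = (t_event <= t_cens).astype(float)

# Poisson GLM with offset log(time): coefficients estimate log hazard rates.
X = sm.add_constant(x)
fit = sm.GLM(event, X, family=sm.families.Poisson(), offset=np.log(time)).fit()

# Closed-form exponential-model MLE of the log-rate-ratio: log((d1/T1)/(d0/T0)).
d1, T1 = event[x == 1].sum(), time[x == 1].sum()
d0, T0 = event[x == 0].sum(), time[x == 0].sum()
print("Poisson GLM log-rate-ratio:", fit.params[1])
print("Exponential MLE           :", np.log((d1 / T1) / (d0 / T0)))
```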

  3. Probability Theory Plus Noise: Descriptive Estimation and Inferential Judgment.

    PubMed

    Costello, Fintan; Watts, Paul

    2018-01-01

    We describe a computational model of two central aspects of people's probabilistic reasoning: descriptive probability estimation and inferential probability judgment. This model assumes that people's reasoning follows standard frequentist probability theory, but it is subject to random noise. This random noise has a regressive effect in descriptive probability estimation, moving probability estimates away from normative probabilities and toward the center of the probability scale. This random noise has an anti-regressive effect in inferential judgement, however. These regressive and anti-regressive effects explain various reliable and systematic biases seen in people's descriptive probability estimation and inferential probability judgment. This model predicts that these contrary effects will tend to cancel out in tasks that involve both descriptive estimation and inferential judgement, leading to unbiased responses in those tasks. We test this model by applying it to one such task, described by Gallistel et al. Participants' median responses in this task were unbiased, agreeing with normative probability theory over the full range of responses. Our model captures the pattern of unbiased responses in this task, while simultaneously explaining systematic biases away from normatively correct probabilities seen in other tasks. Copyright © 2018 Cognitive Science Society, Inc.
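
    The regressive effect in descriptive estimation can be illustrated with a small simulation; the noise level d and memory sample size below are arbitrary assumptions, and the sketch covers only the descriptive side of the model, not the inferential judgments.

```python
import numpy as np

# Each of m remembered instances is read correctly except with probability d, in
# which case a random value is substituted. Averaging the reads pulls the expected
# estimate toward 0.5, i.e. the regressive effect described in the abstract.
rng = np.random.default_rng(8)

def mean_estimate(p_true, d=0.15, m=50, n_rep=20000):
    events = rng.random((n_rep, m)) < p_true          # true occurrences held in memory
    flipped = rng.random((n_rep, m)) < d              # noisy reads
    noise_value = rng.random((n_rep, m)) < 0.5        # a flipped read is random
    read = np.where(flipped, noise_value, events)
    return read.mean()

for p in [0.1, 0.3, 0.5, 0.7, 0.9]:
    print(f"p={p:.1f}  mean estimate={mean_estimate(p):.3f}  (regresses toward 0.5)")
```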

  4. Analytic model for the long-term evolution of circular Earth satellite orbits including lunar node regression

    NASA Astrophysics Data System (ADS)

    Zhu, Ting-Lei; Zhao, Chang-Yin; Zhang, Ming-Jiang

    2017-04-01

    This paper aims to obtain an analytic approximation to the evolution of circular orbits governed by the Earth's J2 and the luni-solar gravitational perturbations. Assuming that the lunar orbital plane coincides with the ecliptic plane, Allan and Cook (Proc. R. Soc. A, Math. Phys. Eng. Sci. 280(1380):97, 1964) derived an analytic solution to the orbital plane evolution of circular orbits. Using their result as an intermediate solution, we establish an approximate analytic model with the lunar orbital inclination and its node regression taken into account. Finally, an approximate analytic expression is derived, which is accurate compared to the numerical results except for resonant cases in which the period of the reference orbit approximately equals an integer multiple (especially 1 or 2 times) of the lunar node regression period.

  5. Simulation program for estimating statistical power of Cox's proportional hazards model assuming no specific distribution for the survival time.

    PubMed

    Akazawa, K; Nakamura, T; Moriguchi, S; Shimada, M; Nose, Y

    1991-07-01

    Small sample properties of the maximum partial likelihood estimates for Cox's proportional hazards model depend on the sample size, the true values of regression coefficients, covariate structure, censoring pattern and possibly baseline hazard functions. Therefore, it would be difficult to construct a formula or table to calculate the exact power of a statistical test for the treatment effect in any specific clinical trial. The simulation program, written in SAS/IML, described in this paper uses Monte-Carlo methods to provide estimates of the exact power for Cox's proportional hazards model. For illustrative purposes, the program was applied to real data obtained from a clinical trial performed in Japan. Since the program does not assume any specific function for the baseline hazard, it is, in principle, applicable to any censored survival data as long as they follow Cox's proportional hazards model.
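
    A compact Monte-Carlo power sketch in the same spirit is shown below, written in Python with the lifelines package assumed to be available (the program described in the abstract is SAS/IML); the sample size, baseline hazard, hazard ratio, and censoring scheme are illustrative choices only.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

# Estimate the power of the Wald test for a treatment effect in Cox regression by
# repeatedly simulating trials and counting how often the effect is significant.
rng = np.random.default_rng(3)

def simulate_trial(n=100, log_hr=-0.5):
    treat = rng.integers(0, 2, n)
    t = rng.exponential(1.0 / (0.1 * np.exp(log_hr * treat)))   # exponential baseline here
    c = rng.uniform(0, 20, n)                                   # administrative censoring
    return pd.DataFrame({"time": np.minimum(t, c),
                         "event": (t <= c).astype(int),
                         "treat": treat})

def estimate_power(n_sim=200, alpha=0.05):
    hits = 0
    for _ in range(n_sim):
        df = simulate_trial()
        cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
        if cph.summary.loc["treat", "p"] < alpha:
            hits += 1
    return hits / n_sim

print("estimated power:", estimate_power())
```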

  6. A fuzzy adaptive network approach to parameter estimation in cases where independent variables come from an exponential distribution

    NASA Astrophysics Data System (ADS)

    Dalkilic, Turkan Erbay; Apaydin, Aysen

    2009-11-01

    In a regression analysis, it is assumed that the observations come from a single class in a data cluster and that the simple functional relationship between the dependent and independent variables can be expressed using the general model Y = f(X) + ε. However, a data cluster may consist of a combination of observations that have different distributions derived from different clusters. When estimating a regression model for fuzzy inputs that have been derived from different distributions, the model is termed the 'switching regression model'; here li indicates the class number of each independent variable and p indicates the number of independent variables [J.R. Jang, ANFIS: Adaptive-network-based fuzzy inference system, IEEE Transactions on Systems, Man and Cybernetics 23 (3) (1993) 665-685; M. Michel, Fuzzy clustering and switching regression models using ambiguity and distance rejects, Fuzzy Sets and Systems 122 (2001) 363-399; E.Q. Richard, A new approach to estimating switching regressions, Journal of the American Statistical Association 67 (338) (1972) 306-310]. In this study, adaptive networks have been used to construct a model formed by gathering the obtained models. There are methods that suggest the class numbers of independent variables heuristically. Alternatively, to define the optimal class number of independent variables, this study aims to use a suggested validity criterion for fuzzy clustering. In the case that the independent variables have an exponential distribution, an algorithm has been suggested for defining the unknown parameter of the switching regression model and for obtaining the estimated values after obtaining an optimal membership function suitable for the exponential distribution.

  7. Do the methods used to analyse missing data really matter? An examination of data from an observational study of Intermediate Care patients.

    PubMed

    Kaambwa, Billingsley; Bryan, Stirling; Billingham, Lucinda

    2012-06-27

    Missing data is a common statistical problem in healthcare datasets from populations of older people. Some argue that arbitrarily assuming the mechanism responsible for the missingness and therefore the method for dealing with this missingness is not the best option, but is this always true? This paper explores what happens when extra information that suggests that a particular mechanism is responsible for missing data is disregarded and methods for dealing with the missing data are chosen arbitrarily. Regression models based on 2,533 intermediate care (IC) patients from the largest evaluation of IC done and published in the UK to date were used to explain variation in costs, EQ-5D and Barthel index. Three methods for dealing with missingness were utilised, each assuming a different mechanism as being responsible for the missing data: complete case analysis (assuming missing completely at random, MCAR), multiple imputation (assuming missing at random, MAR) and Heckman selection model (assuming missing not at random, MNAR). Differences in results were gauged by examining the signs of coefficients as well as the sizes of both coefficients and associated standard errors. Extra information strongly suggested that missing cost data were MCAR. The results show that MCAR and MAR-based methods yielded similar results with sizes of most coefficients and standard errors differing by less than 3.4% while those based on MNAR-based methods were statistically different (up to 730% bigger). Significant variables in all regression models also had the same direction of influence on costs. All three mechanisms of missingness were shown to be potential causes of the missing EQ-5D and Barthel data. The method chosen to deal with missing data did not seem to have any significant effect on the results for these data as they led to broadly similar conclusions with sizes of coefficients and standard errors differing by less than 54% and 322%, respectively. Arbitrary selection of methods to deal with missing data should be avoided. Using extra information gathered during the data collection exercise about the cause of missingness to guide this selection would be more appropriate.

  8. Prediction models for clustered data: comparison of a random intercept and standard regression model

    PubMed Central

    2013-01-01

    Background When study data are clustered, standard regression analysis is considered inappropriate and analytical techniques for clustered data need to be used. For prediction research in which the interest of predictor effects is on the patient level, random effect regression models are probably preferred over standard regression analysis. It is well known that the random effect parameter estimates and the standard logistic regression parameter estimates are different. Here, we compared random effect and standard logistic regression models for their ability to provide accurate predictions. Methods Using an empirical study on 1642 surgical patients at risk of postoperative nausea and vomiting, who were treated by one of 19 anesthesiologists (clusters), we developed prognostic models either with standard or random intercept logistic regression. External validity of these models was assessed in new patients from other anesthesiologists. We supported our results with simulation studies using intra-class correlation coefficients (ICC) of 5%, 15%, or 30%. Standard performance measures and measures adapted for the clustered data structure were estimated. Results The model developed with random effect analysis showed better discrimination than the standard approach, if the cluster effects were used for risk prediction (standard c-index of 0.69 versus 0.66). In the external validation set, both models showed similar discrimination (standard c-index 0.68 versus 0.67). The simulation study confirmed these results. For datasets with a high ICC (≥15%), model calibration was only adequate in external subjects, if the used performance measure assumed the same data structure as the model development method: standard calibration measures showed good calibration for the standard developed model, calibration measures adapting the clustered data structure showed good calibration for the prediction model with random intercept. Conclusion The models with random intercept discriminate better than the standard model only if the cluster effect is used for predictions. The prediction model with random intercept had good calibration within clusters. PMID:23414436

  9. Prediction models for clustered data: comparison of a random intercept and standard regression model.

    PubMed

    Bouwmeester, Walter; Twisk, Jos W R; Kappen, Teus H; van Klei, Wilton A; Moons, Karel G M; Vergouwe, Yvonne

    2013-02-15

    When study data are clustered, standard regression analysis is considered inappropriate and analytical techniques for clustered data need to be used. For prediction research in which the interest of predictor effects is on the patient level, random effect regression models are probably preferred over standard regression analysis. It is well known that the random effect parameter estimates and the standard logistic regression parameter estimates are different. Here, we compared random effect and standard logistic regression models for their ability to provide accurate predictions. Using an empirical study on 1642 surgical patients at risk of postoperative nausea and vomiting, who were treated by one of 19 anesthesiologists (clusters), we developed prognostic models either with standard or random intercept logistic regression. External validity of these models was assessed in new patients from other anesthesiologists. We supported our results with simulation studies using intra-class correlation coefficients (ICC) of 5%, 15%, or 30%. Standard performance measures and measures adapted for the clustered data structure were estimated. The model developed with random effect analysis showed better discrimination than the standard approach, if the cluster effects were used for risk prediction (standard c-index of 0.69 versus 0.66). In the external validation set, both models showed similar discrimination (standard c-index 0.68 versus 0.67). The simulation study confirmed these results. For datasets with a high ICC (≥15%), model calibration was only adequate in external subjects, if the used performance measure assumed the same data structure as the model development method: standard calibration measures showed good calibration for the standard developed model, calibration measures adapting the clustered data structure showed good calibration for the prediction model with random intercept. The models with random intercept discriminate better than the standard model only if the cluster effect is used for predictions. The prediction model with random intercept had good calibration within clusters.

  10. A review and comparison of Bayesian and likelihood-based inferences in beta regression and zero-or-one-inflated beta regression.

    PubMed

    Liu, Fang; Eugenio, Evercita C

    2018-04-01

    Beta regression is an increasingly popular statistical technique in medical research for modeling of outcomes that assume values in (0, 1), such as proportions and patient reported outcomes. When outcomes take values in the intervals [0,1), (0,1], or [0,1], zero-or-one-inflated beta (zoib) regression can be used. We provide a thorough review on beta regression and zoib regression in the modeling, inferential, and computational aspects via the likelihood-based and Bayesian approaches. We demonstrate the statistical and practical importance of correctly modeling the inflation at zero/one rather than ad hoc replacing them with values close to zero/one via simulation studies; the latter approach can lead to biased estimates and invalid inferences. We show via simulation studies that the likelihood-based approach is computationally faster in general than MCMC algorithms used in the Bayesian inferences, but runs the risk of non-convergence, large biases, and sensitivity to starting values in the optimization algorithm especially with clustered/correlated data, data with sparse inflation at zero and one, and data that warrant regularization of the likelihood. The disadvantages of the regular likelihood-based approach make the Bayesian approach an attractive alternative in these cases. Software packages and tools for fitting beta and zoib regressions in both the likelihood-based and Bayesian frameworks are also reviewed.
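
    For readers unfamiliar with the model, a bare-bones likelihood-based beta regression with a logit mean link and constant precision can be written directly with scipy; this sketch omits the zero-or-one inflation discussed in the review and uses simulated data, so it is an illustration rather than the reviewed software.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import beta as beta_dist
from scipy.special import expit

# Beta regression: mean mu = expit(X @ b), common precision phi, so that
# y ~ Beta(mu * phi, (1 - mu) * phi). Parameters are found by maximum likelihood.
rng = np.random.default_rng(9)

n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
b_true, phi_true = np.array([0.2, 0.8]), 30.0
mu = expit(X @ b_true)
y = rng.beta(mu * phi_true, (1 - mu) * phi_true)

def negloglik(theta):
    b, log_phi = theta[:2], theta[2]
    m, phi = expit(X @ b), np.exp(log_phi)
    return -np.sum(beta_dist.logpdf(y, m * phi, (1 - m) * phi))

res = minimize(negloglik, x0=np.array([0.0, 0.0, np.log(10.0)]), method="BFGS")
print("coefficients:", res.x[:2], "precision:", np.exp(res.x[2]))
```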

  11. Convergent Time-Varying Regression Models for Data Streams: Tracking Concept Drift by the Recursive Parzen-Based Generalized Regression Neural Networks.

    PubMed

    Duda, Piotr; Jaworski, Maciej; Rutkowski, Leszek

    2018-03-01

    One of the greatest challenges in data mining is related to processing and analysis of massive data streams. Contrary to traditional static data mining problems, data streams require that each element is processed only once, the amount of allocated memory is constant and the models incorporate changes of investigated streams. A vast majority of available methods have been developed for data stream classification and only a few of them attempted to solve regression problems, using various heuristic approaches. In this paper, we develop mathematically justified regression models working in a time-varying environment. More specifically, we study incremental versions of generalized regression neural networks, called IGRNNs, and we prove their tracking properties - weak (in probability) and strong (with probability one) convergence assuming various concept drift scenarios. First, we present the IGRNNs, based on the Parzen kernels, for modeling stationary systems under nonstationary noise. Next, we extend our approach to modeling time-varying systems under nonstationary noise. We present several types of concept drifts to be handled by our approach in such a way that weak and strong convergence holds under certain conditions. Finally, in the series of simulations, we compare our method with commonly used heuristic approaches, based on forgetting mechanism or sliding windows, to deal with concept drift. Finally, we apply our concept in a real life scenario solving the problem of currency exchange rates prediction.

  12. MIXOR: a computer program for mixed-effects ordinal regression analysis.

    PubMed

    Hedeker, D; Gibbons, R D

    1996-03-01

    MIXOR provides maximum marginal likelihood estimates for mixed-effects ordinal probit, logistic, and complementary log-log regression models. These models can be used for analysis of dichotomous and ordinal outcomes from either a clustered or longitudinal design. For clustered data, the mixed-effects model assumes that data within clusters are dependent. The degree of dependency is jointly estimated with the usual model parameters, thus adjusting for dependence resulting from clustering of the data. Similarly, for longitudinal data, the mixed-effects approach can allow for individual-varying intercepts and slopes across time, and can estimate the degree to which these time-related effects vary in the population of individuals. MIXOR uses marginal maximum likelihood estimation, utilizing a Fisher-scoring solution. For the scoring solution, the Cholesky factor of the random-effects variance-covariance matrix is estimated, along with the effects of model covariates. Examples illustrating usage and features of MIXOR are provided.

  13. Comparative study of some robust statistical methods: weighted, parametric, and nonparametric linear regression of HPLC convoluted peak responses using internal standard method in drug bioavailability studies.

    PubMed

    Korany, Mohamed A; Maher, Hadir M; Galal, Shereen M; Ragab, Marwa A A

    2013-05-01

    This manuscript discusses the application of, and the comparison between, three statistical regression methods for handling data: parametric, nonparametric, and weighted regression (WR). These data were obtained from different chemometric methods applied to the high-performance liquid chromatography response data using the internal standard method. This was performed on a model drug, Acyclovir, which was analyzed in human plasma with the use of ganciclovir as internal standard. An in vivo study was also performed. Derivative treatment of chromatographic response ratio data was followed by convolution of the resulting derivative curves using 8-point sin(xi) polynomials (discrete Fourier functions). This work studies and compares the application of the WR method and Theil's method, a nonparametric regression (NPR) method, with the least squares parametric regression (LSPR) method, which is considered the de facto standard method used for regression. When the assumption of homoscedasticity is not met for analytical data, a simple and effective way to counteract the great influence of the high concentrations on the fitted regression line is to use the WR method. WR was found to be superior to LSPR as the former assumes that the y-direction error in the calibration curve will increase as x increases. Theil's NPR method was also found to be superior to LSPR as the former assumes that errors could occur in both the x- and y-directions and that they might not be normally distributed. Most of the results showed a significant improvement in precision and accuracy on applying the WR and NPR methods relative to LSPR.
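
    A short sketch of the three fitting strategies compared here, using numpy's weighted polyfit and scipy's Theil slope estimator; the calibration concentrations, the heteroscedastic error, and the 1/x weighting are assumptions made for illustration, not the manuscript's actual data handling.

```python
import numpy as np
from scipy.stats import theilslopes

# Compare ordinary least squares (LSPR), weighted regression (WR) with 1/x weights
# for heteroscedastic responses, and Theil's nonparametric slope (NPR) on a
# simulated calibration line whose error grows with concentration.
rng = np.random.default_rng(4)

conc = np.array([0.5, 1, 2, 5, 10, 20, 50, 100.0])
resp = 0.02 + 0.11 * conc * (1 + rng.normal(0, 0.05, conc.size))   # proportional error
resp[-1] *= 1.15                                                    # one high-leverage outlier

slope_ols, intercept_ols = np.polyfit(conc, resp, 1)
slope_wr, intercept_wr = np.polyfit(conc, resp, 1, w=1.0 / conc)    # polyfit weights ~ 1/sigma
slope_np, intercept_np, _, _ = theilslopes(resp, conc)

print(f"LSPR: slope={slope_ols:.4f}, intercept={intercept_ols:.4f}")
print(f"WR  : slope={slope_wr:.4f}, intercept={intercept_wr:.4f}")
print(f"NPR : slope={slope_np:.4f}, intercept={intercept_np:.4f}")
```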

  14. Impact of trucking network flow on preferred biorefinery locations in the southern United States

    Treesearch

    Timothy M. Young; Lee D. Han; James H. Perdue; Stephanie R. Hargrove; Frank M. Guess; Xia Huang; Chung-Hao Chen

    2017-01-01

    The impact of the trucking transportation network flow was modeled for the southern United States. The study addresses a gap in existing research by applying a Bayesian logistic regression and Geographic Information System (GIS) geospatial analysis to predict biorefinery site locations. A one-way trucking cost assuming a 128.8 km (80-mile) haul distance was estimated...

  15. Measuring bulrush culm relationships to estimate plant biomass within a southern California treatment wetland

    USGS Publications Warehouse

    Daniels, Joan S. (Thullen); Cade, Brian S.; Sartoris, James J.

    2010-01-01

    Assessment of emergent vegetation biomass can be time consuming and labor intensive. To establish a less onerous, yet accurate, method for determining emergent plant biomass than direct measurements, we collected vegetation data over a six-year period and modeled biomass using easily obtained variables: culm (stem) diameter, culm height and culm density. From 1998 through 2005, we collected emergent vegetation samples (Schoenoplectus californicus and Schoenoplectus acutus) at a constructed treatment wetland in San Jacinto, California during spring and fall. Various statistical models were run on the data to determine the strongest relationships. We found that the nonlinear relationship CB = β0(DH)^β1·10^ε, where CB was dry culm biomass (g m−2), DH was density of culms × average height of culms in a plot, and β0 and β1 were parameters to estimate, proved to be the best fit for predicting dried-live above-ground biomass of the two Schoenoplectus species. The random error distribution, ε, was either assumed to be normally distributed for mean regression estimates or assumed to be an unspecified continuous distribution for quantile regression estimates.
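
    The two error assumptions for the mean-regression case can be contrasted with a short scipy sketch: fitting the power relationship on the log10 scale (multiplicative error, matching the 10^ε term) versus a direct nonlinear least-squares fit (additive error). The data below are simulated, not the wetland measurements.

```python
import numpy as np
from scipy.optimize import curve_fit

# Fit CB = b0 * DH**b1 two ways: ordinary least squares on the log10 scale
# (multiplicative, lognormal-type error) and nonlinear least squares on the
# original scale (additive error). Simulated plot-level data for illustration.
rng = np.random.default_rng(5)

dh = rng.uniform(50, 2000, 60)                       # culm density x mean height per plot
b0_true, b1_true = 0.8, 0.95
cb = b0_true * dh**b1_true * 10 ** rng.normal(0, 0.08, 60)

# Multiplicative error: linear regression after log10 transformation.
slope, intercept = np.polyfit(np.log10(dh), np.log10(cb), 1)
b0_log, b1_log = 10**intercept, slope

# Additive error: nonlinear least squares on the original scale.
(b0_nls, b1_nls), _ = curve_fit(lambda x, b0, b1: b0 * x**b1, dh, cb, p0=[1.0, 1.0])

print(f"log10-scale fit: b0={b0_log:.3f}, b1={b1_log:.3f}")
print(f"nonlinear fit  : b0={b0_nls:.3f}, b1={b1_nls:.3f}")
```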

  16. Pseudo and conditional score approach to joint analysis of current count and current status data.

    PubMed

    Wen, Chi-Chung; Chen, Yi-Hau

    2018-04-17

    We develop a joint analysis approach for recurrent and nonrecurrent event processes subject to case I interval censorship, which are also known in literature as current count and current status data, respectively. We use a shared frailty to link the recurrent and nonrecurrent event processes, while leaving the distribution of the frailty fully unspecified. Conditional on the frailty, the recurrent event is assumed to follow a nonhomogeneous Poisson process, and the mean function of the recurrent event and the survival function of the nonrecurrent event are assumed to follow some general form of semiparametric transformation models. Estimation of the models is based on the pseudo-likelihood and the conditional score techniques. The resulting estimators for the regression parameters and the unspecified baseline functions are shown to be consistent with rates of square and cubic roots of the sample size, respectively. Asymptotic normality with closed-form asymptotic variance is derived for the estimator of the regression parameters. We apply the proposed method to a fracture-osteoporosis survey data to identify risk factors jointly for fracture and osteoporosis in elders, while accounting for association between the two events within a subject. © 2018, The International Biometric Society.

  17. Fuzzy multinomial logistic regression analysis: A multi-objective programming approach

    NASA Astrophysics Data System (ADS)

    Abdalla, Hesham A.; El-Sayed, Amany A.; Hamed, Ramadan

    2017-05-01

    Parameter estimation for multinomial logistic regression is usually based on maximizing the likelihood function. For large well-balanced datasets, Maximum Likelihood (ML) estimation is a satisfactory approach. Unfortunately, ML can fail completely or at least produce poor results in terms of estimated probabilities and confidence intervals of parameters, especially for small datasets. In this study, a new approach based on fuzzy concepts is proposed to estimate parameters of the multinomial logistic regression. The study assumes that the parameters of multinomial logistic regression are fuzzy. Based on the extension principle stated by Zadeh and Bárdossy's proposition, a multi-objective programming approach is suggested to estimate these fuzzy parameters. A simulation study is used to evaluate the performance of the new approach versus the ML approach. Results show that the new proposed model outperforms ML in cases of small datasets.

  18. Stress Regression Analysis of Asphalt Concrete Deck Pavement Based on Orthogonal Experimental Design and Interlayer Contact

    NASA Astrophysics Data System (ADS)

    Wang, Xuntao; Feng, Jianhu; Wang, Hu; Hong, Shidi; Zheng, Supei

    2018-03-01

    A three-dimensional finite element model of a box girder bridge and its asphalt concrete deck pavement was established with ANSYS software, and the interlayer bonding condition of the asphalt concrete deck pavement was assumed to be a contact bonding condition. An orthogonal experimental design was used to arrange the testing plans of the material parameters, and the effect of different material parameters on the mechanical response of the asphalt concrete surface layer was evaluated with a multiple linear regression model using the results from the finite element analysis. Results indicated that the stress regression equations can predict the stress of the asphalt concrete surface layer well, and that the elastic modulus of the waterproof layer has a significant influence on the stress values of the asphalt concrete surface layer.

  19. Building factorial regression models to explain and predict nitrate concentrations in groundwater under agricultural land

    NASA Astrophysics Data System (ADS)

    Stigter, T. Y.; Ribeiro, L.; Dill, A. M. M. Carvalho

    2008-07-01

    Summary Factorial regression models, based on correspondence analysis, are built to explain the high nitrate concentrations in groundwater beneath an agricultural area in the south of Portugal, exceeding 300 mg/l, as a function of chemical variables, electrical conductivity (EC), land use and hydrogeological setting. Two important advantages of the proposed methodology are that qualitative parameters can be involved in the regression analysis and that multicollinearity is avoided. Regression is performed on eigenvectors extracted from the data similarity matrix, the first of which clearly reveals the impact of agricultural practices and hydrogeological setting on the groundwater chemistry of the study area. Significant correlation exists between the response variable NO3− and the explanatory variables Ca2+, Cl−, SO42−, depth to water, aquifer media and land use. Substituting Cl− by the EC results in the most accurate regression model for nitrate, when disregarding the four largest outliers (model A). When built solely on land use and hydrogeological setting, the regression model (model B) is less accurate but more interesting from a practical viewpoint, as it is based on easily obtainable data and can be used to predict nitrate concentrations in groundwater in other areas with similar conditions. This is particularly useful for conservative contaminants, where risk and vulnerability assessment methods, based on assumed rather than established correlations, generally produce erroneous results. Another purpose of the models can be to predict the future evolution of nitrate concentrations under the influence of changes in land use or fertilization practices, which occur in compliance with policies such as the Nitrates Directive. Model B predicts a 40% decrease in nitrate concentrations in groundwater of the study area, when horticulture is replaced by other land use with much lower fertilization and irrigation rates.

  20. Score tests for independence in semiparametric competing risks models.

    PubMed

    Saïd, Mériem; Ghazzali, Nadia; Rivest, Louis-Paul

    2009-12-01

    A popular model for competing risks postulates the existence of a latent unobserved failure time for each risk. Assuming that these underlying failure times are independent is attractive since it allows standard statistical tools for right-censored lifetime data to be used in the analysis. This paper proposes simple independence score tests for the validity of this assumption when the individual risks are modeled using semiparametric proportional hazards regressions. It assumes that covariates are available, making the model identifiable. The score tests are derived for alternatives that specify that copulas are responsible for a possible dependency between the competing risks. The test statistics are constructed by adding to the partial likelihoods for the individual risks an explanatory variable for the dependency between the risks. A variance estimator is derived by writing the score function and the Fisher information matrix for the marginal models as stochastic integrals. Pitman efficiencies are used to compare test statistics. A simulation study and a numerical example illustrate the methodology proposed in this paper.

  1. Simultaneous treatment of unspecified heteroskedastic model error distribution and mismeasured covariates for restricted moment models.

    PubMed

    Garcia, Tanya P; Ma, Yanyuan

    2017-10-01

    We develop consistent and efficient estimation of parameters in general regression models with mismeasured covariates. We assume the model error and covariate distributions are unspecified, and the measurement error distribution is a general parametric distribution with unknown variance-covariance. We construct root-n consistent, asymptotically normal and locally efficient estimators using the semiparametric efficient score. We do not estimate any unknown distribution or model error heteroskedasticity. Instead, we form the estimator under possibly incorrect working distribution models for the model error, error-prone covariate, or both. Empirical results demonstrate robustness to different incorrect working models in homoscedastic and heteroskedastic models with error-prone covariates.

  2. Application of logistic regression to case-control association studies involving two causative loci.

    PubMed

    North, Bernard V; Curtis, David; Sham, Pak C

    2005-01-01

    Models in which two susceptibility loci jointly influence the risk of developing disease can be explored using logistic regression analysis. Comparison of likelihoods of models incorporating different sets of disease model parameters allows inferences to be drawn regarding the nature of the joint effect of the loci. We have simulated case-control samples generated assuming different two-locus models and then analysed them using logistic regression. We show that this method is practicable and that, for the models we have used, it can be expected to allow useful inferences to be drawn from sample sizes consisting of hundreds of subjects. Interactions between loci can be explored, but interactive effects do not exactly correspond with classical definitions of epistasis. We have particularly examined the issue of the extent to which it is helpful to utilise information from a previously identified locus when investigating a second, unknown locus. We show that for some models conditional analysis can have substantially greater power while for others unconditional analysis can be more powerful. Hence we conclude that in general both conditional and unconditional analyses should be performed when searching for additional loci.

  3. Outcome modelling strategies in epidemiology: traditional methods and basic alternatives

    PubMed Central

    Greenland, Sander; Daniel, Rhian; Pearce, Neil

    2016-01-01

    Abstract Controlling for too many potential confounders can lead to or aggravate problems of data sparsity or multicollinearity, particularly when the number of covariates is large in relation to the study size. As a result, methods to reduce the number of modelled covariates are often deployed. We review several traditional modelling strategies, including stepwise regression and the ‘change-in-estimate’ (CIE) approach to deciding which potential confounders to include in an outcome-regression model for estimating effects of a targeted exposure. We discuss their shortcomings, and then provide some basic alternatives and refinements that do not require special macros or programming. Throughout, we assume the main goal is to derive the most accurate effect estimates obtainable from the data and commercial software. Allowing that most users must stay within standard software packages, this goal can be roughly approximated using basic methods to assess, and thereby minimize, mean squared error (MSE). PMID:27097747

  4. Regression dilution bias: tools for correction methods and sample size calculation.

    PubMed

    Berglund, Lars

    2012-08-01

    Random errors in measurement of a risk factor will introduce downward bias of an estimated association to a disease or a disease marker. This phenomenon is called regression dilution bias. A bias correction may be made with data from a validity study or a reliability study. In this article we give a non-technical description of designs of reliability studies with emphasis on selection of individuals for a repeated measurement, assumptions of measurement error models, and correction methods for the slope in a simple linear regression model where the dependent variable is a continuous variable. Also, we describe situations where correction for regression dilution bias is not appropriate. The methods are illustrated with the association between insulin sensitivity measured with the euglycaemic insulin clamp technique and fasting insulin, where measurement of the latter variable carries noticeable random error. We provide software tools for estimation of a corrected slope in a simple linear regression model assuming data for a continuous dependent variable and a continuous risk factor from a main study and an additional measurement of the risk factor in a reliability study. Also, we supply programs for estimation of the number of individuals needed in the reliability study and for choice of its design. Our conclusion is that correction for regression dilution bias is seldom applied in epidemiological studies. This may cause important effects of risk factors with large measurement errors to be neglected.
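
    A minimal numpy sketch of the slope correction described here, assuming a continuous outcome, a single error-prone risk factor, and a reliability study with duplicate measurements; the variance components are invented for illustration and the correction shown is the simple reliability-ratio adjustment.

```python
import numpy as np

# The naive slope from the main study is divided by the reliability ratio
# lambda = var(true X) / var(observed X), estimated from duplicate measurements.
rng = np.random.default_rng(6)

n_main, n_rel = 1000, 200
true_x = rng.normal(0, 1, n_main)
obs_x = true_x + rng.normal(0, 0.6, n_main)          # risk factor measured with error
y = 0.5 * true_x + rng.normal(0, 1, n_main)          # continuous outcome, true slope 0.5

naive_slope = np.polyfit(obs_x, y, 1)[0]

# Reliability study: two replicate measurements of the risk factor per subject.
rel_true = rng.normal(0, 1, n_rel)
x1 = rel_true + rng.normal(0, 0.6, n_rel)
x2 = rel_true + rng.normal(0, 0.6, n_rel)
error_var = np.var(x1 - x2, ddof=1) / 2.0            # within-person (error) variance
total_var = np.var(np.concatenate([x1, x2]), ddof=1)
reliability = 1.0 - error_var / total_var            # lambda, an ICC-type ratio

corrected_slope = naive_slope / reliability
print(f"naive {naive_slope:.3f}, corrected {corrected_slope:.3f} (true 0.5)")
```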

  5. Robust Image Regression Based on the Extended Matrix Variate Power Exponential Distribution of Dependent Noise.

    PubMed

    Luo, Lei; Yang, Jian; Qian, Jianjun; Tai, Ying; Lu, Gui-Fu

    2017-09-01

    Dealing with partial occlusion or illumination is one of the most challenging problems in image representation and classification. In this problem, the characterization of the representation error plays a crucial role. In most current approaches, the error matrix needs to be stretched into a vector and each element is assumed to be independently corrupted. This ignores the dependence between the elements of error. In this paper, it is assumed that the error image caused by partial occlusion or illumination changes is a random matrix variate and follows the extended matrix variate power exponential distribution. This has the heavy tailed regions and can be used to describe a matrix pattern of l×m-dimensional observations that are not independent. This paper reveals the essence of the proposed distribution: it actually alleviates the correlations between pixels in an error matrix E and makes E approximately Gaussian. On the basis of this distribution, we derive a Schatten p-norm-based matrix regression model with Lq regularization. Alternating direction method of multipliers is applied to solve this model. To get a closed-form solution in each step of the algorithm, two singular value function thresholding operators are introduced. In addition, the extended Schatten p-norm is utilized to characterize the distance between the test samples and classes in the design of the classifier. Extensive experimental results for image reconstruction and classification with structural noise demonstrate that the proposed algorithm works much more robustly than some existing regression-based methods.

  6. Bayesian semi-parametric analysis of Poisson change-point regression models: application to policy making in Cali, Colombia.

    PubMed

    Park, Taeyoung; Krafty, Robert T; Sánchez, Alvaro I

    2012-07-27

    A Poisson regression model with an offset assumes a constant baseline rate after accounting for measured covariates, which may lead to biased estimates of coefficients in an inhomogeneous Poisson process. To correctly estimate the effect of time-dependent covariates, we propose a Poisson change-point regression model with an offset that allows a time-varying baseline rate. When the nonconstant pattern of a log baseline rate is modeled with a nonparametric step function, the resulting semi-parametric model involves a model component of varying dimension and thus requires a sophisticated varying-dimensional inference to obtain correct estimates of model parameters of fixed dimension. To fit the proposed varying-dimensional model, we devise a state-of-the-art MCMC-type algorithm based on partial collapse. The proposed model and methods are used to investigate an association between daily homicide rates in Cali, Colombia and policies that restrict the hours during which the legal sale of alcoholic beverages is permitted. While simultaneously identifying the latent changes in the baseline homicide rate which correspond to the incidence of sociopolitical events, we explore the effect of policies governing the sale of alcohol on homicide rates and seek a policy that balances the economic and cultural dependencies on alcohol sales to the health of the public.

  7. SEPARABLE FACTOR ANALYSIS WITH APPLICATIONS TO MORTALITY DATA

    PubMed Central

    Fosdick, Bailey K.; Hoff, Peter D.

    2014-01-01

    Human mortality data sets can be expressed as multiway data arrays, the dimensions of which correspond to categories by which mortality rates are reported, such as age, sex, country and year. Regression models for such data typically assume an independent error distribution or an error model that allows for dependence along at most one or two dimensions of the data array. However, failing to account for other dependencies can lead to inefficient estimates of regression parameters, inaccurate standard errors and poor predictions. An alternative to assuming independent errors is to allow for dependence along each dimension of the array using a separable covariance model. However, the number of parameters in this model increases rapidly with the dimensions of the array and, for many arrays, maximum likelihood estimates of the covariance parameters do not exist. In this paper, we propose a submodel of the separable covariance model that estimates the covariance matrix for each dimension as having factor analytic structure. This model can be viewed as an extension of factor analysis to array-valued data, as it uses a factor model to estimate the covariance along each dimension of the array. We discuss properties of this model as they relate to ordinary factor analysis, describe maximum likelihood and Bayesian estimation methods, and provide a likelihood ratio testing procedure for selecting the factor model ranks. We apply this methodology to the analysis of data from the Human Mortality Database, and show in a cross-validation experiment how it outperforms simpler methods. Additionally, we use this model to impute mortality rates for countries that have no mortality data for several years. Unlike other approaches, our methodology is able to estimate similarities between the mortality rates of countries, time periods and sexes, and use this information to assist with the imputations. PMID:25489353

  8. New approach to probability estimate of femoral neck fracture by fall (Slovak regression model).

    PubMed

    Wendlova, J

    2009-01-01

    3,216 Slovak women with primary or secondary osteoporosis or osteopenia, aged 20-89 years (mean age 58.9, 95% C.I. 58.42-59.38), were examined with the bone densitometer DXA (dual energy X-ray absorptiometry, GE, Prodigy - Primo). The values of the following variables for each patient were measured: FSI (femur strength index), T-score total hip left, alpha angle - left, theta angle - left, HAL (hip axis length) left; BMI (body mass index) was calculated from the height and weight of the patients. The regression model determined the following order of independent variables according to the intensity of their influence upon the occurrence of values of the dependent FSI variable: 1. BMI, 2. theta angle, 3. T-score total hip, 4. alpha angle, 5. HAL. The regression model equation, calculated from the variables monitored in the study, enables a doctor in practice to determine the probability magnitude (absolute risk) for the occurrence of a pathological value of FSI (FSI < 1) in the femoral neck area, i.e., allows for a probability estimate of a femoral neck fracture by fall for Slovak women. 1. The Slovak regression model differs from regression models, published until now, in chosen independent variables and a dependent variable, belonging to biomechanical variables, characterising the bone quality. 2. The Slovak regression model excludes the inaccuracies of other models, which are not able to define precisely the current and past clinical condition of tested patients (e.g., to define the length and dose of exposure to risk factors). 3. The Slovak regression model opens the way to a new method of estimating the probability (absolute risk) or the odds for a femoral neck fracture by fall, based upon the bone quality determination. 4. It is assumed that the development will proceed by improving the methods enabling to measure the bone quality, determining the probability of fracture by fall (Tab. 6, Fig. 3, Ref. 22). Full Text (Free, PDF) www.bmj.sk.

  9. Taxonomic and regional uncertainty in species-area relationships and the identification of richness hotspots

    PubMed Central

    Guilhaumon, François; Gimenez, Olivier; Gaston, Kevin J.; Mouillot, David

    2008-01-01

    Species-area relationships (SARs) are fundamental to the study of key and high-profile issues in conservation biology and are particularly widely used in establishing the broad patterns of biodiversity that underpin approaches to determining priority areas for biological conservation. Classically, the SAR has been argued in general to conform to a power-law relationship, and this form has been widely assumed in most applications in the field of conservation biology. Here, using nonlinear regressions within an information theoretical model selection framework, we included uncertainty regarding both model selection and parameter estimation in SAR modeling and conducted a global-scale analysis of the form of SARs for vascular plants and major vertebrate groups across 792 terrestrial ecoregions representing almost 97% of Earth's inhabited land. The results revealed a high level of uncertainty in model selection across biomes and taxa, and that the power-law model is clearly the most appropriate in only a minority of cases. Incorporating this uncertainty into a hotspots analysis using multimodel SARs led to the identification of a dramatically different set of global richness hotspots than when the power-law SAR was assumed. Our findings suggest that the results of analyses that assume a power-law model may be at severe odds with real ecological patterns, raising significant concerns for conservation priority-setting schemes and biogeographical studies. PMID:18832179
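
    The model-selection point can be illustrated with a small scipy sketch that fits a power-law and a logarithmic species-area curve by nonlinear least squares and compares them with AIC; the areas, richness values, and candidate set are assumptions, far simpler than the multimodel framework used in the study.

```python
import numpy as np
from scipy.optimize import curve_fit

# Compare two candidate species-area relationships by AIC instead of assuming the
# power law. Richness and area values below are invented for illustration.
area = np.array([1, 5, 10, 50, 100, 500, 1000, 5000, 10000.0])
richness = np.array([12, 25, 31, 55, 66, 98, 112, 150, 168.0])

def power_sar(a, c, z):
    return c * a**z

def logarithmic_sar(a, c, z):
    return c + z * np.log(a)

def aic(model, p0):
    popt, _ = curve_fit(model, area, richness, p0=p0, maxfev=10000)
    rss = np.sum((richness - model(area, *popt)) ** 2)
    n, k = richness.size, len(popt) + 1               # +1 for the error variance
    return n * np.log(rss / n) + 2 * k                # Gaussian least-squares AIC

print("AIC power      :", aic(power_sar, [10.0, 0.3]))
print("AIC logarithmic:", aic(logarithmic_sar, [10.0, 15.0]))
```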

  10. Evaluation of earthquake potential in China

    NASA Astrophysics Data System (ADS)

    Rong, Yufang

    I present three earthquake potential estimates for magnitude 5.4 and larger earthquakes for China. The potential is expressed as the rate density (that is, the probability per unit area, magnitude and time). The three methods employ smoothed seismicity-, geologic slip rate-, and geodetic strain rate data. I test all three estimates, and another published estimate, against earthquake data. I constructed a special earthquake catalog which combines previous catalogs covering different times. I estimated moment magnitudes for some events using regression relationships that are derived in this study. I used the special catalog to construct the smoothed seismicity model and to test all models retrospectively. In all the models, I adopted a kind of Gutenberg-Richter magnitude distribution with modifications at higher magnitude. The assumed magnitude distribution depends on three parameters: a multiplicative " a-value," the slope or "b-value," and a "corner magnitude" marking a rapid decrease of earthquake rate with magnitude. I assumed the "b-value" to be constant for the whole study area and estimated the other parameters from regional or local geophysical data. The smoothed seismicity method assumes that the rate density is proportional to the magnitude of past earthquakes and declines as a negative power of the epicentral distance out to a few hundred kilometers. I derived the upper magnitude limit from the special catalog, and estimated local "a-values" from smoothed seismicity. I have begun a "prospective" test, and earthquakes since the beginning of 2000 are quite compatible with the model. For the geologic estimations, I adopted the seismic source zones that are used in the published Global Seismic Hazard Assessment Project (GSHAP) model. The zones are divided according to geological, geodetic and seismicity data. Corner magnitudes are estimated from fault length, while fault slip rates and an assumed locking depth determine earthquake rates. The geological model fits the earthquake data better than the GSHAP model. By smoothing geodetic strain rate, another potential model was constructed and tested. I derived the upper magnitude limit from the Special catalog, and assume local "a-values" proportional to geodetic strain rates. "Prospective" tests show that the geodetic strain rate model is quite compatible with earthquakes. By assuming the smoothed seismicity model as a null hypothesis, I tested every other model against it. Test results indicate that the smoothed seismicity model performs best.

  11. Introduction to statistical modelling 2: categorical variables and interactions in linear regression.

    PubMed

    Lunt, Mark

    2015-07-01

    In the first article in this series we explored the use of linear regression to predict an outcome variable from a number of predictive factors. It assumed that the predictive factors were measured on an interval scale. However, this article shows how categorical variables can also be included in a linear regression model, enabling predictions to be made separately for different groups and allowing the hypothesis that the outcome differs between groups to be tested. The use of interaction terms to measure whether the effect of a particular predictor variable differs between groups is also explained. An alternative approach to testing whether the effect of a given predictor differs between groups, which consists of measuring the effect in each group separately and seeing whether the statistical significance differs between the groups, is shown to be misleading. © The Author 2013. Published by Oxford University Press on behalf of the British Society for Rheumatology. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
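
    As a concrete illustration of including a categorical predictor and an interaction term in a linear regression, here is a minimal sketch using the statsmodels formula interface; the variable names and data are hypothetical, not taken from the article.

    ```python
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical data: a continuous predictor, a two-level group, and an outcome
    rng = np.random.default_rng(0)
    n = 200
    df = pd.DataFrame({
        "age": rng.uniform(20, 70, n),
        "group": rng.choice(["treated", "control"], n),
    })
    df["outcome"] = 5 + 0.3 * df["age"] + 2 * (df["group"] == "treated") \
        + 0.1 * df["age"] * (df["group"] == "treated") + rng.normal(0, 1, n)

    # C() marks a categorical variable; age:C(group) adds the interaction,
    # testing whether the slope for age differs between groups
    fit = smf.ols("outcome ~ age + C(group) + age:C(group)", data=df).fit()
    print(fit.summary())
    ```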

  12. Commensurate Priors for Incorporating Historical Information in Clinical Trials Using General and Generalized Linear Models

    PubMed Central

    Hobbs, Brian P.; Sargent, Daniel J.; Carlin, Bradley P.

    2014-01-01

    Assessing between-study variability in the context of conventional random-effects meta-analysis is notoriously difficult when incorporating data from only a small number of historical studies. In order to borrow strength, historical and current data are often assumed to be fully homogeneous, but this can have drastic consequences for power and Type I error if the historical information is biased. In this paper, we propose empirical and fully Bayesian modifications of the commensurate prior model (Hobbs et al., 2011) extending Pocock (1976), and evaluate their frequentist and Bayesian properties for incorporating patient-level historical data using general and generalized linear mixed regression models. Our proposed commensurate prior models lead to preposterior admissible estimators that facilitate bias-variance trade-offs different from those offered by pre-existing methodologies for incorporating data from a small number of historical studies. We also provide a sample analysis of a colon cancer trial comparing time-to-disease progression using a Weibull regression model. PMID:24795786

  13. Nursing workload, patient safety incidents and mortality: an observational study from Finland

    PubMed Central

    Kinnunen, Marina; Saarela, Jan

    2018-01-01

    Objective: To investigate whether the daily workload per nurse (Oulu Patient Classification (OPCq)/nurse) as measured by the RAFAELA system correlates with different types of patient safety incidents and with patient mortality, and to compare the results with regressions based on the standard patients/nurse measure. Setting: We obtained data from 36 units from four Finnish hospitals. One was a tertiary acute care hospital, and the three others were secondary acute care hospitals. Participants: Patients’ nursing intensity (249 123 classifications), nursing resources, patient safety incidents and patient mortality were collected on a daily basis during 1 year, corresponding to 12 475 data points. Associations between OPC/nurse and patient safety incidents or mortality were estimated using unadjusted logistic regression models and models that adjusted for ward-specific effects and for effects of day of the week, holiday and season. Primary and secondary outcome measures: Main outcome measures were patient safety incidents and death of a patient. Results: When OPC/nurse was above the assumed optimal level, the adjusted odds for a patient safety incident were 1.24 times (95% CI 1.08 to 1.42) those at the assumed optimal level, and 0.79 times (95% CI 0.67 to 0.93) when the workload was below the assumed optimal level. Corresponding estimates for patient mortality were 1.43 (95% CI 1.18 to 1.73) and 0.78 (95% CI 0.60 to 1.00), respectively. As compared with the patients/nurse classification, models estimated on the basis of the RAFAELA classification system generally provided larger effect sizes, greater statistical power and better model fit, although the difference was not very large. Net benefits as calculated on the basis of decision analysis did not provide any clear evidence on which measure to prefer. Conclusions: We have demonstrated an association between daily workload per nurse and patient safety incidents and mortality. Current findings need to be replicated by future studies. PMID:29691240
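
    A minimal sketch of the kind of adjusted logistic regression described here, with daily workload coded as below/at/above an assumed optimal level and ward and weekday as adjustment factors; all variable names and data below are hypothetical placeholders.

    ```python
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(1)
    n = 5000
    df = pd.DataFrame({
        "workload": rng.choice(["below", "optimal", "above"], n),
        "ward": rng.choice([f"ward{i}" for i in range(6)], n),
        "weekday": rng.choice(range(7), n),
    })
    logit_p = -3 + 0.2 * (df["workload"] == "above") - 0.2 * (df["workload"] == "below")
    df["incident"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

    # Odds ratios relative to the assumed optimal workload level,
    # adjusted for ward and day of the week
    fit = smf.logit(
        "incident ~ C(workload, Treatment(reference='optimal')) + C(ward) + C(weekday)",
        data=df,
    ).fit(disp=0)
    print(np.exp(fit.params))          # adjusted odds ratios
    print(np.exp(fit.conf_int()))      # 95% confidence intervals
    ```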

  14. Covariance functions for body weight from birth to maturity in Nellore cows.

    PubMed

    Boligon, A A; Mercadante, M E Z; Forni, S; Lôbo, R B; Albuquerque, L G

    2010-03-01

    The objective of this study was to estimate (co)variance functions using random regression models on Legendre polynomials for the analysis of repeated measures of BW from birth to adult age. A total of 82,064 records from 8,145 females were analyzed. Different models were compared. The models included additive direct and maternal effects, and animal and maternal permanent environmental effects as random terms. Contemporary group and dam age at calving (linear and quadratic effect) were included as fixed effects, and orthogonal Legendre polynomials of animal age (cubic regression) were considered as random covariables. Eight models with polynomials of third to sixth order were used to describe additive direct and maternal effects, and animal and maternal permanent environmental effects. Residual effects were modeled using 1 (i.e., assuming homogeneity of variances across all ages) or 5 age classes. The model with 5 classes was the best to describe the trajectory of residuals along the growth curve. The model including fourth- and sixth-order polynomials for additive direct and animal permanent environmental effects, respectively, and third-order polynomials for maternal genetic and maternal permanent environmental effects was the best. Estimates of (co)variance obtained with the multi-trait and random regression models were similar. Direct heritability estimates obtained with the random regression models followed a trend similar to that obtained with the multi-trait model. The largest estimates of maternal heritability were those of BW taken close to 240 d of age. In general, estimates of correlation between BW from birth to 8 yr of age decreased with increasing distance between ages.
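
    The random-regression covariables used in such analyses are Legendre polynomials evaluated at ages standardized to [-1, 1]. A small sketch of how these covariables can be built; the ages and polynomial order below are illustrative assumptions, not the study's data.

    ```python
    import numpy as np
    from numpy.polynomial import legendre

    def legendre_covariables(age_days, age_min, age_max, order):
        """Columns of Legendre polynomials P0..P_order evaluated at standardized age."""
        x = -1.0 + 2.0 * (np.asarray(age_days, float) - age_min) / (age_max - age_min)
        return legendre.legvander(x, order)   # shape (n_records, order + 1)

    # Ages from birth to 8 years (illustrative), cubic regression as in the fixed curve
    ages = np.array([1, 205, 365, 550, 730, 1460, 2920])
    Z = legendre_covariables(ages, age_min=1, age_max=2920, order=3)
    print(Z.round(3))
    ```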

  15. Multivariate decoding of brain images using ordinal regression.

    PubMed

    Doyle, O M; Ashburner, J; Zelaya, F O; Williams, S C R; Mehta, M A; Marquand, A F

    2013-11-01

    Neuroimaging data are increasingly being used to predict potential outcomes or groupings, such as clinical severity, drug dose response, and transitional illness states. In these examples, the variable (target) we want to predict is ordinal in nature. Conventional classification schemes assume that the targets are nominal and hence ignore their ranked nature, whereas parametric and/or non-parametric regression models enforce a metric notion of distance between classes. Here, we propose a novel, alternative multivariate approach that overcomes these limitations - whole brain probabilistic ordinal regression using a Gaussian process framework. We applied this technique to two data sets of pharmacological neuroimaging data from healthy volunteers. The first study was designed to investigate the effect of ketamine on brain activity and its subsequent modulation with two compounds - lamotrigine and risperidone. The second study investigates the effect of scopolamine on cerebral blood flow and its modulation using donepezil. We compared ordinal regression to multi-class classification schemes and metric regression. Considering the modulation of ketamine with lamotrigine, we found that ordinal regression significantly outperformed multi-class classification and metric regression in terms of accuracy and mean absolute error. However, for risperidone ordinal regression significantly outperformed metric regression but performed similarly to multi-class classification both in terms of accuracy and mean absolute error. For the scopolamine data set, ordinal regression was found to outperform both multi-class and metric regression techniques considering the regional cerebral blood flow in the anterior cingulate cortex. Ordinal regression was thus the only method that performed well in all cases. Our results indicate the potential of an ordinal regression approach for neuroimaging data while providing a fully probabilistic framework with elegant approaches for model selection. Copyright © 2013. Published by Elsevier Inc.
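
    The abstract contrasts ordinal regression with nominal classification and metric regression. As a generic illustration of ordinal (proportional-odds) regression, the sketch below uses the statsmodels OrderedModel on synthetic features; this is a simpler stand-in, not the Gaussian process ordinal regression developed in the paper.

    ```python
    import numpy as np
    from statsmodels.miscmodels.ordinal_model import OrderedModel

    # Hypothetical features (e.g. summary measures of brain images) and ordinal targets
    rng = np.random.default_rng(2)
    X = rng.normal(size=(120, 5))
    latent = X @ np.array([1.0, -0.5, 0.3, 0.0, 0.8]) + rng.normal(scale=0.5, size=120)
    y = np.digitize(latent, bins=[-1.0, 0.0, 1.0])    # 4 ordered classes: 0..3

    # Proportional-odds (cumulative logit) ordinal regression
    model = OrderedModel(y, X, distr="logit")
    res = model.fit(method="bfgs", disp=False)
    probs = res.model.predict(res.params, exog=X)     # class probabilities, shape (n, 4)
    pred = probs.argmax(axis=1)
    print("mean absolute error:", np.abs(pred - y).mean())
    ```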

  16. Assessment of wastewater treatment facility compliance with decreasing ammonia discharge limits using a regression tree model.

    PubMed

    Suchetana, Bihu; Rajagopalan, Balaji; Silverstein, JoAnn

    2017-11-15

    A regression tree-based diagnostic approach is developed to evaluate factors affecting US wastewater treatment plant compliance with ammonia discharge permit limits using Discharge Monthly Report (DMR) data from a sample of 106 municipal treatment plants for the period of 2004-2008. Predictor variables used to fit the regression tree are selected using random forests, and consist of the previous month's effluent ammonia, influent flow rates and plant capacity utilization. The tree models are first used to evaluate compliance with existing ammonia discharge standards at each facility and then applied assuming more stringent discharge limits, under consideration in many states. The model predicts that the ability to meet both current and future limits depends primarily on the previous month's treatment performance. With more stringent discharge limits, the predicted ammonia concentration relative to the discharge limit increases. In-sample validation shows that the regression trees can provide a median classification accuracy of >70%. The regression tree model is validated using ammonia discharge data from an operating wastewater treatment plant and is able to accurately predict the observed ammonia discharge category approximately 80% of the time, indicating that the regression tree model can be applied to predict compliance for individual treatment plants and can provide practical guidance for utilities and regulators with an interest in controlling ammonia discharges. The proposed methodology is also used to demonstrate how to delineate reliable sources of demand and supply in a point source-to-point source nutrient credit trading scheme, as well as how planners and decision makers can set reasonable discharge limits in the future. Copyright © 2017 Elsevier B.V. All rights reserved.
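
    A minimal sketch of a compliance-classification tree of the kind described, using the three predictors named in the abstract; the data below are synthetic placeholders, and the assumed permit limit is arbitrary.

    ```python
    import numpy as np
    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(3)
    n = 2000
    df = pd.DataFrame({
        "prev_month_effluent_nh3": rng.gamma(2.0, 1.0, n),   # mg/L, previous month
        "influent_flow": rng.gamma(3.0, 2.0, n),             # million gallons per day
        "capacity_utilization": rng.uniform(0.2, 1.1, n),    # fraction of design capacity
    })
    limit = 2.0                                               # assumed permit limit, mg/L
    df["compliant"] = (df["prev_month_effluent_nh3"] * rng.lognormal(0, 0.2, n) < limit).astype(int)

    tree = DecisionTreeClassifier(max_depth=4, min_samples_leaf=50, random_state=0)
    scores = cross_val_score(tree, df.drop(columns="compliant"), df["compliant"], cv=5)
    print("median classification accuracy:", np.median(scores))
    ```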

  17. A computer tool for a minimax criterion in binary response and heteroscedastic simple linear regression models.

    PubMed

    Casero-Alonso, V; López-Fidalgo, J; Torsney, B

    2017-01-01

    Binary response models are used in many real applications. For these models the Fisher information matrix (FIM) is proportional to the FIM of a weighted simple linear regression model. The same is also true when the weight function has a finite integral. Thus, optimal designs for one binary model are also optimal for the corresponding weighted linear regression model. The main objective of this paper is to provide a tool for the construction of MV-optimal designs, minimizing the maximum of the variances of the estimates, for a general design space. MV-optimality is a potentially difficult criterion because of its nondifferentiability at equal variance designs. A methodology for obtaining MV-optimal designs where the design space is a compact interval [a, b] will be given for several standard weight functions. The methodology will allow us to build a user-friendly computer tool based on Mathematica to compute MV-optimal designs. Some illustrative examples will show a representation of MV-optimal designs in the Euclidean plane, taking a and b as the axes. The applet will be explained using two relevant models. In the first one the case of a weighted linear regression model is considered, where the weight function is directly chosen from a typical family. In the second example a binary response model is assumed, where the probability of the outcome is given by a typical probability distribution. Practitioners can use the provided applet to identify the solution and to know the exact support points and design weights. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  18. Deriving percentage study weights in multi-parameter meta-analysis models: with application to meta-regression, network meta-analysis and one-stage individual participant data models.

    PubMed

    Riley, Richard D; Ensor, Joie; Jackson, Dan; Burke, Danielle L

    2017-01-01

    Many meta-analysis models contain multiple parameters, for example due to multiple outcomes, multiple treatments or multiple regression coefficients. In particular, meta-regression models may contain multiple study-level covariates, and one-stage individual participant data meta-analysis models may contain multiple patient-level covariates and interactions. Here, we propose how to derive percentage study weights for such situations, in order to reveal the (otherwise hidden) contribution of each study toward the parameter estimates of interest. We assume that studies are independent, and utilise a decomposition of Fisher's information matrix to partition the total variance matrix of the parameter estimates into study-specific contributions, from which percentage weights are derived. This approach generalises how percentage weights are calculated in a traditional, single parameter meta-analysis model. Application is made to one- and two-stage individual participant data meta-analyses, meta-regression and network (multivariate) meta-analysis of multiple treatments. These reveal percentage study weights toward clinically important estimates, such as summary treatment effects and treatment-covariate interactions, and are especially useful when some studies are potential outliers or at high risk of bias. We also derive percentage study weights toward methodologically interesting measures, such as the magnitude of ecological bias (difference between within-study and across-study associations) and the amount of inconsistency (difference between direct and indirect evidence in a network meta-analysis).
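
    A minimal numerical sketch of the idea, under the assumption (consistent with the description above) that each study contributes an information matrix, the total variance matrix is the inverse of the summed information, and a study's percentage weight for a given parameter is its share of that parameter's variance. The studies, design matrices and weights below are synthetic.

    ```python
    import numpy as np

    rng = np.random.default_rng(4)

    # Three hypothetical studies, each with a design matrix (intercept + covariate)
    # and inverse-variance weights for its observations
    studies = []
    for n in (40, 80, 25):
        X = np.column_stack([np.ones(n), rng.normal(size=n)])
        W = np.diag(rng.uniform(0.5, 2.0, n))      # inverse-variance weights
        studies.append(X.T @ W @ X)                # study-specific information matrix

    total_info = sum(studies)
    V = np.linalg.inv(total_info)                  # total variance matrix of the estimates

    # Study-specific contribution to each parameter's variance: diag(V @ I_i @ V)
    contrib = np.array([np.diag(V @ I_i @ V) for I_i in studies])
    pct_weights = 100 * contrib / np.diag(V)       # rows: studies, columns: parameters
    print(pct_weights.round(1), pct_weights.sum(axis=0))   # each column sums to 100
    ```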

  19. Genetic analysis of body weights of individually fed beef bulls in South Africa using random regression models.

    PubMed

    Selapa, N W; Nephawe, K A; Maiwashe, A; Norris, D

    2012-02-08

    The aim of this study was to estimate genetic parameters for body weights of individually fed beef bulls measured at centralized testing stations in South Africa using random regression models. Weekly body weights of Bonsmara bulls (N = 2919) tested between 1999 and 2003 were available for the analyses. The model included a fixed regression of the body weights on fourth-order orthogonal Legendre polynomials of the actual days on test (7, 14, 21, 28, 35, 42, 49, 56, 63, 70, 77, and 84) for starting age and contemporary group effects. Random regressions on fourth-order orthogonal Legendre polynomials of the actual days on test were included for additive genetic effects and additional uncorrelated random effects of the weaning-herd-year and the permanent environment of the animal. Residual effects were assumed to be independently distributed with heterogeneous variance for each test day. Variance ratios for additive genetic, permanent environment and weaning-herd-year for weekly body weights at different test days ranged from 0.26 to 0.29, 0.37 to 0.44 and 0.26 to 0.34, respectively. The weaning-herd-year was found to have a significant effect on the variation of body weights of bulls despite a 28-day adjustment period. Genetic correlations amongst body weights at different test days were high, ranging from 0.89 to 1.00. Heritability estimates were comparable to literature estimates obtained with multivariate models. Therefore, random regression models could be applied in the genetic evaluation of body weight of individually fed beef bulls in South Africa.

  20. Bayesian shrinkage approach for a joint model of longitudinal and survival outcomes assuming different association structures.

    PubMed

    Andrinopoulou, Eleni-Rosalina; Rizopoulos, Dimitris

    2016-11-20

    The joint modeling of longitudinal and survival data has recently received much attention. Several extensions of the standard joint model that consists of one longitudinal and one survival outcome have been proposed, including the use of different association structures between the longitudinal and the survival outcomes. However, in general, relatively little attention has been given to the selection of the most appropriate functional form to link the two outcomes. In common practice, it is assumed that the underlying value of the longitudinal outcome is associated with the survival outcome. However, it could be that different characteristics of the patients' longitudinal profiles influence the hazard. For example, not only the current value but also the slope or the area under the curve of the longitudinal outcome could influence the hazard. The choice of which functional form to use is an important decision that needs to be investigated because it could influence the results. In this paper, we use a Bayesian shrinkage approach in order to determine the most appropriate functional forms. We propose a joint model that includes different association structures of different biomarkers and assume informative priors for the regression coefficients that correspond to the terms of the longitudinal process. Specifically, we assume Bayesian lasso, Bayesian ridge, Bayesian elastic net, and horseshoe priors. These methods are applied to a dataset consisting of patients with a chronic liver disease, where it is important to investigate which characteristics of the biomarkers have an influence on survival. Copyright © 2016 John Wiley & Sons, Ltd.

  1. Granger causality--statistical analysis under a configural perspective.

    PubMed

    von Eye, Alexander; Wiedermann, Wolfgang; Mun, Eun-Young

    2014-03-01

    The concept of Granger causality can be used to examine putative causal relations between two series of scores. Based on regression models, it is asked whether one series can be considered the cause for the second series. In this article, we propose extending the pool of methods available for testing hypotheses that are compatible with Granger causation by adopting a configural perspective. This perspective allows researchers to assume that effects exist for specific categories only or for specific sectors of the data space, but not for other categories or sectors. Configural Frequency Analysis (CFA) is proposed as the method of analysis from a configural perspective. CFA base models are derived for the exploratory analysis of Granger causation. These models are specified so that they parallel the regression models used for variable-oriented analysis of hypotheses of Granger causation. An example from the development of aggression in adolescence is used. The example shows that only one pattern of change in aggressive impulses over time Granger-causes change in physical aggression against peers.
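
    For the variable-oriented analysis that the configural approach parallels, the standard regression-based Granger test can be run directly. A minimal sketch with two synthetic series follows (the statsmodels implementation is assumed; the Configural Frequency Analysis itself is not shown).

    ```python
    import numpy as np
    import pandas as pd
    from statsmodels.tsa.stattools import grangercausalitytests

    # Two hypothetical time series: does x Granger-cause y?
    rng = np.random.default_rng(5)
    T = 300
    x = rng.normal(size=T)
    y = np.zeros(T)
    for t in range(2, T):
        y[t] = 0.4 * y[t - 1] + 0.5 * x[t - 1] + rng.normal(scale=0.5)

    # Column order matters: the test asks whether the 2nd column helps predict the 1st
    data = pd.DataFrame({"y": y, "x": x})
    results = grangercausalitytests(data[["y", "x"]], maxlag=2)   # F-tests per lag order
    ```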

  2. Effects of functional constraints and opportunism on the functional structure of a vertebrate predator assemblage.

    PubMed

    Farias, Ariel A; Jaksic, Fabian M

    2007-03-01

    1. Within mainstream ecological literature, functional structure has been viewed as resulting from the interplay of species interactions, resource levels and environmental variability. Classical models state that interspecific competition generates species segregation and guild formation in stable saturated environments, whereas opportunism causes species aggregation on abundant resources in variable unsaturated situations. 2. Nevertheless, intrinsic functional constraints may result in species-specific differences in resource-use capabilities. This could force some degree of functional structure without assuming other putative causes. However, the influence of such constraints has rarely been tested, and their relative contribution to observed patterns has not been quantified. 3. We used a multiple null-model approach to quantify the magnitude and direction (non-random aggregation or divergence) of the functional structure of a vertebrate predator assemblage exposed to variable prey abundance over an 18-year period. Observed trends were contrasted with predictions from null-models designed in an orthogonal fashion to account independently for the effects of functional constraints and opportunism. Subsequently, the unexplained variation was regressed against environmental variables to search for evidence of interspecific competition. 4. Overall, null-models accounting for functional constraints showed the best fit to the observed data, and suggested an effect of this factor in modulating predator opportunistic responses. However, regression models on residual variation indicated that such an effect was dependent on both total and relative abundance of principal (small mammals) and alternative (arthropods, birds, reptiles) prey categories. 5. In addition, no clear evidence for interspecific competition was found, but differential delays in predator functional responses could explain some of the unaccounted variation. Thus, we call for caution when interpreting empirical data in the context of classical models assuming synchronous responses of consumers to resource levels.

  3. Neural network modeling for surgical decisions on traumatic brain injury patients.

    PubMed

    Li, Y C; Liu, L; Chiu, W T; Jian, W S

    2000-01-01

    Computerized medical decision support systems have been a major research topic in recent years. Intelligent computer programs were implemented to aid physicians and other medical professionals in making difficult medical decisions. This report compares three different mathematical models for building a traumatic brain injury (TBI) medical decision support system (MDSS). These models were developed based on a large TBI patient database. This MDSS accepts a set of patient data such as the types of skull fracture, Glasgow Coma Scale (GCS), and episode of convulsion, and returns the chance that a neurosurgeon would recommend an open-skull surgery for this patient. The three mathematical models described in this report are a logistic regression model, a multi-layer perceptron (MLP) neural network and a radial-basis-function (RBF) neural network. From the 12,640 patients selected from the database, a randomly drawn 9,480 cases were used as the training group to develop and train our models. The other 3,160 cases formed the validation group, which we used to evaluate the performance of these models. We used sensitivity, specificity, areas under the receiver-operating characteristic (ROC) curve and calibration curves as indicators of how accurate these models are in predicting a neurosurgeon's decision on open-skull surgery. The results showed that, assuming equal importance of sensitivity and specificity, the logistic regression model had a (sensitivity, specificity) of (73%, 68%), compared to (80%, 80%) from the RBF model and (88%, 80%) from the MLP model. The resultant areas under the ROC curve for logistic regression, RBF and MLP neural networks are 0.761, 0.880 and 0.897, respectively (P < 0.05). Among these models, the logistic regression has noticeably poorer calibration. This study demonstrated the feasibility of applying neural networks as the mechanism for TBI decision support systems based on clinical databases. The results also suggest that neural networks may be a better solution for complex, non-linear medical decision support systems than conventional statistical techniques such as logistic regression.
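
    A minimal sketch of this kind of model comparison, contrasting logistic regression with an MLP network on synthetic data and scoring them by area under the ROC curve. scikit-learn has no built-in RBF network, so only two of the three models are shown, and all data and settings are placeholders.

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.neural_network import MLPClassifier
    from sklearn.metrics import roc_auc_score

    # Synthetic stand-in for the TBI dataset (features such as fracture type, GCS, etc.)
    X, y = make_classification(n_samples=12640, n_features=15, n_informative=8, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

    models = {
        "logistic": LogisticRegression(max_iter=1000),
        "mlp": MLPClassifier(hidden_layer_sizes=(20,), max_iter=500, random_state=0),
    }
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
        print(f"{name}: area under ROC curve = {auc:.3f}")
    ```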

  4. Assessment of parametric uncertainty for groundwater reactive transport modeling

    USGS Publications Warehouse

    Shi, Xiaoqing; Ye, Ming; Curtis, Gary P.; Miller, Geoffery L.; Meyer, Philip D.; Kohler, Matthias; Yabusaki, Steve; Wu, Jichun

    2014-01-01

    The validity of using Gaussian assumptions for model residuals in uncertainty quantification of a groundwater reactive transport model was evaluated in this study. Least squares regression methods explicitly assume Gaussian residuals, and the assumption leads to Gaussian likelihood functions, model parameters, and model predictions. While the Bayesian methods do not explicitly require the Gaussian assumption, Gaussian residuals are widely used. This paper shows that the residuals of the reactive transport model are non-Gaussian, heteroscedastic, and correlated in time; characterizing them requires using a generalized likelihood function such as the formal generalized likelihood function developed by Schoups and Vrugt (2010). For the surface complexation model considered in this study for simulating uranium reactive transport in groundwater, parametric uncertainty is quantified using the least squares regression methods and Bayesian methods with both Gaussian and formal generalized likelihood functions. While the least squares methods and Bayesian methods with Gaussian likelihood function produce similar Gaussian parameter distributions, the parameter distributions of Bayesian uncertainty quantification using the formal generalized likelihood function are non-Gaussian. In addition, predictive performance of formal generalized likelihood function is superior to that of least squares regression and Bayesian methods with Gaussian likelihood function. The Bayesian uncertainty quantification is conducted using the differential evolution adaptive metropolis (DREAM(zs)) algorithm; as a Markov chain Monte Carlo (MCMC) method, it is a robust tool for quantifying uncertainty in groundwater reactive transport models. For the surface complexation model, the regression-based local sensitivity analysis and Morris- and DREAM(ZS)-based global sensitivity analysis yield almost identical ranking of parameter importance. The uncertainty analysis may help select appropriate likelihood functions, improve model calibration, and reduce predictive uncertainty in other groundwater reactive transport and environmental modeling.

  5. Improving Global Gross Primary Productivity Estimates by Computing Optimum Light Use Efficiencies Using Flux Tower Data

    NASA Astrophysics Data System (ADS)

    Madani, Nima; Kimball, John S.; Running, Steven W.

    2017-11-01

    In the light use efficiency (LUE) approach of estimating the gross primary productivity (GPP), plant productivity is linearly related to absorbed photosynthetically active radiation assuming that plants absorb and convert solar energy into biomass within a maximum LUE (LUEmax) rate, which is assumed to vary conservatively within a given biome type. However, it has been shown that photosynthetic efficiency can vary within biomes. In this study, we used 149 global CO2 flux towers to derive the optimum LUE (LUEopt) under prevailing climate conditions for each tower location, stratified according to model training and test sites. Unlike LUEmax, LUEopt varies according to heterogeneous landscape characteristics and species traits. The LUEopt data showed large spatial variability within and between biome types, so that a simple biome classification explained only 29% of LUEopt variability over 95 global tower training sites. The use of explanatory variables in a mixed effect regression model explained 62.2% of the spatial variability in tower LUEopt data. The resulting regression model was used for global extrapolation of the LUEopt data and GPP estimation. The GPP estimated using the new LUEopt map showed significant improvement relative to global tower data, including a 15% R2 increase and 34% root-mean-square error reduction relative to baseline GPP calculations derived from biome-specific LUEmax constants. The new global LUEopt map is expected to improve the performance of LUE-based GPP algorithms for better assessment and monitoring of global terrestrial productivity and carbon dynamics.

  6. A note on variance estimation in random effects meta-regression.

    PubMed

    Sidik, Kurex; Jonkman, Jeffrey N

    2005-01-01

    For random effects meta-regression inference, variance estimation for the parameter estimates is discussed. Because estimated weights are used for meta-regression analysis in practice, the assumed or estimated covariance matrix used in meta-regression is not strictly correct, due to possible errors in estimating the weights. Therefore, this note investigates the use of a robust variance estimation approach for obtaining variances of the parameter estimates in random effects meta-regression inference. This method treats the assumed covariance matrix of the effect measure variables as a working covariance matrix. Using an example of meta-analysis data from clinical trials of a vaccine, the robust variance estimation approach is illustrated in comparison with two other methods of variance estimation. A simulation study is presented, comparing the three methods of variance estimation in terms of bias and coverage probability. We find that, despite the seeming suitability of the robust estimator for random effects meta-regression, the improved variance estimator of Knapp and Hartung (2003) yields the best performance among the three estimators, and thus may provide the best protection against errors in the estimated weights.
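
    To make the working-covariance idea concrete, the sketch below fits a weighted meta-regression and contrasts the usual model-based variance with a robust (sandwich) variance that treats the estimated weights as a working covariance. The data are synthetic, and this simple sandwich estimator is an illustration, not the Knapp and Hartung (2003) estimator recommended above.

    ```python
    import numpy as np

    rng = np.random.default_rng(6)
    k = 12                                         # number of studies
    x = rng.uniform(0, 1, k)                       # study-level covariate
    X = np.column_stack([np.ones(k), x])
    theta = 0.2 + 0.5 * x + rng.normal(scale=0.2, size=k)   # observed effect sizes
    v = rng.uniform(0.02, 0.1, k)                  # estimated within-study variances
    tau2 = 0.03                                    # assumed between-study variance
    W = np.diag(1.0 / (v + tau2))                  # estimated (working) weights

    XtWX_inv = np.linalg.inv(X.T @ W @ X)
    beta = XtWX_inv @ X.T @ W @ theta
    resid = theta - X @ beta

    model_based_var = XtWX_inv                      # assumes the working weights are correct
    meat = X.T @ W @ np.diag(resid**2) @ W @ X
    robust_var = XtWX_inv @ meat @ XtWX_inv         # sandwich (robust) variance estimator
    print(np.sqrt(np.diag(model_based_var)), np.sqrt(np.diag(robust_var)))
    ```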

  7. A comparison of Cox and logistic regression for use in genome-wide association studies of cohort and case-cohort design.

    PubMed

    Staley, James R; Jones, Edmund; Kaptoge, Stephen; Butterworth, Adam S; Sweeting, Michael J; Wood, Angela M; Howson, Joanna M M

    2017-06-01

    Logistic regression is often used instead of Cox regression to analyse genome-wide association studies (GWAS) of single-nucleotide polymorphisms (SNPs) and disease outcomes with cohort and case-cohort designs, as it is less computationally expensive. Although Cox and logistic regression models have been compared previously in cohort studies, this work does not completely cover the GWAS setting nor extend to the case-cohort study design. Here, we evaluated Cox and logistic regression applied to cohort and case-cohort genetic association studies using simulated data and genetic data from the EPIC-CVD study. In the cohort setting, there was a modest improvement in power to detect SNP-disease associations using Cox regression compared with logistic regression, which increased as the disease incidence increased. In contrast, logistic regression had more power than (Prentice weighted) Cox regression in the case-cohort setting. Logistic regression yielded inflated effect estimates (assuming the hazard ratio is the underlying measure of association) for both study designs, especially for SNPs with greater effect on disease. Given logistic regression is substantially more computationally efficient than Cox regression in both settings, we propose a two-step approach to GWAS in cohort and case-cohort studies: first, analyse all SNPs with logistic regression to identify associated variants below a pre-defined P-value threshold; second, fit Cox regression (appropriately weighted in case-cohort studies) to those identified SNPs to ensure accurate estimation of the association with disease.
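
    A toy sketch of the proposed two-step screen-then-refit workflow, using statsmodels for the logistic screen and the lifelines package for the Cox step. This assumes a simple cohort design with synthetic genotypes; the P-value threshold, effect sizes and censoring scheme are arbitrary assumptions.

    ```python
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from lifelines import CoxPHFitter

    rng = np.random.default_rng(7)
    n, n_snps = 2000, 50
    snps = rng.binomial(2, 0.3, size=(n, n_snps))            # additive genotype coding 0/1/2
    time = rng.exponential(scale=np.exp(-0.3 * snps[:, 0]))  # SNP 0 is truly associated
    event = (time < np.quantile(time, 0.7)).astype(int)      # crude administrative censoring

    # Step 1: fast logistic screen of every SNP against disease status
    p_values = []
    for j in range(n_snps):
        X = sm.add_constant(snps[:, j])
        p_values.append(sm.Logit(event, X).fit(disp=0).pvalues[1])
    hits = [j for j, p in enumerate(p_values) if p < 1e-3]    # pre-defined threshold

    # Step 2: Cox regression on the screened SNPs for accurate effect estimation
    for j in hits:
        df = pd.DataFrame({"time": time, "event": event, "snp": snps[:, j]})
        cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
        print(j, cph.hazard_ratios_["snp"])
    ```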

  8. Empirical Likelihood in Nonignorable Covariate-Missing Data Problems.

    PubMed

    Xie, Yanmei; Zhang, Biao

    2017-04-20

    Missing covariate data occurs often in regression analysis, which frequently arises in the health and social sciences as well as in survey sampling. We study methods for the analysis of a nonignorable covariate-missing data problem in an assumed conditional mean function when some covariates are completely observed but other covariates are missing for some subjects. We adopt the semiparametric perspective of Bartlett et al. (Improving upon the efficiency of complete case analysis when covariates are MNAR. Biostatistics 2014;15:719-30) on regression analyses with nonignorable missing covariates, in which they have introduced the use of two working models, the working probability model of missingness and the working conditional score model. In this paper, we study an empirical likelihood approach to nonignorable covariate-missing data problems with the objective of effectively utilizing the two working models in the analysis of covariate-missing data. We propose a unified approach to constructing a system of unbiased estimating equations, where there are more equations than unknown parameters of interest. One useful feature of these unbiased estimating equations is that they naturally incorporate the incomplete data into the data analysis, making it possible to seek efficient estimation of the parameter of interest even when the working regression function is not specified to be the optimal regression function. We apply the general methodology of empirical likelihood to optimally combine these unbiased estimating equations. We propose three maximum empirical likelihood estimators of the underlying regression parameters and compare their efficiencies with other existing competitors. We present a simulation study to compare the finite-sample performance of various methods with respect to bias, efficiency, and robustness to model misspecification. The proposed empirical likelihood method is also illustrated by an analysis of a data set from the US National Health and Nutrition Examination Survey (NHANES).

  9. Computing group cardinality constraint solutions for logistic regression problems.

    PubMed

    Zhang, Yong; Kwon, Dongjin; Pohl, Kilian M

    2017-01-01

    We derive an algorithm to directly solve logistic regression under a group cardinality (group sparsity) constraint and use it to classify intra-subject MRI sequences (e.g. cine MRIs), distinguishing healthy from diseased subjects. Group cardinality constraint models are often applied to medical images in order to avoid overfitting of the classifier to the training data. Solutions within these models are generally determined by relaxing the cardinality constraint to a weighted feature selection scheme. However, these solutions relate to the original sparse problem only under specific assumptions, which generally do not hold for medical image applications. In addition, inferring clinical meaning from features weighted by a classifier is an ongoing topic of discussion. Avoiding weighting features, we propose to directly solve the group cardinality constraint logistic regression problem by generalizing the Penalty Decomposition method. To do so, we assume that an intra-subject series of images represents repeated samples of the same disease patterns. We model this assumption by combining series of measurements created by a feature across time into a single group. Our algorithm then derives a solution within that model by decoupling the minimization of the logistic regression function from enforcing the group sparsity constraint. The minimum of the smooth and convex logistic regression problem is determined via gradient descent while we derive a closed form solution for finding a sparse approximation of that minimum. We apply our method to cine MRI of 38 healthy controls and 44 adult patients that received reconstructive surgery of Tetralogy of Fallot (TOF) during infancy. Our method correctly identifies regions impacted by TOF and generally obtains statistically significantly higher classification accuracy than alternative solutions to this model, i.e., ones relaxing group cardinality constraints. Copyright © 2016 Elsevier B.V. All rights reserved.

  10. ASR4. Anelastic Strain Recovery Analysis Code

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Warpinski, N.R.

    ASR4 performs a nonlinear least-squares regression of Anelastic Strain Recovery (ASR) data for the purpose of determining in situ stress orientations and magnitudes. ASR4 fits the viscoelastic model of Warpinski and Teufel to measured ASR data, calculates the stress orientations directly, and calculates stress magnitudes if sufficient input data are available. The code also calculates the stress orientation using strain-rosette equations, and it calculates stress magnitudes using Blanton's approach, assuming sufficient input data are available.

  11. Anelastic Strain Recovery Analysis Code

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    ASR4 performs a nonlinear least-squares regression of Anelastic Strain Recovery (ASR) data for the purpose of determining in situ stress orientations and magnitudes. ASR4 fits the viscoelastic model of Warpinski and Teufel to measured ASR data, calculates the stress orientations directly, and calculates stress magnitudes if sufficient input data are available. The code also calculates the stress orientation using strain-rosette equations, and it calculates stress magnitudes using Blanton's approach, assuming sufficient input data are available.

  12. Logistic Regression with Multiple Random Effects: A Simulation Study of Estimation Methods and Statistical Packages.

    PubMed

    Kim, Yoonsang; Choi, Young-Ku; Emery, Sherry

    2013-08-01

    Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods' performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages-SAS GLIMMIX Laplace and SuperMix Gaussian quadrature-perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes.

  13. Logistic Regression with Multiple Random Effects: A Simulation Study of Estimation Methods and Statistical Packages

    PubMed Central

    Kim, Yoonsang; Emery, Sherry

    2013-01-01

    Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods’ performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages—SAS GLIMMIX Laplace and SuperMix Gaussian quadrature—perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes. PMID:24288415

  14. MIXREG: a computer program for mixed-effects regression analysis with autocorrelated errors.

    PubMed

    Hedeker, D; Gibbons, R D

    1996-05-01

    MIXREG is a program that provides estimates for a mixed-effects regression model (MRM) for normally-distributed response data including autocorrelated errors. This model can be used for analysis of unbalanced longitudinal data, where individuals may be measured at a different number of timepoints, or even at different timepoints. Autocorrelated errors of a general form or following an AR(1), MA(1), or ARMA(1,1) form are allowable. This model can also be used for analysis of clustered data, where the mixed-effects model assumes data within clusters are dependent. The degree of dependency is estimated jointly with estimates of the usual model parameters, thus adjusting for clustering. MIXREG uses maximum marginal likelihood estimation, utilizing both the EM algorithm and a Fisher-scoring solution. For the scoring solution, the covariance matrix of the random effects is expressed in its Gaussian decomposition, and the diagonal matrix reparameterized using the exponential transformation. Estimation of the individual random effects is accomplished using an empirical Bayes approach. Examples illustrating usage and features of MIXREG are provided.

  15. Atmospheric mold spore counts in relation to meteorological parameters

    NASA Astrophysics Data System (ADS)

    Katial, R. K.; Zhang, Yiming; Jones, Richard H.; Dyer, Philip D.

    Fungal spore counts of Cladosporium, Alternaria, and Epicoccum were studied over 8 years in Denver, Colorado. Fungal spore counts were obtained daily during the pollinating season by a Rotorod sampler. Weather data were obtained from the National Climatic Data Center. Daily averages of temperature, relative humidity, daily precipitation, barometric pressure, and wind speed were studied. A time series analysis was performed on the data to mathematically model the spore counts in relation to weather parameters. Using SAS PROC ARIMA software, a regression analysis was performed, regressing the spore counts on the weather variables assuming an autoregressive moving average (ARMA) error structure. Cladosporium was found to be positively correlated (P<0.02) with average daily temperature and relative humidity, and negatively correlated with precipitation. Alternaria and Epicoccum did not show increased predictability with weather variables. A mathematical model was derived for Cladosporium spore counts using the annual seasonal cycle and significant weather variables. The model for Alternaria and Epicoccum incorporated the annual seasonal cycle. Fungal spore counts can be modeled by time series analysis and related to meteorological parameters controlling for seasonality; this modeling can provide estimates of exposure to fungal aeroallergens.
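
    A minimal sketch of regressing a daily count series on weather covariates while assuming an ARMA(1,1) error structure, using statsmodels SARIMAX with exogenous regressors as a stand-in for the SAS PROC ARIMA analysis; the data below are synthetic.

    ```python
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(8)
    n = 500
    weather = pd.DataFrame({
        "temperature": rng.normal(20, 5, n),
        "humidity": rng.uniform(20, 90, n),
        "precipitation": rng.exponential(1.0, n),
    })
    # Synthetic spore counts: regression signal plus autocorrelated noise
    noise = np.zeros(n)
    for t in range(1, n):
        noise[t] = 0.6 * noise[t - 1] + rng.normal(scale=5)
    spores = 50 + 2.0 * weather["temperature"] + 0.5 * weather["humidity"] \
        - 3.0 * weather["precipitation"] + noise

    # Regression on weather variables with ARMA(1,1) errors
    model = sm.tsa.SARIMAX(spores, exog=weather, order=(1, 0, 1))
    res = model.fit(disp=False)
    print(res.summary())
    ```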

  16. Monthly streamflow forecasting based on hidden Markov model and Gaussian Mixture Regression

    NASA Astrophysics Data System (ADS)

    Liu, Yongqi; Ye, Lei; Qin, Hui; Hong, Xiaofeng; Ye, Jiajun; Yin, Xingli

    2018-06-01

    Reliable streamflow forecasts can be highly valuable for water resources planning and management. In this study, we combined a hidden Markov model (HMM) and Gaussian Mixture Regression (GMR) for probabilistic monthly streamflow forecasting. The HMM is initialized using a kernelized K-medoids clustering method, and the Baum-Welch algorithm is then executed to learn the model parameters. GMR derives a conditional probability distribution for the predictand given covariate information, including the antecedent flow at a local station and two surrounding stations. The performance of HMM-GMR was verified based on the mean square error and continuous ranked probability score skill scores. The reliability of the forecasts was assessed by examining the uniformity of the probability integral transform values. The results show that HMM-GMR obtained reasonably high skill scores and the uncertainty spread was appropriate. Different HMM states were assumed to be different climate conditions, which would lead to different types of observed values. We demonstrated that the HMM-GMR approach can handle multimodal and heteroscedastic data.

  17. Generalized Scalar-on-Image Regression Models via Total Variation.

    PubMed

    Wang, Xiao; Zhu, Hongtu

    2017-01-01

    The use of imaging markers to predict clinical outcomes can have a great impact in public health. The aim of this paper is to develop a class of generalized scalar-on-image regression models via total variation (GSIRM-TV), in the sense of generalized linear models, for scalar response and imaging predictor in the presence of scalar covariates. A key novelty of GSIRM-TV is that it is assumed that the slope function (or image) of GSIRM-TV belongs to the space of bounded total variation in order to explicitly account for the piecewise smooth nature of most imaging data. We develop an efficient penalized total variation optimization to estimate the unknown slope function and other parameters. We also establish nonasymptotic error bounds on the excess risk. These bounds are explicitly specified in terms of sample size, image size, and image smoothness. Our simulations demonstrate a superior performance of GSIRM-TV against many existing approaches. We apply GSIRM-TV to the analysis of hippocampus data obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset.

  18. Big Data Toolsets to Pharmacometrics: Application of Machine Learning for Time‐to‐Event Analysis

    PubMed Central

    Gong, Xiajing; Hu, Meng

    2018-01-01

    Additional value can be potentially created by applying big data tools to address pharmacometric problems. The performances of machine learning (ML) methods and the Cox regression model were evaluated based on simulated time-to-event data synthesized under various preset scenarios, i.e., with linear vs. nonlinear and dependent vs. independent predictors in the proportional hazard function, or with high-dimensional data featured by a large number of predictor variables. Our results showed that ML-based methods outperformed the Cox model in prediction performance as assessed by concordance index and in identifying the preset influential variables for high-dimensional data. The prediction performances of ML-based methods are also less sensitive to data size and censoring rates than the Cox regression model. In conclusion, ML-based methods provide a powerful tool for time-to-event analysis, with a built-in capacity for high-dimensional data and better performance when the predictor variables assume nonlinear relationships in the hazard function. PMID:29536640

  19. The PIT-trap-A "model-free" bootstrap procedure for inference about regression models with discrete, multivariate responses.

    PubMed

    Warton, David I; Thibaut, Loïc; Wang, Yi Alice

    2017-01-01

    Bootstrap methods are widely used in statistics, and bootstrapping of residuals can be especially useful in the regression context. However, difficulties are encountered extending residual resampling to regression settings where residuals are not identically distributed (thus not amenable to bootstrapping)-common examples including logistic or Poisson regression and generalizations to handle clustered or multivariate data, such as generalised estimating equations. We propose a bootstrap method based on probability integral transform (PIT-) residuals, which we call the PIT-trap, which assumes data come from some marginal distribution F of known parametric form. This method can be understood as a type of "model-free bootstrap", adapted to the problem of discrete and highly multivariate data. PIT-residuals have the key property that they are (asymptotically) pivotal. The PIT-trap thus inherits the key property, not afforded by any other residual resampling approach, that the marginal distribution of data can be preserved under PIT-trapping. This in turn enables the derivation of some standard bootstrap properties, including second-order correctness of pivotal PIT-trap test statistics. In multivariate data, bootstrapping rows of PIT-residuals affords the property that it preserves correlation in data without the need for it to be modelled, a key point of difference as compared to a parametric bootstrap. The proposed method is illustrated on an example involving multivariate abundance data in ecology, and demonstrated via simulation to have improved properties as compared to competing resampling methods.
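
    As a univariate illustration of the idea (not the authors' multivariate implementation), the sketch below computes randomized probability integral transform residuals for a Poisson regression, resamples them, and maps them back to counts through the fitted distribution; all data and settings are synthetic assumptions.

    ```python
    import numpy as np
    from scipy import stats
    import statsmodels.api as sm

    rng = np.random.default_rng(9)
    n = 300
    x = rng.normal(size=n)
    X = sm.add_constant(x)
    y = rng.poisson(np.exp(0.5 + 0.8 * x))

    fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
    mu = fit.fittedvalues

    # Randomized PIT residuals: approximately uniform if the model is adequate
    u = stats.poisson.cdf(y - 1, mu) + rng.uniform(size=n) * stats.poisson.pmf(y, mu)

    # One PIT-trap style resample: bootstrap the residuals, then map back to counts
    u_star = rng.choice(u, size=n, replace=True)
    y_star = stats.poisson.ppf(np.clip(u_star, 1e-12, 1 - 1e-12), mu).astype(int)
    refit = sm.GLM(y_star, X, family=sm.families.Poisson()).fit()
    print(fit.params, refit.params)
    ```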

  20. The PIT-trap—A “model-free” bootstrap procedure for inference about regression models with discrete, multivariate responses

    PubMed Central

    Thibaut, Loïc; Wang, Yi Alice

    2017-01-01

    Bootstrap methods are widely used in statistics, and bootstrapping of residuals can be especially useful in the regression context. However, difficulties are encountered extending residual resampling to regression settings where residuals are not identically distributed (thus not amenable to bootstrapping)—common examples including logistic or Poisson regression and generalizations to handle clustered or multivariate data, such as generalised estimating equations. We propose a bootstrap method based on probability integral transform (PIT-) residuals, which we call the PIT-trap, which assumes data come from some marginal distribution F of known parametric form. This method can be understood as a type of “model-free bootstrap”, adapted to the problem of discrete and highly multivariate data. PIT-residuals have the key property that they are (asymptotically) pivotal. The PIT-trap thus inherits the key property, not afforded by any other residual resampling approach, that the marginal distribution of data can be preserved under PIT-trapping. This in turn enables the derivation of some standard bootstrap properties, including second-order correctness of pivotal PIT-trap test statistics. In multivariate data, bootstrapping rows of PIT-residuals affords the property that it preserves correlation in data without the need for it to be modelled, a key point of difference as compared to a parametric bootstrap. The proposed method is illustrated on an example involving multivariate abundance data in ecology, and demonstrated via simulation to have improved properties as compared to competing resampling methods. PMID:28738071

  1. Association of Brain-Derived Neurotrophic Factor and Vitamin D with Depression and Obesity: A Population-Based Study.

    PubMed

    Goltz, Annemarie; Janowitz, Deborah; Hannemann, Anke; Nauck, Matthias; Hoffmann, Johanna; Seyfart, Tom; Völzke, Henry; Terock, Jan; Grabe, Hans Jörgen

    2018-06-19

    Depression and obesity are widespread and closely linked. Brain-derived neurotrophic factor (BDNF) and vitamin D are both assumed to be associated with depression and obesity. Little is known about the interplay between vitamin D and BDNF. We explored the putative associations and interactions between serum BDNF and vitamin D levels with depressive symptoms and abdominal obesity in a large population-based cohort. Data were obtained from the population-based Study of Health in Pomerania (SHIP)-Trend (n = 3,926). The associations of serum BDNF and vitamin D levels with depressive symptoms (measured using the Patient Health Questionnaire) were assessed with binary and multinomial logistic regression models. The associations of serum BDNF and vitamin D levels with obesity (measured by the waist-to-hip ratio [WHR]) were assessed with binary logistic and linear regression models with restricted cubic splines. Logistic regression models revealed inverse associations of vitamin D with depression (OR = 0.966; 95% CI 0.951-0.981) and obesity (OR = 0.976; 95% CI 0.967-0.985). No linear association of serum BDNF with depression or obesity was found. However, linear regression models revealed a U-shaped association of BDNF with WHR (p < 0.001). Vitamin D was inversely associated with depression and obesity. BDNF was associated with abdominal obesity, but not with depression. At the population level, our results support the relevant roles of vitamin D and BDNF in mental and physical health-related outcomes. © 2018 S. Karger AG, Basel.

  2. Modelling long-term fire occurrence factors in Spain by accounting for local variations with geographically weighted regression

    NASA Astrophysics Data System (ADS)

    Martínez-Fernández, J.; Chuvieco, E.; Koutsias, N.

    2013-02-01

    Humans are responsible for most forest fires in Europe, but anthropogenic factors behind these events are still poorly understood. We tried to identify the driving factors of human-caused fire occurrence in Spain by applying two different statistical approaches. Firstly, assuming stationary processes for the whole country, we created models based on multiple linear regression and binary logistic regression to find factors associated with fire density and fire presence, respectively. Secondly, we used geographically weighted regression (GWR) to better understand and explore the local and regional variations of those factors behind human-caused fire occurrence. The number of human-caused fires occurring within a 25-yr period (1983-2007) was computed for each of the 7638 Spanish mainland municipalities, creating a binary variable (fire/no fire) to develop logistic models, and a continuous variable (fire density) to build standard linear regression models. A total of 383 657 fires were registered in the study dataset. The binary logistic model, which estimates the probability of having/not having a fire, successfully classified 76.4% of the total observations, while the ordinary least squares (OLS) regression model explained 53% of the variation of the fire density patterns (adjusted R2 = 0.53). Both approaches confirmed, in addition to forest and climatic variables, the importance of variables related with agrarian activities, land abandonment, rural population exodus and developmental processes as underlying factors of fire occurrence. For the GWR approach, the explanatory power of the GW linear model for fire density using an adaptive bandwidth increased from 53% to 67%, while for the GW logistic model the correctly classified observations improved only slightly, from 76.4% to 78.4%, but significantly according to the corrected Akaike Information Criterion (AICc), from 3451.19 to 3321.19. The results from GWR indicated a significant spatial variation in the local parameter estimates for all the variables and an important reduction of the autocorrelation in the residuals of the GW linear model. Despite the fitting improvement of local models, GW regression, more than an alternative to "global" or traditional regression modelling, seems to be a valuable complement to explore the non-stationary relationships between the response variable and the explanatory variables. The synergy of global and local modelling provides insights into fire management and policy and helps further our understanding of the fire problem over large areas while at the same time recognizing its local character.
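
    The core of the GWR step can be illustrated compactly: at each location a weighted least-squares fit is computed, with observation weights decaying with distance under a kernel and bandwidth. The sketch below is a bare-bones version with synthetic data; an applied analysis would normally rely on a dedicated GWR implementation and data-driven (e.g. adaptive) bandwidth selection, as used in the study.

    ```python
    import numpy as np

    rng = np.random.default_rng(10)
    n = 500
    coords = rng.uniform(0, 100, size=(n, 2))             # municipality centroids (arbitrary units)
    x = rng.normal(size=n)                                 # e.g. an agrarian-activity covariate
    beta_true = 0.5 + coords[:, 0] / 100                   # effect varies smoothly in space
    y = 1.0 + beta_true * x + rng.normal(scale=0.5, size=n)   # synthetic fire density
    X = np.column_stack([np.ones(n), x])

    def gwr_coefficients(X, y, coords, bandwidth):
        """Local coefficients from a Gaussian-kernel geographically weighted regression."""
        betas = np.empty((len(y), X.shape[1]))
        for i, c in enumerate(coords):
            d = np.linalg.norm(coords - c, axis=1)
            w = np.exp(-0.5 * (d / bandwidth) ** 2)        # Gaussian spatial kernel
            XtW = X.T * w
            betas[i] = np.linalg.solve(XtW @ X, XtW @ y)   # weighted least squares at location i
        return betas

    local_betas = gwr_coefficients(X, y, coords, bandwidth=15.0)
    print(local_betas[:, 1].min(), local_betas[:, 1].max())    # spatial variation in the slope
    ```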

  3. Using Structured Additive Regression Models to Estimate Risk Factors of Malaria: Analysis of 2010 Malawi Malaria Indicator Survey Data

    PubMed Central

    Chirombo, James; Lowe, Rachel; Kazembe, Lawrence

    2014-01-01

    Background: After years of implementing Roll Back Malaria (RBM) interventions, the changing landscape of malaria in terms of risk factors and spatial pattern has not been fully investigated. This paper uses the 2010 malaria indicator survey data to investigate if known malaria risk factors remain relevant after many years of interventions. Methods: We adopted a structured additive logistic regression model that allowed for spatial correlation, to more realistically estimate malaria risk factors. Our model included child and household level covariates, as well as climatic and environmental factors. Continuous variables were modelled by assuming second order random walk priors, while spatial correlation was specified as a Markov random field prior, with fixed effects assigned diffuse priors. Inference was fully Bayesian resulting in an under five malaria risk map for Malawi. Results: Malaria risk increased with increasing age of the child. With respect to socio-economic factors, the greater the household wealth, the lower the malaria prevalence. A general decline in malaria risk was observed as altitude increased. Minimum temperatures and average total rainfall in the three months preceding the survey did not show a strong association with disease risk. Conclusions: The structured additive regression model offered a flexible extension to standard regression models by enabling simultaneous modelling of possible nonlinear effects of continuous covariates, spatial correlation and heterogeneity, while estimating usual fixed effects of categorical and continuous observed variables. Our results confirmed that malaria epidemiology is a complex interaction of biotic and abiotic factors, both at the individual, household and community level and that risk factors are still relevant many years after extensive implementation of RBM activities. PMID:24991915

  4. The Local Food Environment and Fruit and Vegetable Intake: A Geographically Weighted Regression Approach in the ORiEL Study.

    PubMed

    Clary, Christelle; Lewis, Daniel J; Flint, Ellen; Smith, Neil R; Kestens, Yan; Cummins, Steven

    2016-12-01

    Studies that explore associations between the local food environment and diet routinely use global regression models, which assume that relationships are invariant across space, yet such stationarity assumptions have been little tested. We used global and geographically weighted regression models to explore associations between the residential food environment and fruit and vegetable intake. Analyses were performed in 4 boroughs of London, United Kingdom, using data collected between April 2012 and July 2012 from 969 adults in the Olympic Regeneration in East London Study. Exposures were assessed both as absolute densities of healthy and unhealthy outlets, taken separately, and as a relative measure (proportion of total outlets classified as healthy). Overall, local models performed better than global models (lower Akaike information criterion). Locally estimated coefficients varied across space, regardless of the type of exposure measure, although changes of sign were observed only when absolute measures were used. Despite findings from global models showing significant associations between the relative measure and fruit and vegetable intake (β = 0.022; P < 0.01) only, geographically weighted regression models using absolute measures outperformed models using relative measures. This study suggests that greater attention should be given to nonstationary relationships between the food environment and diet. It further challenges the idea that a single measure of exposure, whether relative or absolute, can reflect the many ways the food environment may shape health behaviors. © The Author 2016. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  5. Using structured additive regression models to estimate risk factors of malaria: analysis of 2010 Malawi malaria indicator survey data.

    PubMed

    Chirombo, James; Lowe, Rachel; Kazembe, Lawrence

    2014-01-01

    After years of implementing Roll Back Malaria (RBM) interventions, the changing landscape of malaria in terms of risk factors and spatial pattern has not been fully investigated. This paper uses the 2010 malaria indicator survey data to investigate if known malaria risk factors remain relevant after many years of interventions. We adopted a structured additive logistic regression model that allowed for spatial correlation, to more realistically estimate malaria risk factors. Our model included child and household level covariates, as well as climatic and environmental factors. Continuous variables were modelled by assuming second order random walk priors, while spatial correlation was specified as a Markov random field prior, with fixed effects assigned diffuse priors. Inference was fully Bayesian resulting in an under five malaria risk map for Malawi. Malaria risk increased with increasing age of the child. With respect to socio-economic factors, the greater the household wealth, the lower the malaria prevalence. A general decline in malaria risk was observed as altitude increased. Minimum temperatures and average total rainfall in the three months preceding the survey did not show a strong association with disease risk. The structured additive regression model offered a flexible extension to standard regression models by enabling simultaneous modelling of possible nonlinear effects of continuous covariates, spatial correlation and heterogeneity, while estimating usual fixed effects of categorical and continuous observed variables. Our results confirmed that malaria epidemiology is a complex interaction of biotic and abiotic factors, both at the individual, household and community level and that risk factors are still relevant many years after extensive implementation of RBM activities.

  6. Trend Estimation and Regression Analysis in Climatological Time Series: An Application of Structural Time Series Models and the Kalman Filter.

    NASA Astrophysics Data System (ADS)

    Visser, H.; Molenaar, J.

    1995-05-01

    The detection of trends in climatological data has become central to the discussion on climate change due to the enhanced greenhouse effect. To prove detection, a method is needed (i) to make inferences on significant rises or declines in trends, (ii) to take into account natural variability in climate series, and (iii) to compare output from GCMs with the trends in observed climate data. To meet these requirements, flexible mathematical tools are needed. A structural time series model is proposed with which a stochastic trend, a deterministic trend, and regression coefficients can be estimated simultaneously. The stochastic trend component is described using the class of ARIMA models. The regression component is assumed to be linear. However, the regression coefficients corresponding to the explanatory variables may be allowed to be time dependent in order to validate this assumption. The mathematical technique used to estimate this trend-regression model is the Kalman filter. The main features of the filter are discussed. Examples of trend estimation are given using annual mean temperatures at a single station in the Netherlands (1706-1990) and annual mean temperatures at Northern Hemisphere land stations (1851-1990). The inclusion of explanatory variables is shown by regressing the latter temperature series on four variables: Southern Oscillation index (SOI), volcanic dust index (VDI), sunspot numbers (SSN), and a simulated temperature signal, induced by increasing greenhouse gases (GHG). In all analyses, the influence of SSN on global temperatures is found to be negligible. The correlations between temperatures and SOI and VDI appear to be negative. For SOI, this correlation is significant, but for VDI it is not, probably because of a lack of volcanic eruptions during the sample period. The relation between temperatures and GHG is positive, which is in agreement with the hypothesis of a warming climate because of increasing levels of greenhouse gases. The prediction performance of the model is rather poor, and possible explanations are discussed.
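
    A minimal sketch of a structural time series model with a stochastic trend plus a regression component, estimated by the Kalman filter through statsmodels' UnobservedComponents; the series and the explanatory variable are simulated stand-ins rather than the station or hemispheric data.

```python
# Sketch of a structural time series model: stochastic (local linear) trend plus a
# regression component, estimated by the Kalman filter via statsmodels.
# The temperature series and the explanatory variable are simulated stand-ins.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 150
soi = rng.normal(size=n)                          # stand-in for an explanatory variable (e.g. SOI)
trend = np.cumsum(0.01 + rng.normal(scale=0.02, size=n))
temp = trend - 0.15 * soi + rng.normal(scale=0.1, size=n)

model = sm.tsa.UnobservedComponents(temp,
                                    level='local linear trend',  # stochastic trend
                                    exog=soi.reshape(-1, 1))     # regression component
res = model.fit(disp=False)
print(res.summary())                              # regression coefficient and variances
trend_hat = res.level.smoothed                    # Kalman-smoothed stochastic trend
print(np.round(trend_hat[:5], 3))
```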

  7. Upscaling NZ-DNDC using a regression based meta-model to estimate direct N2O emissions from New Zealand grazed pastures.

    PubMed

    Giltrap, Donna L; Ausseil, Anne-Gaëlle E

    2016-01-01

    The availability of detailed input data frequently limits the application of process-based models at large scale. In this study, we produced simplified meta-models of the simulated nitrous oxide (N2O) emission factors (EF) using NZ-DNDC. Monte Carlo simulations were performed and the results investigated using multiple regression analysis to produce simplified meta-models of EF. These meta-models were then used to estimate direct N2O emissions from grazed pastures in New Zealand. New Zealand EF maps were generated using the meta-models with data from national scale soil maps. Direct emissions of N2O from grazed pasture were calculated by multiplying the EF map with a nitrogen (N) input map. Three meta-models were considered. Model 1 included only the soil organic carbon in the top 30cm (SOC30), Model 2 also included a clay content factor, and Model 3 added the interaction between SOC30 and clay. The median annual national direct N2O emissions from grazed pastures estimated using each model (assuming model errors were purely random) were: 9.6GgN (Model 1), 13.6GgN (Model 2), and 11.9GgN (Model 3). These values corresponded to an average EF of 0.53%, 0.75% and 0.63% respectively, while the corresponding average EF using New Zealand national inventory values was 0.67%. If the model error can be assumed to be independent for each pixel then the 95% confidence interval for the N2O emissions was of the order of ±0.4-0.7%, which is much lower than existing methods. However, spatial correlations in the model errors could invalidate this assumption. Under the extreme assumption that the model error for each pixel was identical the 95% confidence interval was approximately ±100-200%. Therefore further work is needed to assess the degree of spatial correlation in the model errors. Copyright © 2015 Elsevier B.V. All rights reserved.
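
    A minimal sketch of the meta-model step, assuming hypothetical Monte Carlo output: a simulated emission factor is regressed on SOC30, clay content and their interaction (the structure of "Model 3") with statsmodels, and the fitted regression is then applied to new map pixels; variable names and values are invented.

```python
# Sketch of the meta-model idea: regress simulated emission factors (EF) on soil
# predictors, including an interaction term (cf. "Model 3"). Data are hypothetical
# stand-ins for NZ-DNDC Monte Carlo output, not the study's simulations.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 500
df = pd.DataFrame({
    "soc30": rng.uniform(20, 120, n),   # soil organic C in top 30 cm (hypothetical units)
    "clay":  rng.uniform(5, 45, n),     # clay content (%)
})
# Hypothetical "true" relationship used only to generate example EF values (%)
df["ef"] = (0.2 + 0.004 * df.soc30 + 0.01 * df.clay
            - 0.00005 * df.soc30 * df.clay + rng.normal(scale=0.05, size=n))

meta_model = smf.ols("ef ~ soc30 * clay", data=df).fit()   # main effects + interaction
print(meta_model.params)

# The fitted meta-model can then be applied pixel-by-pixel to national soil maps:
new_pixels = pd.DataFrame({"soc30": [60.0, 95.0], "clay": [12.0, 30.0]})
print(meta_model.predict(new_pixels))
```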

  8. Inferring epidemiological parameters from phylogenies using regression-ABC: A comparative study

    PubMed Central

    Gascuel, Olivier

    2017-01-01

    Inferring epidemiological parameters such as the R0 from time-scaled phylogenies is a timely challenge. Most current approaches rely on likelihood functions, which raise specific issues that range from computing these functions to finding their maxima numerically. Here, we present a new regression-based Approximate Bayesian Computation (ABC) approach, which we base on a large variety of summary statistics intended to capture the information contained in the phylogeny and its corresponding lineage-through-time plot. The regression step involves the Least Absolute Shrinkage and Selection Operator (LASSO) method, which is a robust machine learning technique. It allows us to readily deal with the large number of summary statistics, while avoiding resorting to Markov Chain Monte Carlo (MCMC) techniques. To compare our approach to existing ones, we simulated target trees under a variety of epidemiological models and settings, and inferred parameters of interest using the same priors. We found that, for large phylogenies, the accuracy of our regression-ABC is comparable to that of likelihood-based approaches involving birth-death processes implemented in BEAST2. Our approach even outperformed these when inferring the host population size with a Susceptible-Infected-Removed epidemiological model. It also clearly outperformed a recent kernel-ABC approach when assuming a Susceptible-Infected epidemiological model with two host types. Lastly, by re-analyzing data from the early stages of the recent Ebola epidemic in Sierra Leone, we showed that regression-ABC provides more realistic estimates for the duration parameters (latency and infectiousness) than the likelihood-based method. Overall, ABC based on a large variety of summary statistics and a regression method able to perform variable selection and avoid overfitting is a promising approach to analyze large phylogenies. PMID:28263987
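
    A minimal sketch of the regression-ABC recipe, assuming a toy simulator instead of a phylogeny or epidemic model: parameters are drawn from the prior, many summary statistics are simulated, a LASSO regression (scikit-learn's LassoCV) maps the summaries back to the parameter, and the fit is evaluated at the "observed" summaries.

```python
# Minimal regression-ABC sketch: draw parameters from the prior, simulate summary
# statistics, regress the parameter on the summaries with LASSO, then predict at the
# observed summaries. The "simulator" is a toy stand-in, not an epidemiological model.
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)

def simulate_summaries(theta, rng):
    """Toy simulator: many summary statistics, only a few actually informative."""
    informative = theta * np.array([1.0, -0.5, 2.0]) + rng.normal(scale=0.3, size=3)
    noise = rng.normal(size=47)                       # uninformative summaries
    return np.concatenate([informative, noise])

# 1. Sample from the prior and simulate
n_sim = 5000
theta_prior = rng.uniform(1.0, 5.0, n_sim)            # e.g. a prior on R0
S = np.vstack([simulate_summaries(t, rng) for t in theta_prior])

# 2. LASSO regression step (variable selection over the 50 summaries)
reg = LassoCV(cv=5).fit(S, theta_prior)

# 3. "Observed" data: summaries generated from a pseudo-true parameter value
s_obs = simulate_summaries(2.5, rng)
print("estimated parameter:", reg.predict(s_obs.reshape(1, -1))[0])
print("non-zero coefficients:", int(np.sum(reg.coef_ != 0)))
```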

  9. A Bayesian goodness of fit test and semiparametric generalization of logistic regression with measurement data.

    PubMed

    Schörgendorfer, Angela; Branscum, Adam J; Hanson, Timothy E

    2013-06-01

    Logistic regression is a popular tool for risk analysis in medical and population health science. With continuous response data, it is common to create a dichotomous outcome for logistic regression analysis by specifying a threshold for positivity. Fitting a linear regression to the nondichotomized response variable assuming a logistic sampling model for the data has been empirically shown to yield more efficient estimates of odds ratios than ordinary logistic regression of the dichotomized endpoint. We illustrate that risk inference is not robust to departures from the parametric logistic distribution. Moreover, the model assumption of proportional odds is generally not satisfied when the condition of a logistic distribution for the data is violated, leading to biased inference from a parametric logistic analysis. We develop novel Bayesian semiparametric methodology for testing goodness of fit of parametric logistic regression with continuous measurement data. The testing procedures hold for any cutoff threshold and our approach simultaneously provides the ability to perform semiparametric risk estimation. Bayes factors are calculated using the Savage-Dickey ratio for testing the null hypothesis of logistic regression versus a semiparametric generalization. We propose a fully Bayesian and a computationally efficient empirical Bayesian approach to testing, and we present methods for semiparametric estimation of risks, relative risks, and odds ratios when parametric logistic regression fails. Theoretical results establish the consistency of the empirical Bayes test. Results from simulated data show that the proposed approach provides accurate inference irrespective of whether parametric assumptions hold or not. Evaluation of risk factors for obesity shows that different inferences are derived from an analysis of a real data set when deviations from a logistic distribution are permissible in a flexible semiparametric framework. © 2013, The International Biometric Society.

  10. The use of auxiliary variables in capture-recapture and removal experiments

    USGS Publications Warehouse

    Pollock, K.H.; Hines, J.E.; Nichols, J.D.

    1984-01-01

    The dependence of animal capture probabilities on auxiliary variables is an important practical problem which has not been considered in the development of estimation procedures for capture-recapture and removal experiments. In this paper the linear logistic binary regression model is used to relate the probability of capture to continuous auxiliary variables. The auxiliary variables could be environmental quantities such as air or water temperature, or characteristics of individual animals, such as body length or weight. Maximum likelihood estimators of the population parameters are considered for a variety of models which all assume a closed population. Testing between models is also considered. The models can also be used when one auxiliary variable is a measure of the effort expended in obtaining the sample.

  11. A Tutorial on Multilevel Survival Analysis: Methods, Models and Applications

    PubMed Central

    Austin, Peter C.

    2017-01-01

    Summary Data that have a multilevel structure occur frequently across a range of disciplines, including epidemiology, health services research, public health, education and sociology. We describe three families of regression models for the analysis of multilevel survival data. First, Cox proportional hazards models with mixed effects incorporate cluster-specific random effects that modify the baseline hazard function. Second, piecewise exponential survival models partition the duration of follow-up into mutually exclusive intervals and fit a model that assumes that the hazard function is constant within each interval. This is equivalent to a Poisson regression model that incorporates the duration of exposure within each interval. By incorporating cluster-specific random effects, generalised linear mixed models can be used to analyse these data. Third, after partitioning the duration of follow-up into mutually exclusive intervals, one can use discrete time survival models that use a complementary log–log generalised linear model to model the occurrence of the outcome of interest within each interval. Random effects can be incorporated to account for within-cluster homogeneity in outcomes. We illustrate the application of these methods using data consisting of patients hospitalised with a heart attack. We illustrate the application of these methods using three statistical programming languages (R, SAS and Stata). PMID:29307954
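
    A minimal sketch of the second family described above: the piecewise exponential model fitted as a Poisson regression with a log-exposure offset, using simulated single-level data; the multilevel version would add cluster-specific random effects (e.g. through a mixed-effects extension), which are omitted here.

```python
# Sketch of the piecewise exponential model as Poisson regression: split follow-up
# into intervals, record events and exposure per person-interval, and include
# log(exposure) as an offset. Simulated single-level data, hypothetical covariate.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 400
x = rng.binomial(1, 0.5, n)                          # e.g. treatment indicator
time = rng.exponential(scale=np.where(x == 1, 12.0, 8.0))
event = (time < 15).astype(int)                      # administrative censoring at t = 15
time = np.minimum(time, 15.0)

cuts = [0, 5, 10, 15]                                # interval boundaries
rows = []
for i in range(n):
    for lo, hi in zip(cuts[:-1], cuts[1:]):
        if time[i] <= lo:
            break
        exposure = min(time[i], hi) - lo
        died = int(event[i] == 1 and time[i] <= hi)
        rows.append({"interval": f"[{lo},{hi})", "x": x[i],
                     "exposure": exposure, "died": died})
pp = pd.DataFrame(rows)                              # person-period data set

design = pd.get_dummies(pp[["x", "interval"]], columns=["interval"], drop_first=True)
design = sm.add_constant(design.astype(float))
fit = sm.GLM(pp["died"], design, family=sm.families.Poisson(),
             offset=np.log(pp["exposure"])).fit()
print(fit.params)           # covariate log hazard ratio; interval terms give the baseline
```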

  12. A Tutorial on Multilevel Survival Analysis: Methods, Models and Applications.

    PubMed

    Austin, Peter C

    2017-08-01

    Data that have a multilevel structure occur frequently across a range of disciplines, including epidemiology, health services research, public health, education and sociology. We describe three families of regression models for the analysis of multilevel survival data. First, Cox proportional hazards models with mixed effects incorporate cluster-specific random effects that modify the baseline hazard function. Second, piecewise exponential survival models partition the duration of follow-up into mutually exclusive intervals and fit a model that assumes that the hazard function is constant within each interval. This is equivalent to a Poisson regression model that incorporates the duration of exposure within each interval. By incorporating cluster-specific random effects, generalised linear mixed models can be used to analyse these data. Third, after partitioning the duration of follow-up into mutually exclusive intervals, one can use discrete time survival models that use a complementary log-log generalised linear model to model the occurrence of the outcome of interest within each interval. Random effects can be incorporated to account for within-cluster homogeneity in outcomes. We illustrate the application of these methods using data consisting of patients hospitalised with a heart attack. We illustrate the application of these methods using three statistical programming languages (R, SAS and Stata).

  13. High dimensional linear regression models under long memory dependence and measurement error

    NASA Astrophysics Data System (ADS)

    Kaul, Abhishek

    This dissertation consists of three chapters. The first chapter introduces the models under consideration and motivates problems of interest. A brief literature review is also provided in this chapter. The second chapter investigates the properties of Lasso under long range dependent model errors. Lasso is a computationally efficient approach to model selection and estimation, and its properties are well studied when the regression errors are independent and identically distributed. We study the case where the regression errors form a long memory moving average process. We establish a finite sample oracle inequality for the Lasso solution. We then show the asymptotic sign consistency in this setup. These results are established in the high dimensional setup (p > n) where p can be increasing exponentially with n. Finally, we show the consistency, more precisely the n^(1/2-d)-consistency, of Lasso, along with the oracle property of adaptive Lasso, in the case where p is fixed. Here d is the memory parameter of the stationary error sequence. The performance of Lasso is also analysed in the present setup with a simulation study. The third chapter proposes and investigates the properties of a penalized quantile based estimator for measurement error models. Standard formulations of prediction problems in high dimension regression models assume the availability of fully observed covariates and sub-Gaussian and homogeneous model errors. This makes these methods inapplicable to measurement error models where covariates are unobservable and observations are possibly non sub-Gaussian and heterogeneous. We propose weighted penalized corrected quantile estimators for the regression parameter vector in linear regression models with additive measurement errors, where unobservable covariates are nonrandom. The proposed estimators forgo the need for the above mentioned model assumptions. We study these estimators in both the fixed dimension and high dimensional sparse setups; in the latter, the dimensionality can grow exponentially with the sample size. In the fixed dimensional setting we provide the oracle properties associated with the proposed estimators. In the high dimensional setting, we provide bounds for the statistical error associated with the estimation that hold with asymptotic probability 1, thereby providing the ℓ1-consistency of the proposed estimator. We also establish the model selection consistency in terms of the correctly estimated zero components of the parameter vector. A simulation study that investigates the finite sample accuracy of the proposed estimator is also included in this chapter.

  14. An open-access CMIP5 pattern library for temperature and precipitation: Description and methodology

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lynch, Cary D.; Hartin, Corinne A.; Bond-Lamberty, Benjamin

    Pattern scaling is used to efficiently emulate general circulation models and explore uncertainty in climate projections under multiple forcing scenarios. Pattern scaling methods assume that local climate changes scale with a global mean temperature increase, allowing for spatial patterns to be generated for multiple models for any future emission scenario. For uncertainty quantification and probabilistic statistical analysis, a library of patterns with descriptive statistics for each file would be beneficial, but such a library does not presently exist. Of the possible techniques used to generate patterns, the two most prominent are the delta and least squares regression methods. We explore the differences and statistical significance between patterns generated by each method and assess performance of the generated patterns across methods and scenarios. Differences in patterns across seasons between methods and epochs were largest in high latitudes (60-90°N/S). Bias and mean errors between modeled and pattern predicted output from the linear regression method were smaller than patterns generated by the delta method. Across scenarios, differences in the linear regression method patterns were more statistically significant, especially at high latitudes. We found that pattern generation methodologies were able to approximate the forced signal of change to within ≤ 0.5°C, but choice of pattern generation methodology for pattern scaling purposes should be informed by user goals and criteria. As a result, this paper describes our library of least squares regression patterns from all CMIP5 models for temperature and precipitation on an annual and sub-annual basis, along with the code used to generate these patterns.

  15. An open-access CMIP5 pattern library for temperature and precipitation: Description and methodology

    DOE PAGES

    Lynch, Cary D.; Hartin, Corinne A.; Bond-Lamberty, Benjamin; ...

    2017-05-15

    Pattern scaling is used to efficiently emulate general circulation models and explore uncertainty in climate projections under multiple forcing scenarios. Pattern scaling methods assume that local climate changes scale with a global mean temperature increase, allowing for spatial patterns to be generated for multiple models for any future emission scenario. For uncertainty quantification and probabilistic statistical analysis, a library of patterns with descriptive statistics for each file would be beneficial, but such a library does not presently exist. Of the possible techniques used to generate patterns, the two most prominent are the delta and least squares regression methods. We explore the differences and statistical significance between patterns generated by each method and assess performance of the generated patterns across methods and scenarios. Differences in patterns across seasons between methods and epochs were largest in high latitudes (60-90°N/S). Bias and mean errors between modeled and pattern predicted output from the linear regression method were smaller than patterns generated by the delta method. Across scenarios, differences in the linear regression method patterns were more statistically significant, especially at high latitudes. We found that pattern generation methodologies were able to approximate the forced signal of change to within ≤ 0.5°C, but choice of pattern generation methodology for pattern scaling purposes should be informed by user goals and criteria. As a result, this paper describes our library of least squares regression patterns from all CMIP5 models for temperature and precipitation on an annual and sub-annual basis, along with the code used to generate these patterns.

  16. Meta-regression analysis of the effect of trans fatty acids on low-density lipoprotein cholesterol.

    PubMed

    Allen, Bruce C; Vincent, Melissa J; Liska, DeAnn; Haber, Lynne T

    2016-12-01

    We conducted a meta-regression of controlled clinical trial data to investigate quantitatively the relationship between dietary intake of industrial trans fatty acids (iTFA) and increased low-density lipoprotein cholesterol (LDL-C). Previous regression analyses included insufficient data to determine the nature of the dose response in the low-dose region and have nonetheless assumed a linear relationship between iTFA intake and LDL-C levels. This work contributes to the previous work by 1) including additional studies examining low-dose intake (identified using an evidence mapping procedure); 2) investigating a range of curve shapes, including both linear and nonlinear models; and 3) using Bayesian meta-regression to combine results across trials. We found that, contrary to previous assumptions, the linear model does not acceptably fit the data, while the nonlinear, S-shaped Hill model fits the data well. Based on a conservative estimate of the degree of intra-individual variability in LDL-C (0.1 mmol/L), as an estimate of a change in LDL-C that is not adverse, a change in iTFA intake of 2.2% of energy intake (%en) (corresponding to a total iTFA intake of 2.2-2.9%en) does not cause adverse effects on LDL-C. The iTFA intake associated with this change in LDL-C is substantially higher than the average iTFA intake (0.5%en). Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.
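
    A minimal sketch of fitting an S-shaped Hill model to dose-response data with ordinary nonlinear least squares in scipy; the intake and LDL-C values are invented for illustration, and this is plain curve fitting rather than the paper's Bayesian meta-regression, showing only the curve shape that is contrasted with the linear model.

```python
# Sketch: fitting an S-shaped Hill model to dose-response data with nonlinear least
# squares. Data points are invented for illustration (not the trial results).
import numpy as np
from scipy.optimize import curve_fit

def hill(dose, e0, emax, ed50, n):
    """Change in LDL-C as a function of iTFA intake (% of energy)."""
    return e0 + emax * dose**n / (ed50**n + dose**n)

# Hypothetical group-level data: iTFA intake (%en) and mean change in LDL-C (mmol/L)
dose = np.array([0.5, 1.0, 2.0, 3.0, 4.0, 6.0, 8.0, 10.0])
ldl  = np.array([0.01, 0.02, 0.06, 0.14, 0.22, 0.30, 0.33, 0.34])

params, cov = curve_fit(hill, dose, ldl, p0=[0.0, 0.35, 3.0, 2.0], maxfev=10000)
print(dict(zip(["e0", "emax", "ed50", "n"], np.round(params, 3))))

# Dose at which the fitted change first exceeds 0.1 mmol/L (cf. the paper's benchmark)
grid = np.linspace(0.1, 10, 500)
print("dose where fitted change reaches 0.1 mmol/L:",
      round(float(grid[np.argmax(hill(grid, *params) >= 0.1)]), 2))
```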

  17. Statistical methods for efficient design of community surveys of response to noise: Random coefficients regression models

    NASA Technical Reports Server (NTRS)

    Tomberlin, T. J.

    1985-01-01

    Research studies of residents' responses to noise consist of interviews with samples of individuals who are drawn from a number of different compact study areas. The statistical techniques developed provide a basis for those sample design decisions. These techniques are suitable for a wide range of sample survey applications. A sample may consist of a random sample of residents selected from a sample of compact study areas, or in a more complex design, of a sample of residents selected from a sample of larger areas (e.g., cities). The techniques may be applied to estimates of the effects on annoyance of noise level, numbers of noise events, the time-of-day of the events, ambient noise levels, or other factors. Methods are provided for determining, in advance, how accurately these effects can be estimated for different sample sizes and study designs. Using a simple cost function, they also provide for optimum allocation of the sample across the stages of the design for estimating these effects. These techniques are developed via a regression model in which the regression coefficients are assumed to be random, with components of variance associated with the various stages of a multi-stage sample design.
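
    A minimal sketch of a random coefficients regression, assuming simulated data and hypothetical variable names: annoyance is regressed on noise level with an area-specific random intercept and random slope via statsmodels' MixedLM; the sample-design optimization and cost-function aspects of the paper are not shown.

```python
# Sketch of a random coefficients regression: annoyance ~ noise level, with the
# intercept and the noise slope varying randomly across compact study areas.
# Simulated data with hypothetical variable names.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(11)
areas, per_area = 30, 40
rows = []
for a in range(areas):
    b0 = rng.normal(0.0, 0.5)                  # area-specific intercept deviation
    b1 = rng.normal(0.0, 0.02)                 # area-specific slope deviation
    noise = rng.uniform(45, 75, per_area)      # noise exposure in dB (hypothetical)
    annoy = 1.0 + b0 + (0.08 + b1) * noise + rng.normal(scale=0.8, size=per_area)
    rows.append(pd.DataFrame({"area": a, "noise": noise, "annoy": annoy}))
df = pd.concat(rows, ignore_index=True)

# Random intercept and random slope for noise, grouped by study area
model = smf.mixedlm("annoy ~ noise", df, groups=df["area"], re_formula="~noise")
fit = model.fit()
print(fit.summary())
```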

  18. Logistic regression of family data from retrospective study designs.

    PubMed

    Whittemore, Alice S; Halpern, Jerry

    2003-11-01

    We wish to study the effects of genetic and environmental factors on disease risk, using data from families ascertained because they contain multiple cases of the disease. To do so, we must account for the way participants were ascertained, and for within-family correlations in both disease occurrences and covariates. We model the joint probability distribution of the covariates of ascertained family members, given family disease occurrence and pedigree structure. We describe two such covariate models: the random effects model and the marginal model. Both models assume a logistic form for the distribution of one person's covariates that involves a vector beta of regression parameters. The components of beta in the two models have different interpretations, and they differ in magnitude when the covariates are correlated within families. We describe ascertainment assumptions needed to estimate consistently the parameters beta(RE) in the random effects model and the parameters beta(M) in the marginal model. Under the ascertainment assumptions for the random effects model, we show that conditional logistic regression (CLR) of matched family data gives a consistent estimate beta(RE) for beta(RE) and a consistent estimate for the covariance matrix of beta(RE). Under the ascertainment assumptions for the marginal model, we show that unconditional logistic regression (ULR) gives a consistent estimate for beta(M), and we give a consistent estimator for its covariance matrix. The random effects/CLR approach is simple to use and to interpret, but it can use data only from families containing both affected and unaffected members. The marginal/ULR approach uses data from all individuals, but its variance estimates require special computations. A C program to compute these variance estimates is available at http://www.stanford.edu/dept/HRP/epidemiology. We illustrate these pros and cons by application to data on the effects of parity on ovarian cancer risk in mother/daughter pairs, and use simulations to study the performance of the estimates. Copyright 2003 Wiley-Liss, Inc.

  19. Gene set analysis using variance component tests.

    PubMed

    Huang, Yen-Tsung; Lin, Xihong

    2013-06-28

    Gene set analyses have become increasingly important in genomic research, as many complex diseases are contributed jointly by alterations of numerous genes. Genes often coordinate together as a functional repertoire, e.g., a biological pathway/network and are highly correlated. However, most of the existing gene set analysis methods do not fully account for the correlation among the genes. Here we propose to tackle this important feature of a gene set to improve statistical power in gene set analyses. We propose to model the effects of an independent variable, e.g., exposure/biological status (yes/no), on multiple gene expression values in a gene set using a multivariate linear regression model, where the correlation among the genes is explicitly modeled using a working covariance matrix. We develop TEGS (Test for the Effect of a Gene Set), a variance component test for the gene set effects by assuming a common distribution for regression coefficients in multivariate linear regression models, and calculate the p-values using permutation and a scaled chi-square approximation. We show using simulations that type I error is protected under different choices of working covariance matrices and power is improved as the working covariance approaches the true covariance. The global test is a special case of TEGS when correlation among genes in a gene set is ignored. Using both simulation data and a published diabetes dataset, we show that our test outperforms the commonly used approaches, the global test and gene set enrichment analysis (GSEA). We develop a gene set analyses method (TEGS) under the multivariate regression framework, which directly models the interdependence of the expression values in a gene set using a working covariance. TEGS outperforms two widely used methods, GSEA and global test in both simulation and a diabetes microarray data.

  20. Multitask TSK fuzzy system modeling by mining intertask common hidden structure.

    PubMed

    Jiang, Yizhang; Chung, Fu-Lai; Ishibuchi, Hisao; Deng, Zhaohong; Wang, Shitong

    2015-03-01

    The classical fuzzy system modeling methods implicitly assume data generated from a single task, which is essentially not in accordance with many practical scenarios where data can be acquired from the perspective of multiple tasks. Although one can build an individual fuzzy system model for each task, the result indeed tells us that the individual modeling approach will get poor generalization ability due to ignoring the intertask hidden correlation. In order to circumvent this shortcoming, we consider a general framework for preserving the independent information among different tasks and mining hidden correlation information among all tasks in multitask fuzzy modeling. In this framework, a low-dimensional subspace (structure) is assumed to be shared among all tasks and hence be the hidden correlation information among all tasks. Under this framework, a multitask Takagi-Sugeno-Kang (TSK) fuzzy system model called MTCS-TSK-FS (TSK-FS for multiple tasks with common hidden structure), based on the classical L2-norm TSK fuzzy system, is proposed in this paper. The proposed model can not only take advantage of independent sample information from the original space for each task, but also effectively use the intertask common hidden structure among multiple tasks to enhance the generalization performance of the built fuzzy systems. Experiments on synthetic and real-world datasets demonstrate the applicability and distinctive performance of the proposed multitask fuzzy system model in multitask regression learning scenarios.

  1. Hazard Regression Models of Early Mortality in Trauma Centers

    PubMed Central

    Clark, David E; Qian, Jing; Winchell, Robert J; Betensky, Rebecca A

    2013-01-01

    Background Factors affecting early hospital deaths after trauma may be different from factors affecting later hospital deaths, and the distribution of short and long prehospital times may vary among hospitals. Hazard regression (HR) models may therefore be more useful than logistic regression (LR) models for analysis of trauma mortality, especially when treatment effects at different time points are of interest. Study Design We obtained data for trauma center patients from the 2008–9 National Trauma Data Bank (NTDB). Cases were included if they had complete data for prehospital times, hospital times, survival outcome, age, vital signs, and severity scores. Cases were excluded if pulseless on admission, transferred in or out, or ISS<9. Using covariates proposed for the Trauma Quality Improvement Program and an indicator for each hospital, we compared LR models predicting survival at 8 hours after injury to HR models with survival censored at 8 hours. HR models were then modified to allow time-varying hospital effects. Results 85,327 patients in 161 hospitals met inclusion criteria. Crude hazards peaked initially, then steadily declined. When hazard ratios were assumed constant in HR models, they were similar to odds ratios in LR models associating increased mortality with increased age, firearm mechanism, increased severity, more deranged physiology, and estimated hospital-specific effects. However, when hospital effects were allowed to vary by time, HR models demonstrated that hospital outliers were not the same at different times after injury. Conclusions HR models with time-varying hazard ratios reveal inconsistencies in treatment effects, data quality, and/or timing of early death among trauma centers. HR models are generally more flexible than LR models, can be adapted for censored data, and potentially offer a better tool for analysis of factors affecting early death after injury. PMID:23036828

  2. Big Data Toolsets to Pharmacometrics: Application of Machine Learning for Time-to-Event Analysis.

    PubMed

    Gong, Xiajing; Hu, Meng; Zhao, Liang

    2018-05-01

    Additional value can be potentially created by applying big data tools to address pharmacometric problems. The performances of machine learning (ML) methods and the Cox regression model were evaluated based on simulated time-to-event data synthesized under various preset scenarios, i.e., with linear vs. nonlinear and dependent vs. independent predictors in the proportional hazard function, or with high-dimensional data featured by a large number of predictor variables. Our results showed that ML-based methods outperformed the Cox model in prediction performance as assessed by concordance index and in identifying the preset influential variables for high-dimensional data. The prediction performances of ML-based methods are also less sensitive to data size and censoring rates than the Cox regression model. In conclusion, ML-based methods provide a powerful tool for time-to-event analysis, with a built-in capacity for high-dimensional data and better performance when the predictor variables assume nonlinear relationships in the hazard function. © 2018 The Authors. Clinical and Translational Science published by Wiley Periodicals, Inc. on behalf of American Society for Clinical Pharmacology and Therapeutics.

  3. Revisiting crash spatial heterogeneity: A Bayesian spatially varying coefficients approach.

    PubMed

    Xu, Pengpeng; Huang, Helai; Dong, Ni; Wong, S C

    2017-01-01

    This study was performed to investigate the spatially varying relationships between crash frequency and related risk factors. A Bayesian spatially varying coefficients model was elaborately introduced as a methodological alternative to simultaneously account for the unstructured and spatially structured heterogeneity of the regression coefficients in predicting crash frequencies. The proposed method was appealing in that the parameters were modeled via a conditional autoregressive prior distribution, which involved a single set of random effects and a spatial correlation parameter with extreme values corresponding to pure unstructured or pure spatially correlated random effects. A case study using a three-year crash dataset from the Hillsborough County, Florida, was conducted to illustrate the proposed model. Empirical analysis confirmed the presence of both unstructured and spatially correlated variations in the effects of contributory factors on severe crash occurrences. The findings also suggested that ignoring spatially structured heterogeneity may result in biased parameter estimates and incorrect inferences, while assuming the regression coefficients to be spatially clustered only is probably subject to the issue of over-smoothness. Copyright © 2016 Elsevier Ltd. All rights reserved.

  4. Demand analysis of flood insurance by using logistic regression model and genetic algorithm

    NASA Astrophysics Data System (ADS)

    Sidi, P.; Mamat, M. B.; Sukono; Supian, S.; Putra, A. S.

    2018-03-01

    The Citarum River floods in the area of South Bandung, Indonesia, often result in damage to buildings belonging to people living in the vicinity. One way to alleviate the risk of building damage is to have flood insurance. The main obstacle is that not all people in the Citarum basin decide to buy flood insurance. In this paper, we analyse the decision to buy flood insurance. It is assumed that eight variables influence the decision to purchase flood insurance: income level, education level, distance of the house from the river, elevation of the building relative to the road, experienced flood frequency, flood prediction, perception of the insurance company, and perception of government efforts in handling floods. The analysis was carried out with a logistic regression model, whose parameters were estimated with a genetic algorithm. The results show that the eight variables analysed significantly influence the demand for flood insurance. These results are expected to be useful to insurance companies seeking to encourage the community to buy flood insurance.
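
    A minimal sketch of the estimation idea, assuming simulated predictors in place of the eight survey variables: logistic regression coefficients are found by maximizing the Bernoulli log-likelihood with a simple genetic algorithm (truncation selection, blend crossover, Gaussian mutation) in numpy; this is an illustration, not the authors' implementation.

```python
# Sketch: fit a logistic regression for insurance purchase (0/1) by maximizing the
# log-likelihood with a simple genetic algorithm. Simulated predictors stand in for
# the survey variables (income, education, distance to river, ...).
import numpy as np

rng = np.random.default_rng(5)
n, p = 500, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([-0.5, 1.2, -0.8, 0.6])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))

def loglik(beta):
    eta = X @ beta
    return np.sum(y * eta - np.logaddexp(0, eta))     # Bernoulli log-likelihood

# Genetic algorithm: truncation selection, blend crossover, Gaussian mutation
pop = rng.normal(scale=2.0, size=(200, p))
for gen in range(300):
    fitness = np.array([loglik(b) for b in pop])
    parents = pop[np.argsort(fitness)[-50:]]          # keep the 50 fittest candidates
    idx = rng.integers(0, 50, size=(200, 2))
    w = rng.uniform(size=(200, 1))
    pop = w * parents[idx[:, 0]] + (1 - w) * parents[idx[:, 1]]   # crossover
    pop += rng.normal(scale=0.05, size=pop.shape)                 # mutation

best = pop[np.argmax([loglik(b) for b in pop])]
print("GA estimate:", np.round(best, 2), " true:", beta_true)
```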

  5. Robust Methods for Moderation Analysis with a Two-Level Regression Model.

    PubMed

    Yang, Miao; Yuan, Ke-Hai

    2016-01-01

    Moderation analysis has many applications in social sciences. Most widely used estimation methods for moderation analysis assume that errors are normally distributed and homoscedastic. When these assumptions are not met, the results from a classical moderation analysis can be misleading. For more reliable moderation analysis, this article proposes two robust methods with a two-level regression model when the predictors do not contain measurement error. One method is based on maximum likelihood with Student's t distribution and the other is based on M-estimators with Huber-type weights. An algorithm for obtaining the robust estimators is developed. Consistent estimates of standard errors of the robust estimators are provided. The robust approaches are compared against normal-distribution-based maximum likelihood (NML) with respect to power and accuracy of parameter estimates through a simulation study. Results show that the robust approaches outperform NML under various distributional conditions. Application of the robust methods is illustrated through a real data example. An R program is developed and documented to facilitate the application of the robust methods.
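
    A minimal sketch of the second robust approach, M-estimation with Huber-type weights, applied to a moderation (interaction) model with statsmodels' RLM on simulated heavy-tailed data; note this is an ordinary single-level robust fit, not the paper's two-level regression model.

```python
# Sketch of a robust moderation analysis: y regressed on x, the moderator m, and
# their interaction, using M-estimation with Huber-type weights, compared with OLS.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(8)
n = 300
dat = pd.DataFrame({"x": rng.normal(size=n), "m": rng.normal(size=n)})
err = rng.standard_t(df=3, size=n)                    # heavy-tailed, non-normal errors
dat["y"] = 0.5 + 0.4 * dat.x + 0.3 * dat.m + 0.25 * dat.x * dat.m + err

ols_fit = smf.ols("y ~ x * m", data=dat).fit()
rlm_fit = smf.rlm("y ~ x * m", data=dat, M=sm.robust.norms.HuberT()).fit()

print("OLS:              ", np.round(ols_fit.params.values, 3))
print("Huber M-estimation:", np.round(rlm_fit.params.values, 3))
```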

  6. Hierarchical Bayesian modelling of mobility metrics for hazard model input calibration

    NASA Astrophysics Data System (ADS)

    Calder, Eliza; Ogburn, Sarah; Spiller, Elaine; Rutarindwa, Regis; Berger, Jim

    2015-04-01

    In this work we present a method to constrain flow mobility input parameters for pyroclastic flow models using hierarchical Bayes modeling of standard mobility metrics such as H/L and flow volume. The advantage of hierarchical modeling is that it can leverage the information in a global dataset for a particular mobility metric in order to reduce the uncertainty in modeling an individual volcano, which is especially important where individual volcanoes have only sparse datasets. We use compiled pyroclastic flow runout data from Colima, Merapi, Soufriere Hills, Unzen and Semeru volcanoes, presented in an open-source database FlowDat (https://vhub.org/groups/massflowdatabase). While the exact relationship between flow volume and friction varies somewhat between volcanoes, dome collapse flows originating from the same volcano exhibit similar mobility relationships. Instead of fitting separate regression models for each volcano dataset, we use a variation of the hierarchical linear model (Kass and Steffey, 1989). The model presents a hierarchical structure with two levels: all dome collapse flows and dome collapse flows at specific volcanoes. The hierarchical model allows us to assume that the flows at specific volcanoes share a common distribution of regression slopes, then solves for that distribution. We present comparisons of the 95% confidence intervals on the individual regression lines for the data set from each volcano as well as those obtained from the hierarchical model. The results clearly demonstrate the advantage of considering global datasets using this technique. The technique developed is demonstrated here for mobility metrics, but can be applied to many other global datasets of volcanic parameters. In particular, such methods can provide a means to better constrain parameters for volcanoes for which we only have sparse data, a ubiquitous problem in volcanology.

  7. Exploring precipitation pattern scaling methodologies and robustness among CMIP5 models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kravitz, Ben; Lynch, Cary; Hartin, Corinne

    Pattern scaling is a well-established method for approximating modeled spatial distributions of changes in temperature by assuming a time-invariant pattern that scales with changes in global mean temperature. We compare two methods of pattern scaling for annual mean precipitation (regression and epoch difference) and evaluate which method is better in particular circumstances by quantifying their robustness to interpolation/extrapolation in time, inter-model variations, and inter-scenario variations. Both the regression and epoch-difference methods (the two most commonly used methods of pattern scaling) have good absolute performance in reconstructing the climate model output, measured as an area-weighted root mean square error. We decompose the precipitation response in the RCP8.5 scenario into a CO2 portion and a non-CO2 portion. Extrapolating RCP8.5 patterns to reconstruct precipitation change in the RCP2.6 scenario results in large errors due to violations of pattern scaling assumptions when this CO2-/non-CO2-forcing decomposition is applied. As a result, the methodologies discussed in this paper can help provide precipitation fields to be utilized in other models (including integrated assessment models or impacts assessment models) for a wide variety of scenarios of future climate change.

  8. Use of Cox's Cure Model to Establish Clinical Determinants of Long-Term Disease-Free Survival in Neoadjuvant-Chemotherapy-Treated Breast Cancer Patients without Pathologic Complete Response.

    PubMed

    Asano, Junichi; Hirakawa, Akihiro; Hamada, Chikuma; Yonemori, Kan; Hirata, Taizo; Shimizu, Chikako; Tamura, Kenji; Fujiwara, Yasuhiro

    2013-01-01

    In prognostic studies for breast cancer patients treated with neoadjuvant chemotherapy (NAC), the ordinary Cox proportional-hazards (PH) model has been often used to identify prognostic factors for disease-free survival (DFS). This model assumes that all patients eventually experience relapse or death. However, a subset of NAC-treated breast cancer patients never experience these events during long-term follow-up (>10 years) and may be considered clinically "cured." Clinical factors associated with cure have not been studied adequately. Because the ordinary Cox PH model cannot be used to identify such clinical factors, we used the Cox PH cure model, a recently developed statistical method. This model includes both a logistic regression component for the cure rate and a Cox regression component for the hazard for uncured patients. The purpose of this study was to identify the clinical factors associated with cure and the variables associated with the time to recurrence or death in NAC-treated breast cancer patients without a pathologic complete response, by using the Cox PH cure model. We found that hormone receptor status, clinical response, human epidermal growth factor receptor 2 status, histological grade, and the number of lymph node metastases were associated with cure.

  9. Exploring precipitation pattern scaling methodologies and robustness among CMIP5 models

    DOE PAGES

    Kravitz, Ben; Lynch, Cary; Hartin, Corinne; ...

    2017-05-12

    Pattern scaling is a well-established method for approximating modeled spatial distributions of changes in temperature by assuming a time-invariant pattern that scales with changes in global mean temperature. We compare two methods of pattern scaling for annual mean precipitation (regression and epoch difference) and evaluate which method is better in particular circumstances by quantifying their robustness to interpolation/extrapolation in time, inter-model variations, and inter-scenario variations. Both the regression and epoch-difference methods (the two most commonly used methods of pattern scaling) have good absolute performance in reconstructing the climate model output, measured as an area-weighted root mean square error. We decompose the precipitation response in the RCP8.5 scenario into a CO2 portion and a non-CO2 portion. Extrapolating RCP8.5 patterns to reconstruct precipitation change in the RCP2.6 scenario results in large errors due to violations of pattern scaling assumptions when this CO2-/non-CO2-forcing decomposition is applied. As a result, the methodologies discussed in this paper can help provide precipitation fields to be utilized in other models (including integrated assessment models or impacts assessment models) for a wide variety of scenarios of future climate change.

  10. Robust Bayesian linear regression with application to an analysis of the CODATA values for the Planck constant

    NASA Astrophysics Data System (ADS)

    Wübbeler, Gerd; Bodnar, Olha; Elster, Clemens

    2018-02-01

    Weighted least-squares estimation is commonly applied in metrology to fit models to measurements that are accompanied with quoted uncertainties. The weights are chosen in dependence on the quoted uncertainties. However, when data and model are inconsistent in view of the quoted uncertainties, this procedure does not yield adequate results. When it can be assumed that all uncertainties ought to be rescaled by a common factor, weighted least-squares estimation may still be used, provided that a simple correction of the uncertainty obtained for the estimated model is applied. We show that these uncertainties and credible intervals are robust, as they do not rely on the assumption of a Gaussian distribution of the data. Hence, common software for weighted least-squares estimation may still safely be employed in such a case, followed by a simple modification of the uncertainties obtained by that software. We also provide means of checking the assumptions of such an approach. The Bayesian regression procedure is applied to analyze the CODATA values for the Planck constant published over the past decades in terms of three different models: a constant model, a straight line model and a spline model. Our results indicate that the CODATA values may not have yet stabilized.
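
    A minimal numpy sketch of the "common rescaling factor" idea for inconsistent data: a constant model is fitted by weighted least squares using the quoted uncertainties, and the uncertainty of the estimate is then inflated by the square root of the reduced chi-square (the Birge ratio). The numbers are invented and are not the CODATA values; the paper's Bayesian procedure and model comparison are not reproduced here.

```python
# Sketch of weighted least squares with a common rescaling of quoted uncertainties:
# fit a constant model, then, if the data are inconsistent (reduced chi-square > 1),
# inflate the uncertainty of the estimate by the Birge ratio. Values are invented.
import numpy as np

x = np.array([6.62607004, 6.62606994, 6.62607015, 6.62607040])   # hypothetical values
u = np.array([0.00000010, 0.00000008, 0.00000012, 0.00000006])   # quoted uncertainties

w = 1.0 / u**2
est = np.sum(w * x) / np.sum(w)                    # weighted mean (constant model)
u_est = np.sqrt(1.0 / np.sum(w))                   # uncertainty assuming consistent data

dof = len(x) - 1
chi2 = np.sum(w * (x - est) ** 2)
birge = np.sqrt(chi2 / dof)                        # common rescaling factor
u_corrected = u_est * max(birge, 1.0)              # only inflate, never shrink

print(f"estimate = {est:.8f}")
print(f"uncertainty = {u_est:.8f}, Birge ratio = {birge:.2f}, corrected = {u_corrected:.8f}")
```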

  11. Genetic parameters and expected responses to selection for components of feed efficiency in a Duroc pig line.

    PubMed

    Sánchez, Juan P; Ragab, Mohamed; Quintanilla, Raquel; Rothschild, Max F; Piles, Miriam

    2017-12-01

    Improving feed efficiency ([Formula: see text]) is a key factor for any pig breeding company. Although this can be achieved by selection on an index of multi-trait best linear unbiased prediction of breeding values with optimal economic weights, considering deviations of feed intake from actual needs ([Formula: see text]) should be of value for further research on biological aspects of [Formula: see text]. Here, we present a random regression model that extends the classical definition of [Formula: see text] by including animal-specific needs in the model. Using this model, we explore the genetic determinism of several [Formula: see text] components: use of feed for growth ([Formula: see text]), use of feed for backfat deposition ([Formula: see text]), use of feed for maintenance ([Formula: see text]), and unspecific efficiency in the use of feed ([Formula: see text]). Expected response to alternative selection indexes involving different components is also studied. Based on goodness-of-fit to the available feed intake ([Formula: see text]) data, the model that assumes individual (genetic and permanent) variation in the use of feed for maintenance, [Formula: see text] and [Formula: see text] showed the best performance. Joint individual variation in feed allocation to maintenance, growth and backfat deposition comprised 37% of the individual variation of [Formula: see text]. The estimated heritabilities of [Formula: see text] using the model that accounts for animal-specific needs and the traditional [Formula: see text] model were 0.12 and 0.18, respectively. The estimated heritabilities for the regression coefficients were 0.44, 0.39 and 0.55 for [Formula: see text], [Formula: see text] and [Formula: see text], respectively. Estimates of genetic correlations of [Formula: see text] were positive with amount of feed used for [Formula: see text] and [Formula: see text] but negative for [Formula: see text]. Expected response in overall efficiency, reducing [Formula: see text] without altering performance, was 2.5% higher when the model assumed animal-specific needs than when the traditional definition of [Formula: see text] was considered. Expected response in overall efficiency, by reducing [Formula: see text] without altering performance, is slightly better with a model that assumes animal-specific needs instead of batch-specific needs to correct [Formula: see text]. The relatively small difference between the traditional [Formula: see text] model and our model is due to random intercepts (unspecific use of feed) accounting for the majority of variability in [Formula: see text]. Overall, a model that accounts for animal-specific needs for [Formula: see text], [Formula: see text] and [Formula: see text] is statistically superior and allows for the possibility to act differentially on [Formula: see text] components.

  12. Multiple-trait structured antedependence model to study the relationship between litter size and birth weight in pigs and rabbits.

    PubMed

    David, Ingrid; Garreau, Hervé; Balmisse, Elodie; Billon, Yvon; Canario, Laurianne

    2017-01-20

    Some genetic studies need to take into account correlations between traits that are repeatedly measured over time. Multiple-trait random regression models are commonly used to analyze repeated traits but suffer from several major drawbacks. In the present study, we developed a multiple-trait extension of the structured antedependence model (SAD) to overcome this issue and validated its usefulness by modeling the association between litter size (LS) and average birth weight (ABW) over parities in pigs and rabbits. The single-trait SAD model assumes that a random effect at time [Formula: see text] can be explained by the previous values of the random effect (i.e. at previous times). The proposed multiple-trait extension of the SAD model consists in adding a cross-antedependence parameter to the single-trait SAD model. This model can be easily fitted using ASReml and the OWN Fortran program that we have developed. In comparison with the random regression model, we used our multiple-trait SAD model to analyze the LS and ABW of 4345 litters from 1817 Large White sows and 8706 litters from 2286 L-1777 does over a maximum of five successive parities. For both species, the multiple-trait SAD fitted the data better than the random regression model. The difference between AIC of the two models (AIC_random regression-AIC_SAD) were equal to 7 and 227 for pigs and rabbits, respectively. A similar pattern of heritability and correlation estimates was obtained for both species. Heritabilities were lower for LS (ranging from 0.09 to 0.29) than for ABW (ranging from 0.23 to 0.39). The general trend was a decrease of the genetic correlation for a given trait between more distant parities. Estimates of genetic correlations between LS and ABW were negative and ranged from -0.03 to -0.52 across parities. No correlation was observed between the permanent environmental effects, except between the permanent environmental effects of LS and ABW of the same parity, for which the estimate of the correlation was strongly negative (ranging from -0.57 to -0.67). We demonstrated that application of our multiple-trait SAD model is feasible for studying several traits with repeated measurements and showed that it provided a better fit to the data than the random regression model.

  13. Sufficient Dimension Reduction for Longitudinally Measured Predictors

    PubMed Central

    Pfeiffer, Ruth M.; Forzani, Liliana; Bura, Efstathia

    2013-01-01

    We propose a method to combine several predictors (markers) that are measured repeatedly over time into a composite marker score without assuming a model and only requiring a mild condition on the predictor distribution. Assuming that the first and second moments of the predictors can be decomposed into a time and a marker component via a Kronecker product structure, that accommodates the longitudinal nature of the predictors, we develop first moment sufficient dimension reduction techniques to replace the original markers with linear transformations that contain sufficient information for the regression of the predictors on the outcome. These linear combinations can then be combined into a score that has better predictive performance than the score built under a general model that ignores the longitudinal structure of the data. Our methods can be applied to either continuous or categorical outcome measures. In simulations we focus on binary outcomes and show that our method outperforms existing alternatives using the AUC, the area under the receiver-operator characteristics (ROC) curve, as a summary measure of the discriminatory ability of a single continuous diagnostic marker for binary disease outcomes. PMID:22161635

  14. Estimation of Recurrence of Colorectal Adenomas with Dependent Censoring Using Weighted Logistic Regression

    PubMed Central

    Hsu, Chiu-Hsieh; Li, Yisheng; Long, Qi; Zhao, Qiuhong; Lance, Peter

    2011-01-01

    In colorectal polyp prevention trials, estimation of the rate of recurrence of adenomas at the end of the trial may be complicated by dependent censoring, that is, time to follow-up colonoscopy and dropout may be dependent on time to recurrence. Assuming that the auxiliary variables capture the dependence between recurrence and censoring times, we propose to fit two working models with the auxiliary variables as covariates to define risk groups and then extend an existing weighted logistic regression method for independent censoring to each risk group to accommodate potential dependent censoring. In a simulation study, we show that the proposed method results in both a gain in efficiency and reduction in bias for estimating the recurrence rate. We illustrate the methodology by analyzing a recurrent adenoma dataset from a colorectal polyp prevention trial. PMID:22065985

  15. A Bayesian methodological framework for accommodating interannual variability of nutrient loading with the SPARROW model

    NASA Astrophysics Data System (ADS)

    Wellen, Christopher; Arhonditsis, George B.; Labencki, Tanya; Boyd, Duncan

    2012-10-01

    Regression-type, hybrid empirical/process-based models (e.g., SPARROW, PolFlow) have assumed a prominent role in efforts to estimate the sources and transport of nutrient pollution at river basin scales. However, almost no attempts have been made to explicitly accommodate interannual nutrient loading variability in their structure, despite empirical and theoretical evidence indicating that the associated source/sink processes are quite variable at annual timescales. In this study, we present two methodological approaches to accommodate interannual variability with the Spatially Referenced Regressions on Watershed attributes (SPARROW) nonlinear regression model. The first strategy uses the SPARROW model to estimate a static baseline load and climatic variables (e.g., precipitation) to drive the interannual variability. The second approach allows the source/sink processes within the SPARROW model to vary at annual timescales using dynamic parameter estimation techniques akin to those used in dynamic linear models. Model parameterization is founded upon Bayesian inference techniques that explicitly consider calibration data and model uncertainty. Our case study is the Hamilton Harbor watershed, a mixed agricultural and urban residential area located at the western end of Lake Ontario, Canada. Our analysis suggests that dynamic parameter estimation is the more parsimonious of the two strategies tested and can offer insights into the temporal structural changes associated with watershed functioning. Consistent with empirical and theoretical work, model estimated annual in-stream attenuation rates varied inversely with annual discharge. Estimated phosphorus source areas were concentrated near the receiving water body during years of high in-stream attenuation and dispersed along the main stems of the streams during years of low attenuation, suggesting that nutrient source areas are subject to interannual variability.

  16. Estimating riparian understory vegetation cover with beta regression and copula models

    USGS Publications Warehouse

    Eskelson, Bianca N.I.; Madsen, Lisa; Hagar, Joan C.; Temesgen, Hailemariam

    2011-01-01

    Understory vegetation communities are critical components of forest ecosystems. As a result, the importance of modeling understory vegetation characteristics in forested landscapes has become more apparent. Abundance measures such as shrub cover are bounded between 0 and 1, exhibit heteroscedastic error variance, and are often subject to spatial dependence. These distributional features tend to be ignored when shrub cover data are analyzed. The beta distribution has been used successfully to describe the frequency distribution of vegetation cover. Beta regression models ignoring spatial dependence (BR) and accounting for spatial dependence (BRdep) were used to estimate percent shrub cover as a function of topographic conditions and overstory vegetation structure in riparian zones in western Oregon. The BR models showed poor explanatory power (pseudo-R2 ≤ 0.34) but outperformed ordinary least-squares (OLS) and generalized least-squares (GLS) regression models with logit-transformed response in terms of mean square prediction error and absolute bias. We introduce a copula (COP) model that is based on the beta distribution and accounts for spatial dependence. A simulation study was designed to illustrate the effects of incorrectly assuming normality, equal variance, and spatial independence. It showed that BR, BRdep, and COP models provide unbiased parameter estimates, whereas OLS and GLS models result in slightly biased estimates for two of the three parameters. On the basis of the simulation study, 93–97% of the GLS, BRdep, and COP confidence intervals covered the true parameters, whereas OLS and BR only resulted in 84–88% coverage, which demonstrated the superiority of GLS, BRdep, and COP over OLS and BR models in providing standard errors for the parameter estimates in the presence of spatial dependence.
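
    As a rough illustration of the beta regression idea (ignoring spatial dependence), the sketch below fits a logit-link beta regression by maximum likelihood under the mean-precision parameterization; the simulated covariate, coefficients, and precision value are placeholders, and this is not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one covariate
mu_true = expit(X @ np.array([-0.5, 1.0]))               # logit link for the mean
phi_true = 10.0                                          # precision parameter
y = rng.beta(mu_true * phi_true, (1 - mu_true) * phi_true)

def negloglik(params):
    beta, log_phi = params[:-1], params[-1]
    mu = expit(X @ beta)
    phi = np.exp(log_phi)
    a, b = mu * phi, (1 - mu) * phi
    # Beta log-density: lgamma(phi) - lgamma(a) - lgamma(b) + (a-1)log y + (b-1)log(1-y)
    ll = (gammaln(phi) - gammaln(a) - gammaln(b)
          + (a - 1) * np.log(y) + (b - 1) * np.log(1 - y))
    return -ll.sum()

fit = minimize(negloglik, x0=np.zeros(X.shape[1] + 1), method="BFGS")
print("coefficients:", fit.x[:-1], "precision:", np.exp(fit.x[-1]))
```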

  17. Oxidative desulfurization: kinetic modelling.

    PubMed

    Dhir, S; Uppaluri, R; Purkait, M K

    2009-01-30

    Increasing environmental legislation, coupled with enhanced production of petroleum products, demands the deployment of novel technologies to remove organic sulfur efficiently. This work presents the kinetic modeling of ODS using H(2)O(2) over a tungsten-containing layered double hydroxide (LDH), using the experimental data provided by Hulea et al. [V. Hulea, A.L. Maciuca, F. Fajula, E. Dumitriu, Catalytic oxidation of thiophenes and thioethers with hydrogen peroxide in the presence of W-containing layered double hydroxides, Appl. Catal. A: Gen. 313 (2) (2006) 200-207]. The kinetic modeling approach in this work first generates a superstructure of micro-kinetic reaction schemes and models assuming Langmuir-Hinshelwood (LH) and Eley-Rideal (ER) mechanisms. The screening and selection of these models is then based on profile-based elimination of incompetent schemes, followed by a non-linear regression search performed with the Levenberg-Marquardt algorithm (LMA) for the chosen models. This analysis inferred that the Eley-Rideal mechanism describes the kinetic behavior of the ODS process using tungsten-containing LDH, with adsorption of the reactant and intermediate product taking place only on the catalyst surface. Finally, an economic index is presented that assesses the economic aspects of the novel catalytic technology using the parameters obtained during regression analysis, concluding that the cost factor for the catalyst is 0.0062-0.04759 US $ per barrel.
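
    A minimal sketch of the non-linear regression step is given below: a hypothetical Eley-Rideal-type rate expression is fitted to synthetic concentration-rate data with the Levenberg-Marquardt algorithm via SciPy. The rate law, variable names, and data are illustrative assumptions, not the scheme selected in the cited study.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical Eley-Rideal-type rate law: adsorbed oxidant reacts with the
# sulfur compound from the liquid phase: r = k*K*C_ox*C_S / (1 + K*C_ox)
def er_rate(X, k, K):
    c_ox, c_s = X
    return k * K * c_ox * c_s / (1.0 + K * c_ox)

rng = np.random.default_rng(1)
c_ox = rng.uniform(0.05, 1.0, 40)            # oxidant concentration (mol/L)
c_s = rng.uniform(0.01, 0.2, 40)             # thiophene concentration (mol/L)
r_obs = er_rate((c_ox, c_s), k=2.5, K=4.0) * (1 + 0.05 * rng.normal(size=40))

# curve_fit uses the Levenberg-Marquardt algorithm with method="lm"
popt, pcov = curve_fit(er_rate, (c_ox, c_s), r_obs, p0=[1.0, 1.0], method="lm")
print("k, K estimates:", popt)
```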

  18. Decoding and modelling of time series count data using Poisson hidden Markov model and Markov ordinal logistic regression models.

    PubMed

    Sebastian, Tunny; Jeyaseelan, Visalakshi; Jeyaseelan, Lakshmanan; Anandan, Shalini; George, Sebastian; Bangdiwala, Shrikant I

    2018-01-01

    Hidden Markov models are stochastic models in which the observations are assumed to follow a mixture distribution, but the parameters of the components are governed by a Markov chain which is unobservable. The issues related to the estimation of Poisson hidden Markov models, in which the observations come from a mixture of Poisson distributions and the parameters of the component Poisson distributions are governed by an m-state Markov chain with an unknown transition probability matrix, are explained here. These methods were applied to the data on Vibrio cholerae counts reported every month over an 11-year span at Christian Medical College, Vellore, India. Using the Viterbi algorithm, the best estimate of the state sequence was obtained and hence the transition probability matrix. The mean passage time between the states was estimated. The 95% confidence interval for the mean passage time was estimated via Monte Carlo simulation. The three hidden states of the estimated Markov chain are labelled as 'Low', 'Moderate' and 'High', with mean counts of 1.4, 6.6 and 20.2 and estimated average durations of stay of 3, 3 and 4 months, respectively. Environmental risk factors were studied using Markov ordinal logistic regression analysis. No significant association was found between disease severity levels and climate components.
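
    The Viterbi decoding step for a Poisson hidden Markov model can be sketched as follows. The three state means (1.4, 6.6, 20.2) are taken from the abstract, while the transition matrix, initial distribution, and monthly counts are illustrative placeholders.

```python
import numpy as np
from scipy.stats import poisson

def viterbi_poisson(counts, log_pi, log_A, lam):
    """Most probable state sequence for a Poisson HMM (log domain)."""
    T, m = len(counts), len(lam)
    log_emis = poisson.logpmf(np.asarray(counts)[:, None], lam)   # T x m
    delta = np.zeros((T, m))
    back = np.zeros((T, m), dtype=int)
    delta[0] = log_pi + log_emis[0]
    for t in range(1, T):
        trans = delta[t - 1][:, None] + log_A      # trans[i, j] = score of i -> j
        back[t] = trans.argmax(axis=0)
        delta[t] = trans.max(axis=0) + log_emis[t]
    states = np.empty(T, dtype=int)
    states[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):                 # backtrack
        states[t] = back[t + 1, states[t + 1]]
    return states

# Illustrative 3-state example ('Low', 'Moderate', 'High')
lam = np.array([1.4, 6.6, 20.2])
A = np.array([[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.3, 0.1])
counts = [0, 2, 1, 7, 5, 9, 22, 18, 25, 6, 3, 1]
print(viterbi_poisson(counts, np.log(pi), np.log(A), lam))
```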

  19. Bayesian Travel Time Inversion adopting Gaussian Process Regression

    NASA Astrophysics Data System (ADS)

    Mauerberger, S.; Holschneider, M.

    2017-12-01

    A major application in seismology is the determination of seismic velocity models. Travel time measurements put an integral constraint on the velocity between source and receiver. We provide insight into travel time inversion from a correlation-based Bayesian point of view. To that end, the concept of Gaussian process regression is adopted to estimate a velocity model. The non-linear travel time integral is approximated by a 1st order Taylor expansion. A heuristic covariance describes correlations amongst observations and the a priori model. This approach enables us to assess a proxy of the Bayesian posterior distribution at moderate computational cost. Neither multi-dimensional numerical integration nor extensive sampling is necessary. Instead of stacking the data, we suggest building the posterior distribution progressively. Incorporating only a single observation at a time compensates for the deficiencies of linearization. As a result, the most probable model is given by the posterior mean, whereas uncertainties are described by the posterior covariance. As a proof of concept, a purely synthetic 1-D model is addressed, with a single source accompanied by multiple receivers on top of a model comprising a discontinuity. We consider travel times of both phases - direct and reflected wave - corrupted by noise. The regions left and right of the interface are assumed independent, with the squared exponential kernel serving as covariance.
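
    A minimal sketch of the sequential Gaussian process update suggested above, using a squared exponential covariance and conditioning on one observation at a time; the 1-D grid, kernel hyperparameters, and point (rather than integral) observations are simplifying assumptions.

```python
import numpy as np

def se_kernel(xa, xb, sigma=1.0, ell=0.3):
    """Squared exponential covariance between two sets of 1-D locations."""
    d = xa[:, None] - xb[None, :]
    return sigma**2 * np.exp(-0.5 * (d / ell) ** 2)

# Prior GP over a grid of positions
x = np.linspace(0.0, 1.0, 101)
mean = np.full_like(x, 3.0)          # a priori velocity (illustrative units)
cov = se_kernel(x, x)

def assimilate(mean, cov, idx, y, noise_var=1e-4):
    """Rank-one update: condition the GP on a single noisy point observation
    y taken at grid index idx (Kalman-style sequential conditioning)."""
    k = cov[:, idx]
    s = cov[idx, idx] + noise_var
    mean = mean + k * (y - mean[idx]) / s
    cov = cov - np.outer(k, k) / s
    return mean, cov

# Incorporate observations one at a time, as suggested in the abstract
for idx, y in [(20, 3.4), (50, 2.8), (80, 3.1)]:   # hypothetical measurements
    mean, cov = assimilate(mean, cov, idx, y)

print(mean[[20, 50, 80]])            # posterior mean near the observation points
print(np.sqrt(np.diag(cov))[:5])     # posterior standard deviations
```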

  20. Robust inference in the negative binomial regression model with an application to falls data.

    PubMed

    Aeberhard, William H; Cantoni, Eva; Heritier, Stephane

    2014-12-01

    A popular way to model overdispersed count data, such as the number of falls reported during intervention studies, is by means of the negative binomial (NB) distribution. Classical estimation methods are well known to be sensitive to model misspecifications, which in such intervention studies take the form of patients falling much more than expected when the NB regression model is used. We extend in this article two approaches for building robust M-estimators of the regression parameters in the class of generalized linear models to the NB distribution. The first approach achieves robustness in the response by applying a bounded function on the Pearson residuals arising in the maximum likelihood estimating equations, while the second approach achieves robustness by bounding the unscaled deviance components. For both approaches, we explore different choices for the bounding functions. Through a unified notation, we show how close these approaches may actually be as long as the bounding functions are chosen and tuned appropriately, and provide the asymptotic distributions of the resulting estimators. Moreover, we introduce a robust weighted maximum likelihood estimator for the overdispersion parameter, specific to the NB distribution. Simulations under various settings show that redescending bounding functions yield estimates with smaller biases under contamination while keeping high efficiency at the assumed model, for both approaches. We present an application to a recent randomized controlled trial measuring the effectiveness of an exercise program at reducing the number of falls among people suffering from Parkinson's disease to illustrate the diagnostic use of such robust procedures and the need for reliable inference. © 2014, The International Biometric Society.

  1. A Novel Approach to Implement Takagi-Sugeno Fuzzy Models.

    PubMed

    Chang, Chia-Wen; Tao, Chin-Wang

    2017-09-01

    This paper proposes new algorithms based on the fuzzy c-regression model algorithm for Takagi-Sugeno (T-S) fuzzy modeling of complex nonlinear systems. The fuzzy c-regression state model (FCRSM) algorithm produces a T-S fuzzy model in which a functional antecedent and a state-space-model-type consequent are considered with the available input-output data. The antecedent and consequent forms of the proposed FCRSM offer two main advantages: first, the FCRSM has a low computational load because only one input variable is considered in the antecedent part; second, the unknown system can be modeled not only in polynomial form but also in state-space form. Moreover, the FCRSM can be extended to the FCRSM-ND and FCRSM-Free algorithms. The FCRSM-ND algorithm is presented to find the T-S fuzzy state-space model of a nonlinear system when the input-output data cannot be precollected and an effective controller is assumed to be available. In practical applications, the mathematical model of the controller may be hard to obtain. In this case, an online tuning algorithm, FCRSM-Free, is designed such that the parameters of a T-S fuzzy controller and the T-S fuzzy state model of an unknown system can be tuned online simultaneously. Four numerical simulations are given to demonstrate the effectiveness of the proposed approach.

  2. An open-access CMIP5 pattern library for temperature and precipitation: description and methodology

    NASA Astrophysics Data System (ADS)

    Lynch, Cary; Hartin, Corinne; Bond-Lamberty, Ben; Kravitz, Ben

    2017-05-01

    Pattern scaling is used to efficiently emulate general circulation models and explore uncertainty in climate projections under multiple forcing scenarios. Pattern scaling methods assume that local climate changes scale with a global mean temperature increase, allowing for spatial patterns to be generated for multiple models for any future emission scenario. For uncertainty quantification and probabilistic statistical analysis, a library of patterns with descriptive statistics for each file would be beneficial, but such a library does not presently exist. Of the possible techniques used to generate patterns, the two most prominent are the delta and least squares regression methods. We explore the differences and statistical significance between patterns generated by each method and assess performance of the generated patterns across methods and scenarios. Differences in patterns across seasons between methods and epochs were largest in high latitudes (60-90° N/S). Bias and mean errors between modeled and pattern-predicted output were smaller for patterns generated by the linear regression method than for those generated by the delta method. Across scenarios, differences in the linear regression method patterns were more statistically significant, especially at high latitudes. We found that pattern generation methodologies were able to approximate the forced signal of change to within ≤ 0.5 °C, but the choice of pattern generation methodology for pattern scaling purposes should be informed by user goals and criteria. This paper describes our library of least squares regression patterns from all CMIP5 models for temperature and precipitation on an annual and sub-annual basis, along with the code used to generate these patterns. The dataset and netCDF data generation code are available at doi:10.5281/zenodo.495632.
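
    The least squares regression approach to pattern generation can be sketched as follows: for every grid cell the local anomaly is regressed on the global mean temperature anomaly, and the slope is the pattern value. The array shapes and synthetic data are placeholders, not CMIP5 output.

```python
import numpy as np

rng = np.random.default_rng(0)
n_years, n_lat, n_lon = 150, 36, 72

# Synthetic inputs: global mean temperature anomaly and a local anomaly field
t_global = np.linspace(0.0, 4.0, n_years) + 0.1 * rng.normal(size=n_years)
true_pattern = rng.uniform(0.5, 2.5, size=(n_lat, n_lon))      # local scaling factors
t_local = (true_pattern[None, :, :] * t_global[:, None, None]
           + 0.3 * rng.normal(size=(n_years, n_lat, n_lon)))

# Least squares slope per grid cell: pattern[i,j] = cov(T_local, T_global) / var(T_global)
tg = t_global - t_global.mean()
tl = t_local - t_local.mean(axis=0, keepdims=True)
pattern = np.einsum("t,tij->ij", tg, tl) / (tg @ tg)

print(np.abs(pattern - true_pattern).mean())   # recovers the pattern up to noise
```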

  3. Boosting multi-state models.

    PubMed

    Reulen, Holger; Kneib, Thomas

    2016-04-01

    One important goal in multi-state modelling is to explore information about conditional transition-type-specific hazard rate functions by estimating influencing effects of explanatory variables. This may be performed using single transition-type-specific models if these covariate effects are assumed to be different across transition-types. To investigate whether this assumption holds or whether one of the effects is equal across several transition-types (cross-transition-type effect), a combined model has to be applied, for instance with the use of a stratified partial likelihood formulation. Here, prior knowledge about the underlying covariate effect mechanisms is often sparse, especially about which transition-type-specific or cross-transition-type effects are ineffective. As a consequence, data-driven variable selection is an important task: a large number of estimable effects has to be taken into account if joint modelling of all transition-types is performed. A related but subsequent task is model choice: is an effect satisfactorily estimated assuming linearity, or does the true underlying relationship deviate strongly from linearity? This article introduces component-wise Functional Gradient Descent Boosting (boosting, for short) for multi-state models, an approach performing unsupervised variable selection and model choice simultaneously within a single estimation run. We demonstrate that features and advantages in the application of boosting introduced and illustrated in classical regression scenarios remain present in the transfer to multi-state models. As a consequence, boosting provides an effective means to answer questions about ineffectiveness and non-linearity of single transition-type-specific or cross-transition-type effects.

  4. Exploring unobserved heterogeneity in bicyclists' red-light running behaviors at different crossing facilities.

    PubMed

    Guo, Yanyong; Li, Zhibin; Wu, Yao; Xu, Chengcheng

    2018-06-01

    Bicyclists running the red light at crossing facilities increase the risk of colliding with motor vehicles. Exploring the contributing factors could improve the prediction of red-light running probability and help develop countermeasures to reduce such behaviors. However, individuals could have unobserved heterogeneities in running a red light, which make accurate prediction more challenging. Traditional models assume that factor parameters are fixed and cannot capture the varying impacts on red-light running behaviors. In this study, we employed the full Bayesian random parameters logistic regression approach to account for the unobserved heterogeneous effects. Two types of crossing facilities were considered which were the signalized intersection crosswalks and the road segment crosswalks. Electric and conventional bikes were distinguished in the modeling. Data were collected from 16 crosswalks in urban area of Nanjing, China. Factors such as individual characteristics, road geometric design, environmental features, and traffic variables were examined. Model comparison indicates that the full Bayesian random parameters logistic regression approach is statistically superior to the standard logistic regression model. More red-light runners are predicted at signalized intersection crosswalks than at road segment crosswalks. Factors affecting red-light running behaviors are gender, age, bike type, road width, presence of raised median, separation width, signal type, green ratio, bike and vehicle volume, and average vehicle speed. Factors associated with the unobserved heterogeneity are gender, bike type, signal type, separation width, and bike volume. Copyright © 2018 Elsevier Ltd. All rights reserved.

  5. CO2 flux determination by closed-chamber methods can be seriously biased by inappropriate application of linear regression

    NASA Astrophysics Data System (ADS)

    Kutzbach, L.; Schneider, J.; Sachs, T.; Giebels, M.; Nykänen, H.; Shurpali, N. J.; Martikainen, P. J.; Alm, J.; Wilmking, M.

    2007-11-01

    Closed (non-steady state) chambers are widely used for quantifying carbon dioxide (CO2) fluxes between soils or low-stature canopies and the atmosphere. It is well recognised that covering a soil or vegetation by a closed chamber inherently disturbs the natural CO2 fluxes by altering the concentration gradients between the soil, the vegetation and the overlying air. Thus, the driving factors of CO2 fluxes are not constant during the closed chamber experiment, and no linear increase or decrease of CO2 concentration over time within the chamber headspace can be expected. Nevertheless, linear regression has been applied for calculating CO2 fluxes in many recent, partly influential, studies. This approach has been justified by keeping the closure time short and assuming the concentration change over time to be in the linear range. Here, we test if the application of linear regression is really appropriate for estimating CO2 fluxes using closed chambers over short closure times and if the application of nonlinear regression is necessary. We developed a nonlinear exponential regression model from diffusion and photosynthesis theory. This exponential model was tested with four different datasets of CO2 flux measurements (total number: 1764) conducted at three peatland sites in Finland and a tundra site in Siberia. Thorough analyses of residuals demonstrated that linear regression was frequently not appropriate for the determination of CO2 fluxes by closed-chamber methods, even if closure times were kept short. The developed exponential model was well suited for nonlinear regression of the concentration evolution over time, c(t), in the chamber headspace and estimation of the initial CO2 fluxes at closure time for the majority of experiments. However, a rather large percentage of the exponential regression functions showed curvatures not consistent with the theoretical model, which is considered to be caused by violations of the underlying model assumptions. In particular, the effects of turbulence and of pressure disturbances caused by chamber deployment are suspected to have caused unexplainable curvatures. CO2 flux estimates by linear regression can be as low as 40% of the flux estimates of exponential regression for closure times of only two minutes. The degree of underestimation increased with increasing CO2 flux strength and was dependent on soil and vegetation conditions, which can disturb not only the quantitative but also the qualitative evaluation of CO2 flux dynamics. The underestimation effect by linear regression was observed to be different for CO2 uptake and release situations, which can lead to stronger bias in the daily, seasonal and annual CO2 balances than in the individual fluxes. To avoid serious bias of CO2 flux estimates based on closed chamber experiments, we suggest further tests using published datasets and recommend the use of nonlinear regression models for future closed chamber studies.
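
    One common exponential headspace model and the corresponding initial-flux estimate can be sketched as below; the specific functional form c(t) = c_s + (c_0 - c_s)e^(-kt), the parameter values, and the synthetic data are illustrative of the approach, not the exact model derived in the paper.

```python
import numpy as np
from scipy.optimize import curve_fit

def c_exp(t, c0, cs, k):
    """Headspace CO2 concentration relaxing toward an asymptote cs."""
    return cs + (c0 - cs) * np.exp(-k * t)

rng = np.random.default_rng(2)
t = np.linspace(0, 120, 25)                               # closure time in seconds
c_obs = c_exp(t, c0=380.0, cs=500.0, k=0.01) + rng.normal(0, 1.0, t.size)

popt, _ = curve_fit(c_exp, t, c_obs, p0=[c_obs[0], c_obs[-1] + 50, 0.005])
c0, cs, k = popt
flux_initial = k * (cs - c0)                  # dc/dt at t = 0, the undisturbed flux
flux_linear = np.polyfit(t, c_obs, 1)[0]      # linear-regression slope, for comparison
print(flux_initial, flux_linear)
```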

  6. Comparing Linear Discriminant Function with Logistic Regression for the Two-Group Classification Problem.

    ERIC Educational Resources Information Center

    Fan, Xitao; Wang, Lin

    The Monte Carlo study compared the performance of predictive discriminant analysis (PDA) and that of logistic regression (LR) for the two-group classification problem. Prior probabilities were used for classification, but the cost of misclassification was assumed to be equal. The study used a fully crossed three-factor experimental design (with…

  7. Breaking the solid ground of common sense: undoing "structure" with Michael Balint.

    PubMed

    Bonomi, Carlo

    2003-09-01

    Balint's great merit was to question what, in the classical perspective, was assumed as a prerequisite for analysis and thus located beyond analysis: the maturity of the ego. A fundamental premise of his work was Ferenczi's distrust of the structural model, which praised the maturity of the ego and its verbal, social, and adaptive abilities. Ferenczi's view of ego maturation as a trauma derivative was strikingly different from the theories of all other psychoanalytic schools and seems to be responsible for Balint's understanding of regression as a sort of inverted process that enables the undoing of the sheltering structures of the mature mind. Balint's understanding of the relation between mature ego and regression diverged not only from the ego psychologists, who emphasized the idea of therapeutic alliance, but also from most of the authors who embraced the object-relational view, like Klein (who considered regression a manifestation of the patient's craving for oral gratification), Fairbairn (who gave up the notion of regression), and Guntrip (who viewed regression as a schizoid phenomenon related to ego weakness). According to Balint, the clinical appearance of a regression would "depend also on the way the regression is recognized, is accepted, and is responded to by the analyst." In this respect, his position was close to Winnicott's reformulation of the therapeutic action. Yet, the work of Balint reflects the conviction that the progressive fluidification of the solid structure could be enabled only by the analyst's capacity for becoming himself or herself [unsolid].

  8. Estimated Perennial Streams of Idaho and Related Geospatial Datasets

    USGS Publications Warehouse

    Rea, Alan; Skinner, Kenneth D.

    2009-01-01

    The perennial or intermittent status of a stream has bearing on many regulatory requirements. Because of changing technologies over time, cartographic representation of perennial/intermittent status of streams on U.S. Geological Survey (USGS) topographic maps is not always accurate and (or) consistent from one map sheet to another. Idaho Administrative Code defines an intermittent stream as one having a 7-day, 2-year low flow (7Q2) less than 0.1 cubic feet per second. To establish consistency with the Idaho Administrative Code, the USGS developed regional regression equations for Idaho streams for several low-flow statistics, including 7Q2. Using these regression equations, the 7Q2 streamflow may be estimated for naturally flowing streams anywhere in Idaho to help determine perennial/intermittent status of streams. Using these equations in conjunction with a Geographic Information System (GIS) technique known as weighted flow accumulation allows for an automated and continuous estimation of 7Q2 streamflow at all points along a stream, which in turn can be used to determine if a stream is intermittent or perennial according to the Idaho Administrative Code operational definition. The selected regression equations were applied to create continuous grids of 7Q2 estimates for the eight low-flow regression regions of Idaho. By applying the 0.1 ft3/s criterion, the perennial streams have been estimated in each low-flow region. Uncertainty in the estimates is shown by identifying a 'transitional' zone, corresponding to flow estimates of 0.1 ft3/s plus and minus one standard error. Considerable additional uncertainty exists in the model of perennial streams presented in this report. The regression models provide overall estimates based on general trends within each regression region. These models do not include local factors such as a large spring or a losing reach that may greatly affect flows at any given point. Site-specific flow data, assuming a sufficient period of record, generally would be considered to represent flow conditions better at a given site than flow estimates based on regionalized regression models. The geospatial datasets of modeled perennial streams are considered a first-cut estimate, and should not be construed to override site-specific flow data.
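
    The threshold classification described above can be sketched as follows; the one-standard-error band and the grid of 7Q2 estimates are hypothetical placeholders.

```python
import numpy as np

THRESHOLD = 0.1          # Idaho Administrative Code 7Q2 cutoff, ft3/s

def classify_7q2(q7q2, se=0.05):
    """Label each cell perennial, intermittent, or transitional; se is a
    hypothetical one-standard-error band around the 0.1 ft3/s cutoff."""
    labels = np.where(q7q2 >= THRESHOLD + se, "perennial",
             np.where(q7q2 <= THRESHOLD - se, "intermittent", "transitional"))
    return labels

grid = np.array([0.02, 0.08, 0.11, 0.5, 3.2])   # hypothetical 7Q2 estimates (ft3/s)
print(classify_7q2(grid))
```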

  9. A mixed-effects regression model for longitudinal multivariate ordinal data.

    PubMed

    Liu, Li C; Hedeker, Donald

    2006-03-01

    A mixed-effects item response theory model that allows for three-level multivariate ordinal outcomes and accommodates multiple random subject effects is proposed for analysis of multivariate ordinal outcomes in longitudinal studies. This model allows for the estimation of different item factor loadings (item discrimination parameters) for the multiple outcomes. The covariates in the model do not have to follow the proportional odds assumption and can be at any level. Assuming either a probit or logistic response function, maximum marginal likelihood estimation is proposed utilizing multidimensional Gauss-Hermite quadrature for integration of the random effects. An iterative Fisher scoring solution, which provides standard errors for all model parameters, is used. An analysis of a longitudinal substance use data set, where four items of substance use behavior (cigarette use, alcohol use, marijuana use, and getting drunk or high) are repeatedly measured over time, is used to illustrate application of the proposed model.

  10. The analytical representation of viscoelastic material properties using optimization techniques

    NASA Technical Reports Server (NTRS)

    Hill, S. A.

    1993-01-01

    This report presents a technique to model viscoelastic material properties with a function of the form of the Prony series. Generally, the method employed to determine the function constants requires assuming values for the exponential constants of the function and then resolving the remaining constants through linear least-squares techniques. The technique presented here allows all the constants to be analytically determined through optimization techniques. This technique is employed in a computer program named PRONY and makes use of a commercially available optimization tool developed by VMA Engineering, Inc. The PRONY program was utilized to compare the technique against previously determined models for solid rocket motor TP-H1148 propellant and V747-75 Viton fluoroelastomer. In both cases, the optimization technique generated functions that modeled the test data with at least an order-of-magnitude better correlation. This technique has demonstrated the capability to use small or large data sets and to use data sets that have uniformly or nonuniformly spaced data pairs. The reduction of experimental data to accurate mathematical models is a vital part of most scientific and engineering research. This technique of regression through optimization can be applied to other mathematical models that are difficult to fit to experimental data through traditional regression techniques.
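
    A minimal sketch of fitting all Prony series constants, including the exponential time constants, by non-linear least squares (rather than fixing the exponents and solving a linear problem) is given below; the two-term series and the synthetic relaxation data are illustrative and unrelated to the PRONY program or the VMA tool.

```python
import numpy as np
from scipy.optimize import least_squares

def prony(t, params):
    """Two-term Prony series: G(t) = g_inf + g1*exp(-t/tau1) + g2*exp(-t/tau2)."""
    g_inf, g1, tau1, g2, tau2 = params
    return g_inf + g1 * np.exp(-t / tau1) + g2 * np.exp(-t / tau2)

rng = np.random.default_rng(3)
t = np.logspace(-2, 3, 60)                                   # relaxation times (s)
g_obs = prony(t, [1.0, 4.0, 0.1, 2.0, 50.0]) * (1 + 0.01 * rng.normal(size=t.size))

# All five constants, including the time constants tau1 and tau2, are free parameters
res = least_squares(lambda p: prony(t, p) - g_obs,
                    x0=[0.5, 1.0, 1.0, 1.0, 10.0],
                    bounds=(1e-6, np.inf))                   # keep moduli and times positive
print(res.x)
```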

  11. Parametrically Guided Generalized Additive Models with Application to Mergers and Acquisitions Data

    PubMed Central

    Fan, Jianqing; Maity, Arnab; Wang, Yihui; Wu, Yichao

    2012-01-01

    Generalized nonparametric additive models present a flexible way to evaluate the effects of several covariates on a general outcome of interest via a link function. In this modeling framework, one assumes that the effect of each of the covariates is nonparametric and additive. However, in practice, often there is prior information available about the shape of the regression functions, possibly from pilot studies or exploratory analysis. In this paper, we consider such situations and propose an estimation procedure where the prior information is used as a parametric guide to fit the additive model. Specifically, we first posit a parametric family for each of the regression functions using the prior information (parametric guides). After removing these parametric trends, we then estimate the remainder of the nonparametric functions using a nonparametric generalized additive model, and form the final estimates by adding back the parametric trend. We investigate the asymptotic properties of the estimates and show that when a good guide is chosen, the asymptotic bias of the estimates can be reduced significantly while keeping the asymptotic variance the same as that of the unguided estimator. We observe the performance of our method via a simulation study and demonstrate our method by applying it to a real data set on mergers and acquisitions. PMID:23645976

  12. Parametrically Guided Generalized Additive Models with Application to Mergers and Acquisitions Data.

    PubMed

    Fan, Jianqing; Maity, Arnab; Wang, Yihui; Wu, Yichao

    2013-01-01

    Generalized nonparametric additive models present a flexible way to evaluate the effects of several covariates on a general outcome of interest via a link function. In this modeling framework, one assumes that the effect of each of the covariates is nonparametric and additive. However, in practice, often there is prior information available about the shape of the regression functions, possibly from pilot studies or exploratory analysis. In this paper, we consider such situations and propose an estimation procedure where the prior information is used as a parametric guide to fit the additive model. Specifically, we first posit a parametric family for each of the regression functions using the prior information (parametric guides). After removing these parametric trends, we then estimate the remainder of the nonparametric functions using a nonparametric generalized additive model, and form the final estimates by adding back the parametric trend. We investigate the asymptotic properties of the estimates and show that when a good guide is chosen, the asymptotic bias of the estimates can be reduced significantly while keeping the asymptotic variance the same as that of the unguided estimator. We observe the performance of our method via a simulation study and demonstrate our method by applying it to a real data set on mergers and acquisitions.

  13. A combined M5P tree and hazard-based duration model for predicting urban freeway traffic accident durations.

    PubMed

    Lin, Lei; Wang, Qian; Sadek, Adel W

    2016-06-01

    The duration of freeway traffic accidents is an important factor that affects traffic congestion, environmental pollution, and secondary accidents. Among previous studies, the M5P algorithm has been shown to be an effective tool for predicting incident duration. M5P builds a tree-based model, like the traditional classification and regression tree (CART) method, but with multiple linear regression models as its leaves. The problem with M5P for accident duration prediction, however, is that whereas linear regression assumes that the conditional distribution of accident durations is normally distributed, the distribution of a "time-to-an-event" is almost certainly nonsymmetrical. A hazard-based duration model (HBDM) is a better choice for this kind of "time-to-event" modeling scenario, and given this, HBDMs have previously been applied to analyze and predict traffic accident durations. Previous research, however, has not yet applied HBDMs for accident duration prediction in association with clustering or classification of the dataset to minimize data heterogeneity. The current paper proposes a novel approach for accident duration prediction, which improves on the original M5P tree algorithm through the construction of an M5P-HBDM model, in which the leaves of the M5P tree model are HBDMs instead of linear regression models. Such a model offers the advantage of minimizing data heterogeneity through dataset classification, and avoids the need for the incorrect assumption of normality for traffic accident durations. The proposed model was then tested on two freeway accident datasets. For each dataset, the first 500 records were used to train three models: (1) an M5P tree; (2) an HBDM; and (3) the proposed M5P-HBDM; the remainder of the data were used for testing. The results show that the proposed M5P-HBDM identified more significant and meaningful variables than either the M5P or the HBDM. Moreover, the M5P-HBDM had the lowest overall mean absolute percentage error (MAPE). Copyright © 2016 Elsevier Ltd. All rights reserved.

  14. The importance of regional models in assessing canine cancer incidences in Switzerland

    PubMed Central

    Boo, Gianluca; Leyk, Stefan; Brunsdon, Christopher; Graf, Ramona; Pospischil, Andreas; Fabrikant, Sara Irina

    2018-01-01

    Fitting canine cancer incidences through a conventional regression model assumes constant statistical relationships across the study area in estimating the model coefficients. However, it is often more realistic to consider that these relationships may vary over space. Such a condition, known as spatial non-stationarity, implies that the model coefficients need to be estimated locally. In these kinds of local models, the geographic scale, or spatial extent, employed for coefficient estimation may also have a pervasive influence. This is because important variations in the local model coefficients across geographic scales may impact the understanding of local relationships. In this study, we fitted canine cancer incidences across Swiss municipal units through multiple regional models. We computed diagnostic summaries across the different regional models, and contrasted them with the diagnostics of the conventional regression model, using value-by-alpha maps and scalograms. The results of this comparative assessment enabled us to identify variations in the goodness-of-fit and coefficient estimates. We detected spatially non-stationary relationships, in particular, for the variables related to biological risk factors. These variations in the model coefficients were more important at small geographic scales, making a case for the need to model canine cancer incidences locally in contrast to more conventional global approaches. However, we contend that prior to undertaking local modeling efforts, a deeper understanding of the effects of geographic scale is needed to better characterize and identify local model relationships. PMID:29652921

  15. The importance of regional models in assessing canine cancer incidences in Switzerland.

    PubMed

    Boo, Gianluca; Leyk, Stefan; Brunsdon, Christopher; Graf, Ramona; Pospischil, Andreas; Fabrikant, Sara Irina

    2018-01-01

    Fitting canine cancer incidences through a conventional regression model assumes constant statistical relationships across the study area in estimating the model coefficients. However, it is often more realistic to consider that these relationships may vary over space. Such a condition, known as spatial non-stationarity, implies that the model coefficients need to be estimated locally. In these kinds of local models, the geographic scale, or spatial extent, employed for coefficient estimation may also have a pervasive influence. This is because important variations in the local model coefficients across geographic scales may impact the understanding of local relationships. In this study, we fitted canine cancer incidences across Swiss municipal units through multiple regional models. We computed diagnostic summaries across the different regional models, and contrasted them with the diagnostics of the conventional regression model, using value-by-alpha maps and scalograms. The results of this comparative assessment enabled us to identify variations in the goodness-of-fit and coefficient estimates. We detected spatially non-stationary relationships, in particular, for the variables related to biological risk factors. These variations in the model coefficients were more important at small geographic scales, making a case for the need to model canine cancer incidences locally in contrast to more conventional global approaches. However, we contend that prior to undertaking local modeling efforts, a deeper understanding of the effects of geographic scale is needed to better characterize and identify local model relationships.

  16. Regression analysis of case K interval-censored failure time data in the presence of informative censoring.

    PubMed

    Wang, Peijie; Zhao, Hui; Sun, Jianguo

    2016-12-01

    Interval-censored failure time data occur in many fields such as demography, economics, medical research, and reliability, and many inference procedures on them have been developed (Sun, 2006; Chen, Sun, and Peace, 2012). However, most of the existing approaches assume that the mechanism that yields interval censoring is independent of the failure time of interest, and it is clear that this may not be true in practice (Zhang et al., 2007; Ma, Hu, and Sun, 2015). In this article, we consider regression analysis of case K interval-censored failure time data when the censoring mechanism may be related to the failure time of interest. For the problem, an estimated sieve maximum-likelihood approach is proposed for the data arising from the proportional hazards frailty model, and for estimation a two-step procedure is presented. In addition, the asymptotic properties of the proposed estimators of regression parameters are established, and an extensive simulation study suggests that the method works well. Finally, we apply the method to a set of real interval-censored data that motivated this study. © 2016, The International Biometric Society.

  17. The effect of using genealogy-based haplotypes for genomic prediction

    PubMed Central

    2013-01-01

    Background Genomic prediction uses two sources of information: linkage disequilibrium between markers and quantitative trait loci, and additive genetic relationships between individuals. One way to increase the accuracy of genomic prediction is to capture more linkage disequilibrium by regression on haplotypes instead of regression on individual markers. The aim of this study was to investigate the accuracy of genomic prediction using haplotypes based on local genealogy information. Methods A total of 4429 Danish Holstein bulls were genotyped with the 50K SNP chip. Haplotypes were constructed using local genealogical trees. Effects of haplotype covariates were estimated with two types of prediction models: (1) assuming that effects had the same distribution for all haplotype covariates, i.e. the GBLUP method and (2) assuming that a large proportion (π) of the haplotype covariates had zero effect, i.e. a Bayesian mixture method. Results About 7.5 times more covariate effects were estimated when fitting haplotypes based on local genealogical trees compared to fitting individual markers. Genealogy-based haplotype clustering slightly increased the accuracy of genomic prediction and, in some cases, decreased the bias of prediction. With the Bayesian method, accuracy of prediction was less sensitive to parameter π when fitting haplotypes compared to fitting markers. Conclusions Use of haplotypes based on genealogy can slightly increase the accuracy of genomic prediction. Improved methods to cluster the haplotypes constructed from local genealogy could lead to additional gains in accuracy. PMID:23496971

  18. The effect of using genealogy-based haplotypes for genomic prediction.

    PubMed

    Edriss, Vahid; Fernando, Rohan L; Su, Guosheng; Lund, Mogens S; Guldbrandtsen, Bernt

    2013-03-06

    Genomic prediction uses two sources of information: linkage disequilibrium between markers and quantitative trait loci, and additive genetic relationships between individuals. One way to increase the accuracy of genomic prediction is to capture more linkage disequilibrium by regression on haplotypes instead of regression on individual markers. The aim of this study was to investigate the accuracy of genomic prediction using haplotypes based on local genealogy information. A total of 4429 Danish Holstein bulls were genotyped with the 50K SNP chip. Haplotypes were constructed using local genealogical trees. Effects of haplotype covariates were estimated with two types of prediction models: (1) assuming that effects had the same distribution for all haplotype covariates, i.e. the GBLUP method and (2) assuming that a large proportion (π) of the haplotype covariates had zero effect, i.e. a Bayesian mixture method. About 7.5 times more covariate effects were estimated when fitting haplotypes based on local genealogical trees compared to fitting individual markers. Genealogy-based haplotype clustering slightly increased the accuracy of genomic prediction and, in some cases, decreased the bias of prediction. With the Bayesian method, accuracy of prediction was less sensitive to parameter π when fitting haplotypes compared to fitting markers. Use of haplotypes based on genealogy can slightly increase the accuracy of genomic prediction. Improved methods to cluster the haplotypes constructed from local genealogy could lead to additional gains in accuracy.

  19. Using nonlinear quantile regression to estimate the self-thinning boundary curve

    Treesearch

    Quang V. Cao; Thomas J. Dean

    2015-01-01

    The relationship between tree size (quadratic mean diameter) and tree density (number of trees per unit area) has been a topic of research and discussion for many decades. Starting with Reineke in 1933, the maximum size-density relationship, on a log-log scale, has been assumed to be linear. Several techniques, including linear quantile regression, have been employed...
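
    Nonlinear quantile regression of the size-density boundary can be sketched as follows: a power-law curve is fitted at a high quantile by minimizing the pinball (quantile) loss. The quantile level, model form, and simulated stand data are illustrative assumptions, not the specification used in the study.

```python
import numpy as np
from scipy.optimize import minimize

def pinball(residuals, tau):
    """Quantile (pinball) loss for quantile level tau."""
    return np.mean(np.maximum(tau * residuals, (tau - 1) * residuals))

# Synthetic stand data: quadratic mean diameter D (cm) and trees per hectare N
rng = np.random.default_rng(4)
D = rng.uniform(5, 60, 300)
N = np.exp(12.0 - 1.6 * np.log(D)) * rng.uniform(0.2, 1.0, 300)   # points below the boundary

def loss(params, tau=0.99):
    a, b = params
    pred = np.exp(a + b * np.log(D))          # power-law self-thinning curve
    return pinball(N - pred, tau)

fit = minimize(loss, x0=[10.0, -1.5], method="Nelder-Mead")
print("boundary parameters (a, b):", fit.x)
```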

  20. Association between arsenic exposure from a coal-burning power plant and urinary arsenic concentrations in Prievidza District, Slovakia.

    PubMed

    Ranft, Ulrich; Miskovic, Peter; Pesch, Beate; Jakubis, Pavel; Fabianova, Elenora; Keegan, Tom; Hergemöller, Andre; Jakubis, Marian; Nieuwenhuijsen, Mark J

    2003-06-01

    To assess the arsenic exposure of a population living in the vicinity of a coal-burning power plant with high arsenic emission in the Prievidza District, Slovakia, 548 spot urine samples were speciated for inorganic As (Asinorg), monomethylarsonic acid (MMA), dimethylarsinic acid (DMA), and their sum (Assum). The urine samples were collected from the population of a case-control study on nonmelanoma skin cancer (NMSC). A total of 411 samples with complete As speciations and sufficient urine quality and without fish consumption were used for statistical analysis. Although current environmental As exposure and urinary As concentrations were low (median As in soil within 5 km distance to the power plant, 41 µg/g; median urinary Assum, 5.8 µg/L), there was a significant but weak association between As in soil and urinary Assum (r = 0.21, p < 0.01). We performed a multivariate regression analysis to calculate adjusted regression coefficients for environmental As exposure and other determinants of urinary As. Persons living in the vicinity of the plant had 27% higher Assum values (p < 0.01), based on elevated concentrations of the methylated species. A 32% increase of MMA occurred among subjects who consumed homegrown food (p < 0.001). NMSC cases had significantly higher levels of Assum, DMA, and Asinorg. The methylation index Asinorg/(MMA + DMA) was about 20% lower among cases (p < 0.05) and in men (p < 0.05) compared with controls and females, respectively.

  1. Association between arsenic exposure from a coal-burning power plant and urinary arsenic concentrations in Prievidza District, Slovakia.

    PubMed Central

    Ranft, Ulrich; Miskovic, Peter; Pesch, Beate; Jakubis, Pavel; Fabianova, Elenora; Keegan, Tom; Hergemöller, Andre; Jakubis, Marian; Nieuwenhuijsen, Mark J

    2003-01-01

    To assess the arsenic exposure of a population living in the vicinity of a coal-burning power plant with high arsenic emission in the Prievidza District, Slovakia, 548 spot urine samples were speciated for inorganic As (Asinorg), monomethylarsonic acid (MMA), dimethylarsinic acid (DMA), and their sum (Assum). The urine samples were collected from the population of a case-control study on nonmelanoma skin cancer (NMSC). A total of 411 samples with complete As speciations and sufficient urine quality and without fish consumption were used for statistical analysis. Although current environmental As exposure and urinary As concentrations were low (median As in soil within 5 km distance to the power plant, 41 µg/g; median urinary Assum, 5.8 µg/L), there was a significant but weak association between As in soil and urinary Assum (r = 0.21, p < 0.01). We performed a multivariate regression analysis to calculate adjusted regression coefficients for environmental As exposure and other determinants of urinary As. Persons living in the vicinity of the plant had 27% higher Assum values (p < 0.01), based on elevated concentrations of the methylated species. A 32% increase of MMA occurred among subjects who consumed homegrown food (p < 0.001). NMSC cases had significantly higher levels of Assum, DMA, and Asinorg. The methylation index Asinorg/(MMA + DMA) was about 20% lower among cases (p < 0.05) and in men (p < 0.05) compared with controls and females, respectively. PMID:12782488

  2. Bias and uncertainty in regression-calibrated models of groundwater flow in heterogeneous media

    USGS Publications Warehouse

    Cooley, R.L.; Christensen, S.

    2006-01-01

    Groundwater models need to account for detailed but generally unknown spatial variability (heterogeneity) of the hydrogeologic model inputs. To address this problem we replace the large, m-dimensional stochastic vector β that reflects both small and large scales of heterogeneity in the inputs by a lumped or smoothed m-dimensional approximation Kβ*, where K is an interpolation matrix and β* is a stochastic vector of parameters. Vector β* has small enough dimension to allow its estimation with the available data. The consequence of the replacement is that the model function f(Kβ*) written in terms of the approximate inputs is in error with respect to the same model function written in terms of β, f(β), which is assumed to be nearly exact. The difference f(β) - f(Kβ*), termed model error, is spatially correlated, generates prediction biases, and causes standard confidence and prediction intervals to be too small. Model error is accounted for in the weighted nonlinear regression methodology developed to estimate β* and assess model uncertainties by incorporating the second-moment matrix of the model errors into the weight matrix. Techniques developed by statisticians to analyze classical nonlinear regression methods are extended to analyze the revised method. The analysis develops analytical expressions for bias terms reflecting the interaction of model nonlinearity and model error, for correction factors needed to adjust the sizes of confidence and prediction intervals for this interaction, and for correction factors needed to adjust the sizes of confidence and prediction intervals for possible use of a diagonal weight matrix in place of the correct one. If terms expressing the degree of intrinsic nonlinearity for f(β) and f(Kβ*) are small, then most of the biases are small and the correction factors are reduced in magnitude. Biases, correction factors, and confidence and prediction intervals were obtained for a test problem for which model error is large to test robustness of the methodology. Numerical results conform with the theoretical analysis. © 2005 Elsevier Ltd. All rights reserved.

  3. Controlled pattern imputation for sensitivity analysis of longitudinal binary and ordinal outcomes with nonignorable dropout.

    PubMed

    Tang, Yongqiang

    2018-04-30

    The controlled imputation method refers to a class of pattern mixture models that have been commonly used as sensitivity analyses of longitudinal clinical trials with nonignorable dropout in recent years. These pattern mixture models assume that participants in the experimental arm after dropout have similar response profiles to the control participants or have worse outcomes than otherwise similar participants who remain on the experimental treatment. In spite of its popularity, the controlled imputation has not been formally developed for longitudinal binary and ordinal outcomes partially due to the lack of a natural multivariate distribution for such endpoints. In this paper, we propose 2 approaches for implementing the controlled imputation for binary and ordinal data based respectively on the sequential logistic regression and the multivariate probit model. Efficient Markov chain Monte Carlo algorithms are developed for missing data imputation by using the monotone data augmentation technique for the sequential logistic regression and a parameter-expanded monotone data augmentation scheme for the multivariate probit model. We assess the performance of the proposed procedures by simulation and the analysis of a schizophrenia clinical trial and compare them with the fully conditional specification, last observation carried forward, and baseline observation carried forward imputation methods. Copyright © 2018 John Wiley & Sons, Ltd.

  4. Gene-Based Association Analysis for Censored Traits Via Fixed Effect Functional Regressions.

    PubMed

    Fan, Ruzong; Wang, Yifan; Yan, Qi; Ding, Ying; Weeks, Daniel E; Lu, Zhaohui; Ren, Haobo; Cook, Richard J; Xiong, Momiao; Swaroop, Anand; Chew, Emily Y; Chen, Wei

    2016-02-01

    Genetic studies of survival outcomes have been proposed and conducted recently, but statistical methods for identifying genetic variants that affect disease progression are rarely developed. Motivated by our ongoing real studies, here we develop Cox proportional hazard models using functional regression (FR) to perform gene-based association analysis of survival traits while adjusting for covariates. The proposed Cox models are fixed effect models where the genetic effects of multiple genetic variants are assumed to be fixed. We introduce likelihood ratio test (LRT) statistics to test for associations between the survival traits and multiple genetic variants in a genetic region. Extensive simulation studies demonstrate that the proposed Cox FR LRT statistics have well-controlled type I error rates. To evaluate power, we compare the Cox FR LRT with the previously developed burden test (BT) in a Cox model and sequence kernel association test (SKAT), which is based on mixed effect Cox models. The Cox FR LRT statistics have higher power than or similar power as Cox SKAT LRT except when 50%/50% causal variants had negative/positive effects and all causal variants are rare. In addition, the Cox FR LRT statistics have higher power than Cox BT LRT. The models and related test statistics can be useful in the whole genome and whole exome association studies. An age-related macular degeneration dataset was analyzed as an example. © 2016 WILEY PERIODICALS, INC.

  5. Gene-based Association Analysis for Censored Traits Via Fixed Effect Functional Regressions

    PubMed Central

    Fan, Ruzong; Wang, Yifan; Yan, Qi; Ding, Ying; Weeks, Daniel E.; Lu, Zhaohui; Ren, Haobo; Cook, Richard J; Xiong, Momiao; Swaroop, Anand; Chew, Emily Y.; Chen, Wei

    2015-01-01

    Summary Genetic studies of survival outcomes have been proposed and conducted recently, but statistical methods for identifying genetic variants that affect disease progression are rarely developed. Motivated by our ongoing real studies, we develop here Cox proportional hazard models using functional regression (FR) to perform gene-based association analysis of survival traits while adjusting for covariates. The proposed Cox models are fixed effect models where the genetic effects of multiple genetic variants are assumed to be fixed. We introduce likelihood ratio test (LRT) statistics to test for associations between the survival traits and multiple genetic variants in a genetic region. Extensive simulation studies demonstrate that the proposed Cox FR LRT statistics have well-controlled type I error rates. To evaluate power, we compare the Cox FR LRT with the previously developed burden test (BT) in a Cox model and sequence kernel association test (SKAT), which is based on mixed effect Cox models. The Cox FR LRT statistics have higher power than or similar power as Cox SKAT LRT except when 50%/50% causal variants had negative/positive effects and all causal variants are rare. In addition, the Cox FR LRT statistics have higher power than Cox BT LRT. The models and related test statistics can be useful in the whole genome and whole exome association studies. An age-related macular degeneration dataset was analyzed as an example. PMID:26782979

  6. CO2 flux determination by closed-chamber methods can be seriously biased by inappropriate application of linear regression

    NASA Astrophysics Data System (ADS)

    Kutzbach, L.; Schneider, J.; Sachs, T.; Giebels, M.; Nykänen, H.; Shurpali, N. J.; Martikainen, P. J.; Alm, J.; Wilmking, M.

    2007-07-01

    Closed (non-steady state) chambers are widely used for quantifying carbon dioxide (CO2) fluxes between soils or low-stature canopies and the atmosphere. It is well recognised that covering a soil or vegetation by a closed chamber inherently disturbs the natural CO2 fluxes by altering the concentration gradients between the soil, the vegetation and the overlying air. Thus, the driving factors of CO2 fluxes are not constant during the closed chamber experiment, and no linear increase or decrease of CO2 concentration over time within the chamber headspace can be expected. Nevertheless, linear regression has been applied for calculating CO2 fluxes in many recent, partly influential, studies. This approach was justified by keeping the closure time short and assuming the concentration change over time to be in the linear range. Here, we test if the application of linear regression is really appropriate for estimating CO2 fluxes using closed chambers over short closure times and if the application of nonlinear regression is necessary. We developed a nonlinear exponential regression model from diffusion and photosynthesis theory. This exponential model was tested with four different datasets of CO2 flux measurements (total number: 1764) conducted at three peatland sites in Finland and a tundra site in Siberia. The flux measurements were performed using transparent chambers on vegetated surfaces and opaque chambers on bare peat surfaces. Thorough analyses of residuals demonstrated that linear regression was frequently not appropriate for the determination of CO2 fluxes by closed-chamber methods, even if closure times were kept short. The developed exponential model was well suited for nonlinear regression of the concentration evolution over time, c(t), in the chamber headspace and estimation of the initial CO2 fluxes at closure time for the majority of experiments. CO2 flux estimates by linear regression can be as low as 40% of the flux estimates of exponential regression for closure times of only two minutes and even lower for longer closure times. The degree of underestimation increased with increasing CO2 flux strength and was dependent on soil and vegetation conditions, which can disturb not only the quantitative but also the qualitative evaluation of CO2 flux dynamics. The underestimation effect by linear regression was observed to be different for CO2 uptake and release situations, which can lead to stronger bias in the daily, seasonal and annual CO2 balances than in the individual fluxes. To avoid serious bias of CO2 flux estimates based on closed chamber experiments, we suggest further tests using published datasets and recommend the use of nonlinear regression models for future closed chamber studies.

  7. Sparse multivariate factor analysis regression models and its applications to integrative genomics analysis.

    PubMed

    Zhou, Yan; Wang, Pei; Wang, Xianlong; Zhu, Ji; Song, Peter X-K

    2017-01-01

    The multivariate regression model is a useful tool to explore complex associations between two kinds of molecular markers, which enables the understanding of the biological pathways underlying disease etiology. For a set of correlated response variables, accounting for such dependency can increase statistical power. Motivated by integrative genomic data analyses, we propose a new methodology-sparse multivariate factor analysis regression model (smFARM), in which correlations of response variables are assumed to follow a factor analysis model with latent factors. This proposed method not only allows us to address the challenge that the number of association parameters is larger than the sample size, but also to adjust for unobserved genetic and/or nongenetic factors that potentially conceal the underlying response-predictor associations. The proposed smFARM is implemented by the EM algorithm and the blockwise coordinate descent algorithm. The proposed methodology is evaluated and compared to the existing methods through extensive simulation studies. Our results show that accounting for latent factors through the proposed smFARM can improve sensitivity of signal detection and accuracy of sparse association map estimation. We illustrate smFARM by two integrative genomics analysis examples, a breast cancer dataset, and an ovarian cancer dataset, to assess the relationship between DNA copy numbers and gene expression arrays to understand genetic regulatory patterns relevant to the disease. We identify two trans-hub regions: one in cytoband 17q12 whose amplification influences the RNA expression levels of important breast cancer genes, and the other in cytoband 9q21.32-33, which is associated with chemoresistance in ovarian cancer. © 2016 WILEY PERIODICALS, INC.

  8. Uncertainty analysis of an inflow forecasting model: extension of the UNEEC machine learning-based method

    NASA Astrophysics Data System (ADS)

    Pianosi, Francesca; Lal Shrestha, Durga; Solomatine, Dimitri

    2010-05-01

    This research presents an extension of the UNEEC (Uncertainty Estimation based on local Errors and Clustering; Shrestha and Solomatine, 2006, 2008; Solomatine and Shrestha, 2009) method in the direction of explicit inclusion of parameter uncertainty. The UNEEC method assumes that there is an optimal model and that the residuals of the model can be used to assess the uncertainty of the model prediction. It is assumed that all sources of uncertainty, including input, parameter and model structure uncertainty, are explicitly manifested in the model residuals. In this research, these assumptions are relaxed, and the UNEEC method is extended to consider parameter uncertainty as well (abbreviated as UNEEC-P). In UNEEC-P, first we use Monte Carlo (MC) sampling in parameter space to generate N model realizations (each of which is a time series), estimate the prediction quantiles based on the empirical distribution functions of the model residuals considering all the residual realizations, and only then apply the standard UNEEC method that encapsulates the uncertainty of a hydrologic model (expressed by quantiles of the error distribution) in a machine learning model (e.g., ANN). UNEEC-P is applied first to a linear regression model of synthetic data, and then to a real case study of forecasting inflow to Lake Lugano in northern Italy. The inflow forecasting model is a stochastic heteroscedastic model (Pianosi and Soncini-Sessa, 2009). The preliminary results show that the UNEEC-P method produces wider uncertainty bounds, which is consistent with the fact that the method also considers parameter uncertainty of the optimal model. In the future, the UNEEC method will be further extended to consider input and structural uncertainty, which will provide more realistic estimates of model prediction uncertainty.
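
    The pooling step at the heart of UNEEC-P, as described, can be illustrated with a toy linear model: sample parameters, collect residuals across all Monte Carlo realizations, and take empirical quantiles. The Gaussian spread around the least-squares estimates and the quantile levels below are placeholders.

```python
# Toy sketch: pool residuals over Monte Carlo parameter samples before taking quantiles (UNEEC-P idea).
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
y = 2.0 * x + 1.0 + rng.normal(0.0, 1.0, x.size)          # synthetic observations

slope_hat, intercept_hat = np.polyfit(x, y, 1)            # the "optimal" model
residuals = []
for _ in range(500):                                      # N parameter realizations
    slope = rng.normal(slope_hat, 0.05)                   # assumed spread around the optimum
    intercept = rng.normal(intercept_hat, 0.2)
    residuals.append(y - (slope * x + intercept))
residuals = np.concatenate(residuals)

lower, upper = np.quantile(residuals, [0.05, 0.95])       # empirical prediction quantiles
print(f"90% residual band: [{lower:.2f}, {upper:.2f}] around the model prediction")
```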

  9. Is Private Production of Public Services Cheaper Than Public Production? A Meta-Regression Analysis of Solid Waste and Water Services

    ERIC Educational Resources Information Center

    Bel, Germa; Fageda, Xavier; Warner, Mildred E.

    2010-01-01

    Privatization of local government services is assumed to deliver cost savings, but empirical evidence for this from around the world is mixed. We conduct a meta-regression analysis of all econometric studies examining privatization of water distribution and solid waste collection services and find no systematic support for lower costs with private…

  10. Poisson, Poisson-gamma and zero-inflated regression models of motor vehicle crashes: balancing statistical fit and theory.

    PubMed

    Lord, Dominique; Washington, Simon P; Ivan, John N

    2005-01-01

    There has been considerable research conducted over the last 20 years focused on predicting motor vehicle crashes on transportation facilities. The range of statistical models commonly applied includes binomial, Poisson, Poisson-gamma (or negative binomial), zero-inflated Poisson and negative binomial models (ZIP and ZINB), and multinomial probability models. Given the range of possible modeling approaches and the host of assumptions with each modeling approach, making an intelligent choice for modeling motor vehicle crash data is difficult. There is little discussion in the literature comparing different statistical modeling approaches, identifying which statistical models are most appropriate for modeling crash data, and providing a strong justification from basic crash principles. In the recent literature, it has been suggested that the motor vehicle crash process can successfully be modeled by assuming a dual-state data-generating process, which implies that entities (e.g., intersections, road segments, pedestrian crossings, etc.) exist in one of two states: perfectly safe and unsafe. As a result, the ZIP and ZINB are two models that have been applied to account for the preponderance of "excess" zeros frequently observed in crash count data. The objective of this study is to provide defensible guidance on how to appropriately model crash data. We first examine the motor vehicle crash process using theoretical principles and a basic understanding of the crash process. It is shown that the fundamental crash process follows a Bernoulli trial with unequal probability of independent events, also known as Poisson trials. We examine the evolution of statistical models as they apply to the motor vehicle crash process, and indicate how well they statistically approximate the crash process. We also present the theory behind dual-state process count models, and note why they have become popular for modeling crash data. A simulation experiment is then conducted to demonstrate how crash data give rise to the "excess" zeros frequently observed in crash count data. It is shown that the Poisson and other mixed probabilistic structures are approximations assumed for modeling the motor vehicle crash process. Furthermore, it is demonstrated that under certain (fairly common) circumstances excess zeros are observed, and that these circumstances arise from low exposure and/or inappropriate selection of time/space scales, not from an underlying dual-state process. In conclusion, carefully selecting the time/space scales for analysis, including an improved set of explanatory variables and/or unobserved heterogeneity effects in count regression models, or applying small-area statistical methods (observations with low exposure) represent the most defensible modeling approaches for datasets with a preponderance of zeros.
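
    The simulation argument, that heterogeneous risk at low exposure alone produces apparently "excess" zeros relative to a single Poisson fit, can be reproduced in miniature; the exposure counts and risk distribution below are illustrative, not the paper's experiment.

```python
# Toy sketch: heterogeneous, low-exposure Bernoulli (Poisson) trials vs. a single Poisson fit.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sites = 5000
exposure = rng.integers(5, 50, n_sites)                      # few crash opportunities per site
risk = rng.gamma(shape=0.3, scale=0.02, size=n_sites)        # heterogeneous, small per-trial risks
counts = rng.binomial(exposure, np.clip(risk, 0.0, 1.0))

observed_zeros = np.mean(counts == 0)
poisson_zeros = stats.poisson.pmf(0, counts.mean())          # zero fraction implied by one Poisson mean
print(f"observed zero fraction: {observed_zeros:.3f}  Poisson-implied: {poisson_zeros:.3f}")
```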

  11. A Two-Step Approach for Analysis of Nonignorable Missing Outcomes in Longitudinal Regression: an Application to Upstate KIDS Study.

    PubMed

    Liu, Danping; Yeung, Edwina H; McLain, Alexander C; Xie, Yunlong; Buck Louis, Germaine M; Sundaram, Rajeshwari

    2017-09-01

    Imperfect follow-up in longitudinal studies commonly leads to missing outcome data that can potentially bias the inference when the missingness is nonignorable; that is, the propensity of missingness depends on missing values in the data. In the Upstate KIDS Study, we seek to determine if the missingness of child development outcomes is nonignorable, and how a simple model assuming ignorable missingness would compare with more complicated models for a nonignorable mechanism. To correct for nonignorable missingness, the shared random effects model (SREM) jointly models the outcome and the missing mechanism. However, the computational complexity and lack of software packages has limited its practical applications. This paper proposes a novel two-step approach to handle nonignorable missing outcomes in generalized linear mixed models. We first analyse the missing mechanism with a generalized linear mixed model and predict values of the random effects; then, the outcome model is fitted adjusting for the predicted random effects to account for heterogeneity in the missingness propensity. Extensive simulation studies suggest that the proposed method is a reliable approximation to SREM, with a much faster computation. The nonignorability of missing data in the Upstate KIDS Study is estimated to be mild to moderate, and the analyses using the two-step approach or SREM are similar to the model assuming ignorable missingness. The two-step approach is a computationally straightforward method that can be conducted as sensitivity analyses in longitudinal studies to examine violations to the ignorable missingness assumption and the implications relative to health outcomes. © 2017 John Wiley & Sons Ltd.

  12. Characterizing the performance of the Conway-Maxwell Poisson generalized linear model.

    PubMed

    Francis, Royce A; Geedipally, Srinivas Reddy; Guikema, Seth D; Dhavala, Soma Sekhar; Lord, Dominique; LaRocca, Sarah

    2012-01-01

    Count data are pervasive in many areas of risk analysis; deaths, adverse health outcomes, infrastructure system failures, and traffic accidents are all recorded as count events, for example. Risk analysts often wish to estimate the probability distribution for the number of discrete events as part of doing a risk assessment. Traditional count data regression models of the type often used in risk assessment for this problem suffer from limitations due to the assumed variance structure. A more flexible model based on the Conway-Maxwell Poisson (COM-Poisson) distribution was recently proposed, a model that has the potential to overcome the limitations of the traditional model. However, the statistical performance of this new model has not yet been fully characterized. This article assesses the performance of a maximum likelihood estimation method for fitting the COM-Poisson generalized linear model (GLM). The objectives of this article are to (1) characterize the parameter estimation accuracy of the MLE implementation of the COM-Poisson GLM, and (2) estimate the prediction accuracy of the COM-Poisson GLM using simulated data sets. The results of the study indicate that the COM-Poisson GLM is flexible enough to model under-, equi-, and overdispersed data sets with different sample mean values. The results also show that the COM-Poisson GLM yields accurate parameter estimates. The COM-Poisson GLM provides a promising and flexible approach for performing count data regression. © 2011 Society for Risk Analysis.
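
    A compact sketch of maximum-likelihood fitting for the COM-Poisson distribution itself (intercept-only, no covariates), with the normalizing constant truncated at a fixed upper bound; the truncation point, starting values, and optimizer are assumptions rather than the implementation assessed in the article.

```python
# Sketch: intercept-only COM-Poisson maximum-likelihood fit with a truncated normalizer.
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def com_poisson_logpmf(y, log_lam, nu, y_max=200):
    js = np.arange(y_max + 1)
    log_terms = js * log_lam - nu * gammaln(js + 1)      # log( lambda^j / (j!)^nu )
    log_z = np.logaddexp.reduce(log_terms)               # log of normalizing constant Z(lambda, nu)
    return y * log_lam - nu * gammaln(y + 1) - log_z

def fit_com_poisson(y):
    def nll(params):
        log_lam, log_nu = params
        return -np.sum(com_poisson_logpmf(y, log_lam, np.exp(log_nu)))
    res = minimize(nll, x0=[np.log(y.mean() + 0.5), 0.0], method="Nelder-Mead")
    return np.exp(res.x)                                 # (lambda, nu); nu < 1 suggests overdispersion

counts = np.random.default_rng(2).negative_binomial(2, 0.4, size=1000)  # overdispersed example data
print(fit_com_poisson(counts))
```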

  13. A robust real-time surface reconstruction method on point clouds captured from a 3D surface photogrammetry system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, Wenyang; Cheung, Yam; Sawant, Amit

    2016-05-15

    Purpose: To develop a robust and real-time surface reconstruction method on point clouds captured from a 3D surface photogrammetry system. Methods: The authors have developed a robust and fast surface reconstruction method on point clouds acquired by the photogrammetry system, without explicitly solving the partial differential equation required by a typical variational approach. Taking advantage of the overcomplete nature of the acquired point clouds, their method solves and propagates a sparse linear relationship from the point cloud manifold to the surface manifold, assuming both manifolds share similar local geometry. With relatively consistent point cloud acquisitions, the authors propose a sparse regression (SR) model to directly approximate the target point cloud as a sparse linear combination from the training set, assuming that the point correspondences built by the iterative closest point (ICP) are reasonably accurate and have residual errors following a Gaussian distribution. To accommodate changing noise levels and/or presence of inconsistent occlusions during the acquisition, the authors further propose a modified sparse regression (MSR) model to model the potentially large and sparse error built by ICP with a Laplacian prior. The authors evaluated the proposed method on both clinical point clouds acquired under consistent acquisition conditions and on point clouds with inconsistent occlusions. The authors quantitatively evaluated the reconstruction performance with respect to root-mean-squared-error, by comparing its reconstruction results against those from the variational method. Results: On clinical point clouds, both the SR and MSR models have achieved sub-millimeter reconstruction accuracy and reduced the reconstruction time by two orders of magnitude to a subsecond reconstruction time. On point clouds with inconsistent occlusions, the MSR model has demonstrated its advantage in achieving consistent and robust performance despite the introduced occlusions. Conclusions: The authors have developed a fast and robust surface reconstruction method on point clouds captured from a 3D surface photogrammetry system, with demonstrated sub-millimeter reconstruction accuracy and subsecond reconstruction time. It is suitable for real-time motion tracking in radiotherapy, with clear surface structures for better quantifications.
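
    For readers who want to experiment with the core idea, a stripped-down sketch follows: once point correspondence is assumed (e.g., from ICP), the target cloud is approximated as a sparse linear combination of training clouds via an L1-penalized fit. The Lasso solver and penalty weight are stand-ins, not the authors' SR/MSR implementation.

```python
# Sketch: approximate a target point cloud as a sparse combination of training clouds.
import numpy as np
from sklearn.linear_model import Lasso

def sparse_reconstruct(training_clouds, target_cloud, alpha=0.01):
    """training_clouds: (n_train, n_points, 3), already in point-wise correspondence.
    target_cloud: (n_points, 3). Returns (sparse weights, reconstructed cloud)."""
    X = training_clouds.reshape(training_clouds.shape[0], -1).T   # columns = flattened training clouds
    y = target_cloud.reshape(-1)
    model = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000).fit(X, y)
    recon = (X @ model.coef_).reshape(target_cloud.shape)
    return model.coef_, recon
```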

  14. A robust real-time surface reconstruction method on point clouds captured from a 3D surface photogrammetry system.

    PubMed

    Liu, Wenyang; Cheung, Yam; Sawant, Amit; Ruan, Dan

    2016-05-01

    To develop a robust and real-time surface reconstruction method on point clouds captured from a 3D surface photogrammetry system. The authors have developed a robust and fast surface reconstruction method on point clouds acquired by the photogrammetry system, without explicitly solving the partial differential equation required by a typical variational approach. Taking advantage of the overcomplete nature of the acquired point clouds, their method solves and propagates a sparse linear relationship from the point cloud manifold to the surface manifold, assuming both manifolds share similar local geometry. With relatively consistent point cloud acquisitions, the authors propose a sparse regression (SR) model to directly approximate the target point cloud as a sparse linear combination from the training set, assuming that the point correspondences built by the iterative closest point (ICP) are reasonably accurate and have residual errors following a Gaussian distribution. To accommodate changing noise levels and/or presence of inconsistent occlusions during the acquisition, the authors further propose a modified sparse regression (MSR) model to model the potentially large and sparse error built by ICP with a Laplacian prior. The authors evaluated the proposed method on both clinical point clouds acquired under consistent acquisition conditions and on point clouds with inconsistent occlusions. The authors quantitatively evaluated the reconstruction performance with respect to root-mean-squared-error, by comparing its reconstruction results against those from the variational method. On clinical point clouds, both the SR and MSR models have achieved sub-millimeter reconstruction accuracy and reduced the reconstruction time by two orders of magnitude to a subsecond reconstruction time. On point clouds with inconsistent occlusions, the MSR model has demonstrated its advantage in achieving consistent and robust performance despite the introduced occlusions. The authors have developed a fast and robust surface reconstruction method on point clouds captured from a 3D surface photogrammetry system, with demonstrated sub-millimeter reconstruction accuracy and subsecond reconstruction time. It is suitable for real-time motion tracking in radiotherapy, with clear surface structures for better quantifications.

  15. A robust real-time surface reconstruction method on point clouds captured from a 3D surface photogrammetry system

    PubMed Central

    Liu, Wenyang; Cheung, Yam; Sawant, Amit; Ruan, Dan

    2016-01-01

    Purpose: To develop a robust and real-time surface reconstruction method on point clouds captured from a 3D surface photogrammetry system. Methods: The authors have developed a robust and fast surface reconstruction method on point clouds acquired by the photogrammetry system, without explicitly solving the partial differential equation required by a typical variational approach. Taking advantage of the overcomplete nature of the acquired point clouds, their method solves and propagates a sparse linear relationship from the point cloud manifold to the surface manifold, assuming both manifolds share similar local geometry. With relatively consistent point cloud acquisitions, the authors propose a sparse regression (SR) model to directly approximate the target point cloud as a sparse linear combination from the training set, assuming that the point correspondences built by the iterative closest point (ICP) are reasonably accurate and have residual errors following a Gaussian distribution. To accommodate changing noise levels and/or presence of inconsistent occlusions during the acquisition, the authors further propose a modified sparse regression (MSR) model to model the potentially large and sparse error built by ICP with a Laplacian prior. The authors evaluated the proposed method on both clinical point clouds acquired under consistent acquisition conditions and on point clouds with inconsistent occlusions. The authors quantitatively evaluated the reconstruction performance with respect to root-mean-squared-error, by comparing its reconstruction results against those from the variational method. Results: On clinical point clouds, both the SR and MSR models have achieved sub-millimeter reconstruction accuracy and reduced the reconstruction time by two orders of magnitude to a subsecond reconstruction time. On point clouds with inconsistent occlusions, the MSR model has demonstrated its advantage in achieving consistent and robust performance despite the introduced occlusions. Conclusions: The authors have developed a fast and robust surface reconstruction method on point clouds captured from a 3D surface photogrammetry system, with demonstrated sub-millimeter reconstruction accuracy and subsecond reconstruction time. It is suitable for real-time motion tracking in radiotherapy, with clear surface structures for better quantifications. PMID:27147347

  16. Intra-individual gait patterns across different time-scales as revealed by means of a supervised learning model using kernel-based discriminant regression.

    PubMed

    Horst, Fabian; Eekhoff, Alexander; Newell, Karl M; Schöllhorn, Wolfgang I

    2017-01-01

    Traditionally, gait analysis has been centered on the idea of average behavior and normality. On one hand, clinical diagnoses and therapeutic interventions typically assume that average gait patterns remain constant over time. On the other hand, it is well known that all our movements are accompanied by a certain amount of variability, which does not allow us to make two identical steps. The purpose of this study was to examine changes in the intra-individual gait patterns across different time-scales (i.e., tens-of-mins, tens-of-hours). Nine healthy subjects performed 15 gait trials at a self-selected speed on 6 sessions within one day (duration between two subsequent sessions from 10 to 90 mins). For each trial, time-continuous ground reaction forces and lower body joint angles were measured. A supervised learning model using a kernel-based discriminant regression was applied for classifying sessions within individual gait patterns. Discernable characteristics of intra-individual gait patterns could be distinguished between repeated sessions by classification rates of 67.8 ± 8.8% and 86.3 ± 7.9% for the six-session-classification of ground reaction forces and lower body joint angles, respectively. Furthermore, the one-on-one-classification showed that increasing classification rates go along with increasing time durations between two sessions and indicate that changes of gait patterns appear at different time-scales. Discernable characteristics between repeated sessions indicate continuous intrinsic changes in intra-individual gait patterns and suggest a predominant role of deterministic processes in human motor control and learning. Natural changes of gait patterns without any externally induced injury or intervention may reflect continuous adaptations of the motor system over several time-scales. Accordingly, the modelling of walking by means of average gait patterns that are assumed to be near constant over time needs to be reconsidered in the context of these findings, especially towards more individualized and situational diagnoses and therapy.
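
    A generic stand-in for the classification step (not the authors' exact kernel-based discriminant regression): kernel ridge regression onto one-hot session labels, with the predicted session taken as the arg-max; kernel parameters and the leave-one-out evaluation are assumptions.

```python
# Sketch: kernel ridge regression onto one-hot session labels, classifying by arg-max.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import LeaveOneOut
from sklearn.preprocessing import LabelBinarizer

def session_classification_rate(X, sessions, gamma=1e-3, alpha=1.0):
    """X: (n_trials, n_features) time-normalized gait signals; sessions: session label per trial."""
    Y = LabelBinarizer().fit_transform(sessions)          # one-hot session encoding
    correct = 0
    for train, test in LeaveOneOut().split(X):
        model = KernelRidge(kernel="rbf", gamma=gamma, alpha=alpha).fit(X[train], Y[train])
        pred = model.predict(X[test]).argmax(axis=1)
        correct += int(pred[0] == Y[test].argmax(axis=1)[0])
    return correct / len(X)
```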

  17. An example of complex modelling in dentistry using Markov chain Monte Carlo (MCMC) simulation.

    PubMed

    Helfenstein, Ulrich; Menghini, Giorgio; Steiner, Marcel; Murati, Francesca

    2002-09-01

    In the usual regression setting one regression line is computed for a whole data set. In a more complex situation, each person may be observed for example at several points in time and thus a regression line might be calculated for each person. Additional complexities, such as various forms of errors in covariables may make a straightforward statistical evaluation difficult or even impossible. During recent years methods have been developed allowing convenient analysis of problems where the data and the corresponding models show these and many other forms of complexity. The methodology makes use of a Bayesian approach and Markov chain Monte Carlo (MCMC) simulations. The methods allow the construction of increasingly elaborate models by building them up from local sub-models. The essential structure of the models can be represented visually by directed acyclic graphs (DAG). This attractive property allows communication and discussion of the essential structure and the substantial meaning of a complex model without needing algebra. After presentation of the statistical methods an example from dentistry is presented in order to demonstrate their application and use. The dataset of the example had a complex structure; each of a set of children was followed up over several years. The number of new fillings in permanent teeth had been recorded at several ages. The dependent variables were markedly different from the normal distribution and could not be transformed to normality. In addition, explanatory variables were assumed to be measured with different forms of error. Illustration of how the corresponding models can be estimated conveniently via MCMC simulation, in particular, 'Gibbs sampling', using the freely available software BUGS is presented. In addition, how the measurement error may influence the estimates of the corresponding coefficients is explored. It is demonstrated that the effect of the independent variable on the dependent variable may be markedly underestimated if the measurement error is not taken into account ('regression dilution bias'). Markov chain Monte Carlo methods may be of great value to dentists in allowing analysis of data sets which exhibit a wide range of different forms of complexity.
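
    The "regression dilution bias" mentioned at the end is easy to demonstrate outside the Bayesian machinery: with classical measurement error in a covariate, the ordinary least-squares slope is attenuated by roughly the reliability ratio var(x)/(var(x) + var(error)). A small simulation with made-up variances:

```python
# Sketch: attenuation of an OLS slope under classical covariate measurement error.
import numpy as np

rng = np.random.default_rng(3)
n, true_slope = 10000, 2.0
x = rng.normal(0.0, 1.0, n)                     # true covariate
y = true_slope * x + rng.normal(0.0, 1.0, n)    # outcome
x_obs = x + rng.normal(0.0, 1.0, n)             # covariate measured with error (variance 1.0)

naive_slope = np.polyfit(x_obs, y, 1)[0]
reliability = 1.0 / (1.0 + 1.0)                 # var(x) / (var(x) + var(error))
print(f"naive slope ~ {naive_slope:.2f}, expected ~ {true_slope * reliability:.2f}")
```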

  18. Isolating the cow-specific part of residual energy intake in lactating dairy cows using random regressions.

    PubMed

    Fischer, A; Friggens, N C; Berry, D P; Faverdin, P

    2018-07-01

    The ability to properly assess and accurately phenotype true differences in feed efficiency among dairy cows is key to the development of breeding programs for improving feed efficiency. The variability among individuals in feed efficiency is commonly characterised by the residual intake approach. Residual feed intake is represented by the residuals of a linear regression of intake on the corresponding quantities of the biological functions that consume (or release) energy. However, the residuals include both, model fitting and measurement errors as well as any variability in cow efficiency. The objective of this study was to isolate the individual animal variability in feed efficiency from the residual component. Two separate models were fitted, in one the standard residual energy intake (REI) was calculated as the residual of a multiple linear regression of lactation average net energy intake (NEI) on lactation average milk energy output, average metabolic BW, as well as lactation loss and gain of body condition score. In the other, a linear mixed model was used to simultaneously fit fixed linear regressions and random cow levels on the biological traits and intercept using fortnight repeated measures for the variables. This method split the predicted NEI in two parts: one quantifying the population mean intercept and coefficients, and one quantifying cow-specific deviations in the intercept and coefficients. The cow-specific part of predicted NEI was assumed to isolate true differences in feed efficiency among cows. NEI and associated energy expenditure phenotypes were available for the first 17 fortnights of lactation from 119 Holstein cows; all fed a constant energy-rich diet. Mixed models fitting cow-specific intercept and coefficients to different combinations of the aforementioned energy expenditure traits, calculated on a fortnightly basis, were compared. The variance of REI estimated with the lactation average model represented only 8% of the variance of measured NEI. Among all compared mixed models, the variance of the cow-specific part of predicted NEI represented between 53% and 59% of the variance of REI estimated from the lactation average model or between 4% and 5% of the variance of measured NEI. The remaining 41% to 47% of the variance of REI estimated with the lactation average model may therefore reflect model fitting errors or measurement errors. In conclusion, the use of a mixed model framework with cow-specific random regressions seems to be a promising method to isolate the cow-specific component of REI in dairy cows.
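
    A minimal sketch of the random-regression idea described above (cow-specific random intercepts and coefficients on the energy sinks) using statsmodels; the column names are hypothetical and only one random slope is shown for brevity.

```python
# Sketch: cow-specific random intercept and slope for one energy sink (hypothetical columns).
import statsmodels.formula.api as smf

def fit_cow_specific_model(df):
    """df columns assumed: NEI, milk_energy, metabolic_bw, bcs_loss, bcs_gain, cow."""
    model = smf.mixedlm(
        "NEI ~ milk_energy + metabolic_bw + bcs_loss + bcs_gain",
        data=df,
        groups=df["cow"],
        re_formula="~milk_energy",    # random intercept plus a random coefficient on milk energy
    )
    fit = model.fit(reml=True)
    # The cow-specific deviations (random effects) isolate the efficiency-related part of NEI.
    return fit.random_effects
```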

  19. Comparison of anchor-based and distributional approaches in estimating important difference in common cold.

    PubMed

    Barrett, Bruce; Brown, Roger; Mundt, Marlon

    2008-02-01

    Evaluative health-related quality-of-life instruments used in clinical trials should be able to detect small but important changes in health status. Several approaches to minimal important difference (MID) and responsiveness have been developed. To compare anchor-based and distributional approaches to important difference and responsiveness for the Wisconsin Upper Respiratory Symptom Survey (WURSS), an illness-specific quality of life outcomes instrument. Participants with community-acquired colds self-reported daily using the WURSS-44. Distribution-based methods calculated standardized effect size (ES) and standard error of measurement (SEM). Anchor-based methods compared daily interval changes to global ratings of change, using: (1) standard MID methods based on correspondence to ratings of "a little better" or "somewhat better," and (2) two-level multivariate regression models. About 150 adults were monitored throughout their colds (1,681 sick days.): 88% were white, 69% were women, and 50% had completed college. The mean age was 35.5 years (SD = 14.7). WURSS scores increased 2.2 points from the first to second day, and then dropped by an average of 8.2 points per day from days 2 to 7. The SEM averaged 9.1 during these 7 days. Standard methods yielded a between day MID of 22 points. Regression models of MID projected 11.3-point daily changes. Dividing these estimates of small-but-important-difference by pooled SDs yielded coefficients of .425 for standard MID, .218 for regression model, .177 for SEM, and .157 for ES. These imply per-group sample sizes of 870 using ES, 616 for SEM, 302 for regression model, and 89 for standard MID, assuming alpha = .05, beta = .20 (80% power), and two-tailed testing. Distribution and anchor-based approaches provide somewhat different estimates of small but important difference, which in turn can have substantial impact on trial design.
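
    The sample-size comparison follows the usual two-sample normal-approximation formula n ≈ 2(z_{1-α/2} + z_{1-β})²/d² per group; the sketch below applies it to the four standardized-difference estimates quoted. The published counts differ somewhat, presumably reflecting adjustments not detailed in the abstract, so treat this only as the textbook calculation.

```python
# Sketch: per-group sample size for a two-sided two-sample test (alpha = 0.05, 80% power).
from scipy import stats

def n_per_group(d, alpha=0.05, power=0.80):
    z_a = stats.norm.ppf(1 - alpha / 2)
    z_b = stats.norm.ppf(power)
    return 2 * (z_a + z_b) ** 2 / d ** 2

for label, d in [("standard MID", 0.425), ("regression model", 0.218),
                 ("SEM", 0.177), ("effect size", 0.157)]:
    print(f"{label:>16}: d = {d:.3f} -> about {n_per_group(d):.0f} per group")
```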

  20. Why is it so difficult to determine the yield of indoor cannabis plantations? A case study from the Netherlands.

    PubMed

    Vanhove, Wouter; Maalsté, Nicole; Van Damme, Patrick

    2017-07-01

    Together, the Netherlands and Belgium are the largest indoor cannabis producing countries in Europe. In both countries, legal prosecution procedure of convicted illicit cannabis growers usually includes recovery of the profits gained. However, it is not easy to make a reliable estimation of the latter profits, due to the wide range of factors that determine indoor cannabis yields and eventual selling prices. In the Netherlands, since 2005, a reference model is used that assumes a constant yield (g) per plant for a given indoor cannabis plant density. Later, in 2011, a new model was developed in Belgium for yield estimation of Belgian indoor cannabis plantations that assumes a constant yield per m 2 of growth surface, provided that a number of growth conditions are met. Indoor cannabis plantations in the Netherlands and Belgium share similar technical characteristics. As a result, for indoor cannabis plantations in both countries, both aforementioned yield estimation models should yield similar yield estimations. By means of a real-case study from the Netherlands, we show that the reliability of both models is hampered by a number of flaws and unmet preconditions. The Dutch model is based on a regression equation that makes use of ill-defined plant development stages, assumes a linear plant growth, does not discriminate between different plantation size categories and does not include other important yield determining factors (such as fertilization). The Belgian model addresses some of the latter shortcomings, but its applicability is constrained by a number of pre-conditions including plantation size between 50 and 1000 plants; cultivation in individual pots with peat soil; 600W (electrical power) assimilation lamps; constant temperature between 20°C and 30°C; adequate fertilizer application and plants unaffected by pests and diseases. Judiciary in both the Netherlands and Belgium require robust indoor cannabis yield models for adequate legal prosecution of illicit indoor cannabis growth operations. To that aim, the current models should be optimized whereas the validity of their application should be examined case by case. Copyright © 2017 Elsevier B.V. All rights reserved.

  1. Comparison of constitutive flow resistance equations based on the Manning and Chezy equations applied to natural rivers

    USGS Publications Warehouse

    Bjerklie, David M.; Dingman, S. Lawrence; Bolster, Carl H.

    2005-01-01

    A set of conceptually derived in‐bank river discharge–estimating equations (models), based on the Manning and Chezy equations, are calibrated and validated using a database of 1037 discharge measurements in 103 rivers in the United States and New Zealand. The models are compared to a multiple regression model derived from the same data. The comparison demonstrates that in natural rivers, using an exponent on the slope variable of 0.33 rather than the traditional value of 0.5 reduces the variance associated with estimating flow resistance. Mean model uncertainty, assuming a constant value for the conductance coefficient, is less than 5% for a large number of estimates, and 67% of the estimates would be accurate within 50%. The models have potential application where site‐specific flow resistance information is not available and can be the basis for (1) a general approach to estimating discharge from remotely sensed hydraulic data, (2) comparison to slope‐area discharge estimates, and (3) large‐scale river modeling.

  2. Regression Techniques for Determining the Effective Impervious Area in Southern California Watersheds

    NASA Astrophysics Data System (ADS)

    Sultana, R.; Mroczek, M.; Dallman, S.; Sengupta, A.; Stein, E. D.

    2016-12-01

    The portion of the Total Impervious Area (TIA) that is hydraulically connected to the storm drainage network is called the Effective Impervious Area (EIA). The remaining fraction of impervious area, called the non-effective impervious area, drains onto pervious surfaces, which do not contribute to runoff during smaller events. Using the TIA instead of the EIA in models and calculations can lead to overestimates of runoff volumes and peak discharges and to oversizing of drainage systems, since it is assumed that all impervious areas produce urban runoff that is directly connected to storm drains. This makes EIA a better predictor of actual runoff from urban catchments for hydraulic design of storm drain systems and modeling of non-point source pollution. Compared to TIA, the EIA is considerably more difficult to determine, since it cannot be found by using remote sensing techniques, readily available EIA datasets, or aerial imagery interpretation alone. For this study, EIA percentages were calculated by two successive regression methods for five watersheds (with areas of 8.38-158 mi2) located in Southern California, using rainfall-runoff event data for the years 2004-2007. Runoff generated by the smaller storm events is considered to emanate only from the effective impervious areas. Therefore, larger events considered to have runoff from both impervious and pervious surfaces were successively removed in the regression methods using a criterion of (1) 1 mm and (2) max(2, 1 mm) above the regression line. MSE is calculated from actual runoff and runoff predicted by the regression. Analysis of standard deviations showed that the criterion of max(2, 1 mm) better fit the regression line and is the preferred method for predicting the EIA percentage. The estimated EIAs were approximately 78% to 43% of the TIA, which shows that using EIA instead of TIA can have a significant impact on the cost of building urban hydraulic systems and stormwater-capture devices.
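
    A rough sketch of the successive-regression idea: regress event runoff depth on rainfall depth, discard events whose runoff lies more than a threshold above the line (the 1 mm criterion is used here), and repeat until the retained set stabilizes; the slope of the final line is read as the EIA fraction. Convergence handling and the second criterion are simplified.

```python
# Sketch: successive regression to estimate the effective impervious area (EIA) fraction.
import numpy as np

def eia_fraction(rainfall_mm, runoff_mm, threshold_mm=1.0, max_iter=50):
    """Regress runoff on rainfall, dropping events far above the line, until the set stabilizes."""
    keep = np.ones(rainfall_mm.size, dtype=bool)
    slope = intercept = 0.0
    for _ in range(max_iter):
        slope, intercept = np.polyfit(rainfall_mm[keep], runoff_mm[keep], 1)
        above = runoff_mm - (slope * rainfall_mm + intercept)
        new_keep = above <= threshold_mm            # retain events with impervious-only runoff
        if np.array_equal(new_keep, keep):
            break
        keep = new_keep
    return slope                                    # runoff/rainfall slope of retained events ~ EIA fraction
```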

  3. On the relation between the peak frequency and the corresponding rise time of solar microwave impulsive bursts and the height dependence of magnetic fields

    NASA Astrophysics Data System (ADS)

    Zhao, Ren-Yang; Magun, Andreas; Schanda, Erwin

    1990-12-01

    Results are reported from a correlation analysis for 57 microwave impulsive bursts observed at six frequencies. A regression line between the peak frequency and the corresponding rise time of microwave impulsive bursts is obtained, with a correlation coefficient of -0.43. This can be explained in the frame of a thermal model. The magnetic field decrease with height has to be much slower than in a dipole field in order to explain the weak dependence of f(p) on t(r). This decrease of magnetic field with height in burst sources is based on the relationship between f(p) and t(r) found by assuming a thermal flare model with a collisionless conduction front.

  4. Generalized additive regression models of discharge and mean velocity associated with direct-runoff conditions in Texas: Utility of the U.S. Geological Survey discharge measurement database

    USGS Publications Warehouse

    Asquith, William H.; Herrmann, George R.; Cleveland, Theodore G.

    2013-01-01

    A database containing more than 17,700 discharge values and ancillary hydraulic properties was assembled from summaries of discharge measurement records for 424 U.S. Geological Survey streamflow-gauging stations (stream gauges) in Texas. Each discharge exceeds the 90th-percentile daily mean streamflow as determined by period-of-record, stream-gauge-specific, flow-duration curves. Each discharge therefore is assumed to represent a discharge measurement made during direct-runoff conditions. The hydraulic properties of each discharge measurement included concomitant cross-sectional flow area, water-surface top width, and reported mean velocity. Systematic and statewide investigation of these data in pursuit of regional models for the estimation of discharge and mean velocity has not been previously attempted. Generalized additive regression modeling is used to develop procedures, readily implemented by end users, for estimation of discharge and mean velocity from select predictor variables at ungauged stream locations. The discharge model uses predictor variables of cross-sectional flow area, top width, stream location, mean annual precipitation, and a generalized terrain and climate index (OmegaEM) derived for a previous flood-frequency regionalization study. The mean velocity model uses predictor variables of discharge, top width, stream location, mean annual precipitation, and OmegaEM. The discharge model has an adjusted R-squared value of about 0.95 and a residual standard error (RSE) of about 0.22 base-10 logarithm (cubic meters per second); the mean velocity model has an adjusted R-squared value of about 0.67 and an RSE of about 0.063 fifth root (meters per second). Example applications and computations using both regression models are provided.

  5. Development of European NO2 Land Use Regression Model for present and future exposure assessment: Implications for policy analysis.

    PubMed

    Vizcaino, Pilar; Lavalle, Carlo

    2018-05-04

    A new Land Use Regression model was built to develop pan-European 100 m resolution maps of NO2 concentrations. The model was built using NO2 concentrations from routine monitoring stations available in the Airbase database as the dependent variable. Predictor variables included land use, road traffic proxies, population density, climatic and topographical variables, and distance to sea. In order to capture international and interregional disparities not accounted for with the mentioned predictor variables, additional proxies of NO2 concentrations, such as levels of activity intensity and NOx emissions for specific sectors, were also included. The model was built using Random Forest techniques. Model performance was relatively good given the EU-wide scale (R2 = 0.53). Output predictions of annual average concentrations of NO2 were in line with other existing models in terms of spatial distribution and values of concentration. The model was validated for the year 2015 by comparing model predictions, derived from updated values of the independent variables, with concentrations at monitoring stations for that year. The algorithm was then used to model future concentrations up to the year 2030, considering different emission scenarios as well as changes in land use, population distribution and economic factors assuming the most likely socio-economic trends. Levels of exposure were derived from the maps of concentration. The model proved to be a useful tool for the ex-ante evaluation of specific air pollution mitigation measures and, more broadly, for impact assessment of EU policies on territorial development. Copyright © 2018 The Authors. Published by Elsevier Ltd. All rights reserved.
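
    A bare-bones sketch of the model-building step as described (Random Forest regression of station NO2 on land-use, traffic, population, climate and emission predictors); the feature names are placeholders, and neither tuning nor the 2015 hold-out validation is shown.

```python
# Sketch: Random Forest land-use regression for annual NO2 (placeholder predictors).
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

def fit_lur(stations: pd.DataFrame) -> RandomForestRegressor:
    predictors = ["urban_land_fraction", "road_density", "population_density",
                  "mean_temperature", "elevation", "distance_to_sea", "sector_nox_emissions"]
    X, y = stations[predictors], stations["no2_annual_mean"]
    model = RandomForestRegressor(n_estimators=500, random_state=0)
    print("5-fold CV R^2:", cross_val_score(model, X, y, cv=5, scoring="r2").mean())
    return model.fit(X, y)   # refit on all stations before mapping at 100 m resolution
```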

  6. Revealing transient strain in geodetic data with Gaussian process regression

    NASA Astrophysics Data System (ADS)

    Hines, T. T.; Hetland, E. A.

    2018-03-01

    Transient strain derived from global navigation satellite system (GNSS) data can be used to detect and understand geophysical processes such as slow slip events and post-seismic deformation. Here we propose using Gaussian process regression (GPR) as a tool for estimating transient strain from GNSS data. GPR is a non-parametric, Bayesian method for interpolating scattered data. In our approach, we assume a stochastic prior model for transient displacements. The prior describes how much we expect transient displacements to covary spatially and temporally. A posterior estimate of transient strain is obtained by differentiating the posterior transient displacements, which are formed by conditioning the prior with the GNSS data. As a demonstration, we use GPR to detect transient strain resulting from slow slip events in the Pacific Northwest. Maximum likelihood methods are used to constrain a prior model for transient displacements in this region. The temporal covariance of our prior model is described by a compact Wendland covariance function, which significantly reduces the computational burden that can be associated with GPR. Our results reveal the spatial and temporal evolution of strain from slow slip events. We verify that the transient strain estimated with GPR is in fact geophysical signal by comparing it to the seismic tremor that is associated with Pacific Northwest slow slip events.
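
    A minimal GPR sketch for a single displacement component: condition a Gaussian-process prior on noisy observations and differentiate the posterior mean (numerically here) to obtain strain-like gradients. scikit-learn's RBF kernel stands in for the compact Wendland covariance described in the abstract, and all scales are invented.

```python
# Sketch: GPR smoothing of one GNSS displacement component, then numerical differentiation.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(4)
t = np.linspace(0.0, 3.0, 120)[:, None]                      # time in years
disp = 5.0 * np.tanh((t.ravel() - 1.5) * 4.0) + rng.normal(0.0, 0.5, t.shape[0])  # mm: transient + noise

kernel = 4.0 ** 2 * RBF(length_scale=0.3) + WhiteKernel(noise_level=0.25)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(t, disp)

t_fine = np.linspace(0.0, 3.0, 600)[:, None]
mean, std = gpr.predict(t_fine, return_std=True)             # posterior transient displacement
rate = np.gradient(mean, t_fine.ravel())                     # mm/yr; spatial gradients would give strain
```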

  7. A closer look at cross-validation for assessing the accuracy of gene regulatory networks and models.

    PubMed

    Tabe-Bordbar, Shayan; Emad, Amin; Zhao, Sihai Dave; Sinha, Saurabh

    2018-04-26

    Cross-validation (CV) is a technique to assess the generalizability of a model to unseen data. This technique relies on assumptions that may not be satisfied when studying genomics datasets. For example, random CV (RCV) assumes that a randomly selected set of samples, the test set, well represents unseen data. This assumption doesn't hold true where samples are obtained from different experimental conditions, and the goal is to learn regulatory relationships among the genes that generalize beyond the observed conditions. In this study, we investigated how the CV procedure affects the assessment of supervised learning methods used to learn gene regulatory networks (or in other applications). We compared the performance of a regression-based method for gene expression prediction estimated using RCV with that estimated using a clustering-based CV (CCV) procedure. Our analysis illustrates that RCV can produce over-optimistic estimates of the model's generalizability compared to CCV. Next, we defined the 'distinctness' of test set from training set and showed that this measure is predictive of performance of the regression method. Finally, we introduced a simulated annealing method to construct partitions with gradually increasing distinctness and showed that performance of different gene expression prediction methods can be better evaluated using this method.
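
    A small sketch of the comparison: standard random K-fold CV versus a clustering-based split in which whole sample clusters are held out, using k-means cluster labels as groups; the data and the ridge regressor are placeholders for the gene-expression prediction task.

```python
# Sketch: random K-fold CV vs. clustering-based CV for an expression-prediction regressor.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, GroupKFold, cross_val_score

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 50))                        # e.g., regulator expression across samples
y = X[:, :5].sum(axis=1) + rng.normal(0.0, 1.0, 300)  # target gene expression (synthetic)

model = Ridge(alpha=1.0)
rcv = cross_val_score(model, X, y, cv=KFold(5, shuffle=True, random_state=0), scoring="r2")
clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)   # proxy for conditions
ccv = cross_val_score(model, X, y, cv=GroupKFold(5), groups=clusters, scoring="r2")
print(f"random CV R^2: {rcv.mean():.2f}   clustered CV R^2: {ccv.mean():.2f}")
```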

  8. Estimating mono- and bi-phasic regression parameters using a mixture piecewise linear Bayesian hierarchical model

    PubMed Central

    Zhao, Rui; Catalano, Paul; DeGruttola, Victor G.; Michor, Franziska

    2017-01-01

    The dynamics of tumor burden, secreted proteins or other biomarkers over time, is often used to evaluate the effectiveness of therapy and to predict outcomes for patients. Many methods have been proposed to investigate longitudinal trends to better characterize patients and to understand disease progression. However, most approaches assume a homogeneous patient population and a uniform response trajectory over time and across patients. Here, we present a mixture piecewise linear Bayesian hierarchical model, which takes into account both population heterogeneity and nonlinear relationships between biomarkers and time. Simulation results show that our method was able to classify subjects according to their patterns of treatment response with greater than 80% accuracy in the three scenarios tested. We then applied our model to a large randomized controlled phase III clinical trial of multiple myeloma patients. Analysis results suggest that the longitudinal tumor burden trajectories in multiple myeloma patients are heterogeneous and nonlinear, even among patients assigned to the same treatment cohort. In addition, between cohorts, there are distinct differences in terms of the regression parameters and the distributions among categories in the mixture. Those results imply that longitudinal data from clinical trials may harbor unobserved subgroups and nonlinear relationships; accounting for both may be important for analyzing longitudinal data. PMID:28723910

  9. Regression to the Mean Mimicking Changes in Sexual Arousal to Child Stimuli in Pedophiles.

    PubMed

    Mokros, Andreas; Habermeyer, Elmar

    2016-10-01

    The sexual preference for prepubertal children (pedophilia) is generally assumed to be a lifelong condition. Müller et al. (2014) challenged the notion that pedophilia was stable. Using data from phallometric testing, they found that almost half of 40 adult pedophilic men did not show a corresponding arousal pattern at retest. Critics pointed out that regression to the mean and measurement error might account for these results. Müller et al. contested these explanations. The present study shows that regression to the mean in combination with low reliability does indeed provide an exhaustive explanation for the results. Using a statistical model and an estimate of the retest correlation derived from the data, the relative frequency of cases with an allegedly non-pedophilic arousal pattern was shown to be consistent with chance expectation. A bootstrap simulation showed that this outcome was to be expected under a wide range of retest correlations. A re-analysis of the original data from the study by Müller et al. corroborated the assumption of considerable measurement error. Therefore, the original data do not challenge the view that pedophilic sexual preference is stable.
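
    The regression-to-the-mean argument can be checked with a short simulation: draw test-retest scores from a bivariate normal with a modest retest correlation and count how many subjects above a cut-off at test fall below it at retest. The cut-off and correlation are illustrative, not estimates from the original data.

```python
# Sketch: regression to the mean under a low test-retest correlation.
import numpy as np

rng = np.random.default_rng(6)
r = 0.4                                            # assumed retest correlation
test, retest = rng.multivariate_normal([0.0, 0.0], [[1.0, r], [r, 1.0]], size=100000).T

cutoff = 1.0                                       # illustrative "deviant response" threshold at test
selected = test > cutoff
flipped = np.mean(retest[selected] <= cutoff)      # appear "non-deviant" at retest
print(f"{flipped:.0%} of selected cases cross back below the cut-off at retest by chance alone")
```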

  10. Predicting the long tail of book sales: Unearthing the power-law exponent

    NASA Astrophysics Data System (ADS)

    Fenner, Trevor; Levene, Mark; Loizou, George

    2010-06-01

    The concept of the long tail has recently been used to explain the phenomenon in e-commerce where the total volume of sales of the items in the tail is comparable to that of the most popular items. In the case of online book sales, the proportion of tail sales has been estimated using regression techniques on the assumption that the data obeys a power-law distribution. Here we propose a different technique for estimation based on a generative model of book sales that results in an asymptotic power-law distribution of sales, but which does not suffer from the problems related to power-law regression techniques. We show that the proportion of tail sales predicted is very sensitive to the estimated power-law exponent. In particular, if we assume that the power-law exponent of the cumulative distribution is closer to 1.1 rather than to 1.2 (estimates published in 2003, calculated using regression by two groups of researchers), then our computations suggest that the tail sales of Amazon.com, rather than being 40% as estimated by Brynjolfsson, Hu and Smith in 2003, are actually closer to 20%, the proportion estimated by its CEO.

  11. Numerical simulation of ground-water flow through glacial deposits and crystalline bedrock in the Mirror Lake area, Grafton County, New Hampshire

    USGS Publications Warehouse

    Tiedeman, Claire; Goode, Daniel J.; Hsieh, Paul A.

    1997-01-01

    This report documents the development of a computer model to simulate steady-state (long-term average) flow of ground water in the vicinity of Mirror Lake, which lies at the eastern end of the Hubbard Brook valley in central New Hampshire. The 10-km2 study area includes Mirror Lake, the three streams that flow into Mirror Lake, Leeman's Brook, Paradise Brook, and parts of Hubbard Brook and the Pemigewasset River. The topography of the area is characterized by steep hillsides and relatively flat valleys. Major hydrogeologic units include glacial deposits, composed of till containing pockets of sand and gravel, and fractured crystalline bedrock, composed of schist intruded by granite, pegmatite, and lamprophyre. Ground water occurs in both the glacial deposits and bedrock. Precipitation and snowmelt infiltrate to the water table on the hillsides, flow downslope through the saturated glacial deposits and fractured bedrock, and discharge to streams and to Mirror Lake. The model domain includes the glacial deposits, the uppermost 150m of bedrock, Mirror Lake, the layer of organic sediments on the lake bottom, and streams and rivers within the study area. A streamflow routing package was included in the model to simulate baseflow in streams and interaction between streams and ground water. Recharge from precipitation is assumed to be areally uniform, and riparian evapotranspiration along stream banks is assumed negligible. The spatial distribution of hydraulic conductivity is represented by dividing the model domain into several zones, each having uniform hydraulic properties. Local variations in recharge and hydraulic conductivities are ignored; therefore, the simulation results characterize the general ground-water system, not local details of ground-water movement. The model was calibrated using a nonlinear regression method to match hydraulic heads measured in piezometers and wells, and baseflow in three inlet streams to Mirror Lake. Model calibration indicates that recharge from precipitation to the water table is 26 to 28 cm/year. Hydraulic conductivities are 1.7 x 10-6 to 2.7 x 10-6 m/s for glacial deposits, about 3 x 10-7 m/s for bedrock beneath lower hillsides and valleys, and about 6x10-8 m/s for bedrock beneath upper hillsides and hilltops. Analysis of parameter uncertainty indicates that the above values are well constrained, at least within the context of regression analysis. In the regression, several attributes of the ground-water flow model are assumed perfectly known. The hydraulic conductivity for bedrock beneath upper hillsides and hilltops was determined from few data, and additional data are needed to further confirm this result. Model fit was not improved by introducing a 10-to-1 ration of horizontal-to-vertical anisotropy in the hydraulic conductivity of the glacial deposits, or by varying hydraulic conductivity with depth in the modeled part (uppermost 150m) of the bedrock. The calibrated model was used to delineate the Mirror Lake ground-water basin, defined as the volumes of subsurface through which ground water flows from the water table to Mirror Lake or its inlet streams. Results indicate that Mirror Lake and its inlet streams drain an area of ground-water recharge that is about 1.5 times the area of the surface-water basin. The ground-water basin extends far up the hillside on the northwestern part of the study area. Ground water from this area flows at depth under Norris Brook to discharge into Mirror Lake or its inlet streams. 
As a result, the Mirror Lake ground-water basin extends beneath the adjacent ground-water basin that drains into Norris Brook. Model simulation indicates that approximately 300,000 m3/year of precipitation recharges the Mirror Lake ground-water basin. About half the recharge enters the basin in areas where the simulated water table lies in glacial deposits; the other half enters the basin in areas where the simulated water table lies in bedrock.

  12. A Development of Nonstationary Regional Frequency Analysis Model with Large-scale Climate Information: Its Application to Korean Watershed

    NASA Astrophysics Data System (ADS)

    Kim, Jin-Young; Kwon, Hyun-Han; Kim, Hung-Soo

    2015-04-01

    The existing regional frequency analysis has the disadvantage that it is difficult to consider geographical characteristics in estimating areal rainfall. In this regard, this study aims to develop a hierarchical Bayesian model-based nonstationary regional frequency analysis in which spatial patterns of the design rainfall are explicitly linked to geographical information (e.g., latitude, longitude and altitude). This study assumes that the parameters of the Gumbel (or GEV) distribution are a function of geographical characteristics within a general linear regression framework. The posterior distribution of the regression parameters is estimated by a Bayesian Markov chain Monte Carlo (MCMC) method, and the identified functional relationship is used to spatially interpolate the parameters of the distributions by using digital elevation models (DEM) as inputs. The proposed model is applied to derive design rainfalls over the entire Han-river watershed. It was found that the proposed Bayesian regional frequency analysis model showed similar results to L-moment based regional frequency analysis. In addition, the model showed an advantage in terms of quantifying uncertainty of the design rainfall and estimating the areal rainfall considering geographical information. Finally, a comprehensive discussion on design rainfall in the context of nonstationarity will be presented. KEYWORDS: Regional frequency analysis, Nonstationary, Spatial information, Bayesian. Acknowledgement: This research was supported by a grant (14AWMP-B082564-01) from the Advanced Water Management Research Program funded by the Ministry of Land, Infrastructure and Transport of the Korean government.

  13. The As-Cu-Ni System: A Chemical Thermodynamic Model for Ancient Recycling

    NASA Astrophysics Data System (ADS)

    Sabatini, Benjamin J.

    2015-12-01

    This article is the first thermodynamically reasoned ancient metal system assessment intended for use by archaeologists and archaeometallurgists to aid in the interpretation of remelted/recycled copper alloys composed of arsenic and copper, and arsenic, copper, and nickel. These models are meant to fulfill two main purposes: first, to be applied toward the identification of progressive and regressive temporal changes in artifact chemistry that would have occurred due to recycling, and second, to provide thermodynamic insight into why such metal combinations existed in antiquity. Built on well-established thermodynamics, these models were created using a combination of custom-written software and published binary thermodynamic systems data adjusted to within the boundary conditions of 1200°C and 1 atm. Using these parameters, the behavior of each element and their likelihood of loss in the binaries As-Cu, As-Ni, Cu-Ni, and ternary As-Cu-Ni, systems, under assumed ancient furnace conditions, was determined.

  14. Periodic limb movements of sleep: empirical and theoretical evidence supporting objective at-home monitoring

    PubMed Central

    Moro, Marilyn; Goparaju, Balaji; Castillo, Jelina; Alameddine, Yvonne; Bianchi, Matt T

    2016-01-01

    Introduction Periodic limb movements of sleep (PLMS) may increase cardiovascular and cerebrovascular morbidity. However, most people with PLMS are either asymptomatic or have nonspecific symptoms. Therefore, predicting elevated PLMS in the absence of restless legs syndrome remains an important clinical challenge. Methods We undertook a retrospective analysis of demographic data, subjective symptoms, and objective polysomnography (PSG) findings in a clinical cohort with or without obstructive sleep apnea (OSA) from our laboratory (n=443 with OSA, n=209 without OSA). Correlation analysis and regression modeling were performed to determine predictors of periodic limb movement index (PLMI). Markov decision analysis with TreeAge software compared strategies to detect PLMS: in-laboratory PSG, at-home testing, and a clinical prediction tool based on the regression analysis. Results Elevated PLMI values (>15 per hour) were observed in >25% of patients. PLMI values in No-OSA patients correlated with age, sex, self-reported nocturnal leg jerks, restless legs syndrome symptoms, and hypertension. In OSA patients, PLMI correlated only with age and self-reported psychiatric medications. Regression models indicated only a modest predictive value of demographics, symptoms, and clinical history. Decision modeling suggests that at-home testing is favored as the pretest probability of PLMS increases, given plausible assumptions regarding PLMS morbidity, costs, and assumed benefits of pharmacological therapy. Conclusion Although elevated PLMI values were commonly observed, routinely acquired clinical information had only weak predictive utility. As the clinical importance of elevated PLMI continues to evolve, it is likely that objective measures such as PSG or at-home PLMS monitors will prove increasingly important for clinical and research endeavors. PMID:27540316

  15. A weighted least squares estimation of the polynomial regression model on paddy production in the area of Kedah and Perlis

    NASA Astrophysics Data System (ADS)

    Musa, Rosliza; Ali, Zalila; Baharum, Adam; Nor, Norlida Mohd

    2017-08-01

    The linear regression model assumes that all random error components are identically and independently distributed with constant variance. Hence, each data point provides equally precise information about the deterministic part of the total variation. In other words, the standard deviations of the error terms are constant over all values of the predictor variables. When the assumption of constant variance is violated, the ordinary least squares estimator of the regression coefficients loses its property of minimum variance in the class of linear and unbiased estimators. Weighted least squares estimation is often used to maximize the efficiency of parameter estimation. A procedure that treats all of the data equally would give less precisely measured points more influence than they should have and would give highly precise points too little influence. Optimizing the weighted fitting criterion to find the parameter estimates allows the weights to determine the contribution of each observation to the final parameter estimates. This study used a polynomial model with weighted least squares estimation to investigate paddy production of different paddy lots based on paddy cultivation characteristics and environmental characteristics in the area of Kedah and Perlis. The results indicated that the factors affecting paddy production are mixture fertilizer application cycle, average temperature, the squared effect of average rainfall, the squared effect of pest and disease, the interaction between acreage and amount of mixture fertilizer, the interaction between paddy variety and NPK fertilizer application cycle, and the interaction between pest and disease and NPK fertilizer application cycle.
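
    As a concrete illustration of the estimation strategy described above, the following sketch fits a quadratic (polynomial) regression by weighted least squares with statsmodels; the weighting scheme, data, and variable names are hypothetical and not taken from the paddy dataset.

```python
# Sketch: weighted least squares for a polynomial regression under
# heteroscedastic errors (assumes statsmodels; synthetic, illustrative data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
rainfall = rng.uniform(100, 300, 200)
# heteroscedastic errors: spread grows with rainfall
yield_t = 2.0 + 0.03 * rainfall - 5e-5 * rainfall**2 + rng.normal(0, 0.002 * rainfall)

X = sm.add_constant(np.column_stack([rainfall, rainfall**2]))
ols_fit = sm.OLS(yield_t, X).fit()

# Estimate weights as the inverse of a fitted variance function of |residuals|
abs_res = np.abs(ols_fit.resid)
var_fit = sm.OLS(abs_res, sm.add_constant(rainfall)).fit()
weights = 1.0 / (var_fit.fittedvalues ** 2)

wls_fit = sm.WLS(yield_t, X, weights=weights).fit()
print(wls_fit.summary())
```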

  16. An Intelligent Decision System for Intraoperative Somatosensory Evoked Potential Monitoring.

    PubMed

    Fan, Bi; Li, Han-Xiong; Hu, Yong

    2016-02-01

    Somatosensory evoked potential (SEP) is a useful, noninvasive technique widely used for spinal cord monitoring during surgery. One of the main indicators of a spinal cord injury is the drop in amplitude of the SEP signal in comparison to the nominal baseline that is assumed to be constant during the surgery. However, in practice, the real-time baseline is not constant and may vary during the operation due to nonsurgical factors, such as blood pressure, anaesthesia, etc. Thus, a false warning is often generated if the nominal baseline is used for SEP monitoring. In current practice, human experts must be used to prevent this false warning. However, these well-trained human experts are expensive and may not be reliable and consistent due to various reasons like fatigue and emotion. In this paper, an intelligent decision system is proposed to improve SEP monitoring. First, the least squares support vector regression and multi-support vector regression models are trained to construct the dynamic baseline from historical data. Then a control chart is applied to detect abnormalities during surgery. The effectiveness of the intelligent decision system is evaluated by comparing its performance against the nominal baseline model by using the real experimental datasets derived from clinical conditions.
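
    The two-stage idea described above (learn a dynamic baseline from historical data, then monitor new measurements with a control chart) can be sketched as follows. Least squares SVR is not available in scikit-learn, so kernel ridge regression, which solves a closely related problem, is used here as a stand-in; the signal, names, and control limits are illustrative.

```python
# Sketch: dynamic baseline regression plus a Shewhart-style control chart
# (kernel ridge used as an LS-SVR stand-in; data and limits are illustrative).
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(2)
t_hist = np.linspace(0, 1, 300)[:, None]            # normalized intra-operative time
amp_hist = 2.0 + 0.3 * np.sin(4 * np.pi * t_hist.ravel()) + rng.normal(0, 0.05, 300)

baseline = KernelRidge(kernel="rbf", alpha=1e-2, gamma=10.0).fit(t_hist, amp_hist)
resid = amp_hist - baseline.predict(t_hist)
center, sigma = resid.mean(), resid.std(ddof=1)

def is_abnormal(t_new, amp_new, k=3.0):
    """Flag a measurement falling outside the +/- k*sigma control limits."""
    r = amp_new - baseline.predict(np.atleast_2d(t_new))[0]
    return abs(r - center) > k * sigma

print(is_abnormal(0.5, 1.2))  # True would suggest an abnormal amplitude drop
```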

  17. Covariate Measurement Error Correction Methods in Mediation Analysis with Failure Time Data

    PubMed Central

    Zhao, Shanshan

    2014-01-01

    Summary Mediation analysis is important for understanding the mechanisms whereby one variable causes changes in another. Measurement error could obscure the ability of the potential mediator to explain such changes. This paper focuses on developing correction methods for measurement error in the mediator with failure time outcomes. We consider a broad definition of measurement error, including technical error and error associated with temporal variation. The underlying model with the ‘true’ mediator is assumed to be of the Cox proportional hazards model form. The induced hazard ratio for the observed mediator no longer has a simple form independent of the baseline hazard function, due to the conditioning event. We propose a mean-variance regression calibration approach and a follow-up time regression calibration approach, to approximate the partial likelihood for the induced hazard function. Both methods demonstrate value in assessing mediation effects in simulation studies. These methods are generalized to multiple biomarkers and to both case-cohort and nested case-control sampling designs. We apply these correction methods to the Women's Health Initiative hormone therapy trials to understand the mediation effect of several serum sex hormone measures on the relationship between postmenopausal hormone therapy and breast cancer risk. PMID:25139469

  18. Covariate measurement error correction methods in mediation analysis with failure time data.

    PubMed

    Zhao, Shanshan; Prentice, Ross L

    2014-12-01

    Mediation analysis is important for understanding the mechanisms whereby one variable causes changes in another. Measurement error could obscure the ability of the potential mediator to explain such changes. This article focuses on developing correction methods for measurement error in the mediator with failure time outcomes. We consider a broad definition of measurement error, including technical error, and error associated with temporal variation. The underlying model with the "true" mediator is assumed to be of the Cox proportional hazards model form. The induced hazard ratio for the observed mediator no longer has a simple form independent of the baseline hazard function, due to the conditioning event. We propose a mean-variance regression calibration approach and a follow-up time regression calibration approach, to approximate the partial likelihood for the induced hazard function. Both methods demonstrate value in assessing mediation effects in simulation studies. These methods are generalized to multiple biomarkers and to both case-cohort and nested case-control sampling designs. We apply these correction methods to the Women's Health Initiative hormone therapy trials to understand the mediation effect of several serum sex hormone measures on the relationship between postmenopausal hormone therapy and breast cancer risk. © 2014, The International Biometric Society.

  19. Adaptation can explain evidence for encoding of probabilistic information in macaque inferior temporal cortex.

    PubMed

    Vinken, Kasper; Vogels, Rufin

    2017-11-20

    In predictive coding theory, the brain is conceptualized as a prediction machine that constantly constructs and updates expectations of the sensory environment [1]. In the context of this theory, Bell et al.[2] recently studied the effect of the probability of task-relevant stimuli on the activity of macaque inferior temporal (IT) neurons and observed a reduced population response to expected faces in face-selective neurons. They concluded that "IT neurons encode long-term, latent probabilistic information about stimulus occurrence", supporting predictive coding. They manipulated expectation by the frequency of face versus fruit stimuli in blocks of trials. With such a design, stimulus repetition is confounded with expectation. As previous studies showed that IT neurons decrease their response with repetition [3], such adaptation (or repetition suppression), instead of expectation suppression as assumed by the authors, could explain their effects. The authors attempted to control for this alternative interpretation with a multiple regression approach. Here we show by using simulation that adaptation can still masquerade as expectation effects reported in [2]. Further, the results from the regression model used for most analyses cannot be trusted, because the model is not uniquely defined. Copyright © 2017 Elsevier Ltd. All rights reserved.

  20. Optimization of Fish Protection System to Increase Technosphere Safety

    NASA Astrophysics Data System (ADS)

    Khetsuriani, E. D.; Fesenko, L. N.; Larin, D. S.

    2017-11-01

    The article is concerned with field study data. Drawing upon prior information and considering structural features of fish protection devices, we decided to conduct experimental research while changing three parameters: process pressure PCT, stream velocity Vp and washer nozzle inclination angle αc. The variability intervals of the examined factors are shown in Table 1. The conicity angle was assumed constant. The Box design B3 was chosen as a baseline, being close to D-optimal designs in its statistical characteristics. The number of device rotations and the fish fry protection efficiency were taken as the output functions of the optimization. The numerical values of the regression coefficients of the quadratic equations describing the behavior of the optimization functions Y1 and Y2, together with their errors, were calculated from the test results in accordance with the planning matrix. The adequacy of the obtained quadratic regression model is judged by checking whether Fexp ≤ Ftheor.

  1. The problem of natural funnel asymmetries: a simulation analysis of meta-analysis in macroeconomics.

    PubMed

    Callot, Laurent; Paldam, Martin

    2011-06-01

    Effect sizes in macroeconomics are estimated by regressions on data published by statistical agencies. Funnel plots are a representation of the distribution of the resulting regression coefficients. They are normally much wider than predicted by the t-ratio of the coefficients and often asymmetric. The standard method of meta-analysts in economics assumes that the asymmetries are due to publication bias causing censoring and adjusts the average accordingly. The paper shows that some funnel asymmetries may be 'natural' so that they occur without censoring. We investigate such asymmetries by simulating funnels by pairs of data generating processes (DGPs) and estimating models (EMs), in which the EM has the problem that it disregards a property of the DGP. The problems are data dependency, structural breaks, non-normal residuals, non-linearity, and omitted variables. We show that some of these problems generate funnel asymmetries. When they do, the standard method often fails. Copyright © 2011 John Wiley & Sons, Ltd.

  2. Change of plant phenophases explained by survival modeling

    NASA Astrophysics Data System (ADS)

    Templ, Barbara; Fleck, Stefan; Templ, Matthias

    2017-05-01

    It is known from many studies that plant species show a delay in the timing of flowering events with an increase in latitude and altitude, and an advance with an increase in temperature. Furthermore, in many locations and for many species, flowering dates have advanced over the long-term. New insights using survival modeling are given based on data collected (1970-2010) along a 3000-km long transect from northern to eastern central Europe. We could clearly observe that in the case of dandelion (Taraxacum officinale) the flowering risk, in other words the probability that flowering occurs, is higher for an earlier day of year in later decades. Our approach assumes that temperature has greater influence than precipitation on the timing of flowering. Evaluation of the predictive power of the tested models suggests that Cox models may be used in plant phenological research. The applied Cox model provides improved predictions of flowering dates compared to traditional regression methods and gives further insights into drivers of phenological events.
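
    A minimal sketch of the survival-modeling approach follows, assuming the lifelines library; the data frame, column names, and covariates are hypothetical stand-ins for the phenological records, not the study's actual data.

```python
# Sketch: Cox proportional hazards model for day-of-year of flowering
# (assumes lifelines; data and column names are illustrative).
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(3)
n = 500
temp = rng.normal(8, 2, n)                      # spring mean temperature
decade = rng.integers(0, 4, n)                  # 0 = 1970s ... 3 = 2000s
doy = np.clip(120 - 2 * temp - 1.5 * decade + rng.normal(0, 5, n), 60, 180)

df = pd.DataFrame({"doy": doy, "flowered": 1, "temp": temp, "decade": decade})
cph = CoxPHFitter()
cph.fit(df, duration_col="doy", event_col="flowered")
cph.print_summary()   # positive coefficients imply earlier flowering (higher "risk")
```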

  3. Bayesian Analysis of Non-Gaussian Long-Range Dependent Processes

    NASA Astrophysics Data System (ADS)

    Graves, T.; Franzke, C.; Gramacy, R. B.; Watkins, N. W.

    2012-12-01

    Recent studies have strongly suggested that surface temperatures exhibit long-range dependence (LRD). The presence of LRD would hamper the identification of deterministic trends and the quantification of their significance. It is well established that LRD processes exhibit stochastic trends over rather long periods of time. Thus, accurate methods for discriminating between physical processes that possess long memory and those that do not are an important adjunct to climate modeling. We have used Markov Chain Monte Carlo algorithms to perform a Bayesian analysis of Auto-Regressive Fractionally-Integrated Moving-Average (ARFIMA) processes, which are capable of modeling LRD. Our principal aim is to obtain inference about the long memory parameter, d, with secondary interest in the scale and location parameters. We have developed a reversible-jump method enabling us to integrate over different model forms for the short memory component. We initially assume Gaussianity, and have tested the method on both synthetic and physical time series such as the Central England Temperature. Many physical processes, for example the Faraday time series from Antarctica, are highly non-Gaussian. We have therefore extended this work by weakening the Gaussianity assumption. Specifically, we assume a symmetric α-stable distribution for the innovations. Such processes provide good, flexible, initial models for non-Gaussian processes with long memory. We will present a study of the dependence of the posterior variance σ_d of the memory parameter d on the length of the time series considered. This will be compared with equivalent error diagnostics for other measures of d.

  4. Woody plants and the prediction of climate-change impacts on bird diversity.

    PubMed

    Kissling, W D; Field, R; Korntheuer, H; Heyder, U; Böhning-Gaese, K

    2010-07-12

    Current methods of assessing climate-induced shifts of species distributions rarely account for species interactions and usually ignore potential differences in response times of interacting taxa to climate change. Here, we used species-richness data from 1005 breeding bird and 1417 woody plant species in Kenya and employed model-averaged coefficients from regression models and median climatic forecasts assembled across 15 climate-change scenarios to predict bird species richness under climate change. Forecasts assuming an instantaneous response of woody plants and birds to climate change suggested increases in future bird species richness across most of Kenya whereas forecasts assuming strongly lagged woody plant responses to climate change indicated a reversed trend, i.e. reduced bird species richness. Uncertainties in predictions of future bird species richness were geographically structured, mainly owing to uncertainties in projected precipitation changes. We conclude that assessments of future species responses to climate change are very sensitive to current uncertainties in regional climate-change projections, and to the inclusion or not of time-lagged interacting taxa. We expect even stronger effects for more specialized plant-animal associations. Given the slow response time of woody plant distributions to climate change, current estimates of future biodiversity of many animal taxa may be both biased and too optimistic.

  5. On the potential of models for location and scale for genome-wide DNA methylation data

    PubMed Central

    2014-01-01

    Background With the help of epigenome-wide association studies (EWAS), increasing knowledge on the role of epigenetic mechanisms such as DNA methylation in disease processes is obtained. In addition, EWAS aid the understanding of behavioral and environmental effects on DNA methylation. In terms of statistical analysis, specific challenges arise from the characteristics of methylation data. First, methylation β-values represent proportions with skewed and heteroscedastic distributions. Thus, traditional modeling strategies assuming a normally distributed response might not be appropriate. Second, recent evidence suggests that not only mean differences but also variability in site-specific DNA methylation associates with diseases, including cancer. The purpose of this study was to compare different modeling strategies for methylation data in terms of model performance and performance of downstream hypothesis tests. Specifically, we used the generalized additive models for location, scale and shape (GAMLSS) framework to compare beta regression with Gaussian regression on raw, binary logit and arcsine square root transformed methylation data, with and without modeling a covariate effect on the scale parameter. Results Using simulated and real data from a large population-based study and an independent sample of cancer patients and healthy controls, we show that beta regression does not outperform competing strategies in terms of model performance. In addition, Gaussian models for location and scale showed an improved performance as compared to models for location only. The best performance was observed for the Gaussian model on binary logit transformed β-values, referred to as M-values. Our results further suggest that models for location and scale are specifically sensitive towards violations of the distribution assumption and towards outliers in the methylation data. Therefore, a resampling procedure is proposed as a mode of inference and shown to diminish type I error rate in practically relevant settings. We apply the proposed method in an EWAS of BMI and age and reveal strong associations of age with methylation variability that are validated in an independent sample. Conclusions Models for location and scale are promising tools for EWAS that may help to understand the influence of environmental factors and disease-related phenotypes on methylation variability and its role during disease development. PMID:24994026
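
    The key transformation and the location-only Gaussian fit referred to above can be sketched as follows; a full GAMLSS-style location-and-scale fit would add an explicit model for the variance, which is only indicated schematically here. The data and covariate are synthetic and the names illustrative.

```python
# Sketch: binary logit (M-value) transform of methylation beta-values and a
# Gaussian regression of M on a covariate (synthetic data, illustrative names).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 400
age = rng.uniform(20, 80, n)
beta_vals = np.clip(rng.beta(5 + 0.05 * age, 5), 1e-3, 1 - 1e-3)

m_vals = np.log2(beta_vals / (1.0 - beta_vals))   # M-value transform
X = sm.add_constant(age)
loc_fit = sm.OLS(m_vals, X).fit()                 # model for the location (mean)

# A crude check of a covariate effect on the scale: regress log|residuals| on age.
scale_fit = sm.OLS(np.log(np.abs(loc_fit.resid) + 1e-8), X).fit()
print(loc_fit.params, scale_fit.params)
```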

  6. The role of Atlantic Multi-decadal Oscillation in the global mean temperature variability

    DOE PAGES

    Chylek, Petr; Klett, James D.; Dubey, Manvendra K.; ...

    2016-11-01

    The simulated global mean 1900–2015 warming by 42 Coupled Model Intercomparison Project, phase 5 (CMIP5) climate models varies between 0.58 and 1.70 °C. The observed warming according to the NASA GISS temperature analysis is 0.95 °C with a 1200 km smoothing radius, or 0.86 °C with a 250 km smoothing radius. The projection of the future 2015–2100 global warming under a moderate increase of anthropogenic radiative forcing (RCP4.5 scenario) by individual models is between 0.7 and 2.3 °C. The CMIP5 climate models agree that the future climate will be warmer; however, there is little consensus as to how large the warming will be (reflected by an uncertainty of over a factor of three). Moreover, a parsimonious statistical regression model with just three explanatory variables [anthropogenic radiative forcing due to greenhouse gases and aerosols (GHGA), solar variability, and the Atlantic Multi-decadal Oscillation (AMO) index] accounts for over 95 % of the observed 1900–2015 temperature variance. This statistical regression model reproduces very accurately the past warming (0.96 °C compared to the observed 0.95 °C) and projects the future 2015–2100 warming to be around 0.95 °C (with the IPCC 2013 suggested RCP4.5 radiative forcing and an assumed cyclic AMO behavior). The AMO contribution to the 1970–2005 warming was between 0.13 and 0.20 °C (depending on which AMO index is used) compared to the GHGA contribution of 0.49–0.58 °C. During the twenty-first century AMO cycle the AMO contribution is projected to remain the same (0.13–0.20 °C), while the GHGA contribution is expected to decrease to 0.21–0.25 °C due to the levelling off of the GHGA radiative forcing that is assumed according to the RCP4.5 scenario. Therefore, the anthropogenic contribution and natural variability are expected to contribute about equally to the anticipated global warming during the second half of the twenty-first century for the RCP4.5 trajectory.

  7. The role of Atlantic Multi-decadal Oscillation in the global mean temperature variability

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chylek, Petr; Klett, James D.; Dubey, Manvendra K.

    The simulated global mean 1900–2015 warming by 42 Coupled Model Intercomparison Project, phase 5 (CMIP5) climate models varies between 0.58 and 1.70 °C. The observed warming according to the NASA GISS temperature analysis is 0.95 °C with a 1200 km smoothing radius, or 0.86 °C with a 250 km smoothing radius. The projection of the future 2015–2100 global warming under a moderate increase of anthropogenic radiative forcing (RCP4.5 scenario) by individual models is between 0.7 and 2.3 °C. The CMIP5 climate models agree that the future climate will be warmer; however, there is little consensus as to how large the warming will be (reflected by an uncertainty of over a factor of three). Moreover, a parsimonious statistical regression model with just three explanatory variables [anthropogenic radiative forcing due to greenhouse gases and aerosols (GHGA), solar variability, and the Atlantic Multi-decadal Oscillation (AMO) index] accounts for over 95 % of the observed 1900–2015 temperature variance. This statistical regression model reproduces very accurately the past warming (0.96 °C compared to the observed 0.95 °C) and projects the future 2015–2100 warming to be around 0.95 °C (with the IPCC 2013 suggested RCP4.5 radiative forcing and an assumed cyclic AMO behavior). The AMO contribution to the 1970–2005 warming was between 0.13 and 0.20 °C (depending on which AMO index is used) compared to the GHGA contribution of 0.49–0.58 °C. During the twenty-first century AMO cycle the AMO contribution is projected to remain the same (0.13–0.20 °C), while the GHGA contribution is expected to decrease to 0.21–0.25 °C due to the levelling off of the GHGA radiative forcing that is assumed according to the RCP4.5 scenario. Therefore, the anthropogenic contribution and natural variability are expected to contribute about equally to the anticipated global warming during the second half of the twenty-first century for the RCP4.5 trajectory.

  8. A biological quality index for volcanic Andisols and Aridisols (Canary Islands, Spain): variations related to the ecosystem degradation.

    PubMed

    Armas, Cecilia María; Santana, Bayanor; Mora, Juan Luis; Notario, Jesús Santiago; Arbelo, Carmen Dolores; Rodríguez-Rodríguez, Antonio

    2007-05-25

    The aim of this work is to identify indicators of biological activity in soils from the Canary Islands, by studying the variation of selected biological parameters related to the processes of deforestation and accelerated soil degradation affecting the Canarian natural ecosystems. Ten plots with different degrees of maturity/degradation have been selected in three typical habitats in the Canary Islands: laurel forest, pine forest and xerophytic scrub, with Andisols and Aridisols as the most common soils. The studied characteristics in each case include total organic carbon, field soil respiration, mineralized carbon after laboratory incubation, microbial biomass carbon, hot water-extractable carbon and carboxymethylcellulase, beta-d-glucosidase and dehydrogenase activities. A Biological Quality Index (BQI) has been designed on the basis of a regression model using these variables, assuming that the total soil organic carbon content is quite stable in nearly mature ecosystems. Total carbon in mature ecosystems has been related to significant biological variables (hot water-extractable carbon, soil respiration and carboxymethylcellulase, beta-d-glucosidase and dehydrogenase activities), accounting for nearly 100% of the total variance by a multiple regression analysis. The index has been calculated as the ratio of the value calculated from the regression model and the actual measured value. The obtained results show that soils in nearly mature ecosystems have BQI values close to unity, whereas those in degraded ecosystems range between 0.24 and 0.97, depending on the degree of degradation.

  9. On the relation between the peak frequency and the corresponding rise time of solar microwave impulsive bursts and the height dependence of magnetic fields

    NASA Astrophysics Data System (ADS)

    Ren-Yang, Zhao; Magun, Andreas; Schanda, Erwin

    1990-12-01

    In the present paper we report the results of a correlation analysis for 57 microwave impulsive bursts observed at six frequencies, in which we have obtained a regression line between the peak frequency and the corresponding rise time of microwave impulsive bursts: {ie361-01} (with a correlation coefficient of −0.43). This can be explained in the frame of a thermal model. The magnetic field decrease with height has to be much slower than in a dipole field in order to explain the weak dependence of f_p on t_r. This decrease of magnetic field with height in burst sources is based on the relationship between f_p and t_r found by assuming a thermal flare model with a collisionless conduction front.

  10. Dielectric Cytometry with Three-Dimensional Cellular Modeling

    PubMed Central

    Katsumoto, Yoichi; Hayashi, Yoshihito; Oshige, Ikuya; Omori, Shinji; Kishii, Noriyuki; Yasuda, Akio; Asami, Koji

    2008-01-01

    We have developed what we believe is an efficient method to determine the electric parameters (the specific membrane capacitance Cm and the cytoplasm conductivity κi) of cells from their dielectric dispersion. First, a limited number of dispersion curves are numerically calculated for a three-dimensional cell model by changing Cm and κi, and their amplitudes Δɛ and relaxation times τ are determined by assuming a Cole-Cole function. Second, regression formulas are obtained from the values of Δɛ and τ and then used for the determination of Cm and κi from the experimental Δɛ and τ. This method was applied to the dielectric dispersion measured for rabbit erythrocytes (discocytes and echinocytes) and human erythrocytes (normocytes), and provided reasonable Cm and κi of the erythrocytes and excellent agreement between the theoretical and experimental dispersion curves. PMID:18567636

  11. Dielectric cytometry with three-dimensional cellular modeling.

    PubMed

    Katsumoto, Yoichi; Hayashi, Yoshihito; Oshige, Ikuya; Omori, Shinji; Kishii, Noriyuki; Yasuda, Akio; Asami, Koji

    2008-09-15

    We have developed what we believe is an efficient method to determine the electric parameters (the specific membrane capacitance Cm and the cytoplasm conductivity κi) of cells from their dielectric dispersion. First, a limited number of dispersion curves are numerically calculated for a three-dimensional cell model by changing Cm and κi, and their amplitudes Δɛ and relaxation times τ are determined by assuming a Cole-Cole function. Second, regression formulas are obtained from the values of Δɛ and τ and then used for the determination of Cm and κi from the experimental Δɛ and τ. This method was applied to the dielectric dispersion measured for rabbit erythrocytes (discocytes and echinocytes) and human erythrocytes (normocytes), and provided reasonable Cm and κi of the erythrocytes and excellent agreement between the theoretical and experimental dispersion curves.

  12. Sensitivity to gaze-contingent contrast increments in naturalistic movies: An exploratory report and model comparison

    PubMed Central

    Wallis, Thomas S. A.; Dorr, Michael; Bex, Peter J.

    2015-01-01

    Sensitivity to luminance contrast is a prerequisite for all but the simplest visual systems. To examine contrast increment detection performance in a way that approximates the natural environmental input of the human visual system, we presented contrast increments gaze-contingently within naturalistic video freely viewed by observers. A band-limited contrast increment was applied to a local region of the video relative to the observer's current gaze point, and the observer made a forced-choice response to the location of the target (≈25,000 trials across five observers). We present exploratory analyses showing that performance improved as a function of the magnitude of the increment and depended on the direction of eye movements relative to the target location, the timing of eye movements relative to target presentation, and the spatiotemporal image structure at the target location. Contrast discrimination performance can be modeled by assuming that the underlying contrast response is an accelerating nonlinearity (arising from a nonlinear transducer or gain control). We implemented one such model and examined the posterior over model parameters, estimated using Markov-chain Monte Carlo methods. The parameters were poorly constrained by our data; parameters constrained using strong priors taken from previous research showed poor cross-validated prediction performance. Atheoretical logistic regression models were better constrained and provided similar prediction performance to the nonlinear transducer model. Finally, we explored the properties of an extended logistic regression that incorporates both eye movement and image content features. Models of contrast transduction may be better constrained by incorporating data from both artificial and natural contrast perception settings. PMID:26057546

  13. Estimating disease prevalence from two-phase surveys with non-response at the second phase

    PubMed Central

    Gao, Sujuan; Hui, Siu L.; Hall, Kathleen S.; Hendrie, Hugh C.

    2010-01-01

    SUMMARY In this paper we compare several methods for estimating population disease prevalence from data collected by two-phase sampling when there is non-response at the second phase. The traditional weighting type estimator requires the missing completely at random assumption and may yield biased estimates if the assumption does not hold. We review two approaches and propose one new approach to adjust for non-response assuming that the non-response depends on a set of covariates collected at the first phase: an adjusted weighting type estimator using estimated response probability from a response model; a modelling type estimator using predicted disease probability from a disease model; and a regression type estimator combining the adjusted weighting type estimator and the modelling type estimator. These estimators are illustrated using data from an Alzheimer’s disease study in two populations. Simulation results are presented to investigate the performances of the proposed estimators under various situations. PMID:10931514

  14. Systematic review of statistical approaches to quantify, or correct for, measurement error in a continuous exposure in nutritional epidemiology.

    PubMed

    Bennett, Derrick A; Landry, Denise; Little, Julian; Minelli, Cosetta

    2017-09-19

    Several statistical approaches have been proposed to assess and correct for exposure measurement error. We aimed to provide a critical overview of the most common approaches used in nutritional epidemiology. MEDLINE, EMBASE, BIOSIS and CINAHL were searched for reports published in English up to May 2016 in order to ascertain studies that described methods aimed to quantify and/or correct for measurement error for a continuous exposure in nutritional epidemiology using a calibration study. We identified 126 studies, 43 of which described statistical methods and 83 that applied any of these methods to a real dataset. The statistical approaches in the eligible studies were grouped into: a) approaches to quantify the relationship between different dietary assessment instruments and "true intake", which were mostly based on correlation analysis and the method of triads; b) approaches to adjust point and interval estimates of diet-disease associations for measurement error, mostly based on regression calibration analysis and its extensions. Two approaches (multiple imputation and moment reconstruction) were identified that can deal with differential measurement error. For regression calibration, the most common approach to correct for measurement error used in nutritional epidemiology, it is crucial to ensure that its assumptions and requirements are fully met. Analyses that investigate the impact of departures from the classical measurement error model on regression calibration estimates can be helpful to researchers in interpreting their findings. With regard to the possible use of alternative methods when regression calibration is not appropriate, the choice of method should depend on the measurement error model assumed, the availability of suitable calibration study data and the potential for bias due to violation of the classical measurement error model assumptions. On the basis of this review, we provide some practical advice for the use of methods to assess and adjust for measurement error in nutritional epidemiology.

  15. The Development of a Stochastic Model of the Atmosphere Between 30 and 90 Km to Be Used in Determining the Effect of Atmospheric Variability on Space Shuttle Entry Parameters. Ph.D. Thesis - Virginia Polytechnic Inst. and State Univ.

    NASA Technical Reports Server (NTRS)

    Campbell, J. W.

    1973-01-01

    A stochastic model of the atmosphere between 30 and 90 km was developed for use in Monte Carlo space shuttle entry studies. The model is actually a family of models, one for each latitude-season category as defined in the 1966 U.S. Standard Atmosphere Supplements. Each latitude-season model generates a pseudo-random temperature profile whose mean is the appropriate temperature profile from the Standard Atmosphere Supplements. The standard deviation of temperature at each altitude for a given latitude-season model was estimated from sounding-rocket data. Departures from the mean temperature at each altitude were produced by assuming a linear regression of temperature on the solar heating rate of ozone. A profile of random ozone concentrations was first generated using an auxiliary stochastic ozone model, also developed as part of this study, and then solar heating rates were computed for the random ozone concentrations.

  16. Impact of increasing heat waves on U.S. ozone episodes in the 2050s: Results from a multimodel analysis using extreme value theory

    NASA Astrophysics Data System (ADS)

    Shen, L.; Mickley, L. J.; Gilleland, E.

    2016-04-01

    We develop a statistical model using extreme value theory to estimate the 2000-2050 changes in ozone episodes across the United States. We model the relationships between daily maximum temperature (Tmax) and maximum daily 8 h average (MDA8) ozone in May-September over 2003-2012 using a Point Process (PP) model. At ~20% of the sites, a marked decrease in the ozone-temperature slope occurs at high temperatures, defined as ozone suppression. The PP model sometimes fails to capture ozone-Tmax relationships, so we refit the ozone-Tmax slope using logistic regression and a generalized Pareto distribution model. We then apply the resulting hybrid-extreme value theory model to projections of Tmax from an ensemble of downscaled climate models. Assuming constant anthropogenic emissions at the present level, we find an average increase of 2.3 d a⁻¹ in ozone episodes (>75 ppbv) across the United States by the 2050s, with a change of +3 to 9 d a⁻¹ at many sites.
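
    The hybrid construction described above (a logistic model for whether MDA8 ozone exceeds a threshold, combined with a generalized Pareto model for the size of the exceedances) can be sketched as follows; the threshold, data, and coefficients are illustrative only.

```python
# Sketch: logistic regression for exceedance occurrence and a generalized
# Pareto fit to exceedance sizes (synthetic data, illustrative threshold).
import numpy as np
import statsmodels.api as sm
from scipy.stats import genpareto

rng = np.random.default_rng(5)
tmax = rng.uniform(15, 40, 2000)                       # daily max temperature (C)
ozone = 40 + 1.2 * tmax + rng.gumbel(0, 6, 2000)       # MDA8 ozone (ppbv), synthetic

threshold = 75.0
exceed = (ozone > threshold).astype(int)

# Occurrence model: P(ozone > 75 ppbv | Tmax)
occ_fit = sm.Logit(exceed, sm.add_constant(tmax)).fit(disp=0)

# Size model: generalized Pareto distribution for the excesses over the threshold
excess = ozone[ozone > threshold] - threshold
shape, loc, scale = genpareto.fit(excess, floc=0.0)
print(occ_fit.params, shape, scale)
```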

  17. Validation of alternative indicators of social support in perinatal outcomes research using quality of the partner relationship.

    PubMed

    Kruse, Julie A; Low, Lisa Kane; Seng, Julia S

    2013-07-01

    To test alternatives to the current research and clinical practice of assuming that married or partnered status is a proxy for positive social support. Having a partner is assumed to relate to better health status via the intermediary process of social support. However, women's health research indicates that having a partner is not always associated with positive social support. An exploratory post hoc analysis focused on posttraumatic stress and childbearing was conducted using a large perinatal database from 2005-2009. To operationalize partner relationship, four variables were analysed: partner ('yes' or 'no'), intimate partner violence ('yes' or 'no'), the combination of those two factors, and the woman's appraisal of the quality of her partner relationship via a single item. Construct validity of these four alternative variables was assessed in relation to appraisal of the partner's social support in labour and the postpartum using linear regression standardized betas and adjusted R-squares. Predictive validity was assessed using unadjusted and adjusted linear regression modelling. Four groups were compared. Married, abused women differed most from married, not abused women in relation to the social support, and depression outcomes used for validity checks. The variable representing the women's appraisals of their partner relationships accounts for the most variance in predicting depression scores. Our results support the validity of operationalizing the impact of the partner relationship on outcomes using a combination of partnered status and abuse status or using a subjective rating of quality of the partner relationship. © 2012 Blackwell Publishing Ltd.

  18. Analyzing ROC curves using the effective set-size model

    NASA Astrophysics Data System (ADS)

    Samuelson, Frank W.; Abbey, Craig K.; He, Xin

    2018-03-01

    The Effective Set-Size model has been used to describe uncertainty in various signal detection experiments. The model regards images as if they were an effective number (M*) of searchable locations, where the observer treats each location as a location-known-exactly detection task with signals having average detectability d'. The model assumes a rational observer behaves as if he searches an effective number of independent locations and follows signal detection theory at each location. Thus the location-known-exactly detectability (d') and the effective number of independent locations M* fully characterize search performance. In this model the image rating in a single-response task is assumed to be the maximum response that the observer would assign to these many locations. The model has been used by a number of other researchers, and is well corroborated. We examine this model as a way of differentiating imaging tasks that radiologists perform. Tasks involving more searching or location uncertainty may have higher estimated M* values. In this work we applied the Effective Set-Size model to a number of medical imaging data sets. The data sets include radiologists reading screening and diagnostic mammography with and without computer-aided diagnosis (CAD), and breast tomosynthesis. We developed an algorithm to fit the model parameters using two-sample maximum-likelihood ordinal regression, similar to the classic bi-normal model. The resulting model ROC curves are rational and fit the observed data well. We find that the distributions of M* and d' differ significantly among these data sets, and differ between pairs of imaging systems within studies. For example, on average tomosynthesis increased readers' d' values, while CAD reduced the M* parameters. We demonstrate that the model parameters M* and d' are correlated. We conclude that the Effective Set-Size model may be a useful way of differentiating location uncertainty from the diagnostic uncertainty in medical imaging tasks.
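
    The core assumption of the model, that the single image rating is the maximum of M* independent location-known-exactly responses, lends itself to a short simulation; the values of M*, d', and the trial count below are arbitrary.

```python
# Sketch: simulate ratings under the Effective Set-Size model and compute the AUC
# (rating = max over M* locations; one location carries the signal when present).
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(6)
m_star, d_prime, n_trials = 4, 1.5, 5000

noise_ratings = rng.normal(0, 1, (n_trials, m_star)).max(axis=1)
signal_resps = rng.normal(0, 1, (n_trials, m_star))
signal_resps[:, 0] += d_prime                 # one location contains the signal
signal_ratings = signal_resps.max(axis=1)

labels = np.concatenate([np.zeros(n_trials), np.ones(n_trials)])
scores = np.concatenate([noise_ratings, signal_ratings])
print("AUC:", roc_auc_score(labels, scores))  # larger M* flattens the ROC for fixed d'
```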

  19. Geographic dimensions of heat-related mortality in seven U.S. cities.

    PubMed

    Hondula, David M; Davis, Robert E; Saha, Michael V; Wegner, Carleigh R; Veazey, Lindsay M

    2015-04-01

    Spatially targeted interventions may help protect the public when extreme heat occurs. Health outcome data are increasingly being used to map intra-urban variability in heat-health risks, but there has been little effort to compare patterns and risk factors between cities. We sought to identify places within large metropolitan areas where the mortality rate is highest on hot summer days and determine if characteristics of high-risk areas are consistent from one city to another. A Poisson regression model was adapted to quantify temperature-mortality relationships at the postal code scale based on 2.1 million records of daily all-cause mortality counts from seven U.S. cities. Multivariate spatial regression models were then used to determine the demographic and environmental variables most closely associated with intra-city variability in risk. Significant mortality increases on extreme heat days were confined to 12-44% of postal codes comprising each city. Places with greater risk had more developed land, young, elderly, and minority residents, and lower income and educational attainment, but the key explanatory variables varied from one city to another. Regression models accounted for 14-34% of the spatial variability in heat-related mortality. The results emphasize the need for public health plans for heat to be locally tailored and not assume that pre-identified vulnerability indicators are universally applicable. As known risk factors accounted for no more than one third of the spatial variability in heat-health outcomes, consideration of health outcome data is important in efforts to identify and protect residents of the places where the heat-related health risks are the highest. Copyright © 2015 Elsevier Inc. All rights reserved.

  20. Multivariate Regression Analysis of Winter Ozone Events in the Uinta Basin of Eastern Utah, USA

    NASA Astrophysics Data System (ADS)

    Mansfield, M. L.

    2012-12-01

    I report on a regression analysis of a number of variables that are involved in the formation of winter ozone in the Uinta Basin of Eastern Utah. One goal of the analysis is to develop a mathematical model capable of predicting the daily maximum ozone concentration from values of a number of independent variables. The dependent variable is the daily maximum ozone concentration at a particular site in the basin. Independent variables are (1) daily lapse rate, (2) daily "basin temperature" (defined below), (3) snow cover, (4) midday solar zenith angle, (5) monthly oil production, (6) monthly gas production, and (7) the number of days since the beginning of a multi-day inversion event. Daily maximum temperature and daily snow cover data are available at ten or fifteen different sites throughout the basin. The daily lapse rate is defined operationally as the slope of the linear least-squares fit to the temperature-altitude plot, and the "basin temperature" is defined as the value assumed by the same least-squares line at an altitude of 1400 m. A multi-day inversion event is defined as a set of consecutive days for which the lapse rate remains positive. The standard deviation in the accuracy of the model is about 10 ppb. The model has been combined with historical climate and oil & gas production data to estimate historical ozone levels.

  1. Short-term forecasts gain in accuracy [regression technique using "Box-Jenkins" analysis]

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    Box-Jenkins time-series models offer accuracy for short-term forecasts that compares with large-scale macroeconomic forecasts. Utilities need to be able to forecast peak demand in order to plan their generating, transmitting, and distribution systems. This new method differs from conventional models by not assuming specific data patterns, but by fitting available data into a tentative pattern on the basis of auto-correlations. Three types of models (autoregressive, moving average, or mixed autoregressive/moving average) can be used according to which provides the most appropriate combination of autocorrelations and related derivatives. Major steps in choosing a model are identifying potential models, estimating the parameters of the problem, and running a diagnostic check to see if the model fits the parameters. The Box-Jenkins technique is well suited for seasonal patterns, which makes it possible to produce load-demand forecasts with a resolution as short as hourly. With accuracy up to two years, the method will allow electricity price-elasticity forecasting that can be applied to facility planning and rate design. (DCK)
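
    A minimal sketch of the identify-estimate-check cycle described above, assuming statsmodels and an hourly load series; the model order (p, d, q) and the data are purely illustrative.

```python
# Sketch: Box-Jenkins style identification, estimation, and diagnostic checking
# with statsmodels ARIMA (synthetic hourly load, illustrative model order).
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(7)
hours = np.arange(24 * 60)
load = 100 + 15 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 3, hours.size)

# 1. Identification: inspect autocorrelations (e.g., with plot_acf / plot_pacf)
# 2. Estimation: fit a tentative ARIMA(p, d, q) model
fit = ARIMA(load, order=(2, 0, 1)).fit()

# 3. Diagnostic check: residuals should resemble white noise
print(acorr_ljungbox(fit.resid, lags=[24]))

forecast = fit.forecast(steps=24)             # next-day hourly load forecast
print(forecast[:4])
```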

  2. Educational attainment among long-term survivors of cancer in childhood and adolescence: a Norwegian population-based cohort study.

    PubMed

    Ghaderi, Sara; Engeland, Anders; Gunnes, Maria Winther; Moster, Dag; Ruud, Ellen; Syse, Astri; Wesenberg, Finn; Bjørge, Tone

    2016-02-01

    The number of young cancer survivors has increased over the past few decades due to improvement in treatment regimens, and understanding of long-term effects among the survivors has become even more important. Educational achievements and choice of educational fields were explored here. Five-year cancer survivors born in Norway during 1965-1985 (diagnosed <19 years) were included in our analysis by linking Norwegian population-based registries. Cox regression was applied to study the educational attainment among survivors of central nervous system (CNS) tumours, those assumed to have received CNS-directed therapy, and other cancer survivors relative to the cancer-free population. Logistic regression was used to compare the choice of educational fields between the cancer survivors at undergraduate and graduate level and the cancer-free population. Overall, a lower proportion of the cancer survivors completed intermediate (67 vs. 70 %), undergraduate (31 vs. 35 %) and graduate education (7 vs. 9 %) compared with the cancer-free population. Deficits in completion of an educational level were mainly observed among survivors of CNS-tumours and those assumed to have received CNS-directed therapy. Choices of educational fields among cancer survivors were in general similar to those of the cancer-free population at both undergraduate and graduate levels. Survivors of CNS-tumours and those assumed to have received CNS-directed therapy were at increased risk for educational impairments compared with the cancer-free population. Choices of educational fields were in general similar. Careful follow-up of the survivors of CNS-tumours and those assumed to have received CNS-directed therapy is important at each level of education.

  3. Employer-sponsored insurance, health care cost growth, and the economic performance of U.S. Industries.

    PubMed

    Sood, Neeraj; Ghosh, Arkadipta; Escarce, José J

    2009-10-01

    To estimate the effect of growth in health care costs that outpaces gross domestic product (GDP) growth ("excess" growth in health care costs) on employment, gross output, and value added to GDP of U.S. industries. We analyzed data from 38 U.S. industries for the period 1987-2005. All data are publicly available from various government agencies. We estimated bivariate and multivariate regressions. To develop the regression models, we assumed that rapid growth in health care costs has a larger effect on economic performance for industries where large percentages of workers receive employer-sponsored health insurance (ESI). We used the estimated regression coefficients to simulate economic outcomes under alternative scenarios of health care cost inflation. Faster growth in health care costs had greater adverse effects on economic outcomes for industries with larger percentages of workers who had ESI. We found that a 10 percent increase in excess growth in health care costs would have resulted in 120,803 fewer jobs, US$28,022 million in lost gross output, and US$14,082 million in lost value added in 2005. These declines represent 0.17 to 0.18 percent of employment, gross output, and value added in 2005. Excess growth in health care costs is adversely affecting the economic performance of U.S. industries.

  4. Improved Regression Analysis of Temperature-Dependent Strain-Gage Balance Calibration Data

    NASA Technical Reports Server (NTRS)

    Ulbrich, N.

    2015-01-01

    An improved approach is discussed that may be used to directly include first and second order temperature effects in the load prediction algorithm of a wind tunnel strain-gage balance. The improved approach was designed for the Iterative Method that fits strain-gage outputs as a function of calibration loads and uses a load iteration scheme during the wind tunnel test to predict loads from measured gage outputs. The improved approach assumes that the strain-gage balance is at a constant uniform temperature when it is calibrated and used. First, the method introduces a new independent variable for the regression analysis of the balance calibration data. The new variable is designed as the difference between the uniform temperature of the balance and a global reference temperature. This reference temperature should be the primary calibration temperature of the balance so that, if needed, a tare load iteration can be performed. Then, two temperature-dependent terms are included in the regression models of the gage outputs. They are the temperature difference itself and the square of the temperature difference. Simulated temperature-dependent data obtained from Triumph Aerospace's 2013 calibration of NASA's ARC-30K five component semi-span balance is used to illustrate the application of the improved approach.
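
    The essence of the improved approach, adding the temperature difference and its square as extra regressors in the gage-output model, can be sketched as follows; the load terms, reference temperature, and data are illustrative rather than the actual ARC-30K calibration.

```python
# Sketch: regression of a strain-gage output on calibration loads plus
# first- and second-order temperature-difference terms (synthetic data).
import numpy as np

rng = np.random.default_rng(8)
n = 300
normal_force = rng.uniform(-1, 1, n)            # normalized calibration load
axial_force = rng.uniform(-1, 1, n)
temp = rng.uniform(10, 40, n)                   # balance temperature (C)
t_ref = 22.0                                    # primary calibration temperature
dT = temp - t_ref

gage_out = (0.9 * normal_force + 0.1 * axial_force
            + 0.02 * dT + 0.001 * dT**2 + rng.normal(0, 0.01, n))

# Design matrix: intercept, load terms, and the two temperature-dependent terms
X = np.column_stack([np.ones(n), normal_force, axial_force, dT, dT**2])
coef, *_ = np.linalg.lstsq(X, gage_out, rcond=None)
print(coef)   # the last two entries estimate the temperature sensitivities
```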

  5. Cross-Sectional HIV Incidence Surveillance: A Benchmarking of Approaches for Estimating the 'Mean Duration of Recent Infection'.

    PubMed

    Kassanjee, Reshma; De Angelis, Daniela; Farah, Marian; Hanson, Debra; Labuschagne, Jan Phillipus Lourens; Laeyendecker, Oliver; Le Vu, Stéphane; Tom, Brian; Wang, Rui; Welte, Alex

    2017-03-01

    The application of biomarkers for 'recent' infection in cross-sectional HIV incidence surveillance requires the estimation of critical biomarker characteristics. Various approaches have been employed for using longitudinal data to estimate the Mean Duration of Recent Infection (MDRI) - the average time in the 'recent' state. In this systematic benchmarking of MDRI estimation approaches, a simulation platform was used to measure accuracy and precision of over twenty approaches, in thirty scenarios capturing various study designs, subject behaviors and test dynamics that may be encountered in practice. Results highlight that assuming a single continuous sojourn in the 'recent' state can produce substantial bias. Simple interpolation provides useful MDRI estimates provided subjects are tested at regular intervals. Regression performs the best - while 'random effects' describe the subject-clustering in the data, regression models without random effects proved easy to implement, stable, and of similar accuracy in scenarios considered; robustness to parametric assumptions was improved by regressing 'recent'/'non-recent' classifications rather than continuous biomarker readings. All approaches were vulnerable to incorrect assumptions about subjects' (unobserved) infection times. Results provided show the relationships between MDRI estimation performance and the number of subjects, inter-visit intervals, missed visits, loss to follow-up, and aspects of biomarker signal and noise.
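
    One of the better-performing strategies noted above, regressing the binary 'recent'/'non-recent' classification on time since infection and integrating the fitted probability up to the post-infection cut-off T, can be sketched as follows; the data, cut-off, and link function are illustrative.

```python
# Sketch: MDRI as the integral of P(recent | time since infection) up to T,
# with P(t) estimated by logistic regression (synthetic data, illustrative T).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
t = rng.uniform(0, 800, 3000)                       # days since infection
p_recent_true = 1.0 / (1.0 + np.exp((t - 180) / 40))
recent = rng.binomial(1, p_recent_true)

fit = sm.Logit(recent, sm.add_constant(t)).fit(disp=0)

T = 730.0                                           # post-infection cut-off (days)
grid = np.linspace(0, T, 2000)
p_hat = fit.predict(sm.add_constant(grid))
mdri = np.trapz(p_hat, grid)                        # mean duration of recent infection
print(f"estimated MDRI: {mdri:.1f} days")
```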

  6. Sieve analysis using the number of infecting pathogens.

    PubMed

    Follmann, Dean; Huang, Chiung-Yu

    2017-12-14

    Assessment of vaccine efficacy as a function of the similarity of the infecting pathogen to the vaccine is an important scientific goal. Characterization of pathogen strains for which vaccine efficacy is low can increase understanding of the vaccine's mechanism of action and offer targets for vaccine improvement. Traditional sieve analysis estimates differential vaccine efficacy using a single identifiable pathogen for each subject. The similarity between this single entity and the vaccine immunogen is quantified, for example, by exact match or number of mismatched amino acids. With new technology, we can now obtain the actual count of genetically distinct pathogens that infect an individual. Let F be the number of distinct features of a species of pathogen. We assume a log-linear model for the expected number of infecting pathogens with feature "f," f=1,…,F. The model can be used directly in studies with passive surveillance of infections where the count of each type of pathogen is recorded at the end of some interval, or active surveillance where the time of infection is known. For active surveillance, we additionally assume that a proportional intensity model applies to the time of potentially infectious exposures and derive product and weighted estimating equation (WEE) estimators for the regression parameters in the log-linear model. The WEE estimator explicitly allows for waning vaccine efficacy and time-varying distributions of pathogens. We give conditions where sieve parameters have a per-exposure interpretation under passive surveillance. We evaluate the methods by simulation and analyze a phase III trial of a malaria vaccine. © 2017, The International Biometric Society.
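
    The log-linear model for the expected count of infecting pathogens carrying a feature can be sketched with a Poisson GLM; the vaccine indicator, feature-match indicator, and data below are illustrative, and the product and weighted estimating-equation machinery for active surveillance is not reproduced here.

```python
# Sketch: log-linear (Poisson) model for counts of infecting pathogens carrying a
# feature, with a vaccine-by-feature interaction (synthetic data, illustrative).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
n = 1000
vaccine = rng.binomial(1, 0.5, n)                 # 1 = vaccinated subject
matched = rng.binomial(1, 0.5, n)                 # 1 = pathogen feature matches vaccine
log_mu = np.log(1.5) - 0.8 * vaccine * matched - 0.2 * vaccine
counts = rng.poisson(np.exp(log_mu))

X = sm.add_constant(np.column_stack([vaccine, matched, vaccine * matched]))
fit = sm.GLM(counts, X, family=sm.families.Poisson()).fit()
print(fit.summary())   # the interaction captures the feature-specific vaccine effect
```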

  7. Sufficient Forecasting Using Factor Models

    PubMed Central

    Fan, Jianqing; Xue, Lingzhou; Yao, Jiawei

    2017-01-01

    We consider forecasting a single time series when there is a large number of predictors and a possible nonlinear effect. The dimensionality was first reduced via a high-dimensional (approximate) factor model implemented by the principal component analysis. Using the extracted factors, we develop a novel forecasting method called the sufficient forecasting, which provides a set of sufficient predictive indices, inferred from high-dimensional predictors, to deliver additional predictive power. The projected principal component analysis will be employed to enhance the accuracy of inferred factors when a semi-parametric (approximate) factor model is assumed. Our method is also applicable to cross-sectional sufficient regression using extracted factors. The connection between the sufficient forecasting and the deep learning architecture is explicitly stated. The sufficient forecasting correctly estimates projection indices of the underlying factors even in the presence of a nonparametric forecasting function. The proposed method extends the sufficient dimension reduction to high-dimensional regimes by condensing the cross-sectional information through factor models. We derive asymptotic properties for the estimate of the central subspace spanned by these projection directions as well as the estimates of the sufficient predictive indices. We further show that the natural method of running multiple regression of target on estimated factors yields a linear estimate that actually falls into this central subspace. Our method and theory allow the number of predictors to be larger than the number of observations. We finally demonstrate that the sufficient forecasting improves upon the linear forecasting in both simulation studies and an empirical study of forecasting macroeconomic variables. PMID:29731537

  8. Modeling Drought Impact Occurrence Based on Climatological Drought Indices for Europe

    NASA Astrophysics Data System (ADS)

    Stagge, J. H.; Kohn, I.; Tallaksen, L. M.; Stahl, K.

    2014-12-01

    Meteorological drought indices are often assumed to accurately characterize the severity of a drought event; however, these indices do not necessarily reflect the likelihood or severity of a particular type of drought impact experienced on the ground. In previous research, this link between index and impact was often estimated based on thresholds found by experience, measured using composite indices with assumed weighting schemes, or defined based on very narrow impact measures, using either a narrow spatial extent or very specific impacts. This study expands on earlier work by demonstrating the feasibility of relating user-provided impact reports to the climatological drought indices SPI and SPEI by logistic regression. The user-provided drought impact reports are based on the European Drought Impact Inventory (EDII, www.geo.uio.no/edc/droughtdb/), a newly developed online database that allows both public report submission and querying the more than 4,000 reported impacts spanning 33 European countries. This new tool is used to quantify the link between meteorological drought indices and impacts focusing on four primary impact types, spanning agriculture, energy and industry, public water supply, and freshwater ecosystem across five European countries. Statistically significant climate indices are retained as predictors using step-wise regression and used to compare the most relevant drought indices and accumulation periods for different impact types and regions. Agricultural impacts are explained best by 2-12 month anomalies, with 2-3 month anomalies found in predominantly rain-fed agricultural regions, and anomalies greater than 3 months related to agricultural management practices. Energy and industry impacts, related to hydropower and energy cooling water in these countries, respond to longer accumulated precipitation anomalies (6-12 months). Public water supply and freshwater ecosystem impacts are explained by a more complex combination of short (1-3 month) and seasonal (6-12 month) anomalies. A mean of 47.0% (22.4-71.6%) impact deviance is explained by the resulting models, highlighting the feasibility of using such statistical techniques and drought impact databases to model drought impact likelihood based on relatively easily calculated meteorological drought indices.
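
    The core statistical step, logistic regression of reported impact occurrence on accumulated drought indices, can be sketched as follows; the SPI accumulation periods, coefficients, and data are illustrative rather than taken from the EDII.

```python
# Sketch: logistic regression of monthly drought-impact occurrence on SPI indices
# accumulated over different periods (synthetic data, illustrative predictors).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
n_months = 600
spi3 = rng.normal(0, 1, n_months)     # 3-month standardized precipitation index
spi12 = rng.normal(0, 1, n_months)    # 12-month SPI
logit_p = -2.0 - 1.2 * spi3 - 0.6 * spi12   # drier (more negative SPI) -> more impacts
impact = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

X = sm.add_constant(np.column_stack([spi3, spi12]))
fit = sm.Logit(impact, X).fit(disp=0)
print(fit.params)       # negative coefficients: impacts more likely under drought
```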

  9. The Role of Auxiliary Variables in Deterministic and Deterministic-Stochastic Spatial Models of Air Temperature in Poland

    NASA Astrophysics Data System (ADS)

    Szymanowski, Mariusz; Kryza, Maciej

    2017-02-01

    Our study examines the role of auxiliary variables in the process of spatial modelling and mapping of climatological elements, with air temperature in Poland used as an example. Multivariable algorithms are the most frequently applied for spatialization of air temperature, and in many studies their results prove to be better than those obtained by various one-dimensional techniques. In most of the previous studies, two main strategies were used to perform multidimensional spatial interpolation of air temperature. First, it was accepted that all variables significantly correlated with air temperature should be incorporated into the model. Second, it was assumed that the more spatial variation of air temperature was deterministically explained, the better the quality of spatial interpolation. The main goal of the paper was to examine both above-mentioned assumptions. The analysis was performed using data from 250 meteorological stations and for 69 air temperature cases aggregated on different levels: from daily means to 10-year annual mean. Two cases were considered for detailed analysis. The set of potential auxiliary variables covered 11 environmental predictors of air temperature. Another purpose of the study was to compare the results of interpolation given by various multivariable methods using the same set of explanatory variables. Two regression models, multiple linear regression (MLR) and geographically weighted regression (GWR), as well as their extensions to the regression-kriging form (MLRK and GWRK, respectively), were examined. Stepwise regression was used to select variables for the individual models, and cross-validation was used to validate the results, with special attention paid to statistically significant improvement of the model using the mean absolute error (MAE) criterion. The main results of this study led to rejection of both assumptions considered. Usually, including more than two or three of the most significantly correlated auxiliary variables does not improve the quality of the spatial model. The effects of introduction of certain variables into the model were not climatologically justified and were seen on maps as unexpected and undesired artefacts. The results confirm, in accordance with previous studies, that in the case of air temperature distribution, the spatial process is non-stationary; thus, the local GWR model performs better than the global MLR if they are specified using the same set of auxiliary variables. If only the GWR residuals are autocorrelated, the geographically weighted regression-kriging (GWRK) model seems to be optimal for air temperature spatial interpolation.
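
    The regression-kriging idea (a deterministic regression on auxiliary variables plus spatial interpolation of its residuals) can be sketched as follows; here scikit-learn's GaussianProcessRegressor stands in for the kriging step, and the station data are simulated, so this is only an illustration of the MLRK structure, not the authors' implementation.

      import numpy as np
      from sklearn.linear_model import LinearRegression
      from sklearn.gaussian_process import GaussianProcessRegressor
      from sklearn.gaussian_process.kernels import RBF, WhiteKernel

      rng = np.random.default_rng(2)
      n = 250                                        # pseudo meteorological stations
      coords = rng.uniform(0, 500, size=(n, 2))      # station coordinates, km
      elev = rng.uniform(0, 1500, n)                 # auxiliary variable: elevation, m
      temp = 15.0 - 0.0065 * elev + 0.002 * coords[:, 1] + rng.normal(0, 0.4, n)

      # deterministic part: multiple linear regression on auxiliary variables
      aux = np.column_stack([elev, coords[:, 1]])
      mlr = LinearRegression().fit(aux, temp)
      resid = temp - mlr.predict(aux)

      # stochastic part: spatial interpolation of the regression residuals
      gp = GaussianProcessRegressor(kernel=RBF(100.0) + WhiteKernel(0.1),
                                    normalize_y=True).fit(coords, resid)

      # prediction at a new location = regression estimate + interpolated residual
      new_coords = np.array([[250.0, 250.0]])
      new_aux = np.array([[800.0, 250.0]])
      print(mlr.predict(new_aux) + gp.predict(new_coords))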

  10. Exploring the use of random regression models with legendre polynomials to analyze measures of volume of ejaculate in Holstein bulls.

    PubMed

    Carabaño, M J; Díaz, C; Ugarte, C; Serrano, M

    2007-02-01

    Artificial insemination centers routinely collect records of quantity and quality of semen of bulls throughout the animals' productive period. The goal of this paper was to explore the use of random regression models with orthogonal polynomials to analyze repeated measures of semen production of Spanish Holstein bulls. A total of 8,773 records of volume of first ejaculate (VFE) collected between 12 and 30 mo of age from 213 Spanish Holstein bulls was analyzed under alternative random regression models. Legendre polynomial functions of increasing order (0 to 6) were fitted to the average trajectory, additive genetic and permanent environmental effects. Age at collection and days in production were used as time variables. Heterogeneous and homogeneous residual variances were alternatively assumed. Analyses were carried out within a Bayesian framework. The logarithm of the marginal density and the cross-validation predictive ability of the data were used as model comparison criteria. Based on both criteria, age at collection as a time variable and heterogeneous residuals models are recommended to analyze changes of VFE over time. Both criteria indicated that fitting random curves for genetic and permanent environmental components as well as for the average trajectory improved the quality of the models. Furthermore, models with a higher order polynomial for the permanent environmental (5 to 6) than for the genetic components (4 to 5) and the average trajectory (2 to 3) tended to perform best. High-order polynomials were needed to accommodate the highly oscillating nature of the phenotypic values. Heritability and repeatability estimates, disregarding the extremes of the studied period, ranged from 0.15 to 0.35 and from 0.20 to 0.50, respectively, indicating that selection for VFE may be effective at any stage. Small differences among models were observed. Apart from the extremes, estimated correlations between ages decreased steadily from 0.9 and 0.4 for measures 1 mo apart to 0.4 and 0.2 for most distant measures for additive genetic and phenotypic components, respectively. Further investigation to account for environmental factors that may be responsible for the oscillating observations of VFE is needed.
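
    The covariate structure of such a random regression model rests on Legendre polynomials evaluated at a standardized age; a minimal sketch of building that basis (hypothetical ages, basis order chosen arbitrarily) is:

      import numpy as np
      from numpy.polynomial import legendre

      ages = np.array([12.0, 15.0, 18.0, 21.0, 24.0, 27.0, 30.0])   # months

      # standardize age to [-1, 1], the domain of the Legendre polynomials
      t = 2 * (ages - ages.min()) / (ages.max() - ages.min()) - 1

      # design matrix of Legendre polynomials P0..P3 evaluated at each age;
      # in a random regression model these columns multiply the fixed-curve,
      # additive genetic and permanent environmental regression coefficients
      Phi = legendre.legvander(t, 3)
      print(Phi.round(3))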

  11. Trend analysis of the long-term Swiss ozone measurements

    NASA Technical Reports Server (NTRS)

    Staehelin, Johannes; Bader, Juerg; Gelpke, Verena

    1994-01-01

    Trend analyses, assuming a linear trend which started at 1970, were performed on total ozone measurements from Arosa (Switzerland, 1926-1991). Decreases in monthly mean values were statistically significant for October through April, amounting to about 2.0-4 percent per decade. For the period 1947-91, total ozone trends were further investigated using a multiple regression model. Temperature of a mountain peak in Switzerland (Mt. Santis), the F10.7 solar flux series, the QBO series (quasi-biennial oscillation), and the southern oscillation index (SOI) were included as explanatory variables. Trends in the monthly mean values were statistically significant for December through April. The same multiple regression model was applied to investigate the ozone trends at various altitudes using the ozone balloon soundings from Payerne (1967-1989) and the Umkehr measurements from Arosa (1947-1989). The results show four different vertical trend regimes: On a relative scale, changes were largest in the troposphere (increase of about 10 percent per decade). On an absolute scale, the largest trends were obtained in the lower stratosphere (decrease of approximately 6 per decade at an altitude of about 18 to 22 km). No significant trends were observed at approximately 30 km, whereas stratospheric ozone decreased in the upper stratosphere.
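
    The structure of such a multiple regression trend model (a piecewise trend starting in 1970 plus solar, QBO and SOI covariates) can be sketched as below; the series are randomly generated placeholders, so the sketch only illustrates the model form, not the Arosa analysis.

      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(3)
      years = np.arange(1947, 1992)
      n = len(years)

      # placeholder explanatory series (stand-ins for F10.7, QBO, SOI and Santis temperature)
      f107, qbo, soi, temp = (rng.normal(size=n) for _ in range(4))

      # "hockey-stick" trend term: zero before 1970, linear afterwards (in decades)
      trend = np.clip(years - 1970, 0, None) / 10.0
      ozone = 330 - 6.0 * trend + 2.0 * f107 + rng.normal(0, 3, n)

      X = sm.add_constant(np.column_stack([trend, f107, qbo, soi, temp]))
      fit = sm.OLS(ozone, X).fit()
      print(fit.params[1])   # estimated total-ozone change per decade since 1970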

  12. A determinant-based criterion for working correlation structure selection in generalized estimating equations.

    PubMed

    Jaman, Ajmery; Latif, Mahbub A H M; Bari, Wasimul; Wahed, Abdus S

    2016-05-20

    In generalized estimating equations (GEE), the correlation between the repeated observations on a subject is specified with a working correlation matrix. Correct specification of the working correlation structure ensures efficient estimators of the regression coefficients. Among the criteria used in practice for selecting a working correlation structure, the Rotnitzky-Jewell criterion, the Quasi Information Criterion (QIC) and the Correlation Information Criterion (CIC) are based on the fact that, if the assumed working correlation structure is correct, then the model-based (naive) and the sandwich (robust) covariance estimators of the regression coefficient estimators should be close to each other. The sandwich covariance estimator, used in defining the Rotnitzky-Jewell, QIC and CIC criteria, is biased downward and has a larger variability than the corresponding model-based covariance estimator. Motivated by this fact, a new criterion is proposed in this paper based on the bias-corrected sandwich covariance estimator for selecting an appropriate working correlation structure in GEE. A comparison of the proposed and the competing criteria is shown using simulation studies with correlated binary responses. The results revealed that the proposed criterion generally performs better than the competing criteria. An example of selecting the appropriate working correlation structure has also been shown using data from the Madras Schizophrenia Study. Copyright © 2015 John Wiley & Sons, Ltd.
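
    As an illustration of how such covariance-based criteria are computed, the sketch below evaluates the CIC, trace(V_naive^-1 V_robust), for two candidate working correlation structures on simulated clustered binary data. It assumes that statsmodels' GEE results expose cov_naive and cov_robust attributes; the data and structure choices are hypothetical.

      import numpy as np
      import statsmodels.api as sm

      # simulated clustered binary responses
      rng = np.random.default_rng(4)
      n_subj, n_rep = 100, 4
      subj = np.repeat(np.arange(n_subj), n_rep)
      x = rng.normal(size=n_subj * n_rep)
      u = np.repeat(rng.normal(scale=0.8, size=n_subj), n_rep)    # cluster effect
      y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * x + u))))
      X = sm.add_constant(x)

      def cic(cov_struct):
          """CIC = trace(V_naive^-1 V_robust); smaller values favour the structure."""
          res = sm.GEE(y, X, groups=subj, family=sm.families.Binomial(),
                       cov_struct=cov_struct).fit()
          # cov_naive / cov_robust attributes assumed available on the GEE results
          return np.trace(np.linalg.solve(res.cov_naive, res.cov_robust))

      for name, cs in [("independence", sm.cov_struct.Independence()),
                       ("exchangeable", sm.cov_struct.Exchangeable())]:
          print(name, round(cic(cs), 3))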

  13. Nonlinear isochrones in murine left ventricular pressure-volume loops: how well does the time-varying elastance concept hold?

    PubMed

    Claessens, T E; Georgakopoulos, D; Afanasyeva, M; Vermeersch, S J; Millar, H D; Stergiopulos, N; Westerhof, N; Verdonck, P R; Segers, P

    2006-04-01

    The linear time-varying elastance theory is frequently used to describe the change in ventricular stiffness during the cardiac cycle. The concept assumes that all isochrones (i.e., curves that connect pressure-volume data occurring at the same time) are linear and have a common volume intercept. Of specific interest is the steepest isochrone, the end-systolic pressure-volume relationship (ESPVR), of which the slope serves as an index for cardiac contractile function. Pressure-volume measurements, achieved with a combined pressure-conductance catheter in the left ventricle of 13 open-chest anesthetized mice, showed a marked curvilinearity of the isochrones. We therefore analyzed the shape of the isochrones by using six regression algorithms (two linear, two quadratic, and two logarithmic, each with a fixed or time-varying intercept) and discussed the consequences for the elastance concept. Our main observations were 1) the volume intercept varies considerably with time; 2) isochrones are equally well described by using quadratic or logarithmic regression; 3) linear regression with a fixed intercept shows poor correlation (R(2) < 0.75) during isovolumic relaxation and early filling; and 4) logarithmic regression is superior in estimating the fixed volume intercept of the ESPVR. In conclusion, the linear time-varying elastance fails to provide a sufficiently robust model to account for changes in pressure and volume during the cardiac cycle in the mouse ventricle. A new framework accounting for the nonlinear shape of the isochrones needs to be developed.
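
    The comparison of isochrone shapes reported above amounts to fitting alternative regression forms to pressure-volume points sampled at a fixed time in the cycle; a minimal sketch with invented data and simplified parameterizations is:

      import numpy as np
      from scipy.optimize import curve_fit

      rng = np.random.default_rng(5)
      # hypothetical pressure-volume points along one isochrone (murine scale)
      V = np.linspace(10, 40, 12)                                   # microliters
      P = 60 * np.log(V / 8.0) + rng.normal(0, 2, V.size)           # mmHg

      linear = lambda V, a, b: a * V + b
      quadratic = lambda V, a, b, c: a * V**2 + b * V + c
      logarithmic = lambda V, a, b: a * np.log(V) + b   # volume intercept = exp(-b/a)

      for name, f, p0 in [("linear", linear, (3.0, -20.0)),
                          ("quadratic", quadratic, (0.1, 1.0, -20.0)),
                          ("logarithmic", logarithmic, (50.0, -100.0))]:
          popt, _ = curve_fit(f, V, P, p0=p0)
          r2 = 1 - np.var(P - f(V, *popt)) / np.var(P)
          print(f"{name}: R^2 = {r2:.3f}")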

  14. Self-consistent asset pricing models

    NASA Astrophysics Data System (ADS)

    Malevergne, Y.; Sornette, D.

    2007-08-01

    We discuss the foundations of factor or regression models in the light of the self-consistency condition that the market portfolio (and more generally the risk factors) is (are) constituted of the assets whose returns it is (they are) supposed to explain. As already reported in several articles, self-consistency implies correlations between the return disturbances. As a consequence, the alphas and betas of the factor model are unobservable. Self-consistency leads to renormalized betas with zero effective alphas, which are observable with standard OLS regressions. When the conditions derived from internal consistency are not met, the model is necessarily incomplete, which means that some sources of risk cannot be replicated (or hedged) by a portfolio of stocks traded on the market, even for infinite economies. Analytical derivations and numerical simulations show that, for arbitrary choices of the proxy which are different from the true market portfolio, a modified linear regression holds with a non-zero value αi at the origin between an asset i's return and the proxy's return. Self-consistency also introduces “orthogonality” and “normality” conditions linking the betas, alphas (as well as the residuals) and the weights of the proxy portfolio. Two diagnostics based on these orthogonality and normality conditions are implemented on a basket of 323 assets which have been components of the S&P500 in the period from January 1990 to February 2005. These two diagnostics show interesting departures from dynamical self-consistency starting about 2 years before the end of the Internet bubble. Assuming that the CAPM holds with the self-consistency condition, the OLS method automatically obeys the resulting orthogonality and normality conditions and therefore provides a simple way to self-consistently assess the parameters of the model by using proxy portfolios made only of the assets which are used in the CAPM regressions. Finally, the factor decomposition with the self-consistency condition derives a risk-factor decomposition in the multi-factor case which is identical to the principal component analysis (PCA), thus providing a direct link between model-driven and data-driven constructions of risk factors. This correspondence shows that PCA will therefore suffer from the same limitations as the CAPM and its multi-factor generalization, namely lack of out-of-sample explanatory power and predictability. In the multi-period context, the self-consistency conditions force the betas to be time-dependent with specific constraints.

  15. Drainage investment and wetland loss: an analysis of the national resources inventory data

    USGS Publications Warehouse

    Douglas, Aaron J.; Johnson, Richard L.

    1994-01-01

    The United States Soil Conservation Service (SCS) conducts a survey for the purpose of establishing an agricultural land use database. This survey is called the National Resources Inventory (NRI) database. The complex NRI land classification system, in conjunction with the quantitative information gathered by the survey, has numerous applications. The current paper uses the wetland area data gathered by the NRI in 1982 and 1987 to examine empirically the factors that generate wetland loss in the United States. The cross-section regression models listed here use the quantity of wetlands, the stock of drainage capital, the realty value of farmland and drainage costs to explain most of the cross-state variation in wetland loss rates. Wetlands preservation efforts by federal agencies assume that pecuniary economic factors play a decisive role in wetland drainage. The empirical models tested in the present paper validate this assumption.

  16. Two-factor theory – at the intersection of health care management and patient satisfaction

    PubMed Central

    Bohm, Josef

    2012-01-01

    Using data obtained from the 2004 Joint Canadian/United States Survey of Health, an analytic model using principles derived from Herzberg’s motivational hygiene theory was developed for evaluating patient satisfaction with health care. The analysis sought to determine whether survey variables associated with consumer satisfaction act as Herzberg factors and contribute to survey participants’ self-reported levels of health care satisfaction. To validate the technique, data from the survey were analyzed using logistic regression methods and then compared with results obtained from the two-factor model. The findings indicate a high degree of correlation between the two methods. The two-factor analytical methodology offers advantages due to its ability to identify whether a factor assumes a motivational or hygienic role and assesses the influence of a factor within select populations. Its ease of use makes this methodology well suited for assessment of multidimensional variables. PMID:23055755

  17. Two-factor theory - at the intersection of health care management and patient satisfaction.

    PubMed

    Bohm, Josef

    2012-01-01

    Using data obtained from the 2004 Joint Canadian/United States Survey of Health, an analytic model using principles derived from Herzberg's motivational hygiene theory was developed for evaluating patient satisfaction with health care. The analysis sought to determine whether survey variables associated with consumer satisfaction act as Herzberg factors and contribute to survey participants' self-reported levels of health care satisfaction. To validate the technique, data from the survey were analyzed using logistic regression methods and then compared with results obtained from the two-factor model. The findings indicate a high degree of correlation between the two methods. The two-factor analytical methodology offers advantages due to its ability to identify whether a factor assumes a motivational or hygienic role and assesses the influence of a factor within select populations. Its ease of use makes this methodology well suited for assessment of multidimensional variables.

  18. A historical analysis of natural gas demand

    NASA Astrophysics Data System (ADS)

    Dalbec, Nathan Richard

    This thesis analyzes demand in the US energy market for natural gas, oil, and coal over the period of 1918-2013 and examines their price relationship over the period of 2007-2013. Diagnostic tests for time series were used: Augmented Dickey-Fuller, Kwiatkowski-Phillips-Schmidt-Shin, Johansen cointegration, Granger causality, and weak exogeneity tests. Directed acyclic graphs were used as a complementary test for endogeneity. Due to the varied results in determining endogeneity, a seemingly unrelated regression model was used, which assumes all right-hand-side variables in the three demand equations are exogenous. A number of factors were significant in determining demand for natural gas, including its own price, lagged demand, a number of structural break dummies, and trend, while oil showed some substitutability with natural gas. An error correction model was used to examine the price relationships. Natural gas price was found not to have a significant cointegrating vector.

  19. Isostatic compensation of equatorial highlands on Venus

    NASA Technical Reports Server (NTRS)

    Kucinskas, Algis B.; Turcotte, Donald L.

    1994-01-01

    Spherical harmonic models for Venus' global topography and gravity incorporating Magellan data are used to test isostatic compensation models in five 30 deg x 30 deg regions representative of the main classes of equatorial highlands. The power spectral density for the harmonic models obeys a power-law scaling with spectral slope Beta approximately 2 (Brown noise) for the topography and Beta approximately 3 (Kaula's law) for the geoid, similar to what is observed for Earth. The Venus topography spectrum has lower amplitudes than Earth's, which reflects the dominant lowland topography on Venus. Observed degree geoid to topography ratios (GTRs) on Venus are significantly smaller than degree GTRs for uncompensated topography, indicative of substantial compensation. Assuming a global Airy compensation, most of the topography is compensated at depths greater than 100 km, suggesting a thick lithosphere on Venus. For each region considered we obtain a regional degree of compensation C from a linear regression of Bouguer anomaly versus Bouguer gravity data. Geoid anomaly (N) versus topography variation (h) data for each sample were compared, in the least-squares sense, to theoretical correlations for Pratt, Airy, and thermal thinning isostasy models yielding regional GTR, zero-elevation crustal thickness (H), and zero-elevation thermal lithosphere thickness (y_L0), respectively. We find the regional compensation to be substantial (C approximately 52-80%), and the h, N data correlations in the chosen areas can be explained by isostasy models applicable on the Earth and involving variations in crustal thickness (Airy) and/or lithospheric (thermal thinning) thickness. However, a thick crust and lithosphere (y_L0 approximately 300 km) must be assumed for Venus.

  20. Modeling the effects of AADT on predicting multiple-vehicle crashes at urban and suburban signalized intersections.

    PubMed

    Chen, Chen; Xie, Yuanchang

    2016-06-01

    Annual Average Daily Traffic (AADT) is often considered as a main covariate for predicting crash frequencies at urban and suburban intersections. A linear functional form is typically assumed for the Safety Performance Function (SPF) to describe the relationship between the natural logarithm of expected crash frequency and covariates derived from AADTs. Such a linearity assumption has been questioned by many researchers. This study applies Generalized Additive Models (GAMs) and Piecewise Linear Negative Binomial (PLNB) regression models to fit intersection crash data. Various covariates derived from minor-and major-approach AADTs are considered. Three different dependent variables are modeled, which are total multiple-vehicle crashes, rear-end crashes, and angle crashes. The modeling results suggest that a nonlinear functional form may be more appropriate. Also, the results show that it is important to take into consideration the joint safety effects of multiple covariates. Additionally, it is found that the ratio of minor to major-approach AADT has a varying impact on intersection safety and deserves further investigations. Copyright © 2016 Elsevier Ltd. All rights reserved.
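
    A piecewise linear effect of AADT can be represented with a hinge (broken-stick) term inside a negative binomial regression; the sketch below does this with statsmodels on simulated intersection data, with an arbitrarily chosen knot, and is only an illustration of the PLNB idea rather than the authors' specification.

      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(6)
      n = 400
      log_major = rng.uniform(8.5, 10.5, n)      # log of major-approach AADT
      log_minor = rng.uniform(6.0, 9.0, n)       # log of minor-approach AADT

      # hinge (broken-stick) term: the slope of log_minor changes past the knot
      knot = 7.5
      hinge = np.clip(log_minor - knot, 0, None)

      # simulate crash counts from a negative binomial with a piecewise linear mean
      eta = -8.0 + 0.8 * log_major + 0.5 * log_minor - 0.3 * hinge
      mu = np.exp(eta)
      crashes = rng.negative_binomial(n=2, p=2.0 / (2.0 + mu))

      X = sm.add_constant(np.column_stack([log_major, log_minor, hinge]))
      fit = sm.GLM(crashes, X, family=sm.families.NegativeBinomial(alpha=0.5)).fit()
      print(fit.params)   # last coefficient = change in slope beyond the knot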

  1. Improving RNA nearest neighbor parameters for helices by going beyond the two-state model.

    PubMed

    Spasic, Aleksandar; Berger, Kyle D; Chen, Jonathan L; Seetin, Matthew G; Turner, Douglas H; Mathews, David H

    2018-06-01

    RNA folding free energy change nearest neighbor parameters are widely used to predict folding stabilities of secondary structures. They were determined by linear regression to datasets of optical melting experiments on small model systems. Traditionally, the optical melting experiments are analyzed assuming a two-state model, i.e. a structure is either complete or denatured. Experimental evidence, however, shows that structures exist in an ensemble of conformations. Partition functions calculated with existing nearest neighbor parameters predict that secondary structures can be partially denatured, which also directly conflicts with the two-state model. Here, a new approach for determining RNA nearest neighbor parameters is presented. Available optical melting data for 34 Watson-Crick helices were fit directly to a partition function model that allows an ensemble of conformations. Fitting parameters were the enthalpy and entropy changes for helix initiation, terminal AU pairs, stacks of Watson-Crick pairs and disordered internal loops. The resulting set of nearest neighbor parameters shows a 38.5% improvement in the sum of residuals in fitting the experimental melting curves compared to the current literature set.

  2. Geographically weighted regression and geostatistical techniques to construct the geogenic radon potential map of the Lazio region: A methodological proposal for the European Atlas of Natural Radiation.

    PubMed

    Ciotoli, G; Voltaggio, M; Tuccimei, P; Soligo, M; Pasculli, A; Beaubien, S E; Bigi, S

    2017-01-01

    In many countries, assessment programmes are carried out to identify areas where people may be exposed to high radon levels. These programmes often involve detailed mapping, followed by spatial interpolation and extrapolation of the results based on the correlation of indoor radon values with other parameters (e.g., lithology, permeability and airborne total gamma radiation) to optimise the radon hazard maps at the municipal and/or regional scale. In the present work, Geographically Weighted Regression and geostatistics are used to estimate the Geogenic Radon Potential (GRP) of the Lazio Region, assuming that the radon risk only depends on the geological and environmental characteristics of the study area. A wide geodatabase has been organised including about 8000 samples of soil-gas radon, as well as other proxy variables, such as radium and uranium content of homogeneous geological units, rock permeability, and faults and topography often associated with radon production/migration in the shallow environment. All these data have been processed in a Geographic Information System (GIS) using geospatial analysis and geostatistics to produce base thematic maps in a 1000 m × 1000 m grid format. Global Ordinary Least Squares (OLS) regression and local Geographically Weighted Regression (GWR) have been applied and compared, assuming that the relationships between radon activities and the environmental variables are not spatially stationary, but vary locally according to the GRP. The spatial regression model has been elaborated considering soil-gas radon concentrations as the response variable and developing proxy variables as predictors through the use of a training dataset. Then a validation procedure was used to predict soil-gas radon values using a test dataset. Finally, the predicted values were interpolated using the kriging algorithm to obtain the GRP map of the Lazio region. The map shows some high GRP areas corresponding to the volcanic terrains (central-northern sector of Lazio region) and to faulted and fractured carbonate rocks (central-southern and eastern sectors of the Lazio region). This typical local variability of autocorrelated phenomena can only be taken into account by using local methods for spatial data analysis. The constructed GRP map can be a useful tool to implement radon policies at both the national and local levels, providing critical data for land use and planning purposes. Copyright © 2016 Elsevier Ltd. All rights reserved.

  3. Measuring demand for flat water recreation using a two-stage/disequilibrium travel cost model with adjustment for overdispersion and self-selection

    NASA Astrophysics Data System (ADS)

    McKean, John R.; Johnson, Donn; Taylor, R. Garth

    2003-04-01

    An alternate travel cost model is applied to an on-site sample to estimate the value of flat water recreation on the impounded lower Snake River. Four contiguous reservoirs would be eliminated if the dams are breached to protect endangered Pacific salmon and steelhead trout. The empirical method applies truncated negative binomial regression with adjustment for endogenous stratification. The two-stage decision model assumes that recreationists allocate their time among work and leisure prior to deciding among consumer goods. The allocation of time and money among goods in the second stage is conditional on the predetermined work time and income. The second stage is a disequilibrium labor market which also applies if employers set work hours or if recreationists are not in the labor force. When work time is either predetermined, fixed by contract, or nonexistent, recreationists must consider separate prices and budgets for time and money.

  4. Discrete Emotion Effects on Lexical Decision Response Times

    PubMed Central

    Briesemeister, Benny B.; Kuchinke, Lars; Jacobs, Arthur M.

    2011-01-01

    Our knowledge about affective processes, especially concerning effects on cognitive demands like word processing, is increasing steadily. Several studies consistently document valence and arousal effects, and although there is some debate on possible interactions and different notions of valence, broad agreement on a two dimensional model of affective space has been achieved. Alternative models like the discrete emotion theory have received little interest in word recognition research so far. Using backward elimination and multiple regression analyses, we show that five discrete emotions (i.e., happiness, disgust, fear, anger and sadness) explain as much variance as two published dimensional models assuming continuous or categorical valence, with the variables happiness, disgust and fear significantly contributing to this account. Moreover, these effects even persist in an experiment with discrete emotion conditions when the stimuli are controlled for emotional valence and arousal levels. We interpret this result as evidence for discrete emotion effects in visual word recognition that cannot be explained by the two dimensional affective space account. PMID:21887307

  5. Measurement System Characterization in the Presence of Measurement Errors

    NASA Technical Reports Server (NTRS)

    Commo, Sean A.

    2012-01-01

    In the calibration of a measurement system, data are collected in order to estimate a mathematical model between one or more factors of interest and a response. Ordinary least squares is a method employed to estimate the regression coefficients in the model. The method assumes that the factors are known without error; yet, it is implicitly known that the factors contain some uncertainty. In the literature, this uncertainty is known as measurement error. The measurement error affects both the estimates of the model coefficients and the prediction, or residual, errors. There are some methods, such as orthogonal least squares, that are employed in situations where measurement errors exist, but these methods do not directly incorporate the magnitude of the measurement errors. This research proposes a new method, known as modified least squares, that combines the principles of least squares with knowledge about the measurement errors. This knowledge is expressed in terms of the variance ratio - the ratio of response error variance to measurement error variance.
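
    The paper's modified least squares method is not reproduced here, but the way a known variance ratio enters an errors-in-variables fit can be illustrated with the standard Deming regression estimator, whose slope depends on the ratio of response-error variance to measurement-error variance; the data below are simulated.

      import numpy as np

      def deming(x, y, delta):
          """Deming regression for variance ratio
          delta = var(response error) / var(measurement error)."""
          x, y = np.asarray(x, float), np.asarray(y, float)
          sxx, syy = np.var(x, ddof=1), np.var(y, ddof=1)
          sxy = np.cov(x, y, ddof=1)[0, 1]
          slope = (syy - delta * sxx
                   + np.sqrt((syy - delta * sxx) ** 2 + 4 * delta * sxy ** 2)) / (2 * sxy)
          return slope, y.mean() - slope * x.mean()

      rng = np.random.default_rng(7)
      truth = np.linspace(0, 10, 50)
      x = truth + rng.normal(0, 0.3, truth.size)        # factor measured with error
      y = 2.0 * truth + 1.0 + rng.normal(0, 0.6, truth.size)
      print(deming(x, y, delta=(0.6 / 0.3) ** 2))       # slope near 2, intercept near 1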

  6. Discrete emotion effects on lexical decision response times.

    PubMed

    Briesemeister, Benny B; Kuchinke, Lars; Jacobs, Arthur M

    2011-01-01

    Our knowledge about affective processes, especially concerning effects on cognitive demands like word processing, is increasing steadily. Several studies consistently document valence and arousal effects, and although there is some debate on possible interactions and different notions of valence, broad agreement on a two dimensional model of affective space has been achieved. Alternative models like the discrete emotion theory have received little interest in word recognition research so far. Using backward elimination and multiple regression analyses, we show that five discrete emotions (i.e., happiness, disgust, fear, anger and sadness) explain as much variance as two published dimensional models assuming continuous or categorical valence, with the variables happiness, disgust and fear significantly contributing to this account. Moreover, these effects even persist in an experiment with discrete emotion conditions when the stimuli are controlled for emotional valence and arousal levels. We interpret this result as evidence for discrete emotion effects in visual word recognition that cannot be explained by the two dimensional affective space account.

  7. Deployment and Alcohol Use in a Military Cohort: Use of Combined Methods to Account for Exposure-Related Covariates and Heterogeneous Response to Exposure.

    PubMed

    Fink, David S; Keyes, Katherine M; Calabrese, Joseph R; Liberzon, Israel; Tamburrino, Marijo B; Cohen, Gregory H; Sampson, Laura; Galea, Sandro

    2017-08-15

    Studies have shown that combat-area deployment is associated with increases in alcohol use; however, studying the influence of deployment on alcohol use faces 2 complications. First, the military considers a confluence of factors before determining whether to deploy a service member, creating a nonignorable exposure and unbalanced comparison groups that inevitably complicate inference about the role of deployment itself. Second, regression analysis assumes that a single effect estimate can approximate the population's change in postdeployment alcohol use, which ignores previous studies that have documented that respondents tend to exhibit heterogeneous postdeployment drinking behaviors. Therefore, we used propensity score matching to balance baseline covariates for the 2 comparison groups (deployed and nondeployed), followed by a variable-oriented difference-in-differences approach to account for the confounding and a person-oriented approach using a latent growth mixture model to account for the heterogeneous response to deployment in this prospective cohort study of the US Army National Guard (2009-2014). We observed a nonsignificant increase in estimated monthly drinks in the first year after deployment that regressed to predeployment drinking levels 2 years after deployment. We found a 4-class model that fit these data best, suggesting that common regression analyses likely conceal substantial interindividual heterogeneity in postdeployment alcohol-use behaviors. © The Author(s) 2017. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
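
    The first two steps of the combined strategy (propensity scores from baseline covariates, matching, then a difference-in-differences contrast on the matched sample) can be sketched as follows; the latent growth mixture step is omitted, and the cohort data are simulated, so this is only an outline of the approach.

      import numpy as np
      from sklearn.linear_model import LogisticRegression
      from sklearn.neighbors import NearestNeighbors

      rng = np.random.default_rng(8)
      n = 1000
      cov = rng.normal(size=(n, 3))                        # baseline covariates
      p_deploy = 1 / (1 + np.exp(-(0.6 * cov[:, 0] - 0.4 * cov[:, 2])))
      deployed = rng.binomial(1, p_deploy)
      drinks_pre = 10 + 2 * cov[:, 0] + rng.normal(0, 2, n)
      drinks_post = drinks_pre + 1.5 * deployed + rng.normal(0, 2, n)

      # 1) propensity scores from baseline covariates
      ps = LogisticRegression().fit(cov, deployed).predict_proba(cov)[:, 1]

      # 2) 1:1 nearest-neighbour matching of deployed to non-deployed on the score
      treated = np.flatnonzero(deployed == 1)
      control = np.flatnonzero(deployed == 0)
      nn = NearestNeighbors(n_neighbors=1).fit(ps[control].reshape(-1, 1))
      _, idx = nn.kneighbors(ps[treated].reshape(-1, 1))
      matched = control[idx.ravel()]

      # 3) difference-in-differences contrast on the matched sample
      did = ((drinks_post[treated] - drinks_pre[treated]).mean()
             - (drinks_post[matched] - drinks_pre[matched]).mean())
      print(f"matched difference-in-differences estimate: {did:.2f}")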

  8. Finding vulnerable subpopulations in the Seychelles Child Development Study: effect modification with latent groups.

    PubMed

    Love, Tanzy Mt; Thurston, Sally W; Davidson, Philip W

    2017-04-01

    The Seychelles Child Development Study is a research project with the objective of examining associations between prenatal exposure to low doses of methylmercury from maternal fish consumption and children's developmental outcomes. Whether methylmercury has neurotoxic effects at low doses remains unclear and recommendations for pregnant women and children to reduce fish intake may prevent a substantial number of people from receiving sufficient nutrients that are abundant in fish. The primary findings of the Seychelles Child Development Study are inconsistent with adverse associations between methylmercury from fish consumption and neurodevelopmental outcomes. However, whether there are subpopulations of children who are particularly sensitive to this diet is an open question. Secondary analysis from this study found significant interactions between prenatal methylmercury levels and both caregiver IQ and income on 19-month IQ. These results are sensitive to the categories chosen for these covariates and are difficult to interpret collectively. In this paper, we estimate effect modification of the association between prenatal methylmercury exposure and 19-month IQ using a general formulation of mixture regression. Our mixture regression model creates a latent categorical group membership variable which interacts with methylmercury in predicting the outcome. We also fit the same outcome model when in addition the latent variable is assumed to be a parametric function of three distinct socioeconomic measures. Bayesian methods allow group membership and the regression coefficients to be estimated simultaneously and our approach yields a principled choice of the number of distinct subpopulations. The results show three groups with different response patterns between prenatal methylmercury exposure and 19-month IQ in this population.

  9. Model-checking techniques based on cumulative residuals.

    PubMed

    Lin, D Y; Wei, L J; Ying, Z

    2002-03-01

    Residuals have long been used for graphical and numerical examinations of the adequacy of regression models. Conventional residual analysis based on the plots of raw residuals or their smoothed curves is highly subjective, whereas most numerical goodness-of-fit tests provide little information about the nature of model misspecification. In this paper, we develop objective and informative model-checking techniques by taking the cumulative sums of residuals over certain coordinates (e.g., covariates or fitted values) or by considering some related aggregates of residuals, such as moving sums and moving averages. For a variety of statistical models and data structures, including generalized linear models with independent or dependent observations, the distributions of these stochastic processes under the assumed model can be approximated by the distributions of certain zero-mean Gaussian processes whose realizations can be easily generated by computer simulation. Each observed process can then be compared, both graphically and numerically, with a number of realizations from the Gaussian process. Such comparisons enable one to assess objectively whether a trend seen in a residual plot reflects model misspecification or natural variation. The proposed techniques are particularly useful in checking the functional form of a covariate and the link function. Illustrations with several medical studies are provided.
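
    A stripped-down version of the idea, cumulating residuals over a covariate and comparing the observed process with multiplier (Gaussian-weighted) realizations, is sketched below. It omits the correction for parameter estimation that the full method includes, so it illustrates the construction rather than the published procedure.

      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(9)
      n = 200
      x = rng.uniform(0, 3, n)
      y = 1.0 + 0.5 * x**2 + rng.normal(0, 0.3, n)       # true relation is quadratic

      # fit a (misspecified) straight-line model and order residuals by the covariate
      fit = sm.OLS(y, sm.add_constant(x)).fit()
      e = fit.resid[np.argsort(x)]

      # observed cumulative-residual process over the covariate
      W_obs = np.cumsum(e) / np.sqrt(n)

      # multiplier realizations: residuals reweighted by independent N(0,1) draws
      # approximate the behaviour of the process under a correctly specified model
      W_sim = np.array([np.cumsum(e * rng.normal(size=n)) / np.sqrt(n)
                        for _ in range(1000)])
      p_value = np.mean(np.abs(W_sim).max(axis=1) >= np.abs(W_obs).max())
      print(f"approximate supremum-test p-value: {p_value:.3f}")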

  10. Development and validation of a shared decision-making instrument for health-related quality of life one year after total hip replacement based on quality registries data.

    PubMed

    Nemes, Szilard; Rolfson, Ola; Garellick, Göran

    2018-02-01

    Clinicians considering improvements in health-related quality of life (HRQoL) after total hip replacement (THR) must account for multiple pieces of information. Evidence-based decisions are important to best assess the effect of THR on HRQoL. This work aims at constructing a shared decision-making tool that helps clinicians assessing the future benefits of THR by offering predictions of 1-year postoperative HRQoL of THR patients. We used data from the Swedish Hip Arthroplasty Register. Data from 2008 were used as training set and data from 2009 to 2012 as validation set. We adopted two approaches. First, we assumed a continuous distribution for the EQ-5D index and modelled the postoperative EQ-5D index with regression models. Second, we modelled the five dimensions of the EQ-5D and weighted together the predictions using the UK Time Trade-Off value set. As predictors, we used preoperative EQ-5D dimensions and the EQ-5D index, EQ visual analogue scale, visual analogue scale pain, Charnley classification, age, gender, body mass index, American Society of Anesthesiologists, surgical approach and prosthesis type. Additionally, the tested algorithms were combined in a single predictive tool by stacking. Best predictive power was obtained by the multivariate adaptive regression splines (R 2  = 0.158). However, this was not significantly better than the predictive power of linear regressions (R 2  = 0.157). The stacked model had a predictive power of 17%. Successful implementation of a shared decision-making tool that can aid clinicians and patients in understanding expected improvement in HRQoL following THR would require higher predictive power than we achieved. For a shared decision-making tool to succeed, further variables, such as socioeconomics, need to be considered. © 2016 John Wiley & Sons, Ltd.

  11. Identifying indicators of illegal behaviour: carnivore killing in human-managed landscapes.

    PubMed

    St John, Freya A V; Keane, Aidan M; Edwards-Jones, Gareth; Jones, Lauren; Yarnell, Richard W; Jones, Julia P G

    2012-02-22

    Managing natural resources often depends on influencing people's behaviour; however, effectively targeting interventions to discourage environmentally harmful behaviours is challenging because those involved may be unwilling to identify themselves. Non-sensitive indicators of sensitive behaviours are therefore needed. Previous studies have investigated people's attitudes, assuming attitudes reflect behaviour. There has also been interest in using people's estimates of the proportion of their peers involved in sensitive behaviours to identify those involved, since people tend to assume that others behave like themselves. However, there has been little attempt to test the potential of such indicators. We use the randomized response technique (RRT), designed for investigating sensitive behaviours, to estimate the proportion of farmers in north-eastern South Africa killing carnivores, and use a modified logistic regression model to explore relationships between our best estimates of true behaviour (from RRT) and our proposed non-sensitive indicators (including farmers' attitudes, and estimates of peer-behaviour). Farmers' attitudes towards carnivores, question sensitivity, and estimates of peers' behaviour predict the likelihood of farmers killing carnivores. Attitude and estimates of peer-behaviour are useful indicators of involvement in illicit behaviours and may be used to identify groups of people to engage in interventions aimed at changing behaviour.
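
    For context, the basic forced-response RRT estimator of the prevalence of a sensitive behaviour (not the modified logistic regression model used in the study) can be written as below; the design probabilities and counts are hypothetical.

      import numpy as np

      def rrt_forced_response(n_yes, n_total, p_truth=0.75, p_forced_yes=0.125):
          """Prevalence estimate for a sensitive behaviour under a forced-response
          design: with probability p_truth the respondent answers truthfully, with
          probability p_forced_yes they must answer 'yes', and otherwise 'no'."""
          lam = n_yes / n_total                       # observed proportion of 'yes'
          pi_hat = (lam - p_forced_yes) / p_truth
          se = np.sqrt(lam * (1 - lam) / n_total) / p_truth
          return pi_hat, se

      print(rrt_forced_response(n_yes=230, n_total=800))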

  12. Marginalized zero-altered models for longitudinal count data.

    PubMed

    Tabb, Loni Philip; Tchetgen, Eric J Tchetgen; Wellenius, Greg A; Coull, Brent A

    2016-10-01

    Count data often exhibit more zeros than predicted by common count distributions like the Poisson or negative binomial. In recent years, there has been considerable interest in methods for analyzing zero-inflated count data in longitudinal or other correlated data settings. A common approach has been to extend zero-inflated Poisson models to include random effects that account for correlation among observations. However, these models have been shown to have a few drawbacks, including interpretability of regression coefficients and numerical instability of fitting algorithms even when the data arise from the assumed model. To address these issues, we propose a model that parameterizes the marginal associations between the count outcome and the covariates as easily interpretable log relative rates, while including random effects to account for correlation among observations. One of the main advantages of this marginal model is that it allows a basis upon which we can directly compare the performance of standard methods that ignore zero inflation with that of a method that explicitly takes zero inflation into account. We present simulations of these various model formulations in terms of bias and variance estimation. Finally, we apply the proposed approach to analyze toxicological data of the effect of emissions on cardiac arrhythmias.
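
    For comparison with the marginalized formulation proposed above, a conventional zero-inflated Poisson fit, the kind of model whose coefficients are harder to interpret marginally, can be sketched with statsmodels on simulated data (constant-only inflation part; all names hypothetical):

      import numpy as np
      import statsmodels.api as sm
      from statsmodels.discrete.count_model import ZeroInflatedPoisson

      rng = np.random.default_rng(10)
      n = 500
      x = rng.normal(size=n)
      # structural zeros with probability 0.3, otherwise Poisson counts
      structural_zero = rng.binomial(1, 0.3, n)
      counts = np.where(structural_zero == 1, 0, rng.poisson(np.exp(0.5 + 0.4 * x)))

      X = sm.add_constant(x)
      zip_fit = ZeroInflatedPoisson(counts, X, exog_infl=np.ones((n, 1))).fit(
          method="bfgs", maxiter=200, disp=0)
      print(zip_fit.params)   # inflation intercept plus Poisson-part coefficients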

  13. Marginalized zero-altered models for longitudinal count data

    PubMed Central

    Tabb, Loni Philip; Tchetgen, Eric J. Tchetgen; Wellenius, Greg A.; Coull, Brent A.

    2015-01-01

    Count data often exhibit more zeros than predicted by common count distributions like the Poisson or negative binomial. In recent years, there has been considerable interest in methods for analyzing zero-inflated count data in longitudinal or other correlated data settings. A common approach has been to extend zero-inflated Poisson models to include random effects that account for correlation among observations. However, these models have been shown to have a few drawbacks, including interpretability of regression coefficients and numerical instability of fitting algorithms even when the data arise from the assumed model. To address these issues, we propose a model that parameterizes the marginal associations between the count outcome and the covariates as easily interpretable log relative rates, while including random effects to account for correlation among observations. One of the main advantages of this marginal model is that it allows a basis upon which we can directly compare the performance of standard methods that ignore zero inflation with that of a method that explicitly takes zero inflation into account. We present simulations of these various model formulations in terms of bias and variance estimation. Finally, we apply the proposed approach to analyze toxicological data of the effect of emissions on cardiac arrhythmias. PMID:27867423

  14. Comparative study of outcome measures and analysis methods for traumatic brain injury trials.

    PubMed

    Alali, Aziz S; Vavrek, Darcy; Barber, Jason; Dikmen, Sureyya; Nathens, Avery B; Temkin, Nancy R

    2015-04-15

    Batteries of functional and cognitive measures have been proposed as alternatives to the Extended Glasgow Outcome Scale (GOSE) as the primary outcome for traumatic brain injury (TBI) trials. We evaluated several approaches to analyzing GOSE and a battery of four functional and cognitive measures. Using data from a randomized trial, we created a "super" dataset of 16,550 subjects from patients with complete data (n=331) and then simulated multiple treatment effects across multiple outcome measures. Patients were sampled with replacement (bootstrapping) to generate 10,000 samples for each treatment effect (n=400 patients/group). The percentage of samples where the null hypothesis was rejected estimates the power. All analytic techniques had appropriate rates of type I error (≤5%). Accounting for baseline prognosis either by using sliding dichotomy for GOSE or using regression-based methods substantially increased the power over the corresponding analysis without accounting for prognosis. Analyzing GOSE using multivariate proportional odds regression or analyzing the four-outcome battery with regression-based adjustments had the highest power, assuming equal treatment effect across all components. Analyzing GOSE using a fixed dichotomy provided the lowest power for both unadjusted and regression-adjusted analyses. We assumed an equal treatment effect for all measures. This may not be true in an actual clinical trial. Accounting for baseline prognosis is critical to attaining high power in Phase III TBI trials. The choice of primary outcome for future trials should be guided by power, the domain of brain function that an intervention is likely to impact, and the feasibility of collecting outcome data.
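
    The contrast between an ordinal (proportional odds) analysis of a GOSE-like outcome with adjustment for baseline prognosis and a fixed-dichotomy logistic analysis can be sketched as follows, assuming statsmodels' OrderedModel for the proportional odds fit; the outcome construction and effect sizes are invented.

      import numpy as np
      import pandas as pd
      import statsmodels.api as sm
      from statsmodels.miscmodels.ordinal_model import OrderedModel

      rng = np.random.default_rng(11)
      n = 400
      treat = rng.binomial(1, 0.5, n)
      baseline = rng.normal(size=n)                        # baseline prognosis score
      latent = 0.3 * treat + 0.8 * baseline + rng.logistic(size=n)
      gose = pd.cut(latent, bins=[-np.inf, -1, 0, 1, 2, np.inf], labels=False) + 1

      X = pd.DataFrame({"treat": treat, "baseline": baseline})

      # proportional-odds (ordinal logistic) analysis adjusting for prognosis
      ordinal = OrderedModel(pd.Categorical(gose, ordered=True), X, distr="logit")
      print(ordinal.fit(method="bfgs", disp=0).params)

      # fixed-dichotomy analysis: favourable outcome above a single cut-point
      favourable = (gose >= 4).astype(int)
      print(sm.Logit(favourable, sm.add_constant(X)).fit(disp=0).params)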

  15. Comparative Study of Outcome Measures and Analysis Methods for Traumatic Brain Injury Trials

    PubMed Central

    Alali, Aziz S.; Vavrek, Darcy; Barber, Jason; Dikmen, Sureyya; Nathens, Avery B.

    2015-01-01

    Abstract Batteries of functional and cognitive measures have been proposed as alternatives to the Extended Glasgow Outcome Scale (GOSE) as the primary outcome for traumatic brain injury (TBI) trials. We evaluated several approaches to analyzing GOSE and a battery of four functional and cognitive measures. Using data from a randomized trial, we created a “super” dataset of 16,550 subjects from patients with complete data (n=331) and then simulated multiple treatment effects across multiple outcome measures. Patients were sampled with replacement (bootstrapping) to generate 10,000 samples for each treatment effect (n=400 patients/group). The percentage of samples where the null hypothesis was rejected estimates the power. All analytic techniques had appropriate rates of type I error (≤5%). Accounting for baseline prognosis either by using sliding dichotomy for GOSE or using regression-based methods substantially increased the power over the corresponding analysis without accounting for prognosis. Analyzing GOSE using multivariate proportional odds regression or analyzing the four-outcome battery with regression-based adjustments had the highest power, assuming equal treatment effect across all components. Analyzing GOSE using a fixed dichotomy provided the lowest power for both unadjusted and regression-adjusted analyses. We assumed an equal treatment effect for all measures. This may not be true in an actual clinical trial. Accounting for baseline prognosis is critical to attaining high power in Phase III TBI trials. The choice of primary outcome for future trials should be guided by power, the domain of brain function that an intervention is likely to impact, and the feasibility of collecting outcome data. PMID:25317951

  16. Indirect Validation of Probe Speed Data on Arterial Corridors

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Eshragh, Sepideh; Young, Stanley E.; Sharifi, Elham

    This study aimed to estimate the accuracy of probe speed data on arterial corridors on the basis of roadway geometric attributes and functional classification. It was assumed that functional class (medium and low) along with other road characteristics (such as weighted average of the annual average daily traffic, average signal density, average access point density, and average speed) were available as correlation factors to estimate the accuracy of probe traffic data. This study tested these factors as predictors of the fidelity of probe traffic data by using the results of an extensive validation exercise. This study showed strong correlations between these geometric attributes and the accuracy of probe data when they were assessed by using average absolute speed error. Linear models were regressed to existing data to estimate appropriate models for medium- and low-type arterial corridors. The proposed models for medium- and low-type arterials were validated further on the basis of the results of a slowdown analysis. These models can be used to predict the accuracy of probe data indirectly in medium and low types of arterial corridors.

  17. Global distribution of urban parameters derived from high-resolution global datasets for weather modelling

    NASA Astrophysics Data System (ADS)

    Kawano, N.; Varquez, A. C. G.; Dong, Y.; Kanda, M.

    2016-12-01

    Numerical models such as the Weather Research and Forecasting model coupled with a single-layer Urban Canopy Model (WRF-UCM) are powerful tools for investigating the urban heat island. Urban parameters such as average building height (Have), plan area index (λp) and frontal area index (λf) are necessary inputs for the model. In general, these parameters are uniformly assumed in WRF-UCM, but this leads to an unrealistic urban representation. Distributed urban parameters can also be incorporated into WRF-UCM to represent urban effects in detail. The problem is that distributed building information is not readily available for most megacities, especially in developing countries. Furthermore, acquiring real building parameters often requires a huge amount of time and money. In this study, we investigated the potential of using globally available satellite-captured datasets for the estimation of the parameters Have, λp, and λf. The global datasets comprised a high-spatial-resolution population dataset (LandScan by Oak Ridge National Laboratory), nighttime lights (NOAA), and vegetation fraction (NASA). True samples of Have, λp, and λf were acquired from actual building footprints from satellite images and 3D building databases of Tokyo, New York, Paris, Melbourne, Istanbul, Jakarta and so on. Regression equations were then derived from the block-averaging of spatial pairs of real parameters and global datasets. Results show that two regression curves to estimate Have and λf from the combination of population and nightlight are necessary depending on the city's level of development. An index which can be used to decide which equation to use for a city is the Gross Domestic Product (GDP). On the other hand, λp has less dependence on GDP but indicated a negative relationship to vegetation fraction. Finally, a simplified but precise approximation of urban parameters through readily-available, high-resolution global datasets and our derived regressions can be utilized to estimate a global distribution of urban parameters for later incorporation into a weather model, thus allowing us to acquire a global understanding of urban climate (Global Urban Climatology). Acknowledgment: This research was supported by the Environment Research and Technology Development Fund (S-14) of the Ministry of the Environment, Japan.

  18. Obesity and severe obesity forecasts through 2030.

    PubMed

    Finkelstein, Eric A; Khavjou, Olga A; Thompson, Hope; Trogdon, Justin G; Pan, Liping; Sherry, Bettylou; Dietz, William

    2012-06-01

    Previous efforts to forecast future trends in obesity applied linear forecasts assuming that the rise in obesity would continue unabated. However, evidence suggests that obesity prevalence may be leveling off. This study presents estimates of adult obesity and severe obesity prevalence through 2030 based on nonlinear regression models. The forecasted results are then used to simulate the savings that could be achieved through modestly successful obesity prevention efforts. The study was conducted in 2009-2010 and used data from the 1990 through 2008 Behavioral Risk Factor Surveillance System (BRFSS). The analysis sample included nonpregnant adults aged ≥ 18 years. The individual-level BRFSS variables were supplemented with state-level variables from the U.S. Bureau of Labor Statistics, the American Chamber of Commerce Research Association, and the Census of Retail Trade. Future obesity and severe obesity prevalence were estimated through regression modeling by projecting trends in explanatory variables expected to influence obesity prevalence. Linear time trend forecasts suggest that by 2030, 51% of the population will be obese. The model estimates a much lower obesity prevalence of 42% and severe obesity prevalence of 11%. If obesity were to remain at 2010 levels, the combined savings in medical expenditures over the next 2 decades would be $549.5 billion. The study estimates a 33% increase in obesity prevalence and a 130% increase in severe obesity prevalence over the next 2 decades. If these forecasts prove accurate, this will further hinder efforts for healthcare cost containment. Copyright © 2012 Elsevier Inc. All rights reserved.

  19. Robust stochastic resonance: Signal detection and adaptation in impulsive noise

    NASA Astrophysics Data System (ADS)

    Kosko, Bart; Mitaim, Sanya

    2001-11-01

    Stochastic resonance (SR) occurs when noise improves a system performance measure such as a spectral signal-to-noise ratio or a cross-correlation measure. All SR studies have assumed that the forcing noise has finite variance. Most have further assumed that the noise is Gaussian. We show that SR still occurs for the more general case of impulsive or infinite-variance noise. The SR effect fades as the noise grows more impulsive. We study this fading effect on the family of symmetric α-stable bell curves that includes the Gaussian bell curve as a special case. These bell curves have thicker tails as the parameter α falls from 2 (the Gaussian case) to 1 (the Cauchy case) to even lower values. Thicker tails create more frequent and more violent noise impulses. The main feedback and feedforward models in the SR literature show this fading SR effect for periodic forcing signals when we plot either the signal-to-noise ratio or a signal correlation measure against the dispersion of the α-stable noise. Linear regression shows that an exponential law γ_opt(α) = cA^α describes this relation between the impulsive index α and the SR-optimal noise dispersion γ_opt. The results show that SR is robust against noise "outliers." So SR may be more widespread in nature than previously believed. Such robustness also favors the use of SR in engineering systems. We further show that an adaptive system can learn the optimal noise dispersion for two standard SR models (the quartic bistable model and the FitzHugh-Nagumo neuron model) for the signal-to-noise ratio performance measure. This also favors practical applications of SR and suggests that evolution may have tuned the noise-sensitive parameters of biological systems.

  20. Modelling of the acid-base properties of natural and synthetic adsorbent materials used for heavy metal removal from aqueous solutions.

    PubMed

    Pagnanelli, Francesca; Vegliò, Francesco; Toro, Luigi

    2004-02-01

    In this paper, a comparison of kinetic behaviour, acid-base properties and copper removal capacities was carried out between two different adsorbent materials used for heavy metal removal from aqueous solutions: a commercial aminodiacetic chelating resin (Lewatit TP207) and a lyophilised bacterial biomass of Sphaerotilus natans. The acid-base properties of a S. natans cell suspension were well described by simplified mechanistic models without electrostatic corrections, considering two kinds of weakly acidic active sites. In particular, the introduction of a two-peak distribution function for the proton affinity constants allows a better representation of the experimental data, reproducing the site heterogeneity. A priori knowledge about the resin functional groups (aminodiacetic groups) is the basis for preliminary simulations of the titration curve, assuming a Donnan gel structure for the resin phase, considered as a concentrated aqueous solution of aminodiacetic acid (ADA). Departures between experimental and simulated data can be interpreted by considering the heterogeneity of the functional groups and the effect of ionic concentration in the resin phase. A two-site continuous model describes the experimental data adequately. Moreover, the values of the apparent protonation constants (as adjustable parameters found by non-linear regression) are very near to the apparent constants evaluated by a Donnan model assuming the intrinsic constants in the resin phase equal to the equilibrium constants in aqueous solution of ADA and considering the amphoteric nature of the active sites for the evaluation of counter-ion concentration in the resin phase. Copper removal experiments highlighted the strong affinity of the active groups of the resin for this ion in solution compared to the S. natans biomass, in accordance with the complexation constants between aminodiacetic and mono-carboxylic groups and copper ions.

  1. Analysis of Point Based Image Registration Errors With Applications in Single Molecule Microscopy

    PubMed Central

    Cohen, E. A. K.; Ober, R. J.

    2014-01-01

    We present an asymptotic treatment of errors involved in point-based image registration where control point (CP) localization is subject to heteroscedastic noise; a suitable model for image registration in fluorescence microscopy. Assuming an affine transform, CPs are used to solve a multivariate regression problem. With measurement errors existing for both sets of CPs this is an errors-in-variable problem and linear least squares is inappropriate; the correct method being generalized least squares. To allow for point dependent errors the equivalence of a generalized maximum likelihood and heteroscedastic generalized least squares model is achieved allowing previously published asymptotic results to be extended to image registration. For a particularly useful model of heteroscedastic noise where covariance matrices are scalar multiples of a known matrix (including the case where covariance matrices are multiples of the identity) we provide closed form solutions to estimators and derive their distribution. We consider the target registration error (TRE) and define a new measure called the localization registration error (LRE) believed to be useful, especially in microscopy registration experiments. Assuming Gaussianity of the CP localization errors, it is shown that the asymptotic distribution for the TRE and LRE are themselves Gaussian and the parameterized distributions are derived. Results are successfully applied to registration in single molecule microscopy to derive the key dependence of the TRE and LRE variance on the number of CPs and their associated photon counts. Simulations show asymptotic results are robust for low CP numbers and non-Gaussianity. The method presented here is shown to outperform GLS on real imaging data. PMID:24634573
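
    A minimal sketch of the scalar-covariance case, where each control point carries an inverse-variance weight (e.g. proportional to its photon count) and the affine transform is obtained by weighted least squares; the point data, weights and noise model are illustrative assumptions, and the paper's full GLS treatment is not reproduced here.

```python
import numpy as np

def fit_affine_wls(src, dst, weights):
    """Weighted least-squares affine fit dst ~ A @ src + t.

    src, dst: (n, 2) control point coordinates in the two images.
    weights:  (n,) inverse-variance weights (scalar covariance case).
    """
    X = np.hstack([src, np.ones((len(src), 1))])        # design matrix [x, y, 1]
    W = np.sqrt(weights)[:, None]
    beta, *_ = np.linalg.lstsq(W * X, W * dst, rcond=None)
    A, t = beta[:2].T, beta[2]
    return A, t

rng = np.random.default_rng(0)
src = rng.uniform(0, 100, size=(20, 2))
A_true, t_true = np.array([[1.01, 0.02], [-0.015, 0.99]]), np.array([3.0, -2.0])
photons = rng.integers(200, 2000, size=20)              # localization precision scales with photon count
dst = src @ A_true.T + t_true + rng.normal(0, 1, (20, 2)) / np.sqrt(photons)[:, None]

A_hat, t_hat = fit_affine_wls(src, dst, weights=photons.astype(float))
print(A_hat, t_hat)
```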

  2. Sources of Sodium in the Lunar Exosphere: Modeling Using Ground-Based Observations of Sodium Emission and Spacecraft Data of the Plasma

    NASA Technical Reports Server (NTRS)

    Sarantos, Menelaos; Killen, Rosemary M.; Sharma, A. Surjalal; Slavin, James A.

    2009-01-01

    Observations of the equatorial lunar sodium emission are examined to quantify the effect of precipitating ions on source rates for the Moon's exospheric volatile species. Using a model of exospheric sodium transport under lunar gravity forces, the measured emission intensity is normalized to a constant lunar phase angle to minimize the effect of different viewing geometries. Daily averages of the solar Lyman alpha flux and ion flux are used as the input variables for photon-stimulated desorption (PSD) and ion sputtering, respectively, while impact vaporization due to the micrometeoritic influx is assumed constant. Additionally, a proxy term proportional to both the Lyman alpha and to the ion flux is introduced to assess the importance of ion-enhanced diffusion and/or chemical sputtering. The combination of particle transport and constrained regression models demonstrates that, assuming sputtering yields that are typical of protons incident on lunar soils, the primary effect of ion impact on the surface of the Moon is not direct sputtering but rather an enhancement of the PSD efficiency. It is inferred that the ion-induced effects must double the PSD efficiency for flux typical of the solar wind at 1 AU. The enhancement in relative efficiency of PSD due to the bombardment of the lunar surface by the plasma sheet ions during passages through the Earth's magnetotail is shown to be approximately two times higher than when it is due to solar wind ions. This leads to the conclusion that the priming of the surface is more efficiently carried out by the energetic plasma sheet ions.
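
    A hedged sketch of a constrained regression of emission intensity on the candidate source terms, using non-negative least squares to keep source rates physical; the non-negativity constraint, predictor names and synthetic data are assumptions for illustration only.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(1)
n = 200
lyman_alpha = rng.uniform(0.8, 1.2, n)     # normalized photon flux (PSD driver)
ion_flux = rng.uniform(0.0, 2.0, n)        # normalized precipitating ion flux (sputtering)
proxy = lyman_alpha * ion_flux             # ion-enhanced PSD / chemical sputtering proxy term
vapor = np.ones(n)                         # micrometeoritic impact vaporization, held constant

X = np.column_stack([lyman_alpha, ion_flux, proxy, vapor])
true_coefs = np.array([1.0, 0.0, 0.9, 0.3])           # illustrative: direct sputtering negligible
emission = X @ true_coefs + rng.normal(0, 0.05, n)    # normalized Na emission intensity

coefs, residual = nnls(X, emission)        # source rates constrained to be non-negative
print(dict(zip(["PSD", "sputtering", "ion-enhanced PSD", "vaporization"], coefs)))
```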

  3. Analysis of interacting entropy-corrected holographic and new agegraphic dark energies

    NASA Astrophysics Data System (ADS)

    Ranjit, Chayan; Debnath, Ujjal

    In the present work, we assume a flat FRW universe filled with interacting dark matter and dark energy. For the dark energy, we consider the entropy-corrected holographic dark energy (ECHDE) model and the entropy-corrected new agegraphic dark energy (ECNADE) model, assuming both logarithmic and power-law corrections. For the ECHDE model, the length scale L is taken to be either the Hubble horizon or the future event horizon. The ωde-ωde′ analysis for the different horizons is discussed.

  4. Gross regional domestic product estimation: Application of two-way unbalanced panel data models to economic growth in East Nusa Tenggara province

    NASA Astrophysics Data System (ADS)

    Wibowo, Wahyu; Sinu, Elisabeth B.; Setiawan

    2017-03-01

    East Nusa Tenggara Province has recently formed new districts, so the amount of data collected varies across districts and the panel is unbalanced. Ignoring this incompleteness can render the estimators invalid, so the analysis of unbalanced panel data is crucial. The aim of this paper is to estimate Gross Regional Domestic Product in East Nusa Tenggara Province using an unbalanced panel data regression model with a two-way error component, assuming a random effects model (REM). We employ Feasible Generalized Least Squares (FGLS) to estimate the regression coefficients. Since the variance components of the model are unknown, the ANOVA method is used to obtain them and construct the variance-covariance matrix. The data used in this research are secondary data taken from the Central Bureau of Statistics of East Nusa Tenggara Province for 21 districts over the period 2004-2013. The predictors are the number of workers over 15 years old (X1), the electrification ratio (X2), and local revenues (X3), while Gross Regional Domestic Product at constant 2000 prices is the response (Y). The FGLS estimation gives an R² of 80.539%, and all chosen predictors significantly affect (α = 5%) the Gross Regional Domestic Product in all districts of East Nusa Tenggara Province, with elasticities of 0.22986, 0.090476, and 0.14749 for X1, X2, and X3, respectively.
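
    A minimal sketch of a random-effects panel fit in Python, using a one-way random intercept per district estimated by maximum likelihood as a simplified stand-in for the paper's two-way FGLS with ANOVA variance components; the synthetic panel and column names are illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic unbalanced district-year panel (column names are illustrative).
rng = np.random.default_rng(5)
rows = []
for d in range(21):
    years = range(2004 + (3 if d >= 16 else 0), 2014)   # newer districts enter later -> unbalanced
    effect = rng.normal(0, 0.2)                          # district random effect
    for yr in years:
        x1, x2, x3 = rng.normal(size=3)
        rows.append(dict(district=d, year=yr, X1=x1, X2=x2, X3=x3,
                         Y=0.23 * x1 + 0.09 * x2 + 0.15 * x3 + effect + rng.normal(0, 0.1)))
df = pd.DataFrame(rows)

# One-way random intercept per district, estimated by ML; a simplified stand-in
# for the two-way error-component FGLS used in the paper.
re_fit = smf.mixedlm("Y ~ X1 + X2 + X3", data=df, groups=df["district"]).fit()
print(re_fit.params)
```

    Log-transforming Y and the predictors before fitting would yield elasticities directly, as reported in the paper.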

  5. Semiparametric temporal process regression of survival-out-of-hospital.

    PubMed

    Zhan, Tianyu; Schaubel, Douglas E

    2018-05-23

    The recurrent/terminal event data structure has undergone considerable methodological development in the last 10-15 years. An example of the data structure that has arisen with increasing frequency involves the recurrent event being hospitalization and the terminal event being death. We consider the response Survival-Out-of-Hospital, defined as a temporal process (indicator function) taking the value 1 when the subject is currently alive and not hospitalized, and 0 otherwise. Survival-Out-of-Hospital is a useful alternative strategy for the analysis of hospitalization/survival in the chronic disease setting, with the response variate representing a refinement to survival time through the incorporation of an objective quality-of-life component. The semiparametric model we consider assumes multiplicative covariate effects and leaves unspecified the baseline probability of being alive-and-out-of-hospital. Using zero-mean estimating equations, the proposed regression parameter estimator can be computed without estimating the unspecified baseline probability process, although baseline probabilities can subsequently be estimated for any time point within the support of the censoring distribution. We demonstrate that the regression parameter estimator is asymptotically normal, and that the baseline probability function estimator converges to a Gaussian process. Simulation studies are performed to show that our estimating procedures have satisfactory finite sample performances. The proposed methods are applied to the Dialysis Outcomes and Practice Patterns Study (DOPPS), an international end-stage renal disease study.

  6. Employer-Sponsored Insurance, Health Care Cost Growth, and the Economic Performance of U.S. Industries

    PubMed Central

    Sood, Neeraj; Ghosh, Arkadipta; Escarce, José J

    2009-01-01

    Objective To estimate the effect of growth in health care costs that outpaces gross domestic product (GDP) growth (“excess” growth in health care costs) on employment, gross output, and value added to GDP of U.S. industries. Study Setting We analyzed data from 38 U.S. industries for the period 1987–2005. All data are publicly available from various government agencies. Study Design We estimated bivariate and multivariate regressions. To develop the regression models, we assumed that rapid growth in health care costs has a larger effect on economic performance for industries where large percentages of workers receive employer-sponsored health insurance (ESI). We used the estimated regression coefficients to simulate economic outcomes under alternative scenarios of health care cost inflation. Results Faster growth in health care costs had greater adverse effects on economic outcomes for industries with larger percentages of workers who had ESI. We found that a 10 percent increase in excess growth in health care costs would have resulted in 120,803 fewer jobs, US$28,022 million in lost gross output, and US$14,082 million in lost value added in 2005. These declines represent 0.17 to 0.18 percent of employment, gross output, and value added in 2005. Conclusion Excess growth in health care costs is adversely affecting the economic performance of U.S. industries. PMID:19500165

  7. Global Food Demand Scenarios for the 21st Century

    PubMed Central

    Biewald, Anne; Weindl, Isabelle; Popp, Alexander; Lotze-Campen, Hermann

    2015-01-01

    Long-term food demand scenarios are an important tool for studying global food security and for analysing the environmental impacts of agriculture. We provide a simple and transparent method to create scenarios for future plant-based and animal-based calorie demand, using time-dependent regression models between calorie demand and income. The scenarios can be customized to a specific storyline by using different input data for gross domestic product (GDP) and population projections and by assuming different functional forms of the regressions. Our results confirm that total calorie demand increases with income, but we also found a non-income related positive time-trend. The share of animal-based calories is estimated to rise strongly with income for low-income groups. For high income groups, two ambiguous relations between income and the share of animal-based products are consistent with historical data: First, a positive relation with a strong negative time-trend and second a negative relation with a slight negative time-trend. The fits of our regressions are highly significant and our results compare well to other food demand estimates. The method is exemplarily used to construct four food demand scenarios until the year 2100 based on the storylines of the IPCC Special Report on Emissions Scenarios (SRES). We find in all scenarios a strong increase of global food demand until 2050 with an increasing share of animal-based products, especially in developing countries. PMID:26536124

  8. Global Food Demand Scenarios for the 21st Century.

    PubMed

    Bodirsky, Benjamin Leon; Rolinski, Susanne; Biewald, Anne; Weindl, Isabelle; Popp, Alexander; Lotze-Campen, Hermann

    2015-01-01

    Long-term food demand scenarios are an important tool for studying global food security and for analysing the environmental impacts of agriculture. We provide a simple and transparent method to create scenarios for future plant-based and animal-based calorie demand, using time-dependent regression models between calorie demand and income. The scenarios can be customized to a specific storyline by using different input data for gross domestic product (GDP) and population projections and by assuming different functional forms of the regressions. Our results confirm that total calorie demand increases with income, but we also found a non-income related positive time-trend. The share of animal-based calories is estimated to rise strongly with income for low-income groups. For high income groups, two ambiguous relations between income and the share of animal-based products are consistent with historical data: First, a positive relation with a strong negative time-trend and second a negative relation with a slight negative time-trend. The fits of our regressions are highly significant and our results compare well to other food demand estimates. The method is exemplarily used to construct four food demand scenarios until the year 2100 based on the storylines of the IPCC Special Report on Emissions Scenarios (SRES). We find in all scenarios a strong increase of global food demand until 2050 with an increasing share of animal-based products, especially in developing countries.
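
    A hedged sketch of a time-dependent calorie-demand regression on income, with a log-income term and a linear time trend; the functional form, coefficients and synthetic data are illustrative assumptions rather than the authors' estimated specification.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 300
year = rng.integers(1965, 2011, n)
gdp_pc = rng.lognormal(mean=9.0, sigma=1.0, size=n)          # GDP per capita (constant USD)
kcal = 1800 + 320 * np.log(gdp_pc / 1000) + 2.5 * (year - 1965) + rng.normal(0, 80, n)

# Regression of per-capita calorie demand on log income plus a time trend.
X = sm.add_constant(np.column_stack([np.log(gdp_pc), year - 1965]))
fit = sm.OLS(kcal, X).fit()
print(fit.params)           # intercept, income slope, time trend

# Scenario use: plug projected GDP and year into the fitted curve.
gdp_2050, t_2050 = 25000.0, 2050 - 1965
demand_2050 = fit.params @ np.array([1.0, np.log(gdp_2050), t_2050])
print(f"projected per-capita demand in 2050: {demand_2050:.0f} kcal/day")
```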

  9. Outliers: A Potential Data Problem.

    ERIC Educational Resources Information Center

    Douzenis, Cordelia; Rakow, Ernest A.

    Outliers, extreme data values relative to others in a sample, may distort statistics that assume interval levels of measurement and normal distribution. The outlier may be a valid value or an error. Several procedures are available for identifying outliers, and each may be applied to errors of prediction from the regression lines for utility in a…

  10. Probabilistic Low-Rank Multitask Learning.

    PubMed

    Kong, Yu; Shao, Ming; Li, Kang; Fu, Yun

    2018-03-01

    In this paper, we consider the problem of learning multiple related tasks simultaneously with the goal of improving the generalization performance of individual tasks. The key challenge is to effectively exploit the shared information across multiple tasks as well as preserve the discriminative information for each individual task. To address this, we propose a novel probabilistic model for multitask learning (MTL) that can automatically balance between low-rank and sparsity constraints. The former assumes a low-rank structure of the underlying predictive hypothesis space to explicitly capture the relationship of different tasks and the latter learns the incoherent sparse patterns private to each task. We derive and perform inference via variational Bayesian methods. Experimental results on both regression and classification tasks on real-world applications demonstrate the effectiveness of the proposed method in dealing with the MTL problems.

  11. Modeling Array Stations in SIG-VISA

    NASA Astrophysics Data System (ADS)

    Ding, N.; Moore, D.; Russell, S.

    2013-12-01

    We add support for array stations to SIG-VISA, a system for nuclear monitoring using probabilistic inference on seismic signals. Array stations comprise a large portion of the IMS network; they can provide increased sensitivity and more accurate directional information compared to single-component stations. Our existing model assumed that signals were independent at each station, which is false when many stations are close together, as in an array. The new model removes that assumption by jointly modeling signals across array elements. This is done by extending our existing Gaussian process (GP) regression models, also known as kriging, from a 3-dimensional single-component space of events to a 6-dimensional space of station-event pairs. For each array and each event attribute (including coda decay, coda height, amplitude transfer and travel time), we model the joint distribution across array elements using a Gaussian process that learns the correlation lengthscale across the array, thereby incorporating information of array stations into the probabilistic inference framework. To evaluate the effectiveness of our model, we perform 'probabilistic beamforming' on new events using our GP model, i.e., we compute the event azimuth having highest posterior probability under the model, conditioned on the signals at array elements. We compare the results from our probabilistic inference model to the beamforming currently performed by IMS station processing.

  12. Comparison of two landslide susceptibility assessments in the Champagne-Ardenne region (France)

    NASA Astrophysics Data System (ADS)

    Den Eeckhaut, M. Van; Marre, A.; Poesen, J.

    2010-02-01

    The vineyards of the Montagne de Reims are mostly planted on steep south-oriented cuesta fronts receiving a maximum of sun radiation. Due to the location of the vineyards on steep hillslopes, the viticultural activity is threatened by slope failures. This study attempts to better understand the spatial patterns of landslide susceptibility in the Champagne-Ardenne region by comparing a heuristic (qualitative) and a statistical (quantitative) model in a 1120 km² study area. The heuristic landslide susceptibility model was adopted from the Bureau de Recherches Géologiques et Minières, the GEGEAA - Reims University and the Comité Interprofessionnel du Vin de Champagne. In this model, expert knowledge of the region was used to assign weights to all slope classes and lithologies present in the area, but the final susceptibility map was never evaluated with the location of mapped landslides. For the statistical landslide susceptibility assessment, logistic regression was applied to a dataset of 291 'old' (Holocene) landslides. The robustness of the logistic regression model was evaluated and ROC curves were used for model calibration and validation. With regard to the variables assumed to be important environmental factors controlling landslides, the two models are in agreement. They both indicate that present and future landslides are mainly controlled by slope gradient and lithology. However, the comparison of the two landslide susceptibility maps through (1) an evaluation with the location of mapped 'old' landslides and through (2) a temporal validation with spatial data of 'recent' (1960-1999; n = 48) and 'very recent' (2000-2008; n = 46) landslides showed a better prediction capacity for the statistical model produced in this study compared to the heuristic model. In total, the statistically-derived landslide susceptibility map succeeded in correctly classifying 81.0% of the 'old' and 91.6% of the 'recent' and 'very recent' landslides. On the susceptibility map derived from the heuristic model, on the other hand, only 54.6% of the 'old' and 64.0% of the 'recent' and 'very recent' landslides were correctly classified as unstable. Hence, the landslide susceptibility map obtained from logistic regression is a better tool for regional landslide susceptibility analysis in the study area of the Montagne de Reims. The accurate classification of zones with very high and high susceptibility allows delineating zones where viticulturists should be informed and where implementation of precaution measures is needed to secure slope stability.
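
    A minimal sketch of the statistical workflow, assuming a logistic regression of landslide presence on slope gradient and lithology followed by ROC validation; the predictors, class labels and data are synthetic placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n = 2000
slope = rng.uniform(0, 35, n)                          # slope gradient (degrees)
lithology = rng.integers(0, 3, n)                      # three lithology classes (illustrative)
logit = -5 + 0.18 * slope + 0.8 * (lithology == 1)
landslide = rng.random(n) < 1 / (1 + np.exp(-logit))   # presence/absence of mapped landslides

X = np.column_stack([slope, lithology == 1, lithology == 2])
X_cal, X_val, y_cal, y_val = train_test_split(X, landslide, test_size=0.3, random_state=0)

clf = LogisticRegression().fit(X_cal, y_cal)
susceptibility = clf.predict_proba(X_val)[:, 1]        # susceptibility score per grid cell
print("validation AUC:", roc_auc_score(y_val, susceptibility))
```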

  13. A computational model for biosonar echoes from foliage

    PubMed Central

    Gupta, Anupam Kumar; Lu, Ruijin; Zhu, Hongxiao

    2017-01-01

    Since many bat species thrive in densely vegetated habitats, echoes from foliage are likely to be of prime importance to the animals’ sensory ecology, be it as clutter that masks prey echoes or as sources of information about the environment. To better understand the characteristics of foliage echoes, a new model for the process that generates these signals has been developed. This model takes leaf size and orientation into account by representing the leaves as circular disks of varying diameter. The two added leaf parameters are of potential importance to the sensory ecology of bats, e.g., with respect to landmark recognition and flight guidance along vegetation contours. The full model is specified by a total of three parameters: leaf density, average leaf size, and average leaf orientation. It assumes that all leaf parameters are independently and identically distributed. Leaf positions were drawn from a uniform probability density function, sizes and orientations each from a Gaussian probability function. The model was found to reproduce the first-order amplitude statistics of measured example echoes and showed time-variant echo properties that depended on foliage parameters. Parameter estimation experiments using lasso regression have demonstrated that a single foliage parameter can be estimated with high accuracy if the other two parameters are known a priori. If only one parameter is known a priori, the other two can still be estimated, but with a reduced accuracy. Lasso regression did not support simultaneous estimation of all three parameters. Nevertheless, these results demonstrate that foliage echoes contain accessible information on foliage type and orientation that could play a role in supporting sensory tasks such as landmark identification and contour following in echolocating bats. PMID:28817631

  14. A computational model for biosonar echoes from foliage.

    PubMed

    Ming, Chen; Gupta, Anupam Kumar; Lu, Ruijin; Zhu, Hongxiao; Müller, Rolf

    2017-01-01

    Since many bat species thrive in densely vegetated habitats, echoes from foliage are likely to be of prime importance to the animals' sensory ecology, be it as clutter that masks prey echoes or as sources of information about the environment. To better understand the characteristics of foliage echoes, a new model for the process that generates these signals has been developed. This model takes leaf size and orientation into account by representing the leaves as circular disks of varying diameter. The two added leaf parameters are of potential importance to the sensory ecology of bats, e.g., with respect to landmark recognition and flight guidance along vegetation contours. The full model is specified by a total of three parameters: leaf density, average leaf size, and average leaf orientation. It assumes that all leaf parameters are independently and identically distributed. Leaf positions were drawn from a uniform probability density function, sizes and orientations each from a Gaussian probability function. The model was found to reproduce the first-order amplitude statistics of measured example echoes and showed time-variant echo properties that depended on foliage parameters. Parameter estimation experiments using lasso regression have demonstrated that a single foliage parameter can be estimated with high accuracy if the other two parameters are known a priori. If only one parameter is known a priori, the other two can still be estimated, but with a reduced accuracy. Lasso regression did not support simultaneous estimation of all three parameters. Nevertheless, these results demonstrate that foliage echoes contain accessible information on foliage type and orientation that could play a role in supporting sensory tasks such as landmark identification and contour following in echolocating bats.
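
    A hedged sketch of the parameter estimation idea, regressing one foliage parameter on echo features with the lasso; the feature set and data are synthetic stand-ins for the echo descriptors used in the study.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
n_echoes, n_features = 500, 60
echo_features = rng.normal(size=(n_echoes, n_features))     # stand-in for echo descriptors
true_weights = np.zeros(n_features)
true_weights[:5] = [1.2, -0.8, 0.6, 0.4, -0.3]              # only a few features matter
leaf_size = echo_features @ true_weights + rng.normal(0, 0.2, n_echoes)  # mean leaf size (a.u.)

X_tr, X_te, y_tr, y_te = train_test_split(echo_features, leaf_size, random_state=0)
lasso = Lasso(alpha=0.05).fit(X_tr, y_tr)
print("test R^2:", lasso.score(X_te, y_te))
print("features retained:", np.count_nonzero(lasso.coef_))
```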

  15. Paleoclimatic significance of δD and δ13C values in pinon pine needles from packrat middens spanning the last 40,000 years

    USGS Publications Warehouse

    Pendall, Elise; Betancourt, Julio L.; Leavitt, Steven W.

    1999-01-01

    We compared two approaches to interpreting δD of cellulose nitrate in piñon pine needles (Pinus edulis) preserved in packrat middens from central New Mexico, USA. One approach was based on linear regression between modern δD values and climate parameters, and the other on a deterministic isotope model, modified from Craig and Gordon's terminal lake evaporation model that assumes steady-state conditions and constant isotope effects. One such effect, the net biochemical fractionation factor, was determined for a new species, piñon pine. Regressions showed that δD values in cellulose nitrate from annual cohorts of needles (1989–1996) were strongly correlated with growing season (May–August) precipitation amount, and δ13C values in the same samples were correlated with June relative humidity. The deterministic model reconstructed δD values of meteoric water used by plants after constraining relative humidity effects with δ13C values; growing season temperatures were estimated via modern correlations with δD values of meteoric water. Variations of this modeling approach have been applied to tree-ring cellulose before, but not to macrofossil cellulose, and comparisons to empirical relationships have not been provided. Results from fossil piñon needles spanning the last ∼40,000 years showed no significant trend in δD values of cellulose nitrate, suggesting either no change in the amount of summer precipitation (based on the transfer function) or δD values of meteoric water or temperature (based on the deterministic model). However, there were significant differences in δ13C values, and therefore relative humidity, between Pleistocene and Holocene.

  16. Smooth individual level covariates adjustment in disease mapping.

    PubMed

    Huque, Md Hamidul; Anderson, Craig; Walton, Richard; Woolford, Samuel; Ryan, Louise

    2018-05-01

    Spatial models for disease mapping should ideally account for covariates measured both at individual and area levels. The newly available "indiCAR" model fits the popular conditional autoregresssive (CAR) model by accommodating both individual and group level covariates while adjusting for spatial correlation in the disease rates. This algorithm has been shown to be effective but assumes log-linear associations between individual level covariates and outcome. In many studies, the relationship between individual level covariates and the outcome may be non-log-linear, and methods to track such nonlinearity between individual level covariate and outcome in spatial regression modeling are not well developed. In this paper, we propose a new algorithm, smooth-indiCAR, to fit an extension to the popular conditional autoregresssive model that can accommodate both linear and nonlinear individual level covariate effects while adjusting for group level covariates and spatial correlation in the disease rates. In this formulation, the effect of a continuous individual level covariate is accommodated via penalized splines. We describe a two-step estimation procedure to obtain reliable estimates of individual and group level covariate effects where both individual and group level covariate effects are estimated separately. This distributed computing framework enhances its application in the Big Data domain with a large number of individual/group level covariates. We evaluate the performance of smooth-indiCAR through simulation. Our results indicate that the smooth-indiCAR method provides reliable estimates of all regression and random effect parameters. We illustrate our proposed methodology with an analysis of data on neutropenia admissions in New South Wales (NSW), Australia. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  17. Estimation of the Characterized Tsunami Source Model considering the Complicated Shape of Tsunami Source by Using the observed waveforms of GPS Buoys in the Nankai Trough

    NASA Astrophysics Data System (ADS)

    Seto, S.; Takahashi, T.

    2017-12-01

    In the 2011 Tohoku earthquake and tsunami disaster, delays in understanding the damage situation increased the human toll. To address this problem, it is important to identify severely damaged areas quickly. Tsunami numerical modelling is useful for estimating damage, and the accuracy of the simulation depends on the tsunami source. Seto and Takahashi (2017) proposed a method to estimate a characterized tsunami source model using the limited observed data from GPS buoys. The model consists of a large slip zone (LSZ), a super-large slip zone (SLSZ) and a background rupture zone (BZ), following the classification reported by the Cabinet Office, Government of Japan (COGJ) after the Tohoku tsunami. The method first assumes a rectangular fault model based on the seismic magnitude and hypocenter reported immediately after an earthquake. Tsunami propagation is then simulated numerically with this fault model, and the fault model is improved iteratively by comparing the computed data with the observed data. In the comparison, the correlation coefficient and the regression coefficient between the observed and computed tsunami wave profiles are used as indexes, and the iteration aims to bring both coefficients close to 1.0, which increases the precision of the fault model. A limitation of that method, however, was that it could not represent a complicated tsunami source shape. In this study, we propose an improved model that handles such complicated shapes. COGJ (2012) assumed that the possible tsunami source region in the Nankai Trough consists of several thousand small faults, and we use these small faults to estimate the targeted tsunami source, which allows complicated source geometries to be represented. The BZ is estimated first, and the LSZ and SLSZ are estimated next, as in the previous model. The proposed model using GPS buoy data was applied to a tsunami scenario in the Nankai Trough. As a result, the final locations of the LSZ and SLSZ within the BZ were estimated well.
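
    A minimal sketch of the two comparison indexes, computed between an observed and a computed waveform; the synthetic waveforms and the zero-intercept form of the regression coefficient are assumptions for illustration.

```python
import numpy as np

def waveform_indexes(observed, computed):
    """Correlation coefficient and regression slope of observed on computed.

    Both approach 1.0 when the simulated waveform matches the GPS-buoy record
    in shape (correlation) and amplitude (regression coefficient).
    """
    corr = np.corrcoef(observed, computed)[0, 1]
    slope = np.sum(observed * computed) / np.sum(computed ** 2)  # least-squares slope through origin
    return corr, slope

t = np.linspace(0, 3600, 721)                        # one hour of 5 s samples (illustrative)
observed = 0.8 * np.sin(2 * np.pi * t / 900) * np.exp(-t / 2400)
computed = 0.7 * np.sin(2 * np.pi * (t - 30) / 900) * np.exp(-t / 2400)

corr, slope = waveform_indexes(observed, computed)
print(f"correlation = {corr:.3f}, regression coefficient = {slope:.3f}")
```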

  18. Effect of misspecification of gene frequency on the two-point LOD score.

    PubMed

    Pal, D K; Durner, M; Greenberg, D A

    2001-11-01

    In this study, we used computer simulation of simple and complex models to ask: (1) What is the penalty in evidence for linkage when the assumed gene frequency is far from the true gene frequency? (2) If the assumed model for gene frequency and inheritance are misspecified in the analysis, can this lead to a higher maximum LOD score than that obtained under the true parameters? Linkage data simulated under simple dominant, recessive, dominant and recessive with reduced penetrance, and additive models, were analysed assuming a single locus with both the correct and incorrect dominance model and assuming a range of different gene frequencies. We found that misspecifying the analysis gene frequency led to little penalty in maximum LOD score in all models examined, especially if the assumed gene frequency was lower than the generating one. Analysing linkage data assuming a gene frequency of the order of 0.01 for a dominant gene, and 0.1 for a recessive gene, appears to be a reasonable tactic in the majority of realistic situations because underestimating the gene frequency, even when the true gene frequency is high, leads to little penalty in the LOD score.

  19. Children Exposed to Intimate Partner Violence: Identifying Differential Effects of Family Environment on Children’s Trauma and Psychopathology Symptoms through Regression Mixture Models

    PubMed Central

    McDonald, Shelby Elaine; Shin, Sunny; Corona, Rosalie; Maternick, Anna; Graham-Bermann, Sandra A.; Ascione, Frank R.; Williams, James Herbert

    2016-01-01

    The majority of analytic approaches aimed at understanding the influence of environmental context on children’s socioemotional adjustment assume comparable effects of contextual risk and protective factors for all children. Using self-reported data from 289 maternal caregiver-child dyads, we examined the degree to which there are differential effects of severity of intimate partner violence (IPV) exposure, yearly household income, and number of children in the family on posttraumatic stress symptoms (PTS) and psychopathology symptoms (i.e., internalizing and externalizing problems) among school-age children between the ages of 7 to 12 years. A regression mixture model identified three latent classes that were primarily distinguished by differential effects of IPV exposure severity on PTS and psychopathology symptoms: (1) asymptomatic with low sensitivity to environmental factors (66% of children), (2) maladjusted with moderate sensitivity (24%), and (3) highly maladjusted with high sensitivity (10%). Children with mothers who had higher levels of education were more likely to be in the maladjusted with moderate sensitivity group than the asymptomatic with low sensitivity group. Latino children were less likely to be in both maladjusted groups compared to the asymptomatic group. Overall, the findings suggest differential effects of family environmental factors on PTS and psychopathology symptoms among children exposed to IPV. Implications for research and practice are discussed. PMID:27337691

  20. Depth and Medium-Scale Spatial Processes Influence Fish Assemblage Structure of Unconsolidated Habitats in a Subtropical Marine Park

    PubMed Central

    Schultz, Arthur L.; Malcolm, Hamish A.; Bucher, Daniel J.; Linklater, Michelle; Smith, Stephen D. A.

    2014-01-01

    Where biological datasets are spatially limited, abiotic surrogates have been advocated to inform objective planning for Marine Protected Areas. However, this approach assumes close correlation between abiotic and biotic patterns. The Solitary Islands Marine Park, northern NSW, Australia, currently uses a habitat classification system (HCS) to assist with planning, but this is based only on data for reefs. We used Baited Remote Underwater Videos (BRUVs) to survey fish assemblages of unconsolidated substrata at different depths, distances from shore, and across an along-shore spatial scale of tens of km (2 transects) to examine how well the HCS works for this dominant habitat. We used multivariate regression modelling to examine the importance of these, and other environmental factors (backscatter intensity, fine-scale bathymetric variation and rugosity), in structuring fish assemblages. There were significant differences in fish assemblages across depths, distance from shore, and over the medium spatial scale of the study: together, these factors generated the optimum model in multivariate regression. However, marginal tests suggested that backscatter intensity, which itself is a surrogate for sediment type and hardness, might also influence fish assemblages and needs further investigation. Species richness was significantly different across all factors: however, total MaxN only differed significantly between locations. This study demonstrates that the pre-existing abiotic HCS only partially represents the range of fish assemblages of unconsolidated habitats in the region. PMID:24824998

  1. Assessing mediation using marginal structural models in the presence of confounding and moderation

    PubMed Central

    Coffman, Donna L.; Zhong, Wei

    2012-01-01

    This paper presents marginal structural models (MSMs) with inverse propensity weighting (IPW) for assessing mediation. Generally, individuals are not randomly assigned to levels of the mediator. Therefore, confounders of the mediator and outcome may exist that limit causal inferences, a goal of mediation analysis. Either regression adjustment or IPW can be used to take confounding into account, but IPW has several advantages. Regression adjustment of even one confounder of the mediator and outcome that has been influenced by treatment results in biased estimates of the direct effect (i.e., the effect of treatment on the outcome that does not go through the mediator). One advantage of IPW is that it can properly adjust for this type of confounding, assuming there are no unmeasured confounders. Further, we illustrate that IPW estimation provides unbiased estimates of all effects when there is a baseline moderator variable that interacts with the treatment, when there is a baseline moderator variable that interacts with the mediator, and when the treatment interacts with the mediator. IPW estimation also provides unbiased estimates of all effects in the presence of non-randomized treatments. In addition, for testing mediation we propose a test of the null hypothesis of no mediation. Finally, we illustrate this approach with an empirical data set in which the mediator is continuous, as is often the case in psychological research. PMID:22905648

  2. Assessing mediation using marginal structural models in the presence of confounding and moderation.

    PubMed

    Coffman, Donna L; Zhong, Wei

    2012-12-01

    This article presents marginal structural models with inverse propensity weighting (IPW) for assessing mediation. Generally, individuals are not randomly assigned to levels of the mediator. Therefore, confounders of the mediator and outcome may exist that limit causal inferences, a goal of mediation analysis. Either regression adjustment or IPW can be used to take confounding into account, but IPW has several advantages. Regression adjustment of even one confounder of the mediator and outcome that has been influenced by treatment results in biased estimates of the direct effect (i.e., the effect of treatment on the outcome that does not go through the mediator). One advantage of IPW is that it can properly adjust for this type of confounding, assuming there are no unmeasured confounders. Further, we illustrate that IPW estimation provides unbiased estimates of all effects when there is a baseline moderator variable that interacts with the treatment, when there is a baseline moderator variable that interacts with the mediator, and when the treatment interacts with the mediator. IPW estimation also provides unbiased estimates of all effects in the presence of nonrandomized treatments. In addition, for testing mediation we propose a test of the null hypothesis of no mediation. Finally, we illustrate this approach with an empirical data set in which the mediator is continuous, as is often the case in psychological research. PsycINFO Database Record (c) 2013 APA, all rights reserved.
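
    A hedged sketch of the IPW idea with stabilized weights for a binary treatment and a binary mediator (the article's mediator is continuous, which would require density-based weights); the data-generating values and variable names are illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 1000
c = rng.normal(size=n)                                            # baseline confounder
a = rng.binomial(1, 1 / (1 + np.exp(-0.4 * c)), n)                # non-randomized treatment
m = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * a + 0.6 * c))), n)    # binary mediator (simplification)
y = 1.0 * a + 1.5 * m + 0.8 * c + rng.normal(size=n)
df = pd.DataFrame(dict(c=c, a=a, m=m, y=y))

# Stabilized inverse propensity weights for treatment and mediator.
pa = smf.logit("a ~ c", df).fit(disp=0).predict(df)
pm = smf.logit("m ~ a + c", df).fit(disp=0).predict(df)
pm_marg = smf.logit("m ~ a", df).fit(disp=0).predict(df)
w_a = np.where(df.a == 1, df.a.mean() / pa, (1 - df.a.mean()) / (1 - pa))
w_m = np.where(df.m == 1, pm_marg / pm, (1 - pm_marg) / (1 - pm))

# Weighted outcome model (marginal structural model) with a treatment-by-mediator interaction.
msm = smf.wls("y ~ a * m", data=df, weights=w_a * w_m).fit()
print(msm.params)   # direct effect of a and effect of m in the weighted pseudo-population
```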

  3. Children exposed to intimate partner violence: Identifying differential effects of family environment on children's trauma and psychopathology symptoms through regression mixture models.

    PubMed

    McDonald, Shelby Elaine; Shin, Sunny; Corona, Rosalie; Maternick, Anna; Graham-Bermann, Sandra A; Ascione, Frank R; Herbert Williams, James

    2016-08-01

    The majority of analytic approaches aimed at understanding the influence of environmental context on children's socioemotional adjustment assume comparable effects of contextual risk and protective factors for all children. Using self-reported data from 289 maternal caregiver-child dyads, we examined the degree to which there are differential effects of severity of intimate partner violence (IPV) exposure, yearly household income, and number of children in the family on posttraumatic stress symptoms (PTS) and psychopathology symptoms (i.e., internalizing and externalizing problems) among school-age children between the ages of 7-12 years. A regression mixture model identified three latent classes that were primarily distinguished by differential effects of IPV exposure severity on PTS and psychopathology symptoms: (1) asymptomatic with low sensitivity to environmental factors (66% of children), (2) maladjusted with moderate sensitivity (24%), and (3) highly maladjusted with high sensitivity (10%). Children with mothers who had higher levels of education were more likely to be in the maladjusted with moderate sensitivity group than the asymptomatic with low sensitivity group. Latino children were less likely to be in both maladjusted groups compared to the asymptomatic group. Overall, the findings suggest differential effects of family environmental factors on PTS and psychopathology symptoms among children exposed to IPV. Implications for research and practice are discussed. Copyright © 2016 Elsevier Ltd. All rights reserved.

  4. Iodine intake by adult residents of a farming area in Iwate Prefecture, Japan, and the accuracy of estimated iodine intake calculated using the Standard Tables of Food Composition in Japan.

    PubMed

    Nakatsuka, Haruo; Chiba, Keiko; Watanabe, Takao; Sawatari, Hideyuki; Seki, Takako

    2016-11-01

    Iodine intake by adults in farming districts in Northeastern Japan was evaluated by two methods: (1) calculation based on government-approved food composition tables and (2) instrumental measurement. The correlation between these two values was examined, and a regression model for the calibration of calculated values was presented. Iodine intake was calculated, using the values in the Japan Standard Tables of Food Composition (FCT), through the analysis of duplicate samples of complete 24-h food consumption for 90 adult subjects. In cases where the value for iodine content was not available in the FCT, it was assumed to be zero for that food item (calculated values). Iodine content was also measured by ICP-MS (measured values). Calculated and measured values yielded geometric means (GM) of 336 and 279 μg/day, respectively. There was no statistically significant (p > 0.05) difference between calculated and measured values. The correlation coefficient was 0.646 (p < 0.05). With this high correlation coefficient, a simple regression line can be applied to estimate measured value from calculated value. A survey of the literature suggests that the values in this study were similar to values that have been reported to date for Japan, and higher than those for other countries in Asia. Iodine intake of Japanese adults was 336 μg/day (GM, calculated) and 279 μg/day (GM, measured). Both values correlated so well, with a correlation coefficient of 0.646, that a regression model (Y = 130.8 + 1.9479X, where X and Y are measured and calculated values, respectively) could be used to calibrate calculated values.

  5. Comparison of statistical tests for association between rare variants and binary traits.

    PubMed

    Bacanu, Silviu-Alin; Nelson, Matthew R; Whittaker, John C

    2012-01-01

    Genome-wide association studies have found thousands of common genetic variants associated with a wide variety of diseases and other complex traits. However, a large portion of the predicted genetic contribution to many traits remains unknown. One plausible explanation is that some of the missing variation is due to the effects of rare variants. Nonetheless, the statistical analysis of rare variants is challenging. A commonly used method is to contrast, within the same region (gene), the frequency of minor alleles at rare variants between cases and controls. However, this strategy is most useful under the assumption that the tested variants have similar effects. We previously proposed a method that can accommodate heterogeneous effects in the analysis of quantitative traits. Here we extend this method to include binary traits that can accommodate covariates. We use simulations for a variety of causal and covariate impact scenarios to compare the performance of the proposed method to standard logistic regression, C-alpha, SKAT, and EREC. We found that (i) logistic regression methods perform well when the heterogeneity of the effects is not extreme and (ii) SKAT and EREC have good performance under all tested scenarios but they can be computationally intensive. Consequently, it would be more computationally desirable to use a two-step strategy by (i) selecting promising genes by faster methods and (ii) analyzing selected genes using SKAT/EREC. To select promising genes one can use (1) regression methods when effect heterogeneity is assumed to be low and the covariates explain a non-negligible part of trait variability, (2) C-alpha when heterogeneity is assumed to be large and covariates explain a small fraction of the trait's variability and (3) the proposed trend and heterogeneity test when the heterogeneity is assumed to be non-trivial and the covariates explain a large fraction of trait variability.

  6. Evaluation of regression and neural network models for solar forecasting over different short-term horizons

    DOE PAGES

    Inanlouganji, Alireza; Reddy, T. Agami; Katipamula, Srinivas

    2018-04-13

    Forecasting solar irradiation has acquired immense importance in view of the exponential increase in the number of solar photovoltaic (PV) system installations. In this article, results of analyses involving statistical and machine-learning techniques to predict solar irradiation for different forecasting horizons are reported. Yearlong typical meteorological year 3 (TMY3) datasets from three cities in the United States with different climatic conditions have been used in this analysis. A simple forecast approach that assumes consecutive days to be identical serves as a baseline model to compare forecasting alternatives. To account for seasonal variability and to capture short-term fluctuations, different variants of the lagged moving average (LMX) model with cloud cover as the input variable are evaluated. Finally, the proposed LMX model is evaluated against an artificial neural network (ANN) model. How the one-hour and 24-hour models can be used in conjunction to predict different short-term rolling horizons is discussed, and this joint application is illustrated for a four-hour rolling horizon forecast scheme. Lastly, the effect of using predicted cloud cover values, instead of measured ones, on the accuracy of the models is assessed. Results show that LMX models do not degrade in forecast accuracy if models are trained with the forecast cloud cover data.

  7. Evaluation of regression and neural network models for solar forecasting over different short-term horizons

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Inanlouganji, Alireza; Reddy, T. Agami; Katipamula, Srinivas

    Forecasting solar irradiation has acquired immense importance in view of the exponential increase in the number of solar photovoltaic (PV) system installations. In this article, results of analyses involving statistical and machine-learning techniques to predict solar irradiation for different forecasting horizons are reported. Yearlong typical meteorological year 3 (TMY3) datasets from three cities in the United States with different climatic conditions have been used in this analysis. A simple forecast approach that assumes consecutive days to be identical serves as a baseline model to compare forecasting alternatives. To account for seasonal variability and to capture short-term fluctuations, different variants of the lagged moving average (LMX) model with cloud cover as the input variable are evaluated. Finally, the proposed LMX model is evaluated against an artificial neural network (ANN) model. How the one-hour and 24-hour models can be used in conjunction to predict different short-term rolling horizons is discussed, and this joint application is illustrated for a four-hour rolling horizon forecast scheme. Lastly, the effect of using predicted cloud cover values, instead of measured ones, on the accuracy of the models is assessed. Results show that LMX models do not degrade in forecast accuracy if models are trained with the forecast cloud cover data.
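
    A minimal sketch of a lagged regression with cloud cover as the exogenous input, standing in for the LMX variants; the lag structure, synthetic irradiation series and diurnal shape are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(6)
hours = 24 * 365
t = np.arange(hours)
clear_sky = np.clip(np.sin(np.pi * (t % 24 - 6) / 12), 0, None)      # crude diurnal shape
cloud = np.clip(rng.normal(0.4, 0.25, hours), 0, 1)                   # cloud cover fraction
ghi = clear_sky * (1 - 0.75 * cloud) + rng.normal(0, 0.02, hours)     # synthetic irradiation

# One-hour-ahead LMX-style model: lagged irradiation terms plus cloud cover as exogenous input.
lags = 3
rows = range(lags, hours - 1)
X = np.array([[*ghi[i - lags:i], cloud[i]] for i in rows])
y = ghi[lags:hours - 1]                # target: irradiation at the hour following the lag window

split = int(0.8 * len(y))
model = LinearRegression().fit(X[:split], y[:split])
print("hold-out R^2:", model.score(X[split:], y[split:]))
```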

  8. The Trend Odds Model for Ordinal Data‡

    PubMed Central

    Capuano, Ana W.; Dawson, Jeffrey D.

    2013-01-01

    Ordinal data appear in a wide variety of scientific fields. These data are often analyzed using ordinal logistic regression models that assume proportional odds. When this assumption is not met, it may be possible to capture the lack of proportionality using a constrained structural relationship between the odds and the cut-points of the ordinal values (Peterson and Harrell, 1990). We consider a trend odds version of this constrained model, where the odds parameter increases or decreases in a monotonic manner across the cut-points. We demonstrate algebraically and graphically how this model is related to latent logistic, normal, and exponential distributions. In particular, we find that scale changes in these potential latent distributions are consistent with the trend odds assumption, with the logistic and exponential distributions having odds that increase in a linear or nearly linear fashion. We show how to fit this model using SAS Proc Nlmixed, and perform simulations under proportional odds and trend odds processes. We find that the added complexity of the trend odds model gives improved power over the proportional odds model when there are moderate to severe departures from proportionality. A hypothetical dataset is used to illustrate the interpretation of the trend odds model, and we apply this model to a Swine Influenza example where the proportional odds assumption appears to be violated. PMID:23225520

  9. The trend odds model for ordinal data.

    PubMed

    Capuano, Ana W; Dawson, Jeffrey D

    2013-06-15

    Ordinal data appear in a wide variety of scientific fields. These data are often analyzed using ordinal logistic regression models that assume proportional odds. When this assumption is not met, it may be possible to capture the lack of proportionality using a constrained structural relationship between the odds and the cut-points of the ordinal values. We consider a trend odds version of this constrained model, wherein the odds parameter increases or decreases in a monotonic manner across the cut-points. We demonstrate algebraically and graphically how this model is related to latent logistic, normal, and exponential distributions. In particular, we find that scale changes in these potential latent distributions are consistent with the trend odds assumption, with the logistic and exponential distributions having odds that increase in a linear or nearly linear fashion. We show how to fit this model using SAS Proc NLMIXED and perform simulations under proportional odds and trend odds processes. We find that the added complexity of the trend odds model gives improved power over the proportional odds model when there are moderate to severe departures from proportionality. A hypothetical data set is used to illustrate the interpretation of the trend odds model, and we apply this model to a swine influenza example wherein the proportional odds assumption appears to be violated. Copyright © 2012 John Wiley & Sons, Ltd.
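
    The authors fit the trend odds model in SAS Proc NLMIXED; as a hedged Python counterpart, the sketch below fits only the baseline proportional-odds (cumulative logit) model with statsmodels' OrderedModel on synthetic data (the trend odds extension itself would need a custom likelihood).

```python
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(7)
n = 500
x = rng.normal(size=n)
latent = 1.2 * x + rng.logistic(size=n)                                 # latent logistic variable
y = pd.cut(latent, bins=[-np.inf, -1, 0.5, 2, np.inf], labels=False)    # four ordered categories

# Proportional-odds (cumulative logit) model; the trend odds model relaxes
# proportionality by letting the covariate effect change monotonically across cut-points.
mod = OrderedModel(y, x.reshape(-1, 1), distr="logit")
res = mod.fit(method="bfgs", disp=False)
print(res.params)   # slope followed by threshold parameters
```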

  10. Analysis of regression methods for solar activity forecasting

    NASA Technical Reports Server (NTRS)

    Lundquist, C. A.; Vaughan, W. W.

    1979-01-01

    The paper deals with the potential use of the most recent solar data to project trends in the next few years. Assuming that a mode of solar influence on weather can be identified, advantageous use of that knowledge presumably depends on estimating future solar activity. A frequently used technique for solar cycle predictions is a linear regression procedure along the lines formulated by McNish and Lincoln (1949). The paper presents a sensitivity analysis of the behavior of such regression methods relative to the following aspects: cycle minimum, time into cycle, composition of historical data base, and unnormalized vs. normalized solar cycle data. Comparative solar cycle forecasts for several past cycles are presented as to these aspects of the input data. Implications for the current cycle, No. 21, are also given.

  11. Gender-Dependent Association of FTO Polymorphisms with Body Mass Index in Mexicans

    PubMed Central

    Saldaña-Alvarez, Yolanda; Salas-Martínez, María Guadalupe; García-Ortiz, Humberto; Luckie-Duque, Angélica; García-Cárdenas, Gustavo; Vicenteño-Ayala, Hermenegildo; Cordova, Emilio J.; Esparza-Aguilar, Marcelino; Contreras-Cubas, Cecilia; Carnevale, Alessandra; Chávez-Saldaña, Margarita; Orozco, Lorena

    2016-01-01

    To evaluate the associations between six single-nucleotide polymorphisms (SNPs) in intron 1 of FTO and body mass index (BMI), a case-control association study of 2314 unrelated Mexican-Mestizo adult subjects was performed. The association between each SNP and BMI was tested using logistic and linear regression adjusted for age, gender, and ancestry and assuming additive, recessive, and dominant effects of the minor allele. Association analysis after BMI stratification showed that all five FTO SNPs (rs1121980, rs17817449, rs3751812, rs9930506, and rs8044769), were significantly associated with obesity class II/III under an additive model (P<0.05). Interestingly, we also documented a genetic model-dependent influence of gender on the effect of FTO variants on increased BMI. Two SNPs were specifically associated in males under a dominant model, while the remainder were associated with females under additive and recessive models (P<0.05). The SNP rs9930506 showed the highest increased in obesity risk in females (odds ratio = 4.4). Linear regression using BMI as a continuous trait also revealed differential FTO SNP contributions. Homozygous individuals for the risk alleles of rs17817449, rs3751812, and rs9930506 were on average 2.18 kg/m2 heavier than homozygous for the wild-type alleles; rs1121980 and rs8044769 showed significant but less-strong effects on BMI (1.54 kg/m2 and 0.9 kg/m2, respectively). Remarkably, rs9930506 also exhibited positive interactions with age and BMI in a gender-dependent manner. Women carrying the minor allele of this variant have a significant increase in BMI by year (0.42 kg/m2, P = 1.17 x 10−10). Linear regression haplotype analysis under an additive model, confirmed that the TGTGC haplotype harboring all five minor alleles, increased the BMI of carriers by 2.36 kg/m2 (P = 1.15 x 10−5). Our data suggest that FTO SNPs make differential contributions to obesity risk and support the hypothesis that gender differences in the mechanisms involving these variants may contribute to disease development. PMID:26726774

  12. Levels of naturally occurring gamma radiation measured in British homes and their prediction in particular residences.

    PubMed

    Kendall, G M; Wakeford, R; Athanson, M; Vincent, T J; Carter, E J; McColl, N P; Little, M P

    2016-03-01

    Gamma radiation from natural sources (including directly ionising cosmic rays) is an important component of background radiation. In the present paper, indoor measurements of naturally occurring gamma rays that were undertaken as part of the UK Childhood Cancer Study are summarised, and it is shown that these are broadly compatible with an earlier UK National Survey. The distribution of indoor gamma-ray dose rates in Great Britain is approximately normal with mean 96 nGy/h and standard deviation 23 nGy/h. Directly ionising cosmic rays contribute about one-third of the total. The expanded dataset allows a more detailed description than previously of indoor gamma-ray exposures and in particular their geographical variation. Various strategies for predicting indoor natural background gamma-ray dose rates were explored. In the first of these, a geostatistical model was fitted, which assumes an underlying geologically determined spatial variation, superimposed on which is a Gaussian stochastic process with Matérn correlation structure that models the observed tendency of dose rates in neighbouring houses to correlate. In the second approach, a number of dose-rate interpolation measures were first derived, based on averages over geologically or administratively defined areas or using distance-weighted averages of measurements at nearest-neighbour points. Linear regression was then used to derive an optimal linear combination of these interpolation measures. The predictive performances of the two models were compared via cross-validation, using a randomly selected 70 % of the data to fit the models and the remaining 30 % to test them. The mean square error (MSE) of the linear-regression model was lower than that of the Gaussian-Matérn model (MSE 378 and 411, respectively). The predictive performance of the two candidate models was also evaluated via simulation; the OLS model performs significantly better than the Gaussian-Matérn model.
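
    A hedged sketch of the model comparison, fitting a Matérn-kernel Gaussian process on coordinates against ordinary least squares on an interpolation covariate and comparing held-out MSE; the synthetic dose-rate field and covariate are illustrative assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(9)
n = 400
coords = rng.uniform(0, 100, size=(n, 2))                 # house locations in km (illustrative)
geology_mean = 90 + 10 * np.sin(coords[:, 0] / 15)        # smooth geologically driven field
dose = geology_mean + rng.normal(0, 20, n)                # indoor gamma dose rate (nGy/h)
interp = geology_mean.reshape(-1, 1)                      # area-mean style interpolation covariate

Xg_tr, Xg_te, Xi_tr, Xi_te, y_tr, y_te = train_test_split(
    coords, interp, dose, train_size=0.7, random_state=0)

gp = GaussianProcessRegressor(Matern(length_scale=10.0, nu=1.5) + WhiteKernel(),
                              normalize_y=True).fit(Xg_tr, y_tr)
ols = LinearRegression().fit(Xi_tr, y_tr)

print("Gaussian-Matern MSE:", mean_squared_error(y_te, gp.predict(Xg_te)))
print("linear-regression MSE:", mean_squared_error(y_te, ols.predict(Xi_te)))
```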

  13. Dynamics of stream water TOC concentrations in a boreal headwater catchment: Controlling factors and implications for climate scenarios

    NASA Astrophysics Data System (ADS)

    Köhler, S. J.; Buffam, I.; Seibert, J.; Bishop, K. H.; Laudon, H.

    2009-06-01

    Two different but complementary modelling approaches for reproducing the observed dynamics of total organic carbon (TOC) in a boreal stream are presented. One is based on a regression analysis, while the other is based on riparian soil conditions using a convolution of flow and concentration. Both approaches are relatively simple to establish and help to identify gaps in the process understanding of the TOC transport from soils to catchments runoff. The largest part of the temporal variation of stream TOC concentrations (4-46 mg L⁻¹) in a forested headwater stream in the boreal zone in northern Sweden may be described using a four-parameter regression equation that has runoff and transformed air temperature as sole input variables. Runoff is assumed to be a proxy for soil wetness conditions and changing flow pathways which in turn caused most of the stream TOC variation. Temperature explained a significant part of the observed inter-annual variability. Long-term riparian hydrochemistry in soil solutions within 4 m of the stream also captures a surprisingly large part of the observed variation of stream TOC and highlights the importance of riparian soils. The riparian zone was used to reproduce stream TOC with the help of a convolution model based on flow and average riparian chemistry as input variables. There is a significant effect of wetting of the riparian soil that translates into a memory effect for subsequent episodes and thus contributes to controlling stream TOC concentrations. Situations with high flow introduce a large amount of variability into stream water TOC that may be related to memory effects, rapid groundwater fluctuations and other processes not identified so far. Two different climate scenarios for the region based on the IPCC scenarios were applied to the regression equation to test what effect the expected increase in precipitation and temperature and resulting changes in runoff would have on stream TOC concentrations assuming that the soil conditions remain unchanged. Both scenarios resulted in a mean increase of stream TOC concentrations of between 1.5 and 2.5 mg L⁻¹ during the snow free season, which amounts to approximately 15% more TOC export compared to present conditions. Wetter and warmer conditions in the late autumn led to a difference of monthly average TOC of up to 5 mg L⁻¹, suggesting that stream TOC may be particularly susceptible to climate variability during this season.

  14. Variable selection for distribution-free models for longitudinal zero-inflated count responses.

    PubMed

    Chen, Tian; Wu, Pan; Tang, Wan; Zhang, Hui; Feng, Changyong; Kowalski, Jeanne; Tu, Xin M

    2016-07-20

    Zero-inflated count outcomes arise quite often in research and practice. Parametric models such as the zero-inflated Poisson and zero-inflated negative binomial are widely used to model such responses. Like most parametric models, they are quite sensitive to departures from assumed distributions. Recently, new approaches have been proposed to provide distribution-free, or semi-parametric, alternatives. These methods extend the generalized estimating equations to provide robust inference for population mixtures defined by zero-inflated count outcomes. In this paper, we propose methods to extend smoothly clipped absolute deviation (SCAD)-based variable selection methods to these new models. Variable selection has been gaining popularity in modern clinical research studies, as determining differential treatment effects of interventions for different subgroups has become the norm, rather than the exception, in the era of patient-centered outcome research. Such moderation analysis in general creates many explanatory variables in regression analysis, and the advantages of SCAD-based methods over their traditional counterparts render them a great choice for addressing these important and timely issues in clinical research. We illustrate the proposed approach with both simulated and real study data. Copyright © 2016 John Wiley & Sons, Ltd.
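
    For reference, the SCAD penalty itself (in the standard Fan-Li form with a = 3.7) is sketched below. This is a generic illustration of the penalty that drives the variable selection, not the authors' estimating-equation implementation.

      import numpy as np

      def scad_penalty(beta, lam, a=3.7):
          """Elementwise SCAD penalty p_lambda(|beta|) (Fan & Li form)."""
          b = np.abs(beta)
          lin = lam * b                                             # |b| <= lam
          quad = (2 * a * lam * b - b**2 - lam**2) / (2 * (a - 1))  # lam < |b| <= a*lam
          const = lam**2 * (a + 1) / 2                              # |b| > a*lam
          return np.where(b <= lam, lin, np.where(b <= a * lam, quad, const))

      beta = np.linspace(-4, 4, 9)
      print(scad_penalty(beta, lam=1.0))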

  15. Error-in-variables models in calibration

    NASA Astrophysics Data System (ADS)

    Lira, I.; Grientschnig, D.

    2017-12-01

    In many calibration operations, the stimuli applied to the measuring system or instrument under test are derived from measurement standards whose values may be considered to be perfectly known. In that case, it is assumed that calibration uncertainty arises solely from inexact measurement of the responses, from imperfect control of the calibration process and from the possible inaccuracy of the calibration model. However, the premise that the stimuli are completely known is never strictly fulfilled and in some instances it may be grossly inadequate. Then, error-in-variables (EIV) regression models have to be employed. In metrology, these models have been approached mostly from the frequentist perspective. In contrast, not much guidance is available on their Bayesian analysis. In this paper, we first present a brief summary of the conventional statistical techniques that have been developed to deal with EIV models in calibration. We then proceed to discuss the alternative Bayesian framework under some simplifying assumptions. Through a detailed example about the calibration of an instrument for measuring flow rates, we provide advice on how the user of the calibration function should employ the latter framework for inferring the stimulus acting on the calibrated device when, in use, a certain response is measured.
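
    As a simple frequentist point of comparison, the sketch below fits a calibration line by Deming regression, a classical errors-in-variables estimator that requires a known ratio of error variances. It is shown only to illustrate the EIV idea; it is not the Bayesian framework developed in the paper, and the data and variance ratio are hypothetical.

      import numpy as np

      def deming(x, y, delta=1.0):
          """Deming (EIV) regression; delta = var(errors in y) / var(errors in x)."""
          xbar, ybar = x.mean(), y.mean()
          sxx = np.mean((x - xbar) ** 2)
          syy = np.mean((y - ybar) ** 2)
          sxy = np.mean((x - xbar) * (y - ybar))
          slope = (syy - delta * sxx + np.sqrt((syy - delta * sxx) ** 2
                   + 4 * delta * sxy ** 2)) / (2 * sxy)
          return slope, ybar - slope * xbar

      rng = np.random.default_rng(0)
      true_flow = np.linspace(1.0, 10.0, 30)                 # hypothetical stimuli
      x = true_flow + rng.normal(0, 0.2, 30)                 # imperfectly known stimuli
      y = 0.5 + 2.0 * true_flow + rng.normal(0, 0.2, 30)     # measured responses
      print(deming(x, y, delta=1.0))                         # ~ (slope 2.0, intercept 0.5)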

  16. Identification of chilling and heat requirements of cherry trees--a statistical approach.

    PubMed

    Luedeling, Eike; Kunz, Achim; Blanke, Michael M

    2013-09-01

    Most trees from temperate climates require the accumulation of winter chill and subsequent heat during their dormant phase to resume growth and initiate flowering in the following spring. Global warming could reduce chill and hence hamper the cultivation of high-chill species such as cherries. Yet determining chilling and heat requirements requires large-scale controlled-forcing experiments, and estimates are thus often unavailable. Where long-term phenology datasets exist, partial least squares (PLS) regression can be used as an alternative, to determine climatic requirements statistically. Bloom dates of cherry cv. 'Schneiders späte Knorpelkirsche' trees in Klein-Altendorf, Germany, from 24 growing seasons were correlated with 11-day running means of daily mean temperature. Based on the output of the PLS regression, five candidate chilling periods ranging in length from 17 to 102 days, and one forcing phase of 66 days were delineated. Among three common chill models used to quantify chill, the Dynamic Model showed the lowest variation in chill, indicating that it may be more accurate than the Utah and Chilling Hours Models. Based on the longest candidate chilling phase with the earliest starting date, cv. 'Schneiders späte Knorpelkirsche' cherries at Bonn exhibited a chilling requirement of 68.6 ± 5.7 chill portions (or 1,375 ± 178 chilling hours or 1,410 ± 238 Utah chill units) and a heat requirement of 3,473 ± 1,236 growing degree hours. Closer investigation of the distinct chilling phases detected by PLS regression could contribute to our understanding of dormancy processes and thus help fruit and nut growers identify suitable tree cultivars for a future in which static climatic conditions can no longer be assumed. All procedures used in this study were bundled in an R package ('chillR') and are provided as Supplementary materials. The procedure was also applied to leaf emergence dates of walnut (cv. 'Payne') at Davis, California.
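
    The PLS step described above (regressing bloom dates on 11-day running means of daily mean temperature) can be sketched as follows. The data here are synthetic and the scikit-learn estimator is only a stand-in; the full procedure is implemented in the authors' R package 'chillR'.

      import numpy as np
      from sklearn.cross_decomposition import PLSRegression

      rng = np.random.default_rng(1)
      n_years, n_days = 24, 365
      daily_temp = rng.normal(8, 6, size=(n_years, n_days))           # synthetic temperatures
      kernel = np.ones(11) / 11.0
      running = np.apply_along_axis(lambda t: np.convolve(t, kernel, mode="same"),
                                    1, daily_temp)                    # 11-day running means
      # synthetic bloom day-of-year, driven by late-winter warmth
      bloom_doy = 110 - 0.8 * running[:, 60:90].mean(axis=1) + rng.normal(0, 2, n_years)

      pls = PLSRegression(n_components=2).fit(running, bloom_doy)
      # Coefficient signs indicate periods where warmth advances (negative) or delays
      # (positive) bloom, which is how chilling and forcing phases are delineated.
      coef = pls.coef_.ravel()
      print(coef.shape, coef[:5])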

  17. Identification of chilling and heat requirements of cherry trees—a statistical approach

    NASA Astrophysics Data System (ADS)

    Luedeling, Eike; Kunz, Achim; Blanke, Michael M.

    2013-09-01

    Most trees from temperate climates require the accumulation of winter chill and subsequent heat during their dormant phase to resume growth and initiate flowering in the following spring. Global warming could reduce chill and hence hamper the cultivation of high-chill species such as cherries. Yet determining chilling and heat requirements requires large-scale controlled-forcing experiments, and estimates are thus often unavailable. Where long-term phenology datasets exist, partial least squares (PLS) regression can be used as an alternative, to determine climatic requirements statistically. Bloom dates of cherry cv. `Schneiders späte Knorpelkirsche' trees in Klein-Altendorf, Germany, from 24 growing seasons were correlated with 11-day running means of daily mean temperature. Based on the output of the PLS regression, five candidate chilling periods ranging in length from 17 to 102 days, and one forcing phase of 66 days were delineated. Among three common chill models used to quantify chill, the Dynamic Model showed the lowest variation in chill, indicating that it may be more accurate than the Utah and Chilling Hours Models. Based on the longest candidate chilling phase with the earliest starting date, cv. `Schneiders späte Knorpelkirsche' cherries at Bonn exhibited a chilling requirement of 68.6 ± 5.7 chill portions (or 1,375 ± 178 chilling hours or 1,410 ± 238 Utah chill units) and a heat requirement of 3,473 ± 1,236 growing degree hours. Closer investigation of the distinct chilling phases detected by PLS regression could contribute to our understanding of dormancy processes and thus help fruit and nut growers identify suitable tree cultivars for a future in which static climatic conditions can no longer be assumed. All procedures used in this study were bundled in an R package (`chillR') and are provided as Supplementary materials. The procedure was also applied to leaf emergence dates of walnut (cv. `Payne') at Davis, California.

  18. Joint genome-wide prediction in several populations accounting for randomness of genotypes: A hierarchical Bayes approach. II: Multivariate spike and slab priors for marker effects and derivation of approximate Bayes and fractional Bayes factors for the complete family of models.

    PubMed

    Martínez, Carlos Alberto; Khare, Kshitij; Banerjee, Arunava; Elzo, Mauricio A

    2017-03-21

    This study corresponds to the second part of a companion paper devoted to the development of Bayesian multiple regression models accounting for randomness of genotypes in across population genome-wide prediction. This family of models considers heterogeneous and correlated marker effects and allelic frequencies across populations, and has the ability to consider records from non-genotyped individuals and individuals with missing genotypes in any subset of loci without the need for previous imputation, taking into account uncertainty about imputed genotypes. This paper extends this family of models by considering multivariate spike and slab conditional priors for marker allele substitution effects and contains derivations of approximate Bayes factors and fractional Bayes factors to compare models from part I and those developed here with their null versions. These null versions correspond to simpler models ignoring heterogeneity of populations, but still accounting for randomness of genotypes. For each marker locus, the spike component of the priors corresponded to a point mass at 0 in R^S, where S is the number of populations, and the slab component was an S-variate Gaussian distribution; independent conditional priors were assumed. For the Gaussian components, covariance matrices were assumed to be either the same for all markers or different for each marker. For null models, the priors were simply univariate versions of these finite mixture distributions. Approximate algebraic expressions for Bayes factors and fractional Bayes factors were found using the Laplace approximation. Using the simulated datasets described in part I, these models were implemented and compared with models derived in part I using measures of predictive performance based on squared Pearson correlations, Deviance Information Criterion, Bayes factors, and fractional Bayes factors. The extensions presented here enlarge our family of genome-wide prediction models, making it more flexible in the sense that it now offers more modeling options. Copyright © 2017 Elsevier Ltd. All rights reserved.
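
    A generic sketch of such a multivariate spike-and-slab prior is given below: for each marker, the S-vector of allele substitution effects is either exactly zero (spike) or drawn from an S-variate Gaussian slab. The mixing probability and slab covariance are hypothetical, and this sampler illustrates the prior only, not the authors' posterior computation.

      import numpy as np

      def sample_spike_slab(n_markers, S, pi_slab=0.1, slab_cov=None, seed=0):
          rng = np.random.default_rng(seed)
          slab_cov = np.eye(S) * 0.05 if slab_cov is None else slab_cov
          effects = np.zeros((n_markers, S))            # spike: point mass at 0 in R^S
          in_slab = rng.random(n_markers) < pi_slab     # markers with nonzero effects
          k = int(in_slab.sum())
          effects[in_slab] = rng.multivariate_normal(np.zeros(S), slab_cov, size=k)
          return effects

      print(sample_spike_slab(n_markers=10, S=3))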

  19. Estimating peer density effects on oral health for community-based older adults.

    PubMed

    Chakraborty, Bibhas; Widener, Michael J; Mirzaei Salehabadi, Sedigheh; Northridge, Mary E; Kum, Susan S; Jin, Zhu; Kunzel, Carol; Palmer, Harvey D; Metcalf, Sara S

    2017-12-29

    As part of a long-standing line of research regarding how peer density affects health, researchers have sought to understand the multifaceted ways that the density of contemporaries living and interacting in proximity to one another influences social networks and knowledge diffusion, and subsequently health and well-being. This study examined peer density effects on oral health for racial/ethnic minority older adults living in northern Manhattan and the Bronx, New York, NY. Peer age-group density was estimated by smoothing US Census data with 4 kernel bandwidths ranging from 0.25 to 1.50 miles. Logistic regression models were developed using these spatial measures and data from the ElderSmile oral and general health screening program that serves predominantly racial/ethnic minority older adults at community centers in northern Manhattan and the Bronx. The oral health outcomes modeled as dependent variables were ordinal dentition status and binary self-rated oral health. After construction of kernel density surfaces and multiple imputation of missing data, logistic regression analyses were performed to estimate the effects of peer density and other sociodemographic characteristics on the oral health outcomes of dentition status and self-rated oral health. Overall, higher peer density was associated with better oral health for older adults when estimated using smaller bandwidths (0.25 and 0.50 miles). That is, statistically significant relationships (p < 0.01) between peer density and improved dentition status were found when peer density was measured assuming a more local social network. As with dentition status, a positive significant association was found between peer density and fair or better self-rated oral health when peer density was measured assuming a more local social network. This study provides novel evidence that the oral health of community-based older adults is affected by peer density in an urban environment. To the extent that peer density signifies the potential for social interaction and support, the positive significant effects of peer density on improved oral health point to the importance of place in promoting social interaction as a component of healthy aging. Proximity to peers and their knowledge of local resources may facilitate utilization of community-based oral health care.
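
    The two-stage idea (smooth peer locations into a density surface at several bandwidths, then use the density as a logistic-regression predictor) can be sketched as follows. Locations and the binary outcome here are synthetic, and the Gaussian kernel is only an assumed stand-in for the study's smoothing of Census data.

      import numpy as np
      from sklearn.linear_model import LogisticRegression

      rng = np.random.default_rng(0)
      peers = rng.uniform(0, 10, size=(5000, 2))       # peer locations (miles), synthetic
      sites = rng.uniform(0, 10, size=(300, 2))        # screening locations (miles), synthetic
      outcome = rng.integers(0, 2, size=len(sites))    # hypothetical binary oral-health outcome

      def kernel_density(sites, peers, bandwidth):
          d2 = ((sites[:, None, :] - peers[None, :, :]) ** 2).sum(axis=2)
          return np.exp(-d2 / (2 * bandwidth ** 2)).sum(axis=1)

      for bw in (0.25, 0.50, 1.00, 1.50):              # bandwidths used in the study (miles)
          dens = kernel_density(sites, peers, bw)
          dens = ((dens - dens.mean()) / dens.std()).reshape(-1, 1)
          model = LogisticRegression().fit(dens, outcome)
          print(f"bandwidth {bw:.2f} mi: coefficient {model.coef_[0, 0]:+.3f}")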

  20. A flexible cure rate model for spatially correlated survival data based on generalized extreme value distribution and Gaussian process priors.

    PubMed

    Li, Dan; Wang, Xia; Dey, Dipak K

    2016-09-01

    Our present work proposes a new survival model in a Bayesian context to analyze right-censored survival data for populations with a surviving fraction, assuming that the log failure time follows a generalized extreme value distribution. Many applications require a more flexible modeling of covariate information than a simple linear or parametric form for all covariate effects. It is also necessary to include the spatial variation in the model, since it is sometimes unexplained by the covariates considered in the analysis. Therefore, the nonlinear covariate effects and the spatial effects are incorporated into the systematic component of our model. Gaussian processes (GPs) provide a natural framework for modeling potentially nonlinear relationship and have recently become extremely powerful in nonlinear regression. Our proposed model adopts a semiparametric Bayesian approach by imposing a GP prior on the nonlinear structure of continuous covariate. With the consideration of data availability and computational complexity, the conditionally autoregressive distribution is placed on the region-specific frailties to handle spatial correlation. The flexibility and gains of our proposed model are illustrated through analyses of simulated data examples as well as a dataset involving a colon cancer clinical trial from the state of Iowa. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  1. Bayesian Hierarchical Random Intercept Model Based on Three Parameter Gamma Distribution

    NASA Astrophysics Data System (ADS)

    Wirawati, Ika; Iriawan, Nur; Irhamah

    2017-06-01

    Hierarchical data structures are common throughout many areas of research. Until recently, the hierarchical nature of such data was often ignored in the analysis. The appropriate statistical analysis to handle this type of data is the hierarchical linear model (HLM). This article focuses only on the random intercept model (RIM), a subclass of HLM. This model assumes that the intercepts of the lowest-level models vary across those models, while their slopes are fixed. The differences between intercepts are assumed to be affected by variables at the upper level; these intercepts are therefore regressed against those upper-level variables as predictors. The purpose of this paper is to demonstrate the proposed two-level RIM by modeling per capita household expenditure in Maluku Utara, with five characteristics at the first (household) level and three district/city characteristics at the second level. The per capita household expenditure data at the first level were captured by the three-parameter Gamma distribution. The model is therefore more complex, owing to the many interacting parameters needed to represent the hierarchical structure and the distributional pattern of the data. To simplify parameter estimation, a computational Bayesian method coupled with a Markov Chain Monte Carlo (MCMC) algorithm and Gibbs sampling is employed.
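
    The random-intercept structure itself can be illustrated with a short sketch. Note that the paper fits a Bayesian RIM with a three-parameter Gamma likelihood via MCMC; the code below only shows the analogous two-level structure with a standard Gaussian mixed model, and all variable names and values are hypothetical.

      import numpy as np
      import pandas as pd
      import statsmodels.formula.api as smf

      rng = np.random.default_rng(0)
      districts = np.repeat(np.arange(8), 50)                   # level-2 units (districts/cities)
      u = rng.normal(0, 0.5, 8)[districts]                      # district-level random intercepts
      x = rng.normal(0, 1, len(districts))                      # a household-level covariate
      y = 2.0 + u + 0.7 * x + rng.normal(0, 1, len(districts))  # per capita expenditure (log scale)

      df = pd.DataFrame({"y": y, "x": x, "district": districts})
      rim = smf.mixedlm("y ~ x", df, groups=df["district"]).fit()
      print(rim.summary())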

  2. Exploring the spatially varying innovation capacity of the US counties in the framework of Griliches' knowledge production function: a mixed GWR approach

    NASA Astrophysics Data System (ADS)

    Kang, Dongwoo; Dall'erba, Sandy

    2016-04-01

    Griliches' knowledge production function has been increasingly adopted at the regional level where location-specific conditions drive the spatial differences in knowledge creation dynamics. However, the large majority of such studies rely on a traditional regression approach that assumes spatially homogenous marginal effects of knowledge input factors. This paper extends the authors' previous work (Kang and Dall'erba in Int Reg Sci Rev, 2015. doi: 10.1177/0160017615572888) to investigate the spatial heterogeneity in the marginal effects by using nonparametric local modeling approaches such as geographically weighted regression (GWR) and mixed GWR with two distinct samples of the US Metropolitan Statistical Area (MSA) and non-MSA counties. The results indicate a high degree of spatial heterogeneity in the marginal effects of the knowledge input variables, more specifically for the local and distant spillovers of private knowledge measured across MSA counties. On the other hand, local academic knowledge spillovers are found to display spatially homogenous elasticities in both MSA and non-MSA counties. Our results highlight the strengths and weaknesses of each county's innovation capacity and suggest policy implications for regional innovation strategies.

  3. Does context matter for the relationship between deprivation and all-cause mortality? The West vs. the rest of Scotland

    PubMed Central

    2011-01-01

    Background A growing body of research emphasizes the importance of contextual factors on health outcomes. Using postcode sector data for Scotland (UK), this study tests the hypothesis of spatial heterogeneity in the relationship between area-level deprivation and mortality to determine if contextual differences in the West vs. the rest of Scotland influence this relationship. Research into health inequalities frequently fails to recognise spatial heterogeneity in the deprivation-health relationship, assuming that global relationships apply uniformly across geographical areas. In this study, exploratory spatial data analysis methods are used to assess local patterns in deprivation and mortality. Spatial regression models are then implemented to examine the relationship between deprivation and mortality more formally. Results The initial exploratory spatial data analysis reveals concentrations of high standardized mortality ratios (SMR) and deprivation (hotspots) in the West of Scotland and concentrations of low values (coldspots) for both variables in the rest of the country. The main spatial regression result is that deprivation is the only variable that is highly significantly correlated with all-cause mortality in all models. However, in contrast to the expected spatial heterogeneity in the deprivation-mortality relationship, this relation does not vary between regions in any of the models. This result is robust to a number of specifications, including weighting for population size, controlling for spatial autocorrelation and heteroskedasticity, assuming a non-linear relationship between mortality and socio-economic deprivation, separating the dependent variable into male and female SMRs, and distinguishing between West, North and Southeast regions. The rejection of the hypothesis of spatial heterogeneity in the relationship between socio-economic deprivation and mortality complements prior research on the stability of the deprivation-mortality relationship over time. Conclusions The homogeneity we found in the deprivation-mortality relationship across the regions of Scotland and the absence of a contextualized effect of region highlights the importance of taking a broader strategic policy that can combat the toxic impacts of socio-economic deprivation on health. Focusing on a few specific places (e.g. 15% of the poorest areas) to concentrate resources might be a good start but the impact of socio-economic deprivation on mortality is not restricted to a few places. A comprehensive strategy that can be sustained over time might be needed to interrupt the linkages between poverty and mortality. PMID:21569408

  4. ANCOVA Versus CHANGE From Baseline in Nonrandomized Studies: The Difference.

    PubMed

    van Breukelen, Gerard J P

    2013-11-01

    The pretest-posttest control group design can be analyzed with the posttest as dependent variable and the pretest as covariate (ANCOVA) or with the difference between posttest and pretest as dependent variable (CHANGE). These 2 methods can give contradictory results if groups differ at pretest, a phenomenon that is known as Lord's paradox. Literature claims that ANCOVA is preferable if treatment assignment is based on randomization or on the pretest and questionable for preexisting groups. Some literature suggests that Lord's paradox has to do with measurement error in the pretest. This article shows two new things: First, the claims are confirmed by proving the mathematical equivalence of ANCOVA to a repeated measures model without group effect at pretest. Second, correction for measurement error in the pretest is shown to lead back to ANCOVA or to CHANGE, depending on the assumed absence or presence of a true group difference at pretest. These two new theoretical results are illustrated with multilevel (mixed) regression and structural equation modeling of data from two studies.
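
    The contrast between the two analyses can be made concrete with a small sketch: on the same synthetic pretest-posttest data, ANCOVA regresses the posttest on the pretest and group, while CHANGE regresses the post-minus-pre difference on group. With preexisting groups that differ at pretest, the two group-effect estimates can disagree (Lord's paradox). All data values below are synthetic.

      import numpy as np
      import pandas as pd
      import statsmodels.formula.api as smf

      rng = np.random.default_rng(0)
      n = 200
      group = rng.integers(0, 2, n)
      pre = rng.normal(50 + 5 * group, 10, n)          # preexisting groups differ at pretest
      post = 10 + 0.8 * pre + 2.0 * group + rng.normal(0, 5, n)
      df = pd.DataFrame({"group": group, "pre": pre, "post": post, "change": post - pre})

      ancova = smf.ols("post ~ pre + C(group)", data=df).fit()
      change = smf.ols("change ~ C(group)", data=df).fit()
      print("ANCOVA group effect:", round(ancova.params["C(group)[T.1]"], 2))
      print("CHANGE group effect:", round(change.params["C(group)[T.1]"], 2))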

  5. Robust estimation for partially linear models with large-dimensional covariates

    PubMed Central

    Zhu, LiPing; Li, RunZe; Cui, HengJian

    2014-01-01

    We are concerned with robust estimation procedures to estimate the parameters in partially linear models with large-dimensional covariates. To enhance the interpretability, we suggest implementing a nonconcave regularization method in the robust estimation procedure to select important covariates from the linear component. We establish the consistency for both the linear and the nonlinear components when the covariate dimension diverges at the rate of o(n), where n is the sample size. We show that the robust estimate of the linear component performs asymptotically as well as its oracle counterpart, which assumes the baseline function and the unimportant covariates were known a priori. With a consistent estimator of the linear component, we estimate the nonparametric component by a robust local linear regression. It is proved that the robust estimate of the nonlinear component performs asymptotically as well as if the linear component were known in advance. Comprehensive simulation studies are carried out and an application is presented to examine the finite-sample performance of the proposed procedures. PMID:24955087

  6. Cosmogenic 26Al/10Be surface production ratio in Greenland

    NASA Astrophysics Data System (ADS)

    Corbett, Lee B.; Bierman, Paul R.; Rood, Dylan H.; Caffee, Marc W.; Lifton, Nathaniel A.; Woodruff, Thomas E.

    2017-02-01

    The assumed value for the cosmogenic 26Al/10Be surface production rate ratio in quartz is an important parameter for studies investigating the burial or subaerial erosion of long-lived surfaces and sediments. Recent models and data suggest that the production ratio is spatially variable and may be greater than originally thought. Here we present measured 26Al/10Be ratios for 24 continuously exposed bedrock and boulder surfaces spanning 61-77°N in Greenland. Empirical measurements, such as ours, include nuclides produced predominately by neutron-induced spallation with percent-level contributions by muon interactions. The slope of a York regression line fit to our data is 7.3 ± 0.3 (1σ), suggesting that the 26Al/10Be surface production ratio exceeds the commonly used value of 6.75, at least in the Arctic. A higher 26Al/10Be production ratio has implications for multinuclide cosmogenic isotope studies because it results in greater modeled burial durations and erosion rates.

  7. Robust estimation for partially linear models with large-dimensional covariates.

    PubMed

    Zhu, LiPing; Li, RunZe; Cui, HengJian

    2013-10-01

    We are concerned with robust estimation procedures to estimate the parameters in partially linear models with large-dimensional covariates. To enhance the interpretability, we suggest implementing a nonconcave regularization method in the robust estimation procedure to select important covariates from the linear component. We establish the consistency for both the linear and the nonlinear components when the covariate dimension diverges at the rate of o(n), where n is the sample size. We show that the robust estimate of the linear component performs asymptotically as well as its oracle counterpart, which assumes the baseline function and the unimportant covariates were known a priori. With a consistent estimator of the linear component, we estimate the nonparametric component by a robust local linear regression. It is proved that the robust estimate of the nonlinear component performs asymptotically as well as if the linear component were known in advance. Comprehensive simulation studies are carried out and an application is presented to examine the finite-sample performance of the proposed procedures.

  8. The impact of covariance misspecification in multivariate Gaussian mixtures on estimation and inference: an application to longitudinal modeling.

    PubMed

    Heggeseth, Brianna C; Jewell, Nicholas P

    2013-07-20

    Multivariate Gaussian mixtures are a class of models that provide a flexible parametric approach for the representation of heterogeneous multivariate outcomes. When the outcome is a vector of repeated measurements taken on the same subject, there is often inherent dependence between observations. However, a common covariance assumption is conditional independence; that is, given the mixture component label, the outcomes for subjects are independent. In this paper, we study, through asymptotic bias calculations and simulation, the impact of covariance misspecification in multivariate Gaussian mixtures. Although maximum likelihood estimators of regression and mixing probability parameters are not consistent under misspecification, they have little asymptotic bias when mixture components are well separated or if the assumed correlation is close to the truth even when the covariance is misspecified. We also present a robust standard error estimator and show that it outperforms conventional estimators in simulations and can indicate that the model is misspecified. Body mass index data from a national longitudinal study are used to demonstrate the effects of misspecification on potential inferences made in practice. Copyright © 2013 John Wiley & Sons, Ltd.
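
    The covariance issue can be illustrated by fitting a two-component Gaussian mixture to correlated repeated measurements under a conditional-independence (diagonal) covariance versus an unstructured (full) covariance. The data below are synthetic and the sketch is only meant to show the two specifications, not to reproduce the paper's bias calculations.

      import numpy as np
      from sklearn.mixture import GaussianMixture

      rng = np.random.default_rng(0)
      cov = 0.6 * np.ones((4, 4)) + 0.4 * np.eye(4)          # correlated repeated measures
      comp1 = rng.multivariate_normal(np.zeros(4), cov, 300)
      comp2 = rng.multivariate_normal(np.full(4, 2.0), cov, 300)
      X = np.vstack([comp1, comp2])

      for cov_type in ("diag", "full"):                       # 'diag' = conditional independence
          gm = GaussianMixture(n_components=2, covariance_type=cov_type,
                               random_state=0).fit(X)
          print(cov_type, "estimated component means (1st coordinate):",
                np.round(np.sort(gm.means_[:, 0]), 2))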

  9. Model-Free Feature Screening for Ultrahigh Dimensional Discriminant Analysis

    PubMed Central

    Cui, Hengjian; Li, Runze

    2014-01-01

    This work is concerned with marginal sure independence feature screening for ultra-high dimensional discriminant analysis. The response variable is categorical in discriminant analysis. This enables us to use the conditional distribution function to construct a new index for feature screening. In this paper, we propose a marginal feature screening procedure based on the empirical conditional distribution function. We establish the sure screening and ranking consistency properties for the proposed procedure without assuming any moment condition on the predictors. The proposed procedure enjoys several appealing merits. First, it is model-free in that its implementation does not require specification of a regression model. Second, it is robust to heavy-tailed distributions of predictors and the presence of potential outliers. Third, it allows the categorical response to have a diverging number of classes of order O(n^κ) for some κ ≥ 0. We assess the finite sample property of the proposed procedure by Monte Carlo simulation studies and numerical comparison. We further illustrate the proposed methodology by empirical analyses of two real-life data sets. PMID:26392643
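
    The flavour of such a marginal, model-free screen can be sketched as follows: rank each feature by how much its empirical distribution differs across response classes. The two-sample Kolmogorov-Smirnov statistic used here is a simplification chosen for illustration, not the authors' exact screening index, and the data are synthetic.

      import numpy as np
      from scipy.stats import ks_2samp

      rng = np.random.default_rng(0)
      n, p = 200, 1000
      X = rng.standard_t(df=2, size=(n, p))            # heavy-tailed predictors
      y = rng.integers(0, 2, n)                        # categorical (binary) response
      X[:, 0] += 2.0 * y                               # only feature 0 is informative

      scores = np.array([ks_2samp(X[y == 0, j], X[y == 1, j]).statistic for j in range(p)])
      top = np.argsort(scores)[::-1][:5]
      print("top-ranked features:", top)               # feature 0 should rank first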

  10. The Impact of Uncertain Physical Parameters on HVAC Demand Response

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sun, Yannan; Elizondo, Marcelo A.; Lu, Shuai

    HVAC units are currently one of the major resources providing demand response (DR) in residential buildings. Models of HVAC with DR function can improve understanding of its impact on power system operations and facilitate the deployment of DR technologies. This paper investigates the importance of various physical parameters and their distributions to the HVAC response to DR signals, which is a key step to the construction of HVAC models for a population of units with insufficient data. These parameters include the size of floors, insulation efficiency, the amount of solid mass in the house, and efficiency of the HVAC units. These parameters are usually assumed to follow Gaussian or Uniform distributions. We study the effect of uncertainty in the chosen parameter distributions on the aggregate HVAC response to DR signals, during transient phase and in steady state. We use a quasi-Monte Carlo sampling method with linear regression and Prony analysis to evaluate sensitivity of DR output to the uncertainty in the distribution parameters. The significance ranking on the uncertainty sources is given for future guidance in the modeling of HVAC demand response.

  11. Estimating surface pCO2 in the northern Gulf of Mexico: Which remote sensing model to use?

    NASA Astrophysics Data System (ADS)

    Chen, Shuangling; Hu, Chuanmin; Cai, Wei-Jun; Yang, Bo

    2017-12-01

    Various approaches and models have been proposed to remotely estimate surface pCO2 in the ocean, with variable performance as they were designed for different environments. Among these, a recently developed mechanistic semi-analytical approach (MeSAA) has shown its advantage for its explicit inclusion of physical and biological forcing in the model, yet its general applicability is unknown. Here, with extensive in situ measurements of surface pCO2, the MeSAA, originally developed for the summertime East China Sea, was tested in the northern Gulf of Mexico (GOM) where river plumes dominate water's biogeochemical properties during summer. Specifically, the MeSAA-predicted surface pCO2 was estimated by combining the dominating effects of thermodynamics, river-ocean mixing and biological activities on surface pCO2. Firstly, effects of thermodynamics and river-ocean mixing (pCO2@Hmixing) were estimated with a two-endmember mixing model, assuming conservative mixing. Secondly, pCO2 variations caused by biological activities (ΔpCO2@bio) was determined through an empirical relationship between sea surface temperature (SST)-normalized pCO2 and MODIS (Moderate Resolution Imaging Spectroradiometer) 8-day composite chlorophyll concentration (CHL). The MeSAA-modeled pCO2 (sum of pCO2@Hmixing and ΔpCO2@bio) was compared with the field-measured pCO2. The Root Mean Square Error (RMSE) was 22.94 μatm (5.91%), with coefficient of determination (R2) of 0.25, mean bias (MB) of -0.23 μatm and mean ratio (MR) of 1.001, for pCO2 ranging between 316 and 452 μatm. To improve the model performance, a locally tuned MeSAA was developed through the use of a locally tuned ΔpCO2@bio term. A multi-variate empirical regression model was also developed using the same dataset. Both the locally tuned MeSAA and the regression models showed improved performance compared with the original MeSAA, with R2 of 0.78 and 0.84, RMSE of 12.36 μatm (3.14%) and 10.66 μatm (2.68%), MB of 0.00 μatm and -0.10 μatm, MR of 1.001 and 1.000, respectively. A sensitivity analysis was conducted to study the uncertainties in the predicted pCO2 as a result of the uncertainties in the input variables of each model. Although the MeSAA was more sensitive to variations in SST and CHL than in sea surface salinity (SSS), and the locally tuned MeSAA and the empirical regression models were more sensitive to changes in SST and SSS than in CHL, generally for these three models the bias induced by the uncertainties in the empirically derived parameters (river endmember total alkalinity (TA) and dissolved inorganic carbon (DIC), biological coefficient of the MeSAA and locally tuned MeSAA models) and environmental variables (SST, SSS, CHL) was within or close to the uncertainty of each model. While all these three models showed that surface pCO2 was positively correlated to SST, the MeSAA showed negative correlation between surface pCO2 and SSS and CHL but the locally tuned MeSAA and the empirical regression showed the opposite. These results suggest that the locally tuned MeSAA worked better in the river-dominated northern GOM than the original MeSAA, with slightly worse statistics but more meaningful physical and biogeochemical interpretations than the empirical regression model. Because data from abnormal upwelling were not used to train the models, they are not applicable for waters with strong upwelling, yet the empirical regression approach showed ability to be further tuned to adapt to such cases.

  12. The use of generalised additive models (GAM) in dentistry.

    PubMed

    Helfenstein, U; Steiner, M; Menghini, G

    1997-12-01

    Ordinary multiple regression and logistic multiple regression are widely applied statistical methods which allow a researcher to 'explain' or 'predict' a response variable from a set of explanatory variables or predictors. In these models it is usually assumed that quantitative predictors such as age enter linearly into the model. During recent years these methods have been further developed to allow more flexibility in the way explanatory variables 'act' on a response variable. The methods are called 'generalised additive models' (GAM). The rigid linear terms characterising the association between response and predictors are replaced in an optimal way by flexible curved functions of the predictors (the 'profiles'). Plotting the 'profiles' allows the researcher to visualise easily the shape by which predictors 'act' over the whole range of values. The method facilitates detection of particular shapes such as 'bumps', 'U-shapes', 'J-shapes', 'threshold values' etc. Information about the shape of the association is not revealed by traditional methods. The shapes of the profiles may be checked by performing a Monte Carlo simulation ('bootstrapping'). After the presentation of the GAM a relevant case study is presented in order to demonstrate application and use of the method. The dependence of caries in primary teeth on a set of explanatory variables is investigated. Since GAMs may not be easily accessible to dentists, this article presents them in an introductory condensed form. It was thought that a nonmathematical summary and a worked example might encourage readers to consider the methods described. GAMs may be of great value to dentists in allowing visualisation of the shape by which predictors 'act' and obtaining a better understanding of the complex relationships between predictors and response.
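
    The core idea of a flexible 'profile' can be sketched by replacing a linear term with a spline basis and comparing the fits. The code below uses a B-spline term inside ordinary least squares as a simple stand-in for a full GAM backfit, with synthetic data and a hypothetical predictor (child age) and response (a caries score).

      import numpy as np
      import pandas as pd
      import statsmodels.formula.api as smf

      rng = np.random.default_rng(0)
      age = rng.uniform(2, 7, 400)                                   # child age, years (synthetic)
      caries = 0.5 + np.where(age > 4, 1.5 * (age - 4), 0.0) + rng.normal(0, 0.5, 400)
      df = pd.DataFrame({"age": age, "caries": caries})

      linear = smf.ols("caries ~ age", data=df).fit()                # rigid linear term
      spline = smf.ols("caries ~ bs(age, df=5)", data=df).fit()      # flexible 'profile'
      print("linear AIC:", round(linear.aic, 1), " spline AIC:", round(spline.aic, 1))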

  13. Simulating the effects of climatic variation on stem carbon accumulation of a ponderosa pine stand: comparison with annual growth increment data.

    PubMed

    Hunt, E R; Martin, F C; Running, S W

    1991-01-01

    Simulation models of ecosystem processes may be necessary to separate the long-term effects of climate change on forest productivity from the effects of year-to-year variations in climate. The objective of this study was to compare simulated annual stem growth with measured annual stem growth from 1930 to 1982 for a uniform stand of ponderosa pine (Pinus ponderosa Dougl.) in Montana, USA. The model, FOREST-BGC, was used to simulate growth assuming leaf area index (LAI) was either constant or increasing. The measured stem annual growth increased exponentially over time; the differences between the simulated and measured stem carbon accumulations were not large. Growth trends were removed from both the measured and simulated annual increments of stem carbon to enhance the year-to-year variations in growth resulting from climate. The detrended increments from the increasing LAI simulation fit the detrended increments of the stand data over time with an R(2) of 0.47; the R(2) increased to 0.65 when the previous year's simulated detrended increment was included with the current year's simulated increment to account for autocorrelation. Stepwise multiple linear regression of the detrended increments of the stand data versus monthly meteorological variables had an R(2) of 0.37, and the R(2) increased to 0.47 when the previous year's meteorological data were included to account for autocorrelation. Thus, FOREST-BGC was more sensitive to the effects of year-to-year climate variation on annual stem growth than were multiple linear regression models.

  14. Functional linear models for zero-inflated count data with application to modeling hospitalizations in patients on dialysis.

    PubMed

    Sentürk, Damla; Dalrymple, Lorien S; Nguyen, Danh V

    2014-11-30

    We propose functional linear models for zero-inflated count data with a focus on the functional hurdle and functional zero-inflated Poisson (ZIP) models. Although the hurdle model assumes the counts come from a mixture of a degenerate distribution at zero and a zero-truncated Poisson distribution, the ZIP model considers a mixture of a degenerate distribution at zero and a standard Poisson distribution. We extend the generalized functional linear model framework with a functional predictor and multiple cross-sectional predictors to model counts generated by a mixture distribution. We propose an estimation procedure for functional hurdle and ZIP models, called penalized reconstruction, geared towards error-prone and sparsely observed longitudinal functional predictors. The approach relies on dimension reduction and pooling of information across subjects involving basis expansions and penalized maximum likelihood techniques. The developed functional hurdle model is applied to modeling hospitalizations within the first 2 years from initiation of dialysis, with a high percentage of zeros, in the Comprehensive Dialysis Study participants. Hospitalization counts are modeled as a function of sparse longitudinal measurements of serum albumin concentrations, patient demographics, and comorbidities. Simulation studies are used to study finite sample properties of the proposed method and include comparisons with an adaptation of standard principal components regression. Copyright © 2014 John Wiley & Sons, Ltd.

  15. The comparison of landslide ratio-based and general logistic regression landslide susceptibility models in the Chishan watershed after 2009 Typhoon Morakot

    NASA Astrophysics Data System (ADS)

    WU, Chunhung

    2015-04-01

    The research built the original logistic regression landslide susceptibility model (abbreviated as or-LRLSM) and the landslide ratio-based logistic regression landslide susceptibility model (abbreviated as lr-LRLSM), compared their performance, and explained the error sources of the two models. The research assumes that the performance of the logistic regression model is better if the distributions of landslide ratio and of the weighted value of each variable are similar. Landslide ratio is the ratio of landslide area to total area in a specific area and a useful index to evaluate the seriousness of landslide disaster in Taiwan. The research adopted the landslide inventory induced by the 2009 Typhoon Morakot in the Chishan watershed, which was the most serious disaster event of the last decade in Taiwan. The research adopted the 20 m grid as the basic unit in building the LRLSM, and six variables, including elevation, slope, aspect, geological formation, accumulated rainfall, and bank erosion, were included in the two models. In building the or-LRLSM, the six variables were divided into continuous variables (elevation, slope, and accumulated rainfall) and categorical variables (aspect, geological formation, and bank erosion), while in building the lr-LRLSM all variables were categorical, classified based on landslide ratio. Because the number of basic units in the whole Chishan watershed was too large to process with commercial software, the research used random sampling instead of all basic units. The research adopted equal proportions of landslide and non-landslide units in the logistic regression analysis. The research took 10 random samples and selected the group with the best Cox & Snell R2 and Nagelkerke R2 values as the database for the following analysis. Based on the best result from the 10 random sampling groups, the or-LRLSM (lr-LRLSM) is significant at the 1% level with Cox & Snell R2 = 0.190 (0.196) and Nagelkerke R2 = 0.253 (0.260). A unit with landslide susceptibility value > 0.5 (≤ 0.5) is classified as a predicted landslide unit (non-landslide unit). The AUC, i.e. the area under the relative operating characteristic curve, of the or-LRLSM in the Chishan watershed is 0.72, while that of the lr-LRLSM is 0.77. Furthermore, the average correct ratio of the lr-LRLSM (73.3%) is better than that of the or-LRLSM (68.3%). The research analyzed in detail the error sources of the two models. For continuous variables, using the landslide ratio-based classification in building the lr-LRLSM makes the distribution of weighted values more similar to the distribution of landslide ratio over the range of each continuous variable than in the or-LRLSM. For categorical variables, the purpose of the landslide ratio-based classification in the lr-LRLSM is to group together categories with similar landslide ratios. The mean correct ratio for continuous variables (categorical variables) using the lr-LRLSM is better than that of the or-LRLSM by 0.6-2.6% (1.7-6.0%). Building the landslide susceptibility model using landslide ratio-based classification is practical and performs better than using the original logistic regression.
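
    The ratio-based idea (recode a continuous factor into classes so that grid cells with similar landslide ratios share a weight, then fit logistic regression and score by AUC) can be sketched as follows. The slope values, bin edges, and risk curve are synthetic stand-ins and do not reproduce the study's landslide-ratio classes.

      import numpy as np
      import pandas as pd
      from sklearn.linear_model import LogisticRegression
      from sklearn.metrics import roc_auc_score

      rng = np.random.default_rng(0)
      slope = rng.uniform(0, 60, 5000)                              # slope in degrees (synthetic)
      p_slide = 1 / (1 + np.exp(-(slope - 35) / 6))                 # assumed true landslide risk
      landslide = rng.random(5000) < p_slide

      bins = pd.cut(slope, bins=[0, 15, 25, 35, 45, 60])            # stand-in ratio-based classes
      X_cat = pd.get_dummies(bins).to_numpy(dtype=float)            # categorical encoding
      X_cont = slope.reshape(-1, 1)                                 # original continuous variable

      for name, X in (("continuous", X_cont), ("ratio-based bins", X_cat)):
          proba = LogisticRegression(max_iter=1000).fit(X, landslide).predict_proba(X)[:, 1]
          print(f"{name}: AUC = {roc_auc_score(landslide, proba):.3f}")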

  16. Bayesian Analysis of Non-Gaussian Long-Range Dependent Processes

    NASA Astrophysics Data System (ADS)

    Graves, Timothy; Watkins, Nicholas; Franzke, Christian; Gramacy, Robert

    2013-04-01

    Recent studies [e.g. the Antarctic study of Franzke, J. Climate, 2010] have strongly suggested that surface temperatures exhibit long-range dependence (LRD). The presence of LRD would hamper the identification of deterministic trends and the quantification of their significance. It is well established that LRD processes exhibit stochastic trends over rather long periods of time. Thus, accurate methods for discriminating between physical processes that possess long memory and those that do not are an important adjunct to climate modeling. As we briefly review, the LRD idea originated at the same time as H-selfsimilarity, so it is often not realised that a model does not have to be H-self similar to show LRD [e.g. Watkins, GRL Frontiers, 2013]. We have used Markov Chain Monte Carlo algorithms to perform a Bayesian analysis of Auto-Regressive Fractionally-Integrated Moving-Average ARFIMA(p,d,q) processes, which are capable of modeling LRD. Our principal aim is to obtain inference about the long memory parameter, d, with secondary interest in the scale and location parameters. We have developed a reversible-jump method enabling us to integrate over different model forms for the short memory component. We initially assume Gaussianity, and have tested the method on both synthetic and physical time series. Many physical processes, for example the Faraday Antarctic time series, are significantly non-Gaussian. We have therefore extended this work by weakening the Gaussianity assumption, assuming an alpha-stable distribution for the innovations, and performing joint inference on d and alpha. Such a modified FARIMA(p,d,q) process is a flexible, initial model for non-Gaussian processes with long memory. We will present a study of the dependence of the posterior variance of the memory parameter d on the length of the time series considered. This will be compared with equivalent error diagnostics for other measures of d.

  17. REGRES: A FORTRAN-77 program to calculate nonparametric and ``structural'' parametric solutions to bivariate regression equations

    NASA Astrophysics Data System (ADS)

    Rock, N. M. S.; Duffy, T. R.

    REGRES allows a range of regression equations to be calculated for paired sets of data values in which both variables are subject to error (i.e. neither is the "independent" variable). Nonparametric regressions, based on medians of all possible pairwise slopes and intercepts, are treated in detail. Estimated slopes and intercepts are output, along with confidence limits, Spearman and Kendall rank correlation coefficients. Outliers can be rejected with user-determined stringency. Parametric regressions can be calculated for any value of λ (the ratio of the variances of the random errors for y and x), including: (1) major axis (λ = 1); (2) reduced major axis (λ = variance of y/variance of x); (3) Y on X (λ = infinity); or (4) X on Y (λ = 0) solutions. Pearson linear correlation coefficients also are output. REGRES provides an alternative to conventional isochron assessment techniques where bivariate normal errors cannot be assumed, or weighting methods are inappropriate.
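
    The nonparametric regression described here, based on medians of all pairwise slopes, is the same idea as the Theil-Sen estimator; the sketch below uses scipy's implementation as a modern stand-in on synthetic data. It does not reproduce REGRES's outlier-rejection options or its parametric λ-based solutions.

      import numpy as np
      from scipy.stats import theilslopes

      rng = np.random.default_rng(0)
      x = np.linspace(0, 10, 50) + rng.normal(0, 0.3, 50)   # both variables carry error
      y = 1.0 + 2.0 * x + rng.normal(0, 0.3, 50)
      y[::10] += 8.0                                        # a few outliers

      slope, intercept, lo, hi = theilslopes(y, x)          # median of pairwise slopes
      print(f"slope = {slope:.2f} (95% CI {lo:.2f}-{hi:.2f}), intercept = {intercept:.2f}")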

  18. A case study for the integration of predictive mineral potential maps

    NASA Astrophysics Data System (ADS)

    Lee, Saro; Oh, Hyun-Joo; Heo, Chul-Ho; Park, Inhye

    2014-09-01

    This study aims to produce mineral potential maps for epithermal gold (Au)-silver (Ag) deposits using various models and to verify their accuracy in a Geographic Information System (GIS) environment, assuming that all deposits share a common genesis. The maps of potential Au and Ag deposits were produced from geological data in the Taebaeksan mineralized area, Korea. The methodological framework consists of three main steps: 1) identification of spatial relationships, 2) quantification of such relationships, and 3) combination of multiple quantified relationships. A spatial database containing 46 Au-Ag deposits was constructed using GIS. The spatial associations between training deposits and 26 related factors were identified and quantified by probabilistic and statistical modelling. The mineral potential maps were generated by integrating all factors using the overlay method and recombined afterwards using the likelihood ratio model. They were verified by comparison with test mineral deposit locations. The verification revealed that the combined mineral potential map had the greatest accuracy (83.97%), whereas it was 72.24%, 65.85%, 72.23% and 71.02% for the likelihood ratio, weight of evidence, logistic regression and artificial neural network models, respectively. The mineral potential map can provide useful information for mineral resource development.

  19. Multimodal Image Analysis in Alzheimer’s Disease via Statistical Modelling of Non-local Intensity Correlations

    NASA Astrophysics Data System (ADS)

    Lorenzi, Marco; Simpson, Ivor J.; Mendelson, Alex F.; Vos, Sjoerd B.; Cardoso, M. Jorge; Modat, Marc; Schott, Jonathan M.; Ourselin, Sebastien

    2016-04-01

    The joint analysis of brain atrophy measured with magnetic resonance imaging (MRI) and hypometabolism measured with positron emission tomography with fluorodeoxyglucose (FDG-PET) is of primary importance in developing models of pathological changes in Alzheimer’s disease (AD). Most of the current multimodal analyses in AD assume a local (spatially overlapping) relationship between MR and FDG-PET intensities. However, it is well known that atrophy and hypometabolism are prominent in different anatomical areas. The aim of this work is to describe the relationship between atrophy and hypometabolism by means of a data-driven statistical model of non-overlapping intensity correlations. For this purpose, FDG-PET and MRI signals are jointly analyzed through a computationally tractable formulation of partial least squares regression (PLSR). The PLSR model is estimated and validated on a large clinical cohort of 1049 individuals from the ADNI dataset. Results show that the proposed non-local analysis outperforms classical local approaches in terms of predictive accuracy while providing a plausible description of disease dynamics: early AD is characterised by non-overlapping temporal atrophy and temporo-parietal hypometabolism, while the later disease stages show overlapping brain atrophy and hypometabolism spread in temporal, parietal and cortical areas.

  20. Voxelwise multivariate analysis of multimodality magnetic resonance imaging

    PubMed Central

    Naylor, Melissa G.; Cardenas, Valerie A.; Tosun, Duygu; Schuff, Norbert; Weiner, Michael; Schwartzman, Armin

    2015-01-01

    Most brain magnetic resonance imaging (MRI) studies concentrate on a single MRI contrast or modality, frequently structural MRI. By performing an integrated analysis of several modalities, such as structural, perfusion-weighted, and diffusion-weighted MRI, new insights may be attained to better understand the underlying processes of brain diseases. We compare two voxelwise approaches: (1) fitting multiple univariate models, one for each outcome and then adjusting for multiple comparisons among the outcomes and (2) fitting a multivariate model. In both cases, adjustment for multiple comparisons is performed over all voxels jointly to account for the search over the brain. The multivariate model is able to account for the multiple comparisons over outcomes without assuming independence because the covariance structure between modalities is estimated. Simulations show that the multivariate approach is more powerful when the outcomes are correlated and, even when the outcomes are independent, the multivariate approach is just as powerful or more powerful when at least two outcomes are dependent on predictors in the model. However, multiple univariate regressions with Bonferroni correction remains a desirable alternative in some circumstances. To illustrate the power of each approach, we analyze a case control study of Alzheimer's disease, in which data from three MRI modalities are available. PMID:23408378

  1. Pre-operative prediction of surgical morbidity in children: comparison of five statistical models.

    PubMed

    Cooper, Jennifer N; Wei, Lai; Fernandez, Soledad A; Minneci, Peter C; Deans, Katherine J

    2015-02-01

    The accurate prediction of surgical risk is important to patients and physicians. Logistic regression (LR) models are typically used to estimate these risks. However, in the fields of data mining and machine-learning, many alternative classification and prediction algorithms have been developed. This study aimed to compare the performance of LR to several data mining algorithms for predicting 30-day surgical morbidity in children. We used the 2012 National Surgical Quality Improvement Program-Pediatric dataset to compare the performance of (1) an LR model that assumed linearity and additivity (simple LR model), (2) an LR model incorporating restricted cubic splines and interactions (flexible LR model), (3) a support vector machine, (4) a random forest, and (5) boosted classification trees for predicting surgical morbidity. The ensemble-based methods showed significantly higher accuracy, sensitivity, specificity, PPV, and NPV than the simple LR model. However, none of the models performed better than the flexible LR model in terms of the aforementioned measures or in model calibration or discrimination. Support vector machines, random forests, and boosted classification trees do not show better performance than LR for predicting pediatric surgical morbidity. After further validation, the flexible LR model derived in this study could be used to assist with clinical decision-making based on patient-specific surgical risks. Copyright © 2014 Elsevier Ltd. All rights reserved.

  2. Abstraction and Assume-Guarantee Reasoning for Automated Software Verification

    NASA Technical Reports Server (NTRS)

    Chaki, S.; Clarke, E.; Giannakopoulou, D.; Pasareanu, C. S.

    2004-01-01

    Compositional verification and abstraction are the key techniques to address the state explosion problem associated with model checking of concurrent software. A promising compositional approach is to prove properties of a system by checking properties of its components in an assume-guarantee style. This article proposes a framework for performing abstraction and assume-guarantee reasoning of concurrent C code in an incremental and fully automated fashion. The framework uses predicate abstraction to extract and refine finite state models of software and it uses an automata learning algorithm to incrementally construct assumptions for the compositional verification of the abstract models. The framework can be instantiated with different assume-guarantee rules. We have implemented our approach in the COMFORT reasoning framework and we show how COMFORT out-performs several previous software model checking approaches when checking safety properties of non-trivial concurrent programs.

  3. The Solid Rocket Motor Slag Population: Results of a Radar-based Regressive Statistical Evaluation

    NASA Technical Reports Server (NTRS)

    Horstman, Matthew F.; Xu, Yu-Lin

    2008-01-01

    Solid rocket motor (SRM) slag has been identified as a significant source of man-made orbital debris. The propensity of SRMs to generate particles of 100 μm and larger has caused concern regarding their contribution to the debris environment. Radar observation, rather than in-situ gathered evidence, is currently the only measurable source for the NASA/ODPO model of the on-orbit slag population. This simulated model includes the time evolution of the resultant orbital populations using a historical database of SRM launches, propellant masses, and estimated locations and times of tail-off. However, due to the small amount of observational evidence, there can be no direct comparison to check the validity of this model. Rather than using the assumed population developed from purely historical and physical assumptions, a regression approach was used that utilized the populations observed by the Haystack radar from 1996 to the present. The estimated trajectories from the historical model of slag sources, and the corresponding plausible detections by the Haystack radar, were identified. Comparisons with observational data from the ensuing years were made, and the SRM model was altered with respect to size and mass production of slag particles to reflect the historical data obtained. The result is a model SRM population that fits within the bounds of the observed environment.

  4. Developing and applying metamodels of high resolution ...

    EPA Pesticide Factsheets

    As defined by Wikipedia (https://en.wikipedia.org/wiki/Metamodeling), “(a) metamodel or surrogate model is a model of a model, and metamodeling is the process of generating such metamodels.” The goals of metamodeling include, but are not limited to, (1) developing functional or statistical relationships between a model’s input and output variables for model analysis, interpretation, or information consumption by users’ clients; (2) quantifying a model’s sensitivity to alternative or uncertain forcing functions, initial conditions, or parameters; and (3) characterizing the model’s response or state space. Using five existing models developed by the US Environmental Protection Agency, we generate a metamodeling database of the expected environmental and biological concentrations of 644 organic chemicals released into nine US rivers from wastewater treatment works (WTWs) assuming multiple loading rates and sizes of populations serviced. The chemicals of interest have log n-octanol/water partition coefficients (log KOW) ranging from 3 to 14, and the rivers of concern have mean annual discharges ranging from 1.09 to 3240 m3/s. Log linear regression models are derived to predict mean annual dissolved and total water concentrations and total sediment concentrations of chemicals of concern based on their log KOW, Henry’s Law Constant, and WTW loading rate and on the mean annual discharges of the receiving rivers. Metamodels are also derived to predict mean annual chemical

  5. A varying-coefficient method for analyzing longitudinal clinical trials data with nonignorable dropout

    PubMed Central

    Forster, Jeri E.; MaWhinney, Samantha; Ball, Erika L.; Fairclough, Diane

    2011-01-01

    Dropout is common in longitudinal clinical trials and when the probability of dropout depends on unobserved outcomes even after conditioning on available data, it is considered missing not at random and therefore nonignorable. To address this problem, mixture models can be used to account for the relationship between a longitudinal outcome and dropout. We propose a Natural Spline Varying-coefficient mixture model (NSV), which is a straightforward extension of the parametric Conditional Linear Model (CLM). We assume that the outcome follows a varying-coefficient model conditional on a continuous dropout distribution. Natural cubic B-splines are used to allow the regression coefficients to semiparametrically depend on dropout and inference is therefore more robust. Additionally, this method is computationally stable and relatively simple to implement. We conduct simulation studies to evaluate performance and compare methodologies in settings where the longitudinal trajectories are linear and dropout time is observed for all individuals. Performance is assessed under conditions where model assumptions are both met and violated. In addition, we compare the NSV to the CLM and a standard random-effects model using an HIV/AIDS clinical trial with probable nonignorable dropout. The simulation studies suggest that the NSV is an improvement over the CLM when dropout has a nonlinear dependence on the outcome. PMID:22101223

  6. Neural network models - a novel tool for predicting the efficacy of growth hormone (GH) therapy in children with short stature.

    PubMed

    Smyczynska, Joanna; Hilczer, Maciej; Smyczynska, Urszula; Stawerska, Renata; Tadeusiewicz, Ryszard; Lewinski, Andrzej

    2015-01-01

    The leading methods for predicting growth hormone (GH) therapy effectiveness are multiple linear regression (MLR) models. To the best of our knowledge, we are the first to apply artificial neural networks (ANN) to this problem. For ANN there is no need to assume the form of the functions linking independent and dependent variables. The aim of this study was to compare ANN and MLR models of GH therapy effectiveness. The analysis comprised the data of 245 GH-deficient children (170 boys) treated with GH up to final height (FH). Independent variables included: patients' height, pre-treatment height velocity, chronological age, bone age, gender, pubertal status, parental heights, GH peak in 2 stimulation tests, and IGF-I concentration. The output variable was FH. For the testing dataset, the MLR model predicted FH SDS with an average error (RMSE) of 0.64 SD, explaining 34.3% of its variability; the ANN model derived on the same pre-processed data predicted FH SDS with an RMSE of 0.60 SD, explaining 42.0% of its variability; the ANN model derived on raw data predicted FH with an RMSE of 3.9 cm (0.63 SD), explaining 78.7% of its variability. ANN seem to be a valuable tool for predicting GH treatment effectiveness, especially since they can be applied to raw clinical data.

  7. Modeling Error Distributions of Growth Curve Models through Bayesian Methods

    ERIC Educational Resources Information Center

    Zhang, Zhiyong

    2016-01-01

    Growth curve models are widely used in social and behavioral sciences. However, typical growth curve models often assume that the errors are normally distributed although non-normal data may be even more common than normal data. In order to avoid possible statistical inference problems in blindly assuming normality, a general Bayesian framework is…

  8. Mechanism-based model for tumor drug resistance.

    PubMed

    Kuczek, T; Chan, T C

    1992-01-01

    The development of tumor resistance to cytotoxic agents has important implications in the treatment of cancer. If supported by experimental data, mathematical models of resistance can provide useful information on the underlying mechanisms and aid in the design of therapeutic regimens. We report on the development of a model of tumor-growth kinetics based on the assumption that the rates of cell growth in a tumor are normally distributed. We further assumed that the growth rate of each cell is proportional to its rate of total pyrimidine synthesis (de novo plus salvage). Using an ovarian carcinoma cell line (2008) and resistant variants selected for chronic exposure to a pyrimidine antimetabolite, N-phosphonacetyl-L-aspartate (PALA), we derived a simple and specific analytical form describing the growth curves generated in 72 h growth assays. The model assumes that the rate of de novo pyrimidine synthesis, denoted alpha, is shifted down by an amount proportional to the log10 PALA concentration and that cells whose rate of pyrimidine synthesis falls below a critical level, denoted alpha 0, can no longer grow. This is described by the equation: Probability (growth) = probability (alpha 0 less than alpha-constant x log10 [PALA]). This model predicts that when growth curves are plotted on probit paper, they will produce straight lines. This prediction is in agreement with the data we obtained for the 2008 cells. Another prediction of this model is that the same probit plots for the resistant variants should shift to the right in a parallel fashion. Probit plots of the dose-response data obtained for each resistant 2008 line following chronic exposure to PALA again confirmed this prediction. Correlation of the rightward shift of dose responses to uridine transport (r = 0.99) also suggests that salvage metabolism plays a key role in tumor-cell resistance to PALA. Furthermore, the slope of the regression lines enables the detection of synergy such as that observed between dipyridamole and PALA. Although the rate-normal model was used to study the rate of salvage metabolism in PALA resistance in the present study, it may be widely applicable to modeling of other resistance mechanisms such as gene amplification of target enzymes.
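
    The rate-normal model predicts that probit-transformed growth fractions are linear in log10 PALA concentration. The following minimal check of that prediction uses made-up dose-response values, not the 2008-cell data; the fitted slope and intercept stand in for the constant and the alpha terms of the model.

        import numpy as np
        from scipy.stats import norm

        # Illustrative PALA concentrations (uM) and surviving growth fractions.
        dose = np.array([1, 3, 10, 30, 100, 300], dtype=float)
        growth_frac = np.array([0.95, 0.85, 0.60, 0.35, 0.15, 0.05])

        x = np.log10(dose)
        z = norm.ppf(growth_frac)          # probit transform

        # If P(growth) = P(alpha0 < alpha - c*log10[PALA]) with normal alpha,
        # the probit of the growth fraction is linear in log10 dose.
        slope, intercept = np.polyfit(x, z, 1)
        print(f"probit slope = {slope:.2f}, intercept = {intercept:.2f}")
        # Under the model's parallel-shift prediction, resistant variants
        # should shift this line to the right (same slope, new intercept).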

  9. Thoughts on Reforming Professional Bureaucracies. Summary.

    ERIC Educational Resources Information Center

    Lynch, Patrick D.

    1978-01-01

    The humanization of bureaucracies is essential in modern society. Traditionally, the training of administrators has emphasized three models of organizations. The first is the productivity model, the second model assumes that fulfilling the needs of organizational members will increase client satisfaction, and the third assumes a stable environment…

  10. Poisson Mixture Regression Models for Heart Disease Prediction.

    PubMed

    Mufudza, Chipo; Erol, Hamza

    2016-01-01

    Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models is here addressed under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model due to its low Bayesian Information Criteria value. Furthermore, a Zero Inflated Poisson Mixture Regression model turned out to be the best model for heart disease prediction overall, as it both clusters individuals into high or low risk categories and predicts the rate of heart disease componentwise given the available clusters. It is deduced that heart disease prediction can be effectively done by identifying the major risks componentwise using Poisson mixture regression models.
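
    A two-component Poisson mixture can be fitted with a short EM loop. The sketch below is intercept-only (no covariates or concomitant variables) and uses synthetic counts, so it only illustrates the clustering idea behind the mixture regression models compared in the abstract, not the paper's actual models.

        import numpy as np
        from scipy.stats import poisson

        rng = np.random.default_rng(1)
        # Synthetic counts from a low-risk and a high-risk group.
        y = np.concatenate([rng.poisson(1.5, 300), rng.poisson(6.0, 200)])

        # EM for a two-component Poisson mixture (intercept-only).
        w, lam = np.array([0.5, 0.5]), np.array([1.0, 5.0])
        for _ in range(200):
            # E-step: responsibility of each component for each count.
            resp = w * poisson.pmf(y[:, None], lam)
            resp /= resp.sum(axis=1, keepdims=True)
            # M-step: update mixing weights and component rates.
            w = resp.mean(axis=0)
            lam = (resp * y[:, None]).sum(axis=0) / resp.sum(axis=0)

        print("weights:", np.round(w, 2), "rates:", np.round(lam, 2))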

  11. Poisson Mixture Regression Models for Heart Disease Prediction

    PubMed Central

    Erol, Hamza

    2016-01-01

    Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models is here addressed under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model due to its low Bayesian Information Criteria value. Furthermore, a Zero Inflated Poisson Mixture Regression model turned out to be the best model for heart prediction over all models as it both clusters individuals into high or low risk category and predicts rate to heart disease componentwise given clusters available. It is deduced that heart disease prediction can be effectively done by identifying the major risks componentwise using Poisson mixture regression model. PMID:27999611

  12. Is complex allometry in field metabolic rates of mammals a statistical artifact?

    PubMed

    Packard, Gary C

    2017-01-01

    Recent reports indicate that field metabolic rates (FMRs) of mammals conform to a pattern of complex allometry in which the exponent in a simple, two-parameter power equation increases steadily as a dependent function of body mass. The reports were based, however, on indirect analyses performed on logarithmic transformations of the original data. I re-examined values for FMR and body mass for 114 species of mammal by the conventional approach to allometric analysis (to illustrate why the approach is unreliable) and by linear and nonlinear regression on untransformed variables (to illustrate the power and versatility of newer analytical methods). The best of the regression models fitted directly to untransformed observations is a three-parameter power equation with multiplicative, lognormal, heteroscedastic error and an allometric exponent of 0.82. The mean function is a good fit to data in graphical display. The significant intercept in the model may simply have gone undetected in prior analyses because conventional allometry assumes implicitly that the intercept is zero; or the intercept may be a spurious finding resulting from bias introduced by the haphazard sampling that underlies "exploratory" analyses like the one reported here. The aforementioned issues can be resolved only by gathering new data specifically intended to address the question of scaling of FMR with body mass in mammals. However, there is no support for the concept of complex allometry in the relationship between FMR and body size in mammals. Copyright © 2016 Elsevier Inc. All rights reserved.
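
    The preferred model in the abstract is a three-parameter power equation with multiplicative, lognormal error, which can be fitted by nonlinear least squares on the log scale (multiplicative error becomes additive there, and is heteroscedastic on the original scale). A minimal sketch with simulated data follows; the parameter values are invented and are not the reported FMR estimates.

        import numpy as np
        from scipy.optimize import curve_fit

        rng = np.random.default_rng(2)
        mass = rng.uniform(0.01, 100, 114)       # body mass, kg (synthetic)
        a, b, c = 5.0, 60.0, 0.82                # invented "true" parameters
        # Multiplicative lognormal error around a three-parameter power function.
        fmr = (a + b * mass**c) * rng.lognormal(0, 0.25, mass.size)

        # Fit log(FMR) = log(a + b * M^c) + eps by nonlinear least squares.
        def log_power(m, a, b, c):
            return np.log(a + b * m**c)

        popt, _ = curve_fit(log_power, mass, np.log(fmr), p0=[1.0, 10.0, 0.7],
                            bounds=([1e-6, 1e-6, 0.1], [np.inf, np.inf, 2.0]))
        print("a, b, c =", np.round(popt, 2))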

  13. Valuing year-to-go hydrologic forecast improvements for a peaking hydropower system in the Sierra Nevada

    NASA Astrophysics Data System (ADS)

    Rheinheimer, David E.; Bales, Roger C.; Oroza, Carlos A.; Lund, Jay R.; Viers, Joshua H.

    2016-05-01

    We assessed the potential value of hydrologic forecasting improvements for a snow-dominated high-elevation hydropower system in the Sierra Nevada of California, using a hydropower optimization model. To mimic different forecasting skill levels for inflow time series, rest-of-year inflows from regression-based forecasts were blended in different proportions with representative inflows from a spatially distributed hydrologic model. The statistical approach mimics the simpler, historical forecasting approach that is still widely used. Revenue was calculated using historical electricity prices, with perfect price foresight assumed. With current infrastructure and operations, perfect hydrologic forecasts increased annual hydropower revenue by $0.14 million to $1.6 million, with lower values in dry years and higher values in wet years, or about $0.8 million (1.2%) on average, representing overall willingness-to-pay for perfect information. A second sensitivity analysis found a wider range of annual revenue gain or loss using different skill levels in snow measurement in the regression-based forecast, mimicking expected declines in skill as the climate warms and historical snow measurements no longer represent current conditions. The value of perfect forecasts was insensitive to storage capacity for small and large reservoirs, relative to average inflow, and modestly sensitive to storage capacity with medium (current) reservoir storage. The value of forecasts was highly sensitive to powerhouse capacity, particularly for the range of capacities in the northern Sierra Nevada. The approach can be extended to multireservoir, multipurpose systems to help guide investments in forecasting.

  14. A fully traits-based approach to modeling global vegetation distribution.

    PubMed

    van Bodegom, Peter M; Douma, Jacob C; Verheijen, Lieneke M

    2014-09-23

    Dynamic Global Vegetation Models (DGVMs) are indispensable for our understanding of climate change impacts. The application of traits in DGVMs is increasingly refined. However, a comprehensive analysis of the direct impacts of trait variation on global vegetation distribution does not yet exist. Here, we present such analysis as proof of principle. We run regressions of trait observations for leaf mass per area, stem-specific density, and seed mass from a global database against multiple environmental drivers, making use of findings of global trait convergence. This analysis explained up to 52% of the global variation of traits. Global trait maps, generated by coupling the regression equations to gridded soil and climate maps, showed up to orders of magnitude variation in trait values. Subsequently, nine vegetation types were characterized by the trait combinations that they possess using Gaussian mixture density functions. The trait maps were input to these functions to determine global occurrence probabilities for each vegetation type. We prepared vegetation maps, assuming that the most probable (and thus, most suited) vegetation type at each location will be realized. This fully traits-based vegetation map predicted 42% of the observed vegetation distribution correctly. Our results indicate that a major proportion of the predictive ability of DGVMs with respect to vegetation distribution can be attained by three traits alone if traits like stem-specific density and seed mass are included. We envision that our traits-based approach, our observation-driven trait maps, and our vegetation maps may inspire a new generation of powerful traits-based DGVMs.

  15. A simple computational algorithm of model-based choice preference.

    PubMed

    Toyama, Asako; Katahira, Kentaro; Ohira, Hideki

    2017-08-01

    A broadly used computational framework posits that two learning systems operate in parallel during the learning of choice preferences-namely, the model-free and model-based reinforcement-learning systems. In this study, we examined another possibility, through which model-free learning is the basic system and model-based information is its modulator. Accordingly, we proposed several modified versions of a temporal-difference learning model to explain the choice-learning process. Using the two-stage decision task developed by Daw, Gershman, Seymour, Dayan, and Dolan (2011), we compared their original computational model, which assumes a parallel learning process, and our proposed models, which assume a sequential learning process. Choice data from 23 participants showed a better fit with the proposed models. More specifically, the proposed eligibility adjustment model, which assumes that the environmental model can weight the degree of the eligibility trace, can explain choices better under both model-free and model-based controls and has a simpler computational algorithm than the original model. In addition, the forgetting learning model and its variation, which assume changes in the values of unchosen actions, substantially improved the fits to the data. Overall, we show that a hybrid computational model best fits the data. The parameters used in this model succeed in capturing individual tendencies with respect to both model use in learning and exploration behavior. This computational model provides novel insights into learning with interacting model-free and model-based components.

  16. Wildfire Risk Mapping over the State of Mississippi: Land Surface Modeling Approach

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cooke, William H.; Mostovoy, Georgy; Anantharaj, Valentine G

    2012-01-01

    Three fire risk indexes based on soil moisture estimates were applied to simulate wildfire probability over the southern part of Mississippi using the logistic regression approach. The fire indexes were retrieved from: (1) accumulated difference between daily precipitation and potential evapotranspiration (P-E); (2) top 10 cm soil moisture content simulated by the Mosaic land surface model; and (3) the Keetch-Byram drought index (KBDI). The P-E, KBDI, and soil moisture based indexes were estimated from gridded atmospheric and Mosaic-simulated soil moisture data available from the North American Land Data Assimilation System (NLDAS-2). Normalized deviations of these indexes from the 31-year mean (1980-2010) were fitted into the logistic regression model describing the probability of wildfire occurrence as a function of the fire index. It was assumed that such normalization provides a more robust and adequate description of temporal dynamics of soil moisture anomalies than the original (not normalized) set of indexes. The logistic model parameters were evaluated for 0.25° × 0.25° latitude/longitude cells and for the probability of at least one fire event occurring during 5 consecutive days. A 23-year (1986-2008) forest fire record was used. Two periods were selected and examined (January to mid-June and mid-September to December). The application of the logistic model provides an overall good agreement between empirical/observed and model-fitted fire probabilities over the study area during both seasons. The fire risk indexes based on the top 10 cm soil moisture and KBDI have the largest impact on the wildfire odds (increasing it by almost 2 times in response to each unit change of the corresponding fire risk index during the January to mid-June period and by nearly 1.5 times during mid-September to December) observed over 0.25° × 0.25° cells located along the Mississippi coastline. This result suggests a rather strong control of fire risk indexes on fire occurrence probability over this region.
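
    The odds statements at the end of the abstract come from exponentiating a logistic regression coefficient: the odds ratio per unit change of a normalized fire-risk index is exp(beta). A minimal sketch with synthetic data, assuming statsmodels is available; the simulated relationship is invented, not the NLDAS-2 analysis.

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(3)
        # Synthetic normalized fire-risk index (e.g., a KBDI anomaly) and fire occurrence.
        index = rng.normal(0, 1, 1000)
        p_fire = 1 / (1 + np.exp(-(-2.0 + 0.7 * index)))   # invented true relationship
        fire = rng.binomial(1, p_fire)

        X = sm.add_constant(index)
        fit = sm.Logit(fire, X).fit(disp=0)
        odds_ratio = np.exp(fit.params[1])
        print(f"odds ratio per unit change of the index: {odds_ratio:.2f}")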

  17. Optical scatterometry of quarter-micron patterns using neural regression

    NASA Astrophysics Data System (ADS)

    Bischoff, Joerg; Bauer, Joachim J.; Haak, Ulrich; Hutschenreuther, Lutz; Truckenbrodt, Horst

    1998-06-01

    With shrinking dimensions and increasing chip areas, a rapid and non-destructive full wafer characterization after every patterning cycle is an inevitable necessity. In former publications it was shown that Optical Scatterometry (OS) has the potential to push the attainable feature limits of optical techniques from 0.8 . . . 0.5 microns for imaging methods down to 0.1 micron and below. Thus the demands of future metrology can be met. Basically being a nonimaging method, OS combines light scatter (or diffraction) measurements with modern data analysis schemes to solve the inverse scatter issue. For very fine patterns with lambda-to-pitch ratios greater than one, the specularly reflected light versus the incidence angle is recorded. Usually, the data analysis comprises two steps -- a training cycle connected to a rigorous forward modeling and the prediction itself. Until now, two data analysis schemes have usually been applied -- the multivariate regression based Partial Least Squares method (PLS) and a look-up-table technique which is also referred to as the Minimum Mean Square Error approach (MMSE). Both methods are afflicted with serious drawbacks. On the one hand, the prediction accuracy of multivariate regression schemes degrades with larger parameter ranges due to the linearization properties of the method. On the other hand, look-up-table methods are rather time consuming during prediction, thus prolonging the processing time and reducing the throughput. An alternate method is an Artificial Neural Network (ANN) based regression which combines the advantages of multivariate regression and MMSE. Due to the versatility of a neural network, not only can its structure be adapted more properly to the scatter problem, but the nonlinearity of the neuronal transfer functions also mimics the nonlinear behavior of optical diffraction processes more adequately. Despite these advantages, the prediction speed of ANN regression is comparable with that of the PLS method. In this paper, the viability and performance of ANN regression is demonstrated with the example of sub-quarter-micron resist metrology. To this end, 0.25 micrometer line/space patterns have been printed in positive photoresist by means of DUV projection lithography. In order to evaluate the total metrology chain from light scatter measurement through data analysis, a thorough modeling has been performed. Assuming a trapezoidal shape of the developed resist profile, a training data set was generated by means of the Rigorous Coupled Wave Approach (RCWA). After training the model, a second data set was computed and degraded by Gaussian noise to imitate real measuring conditions. Then, these data were fed into the models established before, resulting in a Standard Error of Prediction (SEP) which corresponds to the measuring accuracy. Even with only little effort put into the design of a back-propagation network, the ANN is clearly superior to the PLS method. Depending on whether a network with one or two hidden layers was used, accuracy gains between 2 and 5 can be achieved compared with PLS regression. Furthermore, the ANN is less noise sensitive, for there is only a doubling of the SEP at 5% noise for ANN whereas for PLS the accuracy degrades rapidly with increasing noise. The accuracy gain also depends on the light polarization and on the measured parameters. Finally, these results have been proven experimentally, where the OS results are in good accordance with the profiles obtained from cross-sectioning micrographs.

  18. Association of Fast-Food and Full-Service Restaurant Densities With Mortality From Cardiovascular Disease and Stroke, and the Prevalence of Diabetes Mellitus.

    PubMed

    Mazidi, Mohsen; Speakman, John R

    2018-05-25

    We explored whether higher densities of fast-food restaurants (FFRs) and full-service restaurants are associated with mortality from cardiovascular disease (CVD) and stroke and the prevalence of type 2 diabetes mellitus (T2D) across the mainland United States. In this cross-sectional study, county-level data for CVD and stroke mortality, and prevalence of T2D, were combined with per capita densities of FFRs and full-service restaurants and analyzed using regression. Mortality and diabetes mellitus prevalence were corrected for poverty, ethnicity, education, physical inactivity, and smoking. After adjustment, FFR density was positively associated with CVD (β=1.104, R²=2.3%), stroke (β=0.841, R²=1.4%), and T2D (β=0.578, R²=0.6%) and full-service restaurant density was positively associated with CVD mortality (β=0.19, R²=0.1%) and negatively related to T2D prevalence (β=-0.25, R²=0.3%). In a multiple regression analysis (FFRs and full-service restaurants together in the same model), only the densities of FFRs were significant (and positive). If we assume these relationships are causal, an impact analysis suggested that opening 10 new FFRs in a county would lead to 1 extra death from CVD every 42 years and 1 extra death from stroke every 55 years. Repeated nationally across all counties, that would be an extra 748 CVD deaths and 567 stroke deaths (and 390 new cases of T2D) over the next 10 years. These results suggest that an increased density of FFRs is associated with increased risk of death from CVD and stroke and increased T2D prevalence, but the maximal impact (assuming the correlations reflect causality) of each individual FFR is small. URL: http://www.clinicaltrials.gov. Unique identifier: NCT03243253. © 2018 The Authors. Published on behalf of the American Heart Association, Inc., by Wiley.

  19. Parametric regression model for survival data: Weibull regression model as an example

    PubMed Central

    2016-01-01

    Weibull regression model is one of the most popular forms of parametric regression model that it provides estimate of baseline hazard function, as well as coefficients for covariates. Because of technical difficulties, Weibull regression model is seldom used in medical literature as compared to the semi-parametric proportional hazard model. To make clinical investigators familiar with Weibull regression model, this article introduces some basic knowledge on Weibull regression model and then illustrates how to fit the model with R software. The SurvRegCensCov package is useful in converting estimated coefficients to clinical relevant statistics such as hazard ratio (HR) and event time ratio (ETR). Model adequacy can be assessed by inspecting Kaplan-Meier curves stratified by categorical variable. The eha package provides an alternative method to model Weibull regression model. The check.dist() function helps to assess goodness-of-fit of the model. Variable selection is based on the importance of a covariate, which can be tested using anova() function. Alternatively, backward elimination starting from a full model is an efficient way for model development. Visualization of Weibull regression model after model development is interesting that it provides another way to report your findings. PMID:28149846
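
    The abstract walks through an R workflow (SurvRegCensCov, eha). A language-agnostic way to see what Weibull regression actually fits is to maximize the censored log-likelihood of the accelerated failure time form, log T = b0 + b1*x + sigma*eps with eps standard (minimum) extreme value. The from-scratch Python sketch below uses simulated data and illustrates the likelihood only; it is not the interface of the R packages named above.

        import numpy as np
        from scipy.optimize import minimize

        rng = np.random.default_rng(4)
        n = 500
        x = rng.binomial(1, 0.5, n)                  # a single binary covariate
        # Simulate Weibull AFT data: log T = b0 + b1*x + sigma*eps, eps ~ Gumbel(min).
        b0, b1, sigma = 2.0, -0.5, 0.7
        eps = np.log(-np.log(rng.uniform(size=n)))   # standard minimum extreme value
        t = np.exp(b0 + b1 * x + sigma * eps)
        c = rng.uniform(0, 40, n)                    # random censoring times
        time = np.minimum(t, c)
        event = (t <= c).astype(float)

        def negloglik(par):
            beta0, beta1, log_sigma = par
            s = np.exp(log_sigma)
            z = (np.log(time) - beta0 - beta1 * x) / s
            # Events contribute log f(t); censored observations contribute log S(t).
            ll = event * (-np.log(s) - np.log(time) + z - np.exp(z)) \
                 + (1 - event) * (-np.exp(z))
            return -ll.sum()

        res = minimize(negloglik, x0=[1.0, 0.0, 0.0], method="Nelder-Mead")
        beta0, beta1, log_sigma = res.x
        print(f"b0 = {beta0:.2f}, b1 = {beta1:.2f}, sigma = {np.exp(log_sigma):.2f}")
        # exp(-b1/sigma) would give the corresponding Weibull proportional-hazards
        # hazard ratio, analogous to the HR conversion the abstract mentions.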

  20. Introduction to the use of regression models in epidemiology.

    PubMed

    Bender, Ralf

    2009-01-01

    Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. From multiple regression models, adjusted effect estimates can be obtained that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs so that they represent a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed in dependence on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models with illustrating examples from cancer research.

  1. Improved statistical analysis of moclobemide dose effects on panic disorder treatment.

    PubMed

    Ross, Donald C; Klein, Donald F; Uhlenhuth, E H

    2010-04-01

    Clinical trials with several measurement occasions are frequently analyzed using only the last available observation as the dependent variable [last observation carried forward (LOCF)]. This ignores intermediate observations. We reanalyze, with complete data methods, a clinical trial previously reported using LOCF, comparing placebo and five dosage levels of moclobemide in the treatment of outpatients with panic disorder to illustrate the superiority of methods using repeated observations. We initially analyzed unprovoked and situational, major and minor attacks as the four dependent variables, by repeated measures maximum likelihood methods. The model included parameters for linear and curvilinear time trends and regression of measures during treatment on baseline measures. Significance tests using this method take into account the structure of the error covariance matrix. This makes the sphericity assumption irrelevant. Missingness is assumed to be unrelated to eventual outcome and the residuals are assumed to have a multivariate normal distribution. No differential treatment effects for limited attacks were found. Since similar results were obtained for both types of major attack, data for the two types of major attack were combined. Overall downward linear and negatively accelerated downward curvilinear time trends were found. There were highly significant treatment differences in the regression slopes of scores during treatment on baseline observations. For major attacks, all treatment groups improved over time. The flatter regression slopes, obtained with higher doses, indicated that higher doses result in uniformly lower attack rates regardless of initial severity. Lower doses do not lower the attack rate of severely ill patients to those achieved in the less severely ill. The clinical implication is that more severe patients require higher doses to attain best benefit. Further, the significance levels obtained by LOCF analyses were only in the 0.05-0.01 range, while significance levels of <0.00001 were obtained by these repeated measures analyses indicating increased power. The greater sensitivity to treatment effect of this complete data method is illustrated. To increase power, it is often recommended to increase sample size. However, this is often impractical since a major proportion of the cost per subject is due to the initial evaluation. Increasing the number of repeated observations increases power economically and also allows detailed longitudinal trajectory analyses.

  2. Pre-conceptual design of the Z-LLE accelerator.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stygar, William A.

    We begin with a model of 20 LTD modules, connected in parallel. We assume each LTD module consists of 10 LTD cavities, connected in series. We assume each cavity includes 20 LTD bricks, in parallel. Each brick is assumed to have a 40-nF capacitance and a 160-nH inductance. We use for this calculation the RLC-circuit model of an LTD system that was developed by Mazarakis and colleagues.
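
    Assuming identical, ideal elements and ignoring resistance, the lumped capacitance and inductance of the stated series/parallel arrangement follow from the usual combination rules; a quick check:

        # Per-brick values from the pre-conceptual design description.
        C_brick = 40e-9      # 40 nF
        L_brick = 160e-9     # 160 nH

        # Cavity: 20 bricks in parallel -> capacitances add, inductances combine in parallel.
        C_cav, L_cav = 20 * C_brick, L_brick / 20
        # Module: 10 cavities in series -> capacitances combine in series, inductances add.
        C_mod, L_mod = C_cav / 10, 10 * L_cav
        # System: 20 modules in parallel.
        C_sys, L_sys = 20 * C_mod, L_mod / 20

        print(f"cavity: C = {C_cav*1e9:.0f} nF, L = {L_cav*1e9:.0f} nH")
        print(f"module: C = {C_mod*1e9:.0f} nF, L = {L_mod*1e9:.0f} nH")
        print(f"system: C = {C_sys*1e6:.1f} uF, L = {L_sys*1e9:.0f} nH")

    These lumped values (roughly 1.6 uF and 4 nH for the whole system) are only a sanity check; the Mazarakis RLC-circuit model cited above resolves the pulse-forming behavior that a single lumped stage ignores.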

  3. Freshwater DOM quantity and quality from a two-component model of UV absorbance

    USGS Publications Warehouse

    Carter, Heather T.; Tipping, Edward; Koprivnjak, Jean-Francois; Miller, Matthew P.; Cookson, Brenda; Hamilton-Taylor, John

    2012-01-01

    We present a model that considers UV-absorbing dissolved organic matter (DOM) to consist of two components (A and B), each with a distinct and constant spectrum. Component A absorbs UV light strongly, and is therefore presumed to possess aromatic chromophores and hydrophobic character, whereas B absorbs weakly and can be assumed hydrophilic. We parameterised the model with dissolved organic carbon concentrations [DOC] and corresponding UV spectra for c. 1700 filtered surface water samples from North America and the United Kingdom, by optimising extinction coefficients for A and B, together with a small constant concentration of non-absorbing DOM (0.80 mg DOC L⁻¹). Good unbiased predictions of [DOC] from absorbance data at 270 and 350 nm were obtained (r² = 0.98), the sum of squared residuals in [DOC] being reduced by 66% compared to a regression model fitted to absorbance at 270 nm alone. The parameterised model can use measured optical absorbance values at any pair of suitable wavelengths to calculate both [DOC] and the relative amounts of A and B in a water sample, i.e. measures of quantity and quality. Blind prediction of [DOC] was satisfactory for 9 of 11 independent data sets (181 of 213 individual samples).
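
    Once the two component spectra are fixed, inverting absorbance at two wavelengths for the component concentrations is a 2x2 linear solve, and [DOC] is the sum of the components plus the non-absorbing constant. The sketch below uses invented extinction coefficients and absorbances; the real parameter values come from the paper's calibration, not from here.

        import numpy as np

        # Hypothetical specific extinction coefficients (L per mg-C per cm) for
        # component A (strongly absorbing) and component B (weakly absorbing).
        E = np.array([[0.040, 0.008],    # 270 nm: [A, B]
                      [0.012, 0.001]])   # 350 nm: [A, B]
        non_absorbing_doc = 0.80         # mg DOC / L, from the abstract

        # Measured absorbances (per cm) at 270 and 350 nm for one sample (invented).
        a = np.array([0.35, 0.09])

        conc_A, conc_B = np.linalg.solve(E, a)
        doc = conc_A + conc_B + non_absorbing_doc
        print(f"[A] = {conc_A:.2f}, [B] = {conc_B:.2f}, [DOC] = {doc:.2f} mg/L")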

  4. On One Possible Generalization of the Regression Theorem

    NASA Astrophysics Data System (ADS)

    Bogolubov, N. N.; Soldatov, A. V.

    2018-03-01

    A general approach to the derivation of formally exact closed time-local or time-nonlocal evolution equations for non-equilibrium multi-time correlation functions made of observables of an open quantum system interacting simultaneously with external time-dependent classical fields and a dissipative environment is discussed. The approach allows for the subsequent treatment of these equations within a perturbative scheme assuming that the system-environment interaction is weak.

  5. Interpretation of commonly used statistical regression models.

    PubMed

    Kasza, Jessica; Wolfe, Rory

    2014-01-01

    A review of some regression models commonly used in respiratory health applications is provided in this article. Simple linear regression, multiple linear regression, logistic regression and ordinal logistic regression are considered. The focus of this article is on the interpretation of the regression coefficients of each model, which are illustrated through the application of these models to a respiratory health research study. © 2013 The Authors. Respirology © 2013 Asian Pacific Society of Respirology.

  6. Spatial occupancy models applied to atlas data show Southern Ground Hornbills strongly depend on protected areas.

    PubMed

    Broms, Kristin M; Johnson, Devin S; Altwegg, Res; Conquest, Loveday L

    2014-03-01

    Determining the range of a species and exploring species--habitat associations are central questions in ecology and can be answered by analyzing presence--absence data. Often, both the sampling of sites and the desired area of inference involve neighboring sites; thus, positive spatial autocorrelation between these sites is expected. Using survey data for the Southern Ground Hornbill (Bucorvus leadbeateri) from the Southern African Bird Atlas Project, we compared advantages and disadvantages of three increasingly complex models for species occupancy: an occupancy model that accounted for nondetection but assumed all sites were independent, and two spatial occupancy models that accounted for both nondetection and spatial autocorrelation. We modeled the spatial autocorrelation with an intrinsic conditional autoregressive (ICAR) model and with a restricted spatial regression (RSR) model. Both spatial models can readily be applied to any other gridded, presence--absence data set using a newly introduced R package. The RSR model provided the best inference and was able to capture small-scale variation that the other models did not. It showed that ground hornbills are strongly dependent on protected areas in the north of their South African range, but less so further south. The ICAR models did not capture any spatial autocorrelation in the data, and they took an order of magnitude longer than the RSR models to run. Thus, the RSR occupancy model appears to be an attractive choice for modeling occurrences at large spatial domains, while accounting for imperfect detection and spatial autocorrelation.

  7. Flood quantile estimation at ungauged sites by Bayesian networks

    NASA Astrophysics Data System (ADS)

    Mediero, L.; Santillán, D.; Garrote, L.

    2012-04-01

    Estimating flood quantiles at a site for which no observed measurements are available is essential for water resources planning and management. Ungauged sites have no observations of the magnitude of floods, but some site and basin characteristics are known. The most common technique used is multiple regression analysis, which relates physical and climatic basin characteristics to flood quantiles. Regression equations are fitted from flood frequency data and basin characteristics at gauged sites. Regression equations are a rigid technique that assumes linear relationships between variables and cannot take measurement errors into account. In addition, the prediction intervals are estimated in a very simplistic way from the variance of the residuals in the estimated model. Bayesian networks are a probabilistic computational structure taken from the field of Artificial Intelligence, which have been widely and successfully applied to many scientific fields like medicine and informatics, but application to the field of hydrology is recent. Bayesian networks infer the joint probability distribution of several related variables from observations through nodes, which represent random variables, and links, which represent causal dependencies between them. A Bayesian network is more flexible than regression equations, as it captures non-linear relationships between variables. In addition, the probabilistic nature of Bayesian networks allows taking the different sources of estimation uncertainty into account, as they give a probability distribution as the result. A homogeneous region in the Tagus Basin was selected as a case study. A regression equation was fitted taking the basin area, the annual maximum 24-hour rainfall for a given recurrence interval and the mean height as explanatory variables. Flood quantiles at ungauged sites were estimated by Bayesian networks. Bayesian networks need to be learnt from a sufficiently large data set. As observational data were scarce, a stochastic generator of synthetic data was developed. Synthetic basin characteristics were randomised, keeping the statistical properties of observed physical and climatic variables in the homogeneous region. The synthetic flood quantiles were stochastically generated taking the regression equation as basis. The learnt Bayesian network was validated by the reliability diagram, the Brier Score and the ROC diagram, which are common measures used in the validation of probabilistic forecasts. Summarising, flood quantile estimation through Bayesian networks supplies information about the prediction uncertainty, as a probability distribution of discharges is given as the result. Therefore, the Bayesian network model has application as a decision support tool for water resources planning and management.

  8. Postoperative refraction in the second eye having cataract surgery.

    PubMed

    Leffler, Christopher T; Wilkes, Martin; Reeves, Juliana; Mahmood, Muneera A

    2011-01-01

    Introduction. Previous cataract surgery studies assumed that first-eye predicted and observed postoperative refractions are equally important for predicting second-eye postoperative refraction. Methods. In a retrospective analysis of 173 patients having bilateral sequential phacoemulsification, multivariable linear regression was used to predict the second-eye postoperative refraction based on refractions predicted by the SRK-T formula for both eyes, the first-eye postoperative refraction, and the difference in IOL selected between eyes. Results. The first-eye observed postoperative refraction was an independent predictor of the second eye postoperative refraction (P < 0.001) and was weighted more heavily than the first-eye predicted refraction. Compared with the SRK-T formula, this model reduced the root-mean-squared (RMS) error of the predicted refraction by 11.3%. Conclusions. The first-eye postoperative refraction is an independent predictor of the second-eye postoperative refraction. The first-eye predicted refraction is less important. These findings may be due to interocular symmetry.

  9. Football and exchange rates: empirical support for behavioral economics.

    PubMed

    Eker, Gulin; Berument, Hakan; Dogan, Burak

    2007-10-01

    Recently, economic theory has been expanded to incorporate emotions, which have been assumed to play an important role in financial decisions. The present study illustrates this by showing a connection between the sports performance of popular national football teams (Besiktas, Fenerbahce, and Galatasaray) and performance of the Turkish economy. Specifically, a significant positive association was found between the success of three major professional Turkish football teams and the exchange rate of the Turkish lira against the U.S. dollar. The effect of the football success of several Turkish football teams on the exchange rate of the Turkish lira was examined using the simultaneous multiple regression model with predictor measures of wins, losses, and ties for different combinations of teams to predict the depreciation rate of the Turkish lira between the years 1987 and 2003. Wins by Turkish football teams against foreign (non-Turkish) rivals increased with exchange rate depreciation of the Turkish lira against the U.S. dollar.

  10. Outcome-based and Participation-based Wellness Incentives

    PubMed Central

    Barleen, Nathan A.; Marzec, Mary L.; Boerger, Nicholas L.; Moloney, Daniel P.; Zimmerman, Eric M.; Dobro, Jeff

    2017-01-01

    Objective: This study examined whether worksite wellness program participation or achievement of health improvement targets differed according to four incentive types (participation-based, hybrid, outcome-based, and no incentive). Methods: The study included individuals who completed biometric health screenings in both 2013 and 2014 and had elevated metrics in 2013 (baseline year). Multivariate logistic regression modeling tested for differences in odds of participation and achievement of health improvement targets between incentive groups; controlling for demographics, employer characteristics, incentive amounts, and other factors. Results: No statistically significant differences between incentive groups occurred for odds of participation or achievement of health improvement target related to body mass index, blood pressure, or nonhigh-density lipoprotein cholesterol. Conclusions: Given the null findings of this study, employers cannot assume that outcome-based incentives will result in either increased program participation or greater achievement of health improvement targets than participation-based incentives. PMID:28146041

  11. Large data series: Modeling the usual to identify the unusual

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Downing, D.J.; Fedorov, V.V.; Lawkins, W.F.

    "Standard" approaches such as regression analysis, Fourier analysis, Box-Jenkins procedure, et al., which handle a data series as a whole, are not useful for very large data sets for at least two reasons. First, even with computer hardware available today, including parallel processors and storage devices, there are no effective means for manipulating and analyzing gigabyte, or larger, data files. Second, in general it can not be assumed that a very large data set is "stable" by the usual measures, like homogeneity, stationarity, and ergodicity, that standard analysis techniques require. Both reasons dictate the necessity to use "local" data analysis methods whereby the data is segmented and ordered, where order leads to a sense of "neighbor," and then analyzed segment by segment. The idea of local data analysis is central to the study reported here.

  12. Allometric scaling: analysis of LD50 data.

    PubMed

    Burzala-Kowalczyk, Lidia; Jongbloed, Geurt

    2011-04-01

    The need to identify toxicologically equivalent doses across different species is a major issue in toxicology and risk assessment. In this article, we investigate interspecies scaling based on the allometric equation applied to the single, oral LD50 data previously analyzed by Rhomberg and Wolff. We focus on the statistical approach, namely, regression analysis of the mentioned data. In contrast to Rhomberg and Wolff's analysis of species pairs, we perform an overall analysis based on the whole data set. From our study it follows that if one assumes one single scaling rule for all species and substances in the data set, then β = 1 is the most natural choice among a set of candidates known in the literature. In fact, we obtain quite narrow confidence intervals for this parameter. However, the estimate of the variance in the model is relatively high, resulting in rather wide prediction intervals. © 2010 Society for Risk Analysis.

  13. Multilevel joint competing risk models

    NASA Astrophysics Data System (ADS)

    Karunarathna, G. H. S.; Sooriyarachchi, M. R.

    2017-09-01

    Joint modeling approaches are often encountered for different outcomes of competing risk time to event and count in many biomedical and epidemiology studies in the presence of cluster effects. Hospital length of stay (LOS) has been a widely used outcome measure of hospital utilization, serving as a benchmark measure across multiple terminating events such as discharge, transfer, death, and patients who had not experienced the event of interest by the end of the follow-up period (censored) during hospitalization. Competing risk models provide a method of addressing such multiple destinations, since classical time to event models yield biased results when there are multiple events. In this study, the concept of joint modeling has been applied to the dengue epidemiology in Sri Lanka, 2006-2008, to assess the relationship between different outcomes of LOS and the platelet count of dengue patients with the district cluster effect. Two key approaches have been applied to build up the joint scenario. In the first approach, each competing risk is modeled separately using a binary logistic model, treating all other events as censored under the multilevel discrete time to event model, while the platelet counts are assumed to follow a lognormal regression model. The second approach is based on the endogeneity effect in the multilevel competing risks and count model. Model parameters were estimated using maximum likelihood based on the Laplace approximation. Moreover, the study reveals that the joint modeling approach yields more precise results compared to fitting two separate univariate models, in terms of AIC (Akaike Information Criterion).

  14. Estimation of inlet flow rates for image-based aneurysm CFD models: where and how to begin?

    PubMed

    Valen-Sendstad, Kristian; Piccinelli, Marina; KrishnankuttyRema, Resmi; Steinman, David A

    2015-06-01

    Patient-specific flow rates are rarely available for image-based computational fluid dynamics models. Instead, flow rates are often assumed to scale according to the diameters of the arteries of interest. Our goal was to determine how choice of inlet location and scaling law affect such model-based estimation of inflow rates. We focused on 37 internal carotid artery (ICA) aneurysm cases from the Aneurisk cohort. An average ICA flow rate of 245 mL min⁻¹ was assumed from the literature, and then rescaled for each case according to its inlet diameter squared (assuming a fixed velocity) or cubed (assuming a fixed wall shear stress). Scaling was based on diameters measured at various consistent anatomical locations along the models. Choice of location introduced a modest 17% average uncertainty in model-based flow rate, but within individual cases estimated flow rates could vary by >100 mL min⁻¹. A square law was found to be more consistent with physiological flow rates than a cube law. Although the impact of parent artery truncation on downstream flow patterns is well studied, our study highlights a more insidious and potentially equal impact of truncation site and scaling law on the uncertainty of assumed inlet flow rates and thus, potentially, downstream flow patterns.
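
    The square-law and cube-law assumptions correspond to rescaling an assumed average ICA flow rate by the measured inlet diameter raised to the power 2 (fixed velocity) or 3 (fixed wall shear stress). In the sketch below, the reference diameter and the example measurement are invented; only the 245 mL/min average comes from the abstract.

        Q_AVG = 245.0      # mL/min, assumed population-average ICA flow rate
        D_REF = 4.7        # mm, assumed average ICA diameter (illustrative value)

        def scaled_inflow(d_mm: float, exponent: int) -> float:
            """Rescale the average flow by (D/D_ref)**n; n=2 fixes velocity,
            n=3 fixes wall shear stress."""
            return Q_AVG * (d_mm / D_REF) ** exponent

        d = 5.4  # mm, diameter measured at the chosen inlet truncation site (example)
        print(f"square law: {scaled_inflow(d, 2):.0f} mL/min")
        print(f"cube law:   {scaled_inflow(d, 3):.0f} mL/min")

    The gap between the two estimates for the same measured diameter is one way to see the sensitivity to scaling law that the authors report.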

  15. Subjective stressors in school and their relation to neuroenhancement: a behavioral perspective on students’ everyday life “doping”

    PubMed Central

    2013-01-01

    Background The use of psychoactive substances to neuroenhance cognitive performance is prevalent. Neuroenhancement (NE) in everyday life and doping in sport might rest on similar attitudinal representations, and both behaviors can be theoretically modeled by comparable means-to-end relations (substance-performance). A behavioral (not substance-based) definition of NE is proposed, with assumed functionality as its core component. It is empirically tested whether different NE variants (lifestyle drug, prescription drug, and illicit substance) can be regressed to school stressors. Findings Participants were 519 students (25.8 ± 8.4 years old, 73.1% female). Logistic regressions indicate that a modified doping attitude scale can predict all three NE variants. Multiple NE substance abuse was frequent. Overwhelming demands in school were associated with lifestyle and prescription drug NE. Conclusions Researchers should be sensitive for probable structural similarities between enhancement in everyday life and sport and systematically explore where findings from one domain can be adapted for the other. Policy makers should be aware that students might misperceive NE as an acceptable means of coping with stress in school, and help to form societal sensitivity for the topic of NE among our younger ones in general. PMID:23777577

  16. The difference engine: a model of diversity in speeded cognition.

    PubMed

    Myerson, Joel; Hale, Sandra; Zheng, Yingye; Jenkins, Lisa; Widaman, Keith F

    2003-06-01

    A theory of diversity in speeded cognition, the difference engine, is proposed, in which information processing is represented as a series of generic computational steps. Some individuals tend to perform all of these computations relatively quickly and other individuals tend to perform them all relatively slowly, reflecting the existence of a general cognitive speed factor, but the time required for response selection and execution is assumed to be independent of cognitive speed. The difference engine correctly predicts the positively accelerated form of the relation between diversity of performance, as measured by the standard deviation for the group, and task difficulty, as indexed by the mean response time (RT) for the group. In addition, the difference engine correctly predicts approximately linear relations between the RTs of any individual and average performance for the group, with the regression lines for fast individuals having slopes less than 1.0 (and positive intercepts) and the regression lines for slow individuals having slopes greater than 1.0 (and negative intercepts). Similar predictions are made for comparisons of slow, average, and fast subgroups, regardless of whether those subgroups are formed on the basis of differences in ability, age, or health status. These predictions are consistent with evidence from studies of healthy young and older adults as well as from studies of depressed and age-matched control groups.

  17. A new approach to correct the QT interval for changes in heart rate using a nonparametric regression model in beagle dogs.

    PubMed

    Watanabe, Hiroyuki; Miyazaki, Hiroyasu

    2006-01-01

    Over- and/or under-correction of QT intervals for changes in heart rate may lead to misleading conclusions and/or masking the potential of a drug to prolong the QT interval. This study examines a nonparametric regression model (Loess Smoother) to adjust the QT interval for differences in heart rate, with an improved fit over a wide range of heart rates. 240 sets of (QT, RR) observations collected from each of 8 conscious and non-treated beagle dogs were used as the materials for investigation. The fit of the nonparametric regression model to the QT-RR relationship was compared with that of four models (individual linear regression, common linear regression, and Bazett's and Fridericia's correction models) with reference to Akaike's Information Criterion (AIC). Residuals were visually assessed. The bias-corrected AIC of the nonparametric regression model was the best of the models examined in this study. Although the parametric models fitted poorly, the nonparametric regression model improved the fit at both fast and slow heart rates. The nonparametric regression model is the more flexible method compared with the parametric method. The mathematical fit for linear regression models was unsatisfactory at both fast and slow heart rates, while the nonparametric regression model showed significant improvement at all heart rates in beagle dogs.
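
    A lowess smoother of QT on RR, evaluated at a reference RR interval, gives a heart-rate-corrected QT in the spirit of the nonparametric approach described above. The sketch below uses statsmodels' lowess on synthetic data; the simulated QT-RR relationship and the 500 ms reference RR are assumptions, not values from the study.

        import numpy as np
        from statsmodels.nonparametric.smoothers_lowess import lowess

        rng = np.random.default_rng(5)
        rr = rng.uniform(300, 900, 240)   # RR intervals, ms (synthetic)
        qt = 120 + 0.12 * rr + 8 * np.sin(rr / 200) + rng.normal(0, 5, rr.size)

        smoothed = lowess(qt, rr, frac=0.5)   # columns: sorted RR, fitted QT
        rr_ref = 500.0                        # reference RR used for correction
        qt_at = np.interp(rr, smoothed[:, 0], smoothed[:, 1])      # fit at each beat
        qt_ref = np.interp(rr_ref, smoothed[:, 0], smoothed[:, 1]) # fit at reference

        qt_corrected = qt - qt_at + qt_ref    # residual plus fitted value at reference
        print(f"mean corrected QT: {qt_corrected.mean():.1f} ms")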

  18. Calibrating MODIS aerosol optical depth for predicting daily PM2.5 concentrations via statistical downscaling

    PubMed Central

    Chang, Howard H.; Hu, Xuefei; Liu, Yang

    2014-01-01

    There has been a growing interest in the use of satellite-retrieved aerosol optical depth (AOD) to estimate ambient concentrations of PM2.5 (particulate matter <2.5 μm in aerodynamic diameter). With their broad spatial coverage, satellite data can increase the spatial–temporal availability of air quality data beyond ground monitoring measurements and potentially improve exposure assessment for population-based health studies. This paper describes a statistical downscaling approach that brings together (1) recent advances in PM2.5 land use regression models utilizing AOD and (2) statistical data fusion techniques for combining air quality data sets that have different spatial resolutions. Statistical downscaling assumes the associations between AOD and PM2.5 concentrations to be spatially and temporally dependent and offers two key advantages. First, it enables us to use gridded AOD data to predict PM2.5 concentrations at spatial point locations. Second, the unified hierarchical framework provides straightforward uncertainty quantification in the predicted PM2.5 concentrations. The proposed methodology is applied to a data set of daily AOD values in southeastern United States during the period 2003–2005. Via cross-validation experiments, our model had an out-of-sample prediction R2 of 0.78 and a root mean-squared error (RMSE) of 3.61 μg/m3 between observed and predicted daily PM2.5 concentrations. This corresponds to a 10% decrease in RMSE compared with the same land use regression model without AOD as a predictor. Prediction performances of spatial–temporal interpolations to locations and on days without monitoring PM2.5 measurements were also examined. PMID:24368510

  19. Calibrating MODIS aerosol optical depth for predicting daily PM2.5 concentrations via statistical downscaling.

    PubMed

    Chang, Howard H; Hu, Xuefei; Liu, Yang

    2014-07-01

    There has been a growing interest in the use of satellite-retrieved aerosol optical depth (AOD) to estimate ambient concentrations of PM2.5 (particulate matter <2.5 μm in aerodynamic diameter). With their broad spatial coverage, satellite data can increase the spatial-temporal availability of air quality data beyond ground monitoring measurements and potentially improve exposure assessment for population-based health studies. This paper describes a statistical downscaling approach that brings together (1) recent advances in PM2.5 land use regression models utilizing AOD and (2) statistical data fusion techniques for combining air quality data sets that have different spatial resolutions. Statistical downscaling assumes the associations between AOD and PM2.5 concentrations to be spatially and temporally dependent and offers two key advantages. First, it enables us to use gridded AOD data to predict PM2.5 concentrations at spatial point locations. Second, the unified hierarchical framework provides straightforward uncertainty quantification in the predicted PM2.5 concentrations. The proposed methodology is applied to a data set of daily AOD values in southeastern United States during the period 2003-2005. Via cross-validation experiments, our model had an out-of-sample prediction R(2) of 0.78 and a root mean-squared error (RMSE) of 3.61 μg/m(3) between observed and predicted daily PM2.5 concentrations. This corresponds to a 10% decrease in RMSE compared with the same land use regression model without AOD as a predictor. Prediction performances of spatial-temporal interpolations to locations and on days without monitoring PM2.5 measurements were also examined.

  20. Cost-effectiveness of sacubitril/valsartan in chronic heart-failure patients with reduced ejection fraction.

    PubMed

    Ademi, Zanfina; Pfeil, Alena M; Hancock, Elizabeth; Trueman, David; Haroun, Rola Haroun; Deschaseaux, Celine; Schwenkglenks, Matthias

    2017-11-29

    We aimed to assess the cost effectiveness of sacubitril/valsartan compared to angiotensin-converting enzyme inhibitors (ACEIs) for the treatment of individuals with chronic heart failure and reduced-ejection fraction (HFrEF) from the perspective of the Swiss health care system. The cost-effectiveness analysis was implemented as a lifelong regression-based cohort model. We compared sacubitril/valsartan with enalapril in chronic heart failure patients with HFrEF and New York-Heart Association Functional Classification II-IV symptoms. Regression models based on the randomised clinical phase III PARADIGM-HF trial were used to predict events (all-cause mortality, hospitalisations, adverse events and quality of life) for each treatment strategy modelled over the lifetime horizon, with adjustments for patient characteristics. Unit costs were obtained from Swiss public sources for the year 2014, and costs and effects were discounted by 3%. The main outcome of interest was the incremental cost-effectiveness ratio (ICER), expressed as cost per quality-adjusted life years (QALYs) gained. Deterministic sensitivity analysis (DSA) and scenario and probabilistic sensitivity analysis (PSA) were performed. In the base-case analysis, the sacubitril/valsartan strategy showed a decrease in the number of hospitalisations (6.0% per year absolute reduction) and lifetime hospital costs by 8.0% (discounted) when compared with enalapril. Sacubitril/valsartan was predicted to improve overall and quality-adjusted survival by 0.50 years and 0.42 QALYs, respectively. Additional net-total costs were CHF 10 926. This led to an ICER of CHF 25 684. In PSA, the probability of sacubitril/valsartan being cost-effective at a threshold of CHF 50 000 was 99.0%. The treatment of HFrEF patients with sacubitril/valsartan versus enalapril is cost effective if a willingness-to-pay threshold of CHF 50 000 per QALY gained is assumed.
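
    The headline ICER is the discounted incremental cost divided by the discounted incremental QALYs; a quick check against the reported figures (the small gap to the reported CHF 25 684 presumably reflects rounding of the incremental values quoted in the abstract):

        incremental_cost = 10926.0   # CHF, discounted additional net-total costs
        incremental_qaly = 0.42      # discounted QALYs gained

        icer = incremental_cost / incremental_qaly
        print(f"ICER ~ CHF {icer:,.0f} per QALY gained")   # ~26 000 vs. reported 25 684
        wtp = 50000.0
        print("cost-effective at CHF 50 000/QALY:", icer < wtp)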

  1. Association between age associated cognitive decline and health related quality of life among Iranian older individuals.

    PubMed

    Kazazi, Leila; Foroughan, Mahshid; Nejati, Vahid; Shati, Mohsen

    2018-04-01

    Age associated cognitive decline or normal cognitive aging is related to lower levels of functioning in real life, and may interfere with maintaining independence and health related quality of life (HRQL). In this study, health related quality of life and cognitive function in community-dwelling older adults were evaluated with the aim of exploring the association between them by adjusting for potential confounders. This cross-sectional study was implemented on 425 community-dwelling older adults aged 60 and over, between August 2016 and October 2016 in health centers of the municipality of Tehran, Iran, using the Mini Mental State Examination (MMSE) to assess cognitive function and the Short Form-36 scales (SF-36) to assess HRQL. The relation between HRQL and cognitive function was evaluated by Pearson's correlation coefficient, and the impact of cognitive function on HRQL adjusted for potential confounders was estimated by a linear regression model. All analyses were done using SPSS, version 22.0. A positive significant correlation between cognitive function and quality of life (r=0.434; p<0.001) and its dimensions was observed. Two variables, educational level (B=2.704; 95% CI: 2.09 to 3.30; p<0.001) and depression (B=2.554; 95% CI: 2.00 to 3.10; p<0.001), were treated as potential confounders because they changed the effect measure after entering the model. After adjusting for potential confounders in the regression model, the association between MMSE scores and quality of life persisted (B=2.417; 95% CI: 1.86 to 2.96; p<0.001). The results indicate that cognitive function was associated with HRQL in older adults with age-associated cognitive decline. Educational level and depression can affect the relation between cognitive decline and HRQL.

  2. Detecting outliers when fitting data with nonlinear regression – a new method based on robust nonlinear regression and the false discovery rate

    PubMed Central

    Motulsky, Harvey J; Brown, Ronald E

    2006-01-01

    Background Nonlinear regression, like linear regression, assumes that the scatter of data around the ideal curve follows a Gaussian or normal distribution. This assumption leads to the familiar goal of regression: to minimize the sum of the squares of the vertical or Y-value distances between the points and the curve. Outliers can dominate the sum-of-the-squares calculation, and lead to misleading results. However, we know of no practical method for routinely identifying outliers when fitting curves with nonlinear regression. Results We describe a new method for identifying outliers when fitting data with nonlinear regression. We first fit the data using a robust form of nonlinear regression, based on the assumption that scatter follows a Lorentzian distribution. We devised a new adaptive method that gradually becomes more robust as the method proceeds. To define outliers, we adapted the false discovery rate approach to handling multiple comparisons. We then remove the outliers, and analyze the data using ordinary least-squares regression. Because the method combines robust regression and outlier removal, we call it the ROUT method. When analyzing simulated data, where all scatter is Gaussian, our method detects (falsely) one or more outliers in only about 1–3% of experiments. When analyzing data contaminated with one or several outliers, the ROUT method performs well at outlier identification, with an average False Discovery Rate of less than 1%. Conclusion Our method, which combines a new method of robust nonlinear regression with a new method of outlier identification, identifies outliers from nonlinear curve fits with reasonable power and few false positives. PMID:16526949
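
    The ROUT procedure described above combines a robust fit, FDR-based outlier flagging, and an ordinary least-squares refit. The sketch below is not the authors' implementation; it approximates the idea with SciPy's Cauchy (Lorentzian-like) loss, a MAD-based residual scale, and Benjamini-Hochberg control. The model function, 1% FDR level, and normal approximation for residual p-values are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import least_squares, curve_fit
from scipy.stats import norm
from statsmodels.stats.multitest import multipletests

def model(x, a, b):
    # example exponential-decay curve; replace with the model of interest
    return a * np.exp(-b * x)

def rout_like_fit(x, y, p0, fdr=0.01):
    # 1) robust fit with a Cauchy (Lorentzian-like) loss
    res = least_squares(lambda p: model(x, *p) - y, p0, loss="cauchy")
    resid = y - model(x, *res.x)
    # 2) robust residual scale from the median absolute deviation,
    #    two-sided p-values, then Benjamini-Hochberg FDR flagging
    scale = 1.4826 * np.median(np.abs(resid - np.median(resid)))
    pvals = 2 * norm.sf(np.abs(resid) / scale)
    outlier, _, _, _ = multipletests(pvals, alpha=fdr, method="fdr_bh")
    # 3) refit the cleaned data with ordinary least squares
    popt, _ = curve_fit(model, x[~outlier], y[~outlier], p0=res.x)
    return popt, outlier
```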

  3. Modified Regression Correlation Coefficient for Poisson Regression Model

    NASA Astrophysics Data System (ADS)

    Kaengthong, Nattacha; Domthong, Uthumporn

    2017-09-01

    This study focuses on indicators of the predictive power of the Generalized Linear Model (GLM), which are widely used but often subject to restrictions. We are interested in the regression correlation coefficient for a Poisson regression model. This is a measure of predictive power defined by the relationship between the dependent variable (Y) and the expected value of the dependent variable given the independent variables [E(Y|X)] for the Poisson regression model; the dependent variable is Poisson distributed. The purpose of this research was to modify the regression correlation coefficient for the Poisson regression model. We also compare the proposed modified regression correlation coefficient with the traditional regression correlation coefficient in the case of two or more independent variables and in the presence of multicollinearity among the independent variables. The results show that the proposed regression correlation coefficient is better than the traditional regression correlation coefficient in terms of bias and root mean square error (RMSE).
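
    For concreteness, the sketch below computes the traditional regression correlation coefficient, i.e. the correlation between Y and the fitted E(Y|X) from a Poisson GLM, on simulated data. The modified coefficient proposed in the record is not reproduced, and the data-generating values are placeholders.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 2))
mu = np.exp(0.3 + 0.5 * X[:, 0] - 0.4 * X[:, 1])   # Poisson mean under a log link
y = rng.poisson(mu)

fit = sm.GLM(y, sm.add_constant(X), family=sm.families.Poisson()).fit()
# traditional regression correlation coefficient: corr(Y, E(Y|X))
r = np.corrcoef(y, fit.fittedvalues)[0, 1]
print(f"regression correlation coefficient ~ {r:.3f}")
```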

  4. Micro-epidemiological structuring of Plasmodium falciparum parasite populations in regions with varying transmission intensities in Africa.

    PubMed Central

    Omedo, Irene; Mogeni, Polycarp; Bousema, Teun; Rockett, Kirk; Amambua-Ngwa, Alfred; Oyier, Isabella; C. Stevenson, Jennifer; Y. Baidjoe, Amrish; de Villiers, Etienne P.; Fegan, Greg; Ross, Amanda; Hubbart, Christina; Jeffreys, Anne; N. Williams, Thomas; Kwiatkowski, Dominic; Bejon, Philip

    2017-01-01

    Background: The first models of malaria transmission assumed a completely mixed and homogeneous population of parasites. Recent models include spatial heterogeneity and variably mixed populations. However, there are few empirical estimates of parasite mixing with which to parameterize such models. Methods: Here we genotype 276 single nucleotide polymorphisms (SNPs) in 5199 P. falciparum isolates from two Kenyan sites (Kilifi county and Rachuonyo South district) and one Gambian site (Kombo coastal districts) to determine the spatio-temporal extent of parasite mixing, and use Principal Component Analysis (PCA) and linear regression to examine the relationship between genetic relatedness and distance in space and time for parasite pairs. Results: Using 107, 177 and 82 SNPs that were successfully genotyped in 133, 1602, and 1034 parasite isolates from The Gambia, Kilifi and Rachuonyo South district, respectively, we show that there are no discrete geographically restricted parasite sub-populations, but instead we see a diffuse spatio-temporal structure to parasite genotypes. Genetic relatedness of sample pairs is predicted by relatedness in space and time. Conclusions: Our findings suggest that targeted malaria control will benefit the surrounding community, but unfortunately also that emerging drug resistance will spread rapidly through the population. PMID:28612053

  5. A simple way to classify supermassive black holes

    NASA Astrophysics Data System (ADS)

    Feoli, A.

    2014-02-01

    We propose a classification of supermassive black holes (SMBHs) based on their efficiency in converting infalling mass into emitted radiation. We use a theoretical model that assumes conservation of angular momentum between the gas falling into the hole and the photons emitted outwards, and suggests the existence of the scaling relation M ∝ Re σ³, where M is the mass of the central SMBH, and Re and σ are the effective radius and velocity dispersion of the host galaxy (bulge), respectively. We apply our model to a data set of 57 galaxies of different morphological types with M measurements obtained through the analysis of Spitzer/IRAC 3.6-μm images. In order to find the best fit of the corresponding scaling law, we use the FITEXY routine to perform a least-squares regression of M on Re σ³ for the considered sample of galaxies. Our analysis shows that the relation is tight, and our theoretical model allows us to easily estimate the efficiency of mass conversion into radiation of the central SMBHs. Finally, we propose a new and appealing way to classify SMBHs in terms of this parameter.
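
    A simplified version of the fit described above is an ordinary least-squares regression of log M on log(Re σ³); unlike the FITEXY routine used in the paper, this ignores measurement errors in both variables. The galaxy values below are placeholders, not the 57-galaxy sample.

```python
import numpy as np
from scipy.stats import linregress

# placeholder arrays; in the paper these come from 57 galaxies with
# Spitzer/IRAC 3.6-micron based M, R_e and sigma measurements
M_bh = np.array([1e8, 3e8, 5e7, 2e9])          # solar masses (illustrative)
R_e = np.array([2.1, 3.5, 1.2, 6.0])           # kpc (illustrative)
sigma = np.array([180.0, 220.0, 130.0, 300.0]) # km/s (illustrative)

x = np.log10(R_e * sigma**3)
y = np.log10(M_bh)
fit = linregress(x, y)   # plain OLS of log M on log(Re * sigma^3); no error weighting as in FITEXY
print(f"slope = {fit.slope:.2f}, intercept = {fit.intercept:.2f}, r = {fit.rvalue:.2f}")
```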

  6. Assessing the completeness of the fossil record using brachiopod Lazarus taxa

    NASA Astrophysics Data System (ADS)

    Gearty, W.; Payne, J.

    2012-12-01

    Lazarus taxa, organisms that disappear from the fossil record only to reappear later, provide a unique opportunity to assess the completeness of the fossil record. In this study, we apply logistic regression to quantify the associations of body size, geographic extent, and species diversity with the probability of being a Lazarus genus using the Phanerozoic fossil record of brachiopods. We find that both the geographic range and species diversity of a genus are inversely associated with the probability of being a Lazarus taxon in the preceding or succeeding stage. In contrast, body size exhibits little association with the probability of becoming a Lazarus taxon. A model including species diversity and geographic extent as predictors performs best among all combinations examined, whereas a model including only shell size as a predictor performs the worst - even worse than a model that assumes Lazarus taxa are randomly drawn from all available genera. These findings suggest that geographic range and species richness data can be used to improve estimates of extensions on the observed fossil ranges of genera and, thereby, better correct for sampling effects in estimates of taxonomic diversity change through the Phanerozoic.
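
    The genus-level analysis described above amounts to comparing logistic-regression models of Lazarus status by how well they fit. The sketch below illustrates such a comparison by AIC on simulated data; the variable names and the simulated table are assumptions, not the brachiopod dataset.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# illustrative genus-level table: Lazarus indicator plus candidate predictors
rng = np.random.default_rng(1)
n = 400
df = pd.DataFrame({
    "lazarus": rng.integers(0, 2, n),
    "log_geog_range": rng.normal(size=n),
    "log_species_div": rng.normal(size=n),
    "log_body_size": rng.normal(size=n),
})

models = {
    "range + diversity": "lazarus ~ log_geog_range + log_species_div",
    "size only": "lazarus ~ log_body_size",
    "null": "lazarus ~ 1",
}
for name, formula in models.items():
    fit = smf.logit(formula, data=df).fit(disp=0)
    print(f"{name:>18}: AIC = {fit.aic:.1f}")   # lower AIC = better-supported model
```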

  7. Estimating evolutionary rates using time-structured data: a general comparison of phylogenetic methods.

    PubMed

    Duchêne, Sebastián; Geoghegan, Jemma L; Holmes, Edward C; Ho, Simon Y W

    2016-11-15

    In rapidly evolving pathogens, including viruses and some bacteria, genetic change can accumulate over short time-frames. Accordingly, their sampling times can be used to calibrate molecular clocks, allowing estimation of evolutionary rates. Methods for estimating rates from time-structured data vary in how they treat phylogenetic uncertainty and rate variation among lineages. We compiled 81 virus data sets and estimated nucleotide substitution rates using root-to-tip regression, least-squares dating and Bayesian inference. Although estimates from these three methods were often congruent, this largely relied on the choice of clock model. In particular, relaxed-clock models tended to produce higher rate estimates than methods that assume constant rates. Discrepancies in rate estimates were also associated with high among-lineage rate variation, and phylogenetic and temporal clustering. These results provide insights into the factors that affect the reliability of rate estimates from time-structured sequence data, emphasizing the importance of clock-model testing. Contact: sduchene@unimelb.edu.au or garzonsebastian@hotmail.com. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  8. Numerical simulation of Forchheimer flow to a partially penetrating well with a mixed-type boundary condition

    NASA Astrophysics Data System (ADS)

    Mathias, Simon A.; Wen, Zhang

    2015-05-01

    This article presents a numerical study to investigate the combined role of partial well penetration (PWP) and non-Darcy effects concerning the performance of groundwater production wells. A finite difference model is developed in MATLAB to solve the two-dimensional mixed-type boundary value problem associated with flow to a partially penetrating well within a cylindrical confined aquifer. Non-Darcy effects are incorporated using the Forchheimer equation. The model is verified by comparison to results from existing semi-analytical solutions concerning the same problem but assuming Darcy's law. A sensitivity analysis is presented to explore the problem of concern. For constant pressure production, Non-Darcy effects lead to a reduction in production rate, as compared to an equivalent problem solved using Darcy's law. For fully penetrating wells, this reduction in production rate becomes less significant with time. However, for partially penetrating wells, the reduction in production rate persists for much larger times. For constant production rate scenarios, the combined effect of PWP and non-Darcy flow takes the form of a constant additional drawdown term. An approximate solution for this loss term is obtained by performing linear regression on the modeling results.

  9. Smoking, death, and Alzheimer disease: a case of competing risks.

    PubMed

    Chang, Chung-Chou H; Zhao, Yongyun; Lee, Ching-Wen; Ganguli, Mary

    2012-01-01

    If smoking is a risk factor for Alzheimer disease (AD) but a smoker dies of another cause before developing or manifesting AD, smoking-related mortality may mask the relationship between smoking and AD. This phenomenon, referred to as competing risk, complicates efforts to model the effect of smoking on AD. Typical survival regression models assume that censorship from analysis is unrelated to an individual's probability for developing AD (ie, censoring is noninformative). However, if individuals who die before developing AD are younger than those who survive long enough to develop AD, and if they include a higher percentage of smokers than nonsmokers, the incidence of AD will appear to be higher in older individuals and in nonsmokers. Further, age-specific mortality rates are higher in smokers because they die earlier than nonsmokers. Therefore, if we fail to take into account the competing risk of death when we estimate the effect of smoking on AD, we bias the results and are in fact only comparing the incidence of AD in nonsmokers with that in the healthiest smokers. In this study, we demonstrate that the effect of smoking on AD differs in models that are and are not adjusted for competing risks.

  10. Winter Season Mortality: Will Climate Warming Bring Benefits?

    PubMed

    Kinney, Patrick L; Schwartz, Joel; Pascal, Mathilde; Petkova, Elisaveta; Tertre, Alain Le; Medina, Sylvia; Vautard, Robert

    2015-06-01

    Extreme heat events are associated with spikes in mortality, yet death rates are on average highest during the coldest months of the year. Under the assumption that most winter excess mortality is due to cold temperature, many previous studies have concluded that winter mortality will substantially decline in a warming climate. We analyzed whether and to what extent cold temperatures are associated with excess winter mortality across multiple cities and over multiple years within individual cities, using daily temperature and mortality data from 36 US cities (1985-2006) and 3 French cities (1971-2007). Comparing across cities, we found that excess winter mortality did not depend on seasonal temperature range, and was no lower in warmer vs. colder cities, suggesting that temperature is not a key driver of winter excess mortality. Using regression models within monthly strata, we found that variability in daily mortality within cities was not strongly influenced by winter temperature. Finally we found that inadequate control for seasonality in analyses of the effects of cold temperatures led to spuriously large assumed cold effects, and erroneous attribution of winter mortality to cold temperatures. Our findings suggest that reductions in cold-related mortality under warming climate may be much smaller than some have assumed. This should be of interest to researchers and policy makers concerned with projecting future health effects of climate change and developing relevant adaptation strategies.

  11. Winter season mortality: will climate warming bring benefits?

    NASA Astrophysics Data System (ADS)

    Kinney, Patrick L.; Schwartz, Joel; Pascal, Mathilde; Petkova, Elisaveta; Le Tertre, Alain; Medina, Sylvia; Vautard, Robert

    2015-06-01

    Extreme heat events are associated with spikes in mortality, yet death rates are on average highest during the coldest months of the year. Under the assumption that most winter excess mortality is due to cold temperature, many previous studies have concluded that winter mortality will substantially decline in a warming climate. We analyzed whether and to what extent cold temperatures are associated with excess winter mortality across multiple cities and over multiple years within individual cities, using daily temperature and mortality data from 36 US cities (1985-2006) and 3 French cities (1971-2007). Comparing across cities, we found that excess winter mortality did not depend on seasonal temperature range, and was no lower in warmer vs. colder cities, suggesting that temperature is not a key driver of winter excess mortality. Using regression models within monthly strata, we found that variability in daily mortality within cities was not strongly influenced by winter temperature. Finally we found that inadequate control for seasonality in analyses of the effects of cold temperatures led to spuriously large assumed cold effects, and erroneous attribution of winter mortality to cold temperatures. Our findings suggest that reductions in cold-related mortality under warming climate may be much smaller than some have assumed. This should be of interest to researchers and policy makers concerned with projecting future health effects of climate change and developing relevant adaptation strategies.

  12. Single nucleotide polymorphisms in the Trx2/TXNIP and TrxR2 genes of the mitochondrial thioredoxin antioxidant system and the risk of diabetic retinopathy in patients with Type 2 diabetes mellitus.

    PubMed

    Ramus, Sara Mankoc; Cilensek, Ines; Petrovic, Mojca Globocnik; Soucek, Miroslav; Kruzliak, Peter; Petrovic, Daniel

    2016-03-01

    Oxidative stress plays an important role in the pathogenesis of diabetes and its complications. The aim of this study was to examine the possible association between seven single nucleotide polymorphisms (SNPs) of the Trx2/TXNIP and TrxR2 genes, which encode proteins involved in the thioredoxin antioxidant defence system, and the risk of diabetic retinopathy (DR). Cross-sectional case-control study. A total of 802 Slovenian patients with Type 2 diabetes mellitus were enrolled: 277 patients with DR and 525 with no DR. Patients' genotypes for the SNPs rs8140110, rs7211, rs7212, rs4755, rs1548357, rs4485648 and rs5748469 were determined by the competitive allele-specific PCR method. Each genotype of the examined SNPs was regressed in a logistic model, assuming the co-dominant, dominant and recessive models of inheritance, with covariates of duration of diabetes, HbA1c, insulin therapy, total cholesterol and LDL cholesterol levels. In the present study, we identified for the first time an association between the rs4485648 polymorphism of the TrxR2 gene and DR in Caucasians with Type 2 DM. The ORs estimated from the adjusted logistic regression models were as follows: 4.4 for CT heterozygotes, 4.3 for TT homozygotes (co-dominant genetic model) and 4.4 for CT+TT genotypes (dominant genetic model). In our case-control study we were not able to demonstrate any association between rs8140110, rs7211, rs7212, rs4755, rs1548357, or rs5748469 and DR; however, our findings provide evidence that the rs4485648 polymorphism of the TrxR2 gene might exert an independent effect on the development of DR. Copyright © 2016 Elsevier Inc. All rights reserved.

  13. Potential Predictability and Prediction Skill for Southern Peru Summertime Rainfall

    NASA Astrophysics Data System (ADS)

    WU, S.; Notaro, M.; Vavrus, S. J.; Mortensen, E.; Block, P. J.; Montgomery, R. J.; De Pierola, J. N.; Sanchez, C.

    2016-12-01

    The central Andes receive over 50% of annual climatological rainfall during the short period of January-March. This summertime rainfall exhibits strong interannual and decadal variability, including severe drought events that incur devastating societal impacts and cause agricultural communities and mining facilities to compete for limited water resources. Improved seasonal prediction skill for summertime rainfall would aid in water resource planning and allocation across water-limited southern Peru. While various underlying mechanisms have been proposed by past studies as drivers of interannual variability in summertime rainfall across southern Peru, such as the El Niño-Southern Oscillation (ENSO), the Madden-Julian Oscillation (MJO), and extratropical forcings, operational forecasts continue to be largely based on rudimentary ENSO-based indices, such as NINO3.4, justifying further exploration of predictive skill. In order to bridge this gap between the understanding of driving mechanisms and the operational forecast, we performed systematic studies of the predictability and prediction skill of southern Peru summertime rainfall by constructing statistical forecast models using the best available weather station and reanalysis datasets. First, assuming that the first two empirical orthogonal functions (EOFs) of summertime rainfall are predictable, we evaluated the potential predictability skill for southern Peru. We then constructed a simple regression model based on the time series of tropical Pacific sea-surface temperatures (SSTs), and a more advanced Linear Inverse Model (LIM) based on the EOFs of tropical ocean SSTs and large-scale atmospheric variables from reanalysis. Our results show that the LIM consistently outperforms the more rudimentary regression models in forecast skill for the domain-averaged precipitation index and for individual station indices. The improvement in forecast correlation skill ranges from 10% to over 200% for different stations. Further analysis shows that this advantage of the LIM likely arises from its representation of local zonal winds and the position of the Intertropical Convergence Zone (ITCZ).

  14. Dynamic Multi-Axial Loading Response and Constitutive/Damage Modeling of Titanium and Titanium Alloys

    DTIC Science & Technology

    2006-06-24

    These criteria dealt with the modeling of cubic crystals and assume the same yield stress in tension and compression. Some anisotropic models have been proposed and used in the literature for HCP poly… (2006), etc. Dynamic Multi-Axial Loading Response and Constitutive/Damage Modeling of Titanium and Titanium Alloys. Principal Investigator: Akhtar S. Khan

  15. Towards a Compositional SPIN

    NASA Technical Reports Server (NTRS)

    Pasareanu, Corina S.; Giannakopoulou, Dimitra

    2006-01-01

    This paper discusses our initial experience with introducing automated assume-guarantee verification based on learning in the SPIN tool. We believe that compositional verification techniques such as assume-guarantee reasoning could complement the state-reduction techniques that SPIN already supports, thus increasing the size of systems that SPIN can handle. We present a "light-weight" approach to evaluating the benefits of learning-based assume-guarantee reasoning in the context of SPIN: we turn our previous implementation of learning for the LTSA tool into a main program that externally invokes SPIN to provide the model checking-related answers. Despite its performance overheads (which mandate a future implementation within SPIN itself), this approach provides accurate information about the savings in memory. We have experimented with several versions of learning-based assume guarantee reasoning, including a novel heuristic introduced here for generating component assumptions when their environment is unavailable. We illustrate the benefits of learning-based assume-guarantee reasoning in SPIN through the example of a resource arbiter for a spacecraft. Keywords: assume-guarantee reasoning, model checking, learning.

  16. Modeling of Word Translation: Activation Flow from Concepts to Lexical Items

    ERIC Educational Resources Information Center

    Roelofs, Ardi; Dijkstra, Ton; Gerakaki, Svetlana

    2013-01-01

    Whereas most theoretical and computational models assume a continuous flow of activation from concepts to lexical items in spoken word production, one prominent model assumes that the mapping of concepts onto words happens in a discrete fashion (Bloem & La Heij, 2003). Semantic facilitation of context pictures on word translation has been taken to…

  17. a Geographic Weighted Regression for Rural Highways Crashes Modelling Using the Gaussian and Tricube Kernels: a Case Study of USA Rural Highways

    NASA Astrophysics Data System (ADS)

    Aghayari, M.; Pahlavani, P.; Bigdeli, B.

    2017-09-01

    Based on a World Health Organization (WHO) report, traffic crashes are among the eight leading causes of death worldwide. The purpose of this paper is to develop a method for regression on the parameters that affect highway crashes. Traditional methods assume that the data are completely independent and the environment is homogeneous, whereas crashes are spatial events that occur in geographic space and carry spatial data. Spatial data have features such as spatial autocorrelation and spatial non-stationarity that make working with them more difficult. The proposed method was applied to a set of records of fatal crashes that occurred on highways connecting eight eastern states of the US. These data were recorded between the years 2007 and 2009. In this study, we used the GWR method with Gaussian and Tricube kernels. The number of casualties was considered the dependent variable, and the number of persons in the crash, road alignment, number of lanes, pavement type, surface condition, road fence, light condition, vehicle type, weather, drunk driver, speed limitation, harmful event, road profile, and junction type were considered explanatory variables, in line with previous studies using the GWR method. We compared the results of this implementation with the OLS method. Results showed that R2 is 0.0654 for the OLS method and 0.9196 for the proposed method, which implies that the proposed GWR approach is a better regression method for rural highway crashes.
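
    As a rough illustration of the mechanics behind GWR, the sketch below fits a separate weighted least-squares regression at each crash location using a Gaussian kernel; dedicated packages such as mgwr exist, but the explicit loop keeps the idea visible. The bandwidth, coordinates, and design matrix are placeholders, not the paper's data.

```python
import numpy as np

def gwr_gaussian(coords, X, y, bandwidth):
    """Local weighted least-squares coefficients at every observation location.

    coords : (n, 2) easting/northing of each crash site
    X      : (n, p) design matrix (include a column of ones for the intercept)
    y      : (n,)   dependent variable, e.g. number of casualties
    """
    n = coords.shape[0]
    betas = np.empty((n, X.shape[1]))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)   # Gaussian kernel weights
        # a tricube kernel would instead use w = (1 - (d/bandwidth)**3)**3 * (d < bandwidth)
        W = np.diag(w)
        betas[i] = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return betas
```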

  18. Statistical power for the comparative regression discontinuity design with a nonequivalent comparison group.

    PubMed

    Tang, Yang; Cook, Thomas D; Kisbu-Sakarya, Yasemin

    2018-03-01

    In the "sharp" regression discontinuity design (RD), all units scoring on one side of a designated score on an assignment variable receive treatment, whereas those scoring on the other side become controls. Thus the continuous assignment variable and binary treatment indicator are measured on the same scale. Because each must be in the impact model, the resulting multi-collinearity reduces the efficiency of the RD design. However, untreated comparison data can be added along the assignment variable, and a comparative regression discontinuity design (CRD) is then created. When the untreated data come from a non-equivalent comparison group, we call this CRD-CG. Assuming linear functional forms, we show that power in CRD-CG is (a) greater than in basic RD; (b) less sensitive to the location of the cutoff and the distribution of the assignment variable; and that (c) fewer treated units are needed in the basic RD component within the CRD-CG so that savings can result from having fewer treated cases. The theory we develop is used to make numerical predictions about the efficiency of basic RD and CRD-CG relative to each other and to a randomized control trial. Data from the National Head Start Impact study are used to test these predictions. The obtained estimates are closer to the predicted parameters for CRD-CG than for basic RD and are generally quite close to the parameter predictions, supporting the emerging argument that CRD should be the design of choice in many applications for which basic RD is now used. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  19. Maximum likelihood estimation of correction for dilution bias in simple linear regression using replicates from subjects with extreme first measurements.

    PubMed

    Berglund, Lars; Garmo, Hans; Lindbäck, Johan; Svärdsudd, Kurt; Zethelius, Björn

    2008-09-30

    The least-squares estimator of the slope in a simple linear regression model is biased towards zero when the predictor is measured with random error. A corrected slope may be estimated by adding data from a reliability study, which comprises a subset of subjects from the main study. The precision of this corrected slope depends on the design of the reliability study and estimator choice. Previous work has assumed that the reliability study constitutes a random sample from the main study. A more efficient design is to use subjects with extreme values on their first measurement. Previously, we published a variance formula for the corrected slope, when the correction factor is the slope in the regression of the second measurement on the first. In this paper we show that both designs are improved by maximum likelihood estimation (MLE). The precision gain is explained by the inclusion of data from all subjects for estimation of the predictor's variance and by the use of the second measurement for estimation of the covariance between response and predictor. The gain from MLE increases with a stronger true relationship between response and predictor and with lower precision in the predictor measurements. We present a real data example on the relationship between fasting insulin, a surrogate marker, and true insulin sensitivity measured by a gold-standard euglycaemic insulin clamp, and simulations, where the behavior of profile-likelihood-based confidence intervals is examined. MLE was shown to be a robust estimator for non-normal distributions and efficient for small sample situations. Copyright (c) 2008 John Wiley & Sons, Ltd.
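
    The paper's maximum likelihood estimator is not reproduced here; the sketch below shows the simpler replicate-based correction it improves upon, dividing the naive slope by the slope from regressing the second measurement on the first. It uses replicates from all subjects rather than the extreme-first-measurement design, and all simulated quantities are placeholders.

```python
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(3)
n, true_slope, noise_sd = 2000, 1.0, 0.8

x_true = rng.normal(size=n)
x1 = x_true + rng.normal(scale=noise_sd, size=n)   # first error-prone measurement
x2 = x_true + rng.normal(scale=noise_sd, size=n)   # replicate measurement
y = true_slope * x_true + rng.normal(scale=0.5, size=n)

naive = linregress(x1, y).slope            # attenuated towards zero (dilution bias)
correction = linregress(x1, x2).slope      # correction factor from the replicates
corrected = naive / correction
print(f"naive {naive:.2f}, corrected {corrected:.2f}, true {true_slope:.2f}")
```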

  20. Quantile Regression for Analyzing Heterogeneity in Ultra-high Dimension

    PubMed Central

    Wang, Lan; Wu, Yichao

    2012-01-01

    Ultra-high dimensional data often display heterogeneity due to either heteroscedastic variance or other forms of non-location-scale covariate effects. To accommodate heterogeneity, we advocate a more general interpretation of sparsity which assumes that only a small number of covariates influence the conditional distribution of the response variable given all candidate covariates; however, the sets of relevant covariates may differ when we consider different segments of the conditional distribution. In this framework, we investigate the methodology and theory of nonconvex penalized quantile regression in ultra-high dimension. The proposed approach has two distinctive features: (1) it enables us to explore the entire conditional distribution of the response variable given the ultra-high dimensional covariates and provides a more realistic picture of the sparsity pattern; (2) it requires substantially weaker conditions compared with alternative methods in the literature; thus, it greatly alleviates the difficulty of model checking in the ultra-high dimension. In theoretic development, it is challenging to deal with both the nonsmooth loss function and the nonconvex penalty function in ultra-high dimensional parameter space. We introduce a novel sufficient optimality condition which relies on a convex differencing representation of the penalized loss function and the subdifferential calculus. Exploring this optimality condition enables us to establish the oracle property for sparse quantile regression in the ultra-high dimension under relaxed conditions. The proposed method greatly enhances existing tools for ultra-high dimensional data analysis. Monte Carlo simulations demonstrate the usefulness of the proposed procedure. The real data example we analyzed demonstrates that the new approach reveals substantially more information compared with alternative methods. PMID:23082036

  1. Dual job holding general practitioners: the effect of patient shortage.

    PubMed

    Godager, Geir; Lurås, Hilde

    2009-10-01

    In 2001, a list-patient system with capitation payment was introduced in Norwegian general practice. After an allocation process in which each inhabitant was listed with a general practitioner (GP), a considerable share of the GPs got fewer persons listed than they would have preferred. We examine whether GPs who experience a shortage of patients seek, to a larger extent than other GPs, to hold a second job in the community health service even though the wage rate is low compared with that in general practice. Assuming utility maximization, we model the effect of patient shortage on a GP's decision to contract for a second job in the community health service. The model predicts a positive relationship between patient shortage and participation in the community health service. This prediction is tested by means of censored regression analyses, taking account of labour supply as a censored variable. We find a significant effect of patient shortage on the number of hours the GPs supply to the community health service. The estimated marginal effect is 1.72 hours per week.

  2. Collaborative localization in wireless sensor networks via pattern recognition in radio irregularity using omnidirectional antennas.

    PubMed

    Jiang, Joe-Air; Chuang, Cheng-Long; Lin, Tzu-Shiang; Chen, Chia-Pang; Hung, Chih-Hung; Wang, Jiing-Yi; Liu, Chang-Wang; Lai, Tzu-Yun

    2010-01-01

    In recent years, various received signal strength (RSS)-based localization estimation approaches for wireless sensor networks (WSNs) have been proposed. RSS-based localization is regarded as a low-cost solution for many location-aware applications in WSNs. In previous studies, the radiation patterns of all sensor nodes are assumed to be spherical, which is an oversimplification of the radio propagation model in practical applications. In this study, we present an RSS-based cooperative localization method that estimates unknown coordinates of sensor nodes in a network. Arrangement of two external low-cost omnidirectional dipole antennas is developed by using the distance-power gradient model. A modified robust regression is also proposed to determine the relative azimuth and distance between a sensor node and a fixed reference node. In addition, a cooperative localization scheme that incorporates estimations from multiple fixed reference nodes is presented to improve the accuracy of the localization. The proposed method is tested via computer-based analysis and field test. Experimental results demonstrate that the proposed low-cost method is a useful solution for localizing sensor nodes in unknown or changing environments.
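
    The distance-power gradient idea mentioned above can be illustrated with the standard log-distance path-loss model: calibrate the gradient by regressing RSS on log10(distance), then invert the fit to estimate range. The calibration values below are made up and do not reflect the paper's antenna arrangement or robust-regression refinements.

```python
import numpy as np
from scipy.stats import linregress

# calibration data: known distances (m) and measured RSS (dBm) from a reference node
d_cal = np.array([1, 2, 4, 8, 16, 32], dtype=float)
rss_cal = np.array([-40, -46, -52, -59, -65, -71], dtype=float)

# log-distance model: RSS = A - 10*n*log10(d); fit A and the gradient n by regression
fit = linregress(np.log10(d_cal), rss_cal)
A, n_exp = fit.intercept, -fit.slope / 10.0

def estimate_distance(rss_dbm):
    """Invert the fitted path-loss model to estimate range from a new RSS reading."""
    return 10 ** ((A - rss_dbm) / (10.0 * n_exp))

print(f"path-loss exponent ~ {n_exp:.2f}; RSS -55 dBm -> ~{estimate_distance(-55):.1f} m")
```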

  3. Granulite facies Nd-isotopic homogenization in the Lewisian complex of northwest Scotland

    USGS Publications Warehouse

    Whitehouse, M.J.

    1988-01-01

    A published Sm-Nd whole-rock isochron of 2,920 ± 50 Myr, obtained from a wide range of lithologies in the Lewisian complex of north-west Scotland, was interpreted [1] as the time of protolith formation. This date is ~260 Myr older than estimates for the timing of high-grade metamorphism in the complex at ~2,660 Myr [2,3], and this period is considered to represent the duration of the Lewisian crustal accretion-differentiation superevent (CADS) [4]. Here we give new Sm-Nd data, obtained specifically from granulite facies tonalitic gneisses, that yield a date of 2,600 ± 155 Myr. Although depleted-mantle model ages (tDM) suggest >200 Myr of premetamorphic crustal residence, the regression date and its associated initial Nd-isotopic parameters demonstrate Nd-isotopic homogenization during the high-grade event, as well as the probability of general rare-earth-element (REE) mobility. Models for selective element depletion in the complex have previously assumed REE immobility since 2,920 Myr, but the data presented here suggest that a reappraisal of the depletion mechanism is required. © 1988 Nature Publishing Group.

  4. Assessing variation in life-history tactics within a population using mixture regression models: a practical guide for evolutionary ecologists.

    PubMed

    Hamel, Sandra; Yoccoz, Nigel G; Gaillard, Jean-Michel

    2017-05-01

    Mixed models are now well-established methods in ecology and evolution because they allow accounting for and quantifying within- and between-individual variation. However, the required normal distribution of the random effects can often be violated by the presence of clusters among subjects, which leads to multi-modal distributions. In such cases, using what is known as mixture regression models might offer a more appropriate approach. These models are widely used in psychology, sociology, and medicine to describe the diversity of trajectories occurring within a population over time (e.g. psychological development, growth). In ecology and evolution, however, these models are seldom used even though understanding changes in individual trajectories is an active area of research in life-history studies. Our aim is to demonstrate the value of using mixture models to describe variation in individual life-history tactics within a population, and hence to promote the use of these models by ecologists and evolutionary ecologists. We first ran a set of simulations to determine whether and when a mixture model allows teasing apart latent clustering, and to contrast the precision and accuracy of estimates obtained from mixture models versus mixed models under a wide range of ecological contexts. We then used empirical data from long-term studies of large mammals to illustrate the potential of using mixture models for assessing within-population variation in life-history tactics. Mixture models performed well in most cases, except for variables following a Bernoulli distribution and when sample size was small. The four selection criteria we evaluated [Akaike information criterion (AIC), Bayesian information criterion (BIC), and two bootstrap methods] performed similarly well, selecting the right number of clusters in most ecological situations. We then showed that the normality of random effects implicitly assumed by evolutionary ecologists when using mixed models was often violated in life-history data. Mixed models were quite robust to this violation in the sense that fixed effects were unbiased at the population level. However, fixed effects at the cluster level and random effects were better estimated using mixture models. Our empirical analyses demonstrated that using mixture models facilitates the identification of the diversity of growth and reproductive tactics occurring within a population. Therefore, using this modelling framework allows testing for the presence of clusters and, when clusters occur, provides reliable estimates of fixed and random effects for each cluster of the population. In the presence or expectation of clusters, using mixture models offers a suitable extension of mixed models, particularly when evolutionary ecologists aim at identifying how ecological and evolutionary processes change within a population. Mixture regression models therefore provide a valuable addition to the statistical toolbox of evolutionary ecologists. As these models are complex and have their own limitations, we provide recommendations to guide future users. © 2016 Cambridge Philosophical Society.

  5. Assessment of the heterogeneous ruminal fiber pool and development of a mathematical approach for predicting the mean retention time of feeds in goats.

    PubMed

    Regadas Filho, J G L; Tedeschi, L O; Vieira, R A M; Rodrigues, M T

    2014-03-01

    The objectives of this study were to evaluate ruminal fiber stratification and to develop a mathematical approach for predicting the mean retention time (MRT) of forage and concentrates in goats. A dataset from 3 studies was used that contained information regarding fiber and lignin intake as well as ruminal content and the kinetics of fiber passage for forage and concentrates. The kinetic information was obtained through pulse dose and the fecal concentration measurement of forage and concentrate markers in the same animals that were used to measure ruminal content. The evaluation of heterogeneous fiber pools in the rumen was performed using the Lucas' test assumptions, and the marker excretion profiles were interpreted using a model known in the literature as GNG1. The GNG1 model assumes an age-dependent fractional rate for the transfer of particles from the raft to the escapable pool in the rumen (λ(r); h(-1)) and an age-independent fractional rate for the escape of particles from the escapable pool to the remaining parts of the stomach (k(e); h(-1)). The equations used to predict the MRT for forage and concentrate fiber were developed using stepwise regression. A sensitivity analysis was conducted using a Monte Carlo simulation to investigate the relationships between the dependent and independent variables and between forage and concentrate passage rates. The Lucas' test yields goodness-of-fit estimates for NDF analysis; however, the homogeneous fiber pool approach could not be applied because a positive intercept (P < 0.05) was identified for lignin ruminal content. The stepwise regression model for MRT estimation had an approximate coefficient of determination and a root mean square error (RMSE) for forage of 0.53 and 9.78 h, respectively, and for concentrate of 0.49 and 5.86 h, respectively. The sensitivity analysis yielded a mean rate of passage (k(p)) value for forage of 0.0322 h(-1) (0.0158 to 0.0556 h(-1)) with 99% confidence interval. For the concentrate, the mean k(p) value was of 0.0334 h(-1) (0.0146 to 0.0570 h(-1)). A heterogeneous ruminal fiber pool should be assumed for goats fed diets with considerable fiber contents. The results of the sensitivity analysis indicated that both λ(r) and k(e) are of similar importance to the rate of passage in goats. The rates of passage of forage and concentrates in goats present a high degree of overlap and are closely related.

  6. Online shaft encoder geometry compensation for arbitrary shaft speed profiles using Bayesian regression

    NASA Astrophysics Data System (ADS)

    Diamond, D. H.; Heyns, P. S.; Oberholster, A. J.

    2016-12-01

    The measurement of instantaneous angular speed is being increasingly investigated for its use in a wide range of condition monitoring and prognostic applications. Central to many measurement techniques are incremental shaft encoders recording the arrival times of shaft angular increments. The conventional approach to processing these signals assumes that the angular increments are equidistant. This assumption is generally incorrect when working with toothed wheels and especially zebra tape encoders and has been shown to introduce errors in the estimated shaft speed. There are some proposed methods in the literature that aim to compensate for this geometric irregularity. Some of the methods require the shaft speed to be perfectly constant for calibration, something rarely achieved in practice. Other methods assume the shaft speed to be nearly constant with minor deviations. Therefore existing methods cannot calibrate the entire shaft encoder geometry for arbitrary shaft speeds. The present article presents a method to calculate the shaft encoder geometry for arbitrary shaft speed profiles. The method uses Bayesian linear regression to calculate the encoder increment distances. The method is derived and then tested against simulated and laboratory experiments. The results indicate that the proposed method is capable of accurately determining the shaft encoder geometry for any shaft speed profile.
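
    The article's encoder-specific likelihood is not reproduced here; the sketch below shows a generic conjugate Bayesian linear-regression update (Gaussian prior on the weights, known noise variance) of the kind that underlies such geometry estimates. All priors, data, and the toy example are illustrative assumptions.

```python
import numpy as np

def bayes_linreg_posterior(X, y, prior_mean, prior_cov, noise_var):
    """Posterior mean and covariance of the weights for y = X w + noise,
    with a Gaussian prior w ~ N(prior_mean, prior_cov) and known noise variance."""
    prior_prec = np.linalg.inv(prior_cov)
    post_cov = np.linalg.inv(prior_prec + (X.T @ X) / noise_var)
    post_mean = post_cov @ (prior_prec @ prior_mean + (X.T @ y) / noise_var)
    return post_mean, post_cov

# toy usage: recover an intercept and slope from noisy observations
rng = np.random.default_rng(4)
X = np.column_stack([np.ones(50), rng.uniform(0, 10, 50)])
y = X @ np.array([1.0, 0.3]) + rng.normal(scale=0.1, size=50)
m, C = bayes_linreg_posterior(X, y, np.zeros(2), np.eye(2) * 10.0, 0.1**2)
print(m)   # posterior mean close to [1.0, 0.3]
```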

  7. Regression modeling of ground-water flow

    USGS Publications Warehouse

    Cooley, R.L.; Naff, R.L.

    1985-01-01

    Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)

  8. Global and system-specific resting-state fMRI fluctuations are uncorrelated: principal component analysis reveals anti-correlated networks.

    PubMed

    Carbonell, Felix; Bellec, Pierre; Shmuel, Amir

    2011-01-01

    The influence of the global average signal (GAS) on functional-magnetic resonance imaging (fMRI)-based resting-state functional connectivity is a matter of ongoing debate. The global average fluctuations increase the correlation between functional systems beyond the correlation that reflects their specific functional connectivity. Hence, removal of the GAS is a common practice for facilitating the observation of network-specific functional connectivity. This strategy relies on the implicit assumption of a linear-additive model according to which global fluctuations, irrespective of their origin, and network-specific fluctuations are super-positioned. However, removal of the GAS introduces spurious negative correlations between functional systems, bringing into question the validity of previous findings of negative correlations between fluctuations in the default-mode and the task-positive networks. Here we present an alternative method for estimating global fluctuations, immune to the complications associated with the GAS. Principal components analysis was applied to resting-state fMRI time-series. A global-signal effect estimator was defined as the principal component (PC) that correlated best with the GAS. The mean correlation coefficient between our proposed PC-based global effect estimator and the GAS was 0.97±0.05, demonstrating that our estimator successfully approximated the GAS. In 66 out of 68 runs, the PC that showed the highest correlation with the GAS was the first PC. Since PCs are orthogonal, our method provides an estimator of the global fluctuations, which is uncorrelated to the remaining, network-specific fluctuations. Moreover, unlike the regression of the GAS, the regression of the PC-based global effect estimator does not introduce spurious anti-correlations beyond the decrease in seed-based correlation values allowed by the assumed additive model. After regressing this PC-based estimator out of the original time-series, we observed robust anti-correlations between resting-state fluctuations in the default-mode and the task-positive networks. We conclude that resting-state global fluctuations and network-specific fluctuations are uncorrelated, supporting a Resting-State Linear-Additive Model. In addition, we conclude that the network-specific resting-state fluctuations of the default-mode and task-positive networks show artifact-free anti-correlations.
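
    A simplified sketch of the estimator described above: compute the principal components of the time-by-voxel data, select the component most correlated with the global average signal, and regress that component out of every voxel's time course. The array layout and the per-voxel de-meaning are assumptions of this sketch, not the authors' preprocessing pipeline.

```python
import numpy as np

def remove_pc_global_effect(ts):
    """ts: (time, voxels) resting-state time series, assumed already de-meaned per voxel.
    Returns the cleaned time series and the PC-based global-effect estimator."""
    gas = ts.mean(axis=1)                              # global average signal (GAS)
    # principal component time courses via SVD of the time x voxel matrix
    U, S, Vt = np.linalg.svd(ts, full_matrices=False)
    pcs = U * S                                        # one PC time course per column
    # pick the component that correlates best with the GAS
    corrs = [abs(np.corrcoef(pcs[:, k], gas)[0, 1]) for k in range(pcs.shape[1])]
    g = pcs[:, int(np.argmax(corrs))]
    # regress the selected component out of every voxel's time course
    beta = (g @ ts) / (g @ g)
    return ts - np.outer(g, beta), g
```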

  9. The Application of the Cumulative Logistic Regression Model to Automated Essay Scoring

    ERIC Educational Resources Information Center

    Haberman, Shelby J.; Sinharay, Sandip

    2010-01-01

    Most automated essay scoring programs use a linear regression model to predict an essay score from several essay features. This article applied a cumulative logit model instead of the linear regression model to automated essay scoring. Comparison of the performances of the linear regression model and the cumulative logit model was performed on a…

  10. Hydrologic record extension of water-level data in the Everglades Depth Estimation Network (EDEN), 1991-99

    USGS Publications Warehouse

    Conrads, Paul; Petkewich, Matthew D.; O'Reilly, Andrew M.; Telis, Pamela A.

    2015-01-01

    To hindcast and fill data records, 214 empirical models were developed—189 are linear regression models and 25 are artificial neural network models. The coefficient of determination (R2) for 163 of the models is greater than 0.80 and the median percent model error (root mean square error divided by the range of the measured data) is 5 percent. To evaluate the performance of the hindcast models as a group, contour maps of modeled water-level surfaces at 2-centimeter (cm) intervals were generated using the hindcasted data. The 2-cm contour maps were examined for selected days to verify that water surfaces from the EDEN model are consistent with the input data. The biweekly 2-cm contour maps did show a higher number of issues during days in 1990 as compared to days after 1990. May 1990 had the lowest water levels in the Everglades of the 21-year dataset used for the hindcasting study. To hindcast these record low conditions in 1990, many of the hindcast models would require large extrapolations beyond the range of the predictive quality of the models. For these reasons, it was decided to limit the hindcasted data to the period January 1, 1991, to December 31, 1999. Overall, the hindcasted and gap-filled data are assumed to provide reasonable estimates of station-specific water-level data for an extended historical period to inform research and natural resource management in the Everglades.

  11. Modeling of turbulent supersonic H2-air combustion with an improved joint beta PDF

    NASA Technical Reports Server (NTRS)

    Baurle, R. A.; Hassan, H. A.

    1991-01-01

    Attempts at modeling recent experiments of Cheng et al. indicated that discrepancies between theory and experiment can be a result of the form of assumed probability density function (PDF) and/or the turbulence model employed. Improvements in both the form of the assumed PDF and the turbulence model are presented. The results are again used to compare with measurements. Initial comparisons are encouraging.

  12. Voxelwise multivariate analysis of multimodality magnetic resonance imaging.

    PubMed

    Naylor, Melissa G; Cardenas, Valerie A; Tosun, Duygu; Schuff, Norbert; Weiner, Michael; Schwartzman, Armin

    2014-03-01

    Most brain magnetic resonance imaging (MRI) studies concentrate on a single MRI contrast or modality, frequently structural MRI. By performing an integrated analysis of several modalities, such as structural, perfusion-weighted, and diffusion-weighted MRI, new insights may be attained to better understand the underlying processes of brain diseases. We compare two voxelwise approaches: (1) fitting multiple univariate models, one for each outcome and then adjusting for multiple comparisons among the outcomes and (2) fitting a multivariate model. In both cases, adjustment for multiple comparisons is performed over all voxels jointly to account for the search over the brain. The multivariate model is able to account for the multiple comparisons over outcomes without assuming independence because the covariance structure between modalities is estimated. Simulations show that the multivariate approach is more powerful when the outcomes are correlated and, even when the outcomes are independent, the multivariate approach is just as powerful or more powerful when at least two outcomes are dependent on predictors in the model. However, multiple univariate regressions with Bonferroni correction remain a desirable alternative in some circumstances. To illustrate the power of each approach, we analyze a case control study of Alzheimer's disease, in which data from three MRI modalities are available. Copyright © 2013 Wiley Periodicals, Inc.

  13. A Bayesian Markov-chain-based heteroscedastic regression model for the analysis of 18O-labeled mass spectra.

    PubMed

    Zhu, Qi; Burzykowski, Tomasz

    2011-03-01

    To reduce the influence of the between-spectra variability on the results of peptide quantification, one can consider the (18)O-labeling approach. Ideally, with such a labeling technique, a mass shift of 4 Da of the isotopic distributions of peptides from the labeled sample is induced, which allows one to distinguish the two samples and to quantify the relative abundance of the peptides. It is worth noting, however, that the presence of small quantities of (16)O and (17)O atoms during the labeling step can cause incomplete labeling. In practice, ignoring incomplete labeling may result in the biased estimation of the relative abundance of the peptide in the compared samples. A Markov model was developed to address this issue (Zhu, Valkenborg, Burzykowski. J. Proteome Res. 9, 2669-2677, 2010). The model assumed that the peak intensities were normally distributed with heteroscedasticity using a power-of-the-mean variance function. Such a dependence has been observed in practice. Alternatively, we formulate the model within the Bayesian framework. This opens the possibility of further extending the model by including random effects that can be used to capture the biological/technical variability of the peptide abundance. The operational characteristics of the model were investigated by applications to real-life mass-spectrometry data sets and a simulation study. © American Society for Mass Spectrometry, 2011

  14. Simulated effect of pneumococcal vaccination in the Netherlands on existing rules constructed in a non-vaccinated cohort predicting sequelae after bacterial meningitis

    PubMed Central

    2010-01-01

    Background Previously, two prediction rules identifying children at risk of hearing loss and academic or behavioral limitations after bacterial meningitis were developed. Streptococcus pneumoniae as causative pathogen was an important risk factor in both. Since 2006, Dutch children have received seven-valent conjugate vaccination against S. pneumoniae. The presumed effect of vaccination was simulated by excluding all children infected by S. pneumoniae serotypes included in the vaccine from both previously collected cohorts (1990-1995). Methods Children infected by one of the vaccine serotypes were excluded from both original cohorts (hearing loss: 70 of 628 children; academic or behavioral limitations: 26 of 182 children). All identified risk factors were included in multivariate logistic regression models. The discriminative ability of both new models was calculated. Results The same risk factors as in the original models were significant. The discriminative ability of the original hearing loss model was 0.84 and of the new model 0.87. In the academic or behavioral limitations model it was 0.83 and 0.84, respectively. Conclusion It can be assumed that the prediction rules will also be applicable to a vaccinated population. However, vaccination does not provide 100% coverage and evidence is available that serotype replacement will occur. The impact of vaccination on serotype replacement needs to be investigated, and the prediction rules must be validated externally. PMID:20815866

  15. A structural equation model relating impaired sensorimotor function, fear of falling and gait patterns in older people.

    PubMed

    Menz, Hylton B; Lord, Stephen R; Fitzpatrick, Richard C

    2007-02-01

    Many falls in older people occur while walking, however the mechanisms responsible for gait instability are poorly understood. Therefore, the aim of this study was to develop a plausible model describing the relationships between impaired sensorimotor function, fear of falling and gait patterns in older people. Temporo-spatial gait parameters and acceleration patterns of the head and pelvis were obtained from 100 community-dwelling older people aged between 75 and 93 years while walking on an irregular walkway. A theoretical model was developed to explain the relationships between these variables, assuming that head stability is a primary output of the postural control system when walking. This model was then tested using structural equation modeling, a statistical technique which enables the testing of a set of regression equations simultaneously. The structural equation model indicated that: (i) reduced step length has a significant direct and indirect association with reduced head stability; (ii) impaired sensorimotor function is significantly associated with reduced head stability, but this effect is largely indirect, mediated by reduced step length, and; (iii) fear of falling is significantly associated with reduced step length, but has little direct influence on head stability. These findings provide useful insights into the possible mechanisms underlying gait characteristics and risk of falling in older people. Particularly important is the indication that fear-related step length shortening may be maladaptive.

  16. Performance and robustness of penalized and unpenalized methods for genetic prediction of complex human disease.

    PubMed

    Abraham, Gad; Kowalczyk, Adam; Zobel, Justin; Inouye, Michael

    2013-02-01

    A central goal of medical genetics is to accurately predict complex disease from genotypes. Here, we present a comprehensive analysis of simulated and real data using lasso and elastic-net penalized support-vector machine models, a mixed-effects linear model, a polygenic score, and unpenalized logistic regression. In simulation, the sparse penalized models achieved lower false-positive rates and higher precision than the other methods for detecting causal SNPs. The common practice of prefiltering SNP lists for subsequent penalized modeling was examined and shown to substantially reduce the ability to recover the causal SNPs. Using genome-wide SNP profiles across eight complex diseases within cross-validation, lasso and elastic-net models achieved substantially better predictive ability in celiac disease, type 1 diabetes, and Crohn's disease, and had equivalent predictive ability in the rest, with the results in celiac disease strongly replicating between independent datasets. We investigated the effect of linkage disequilibrium on the predictive models, showing that the penalized methods leverage this information to their advantage, compared with methods that assume SNP independence. Our findings show that sparse penalized approaches are robust across different disease architectures, producing as good as or better phenotype predictions and variance explained. This has fundamental ramifications for the selection and future development of methods to genetically predict human disease. © 2012 WILEY PERIODICALS, INC.
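
    The study compares several penalized classifiers; as a readily available stand-in, the sketch below fits an elastic-net-penalized logistic regression with scikit-learn on a simulated genotype matrix. The SNP coding, effect sizes, and regularisation settings are placeholders, not the study's models or data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
n, p = 300, 1000
X = rng.integers(0, 3, size=(n, p)).astype(float)   # 0/1/2 minor-allele counts per SNP
causal = rng.choice(p, size=10, replace=False)
logit = X[:, causal] @ rng.normal(0.5, 0.1, size=10) - 5.0
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# elastic-net penalty mixes lasso (L1) and ridge (L2) shrinkage of the SNP effects
clf = LogisticRegression(penalty="elasticnet", solver="saga",
                         l1_ratio=0.5, C=0.1, max_iter=5000)
print("CV AUC:", cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean())
```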

  17. Moderation analysis using a two-level regression model.

    PubMed

    Yuan, Ke-Hai; Cheng, Ying; Maxwell, Scott

    2014-10-01

    Moderation analysis is widely used in social and behavioral research. The most commonly used model for moderation analysis is moderated multiple regression (MMR) in which the explanatory variables of the regression model include product terms, and the model is typically estimated by least squares (LS). This paper argues for a two-level regression model in which the regression coefficients of a criterion variable on predictors are further regressed on moderator variables. An algorithm for estimating the parameters of the two-level model by normal-distribution-based maximum likelihood (NML) is developed. Formulas for the standard errors (SEs) of the parameter estimates are provided and studied. Results indicate that, when heteroscedasticity exists, NML with the two-level model gives more efficient and more accurate parameter estimates than the LS analysis of the MMR model. When error variances are homoscedastic, NML with the two-level model leads to essentially the same results as LS with the MMR model. Most importantly, the two-level regression model permits estimating the percentage of variance of each regression coefficient that is due to moderator variables. When applied to data from General Social Surveys 1991, NML with the two-level model identified a significant moderation effect of race on the regression of job prestige on years of education while LS with the MMR model did not. An R package is also developed and documented to facilitate the application of the two-level model.
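
    For contrast with the two-level approach argued for above, the sketch below fits the standard MMR baseline with a product term in statsmodels. The data, variable names and heteroscedastic error structure are hypothetical.

```python
# Hedged sketch of moderated multiple regression (MMR) with a product term.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 300
x = rng.normal(size=n)                      # predictor (hypothetical)
m = rng.normal(size=n)                      # moderator (hypothetical)
# Heteroscedastic errors: the situation in which the two-level NML approach is argued to help
y = 1.0 + 0.5 * x + 0.3 * m + 0.4 * x * m + rng.normal(scale=1.0 + 0.5 * np.abs(m), size=n)
df = pd.DataFrame({"y": y, "x": x, "m": m})

# The x:m product term carries the moderation effect in the MMR baseline
mmr = smf.ols("y ~ x + m + x:m", data=df).fit()
print(mmr.summary().tables[1])
```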

  18. The microcomputer scientific software series 2: general linear model--regression.

    Treesearch

    Harold M. Rauscher

    1983-01-01

    The general linear model regression (GLMR) program provides the microcomputer user with a sophisticated regression analysis capability. The output provides a regression ANOVA table, estimators of the regression model coefficients, their confidence intervals, confidence intervals around the predicted Y-values, residuals for plotting, a check for multicollinearity, a...

  19. A Test of Bayesian Observer Models of Processing in the Eriksen Flanker Task

    ERIC Educational Resources Information Center

    White, Corey N.; Brown, Scott; Ratcliff, Roger

    2012-01-01

    Two Bayesian observer models were recently proposed to account for data from the Eriksen flanker task, in which flanking items interfere with processing of a central target. One model assumes that interference stems from a perceptual bias to process nearby items as if they are compatible, and the other assumes that the interference is due to…

  20. STEADY-STATE MODEL OF SOLAR WIND ELECTRONS REVISITED

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yoon, Peter H.; Kim, Sunjung; Choe, G. S., E-mail: yoonp@umd.edu

    2015-10-20

    In a recent paper, Kim et al. put forth a steady-state model for the solar wind electrons. The model assumed local equilibrium between the halo electrons, characterized by an intermediate energy range, and the whistler-range fluctuations. The basic wave–particle interaction is assumed to be the cyclotron resonance. Similarly, it was assumed that a dynamical steady state is established between the highly energetic superhalo electrons and high-frequency Langmuir fluctuations. Comparisons with the measured solar wind electron velocity distribution function (VDF) during quiet times were also made, and reasonable agreements were obtained. In such a model, however, only the steady-state solution for the Fokker–Planck type of electron particle kinetic equation was considered. The present paper complements the previous analysis by considering both the steady-state particle and wave kinetic equations. It is shown that the model halo and superhalo electron VDFs, as well as the assumed wave intensity spectra for the whistler and Langmuir fluctuations, approximately satisfy the quasi-linear wave kinetic equations, thus further validating the local equilibrium model constructed in the paper by Kim et al.

  1. Assumed oxygen consumption based on calculation from dye dilution cardiac output: an improved formula.

    PubMed

    Bergstra, A; van Dijk, R B; Hillege, H L; Lie, K I; Mook, G A

    1995-05-01

    This study was performed because of observed differences between dye dilution cardiac output and the Fick cardiac output, calculated from estimated oxygen consumption according to LaFarge and Miettinen, and to find a better formula for assumed oxygen consumption. In 250 patients who underwent left and right heart catheterization, the oxygen consumption VO2 (ml.min-1) was calculated using Fick's principle. Either pulmonary or systemic flow, as measured by dye dilution, was used in combination with the concordant arteriovenous oxygen concentration difference. In 130 patients, who matched the age of the LaFarge and Miettinen population, the obtained values of oxygen consumption VO2(dd) were compared with the estimated oxygen consumption values VO2(lfm), found using the LaFarge and Miettinen formulae. The VO2(lfm) was significantly lower than VO2(dd); -21.8 +/- 29.3 ml.min-1 (mean +/- SD), P < 0.001, 95% confidence interval (95% CI) -26.9 to -16.7, limits of agreement (LA) -80.4 to 36.9. A new regression formula for the assumed oxygen consumption VO2(ass) was derived in 250 patients by stepwise multiple regression analysis. The VO2(dd) was used as the dependent variable, and body surface area BSA (m2), Sex (0 for female, 1 for male), Age (years), Heart rate (min-1) and the presence of a left to right shunt as independent variables. The best fitting formula is expressed as: VO2(ass) = (157.3 x BSA + 10.0 x Sex - 10.5 x ln Age + 4.8) ml.min-1, where ln Age = the natural logarithm of the age. This formula was validated prospectively in 60 patients. A non-significant difference between VO2(ass) and VO2(dd) was found; mean 2.0 +/- 23.4 ml.min-1, P = 0.771, 95% CI = -4.0 to +8.0, LA -44.7 to +48.7. In conclusion, assumed oxygen consumption values, using our new formula, are in better agreement with the actual values than those found according to LaFarge and Miettinen's formulae.
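
    The final formula quoted above translates directly into code. The sketch below implements it as written (Sex coded 0 for female, 1 for male); the example patient values are hypothetical.

```python
import math

# Formula from the abstract: VO2(ass) = 157.3*BSA + 10.0*Sex - 10.5*ln(Age) + 4.8 (ml/min),
# with BSA in m^2 and Sex coded 0 for female, 1 for male.
def assumed_vo2_ml_per_min(bsa_m2: float, sex: int, age_years: float) -> float:
    """Assumed oxygen consumption (ml/min) from body surface area, sex and age."""
    return 157.3 * bsa_m2 + 10.0 * sex - 10.5 * math.log(age_years) + 4.8

# Hypothetical example: a 1.9 m^2 male aged 55 years
print(round(assumed_vo2_ml_per_min(1.9, sex=1, age_years=55.0), 1))
```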

  2. Climate variations and salmonellosis transmission in Adelaide, South Australia: a comparison between regression models

    NASA Astrophysics Data System (ADS)

    Zhang, Ying; Bi, Peng; Hiller, Janet

    2008-01-01

    This is the first study to identify appropriate regression models for the association between climate variation and salmonellosis transmission. A comparison between different regression models was conducted using surveillance data in Adelaide, South Australia. By using notified salmonellosis cases and climatic variables from the Adelaide metropolitan area over the period 1990-2003, four regression methods were examined: standard Poisson regression, autoregressive adjusted Poisson regression, multiple linear regression, and a seasonal autoregressive integrated moving average (SARIMA) model. Notified salmonellosis cases in 2004 were used to test the forecasting ability of the four models. Parameter estimation, goodness-of-fit and forecasting ability of the four regression models were compared. Temperatures occurring 2 weeks prior to cases were positively associated with cases of salmonellosis. Rainfall was also inversely related to the number of cases. The comparison of the goodness-of-fit and forecasting ability suggest that the SARIMA model is better than the other three regression models. Temperature and rainfall may be used as climatic predictors of salmonellosis cases in regions with climatic characteristics similar to those of Adelaide. The SARIMA model could, thus, be adopted to quantify the relationship between climate variations and salmonellosis transmission.
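
    As a rough illustration of the preferred model class, the sketch below fits a seasonal ARIMA with an exogenous lagged-temperature term using statsmodels. The weekly series, model orders and two-week lag are assumptions for demonstration, not the Adelaide data or the orders fitted in the study.

```python
# Hedged sketch: SARIMA with an exogenous climate predictor on a simulated weekly series.
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(2)
weeks = pd.date_range("1990-01-07", periods=520, freq="W")
temp = 20 + 8 * np.sin(2 * np.pi * np.arange(520) / 52) + rng.normal(0, 1, 520)
cases = 10 + 0.4 * np.roll(temp, 2) + rng.normal(0, 2, 520)     # assumed 2-week lagged effect
df = pd.DataFrame({"cases": cases, "temp_lag2": np.roll(temp, 2)}, index=weeks)

model = SARIMAX(df["cases"], exog=df[["temp_lag2"]],
                order=(1, 0, 1), seasonal_order=(1, 0, 0, 52))
fit = model.fit(disp=False)
print(fit.summary().tables[1])

# In real use the forecast would need *future* exogenous values; here the last
# observed temperatures are reused purely to keep the sketch runnable.
forecast = fit.forecast(steps=4, exog=df[["temp_lag2"]].iloc[-4:].to_numpy())
print(forecast.round(1))
```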

  3. Quasi-Likelihood Techniques in a Logistic Regression Equation for Identifying Simulium damnosum s.l. Larval Habitats Intra-cluster Covariates in Togo.

    PubMed

    Jacob, Benjamin G; Novak, Robert J; Toe, Laurent; Sanfo, Moussa S; Afriyie, Abena N; Ibrahim, Mohammed A; Griffith, Daniel A; Unnasch, Thomas R

    2012-01-01

    The standard methods for regression analyses of clustered riverine larval habitat data of Simulium damnosum s.l., a major black-fly vector of onchocerciasis, postulate models relating observational ecological-sampled parameter estimators to prolific habitats without accounting for residual intra-cluster error correlation effects. Generally, this correlation comes from two sources: (1) the design of the random effects and their assumed covariance from the multiple levels within the regression model; and, (2) the correlation structure of the residuals. Unfortunately, inconspicuous errors in residual intra-cluster correlation estimates can overstate precision in forecasted S. damnosum s.l. riverine larval habitat explanatory attributes regardless of how they are treated (e.g., independent, autoregressive, Toeplitz, etc.). In this research, the geographical locations for multiple riverine-based S. damnosum s.l. larval ecosystem habitats sampled from 2 pre-established epidemiological sites in Togo were identified and recorded from July 2009 to June 2010. Initially, the data were aggregated in PROC GENMOD. An agglomerative hierarchical residual cluster-based analysis was then performed. The sampled clustered study site data were then analyzed for statistical correlations using Monthly Biting Rates (MBR). Euclidean distance measurements and terrain-related geomorphological statistics were then generated in ArcGIS. A digital overlay was then performed, also in ArcGIS, using the georeferenced ground coordinates of high and low density clusters stratified by Annual Biting Rates (ABR). These data were overlain onto multitemporal sub-meter pixel resolution satellite data (i.e., QuickBird 0.61 m wavebands). Orthogonal spatial filter eigenvectors were then generated in SAS/GIS. Univariate and non-linear regression-based models (i.e., Logistic, Poisson and Negative Binomial) were also employed to determine probability distributions and to identify statistically significant parameter estimators from the sampled data. Thereafter, Durbin-Watson test statistics were used to test the null hypothesis that the regression residuals were not autocorrelated against the alternative that the residuals followed an autoregressive process in AUTOREG. Bayesian uncertainty matrices were also constructed employing normal priors for each of the sampled estimators in PROC MCMC. The residuals revealed both spatially structured and unstructured error effects in the high and low ABR-stratified clusters. The analyses also revealed that the estimators, levels of turbidity and presence of rocks, were statistically significant for the high-ABR-stratified clusters, while the estimators distance between habitats and floating vegetation were important for the low-ABR-stratified cluster. Varying and constant coefficient regression models, ABR-stratified GIS-generated clusters, sub-meter resolution satellite imagery, a robust residual intra-cluster diagnostic test, MBR-based histograms, eigendecomposition spatial filter algorithms and Bayesian matrices can enable accurate autoregressive estimation of latent uncertainty effects and other residual error probabilities (i.e., heteroskedasticity) for testing correlations between georeferenced S. damnosum s.l. riverine larval habitat estimators. The asymptotic distribution of the resulting residual adjusted intra-cluster predictor error autocovariate coefficients can thereafter be established, while estimates of the asymptotic variance can lead to the construction of approximate confidence intervals for accurately targeting productive S. damnosum s.l. habitats based on spatiotemporal field-sampled count data.
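
    Two of the modelling steps listed above, a negative binomial count regression and a Durbin-Watson check of the residuals, can be sketched as follows with statsmodels. The habitat covariates and counts are simulated stand-ins, not the Togo field data.

```python
# Hedged sketch: negative binomial regression of larval counts on habitat covariates,
# followed by a Durbin-Watson statistic on the deviance residuals.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(3)
n = 200
turbidity = rng.gamma(2.0, 1.0, n)                     # hypothetical covariates
rocks = rng.binomial(1, 0.4, n)
mu = np.exp(0.5 + 0.3 * turbidity + 0.6 * rocks)
larvae = rng.negative_binomial(n=2, p=2 / (2 + mu))    # overdispersed counts

X = sm.add_constant(pd.DataFrame({"turbidity": turbidity, "rocks": rocks}))
nb_fit = sm.GLM(larvae, X, family=sm.families.NegativeBinomial(alpha=0.5)).fit()
print(nb_fit.summary().tables[1])
print("Durbin-Watson:", durbin_watson(nb_fit.resid_deviance).round(2))
```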

  4. Judging complex movement performances for excellence: a principal components analysis-based technique applied to competitive diving.

    PubMed

    Young, Cole; Reinkensmeyer, David J

    2014-08-01

    Athletes rely on subjective assessment of complex movements from coaches and judges to improve their motor skills. In some sports, such as diving, snowboard half pipe, gymnastics, and figure skating, subjective scoring forms the basis for competition. It is currently unclear whether this scoring process can be mathematically modeled; doing so could provide insight into what motor skill is. Principal components analysis has been proposed as a motion analysis method for identifying fundamental units of coordination. We used PCA to analyze movement quality of dives taken from USA Diving's 2009 World Team Selection Camp, first identifying eigenpostures associated with dives, and then using the eigenpostures and their temporal weighting coefficients, as well as elements commonly assumed to affect scoring - gross body path, splash area, and board tip motion - to identify eigendives. Within this eigendive space we predicted actual judges' scores using linear regression. This technique rated dives with accuracy comparable to the human judges. The temporal weighting of the eigenpostures, body center path, splash area, and board tip motion affected the score, but not the eigenpostures themselves. These results illustrate that (1) subjective scoring in a competitive diving event can be mathematically modeled; (2) the elements commonly assumed to affect dive scoring actually do affect scoring; and (3) skill in elite diving is more associated with the gross body path and the effect of the movement on the board and water than the units of coordination that PCA extracts, which might reflect the high level of technique these divers had achieved. We also illustrate how eigendives can be used to produce dive animations that an observer can distort continuously from poor to excellent, which is a novel approach to performance visualization. Copyright © 2014 Elsevier B.V. All rights reserved.
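
    The core pipeline described above, projecting pose data onto a few principal components and regressing judges' scores on the component weights, can be sketched as follows. The arrays are random placeholders; with real dive data the cross-validated R-squared would indicate how well the eigendive space predicts the scores.

```python
# Hedged sketch: PCA "eigenpostures" followed by linear regression of judges' scores.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
n_dives, n_features = 80, 300            # e.g., flattened joint-angle trajectories (assumed)
poses = rng.normal(size=(n_dives, n_features))   # placeholder pose data
scores = rng.uniform(3, 9, n_dives)              # placeholder judges' scores

pca = PCA(n_components=5)                # dimension of the "eigenposture" space (assumed)
weights = pca.fit_transform(poses)

reg = LinearRegression()
r2 = cross_val_score(reg, weights, scores, cv=5, scoring="r2")
print("explained variance ratios:", pca.explained_variance_ratio_.round(2))
print("cross-validated R^2 (near zero here, since the data are random):", r2.mean().round(2))
```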

  5. [Evaluation of estimation of prevalence ratio using bayesian log-binomial regression model].

    PubMed

    Gao, W L; Lin, H; Liu, X N; Ren, X W; Li, J S; Shen, X P; Zhu, S L

    2017-03-10

    To evaluate the estimation of the prevalence ratio (PR) using a Bayesian log-binomial regression model and its application, we estimated the PR of medical care-seeking prevalence relative to caregivers' recognition of risk signs of diarrhea in their infants using a Bayesian log-binomial regression model in OpenBUGS software. The results showed that caregivers' recognition of their infant's risk signs of diarrhea was significantly associated with a 13% increase in medical care-seeking. We then compared the point and interval estimates of this PR, and the convergence of three models (model 1: not adjusting for covariates; model 2: adjusting for duration of caregivers' education; model 3: adjusting for distance between village and township and child age in months, based on model 2), between the Bayesian log-binomial regression model and the conventional log-binomial regression model. All three Bayesian log-binomial regression models converged, and the estimated PRs were 1.130 (95%CI: 1.005-1.265), 1.128 (95%CI: 1.001-1.264) and 1.132 (95%CI: 1.004-1.267), respectively. Conventional log-binomial regression models 1 and 2 converged, with PRs of 1.130 (95%CI: 1.055-1.206) and 1.126 (95%CI: 1.051-1.203), respectively, but model 3 failed to converge, so the COPY method was used to estimate its PR, which was 1.125 (95%CI: 1.051-1.200). The point and interval estimates of the PRs from the three Bayesian log-binomial regression models differed only slightly from those of the conventional log-binomial regression models, with good consistency in estimating the PR. Therefore, the Bayesian log-binomial regression model can effectively estimate the PR with fewer convergence problems and has advantages in application over the conventional log-binomial regression model.
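
    For readers unfamiliar with the model class, the sketch below fits the conventional (non-Bayesian) log-binomial regression with statsmodels: a binomial GLM with a log link, whose exponentiated coefficient is the prevalence ratio. The data and covariates are simulated assumptions; the study itself fits the Bayesian analogue in OpenBUGS.

```python
# Hedged sketch: log-binomial regression (binomial GLM, log link) for a prevalence ratio.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 1000
recognise = rng.binomial(1, 0.5, n)                 # caregiver recognises risk signs (assumed)
edu_years = rng.integers(0, 13, n)
p_seek = np.clip(0.55 * np.exp(0.12 * recognise + 0.01 * edu_years), 0, 0.95)
seek_care = rng.binomial(1, p_seek)
df = pd.DataFrame({"seek": seek_care, "recognise": recognise, "edu": edu_years})

fit = smf.glm("seek ~ recognise + edu", data=df,
              family=sm.families.Binomial(link=sm.families.links.Log())).fit()
pr = np.exp(fit.params["recognise"])                # exp(coefficient) = prevalence ratio
print("prevalence ratio for recognition:", round(pr, 3))
```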

  6. Stochastic and deterministic model of microbial heat inactivation.

    PubMed

    Corradini, Maria G; Normand, Mark D; Peleg, Micha

    2010-03-01

    Microbial inactivation is described by a model based on the changing survival probabilities of individual cells or spores. It is presented in a stochastic and discrete form for small groups, and as a continuous deterministic model for larger populations. If the underlying mortality probability function remains constant throughout the treatment, the model generates first-order ("log-linear") inactivation kinetics. Otherwise, it produces survival patterns that include Weibullian ("power-law") with upward or downward concavity, tailing with a residual survival level, complete elimination, flat "shoulder" with linear or curvilinear continuation, and sigmoid curves. In both forms, the same algorithm or model equation applies to isothermal and dynamic heat treatments alike. Constructing the model does not require assuming a kinetic order or knowledge of the inactivation mechanism. The general features of its underlying mortality probability function can be deduced from the experimental survival curve's shape. Once identified, the function's coefficients, the survival parameters, can be estimated directly from the experimental survival ratios by regression. The model is testable in principle but matching the estimated mortality or inactivation probabilities with those of the actual cells or spores can be a technical challenge. The model is not intended to replace current models to calculate sterility. Its main value, apart from connecting the various inactivation patterns to underlying probabilities at the cellular level, might be in simulating the irregular survival patterns of small groups of cells and spores. In principle, it can also be used for nonthermal methods of microbial inactivation and their combination with heat.
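
    The two forms of the model, a per-cell stochastic simulation for small groups and a continuous Weibullian ("power-law") survival curve for large populations, can be sketched as below. The parameter values are assumed for illustration; the per-step mortality probability is chosen so that the expected survival tracks the Weibullian curve.

```python
# Hedged sketch: stochastic vs. deterministic Weibullian survival, log10 S(t) = -b * t**n.
import numpy as np

rng = np.random.default_rng(6)
b, n_shape = 0.01, 1.5             # Weibullian survival parameters (assumed)
dt, t_end = 0.1, 10.0
times = np.arange(0.0, t_end + dt, dt)

# Continuous deterministic form for a large population
log10_survival = -b * times ** n_shape

# Stochastic discrete form for a small group: the per-step mortality probability is the
# conditional probability of dying between consecutive time steps under the same curve
n_cells = 100
alive = np.ones(n_cells, dtype=bool)
for i in range(1, len(times)):
    p_die = 1.0 - 10.0 ** (-b * (times[i] ** n_shape - times[i - 1] ** n_shape))
    alive &= rng.random(n_cells) >= p_die

print("deterministic survival ratio at t_end:", round(10.0 ** log10_survival[-1], 3))
print("stochastic survivors out of", n_cells, ":", int(alive.sum()))
```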

  7. Temporal variation and scaling of parameters for a monthly hydrologic model

    NASA Astrophysics Data System (ADS)

    Deng, Chao; Liu, Pan; Wang, Dingbao; Wang, Weiguang

    2018-03-01

    The temporal variation of model parameters is affected by the catchment conditions and has a significant impact on hydrological simulation. This study aims to evaluate the seasonality and downscaling of model parameter across time scales based on monthly and mean annual water balance models with a common model framework. Two parameters of the monthly model, i.e., k and m, are assumed to be time-variant at different months. Based on the hydrological data set from 121 MOPEX catchments in the United States, we firstly analyzed the correlation between parameters (k and m) and catchment properties (NDVI and frequency of rainfall events, α). The results show that parameter k is positively correlated with NDVI or α, while the correlation is opposite for parameter m, indicating that precipitation and vegetation affect monthly water balance by controlling temporal variation of parameters k and m. The multiple linear regression is then used to fit the relationship between ε and the means and coefficient of variations of parameters k and m. Based on the empirical equation and the correlations between the time-variant parameters and NDVI, the mean annual parameter ε is downscaled to monthly k and m. The results show that it has lower NSEs than these from model with time-variant k and m being calibrated through SCE-UA, while for several study catchments, it has higher NSEs than that of the model with constant parameters. The proposed method is feasible and provides a useful tool for temporal scaling of model parameter.

  8. Contribution of calcaneal and leg segment rotations to ankle joint dorsiflexion in a weight-bearing task.

    PubMed

    Chizewski, Michael G; Chiu, Loren Z F

    2012-05-01

    Joint angle is the relative rotation between two segments where one is a reference and assumed to be non-moving. However, rotation of the reference segment will influence the system's spatial orientation and joint angle. The purpose of this investigation was to determine the contribution of leg and calcaneal rotations to ankle rotation in a weight-bearing task. Forty-eight individuals performed partial squats recorded using a 3D motion capture system. Markers on the calcaneus and leg were used to model leg and calcaneal segment, and ankle joint rotations. Multiple linear regression was used to determine the contribution of leg and calcaneal segment rotations to ankle joint dorsiflexion. Regression models for left (R(2)=0.97) and right (R(2)=0.97) ankle dorsiflexion were significant. Sagittal plane leg rotation had a positive influence (left: β=1.411; right: β=1.418) while sagittal plane calcaneal rotation had a negative influence (left: β=-0.573; right: β=-0.650) on ankle dorsiflexion. Sagittal plane rotations of the leg and calcaneus were positively correlated (left: r=0.84, P<0.001; right: r=0.80, P<0.001). During a partial squat, the calcaneus rotates forward. Simultaneous forward calcaneal rotation with ankle dorsiflexion reduces total ankle dorsiflexion angle. Rear foot posture is reoriented during a partial squat, allowing greater leg rotation in the sagittal plane. Segment rotations may provide greater insight into movement mechanics that cannot be explained via joint rotations alone. Copyright © 2012 Elsevier B.V. All rights reserved.

  9. The impact of Gonoshasthaya Kendra's Micro Health Insurance plan on antenatal care among poor women in rural Bangladesh.

    PubMed

    Islam, Mohammad Touhidul; Igarashi, Isao; Kawabuchi, Koichi

    2012-08-01

    Low utilization of antenatal care (ANC) by pregnant women, particularly in rural areas, is an obstacle to ensuring safe motherhood in Bangladesh. Currently, Micro Health Insurance (MHI) is being considered in many developing countries as a potential method for assuring greater access to health care, especially for the poor. So far, there is only limited evidence evaluating MHI schemes. This study assesses the impact of MHI administered by Gonoshasthaya Kendra (GK) on ANC utilization by poor women in rural Bangladesh. We conducted a questionnaire survey and collected 321 valid responses from women enrolled in GK's MHI scheme and 271 from women not enrolled in any health insurance plan. We used a two-part model in which dependent variables were whether or not women utilized ANC and the number of times ANC was used. The model consisted of logistic regression analysis and ordinary least squares regression analysis. The main independent variables were dummies for socioeconomic classes according to GK, each of which represented the premiums and co-payments charged by class. The results showed that destitute, ultra-poor, and poor women enrolled in MHI used ANC significantly more than women not enrolled in health insurance. Women enrolled in MHI, except for those who were destitute or ultra-poor, utilized ANC significantly more times than women not enrolled in health insurance. We assume that GK's sliding premium and co-payment scales are key to ANC utilization by women. Expanding the MHI scheme may enhance ANC utilization among poor women in rural Bangladesh.
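
    A minimal sketch of the two-part model used above: a logistic regression for whether any ANC was used, followed by an OLS regression for the number of visits among users. The covariates and data are hypothetical placeholders, not the GK survey variables.

```python
# Hedged sketch: two-part model (logit for any use, OLS for intensity among users).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 600
insured = rng.binomial(1, 0.5, n)            # enrolled in MHI (hypothetical coding)
poor = rng.binomial(1, 0.6, n)
p_any = 1.0 / (1.0 + np.exp(-(-0.5 + 0.8 * insured - 0.4 * poor)))
any_anc = rng.binomial(1, p_any)
visits = np.where(any_anc == 1,
                  np.maximum(1, np.round(2 + 1.2 * insured + rng.normal(0, 1, n))), 0)
df = pd.DataFrame({"any_anc": any_anc, "visits": visits, "insured": insured, "poor": poor})

# Part 1: whether ANC was used at all; Part 2: number of visits among users only
part1 = smf.logit("any_anc ~ insured + poor", data=df).fit(disp=False)
part2 = smf.ols("visits ~ insured + poor", data=df[df["any_anc"] == 1]).fit()
print(part1.params.round(2))
print(part2.params.round(2))
```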

  10. Exposure Knowledge and Risk Perception of RF EMF

    PubMed Central

    Freudenstein, Frederik; Wiedemann, Peter M.; Varsier, Nadège

    2015-01-01

    The presented study is part of the EU-Project Low EMF Exposure Future Networks (LEXNET), which deals among other things with the issue of whether a reduction of the radiofrequency (RF) electro-magnetic fields (EMF) exposure will result in more acceptance of wireless communication networks in the public sphere. We assume that the effects of any reduction of EMF exposure will depend on the subjective link between exposure perception and risk perception (RP). Therefore we evaluated respondents’ RP of different RF EMF sources and their subjective knowledge about various exposure characteristics with regard to their impact on potential health risks. The results show that participants are more concerned about base stations than about all other RF EMF sources. Concerning the subjective exposure knowledge the results suggest that people have a quite appropriate impact model. The question how RF EMF RP is actually affected by the knowledge about the various exposure characteristics was tested in a linear regression analysis. The regression indicates that these features – except distance – do influence people’s general RF EMF RP. In addition, we analyzed the effect of the quality of exposure knowledge on RF EMF RP of various sources. The results show a tendency that better exposure knowledge leads to higher RP, especially for mobile phones. The study provides empirical support for models of the relationships between exposure perception and RP. It is not the aim to extrapolate these findings to the whole population because the samples are not exactly representative for the general public in the participating countries. PMID:25629026

  11. Factors associated with dental health care coverage in Mexico: Findings from the National Performance Evaluation Survey 2002-2003.

    PubMed

    Pérez-Núñez, Ricardo; Medina-Solis, Carlo Eduardo; Maupomé, Gerardo; Vargas-Palacios, Armando

    2006-10-01

    To determine the level of dental health care coverage in people aged > or =18 years across the country, and to identify the factors associated with coverage. Using the instruments and sampling strategies developed by the World Health Organization for the World Health Survey, a cross-sectional national survey was carried out at the household and individual (adult) levels. Dental data were collected in 20 of Mexico's 32 states. The relationship between coverage and environmental and individual characteristics was examined through logistic regression models. Only 6098 of 24 159 individual respondents reported having oral problems during the preceding 12 months (accounting for 14 284 621 inhabitants of the country if weighted). Only 48% of respondents reporting problems were covered, although details of the appropriateness, timeliness and effectiveness of the intervention(s) were not assessed. The multivariate regression model showed that higher level of education, better socioeconomic status, having at least one chronic disease and having medical insurance were positively associated with better dental care coverage. Age and sex were also associated. Overall dental health care coverage could be improved, assuming that ideal coverage is 100%. Some equality of access issues are apparent because there are differences in coverage across populations in terms of wealth and social status. Identifying the factors associated with sparse coverage is a step in the right direction allowing policymakers to establish strategies aimed at increasing this coverage, focusing on more vulnerable groups and on individuals in greater need of preventive and rehabilitative interventions.

  12. Animal Ownership and Touching Enrich the Context of Social Contacts Relevant to the Spread of Human Infectious Diseases.

    PubMed

    Kifle, Yimer Wasihun; Goeyvaerts, Nele; Van Kerckhove, Kim; Willem, Lander; Kucharski, Adam; Faes, Christel; Leirs, Herwig; Hens, Niel; Beutels, Philippe

    2015-01-01

    Many human infectious diseases originate from animals or are transmitted through animal vectors. We aimed to identify factors that are predictive of ownership and touching of animals, assess whether animal ownership influences social contact behavior, and estimate the probability of a major zoonotic outbreak should a transmissible influenza-like pathogen be present in animals, all in the setting of a densely populated European country. A diary-based social contact survey (n = 1768) was conducted in Flanders, Belgium, from September 2010 until February 2011. Many participants touched pets (46%), poultry (2%) or livestock (2%) on a randomly assigned day, and a large proportion of participants owned such animals (51%, 15% and 5%, respectively). Logistic regression models indicated that larger households are more likely to own an animal and, unsurprisingly, that animal owners are more likely to touch animals. We observed a significant effect of age on animal ownership and touching. The total number of social contacts during a randomly assigned day was modeled using weighted-negative binomial regression. Apart from age, household size and day type (weekend versus weekday and regular versus holiday period), animal ownership was positively associated with the total number of social contacts during the weekend. Assuming that animal ownership and/or touching are at-risk events, we demonstrate a method to estimate the outbreak potential of zoonoses. We show that in Belgium animal-human interactions involving young children (0-9 years) and adults (25-54 years) have the highest potential to cause a major zoonotic outbreak.

  13. Animal Ownership and Touching Enrich the Context of Social Contacts Relevant to the Spread of Human Infectious Diseases

    PubMed Central

    Kifle, Yimer Wasihun; Goeyvaerts, Nele; Van Kerckhove, Kim; Willem, Lander; Faes, Christel; Leirs, Herwig; Hens, Niel; Beutels, Philippe

    2015-01-01

    Many human infectious diseases originate from animals or are transmitted through animal vectors. We aimed to identify factors that are predictive of ownership and touching of animals, assess whether animal ownership influences social contact behavior, and estimate the probability of a major zoonotic outbreak should a transmissible influenza-like pathogen be present in animals, all in the setting of a densely populated European country. A diary-based social contact survey (n = 1768) was conducted in Flanders, Belgium, from September 2010 until February 2011. Many participants touched pets (46%), poultry (2%) or livestock (2%) on a randomly assigned day, and a large proportion of participants owned such animals (51%, 15% and 5%, respectively). Logistic regression models indicated that larger households are more likely to own an animal and, unsurprisingly, that animal owners are more likely to touch animals. We observed a significant effect of age on animal ownership and touching. The total number of social contacts during a randomly assigned day was modeled using weighted-negative binomial regression. Apart from age, household size and day type (weekend versus weekday and regular versus holiday period), animal ownership was positively associated with the total number of social contacts during the weekend. Assuming that animal ownership and/or touching are at-risk events, we demonstrate a method to estimate the outbreak potential of zoonoses. We show that in Belgium animal-human interactions involving young children (0–9 years) and adults (25–54 years) have the highest potential to cause a major zoonotic outbreak. PMID:26193480

  14. Proposal for Professional Development Schools.

    ERIC Educational Resources Information Center

    Rajuan, Maureen

    This paper discusses the reluctance of Israeli inservice teachers to assume the role of mentor to student teachers in their classrooms, proposing an alternative Professional Development School (PDS) model as a starting point for rethinking ways to recruit teachers into this role. In this model, two student teachers assume full responsibility for 1…

  15. Extensions of Rasch's Multiplicative Poisson Model.

    ERIC Educational Resources Information Center

    Jansen, Margo G. H.; van Duijn, Marijtje A. J.

    1992-01-01

    A model developed by G. Rasch that assumes scores on some attainment tests can be realizations of a Poisson process is explained and expanded by assuming a prior distribution, with fixed but unknown parameters, for the subject parameters. How additional between-subject and within-subject factors can be incorporated is discussed. (SLD)

  16. Neighboring and Urbanism: Commonality versus Friendship.

    ERIC Educational Resources Information Center

    Silverman, Carol J.

    1986-01-01

    Examines a dimension of neighboring that need not assume friendship as the role model. When the model assumes only a sense of connectedness as defining neighboring, then the residential correlation, shown in many studies between urbanism and neighboring, disappears. Theories of neighboring, study variables, methods, and analysis are discussed.…

  17. Evaluation of weighted regression and sample size in developing a taper model for loblolly pine

    Treesearch

    Kenneth L. Cormier; Robin M. Reich; Raymond L. Czaplewski; William A. Bechtold

    1992-01-01

    A stem profile model, fit using pseudo-likelihood weighted regression, was used to estimate merchantable volume of loblolly pine (Pinus taeda L.) in the southeast. The weighted regression increased model fit marginally, but did not substantially increase model performance. In all cases, the unweighted regression models performed as well as the...

  18. Sample selection and spatial models of housing price indexes, and, A disequilibrium analysis of the U.S. gasoline market using panel data

    NASA Astrophysics Data System (ADS)

    Hu, Haixin

    This dissertation consists of two parts. The first part studies the sample selection and spatial models of housing price indexes using transaction data on detached single-family houses of two California metropolitan areas from 1990 through 2008. House prices are often spatially correlated due to shared amenities, or when the properties are viewed as close substitutes in a housing submarket. There have been many studies that address spatial correlation in the context of housing markets. However, to the best of my knowledge, none has used spatial models to construct housing price indexes at the zip code level for the entire time period analyzed in this dissertation. In this paper, I study a first-order autoregressive spatial model with four different weighting matrix schemes. Four sets of housing price indexes are constructed accordingly. Gatzlaff and Haurin (1997, 1998) study the sample selection problem in housing price indexes by using Heckman's two-step method. This method, however, is generally inefficient and can cause multicollinearity problems. Also, it requires data on unsold houses in order to carry out the first-step probit regression. The maximum likelihood (ML) method can be used to estimate an incidental truncation model, which allows one to correct for sample selection based on transaction data only. However, convergence problems are very prevalent in practice. In this paper I adopt Lewbel's (2007) sample selection correction method, which does not require one to model or estimate the selection model, except for some very general assumptions. I then extend this method to correct for spatial correlation. In the second part, I analyze the U.S. gasoline market with a disequilibrium model that allows lagged-latent variables, endogenous prices, and panel data with fixed effects. Most existing studies (see the survey of Espey, 1998, Energy Economics) of the gasoline market assume equilibrium. In practice, however, prices do not always adjust fast enough to clear the market. Equilibrium assumptions greatly simplify statistical inference, but are very restrictive and can produce conflicting estimates. For example, econometric models of markets that assume equilibrium often produce more elastic demand price elasticities than their disequilibrium counterparts (Holt and Johnson, 1989, Review of Economics and Statistics; Oczkowski, 1998, Economics Letters). The few studies that allow disequilibrium, however, have been limited to macroeconomic time-series data without lagged-latent variables. While time series data allow one to investigate national trends, they cannot be used to identify and analyze regional differences and the role of local markets. Exclusion of the lagged-latent variables is also undesirable because such variables capture adjustment costs and inter-temporal spillovers. Simulation methods offer tractable solutions to dynamic and panel data disequilibrium models (Lee, 1997, Journal of Econometrics), but assume normally distributed errors. This paper compares estimates of price/income elasticity and excess supply/demand across time periods, regions, and model specifications, using both equilibrium and disequilibrium methods. In the equilibrium model, I compare the within group estimator with Anderson and Hsiao's first-difference 2SLS estimator. In the disequilibrium model, I extend Amemiya's 2SLS by using Newey's efficient estimator with optimal instruments.

  19. Parameters Estimation of Geographically Weighted Ordinal Logistic Regression (GWOLR) Model

    NASA Astrophysics Data System (ADS)

    Zuhdi, Shaifudin; Retno Sari Saputro, Dewi; Widyaningsih, Purnami

    2017-06-01

    A regression model represents the relationship between independent variables and a dependent variable. When the dependent variable is categorical, a logistic regression model is used to model the odds; when the categories of the dependent variable are ordered, the model is an ordinal logistic regression. The GWOLR model is an ordinal logistic regression model influenced by the geographical location of the observation site. Parameter estimation is needed to infer population values from a sample. The purpose of this research is to estimate the parameters of a GWOLR model using R software. Parameter estimation uses data on the number of dengue fever patients in Semarang City. The observation units are 144 villages in Semarang City. The results give a local GWOLR model for each village and the probability of each category of the number of dengue fever patients.
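
    As a non-spatial baseline for the GWOLR model, the sketch below fits a plain ordinal logistic regression by maximum likelihood with statsmodels; a geographically weighted version would re-fit such a model at each location with kernel weights. The covariates and simulated data are assumptions for illustration (the paper itself uses R).

```python
# Hedged sketch: ordinal logistic regression as the non-geographically-weighted baseline.
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(8)
n = 144                                    # e.g., one record per village (assumed)
density = rng.normal(size=n)               # hypothetical covariates
rainfall = rng.normal(size=n)
latent = 0.8 * density + 0.5 * rainfall + rng.logistic(size=n)
cases_cat = pd.Series(pd.cut(latent, bins=[-np.inf, -1.0, 1.0, np.inf],
                             labels=["low", "medium", "high"], ordered=True))

X = pd.DataFrame({"density": density, "rainfall": rainfall})
fit = OrderedModel(cases_cat, X, distr="logit").fit(method="bfgs", disp=False)
print(fit.params.round(3))   # slope coefficients followed by threshold parameters
```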

  20. Modelling fourier regression for time series data- a case study: modelling inflation in foods sector in Indonesia

    NASA Astrophysics Data System (ADS)

    Prahutama, Alan; Suparti; Wahyu Utami, Tiani

    2018-03-01

    Regression analysis models the relationship between response variables and predictor variables. The parametric approach to regression is very strict in its assumptions, whereas a nonparametric regression model does not require assumptions about the model form. Time series data are observations of a variable recorded over time, so if a time series is to be modeled by regression, the response and predictor variables must be determined first. The response variable is the value at time t (yt), while the predictors are significant lags. In nonparametric regression modeling, one developing approach is the Fourier series approach. One of the advantages of the nonparametric Fourier series approach is its ability to handle data with a trigonometric (periodic) pattern. Modeling with a Fourier series requires the parameter K, and the number of harmonics K can be determined with the Generalized Cross Validation method. In modeling inflation for the transportation, communication and financial services sector, the Fourier series approach yields an optimal K of 120 parameters with an R-square of 99%, whereas multiple linear regression yields an R-square of 90%.
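
    A minimal sketch of regression with a Fourier-series basis on lagged time series values, assuming a simple harmonic basis and ordinary least squares. The simulated series, the choice of the first lag as predictor, and K = 3 harmonics are illustrative assumptions; in the paper K is chosen by Generalized Cross Validation.

```python
# Hedged sketch: Fourier-series regression of a series on its first lag.
import numpy as np

rng = np.random.default_rng(9)
t = np.arange(200)
y = 5 + 2 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.5, 200)   # monthly-type series

y_t, y_lag = y[1:], y[:-1]            # response and its first lag as the predictor
K = 3                                 # number of Fourier harmonics (assumed; GCV can choose K)
cols = [np.ones_like(y_lag), y_lag]
for k in range(1, K + 1):
    cols += [np.cos(k * y_lag), np.sin(k * y_lag)]
X = np.column_stack(cols)

beta, *_ = np.linalg.lstsq(X, y_t, rcond=None)
fitted = X @ beta
ss_res = np.sum((y_t - fitted) ** 2)
ss_tot = np.sum((y_t - y_t.mean()) ** 2)
print("R-squared:", round(1 - ss_res / ss_tot, 3))
```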

  1. Three region steam drum model for a nuclear power plant simulator (BRENDA). Technical report 1 Oct 80-May 81. [LMFBR

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Slovik, G.C.

    1981-08-01

    A new three region steam drum model has been developed. This model differs from previous works for it assumes the existence of three regions within the steam drum: a steam region, a mid region (assumed to be under saturation conditions at steady state), and a bottom region (having a mixed mean subcooled enthalpy).

  2. Evaluating external nutrient and suspended-sediment loads to Upper Klamath Lake, Oregon, using surrogate regressions with real-time turbidity and acoustic backscatter data

    USGS Publications Warehouse

    Schenk, Liam N.; Anderson, Chauncey W.; Diaz, Paul; Stewart, Marc A.

    2016-12-22

    Executive Summary: Suspended-sediment and total phosphorus loads were computed for two sites in the Upper Klamath Basin on the Wood and Williamson Rivers, the two main tributaries to Upper Klamath Lake. High temporal resolution turbidity and acoustic backscatter data were used to develop surrogate regression models to compute instantaneous concentrations and loads on these rivers. Regression models for the Williamson River site showed strong correlations of turbidity with total phosphorus and suspended-sediment concentrations (adjusted coefficients of determination [Adj R2]=0.73 and 0.95, respectively). Regression models for the Wood River site had relatively poor, although statistically significant, relations of turbidity with total phosphorus, and turbidity and acoustic backscatter with suspended sediment concentration, with high prediction uncertainty. Total phosphorus loads for the partial 2014 water year (excluding October and November 2013) were 39 and 28 metric tons for the Williamson and Wood Rivers, respectively. These values are within the low range of phosphorus loads computed for these rivers from prior studies using water-quality data collected by the Klamath Tribes. The 2014 partial year total phosphorus loads on the Williamson and Wood Rivers are assumed to be biased low because of the absence of data from the first 2 months of water year 2014, and the drought conditions that were prevalent during that water year. Therefore, total phosphorus and suspended-sediment loads in this report should be considered as representative of a low-water year for the two study sites. Comparing loads from the Williamson and Wood River monitoring sites for November 2013–September 2014 shows that the Williamson and Sprague Rivers combined, as measured at the Williamson River site, contributed substantially more suspended sediment to Upper Klamath Lake than the Wood River, with 4,360 and 1,450 metric tons measured, respectively. Surrogate techniques have proven useful at the two study sites, particularly in using turbidity to compute suspended-sediment concentrations in the Williamson River. This proof-of-concept effort for computing total phosphorus concentrations using turbidity at the Williamson and Wood River sites also has shown that with additional samples over a wide range of flow regimes, high-temporal-resolution total phosphorus loads can be estimated on a daily, monthly, and annual basis, along with uncertainties for total phosphorus and suspended-sediment concentrations computed using regression models. Sediment-corrected backscatter at the Wood River has potential for estimating suspended-sediment loads from the Wood River Valley as well, with additional analysis of the variable streamflow measured at that site. Suspended-sediment and total phosphorus loads with a high level of temporal resolution will be useful to water managers, restoration practitioners, and scientists in the Upper Klamath Basin working toward the common goal of decreasing nutrient and sediment loads in Upper Klamath Lake.

  3. Machine learning methods as a tool to analyse incomplete or irregularly sampled radon time series data.

    PubMed

    Janik, M; Bossew, P; Kurihara, O

    2018-07-15

    Machine learning is a class of statistical techniques which has proven to be a powerful tool for modelling the behaviour of complex systems, in which response quantities depend on assumed controls or predictors in a complicated way. In this paper, as our first purpose, we propose the application of machine learning to reconstruct incomplete or irregularly sampled data of time series indoor radon (222Rn). The physical assumption underlying the modelling is that Rn concentration in the air is controlled by environmental variables such as air temperature and pressure. The algorithms "learn" from complete sections of multivariate series, derive a dependence model and apply it to sections where the controls are available, but not the response (Rn), and in this way complete the Rn series. Three machine learning techniques are applied in this study, namely random forest, its extension called the gradient boosting machine and deep learning. For a comparison, we apply the classical multiple regression in a generalized linear model version. Performance of the models is evaluated through different metrics. The performance of the gradient boosting machine is found to be superior to that of the other techniques. By applying learning machines, we show, as our second purpose, that missing data or periods of Rn series data can be reconstructed and resampled on a regular grid reasonably, if data of appropriate physical controls are available. The techniques also identify to which degree the assumed controls contribute to imputing missing Rn values. Our third purpose, though no less important from the viewpoint of physics, is identifying to which degree physical, in this case environmental variables, are relevant as Rn predictors, or in other words, which predictors explain most of the temporal variability of Rn. We show that the variables which contribute most to the Rn series reconstruction are temperature, relative humidity and day of the year. The first two are physical predictors, while "day of the year" is a statistical proxy or surrogate for missing or unknown predictors. Copyright © 2018 Elsevier B.V. All rights reserved.
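
    The reconstruction idea, learning Rn from environmental controls on the complete part of the series and predicting it where only the controls are observed, can be sketched with a gradient boosting regressor as below. The hourly series, predictors and missingness pattern are simulated assumptions, not the study's data or tuned models.

```python
# Hedged sketch: impute missing Rn values from environmental controls with gradient boosting.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(10)
hours = np.arange(24 * 120)                            # ~4 months of hourly data (assumed)
temp = 15 + 10 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 1, hours.size)
humid = 60 - 0.8 * (temp - 15) + rng.normal(0, 2, hours.size)
doy = (hours // 24) % 365
radon = 40 + 1.5 * humid - 0.8 * temp + rng.normal(0, 5, hours.size)

df = pd.DataFrame({"temp": temp, "humid": humid, "doy": doy, "rn": radon})
df.loc[rng.random(len(df)) < 0.2, "rn"] = np.nan       # knock out 20% of the Rn values

# Learn Rn from the controls on complete rows, then predict it where Rn is missing
train = df.dropna(subset=["rn"])
model = GradientBoostingRegressor(random_state=0)
model.fit(train[["temp", "humid", "doy"]], train["rn"])
missing = df["rn"].isna()
df.loc[missing, "rn"] = model.predict(df.loc[missing, ["temp", "humid", "doy"]])
print("feature importances (temp, humid, doy):", model.feature_importances_.round(2))
```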

  4. Development of a Method to Downscale Soil Moisture Estimates Based on Topography and Other Site Characteristics

    DTIC Science & Technology

    2015-09-22

    to relate to spatial variations in evapotranspiration (Western et al., 1999). All of these attributes are standardized and regressed against each...infiltration F, deep drainage or recharge to groundwater G, lateral flow L, and evapotranspiration E. The water balance is assumed to be at equilibrium...Temperature and insolation measurements were collected to determine the potential evapotranspiration (PET), which was calculated for each location type

  5. Estimating challenge load due to disease outbreaks and other challenges using reproduction records of sows.

    PubMed

    Mathur, P K; Herrero-Medrano, J M; Alexandri, P; Knol, E F; ten Napel, J; Rashidi, H; Mulder, H A

    2014-12-01

    A method was developed and tested to estimate challenge load due to disease outbreaks and other challenges in sows using reproduction records. The method was based on reproduction records from a farm with known disease outbreaks. It was assumed that the reduction in weekly reproductive output within a farm is proportional to the magnitude of the challenge. As the challenge increases beyond a certain threshold, it is manifested as an outbreak. The reproduction records were divided into 3 datasets. The first dataset, called the Training dataset, consisted of 57,135 reproduction records from 10,901 sows from 1 farm in Canada with several outbreaks of porcine reproductive and respiratory syndrome (PRRS). The known disease status of sows was regressed on the traits number born alive, number of losses as a combination of stillbirth and mummified piglets, and number of weaned piglets. The regression coefficients from this analysis were then used as weighting factors for derivation of an index measure called challenge load indicator. These weighting factors were derived with i) a two-step approach using residuals or year-week solutions estimated from a previous step, and ii) a single-step approach using the trait values directly. Two types of models were used for each approach: a logistic regression model and a general additive model. The estimates of challenge load indicator were then compared based on their ability to detect PRRS outbreaks in a Test dataset consisting of records from 65,826 sows from 15 farms in the Netherlands. These farms differed from the Canadian farm with respect to PRRS virus strains, severity and frequency of outbreaks. The single-step approach using a general additive model was best and detected 14 out of the 15 outbreaks. This approach was then further validated using the third dataset consisting of reproduction records of 831,855 sows in 431 farms located in different countries in Europe and America. A total of 41 out of 48 outbreaks detected using data analysis were confirmed based on diagnostic information received from the farms. Among these, 30 outbreaks were due to PRRS while 11 were due to other diseases and challenging conditions. The results suggest that the proposed method could be useful for estimation of challenge load and detection of challenge phases such as disease outbreaks.

  6. A dual-loop model of the human controller in single-axis tracking tasks

    NASA Technical Reports Server (NTRS)

    Hess, R. A.

    1977-01-01

    A dual loop model of the human controller in single axis compensatory tracking tasks is introduced. This model possesses an inner-loop closure which involves feeding back that portion of the controlled element output rate which is due to control activity. The sensory inputs to the human controller are assumed to be system error and control force. The former is assumed to be sensed via visual, aural, or tactile displays while the latter is assumed to be sensed in kinesthetic fashion. A nonlinear form of the model is briefly discussed. This model is then linearized and parameterized. A set of general adaptive characteristics for the parameterized model is hypothesized. These characteristics describe the manner in which the parameters in the linearized model will vary with such things as display quality. It is demonstrated that the parameterized model can produce controller describing functions which closely approximate those measured in laboratory tracking tasks for a wide variety of controlled elements.

  7. Generalized Processing Tree Models: Jointly Modeling Discrete and Continuous Variables.

    PubMed

    Heck, Daniel W; Erdfelder, Edgar; Kieslich, Pascal J

    2018-05-24

    Multinomial processing tree models assume that discrete cognitive states determine observed response frequencies. Generalized processing tree (GPT) models extend this conceptual framework to continuous variables such as response times, process-tracing measures, or neurophysiological variables. GPT models assume finite-mixture distributions, with weights determined by a processing tree structure, and continuous components modeled by parameterized distributions such as Gaussians with separate or shared parameters across states. We discuss identifiability, parameter estimation, model testing, a modeling syntax, and the improved precision of GPT estimates. Finally, a GPT version of the feature comparison model of semantic categorization is applied to computer-mouse trajectories.

  8. Applying Kaplan-Meier to Item Response Data

    ERIC Educational Resources Information Center

    McNeish, Daniel

    2018-01-01

    Some IRT models can be equivalently modeled in alternative frameworks such as logistic regression. Logistic regression can also model time-to-event data, which concerns the probability of an event occurring over time. Using the relation between time-to-event models and logistic regression and the relation between logistic regression and IRT, this…

  9. Estimating animal populations and body sizes from burrows: Marine ecologists have their heads buried in the sand

    NASA Astrophysics Data System (ADS)

    Schlacher, Thomas A.; Lucrezi, Serena; Peterson, Charles H.; Connolly, Rod M.; Olds, Andrew D.; Althaus, Franziska; Hyndes, Glenn A.; Maslo, Brooke; Gilby, Ben L.; Leon, Javier X.; Weston, Michael A.; Lastra, Mariano; Williams, Alan; Schoeman, David S.

    2016-06-01

    Most ecological studies require knowledge of animal abundance, but it can be challenging and destructive of habitat to obtain accurate density estimates for cryptic species, such as crustaceans that tunnel deeply into the seafloor, beaches, or mudflats. Such fossorial species are, however, widely used in environmental impact assessments, requiring sampling techniques that are reliable, efficient, and environmentally benign for these species and environments. Counting and measuring the entrances of burrows made by cryptic species is commonly employed to index population and body sizes of individuals. The fundamental premise is that burrow metrics consistently predict density and size. Here we review the evidence for this premise. We also review criteria for selecting among sampling methods: burrow counts, visual censuses, and physical collections. A simple 1:1 correspondence between the number of holes and population size cannot be assumed. Occupancy rates, indexed by the slope of regression models, vary widely between species and among sites for the same species. Thus, 'average' or 'typical' occupancy rates should not be extrapolated from site- or species specific field validations and then be used as conversion factors in other situations. Predictions of organism density made from burrow counts often have large uncertainty, being double to half of the predicted mean value. Whether such prediction uncertainty is 'acceptable' depends on investigators' judgements regarding the desired detectable effect sizes. Regression models predicting body size from burrow entrance dimensions are more precise, but parameter estimates of most models are specific to species and subject to site-to-site variation within species. These results emphasise the need to undertake thorough field validations of indirect census techniques that include tests of how sensitive predictive models are to changes in habitat conditions or human impacts. In addition, new technologies (e.g. drones, thermal-, acoustic- or chemical sensors) should be used to enhance visual census techniques of burrows and surface-active animals.

  10. Nonparametric evaluation of quantitative traits in population-based association studies when the genetic model is unknown.

    PubMed

    Konietschke, Frank; Libiger, Ondrej; Hothorn, Ludwig A

    2012-01-01

    Statistical association between a single nucleotide polymorphism (SNP) genotype and a quantitative trait in genome-wide association studies is usually assessed using a linear regression model, or, in the case of non-normally distributed trait values, using the Kruskal-Wallis test. While linear regression models assume an additive mode of inheritance via equi-distant genotype scores, Kruskal-Wallis test merely tests global differences in trait values associated with the three genotype groups. Both approaches thus exhibit suboptimal power when the underlying inheritance mode is dominant or recessive. Furthermore, these tests do not perform well in the common situations when only a few trait values are available in a rare genotype category (disbalance), or when the values associated with the three genotype categories exhibit unequal variance (variance heterogeneity). We propose a maximum test based on Marcus-type multiple contrast test for relative effect sizes. This test allows model-specific testing of either dominant, additive or recessive mode of inheritance, and it is robust against variance heterogeneity. We show how to obtain mode-specific simultaneous confidence intervals for the relative effect sizes to aid in interpreting the biological relevance of the results. Further, we discuss the use of a related all-pairwise comparisons contrast test with range preserving confidence intervals as an alternative to Kruskal-Wallis heterogeneity test. We applied the proposed maximum test to the Bogalusa Heart Study dataset, and gained a remarkable increase in the power to detect association, particularly for rare genotypes. Our simulation study also demonstrated that the proposed non-parametric tests control family-wise error rate in the presence of non-normality and variance heterogeneity contrary to the standard parametric approaches. We provide a publicly available R library nparcomp that can be used to estimate simultaneous confidence intervals or compatible multiplicity-adjusted p-values associated with the proposed maximum test.

  11. Automatic reconstruction of surge deposit thicknesses. Applications to some Italian volcanoes

    NASA Astrophysics Data System (ADS)

    Armienti, P.; Pareschi, M. T.

    1987-04-01

    The energy cone concept has been adopted to describe some kinds of surge deposits. The energy cone parameters (height and slope) are evaluated through a regression technique which utilizes deposit thicknesses and the corresponding elevations and heights of the energy cone. The regression also allows one to evaluate a coefficient of proportionality linking the deposit thickness to the distance between the topographic surface and the energy line for a given eruption. Moreover, if an accurate topography is available (in this case a reconstruction of a digitized topography of the Phlegraean Fields and of Vesuvius), the energy cone parameters, obtained by the back-fitting technique, can be used to evaluate the order of magnitude of the deposit volumes. The hazard map for a surge localized at the Solfatara (Phlegraean Fields, Naples) has been computed. The values of the energy cone parameters and the volume have been assumed to be equal to those estimated with the regression technique applied to a past surge eruption in the same area.

  12. Short communication: Principal components and factor analytic models for test-day milk yield in Brazilian Holstein cattle.

    PubMed

    Bignardi, A B; El Faro, L; Rosa, G J M; Cardoso, V L; Machado, P F; Albuquerque, L G

    2012-04-01

    A total of 46,089 individual monthly test-day (TD) milk yields (10 test days) from 7,331 complete first lactations of Holstein cattle were analyzed. A standard multivariate analysis (MV), reduced-rank analyses fitting the first 2, 3, and 4 genetic principal components (PC2, PC3, PC4), and analyses that fitted a factor analytic structure considering 2, 3, and 4 factors (FAS2, FAS3, FAS4) were carried out. The models included the random animal genetic effect and fixed effects of the contemporary groups (herd-year-month of test day), age of cow (linear and quadratic effects), and days in milk (linear effect). The residual covariance matrix was assumed to have full rank. Moreover, 2 random regression models were applied. Variance components were estimated by the restricted maximum likelihood method. The heritability estimates ranged from 0.11 to 0.24. The genetic correlation estimates between TD obtained with the PC2 model were higher than those obtained with the MV model, especially between adjacent test days at the end of lactation, where they were close to unity. The results indicate that, for the data considered in this study, only 2 principal components are required to summarize the bulk of genetic variation among the 10 traits. Copyright © 2012 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  13. A Model for Teacher Effects from Longitudinal Data without Assuming Vertical Scaling

    ERIC Educational Resources Information Center

    Mariano, Louis T.; McCaffrey, Daniel F.; Lockwood, J. R.

    2010-01-01

    There is an increasing interest in using longitudinal measures of student achievement to estimate individual teacher effects. Current multivariate models assume each teacher has a single effect on student outcomes that persists undiminished to all future test administrations (complete persistence [CP]) or can diminish with time but remains…

  14. Modeling and Control of a Trailing Wire Antenna Towed by an Orbiting Aircraft

    DTIC Science & Technology

    1992-09-01

    do not induce significant oscillations in the wire. The development of the dynamic model also assumed that wire bending, torsion and stretching... it was reasonable to assume that the wire bending forces did not contribute significantly to the oscillations.

  15. Teacher Leader Model Standards and the Functions Assumed by National Board Certified Teachers

    ERIC Educational Resources Information Center

    Swan Dagen, Allison; Morewood, Aimee; Smith, Megan L.

    2017-01-01

    The Teacher Leader Model Standards (TLMS) were created to stimulate discussion around the leadership responsibilities teachers assume in schools. This study used the TLMS to gauge the self-reported leadership responsibilities of National Board Certified Teachers (NBCTs). The NBCTs reported engaging in all domains of the TLMS, most frequently with…

  16. Do Young Children Have Adult-Like Syntactic Categories? Zipf's Law and the Case of the Determiner

    ERIC Educational Resources Information Center

    Pine, Julian M.; Freudenthal, Daniel; Krajewski, Grzegorz; Gobet, Fernand

    2013-01-01

    Generativist models of grammatical development assume that children have adult-like grammatical categories from the earliest observable stages, whereas constructivist models assume that children's early categories are more limited in scope. In the present paper, we test these assumptions with respect to one particular syntactic category, the…

  17. Chemically reacting supersonic flow calculation using an assumed PDF model

    NASA Technical Reports Server (NTRS)

    Farshchi, M.

    1990-01-01

    This work is motivated by the need to develop accurate models for chemically reacting compressible turbulent flow fields that are present in a typical supersonic combustion ramjet (SCRAMJET) engine. In this paper the development of a new assumed probability density function (PDF) reaction model for supersonic turbulent diffusion flames and its implementation into an efficient Navier-Stokes solver are discussed. The application of this model to a supersonic hydrogen-air flame will be considered.

  18. A Model Comparison for Count Data with a Positively Skewed Distribution with an Application to the Number of University Mathematics Courses Completed

    ERIC Educational Resources Information Center

    Liou, Pey-Yan

    2009-01-01

    The current study examines three regression models: OLS (ordinary least squares) linear regression, Poisson regression, and negative binomial regression for analyzing count data. Simulation results show that the OLS regression model performed better than the others, since it did not produce more false statistically significant relationships than…

  19. Breeding value accuracy estimates for growth traits using random regression and multi-trait models in Nelore cattle.

    PubMed

    Boligon, A A; Baldi, F; Mercadante, M E Z; Lobo, R B; Pereira, R J; Albuquerque, L G

    2011-06-28

    We quantified the potential increase in accuracy of expected breeding value for weights of Nelore cattle, from birth to mature age, using multi-trait and random regression models on Legendre polynomials and B-spline functions. A total of 87,712 weight records from 8144 females were used, recorded every three months from birth to mature age from the Nelore Brazil Program. For random regression analyses, all female weight records from birth to eight years of age (data set I) were considered. From this general data set, a subset was created (data set II), which included only nine weight records: at birth, weaning, 365 and 550 days of age, and 2, 3, 4, 5, and 6 years of age. Data set II was analyzed using random regression and multi-trait models. The model of analysis included the contemporary groups as fixed effects and age of dam as a linear and quadratic covariable. In the random regression analyses, average growth trends were modeled using a cubic regression on orthogonal polynomials of age. Residual variances were modeled by a step function with five classes. Legendre polynomials of fourth and sixth order were utilized to model the direct genetic and animal permanent environmental effects, respectively, while third-order Legendre polynomials were considered for maternal genetic and maternal permanent environmental effects. Quadratic polynomials were applied to model all random effects in random regression models on B-spline functions. Direct genetic and animal permanent environmental effects were modeled using three segments or five coefficients, and genetic maternal and maternal permanent environmental effects were modeled with one segment or three coefficients in the random regression models on B-spline functions. For both data sets (I and II), animals ranked differently according to expected breeding value obtained by random regression or multi-trait models. With random regression models, the highest gains in accuracy were obtained at ages with a low number of weight records. The results indicate that random regression models provide more accurate expected breeding values than the traditional, finite-dimensional multi-trait models. Thus, higher genetic responses are expected for beef cattle growth traits by replacing a multi-trait model with random regression models for genetic evaluation. B-spline functions could be applied as an alternative to Legendre polynomials to model covariance functions for weights from birth to mature age.
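
    The random regression setup above relies on evaluating Legendre polynomials of standardized age as covariables. A minimal Python sketch of building such a design matrix is given below, assuming ages are rescaled to [-1, 1] and a basis of five coefficients; the ages and basis size are illustrative, not the study's exact specification.

        import numpy as np
        from numpy.polynomial import legendre

        def legendre_covariates(age, k, age_min=0.0, age_max=8.0):
            """Evaluate Legendre polynomials P_0..P_{k-1} at ages rescaled to [-1, 1]."""
            x = 2.0 * (age - age_min) / (age_max - age_min) - 1.0
            # Column j holds P_j(x); np.eye(k)[j] selects the j-th basis polynomial.
            return np.column_stack([legendre.legval(x, np.eye(k)[j]) for j in range(k)])

        # Example: a basis with 5 coefficients (polynomials up to degree 4),
        # evaluated at illustrative weighing ages in years.
        ages = np.array([0.0, 0.6, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0, 6.0])
        Z = legendre_covariates(ages, k=5)
        print(Z.shape)   # (9, 5) design matrix used as random-regression covariables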

  20. Evaluating differential effects using regression interactions and regression mixture models

    PubMed Central

    Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung

    2015-01-01

    Research increasingly emphasizes understanding differential effects. This paper focuses on understanding regression mixture models, a relatively new statistical method for assessing differential effects, by comparing their results to those obtained using an interaction term in linear regression. The research questions which each model answers, their formulation, and their assumptions are compared using Monte Carlo simulations and real data analysis. The capabilities of regression mixture models are described and specific issues to be addressed when conducting regression mixtures are proposed. The paper aims to clarify the role that regression mixtures can take in the estimation of differential effects and to increase awareness of the benefits and potential pitfalls of this approach. Regression mixture models are shown to be a potentially effective exploratory method for finding differential effects when these effects can be defined by a small number of classes of respondents who share a typical relationship between a predictor and an outcome. It is also shown that the comparison between regression mixture models and interactions becomes substantially more complex as the number of classes increases. It is argued that regression interactions are well suited for direct tests of specific hypotheses about differential effects and regression mixtures provide a useful approach for exploring effect heterogeneity given adequate samples and study design. PMID:26556903

  1. Evaluating Differential Effects Using Regression Interactions and Regression Mixture Models

    ERIC Educational Resources Information Center

    Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung

    2015-01-01

    Research increasingly emphasizes understanding differential effects. This article focuses on understanding regression mixture models, which are relatively new statistical methods for assessing differential effects by comparing results to using an interactive term in linear regression. The research questions which each model answers, their…

  2. Derivation of equations of motion for multi-blade rotors employing coupled modes and including high twist capability

    NASA Technical Reports Server (NTRS)

    Sopher, R.

    1975-01-01

    The equations of motion are derived for a multiblade rotor. A high-twist capability and coupled flatwise-edgewise assumed normal modes are employed instead of uncoupled flatwise-edgewise assumed normal modes. The torsion mode is uncoupled. Support system models consisting of complete helicopters in free flight or grounded flexible supports, arbitrary rotor-induced inflow, and arbitrary vertical gust models are also used.

  3. An Axisymmetric Hydrodynamical Model for the Torus Wind in AGN. 2; X-ray Excited Funnel Flow

    NASA Technical Reports Server (NTRS)

    Dorodnitsyn, A.; Kallman, T.; Proga, D.

    2008-01-01

    We have calculated a series of models of outflows from the obscuring torus in active galactic nuclei (AGN). Our modeling assumes that the inner face of a rotationally supported torus is illuminated and heated by the intense X-rays from the inner accretion disk and black hole. As a result of such heating a strong biconical outflow is observed in our simulations. We calculate 3-dimensional hydrodynamical models, assuming axial symmetry, and including the effects of X-ray heating, ionization, and radiation pressure. We discuss the behavior of a large family of these models, their velocity fields, mass fluxes and temperature, as functions of the torus properties and X-ray flux. Synthetic warm absorber spectra are calculated, assuming pure absorption, for sample models at various inclination angles and observing times. We show that these models have mass fluxes and flow speeds which are comparable to those which have been inferred from observations of Seyfert 1 warm absorbers, and that they can produce rich absorption line spectra.

  4. The new climate data record of total and spectral solar irradiance: Current progress and future steps

    NASA Astrophysics Data System (ADS)

    Coddington, Odele; Lean, Judith; Rottman, Gary; Pilewskie, Peter; Snow, Martin; Lindholm, Doug

    2016-04-01

    We present a climate data record of Total Solar Irradiance (TSI) and Solar Spectral Irradiance (SSI), with associated time and wavelength dependent uncertainties, from 1610 to the present. The data record was developed jointly by the Laboratory for Atmospheric and Space Physics (LASP) at the University of Colorado Boulder and the Naval Research Laboratory (NRL) as part of the National Oceanic and Atmospheric Administration's (NOAA) National Centers for Environmental Information (NCEI) Climate Data Record (CDR) Program, where the data record, source code, and supporting documentation are archived. TSI and SSI are constructed from models that determine the changes from quiet Sun conditions arising from bright faculae and dark sunspots on the solar disk using linear regression of proxies of solar magnetic activity with observations from the SOlar Radiation and Climate Experiment (SORCE) Total Irradiance Monitor (TIM), Spectral Irradiance Monitor (SIM), and SOlar Stellar Irradiance Comparison Experiment (SOLSTICE). We show that TSI can be separately modeled to within TIM's measurement accuracy from solar rotational to solar cycle time scales and we assume that SSI measurements are reliable on solar rotational time scales. We discuss the model formulation, uncertainty estimates, and operational implementation and present comparisons of the modeled TSI and SSI with the measurement record and with other solar irradiance models. We also discuss ongoing work to assess the sensitivity of the modeled irradiances to model assumptions, namely, the scaling of solar variability from rotational-to-cycle time scales and the representation of the sunspot darkening index.

  5. Testing the vulnerability and scar models of self-esteem and depressive symptoms from adolescence to middle adulthood and across generations.

    PubMed

    Steiger, Andrea E; Fend, Helmut A; Allemand, Mathias

    2015-02-01

    The vulnerability model states that low self-esteem functions as a predictor for the development of depressive symptoms whereas the scar model assumes that these symptoms leave scars in individuals resulting in lower self-esteem. Both models have received empirical support, however, they have only been tested within individuals and not across generations (i.e., between family members). Thus, we tested the scope of these competing models by (a) investigating whether the effects hold from adolescence to middle adulthood (long-term vulnerability and scar effects), (b) whether the effects hold across generations (intergenerational vulnerability and scar effects), and (c) whether intergenerational effects are mediated by parental self-esteem and depressive symptoms and parent-child discord. We used longitudinal data from adolescence to middle adulthood (N = 1,359) and from Generation 1 adolescents (G1) to Generation 2 adolescents (G2) (N = 572 parent-child pairs). Results from latent cross-lagged regression analyses demonstrated that both adolescent self-esteem and depressive symptoms were prospectively related to adult self-esteem and depressive symptoms 3 decades later. That is, both the vulnerability and scar models are valid over decades with stronger effects for the vulnerability model. Across generations, we found a substantial direct transmission effect from G1 to G2 adolescent depressive symptoms but no evidence for the proposed intergenerational vulnerability and scar effect or for any of the proposed mediating mechanisms. PsycINFO Database Record (c) 2015 APA, all rights reserved.

  6. Impact of correlation of predictors on discrimination of risk models in development and external populations.

    PubMed

    Kundu, Suman; Mazumdar, Madhu; Ferket, Bart

    2017-04-19

    The area under the ROC curve (AUC) of risk models is known to be influenced by differences in case-mix and effect size of predictors. The impact of heterogeneity in correlation among predictors has, however, been underinvestigated. We sought to evaluate how correlation among predictors affects the AUC in development and external populations. We simulated hypothetical populations using two different methods based on means, standard deviations, and correlation of two continuous predictors. In the first approach, the distribution and correlation of predictors were assumed for the total population. In the second approach, these parameters were modeled conditional on disease status. In both approaches, multivariable logistic regression models were fitted to predict disease risk in individuals. Each risk model developed in a population was validated in the remaining populations to investigate external validity. For both approaches, we observed that the magnitude of the AUC in the development and external populations depends on the correlation among predictors. Lower AUCs were estimated in scenarios of both strong positive and negative correlation, depending on the direction of predictor effects and the simulation method. However, when adjusted effect sizes of predictors were specified in the opposite directions, increasingly negative correlation consistently improved the AUC. AUCs in external validation populations were higher or lower than in the derivation cohort, even in the presence of similar predictor effects. Discrimination of risk prediction models should be assessed in various external populations with different correlation structures to make better inferences about model generalizability.
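
    A hedged sketch of the first simulation approach (predictor distribution and correlation specified for the total population) is given below: two correlated Gaussian predictors, a logistic disease model, and AUC evaluated in a development population and in external populations with different correlation structures. Effect sizes, correlations, and sample sizes are illustrative assumptions, not the values used in the study.

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_auc_score

        def simulate(n, rho, beta=(0.8, 0.8), intercept=-1.0, seed=0):
            """Two standard-normal predictors with correlation rho; logistic disease model."""
            rng = np.random.default_rng(seed)
            cov = [[1.0, rho], [rho, 1.0]]
            X = rng.multivariate_normal([0.0, 0.0], cov, size=n)
            p = 1.0 / (1.0 + np.exp(-(intercept + X @ np.array(beta))))
            y = rng.binomial(1, p)
            return X, y

        # Develop the risk model in a population with rho = 0.6 ...
        X_dev, y_dev = simulate(20000, rho=0.6, seed=1)
        model = LogisticRegression().fit(X_dev, y_dev)
        auc_dev = roc_auc_score(y_dev, model.predict_proba(X_dev)[:, 1])

        # ... and validate it in external populations with different correlation structures.
        for rho in (-0.6, 0.0, 0.6):
            X_ext, y_ext = simulate(20000, rho=rho, seed=2)
            auc_ext = roc_auc_score(y_ext, model.predict_proba(X_ext)[:, 1])
            print(f"rho={rho:+.1f}  AUC={auc_ext:.3f}  (development AUC={auc_dev:.3f})")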

  7. Modeling absolute differences in life expectancy with a censored skew-normal regression approach

    PubMed Central

    Clough-Gorr, Kerri; Zwahlen, Marcel

    2015-01-01

    Parameter estimates from commonly used multivariable parametric survival regression models do not directly quantify differences in years of life expectancy. Gaussian linear regression models give results in terms of absolute mean differences, but are not appropriate in modeling life expectancy, because in many situations time to death has a negatively skewed distribution. A regression approach using a skew-normal distribution would be an alternative to parametric survival models in the modeling of life expectancy, because parameter estimates can be interpreted in terms of survival time differences while allowing for skewness of the distribution. In this paper we show how to use the skew-normal regression so that censored and left-truncated observations are accounted for. With this we model differences in life expectancy using data from the Swiss National Cohort Study and from official life expectancy estimates and compare the results with those derived from commonly used survival regression models. We conclude that a censored skew-normal survival regression approach for left-truncated observations can be used to model differences in life expectancy across covariates of interest. PMID:26339544
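
    The sketch below illustrates, under stated assumptions, how a censored and left-truncated skew-normal regression likelihood can be written down and maximized: events contribute the density, right-censored observations the survival function, and left truncation is handled by conditioning on survival to the entry age. The parameterization, optimizer, and data are illustrative and are not the authors' implementation.

        import numpy as np
        from scipy import stats, optimize

        def neg_loglik(params, t, event, entry, X):
            """Negative log-likelihood of a skew-normal regression for age at death,
            with right-censoring (event = 0) and left truncation at the entry age."""
            p = X.shape[1]
            beta, log_scale, shape = params[:p], params[p], params[p + 1]
            loc, scale = X @ beta, np.exp(log_scale)
            ll = np.where(event == 1,
                          stats.skewnorm.logpdf(t, shape, loc=loc, scale=scale),
                          stats.skewnorm.logsf(t, shape, loc=loc, scale=scale))
            ll -= stats.skewnorm.logsf(entry, shape, loc=loc, scale=scale)  # truncation
            return -np.sum(ll)

        # Tiny simulated example (all numbers purely illustrative)
        rng = np.random.default_rng(0)
        n = 800
        X = np.column_stack([np.ones(n), rng.binomial(1, 0.5, n)])   # intercept + exposure group
        t_true = stats.skewnorm.rvs(-4, loc=80 + 3 * X[:, 1], scale=10, random_state=1)
        entry = np.full(n, 60.0)
        keep = t_true > entry                                        # left truncation at age 60
        X, t_true, entry = X[keep], t_true[keep], entry[keep]
        cens = np.full(keep.sum(), 95.0)                             # administrative censoring
        t = np.minimum(t_true, cens)
        event = (t_true <= cens).astype(int)
        start = np.array([75.0, 0.0, np.log(10.0), -1.0])
        fit = optimize.minimize(neg_loglik, start, args=(t, event, entry, X),
                                method="Nelder-Mead", options={"maxiter": 5000})
        print(fit.x)   # [intercept, group effect, log scale, shape]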

  8. Error Covariance Penalized Regression: A novel multivariate model combining penalized regression with multivariate error structure.

    PubMed

    Allegrini, Franco; Braga, Jez W B; Moreira, Alessandro C O; Olivieri, Alejandro C

    2018-06-29

    A new multivariate regression model, named Error Covariance Penalized Regression (ECPR) is presented. Following a penalized regression strategy, the proposed model incorporates information about the measurement error structure of the system, using the error covariance matrix (ECM) as a penalization term. Results are reported from both simulations and experimental data based on replicate mid and near infrared (MIR and NIR) spectral measurements. The results for ECPR are better under non-iid conditions when compared with traditional first-order multivariate methods such as ridge regression (RR), principal component regression (PCR) and partial least-squares regression (PLS). Copyright © 2018 Elsevier B.V. All rights reserved.

  9. Analyzing Student Learning Outcomes: Usefulness of Logistic and Cox Regression Models. IR Applications, Volume 5

    ERIC Educational Resources Information Center

    Chen, Chau-Kuang

    2005-01-01

    Logistic and Cox regression methods are practical tools used to model the relationships between certain student learning outcomes and their relevant explanatory variables. The logistic regression model fits an S-shaped curve into a binary outcome with data points of zero and one. The Cox regression model allows investigators to study the duration…

  10. Robust geographically weighted regression of modeling the Air Polluter Standard Index (APSI)

    NASA Astrophysics Data System (ADS)

    Warsito, Budi; Yasin, Hasbi; Ispriyanti, Dwi; Hoyyi, Abdul

    2018-05-01

    The Geographically Weighted Regression (GWR) model has been widely applied in many practical fields for exploring spatial heterogeneity of a regression model. However, this method is inherently not robust to outliers. Outliers commonly exist in data sets and may lead to a distorted estimate of the underlying regression model. One solution for handling outliers in the regression model is to use robust models; the resulting model is called Robust Geographically Weighted Regression (RGWR). This research aims to aid the government in the policy-making process related to air pollution mitigation by developing a standard index model for air pollution (Air Polluter Standard Index - APSI) based on the RGWR approach. In this research, we also consider seven variables that are directly related to the air pollution level: the traffic velocity, the population density, the business center aspect, the air humidity, the wind velocity, the air temperature, and the area size of the urban forest. The best model is determined by the smallest AIC value. There are significant differences between ordinary regression and RGWR in this case, but the basic GWR using the Gaussian kernel is the best model for modeling the APSI because it has the smallest AIC.
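
    A minimal sketch of the underlying idea follows: geographically weighted regression fits a weighted least-squares model at each location with a Gaussian distance kernel, and a robust variant additionally downweights observations with large residuals. The bisquare reweighting scheme, bandwidth, and data below are assumptions for illustration, not the authors' RGWR implementation.

        import numpy as np

        def gaussian_kernel(d, bandwidth):
            return np.exp(-0.5 * (d / bandwidth) ** 2)

        def gwr_fit_point(X, y, coords, point, bandwidth, robust_iter=0):
            """Weighted least squares at one location; optional bisquare reweighting of residuals."""
            d = np.linalg.norm(coords - point, axis=1)
            w = gaussian_kernel(d, bandwidth)
            for _ in range(robust_iter + 1):
                W = np.diag(w)
                beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
                resid = y - X @ beta
                s = np.median(np.abs(resid)) / 0.6745 + 1e-12            # robust scale estimate
                u = np.clip(resid / (6.0 * s), -1.0, 1.0)
                w = gaussian_kernel(d, bandwidth) * (1.0 - u ** 2) ** 2   # bisquare weights
            return beta

        # Example: local coefficients at every monitoring site (data purely illustrative)
        rng = np.random.default_rng(0)
        n = 150
        coords = rng.uniform(0, 10, size=(n, 2))
        X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])       # intercept + 2 predictors
        beta_true = np.column_stack([coords[:, 0] * 0.2, np.ones(n), 0.5 * np.ones(n)])
        y = np.sum(X * beta_true, axis=1) + rng.normal(scale=0.3, size=n)
        local_betas = np.array([gwr_fit_point(X, y, coords, c, bandwidth=2.0, robust_iter=2)
                                for c in coords])
        print(local_betas.mean(axis=0))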

  11. The Relationship between Religious Coping and Self-Care Behaviors in Iranian Medical Students.

    PubMed

    Sharif Nia, Hamid; Pahlevan Sharif, Saeed; Goudarzian, Amir Hossein; Allen, Kelly A; Jamali, Saman; Heydari Gorji, Mohammad Ali

    2017-12-01

    In recent years, researchers have identified that coping strategies are an important contributor to an individual's life satisfaction and ability to manage stress. A positive relationship of religious coping, specifically, with physical and mental health has also been identified in some studies. Spirituality and religion have been discussed rigorously in research, but very few studies exist on religious coping. The aim of this study was to determine the relationship between religious coping methods (i.e., positive and negative religious coping) and self-care behaviors in Iranian medical students. This study used a cross-sectional design with 335 randomly selected students from Mazandaran University of Medical Sciences, Iran. A data collection tool comprising the standard questionnaire of religious coping methods and a questionnaire for assessing self-care behaviors was utilized. Data were analyzed using a two-sample t test assuming equal variances. Adjusted linear regression was used to evaluate the independent association of religious coping with self-care. The adjusted linear regression model indicated an independent significant association between positive (b = 4.616, 95% CI 4.234-4.999) and negative (b = -3.726, 95% CI -4.311 to -3.141) religious coping and self-care behaviors. Findings showed a linear relationship between religious coping and self-care behaviors. Further research with larger sample sizes in diverse populations is recommended.

  12. Patterns and correlates of illicit drug selling among youth in the USA

    PubMed Central

    Vaughn, Michael G; Shook, Jeffrey J; Perron, Brian E; Abdon, Arnelyn; Ahmedani, Brian

    2011-01-01

    Purpose Despite the high rates of drug selling among youth in juvenile justice settings and youth residing in disadvantaged neighborhoods, relatively little is known about the patterns of illicit drug selling among youth in the general population. Methods Using the public-use data file from the adolescent sample (N = 17,842) in the 2008 National Survey on Drug Use and Health (NSDUH), this study employed multiple logistic regression to compare the behavioral, parental involvement, and prevention experiences of youth who sold and did not sell illicit drugs in the past year. Results Findings from a series of logistic regression models indicated youth who sold drugs were far more likely to use a wide variety of drugs and engage in delinquent acts. Drug-selling youth were significantly less likely to report having a parent involved in their life and having someone to talk to about serious problems but were more likely to report exposure to drug prevention programming. Conclusion Selling of drugs by youth appears to be a byproduct of substance abuse and deviance proneness, and the prevention programs these youth experience are likely a result of mandated exposure derived from contact with the criminal justice system. Assuming no major reductions on the drug supply side, policies and practices that increase drug abuse treatment, parental involvement and supervision, and school engagement are suggested. PMID:22375100

  13. Relationships between coordination, active drag and propelling efficiency in crawl.

    PubMed

    Seifert, Ludovic; Schnitzler, Christophe; Bideault, Gautier; Alberty, Morgan; Chollet, Didier; Toussaint, Huub Martin

    2015-02-01

    This study examines the relationships between the index of coordination (IdC) and active drag (D), assuming that at constant average speed, average drag equals average propulsion. The relationship between IdC and propulsive efficiency (ep) was also investigated at maximal speed. Twenty national swimmers completed two incremental speed tests swimming front crawl with arms only, in a free condition and using a measurement of active drag system. Each test was composed of eight 25-m bouts from 60% to 100% of maximal intensity, whereby each lap was swum at constant speed. Different regression models were tested to analyse the IdC-D relationship. The correlation between IdC and ep was calculated. IdC was linked to D by linear regression (IdC = 0.246·D - 27.06; R² = 0.88, P < .05); swimmers switched from catch-up to superposition coordination mode at a speed of ∼1.55 m s⁻¹, where average D is ∼110 N. No correlation between IdC and ep at maximal speed was found. The intra-individual analysis revealed that coordination plays an important role in scaling propulsive forces with higher speed levels such that these are adapted to aquatic resistance. Inter-individual analysis showed that a high IdC did not relate to a high ep, suggesting an individual optimization of force and power generation is at play to reach high speeds. Copyright © 2014 Elsevier B.V. All rights reserved.

  14. The Labor Market and the Second Economy in the Soviet Union

    DTIC Science & Technology

    1991-01-01

    model. WHO WORKS "ON THE LEFT"? (The non-second economy income (V) is in turn composed of official first economy income, pilferage from the first... demands. In other words, the model assumes that the family "pools" all unearned income regardless of source. This is one of the few testable assumptions... of the neoclassical model. In the labor supply model in this paper, we have assumed that all first economy income, for both husband and wife, is

  15. Meta-analysis of haplotype-association studies: comparison of methods and empirical evaluation of the literature

    PubMed Central

    2011-01-01

    Background Meta-analysis is a popular methodology in several fields of medical research, including genetic association studies. However, the methods used for meta-analysis of association studies that report haplotypes have not been studied in detail. In this work, methods for performing meta-analysis of haplotype association studies are summarized, compared and presented in a unified framework along with an empirical evaluation of the literature. Results We present multivariate methods that use summary-based data as well as methods that use binary and count data in a generalized linear mixed model framework (logistic regression, multinomial regression and Poisson regression). The methods presented here avoid the inflation of the type I error rate that could result from the traditional approach of comparing a haplotype against the remaining ones, whereas they can be fitted using standard software. Moreover, formal global tests are presented for assessing the statistical significance of the overall association. Although the methods presented here assume that the haplotypes are directly observed, they can be easily extended to allow for such uncertainty by weighting the haplotypes by their probability. Conclusions An empirical evaluation of the published literature and a comparison against the meta-analyses that use single nucleotide polymorphisms suggest that the studies reporting meta-analysis of haplotypes contain approximately half of the included studies and produce significant results twice as often. We show that this excess of statistically significant results stems from the sub-optimal method of analysis used and, in approximately half of the cases, the statistical significance is refuted if the data are properly re-analyzed. Illustrative examples of code are given in Stata and it is anticipated that the methods developed in this work will be widely applied in the meta-analysis of haplotype association studies. PMID:21247440

  16. More green space is related to less antidepressant prescription rates in the Netherlands: A Bayesian geoadditive quantile regression approach.

    PubMed

    Helbich, Marco; Klein, Nadja; Roberts, Hannah; Hagedoorn, Paulien; Groenewegen, Peter P

    2018-06-20

    Exposure to green space seems to be beneficial for self-reported mental health. In this study we used an objective health indicator, namely antidepressant prescription rates. Current studies rely exclusively upon mean regression models assuming linear associations. It is, however, plausible that the presence of green space is non-linearly related with different quantiles of the outcome, antidepressant prescription rates. These restrictions may contribute to inconsistent findings. Our aim was: a) to assess antidepressant prescription rates in relation to green space, and b) to analyze how the relationship varies non-linearly across different quantiles of antidepressant prescription rates. We used cross-sectional data for the year 2014 at the municipality level in the Netherlands. Ecological Bayesian geoadditive quantile regressions were fitted for the 15%, 50%, and 85% quantiles to estimate green space-prescription rate correlations, controlling for physical activity levels, socio-demographics, urbanicity, etc. The results suggested that green space was overall inversely and non-linearly associated with antidepressant prescription rates. More importantly, the associations differed across the quantiles, although the variation was modest. Significant non-linearities were apparent: the associations were slightly positive in the lower quantile and strongly negative in the upper one. Our findings imply that an increased availability of green space within a municipality may contribute to a reduction in the number of antidepressant prescriptions dispensed. Green space is thus a central health and community asset, whilst a minimum level of 28% needs to be established for health gains. The highest effectiveness occurred at a municipality surface percentage higher than 79%. This inverse dose-dependent relation has important implications for setting future community-level health and planning policies. Copyright © 2018 Elsevier Inc. All rights reserved.
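
    The study fits Bayesian geoadditive quantile regressions; as a simpler frequentist stand-in, the sketch below fits linear quantile regressions at the 15%, 50%, and 85% quantiles with statsmodels. Variable names and data are illustrative assumptions, and the geoadditive (spatial, non-linear) components are not reproduced.

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        # Illustrative municipality-level data: green space share vs. antidepressant prescriptions
        rng = np.random.default_rng(0)
        n = 400
        df = pd.DataFrame({
            "green_space": rng.uniform(5, 90, n),     # % of municipal surface
            "urbanicity": rng.normal(size=n),
        })
        df["prescription_rate"] = (70 - 0.3 * df["green_space"] + 3 * df["urbanicity"]
                                   + rng.gumbel(scale=5, size=n))

        # Linear quantile regression at the quantiles analyzed in the paper
        for q in (0.15, 0.50, 0.85):
            res = smf.quantreg("prescription_rate ~ green_space + urbanicity", df).fit(q=q)
            print(f"q={q:.2f}  green_space coefficient = {res.params['green_space']:+.3f}")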

  17. Bayesian Unimodal Density Regression for Causal Inference

    ERIC Educational Resources Information Center

    Karabatsos, George; Walker, Stephen G.

    2011-01-01

    Karabatsos and Walker (2011) introduced a new Bayesian nonparametric (BNP) regression model. Through analyses of real and simulated data, they showed that the BNP regression model outperforms other parametric and nonparametric regression models of common use, in terms of predictive accuracy of the outcome (dependent) variable. The other,…

  18. Bayesian Estimation of Multivariate Latent Regression Models: Gauss versus Laplace

    ERIC Educational Resources Information Center

    Culpepper, Steven Andrew; Park, Trevor

    2017-01-01

    A latent multivariate regression model is developed that employs a generalized asymmetric Laplace (GAL) prior distribution for regression coefficients. The model is designed for high-dimensional applications where an approximate sparsity condition is satisfied, such that many regression coefficients are near zero after accounting for all the model…

  19. A simple approach to power and sample size calculations in logistic regression and Cox regression models.

    PubMed

    Vaeth, Michael; Skovlund, Eva

    2004-06-15

    For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination. Copyright 2004 John Wiley & Sons, Ltd.
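
    The recipe described above can be turned into a short calculation. The sketch below, for logistic regression, finds two equal groups whose logits differ by twice the slope times the standard deviation of the covariate while keeping the overall event probability fixed, and then applies a normal-approximation two-proportion power formula; the numbers are illustrative and this is not the authors' code.

        import numpy as np
        from scipy import optimize, stats

        def logistic_power(beta, sd_x, p_overall, n_total, alpha=0.05):
            """Approximate power for the slope in logistic regression via the equivalent
            two-sample problem: two equal groups whose logits differ by 2*beta*sd_x,
            keeping the overall expected event probability at p_overall."""
            delta = abs(2.0 * beta * sd_x)                   # sign does not affect power
            def gap(p1):                                     # logit(p2) - logit(p1) - delta, p2 = 2*p_overall - p1
                p2 = 2.0 * p_overall - p1
                return np.log(p2 / (1 - p2)) - np.log(p1 / (1 - p1)) - delta
            lo = max(1e-6, 2.0 * p_overall - 1.0 + 1e-6)
            p1 = optimize.brentq(gap, lo, p_overall - 1e-6)
            p2 = 2.0 * p_overall - p1
            n = n_total / 2.0                                # per-group size
            se = np.sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n)
            z = stats.norm.ppf(1 - alpha / 2)
            return float(stats.norm.cdf(abs(p2 - p1) / se - z))

        # Example: slope 0.4 per SD-unit covariate, overall event probability 0.3, n = 200
        print(logistic_power(beta=0.4, sd_x=1.0, p_overall=0.3, n_total=200))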

  20. Bayesian adjustment for measurement error in continuous exposures in an individually matched case-control study.

    PubMed

    Espino-Hernandez, Gabriela; Gustafson, Paul; Burstyn, Igor

    2011-05-14

    In epidemiological studies explanatory variables are frequently subject to measurement error. The aim of this paper is to develop a Bayesian method to correct for measurement error in multiple continuous exposures in individually matched case-control studies. This is a topic that has not been widely investigated. The new method is illustrated using data from an individually matched case-control study of the association between thyroid hormone levels during pregnancy and exposure to perfluorinated acids. The objective of the motivating study was to examine the risk of maternal hypothyroxinemia due to exposure to three perfluorinated acids measured on a continuous scale. Results from the proposed method are compared with those obtained from a naive analysis. Using a Bayesian approach, the developed method considers a classical measurement error model for the exposures, as well as the conditional logistic regression likelihood as the disease model, together with a random-effect exposure model. Proper and diffuse prior distributions are assigned, and results from a quality control experiment are used to estimate the perfluorinated acids' measurement error variability. As a result, posterior distributions and 95% credible intervals of the odds ratios are computed. A sensitivity analysis of the method's performance in this particular application, with different measurement error variability, was performed. The proposed Bayesian method to correct for measurement error is feasible and can be implemented using statistical software. For the study on perfluorinated acids, a comparison of the inferences which are corrected for measurement error to those which ignore it indicates that little adjustment is manifested for the level of measurement error actually exhibited in the exposures. Nevertheless, a sensitivity analysis shows that more substantial adjustments arise if larger measurement errors are assumed. In individually matched case-control studies, the use of conditional logistic regression likelihood as a disease model in the presence of measurement error in multiple continuous exposures can be justified by having a random-effect exposure model. The proposed method can be successfully implemented in WinBUGS to correct individually matched case-control studies for several mismeasured continuous exposures under a classical measurement error model.

  1. Comparative evaluation of urban storm water quality models

    NASA Astrophysics Data System (ADS)

    Vaze, J.; Chiew, Francis H. S.

    2003-10-01

    The estimation of urban storm water pollutant loads is required for the development of mitigation and management strategies to minimize impacts to receiving environments. Event pollutant loads are typically estimated using either regression equations or "process-based" water quality models. The relative merit of using regression models compared to process-based models is not clear. A modeling study is carried out here to evaluate the comparative ability of the regression equations and process-based water quality models to estimate event diffuse pollutant loads from impervious surfaces. The results indicate that, once calibrated, both the regression equations and the process-based model can estimate event pollutant loads satisfactorily. In fact, the loads estimated using the regression equation as a function of rainfall intensity and runoff rate are better than the loads estimated using the process-based model. Therefore, if only estimates of event loads are required, regression models should be used because they are simpler and require less data compared to process-based models.
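
    As a hedged illustration of the regression alternative mentioned above, the sketch below fits a log-log regression of event pollutant load on rainfall intensity and runoff rate; the functional form, units, and data are assumptions for illustration only.

        import numpy as np
        import statsmodels.api as sm

        # Illustrative event data: rainfall intensity (mm/h), runoff rate (L/s), TSS load (kg)
        rng = np.random.default_rng(0)
        n = 60
        rain = rng.lognormal(mean=1.5, sigma=0.5, size=n)
        runoff = rng.lognormal(mean=3.0, sigma=0.6, size=n)
        load = np.exp(-2.0 + 0.6 * np.log(rain) + 0.9 * np.log(runoff)
                      + rng.normal(scale=0.3, size=n))

        # Log-log regression of event load on rainfall intensity and runoff rate
        X = sm.add_constant(np.column_stack([np.log(rain), np.log(runoff)]))
        res = sm.OLS(np.log(load), X).fit()
        print(res.params)   # intercept and elasticities with respect to rain and runoff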

  2. Democracy under Uncertainty: The Wisdom of Crowds and the Free-Rider Problem in Group Decision Making

    ERIC Educational Resources Information Center

    Kameda, Tatsuya; Tsukasaki, Takafumi; Hastie, Reid; Berg, Nathan

    2011-01-01

    We introduce a game theory model of individual decisions to cooperate by contributing personal resources to group decisions versus by free riding on the contributions of other members. In contrast to most public-goods games that assume group returns are linear in individual contributions, the present model assumes decreasing marginal group…

  3. Modeling Ignition of HMX with the Gibbs Formulation

    NASA Astrophysics Data System (ADS)

    Lee, Kibaek; Stewart, D. Scott

    2017-06-01

    We present an HMX model with the Gibbs formulation in which the stress tensor and temperature are assumed to be in local equilibrium, but phase/chemical changes are not assumed to be in equilibrium. We assume multiple components for HMX, including the beta and delta phases, liquid, and gas phase of HMX and its gas products. An isotropic small-strain solid model, a modified Fried-Howard liquid EOS, and an ideal gas EOS are used for the relevant components. Phase/chemical changes are characterized as reactions, each with its own reaction rate. The Maxwell-Stefan model is used for diffusion. Excited gas products in the local domain drive the unreacted HMX solid to the ignition event. Density of the mixture, stress, strain, displacement, mass fractions, and temperature are computed in a 1D domain with time histories. Office of Naval Research and Air Force Office of Scientific Research.

  4. Is perceived patient involvement in mental health care associated with satisfaction and empowerment?

    PubMed

    Tambuyzer, Else; Van Audenhove, Chantal

    2015-08-01

    Patients increasingly assume active roles in their mental health care. While there is a growing interest in patient involvement and patient-reported outcomes, there is insufficient research on the outcomes of patient involvement. The research questions in this study are as follows: 'To what extent is perceived patient involvement associated with satisfaction and empowerment?'; 'What is the nature of the relationship between satisfaction and empowerment?'; and 'To what extent are background variables associated with satisfaction and empowerment?'. We assumed that a higher degree of patient involvement is associated with higher satisfaction and empowerment scores and that satisfaction and empowerment are positively associated. Data were gathered using surveys of 111 patients of 36 multidisciplinary care networks for persons with serious and persistent mental illness. Demographic characteristics, patient involvement and satisfaction were measured using a new questionnaire. Empowerment was assessed using the Dutch Empowerment Scale. Descriptive, univariate (Pearson's r and independent-samples t-tests), multivariate (hierarchical forced entry regression) and mixed-model analyses were conducted. The hypotheses of positive associations between patient involvement, satisfaction and empowerment are confirmed. The demographics are not significantly related to satisfaction or empowerment, except for gender. Men reported higher empowerment scores than did women. Making patient involvement a reality is more than just an ethical imperative. It provides an opportunity to enhance patient-reported outcomes such as satisfaction and empowerment. Future research should focus on the nature of the association between satisfaction and empowerment. © 2013 John Wiley & Sons Ltd.

  5. A generalized right truncated bivariate Poisson regression model with applications to health data.

    PubMed

    Islam, M Ataharul; Chowdhury, Rafiqul I

    2017-01-01

    A generalized right truncated bivariate Poisson regression model is proposed in this paper. Estimation and tests for goodness of fit and over- or underdispersion are illustrated for both untruncated and right truncated bivariate Poisson regression models using a marginal-conditional approach. Estimation and test procedures are illustrated for bivariate Poisson regression models with applications to Health and Retirement Study data on the number of health conditions and the number of health care services utilized. The proposed test statistics are easy to compute and it is evident from the results that the models fit the data very well. A comparison between the right truncated and untruncated bivariate Poisson regression models using the test for nonnested models clearly shows that the truncated model performs significantly better than the untruncated model.

  6. A generalized right truncated bivariate Poisson regression model with applications to health data

    PubMed Central

    Islam, M. Ataharul; Chowdhury, Rafiqul I.

    2017-01-01

    A generalized right truncated bivariate Poisson regression model is proposed in this paper. Estimation and tests for goodness of fit and over- or underdispersion are illustrated for both untruncated and right truncated bivariate Poisson regression models using a marginal-conditional approach. Estimation and test procedures are illustrated for bivariate Poisson regression models with applications to Health and Retirement Study data on the number of health conditions and the number of health care services utilized. The proposed test statistics are easy to compute and it is evident from the results that the models fit the data very well. A comparison between the right truncated and untruncated bivariate Poisson regression models using the test for nonnested models clearly shows that the truncated model performs significantly better than the untruncated model. PMID:28586344

  7. A Technique of Fuzzy C-Mean in Multiple Linear Regression Model toward Paddy Yield

    NASA Astrophysics Data System (ADS)

    Syazwan Wahab, Nur; Saifullah Rusiman, Mohd; Mohamad, Mahathir; Amira Azmi, Nur; Che Him, Norziha; Ghazali Kamardan, M.; Ali, Maselan

    2018-04-01

    In this paper, we propose a hybrid model which is a combination of a multiple linear regression model and the fuzzy c-means method. This research involved the relationship between 20 topsoil variates, analyzed prior to planting, and paddy yields at standard fertilizer rates. Data used were from the multi-location trials for rice carried out by MARDI at major paddy granaries in Peninsular Malaysia during the period from 2009 to 2012. Missing observations were estimated using mean estimation techniques. The data were analyzed using a multiple linear regression model and a combination of the multiple linear regression model and the fuzzy c-means method. Analyses of normality and multicollinearity indicate that the data are normally scattered without multicollinearity among the independent variables. The fuzzy c-means analysis clusters the paddy yield into two clusters before the multiple linear regression model is applied. The comparison between the two methods indicates that the hybrid of the multiple linear regression model and the fuzzy c-means method outperforms the multiple linear regression model, with a lower mean square error.
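
    A hedged sketch of the hybrid idea follows: a small fuzzy c-means routine partitions the observations into two clusters, and a separate multiple linear regression is then fitted within each cluster. The implementation details, cluster assignment rule, and data are illustrative assumptions, not the authors' procedure.

        import numpy as np

        def fuzzy_c_means(X, c=2, m=2.0, n_iter=200, seed=0):
            """Minimal fuzzy c-means: returns membership matrix U (n x c) and cluster centers."""
            rng = np.random.default_rng(seed)
            U = rng.random((len(X), c))
            U /= U.sum(axis=1, keepdims=True)
            for _ in range(n_iter):
                W = U ** m
                centers = (W.T @ X) / W.sum(axis=0)[:, None]
                d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
                inv = d ** (-2.0 / (m - 1.0))
                U = inv / inv.sum(axis=1, keepdims=True)
            return U, centers

        def cluster_regressions(X, y, U):
            """Fit one OLS model per cluster using hard assignment from the memberships."""
            labels = U.argmax(axis=1)
            models = []
            for k in range(U.shape[1]):
                idx = labels == k
                Xk = np.column_stack([np.ones(idx.sum()), X[idx]])
                beta, *_ = np.linalg.lstsq(Xk, y[idx], rcond=None)
                models.append(beta)
            return labels, models

        # Illustrative soil covariates and paddy yield
        rng = np.random.default_rng(1)
        n = 120
        X = rng.normal(size=(n, 3))                 # e.g. pH, organic matter, clay content
        group = (X[:, 0] > 0).astype(int)           # latent regime
        y = np.where(group == 1, 4 + 0.8 * X[:, 1], 2 + 0.2 * X[:, 1]) + rng.normal(scale=0.3, size=n)

        U, _ = fuzzy_c_means(X, c=2)
        labels, models = cluster_regressions(X, y, U)
        for k, beta in enumerate(models):
            print(f"cluster {k}: intercept={beta[0]:.2f}, slopes={np.round(beta[1:], 2)}")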

  8. A Data-Driven Approach to Reverse Engineering Customer Engagement Models: Towards Functional Constructs

    PubMed Central

    de Vries, Natalie Jane; Carlson, Jamie; Moscato, Pablo

    2014-01-01

    Online consumer behavior in general and online customer engagement with brands in particular, has become a major focus of research activity fuelled by the exponential increase of interactive functions of the internet and social media platforms and applications. Current research in this area is mostly hypothesis-driven and much debate about the concept of Customer Engagement and its related constructs remains in the literature. In this paper, we aim to propose a novel methodology for reverse engineering a consumer behavior model for online customer engagement, based on a computational and data-driven perspective. This methodology could be generalized and prove useful for future research in the fields of consumer behaviors using questionnaire data or studies investigating other types of human behaviors. The method we propose contains five main stages: symbolic regression analysis, graph building, community detection, evaluation of results and, finally, investigation of directed cycles and common feedback loops. The ‘communities’ of questionnaire items that emerge from our community detection method form possible ‘functional constructs’ inferred from data rather than assumed from literature and theory. Our results show consistent partitioning of questionnaire items into such ‘functional constructs’ suggesting the method proposed here could be adopted as a new data-driven way of human behavior modeling. PMID:25036766

  9. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Otake, M.; Schull, W.J.

    This paper investigates the quantitative relationship of ionizing radiation to the occurrence of posterior lenticular opacities among the survivors of the atomic bombings of Hiroshima and Nagasaki suggested by the DS86 dosimetry system. DS86 doses are available for 1983 (93.4%) of the 2124 atomic bomb survivors analyzed in 1982. The DS86 kerma neutron component for Hiroshima survivors is much smaller than its comparable T65DR component, but still 4.2-fold higher (0.38 Gy at 6 Gy) than that in Nagasaki (0.09 Gy at 6 Gy). Thus, if the eye is especially sensitive to neutrons, there may yet be some useful information on their effects, particularly in Hiroshima. The dose-response relationship has been evaluated as a function of the separately estimated gamma-ray and neutron doses. Among several different dose-response models without and with two thresholds, we have selected as the best model the one with the smallest χ² or the largest log-likelihood value associated with the goodness of fit. The best fit is a linear gamma-linear neutron relationship which assumes different thresholds for the two types of radiation. Both gamma and neutron regression coefficients for the best fitting model are positive and highly significant for the estimated DS86 eye organ dose.

  10. A data-driven approach to reverse engineering customer engagement models: towards functional constructs.

    PubMed

    de Vries, Natalie Jane; Carlson, Jamie; Moscato, Pablo

    2014-01-01

    Online consumer behavior in general and online customer engagement with brands in particular, has become a major focus of research activity fuelled by the exponential increase of interactive functions of the internet and social media platforms and applications. Current research in this area is mostly hypothesis-driven and much debate about the concept of Customer Engagement and its related constructs remains in the literature. In this paper, we aim to propose a novel methodology for reverse engineering a consumer behavior model for online customer engagement, based on a computational and data-driven perspective. This methodology could be generalized and prove useful for future research in the fields of consumer behaviors using questionnaire data or studies investigating other types of human behaviors. The method we propose contains five main stages: symbolic regression analysis, graph building, community detection, evaluation of results and, finally, investigation of directed cycles and common feedback loops. The 'communities' of questionnaire items that emerge from our community detection method form possible 'functional constructs' inferred from data rather than assumed from literature and theory. Our results show consistent partitioning of questionnaire items into such 'functional constructs' suggesting the method proposed here could be adopted as a new data-driven way of human behavior modeling.

  11. Height reduction among prenatally exposed atomic-bomb survivors: A longitudinal study of growth

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nakashima, Eiji; Funamoto, Sachiyo; Carter, R.L.

    Using a random coefficient regression model, sex-specific longitudinal analyses of height were made on 801 (392 male and 409 female) atomic-bomb survivors exposed in utero to detect dose effects on standing height. The data set resulted from repeated measurements of standing height of adolescents (age 10-18 y). The dose effect, if any, was assumed to be linear. Gestational ages at the time of radiation exposure were divided into trimesters. Since an earlier longitudinal data analysis has demonstrated radiation effects on height, the emphasis in this paper is on the interaction between dose and gestational age at exposure and radiation effects on the age of occurrence of the adolescent growth spurt. For males, a cubic polynomial growth-curve model applied to the data was affected significantly by radiation. The dose by trimester interaction effect was not significant. The onset of the adolescent growth spurt was estimated at about 13 y at 0 Gy. There was no effect of radiation on the adolescent growth spurt. For females, a quadratic polynomial growth-curve model was fitted to the data. The dose effect was significant, while the dose by trimester interaction was again not significant. 27 refs., 3 figs., 4 tabs.

  12. Uncovering the cognitive processes underlying mental rotation: an eye-movement study.

    PubMed

    Xue, Jiguo; Li, Chunyong; Quan, Cheng; Lu, Yiming; Yue, Jingwei; Zhang, Chenggang

    2017-08-30

    Mental rotation is an important paradigm for spatial ability. Mental-rotation tasks are assumed to involve five or three sequential cognitive-processing states, though this has not been demonstrated experimentally. Here, we investigated how processing states alternate during mental-rotation tasks. Inference was carried out using an advanced statistical modelling and data-driven approach - a discriminative hidden Markov model (dHMM) trained using eye-movement data obtained from an experiment consisting of two different strategies: (I) mentally rotate the right-side figure to be aligned with the left-side figure and (II) mentally rotate the left-side figure to be aligned with the right-side figure. Eye movements were found to contain the necessary information for determining the processing strategy, and the dHMM that best fit our data segmented the mental-rotation process into three hidden states, which we termed encoding and searching, comparison, and searching on a one-side pair. Additionally, we applied three classification methods - logistic regression, support vector machines and the dHMM - of which the dHMM predicted the strategies with the highest accuracy (76.8%). Our study thus confirms that there are differences in processing states between these two mental-rotation strategies, consistent with the previous suggestion that mental rotation is a discrete process that is accomplished in a piecemeal fashion.

  13. Spatial Assessment of Model Errors from Four Regression Techniques

    Treesearch

    Lianjun Zhang; Jeffrey H. Gove; Jeffrey H. Gove

    2005-01-01

    Forest modelers have attempted to account for the spatial autocorrelations among trees in growth and yield models by applying alternative regression techniques such as linear mixed models (LMM), generalized additive models (GAM), and geographically weighted regression (GWR). However, the model errors are commonly assessed using average errors across the entire study...

  14. Application of balancing methods in modeling the penicillin fermentation.

    PubMed

    Heijnen, J J; Roels, J A; Stouthamer, A H

    1979-12-01

    This paper shows the application of elementary balancing methods in combination with simple kinetic equations in the formulation of an unstructured model for the fed-batch process for the production of penicillin. The rate of substrate uptake is modeled with a Monod-type relationship. The specific penicillin production rate is assumed to be a function of growth rate. Hydrolysis of penicillin to penicilloic acid is assumed to be first order in penicillin. In simulations with the present model it is shown that the model, although assuming a strict relationship between specific growth rate and penicillin productivity, allows for the commonly observed lag phase in the penicillin concentration curve and the apparent separation between growth and production phase (idiophase-trophophase concept). Furthermore it is shown that the feed rate profile during fermentation is of vital importance in the realization of a high production rate throughout the duration of the fermentation. It is emphasized that the method of modeling presented may also prove rewarding for an analysis of fermentation processes other than the penicillin fermentation.
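
    The model structure described above (Monod-type substrate uptake, a specific penicillin production rate that depends on the growth rate, and first-order hydrolysis of penicillin) can be written as a small fed-batch ODE system. The sketch below is a minimal illustration; the parameter values and the exact form of the production-rate function are assumptions, not the paper's calibrated model.

        import numpy as np
        from scipy.integrate import solve_ivp

        # Illustrative parameters (not the paper's values)
        mu_max, K_s, Y_xs = 0.10, 0.1, 0.45    # 1/h, g/L, g biomass / g substrate
        q_p_max, mu_crit = 0.005, 0.01         # g P / g X / h; production falls off at high mu
        k_h = 0.01                             # 1/h, first-order hydrolysis of penicillin
        S_feed, F = 400.0, 0.02                # g/L glucose in feed, L/h feed rate

        def q_p(mu):
            """Specific production rate as a decreasing function of growth rate (assumed form)."""
            return q_p_max * mu_crit / (mu_crit + mu)

        def rhs(t, state):
            X, S, P, V = state                  # biomass, substrate, penicillin (g/L), volume (L)
            mu = mu_max * S / (K_s + S)         # Monod-type specific growth rate
            dX = mu * X - F / V * X             # growth minus dilution by the feed
            dS = -mu / Y_xs * X + F / V * (S_feed - S)
            dP = q_p(mu) * X - k_h * P - F / V * P   # production, hydrolysis, dilution
            dV = F
            return [dX, dS, dP, dV]

        sol = solve_ivp(rhs, (0.0, 150.0), [1.0, 20.0, 0.0, 7.0], max_step=0.5)
        print(f"final penicillin concentration ~ {sol.y[2, -1]:.2f} g/L")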

  15. Transport and concentration controls for chloride, strontium, potassium and lead in Uvas Creek, a small cobble-bed stream in Santa Clara County, California, U.S.A. 2. Mathematical modeling

    USGS Publications Warehouse

    Jackman, A.P.; Walters, R.A.; Kennedy, V.C.

    1984-01-01

    Three models describing solute transport of conservative ion species and another describing transport of species which adsorb linearly and reversibly on bed sediments are developed and tested. The conservative models are based on three different conceptual models of the transient storage of solute in the bed. One model assumes the bed to be a well-mixed zone with flux of solute into the bed proportional to the difference between stream concentration and bed concentration. The second model assumes solute in the bed is transported by a vertical diffusion process described by Fick's law. The third model assumes that convection occurs in a selected portion of the bed while the mechanism of the first model functions everywhere. The model for adsorbing species assumes that the bed consists of particles of uniform size with the rate of uptake controlled by an intraparticle diffusion process. All models are tested using data collected before, during and after a 24-hr. pulse injection of chloride, strontium, potassium and lead ions into Uvas Creek near Morgan Hill, California, U.S.A. All three conservative models accurately predict chloride ion concentrations in the stream. The model employing the diffusion mechanism for bed transport predicts better than the others. The adsorption model predicts both strontium and potassium ion concentrations well during the injection of the pulse but somewhat overestimates the observed concentrations after the injection ceases. The overestimation may be due to the convection of solute deep into the bed where it is retained longer than the 3-week post-injection observation period. The model, when calibrated for strontium, predicts potassium equally well when the adsorption equilibrium constant for strontium is replaced by that for potassium. ?? 1984.

  16. Can We Use Regression Modeling to Quantify Mean Annual Streamflow at a Global-Scale?

    NASA Astrophysics Data System (ADS)

    Barbarossa, V.; Huijbregts, M. A. J.; Hendriks, J. A.; Beusen, A.; Clavreul, J.; King, H.; Schipper, A.

    2016-12-01

    Quantifying the mean annual flow of rivers (MAF) at ungauged sites is essential for a number of applications, including assessments of global water supply, ecosystem integrity and water footprints. MAF can be quantified with spatially explicit process-based models, which can be overly time-consuming and data-intensive for this purpose, or with empirical regression models that predict MAF from climate and catchment characteristics. Yet regression models have mostly been developed at a regional scale, and the extent to which they can be extrapolated to other regions is not known. In this study, we developed a global-scale regression model for MAF using observations of discharge and catchment characteristics from 1,885 catchments worldwide, ranging from 2 to 10^6 km^2 in size. In addition, we compared the performance of the regression model with the predictive ability of the spatially explicit global hydrological model PCR-GLOBWB [van Beek et al., 2011] by comparing results from both models to independent measurements. We obtained a regression model explaining 89% of the variance in MAF based on catchment area, mean annual precipitation and air temperature, average slope and elevation. The regression model performed better than PCR-GLOBWB for the prediction of MAF, as root-mean-square error values were lower (0.29 - 0.38 compared to 0.49 - 0.57) and the modified index of agreement was higher (0.80 - 0.83 compared to 0.72 - 0.75). Our regression model can be applied globally at any point of the river network, provided that the input parameters are within the range of values employed in calibrating the model. Performance is reduced in water-scarce regions, and further research should focus on improving regression-based global hydrological models for such regions.
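
    A minimal sketch of a regression model of this form: mean annual flow regressed on log-transformed catchment area and precipitation plus temperature, slope and elevation, fitted by ordinary least squares. The synthetic data, functional form and coefficients are assumptions for illustration, not the model or data of the study.

```python
import numpy as np

# Synthetic catchment descriptors spanning roughly the range in the abstract.
rng = np.random.default_rng(1)
n = 500
area   = 10 ** rng.uniform(0.3, 6, n)        # km^2, roughly 2 to 10^6
precip = rng.uniform(200, 3000, n)           # mm/yr mean annual precipitation
temp   = rng.uniform(-5, 28, n)              # deg C mean annual air temperature
slope  = rng.uniform(0.5, 30, n)             # degrees, average catchment slope
elev   = rng.uniform(10, 4000, n)            # m, average elevation

# "Observed" log10(MAF); the coefficients here are invented for illustration.
log_maf = (-3.0 + 0.95 * np.log10(area) + 1.1 * np.log10(precip)
           - 0.01 * temp + 0.05 * np.log10(slope) - 0.02 * np.log10(elev)
           + rng.normal(0, 0.15, n))

# Ordinary least squares on the log-transformed predictors.
X = np.column_stack([np.ones(n), np.log10(area), np.log10(precip),
                     temp, np.log10(slope), np.log10(elev)])
beta, *_ = np.linalg.lstsq(X, log_maf, rcond=None)
pred = X @ beta
r2 = 1 - np.sum((log_maf - pred) ** 2) / np.sum((log_maf - log_maf.mean()) ** 2)
rmse = np.sqrt(np.mean((log_maf - pred) ** 2))
print(f"R^2 = {r2:.2f}, RMSE (log10 units) = {rmse:.2f}")
```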

  17. Developing a predictive tropospheric ozone model for Tabriz

    NASA Astrophysics Data System (ADS)

    Khatibi, Rahman; Naghipour, Leila; Ghorbani, Mohammad A.; Smith, Michael S.; Karimi, Vahid; Farhoudi, Reza; Delafrouz, Hadi; Arvanaghi, Hadi

    2013-04-01

    Predictive ozone models are becoming indispensable tools, providing a pollution-alert capability for people who are vulnerable to the risks. We have developed a tropospheric ozone prediction capability for Tabriz, Iran, using the following five modeling strategies: three regression-type methods, Multiple Linear Regression (MLR), Artificial Neural Networks (ANNs), and Gene Expression Programming (GEP); and two auto-regression-type models, Nonlinear Local Prediction (NLP), which implements chaos theory, and Auto-Regressive Integrated Moving Average (ARIMA) models. The regression-type strategies explain the data in terms of temperature, solar radiation, dew point temperature, and wind speed, whereas the auto-regression-type strategies regress present ozone values on their past values. The ozone time series are available at various time intervals, including hourly intervals, from August 2010 to March 2011. The results for the MLR, ANN and GEP models are not especially good, but those produced by NLP and ARIMA are promising for establishing a forecasting capability.
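
    A minimal sketch contrasting the two strategies described: a multiple linear regression on meteorological predictors versus an auto-regression on past ozone values, both fitted by least squares. The synthetic hourly series and coefficients are illustrative assumptions, not the Tabriz observations, and the AR(3) fit stands in for the more elaborate NLP/ARIMA models.

```python
import numpy as np

# Synthetic hourly ozone driven by meteorology (illustrative data only).
rng = np.random.default_rng(2)
n = 2000
hours = np.arange(n)
temp  = 20 + 8 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 1, n)
rad   = np.clip(np.sin(2 * np.pi * hours / 24), 0, None) * 800
dewp  = temp - rng.uniform(2, 10, n)
wind  = rng.gamma(2.0, 1.5, n)
ozone = 10 + 0.8 * temp + 0.03 * rad - 0.3 * dewp - 1.5 * wind + rng.normal(0, 3, n)

split = int(0.8 * n)                                  # train on the first 80% of hours

# Strategy 1: regression-type model on exogenous meteorological predictors (MLR).
X = np.column_stack([np.ones(n), temp, rad, dewp, wind])
beta, *_ = np.linalg.lstsq(X[:split], ozone[:split], rcond=None)
mlr_pred = X[split:] @ beta

# Strategy 2: auto-regression-type model on the three previous ozone values (AR(3)).
lags = np.column_stack([np.ones(n - 3), ozone[2:-1], ozone[1:-2], ozone[:-3]])
target = ozone[3:]
phi, *_ = np.linalg.lstsq(lags[:split - 3], target[:split - 3], rcond=None)
ar_pred = lags[split - 3:] @ phi

rmse = lambda p, o: np.sqrt(np.mean((p - o) ** 2))
print("MLR   RMSE:", rmse(mlr_pred, ozone[split:]).round(2))
print("AR(3) RMSE:", rmse(ar_pred, target[split - 3:]).round(2))
```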

  18. Unitary Response Regression Models

    ERIC Educational Resources Information Center

    Lipovetsky, S.

    2007-01-01

    The dependent variable in a regular linear regression is a numerical variable, and in a logistic regression it is a binary or categorical variable. In these models the dependent variable has varying values. However, there are problems yielding an identity output of a constant value which can also be modelled in a linear or logistic regression with…

  19. The Formation Environment of Jupiter's Moons

    NASA Technical Reports Server (NTRS)

    Turner, Neal; Lee, Man Hoi; Sano, Takayoshi

    2012-01-01

    Do circumjovian disk models have conductivities consistent with the assumed accretion stresses? Broadly, yes, for both minimum-mass (MM) and gas-starved models: magnetic stresses are weak in the MM models, as needed to keep the material in place, and stronger in the gas-starved models, as assumed in deriving the flow to the planet. However, future minimum-mass modeling may need to consider the loss of dust-depleted gas from the surface layers to the planet, and the gas-starved models should have stress varying with radius. Dust evolution is a key process for further study, since recombination occurs on the grains.

  20. Modelling infant mortality rate in Central Java, Indonesia use generalized poisson regression method

    NASA Astrophysics Data System (ADS)

    Prahutama, Alan; Sudarno

    2018-05-01

    The infant mortality rate is the number of deaths under one year of age occurring among the live births in a given geographical area during a given year, per 1,000 live births in the same area during the same year. This problem needs to be addressed because it is an important element of a country's economic development: a high infant mortality rate disrupts the stability of a country as it relates to the sustainability of its population. One regression model that can be used to analyze the relationship between a discrete dependent variable Y and independent variables X is the Poisson regression model. Regression models commonly used for discrete dependent variables include Poisson regression, negative binomial regression and generalized Poisson regression. In this research, generalized Poisson regression modeling gives a better AIC value than Poisson regression. The most significant variable is the number of health facilities (X1), while the variable with the greatest influence on the infant mortality rate is average breastfeeding (X9).
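
    A minimal sketch of the kind of comparison described, using statsmodels' Poisson and GeneralizedPoisson estimators and their AIC values. The synthetic district-level counts and the two predictors (echoing X1 and X9 from the abstract) are invented for illustration and are not the Central Java data.

```python
import numpy as np
from statsmodels.discrete.discrete_model import Poisson, GeneralizedPoisson

# Synthetic, overdispersed count data standing in for district-level infant deaths.
rng = np.random.default_rng(3)
n = 100
x1 = rng.uniform(10, 120, n)                       # number of health facilities (hypothetical)
x9 = rng.uniform(40, 95, n)                        # average breastfeeding, % (hypothetical)
mu = np.exp(4.0 - 0.004 * x1 - 0.02 * x9)          # assumed log-linear mean structure
deaths = rng.negative_binomial(5, 5 / (5 + mu))    # counts with extra-Poisson variation

X = np.column_stack([np.ones(n), x1, x9])          # intercept + predictors

poisson_fit = Poisson(deaths, X).fit(disp=False)
genpois_fit = GeneralizedPoisson(deaths, X).fit(disp=False)

# With overdispersed counts, the generalized Poisson model typically yields a lower AIC.
print("Poisson AIC:            ", round(poisson_fit.aic, 1))
print("Generalized Poisson AIC:", round(genpois_fit.aic, 1))
```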
